Our research partially focuses on psychophysics and signal detection.  Currently, we are investigating the detection of spatially dynamic signals associated with moving sources or resulting from head and torso motion in a stationary sound field.  In this context, we have also developed novel signal-processing techniques for efficient and rapid generation of dynamic signals (see below).

Motion Processing by the Auditory System

In one of our current projects we study the ability of human observers to detect the motion of interaurally decorrelated waveforms.  Naturally occurring acoustic signals at the two ears are never completely correlated due to extraneous noise, differential filtering effects of external ear structures (pinna), and internal neural noise that is uncorrelated in the left and right auditory tracts.  We are investigating effects of interaural decorrelation on motion detection, not only because natural signals are only partially binaurally correlated, but more importantly for what it can theoretically reveal about how the system combines information from the two ears, as well as for better characterization of the resiliency of binaural processing of corrupted sounds.

Figure 1 shows averaged data from four observers for one such experiment in which each observer had to decide if a Gaussian noise event was in motion or stationary, as a function of interaural correlation and motion velocity.  Such headphone-presented sounds are perceived intracranially, and simulated motion is usually perceived along the interaural axis.  Motion velocity is reported in µs/s (microseconds of interaural delay per second) since azimuthal motion of a real sound source produces a dynamic change in interaural delay.  Colors represent the value of the detection index (d'), with cooler colors representing higher values and thus greater detectability of motion.  Note the nearly linear trade-off between velocity and decorrelation.  Also note that for lower velocities, motion detection is possible even for correlation values as low as r = 0.3 and is still above chance for r = 0.1, demonstrating the robustness of the binaural system in processing corrupted signals.
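For readers unfamiliar with the detection index, the following is a minimal sketch of the standard equal-variance Gaussian definition, d' = z(hit rate) - z(false-alarm rate).  It assumes a "motion" response to a moving sound counts as a hit and a "motion" response to a stationary sound counts as a false alarm; the function name and the example rates are hypothetical and are not data from Figure 1.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian detection index: z(H) - z(F).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates, chosen only for illustration.
example_d_prime = d_prime(0.80, 0.20)
chance_d_prime = d_prime(0.50, 0.50)  # equal rates give d' = 0 (chance)
```

A d' of zero corresponds to chance performance, which is the sense in which detection at r = 0.1 is described as "above chance".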

Figure 2 shows the output of a cross-correlation model of binaural interaction in response to correlated and partially correlated Gaussian noise (left set of panels).  The model incorporates standard stages of peripheral processing, including a GammaTone filterbank, half-wave rectification, spectral weighting, and a two-dimensional (delay-by-frequency) cross correlation.  The model's output after frequency integration is shown in the right panels.  Correlation values are shown within each panel.  Note that the model predicts signal loss between r = 0.2 and r = 0.4.  Figure 3 shows the responses of two tectal neurons from the barn owl to the same stimuli as in Fig. 2 (data collected at Caltech).  Behavioral experiments in owls were also consistent with these neurally determined correlation thresholds (see Saberi et al., Neuron, 1998, v. 21, pp. 789-798).  Note that these neuronal responses are consistent with human psychophysical performance in that binaural signals remain detectable for very low correlations.
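The core idea of the model's final stage can be illustrated with a much-reduced sketch.  The code below is not the model of Figure 2: it omits the filterbank, half-wave rectification, and spectral weighting, and computes only a broadband normalized cross-correlation over a range of interaural lags.  The mixing rule used to set the interaural correlation (a common noise plus an independent noise) is a standard construction assumed here, not one stated in the text, and all function names are hypothetical.

```python
import numpy as np

def correlated_noise_pair(n_samples, r, rng):
    """Return two Gaussian noises whose expected correlation is r."""
    common = rng.standard_normal(n_samples)
    independent = rng.standard_normal(n_samples)
    left = common
    # Mixing weights keep unit variance and give expected correlation r.
    right = r * common + np.sqrt(1.0 - r**2) * independent
    return left, right

def cross_correlogram(left, right, max_lag):
    """Normalized cross-correlation for lags -max_lag..max_lag (in samples)."""
    left = left - left.mean()
    right = right - right.mean()
    n = left.size
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2))
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.array([
        np.sum(left[max(0, -k): n - max(0, k)] *
               right[max(0, k): n - max(0, -k)])
        for k in lags
    ]) / norm
    return lags, values

rng = np.random.default_rng(0)
max_lag = 20
left, right = correlated_noise_pair(50_000, 0.3, rng)
lags, values = cross_correlogram(left, right, max_lag)
peak_lag = lags[np.argmax(values)]   # lag of the correlogram peak
peak_height = values[max_lag]        # value at zero lag, close to r
```

As r decreases, the peak at zero lag shrinks toward the fluctuations at other lags, which is the kind of signal loss the full model shows at low correlations.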

 

Complementary Discrete Fourier Transforms (DFTs) for efficient generation of dynamic signals

We have developed signal-processing techniques for on-the-fly generation of the complex dynamic signals used in our motion experiments.  One such technique uses a pair of complementary DFTs for which the component spacing in one series is different from that of the other.  The appeal of this technique is its wide applicability, since it can generate real-time motion stimuli of any velocity and starting interaural delay for complex broadband or filtered noise waveforms and nonstationary sounds such as speech, music, and other natural sounds.
 

A discrete time-domain waveform x(n), n = 0, 1, …, N-1, may be represented in the frequency domain by its Discrete Fourier Transform (DFT)

$$X(k\Delta\omega) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

where

$$\Delta\omega = \frac{\Omega}{N}$$

N is the number of samples, Ω is the sampling frequency, and Δω is the frequency resolution or bin spacing of the DFT output in Hz, i.e., how far apart the samples are in the frequency domain.  Note that the frequency components kΔω are equally spaced, starting at 0 (the dc value) and ending at (N-1)Δω.  The spacing Δω depends on N, which is directly related to the waveform duration.  For a fixed sampling frequency, Δω is inversely proportional to the waveform duration, i.e., the stimulus duration in seconds is T = N/Ω = 1/Δω.  Thus, there is spectral energy measured only at the harmonics of 1/T.
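The relation between the number of samples, the sampling frequency, and the bin spacing can be checked numerically.  The sketch below assumes a sampling frequency of 44.1 kHz, which is not specified in the text; the function name is hypothetical.

```python
import numpy as np

def dft_bin_spacing(n_samples, fs_hz):
    """Return (bin spacing in Hz, duration in s) for an N-point DFT."""
    spacing_hz = fs_hz / n_samples   # spacing = sampling frequency / N
    duration_s = n_samples / fs_hz   # T = N / sampling frequency = 1 / spacing
    return spacing_hz, duration_s

fs_hz = 44_100       # assumed sampling frequency
n_samples = 4_410    # 100 ms at the assumed sampling frequency
spacing_hz, duration_s = dft_bin_spacing(n_samples, fs_hz)

# Bin frequencies 0, spacing, 2*spacing, ..., (N-1)*spacing.
bin_freqs = np.arange(n_samples) * spacing_hz
```

With these values the spacing is 10 Hz, so a 100-ms waveform has components only at multiples of 10 Hz.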

The DFT components of a complex waveform are separated by the inverse of the waveform duration, that is, Δω = 1/T Hz.  For a stimulus duration of 100 ms, this spacing is 10 Hz.  A 100-ms broadband waveform will thus have energy at {0, …, 500, 510, …, 1000, 1010, …}, effectively up to the Nyquist limit.  Simulation of linear motion requires component frequencies at one ear that increase proportionately relative to the corresponding components at the other ear.  For the current example, a velocity of 2000 µs/s requires component frequencies at the contralateral ear equal to {0, …, 501, 511.02, …, 1002, 1012.02, …}.  To generate such a dichotic waveform, one simply selects the appropriate component spacing, which in this case is Δω₂ = Δω₁(1+V) = 10.02 Hz, where V is velocity in s/s (here V = 0.002).  A time-domain array may then be generated, representing the contralateral waveform with a duration of T₂ = Δω₂⁻¹ = [Δω₁(1+V)]⁻¹.  This shorter-duration waveform (N₂ = T₂Ω samples) will result in an appropriately wider component spacing, and will create exactly the type of matched components at the left and right ears that simulates the desired linear change in interaural delay.  This array's bins are then filled with the matched frequency components (amplitude and phase spectrum) of the waveform to channel 1 obtained from its DFT:

$$X_2(k\Delta\omega_2) = X_1(k\Delta\omega_1), \qquad k = 0, 1, \ldots, N_2 - 1$$

with non-matching bins (due to the different number of samples) zeroed out.  The inverse Fourier transform of X₂ then provides the time-domain signal for this channel.
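The procedure described above can be sketched with NumPy as follows.  This is an illustrative reading of the text rather than the implementation used in the experiments: the function name is hypothetical, the 44.1 kHz sampling frequency is assumed, the one-sided transforms `rfft`/`irfft` are used so that the output stays real, and the amplitude rescaling by the ratio of the two lengths is an added assumption to compensate for the inverse transform's normalization.  Because the second channel must contain a whole number of samples, the achieved velocity can differ slightly from the requested one, so the sketch returns it.

```python
import numpy as np

def complementary_dft_channel(x1, velocity_s_per_s):
    """Build the second-ear waveform by copying DFT bins into a shorter array.

    x1: channel-1 waveform (real-valued samples).
    velocity_s_per_s: requested rate of change of interaural delay, in s/s.
    Returns (x2, achieved_velocity).
    """
    x1 = np.asarray(x1, dtype=float)
    n1 = x1.size
    # Shorter duration => wider bin spacing; round to whole samples.
    n2 = int(round(n1 / (1.0 + velocity_s_per_s)))
    spec1 = np.fft.rfft(x1)                       # one-sided spectrum, channel 1
    spec2 = np.zeros(n2 // 2 + 1, dtype=complex)  # one-sided spectrum, channel 2
    n_matched = min(spec1.size, spec2.size)
    spec2[:n_matched] = spec1[:n_matched]         # copy bin k to bin k; rest stay 0
    # Assumed rescaling so both channels keep the same component amplitudes.
    x2 = np.fft.irfft(spec2, n=n2) * (n2 / n1)
    achieved_velocity = n1 / n2 - 1.0             # spacing ratio minus one
    return x2, achieved_velocity

fs_hz = 44_100                            # assumed sampling frequency
rng = np.random.default_rng(0)
left = rng.standard_normal(4_410)         # 100 ms of Gaussian noise
right, achieved = complementary_dft_channel(left, 0.002)
spacing_right_hz = fs_hz / right.size     # bin spacing of the second channel
```

With these assumed values the second channel has 4401 samples, giving a spacing slightly above the 10.02 Hz of the worked example; choosing a duration and sampling frequency for which N₂ is an exact integer removes this rounding.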

Click here to listen to a sample speech waveform modified using this technique to simulate motion (requires headphone listening).