Chapter 4:
Temporal processing in the auditory system
-
Introduction.
-
This chapter deals with the limits of the
auditory system in detecting changes over time.
-
Almost all sounds fluctuate over time.
-
Much of the information in speech and music
is carried in the changes themselves.
-
Distinguish between temporal resolution and temporal
integration.
-
Distinguish between fine structure and envelope
of sound.
-
Temporal resolution in the auditory system
must take account of peripheral filtering and involves two main processes.
-
Analysis of the time pattern occurring within
each frequency channel.
-
Comparison of the time patterns across channels.
-
This chapter is mainly concerned with within-channel
processes.
-
A major difficulty in measuring temporal resolution
in the auditory system is that changes in temporal pattern are usually
associated with changes in the magnitude spectrum, which can also be used to
detect the change.
-
For example, subjects can discriminate a single
click from a pair of clicks separated by as little as a few tens of microseconds.
-
The single click and the double click have
spectral differences that are most detectable at high frequencies.
-
When noise is added to mask frequencies above
10 kHz, the threshold separation between the clicks increases dramatically,
indicating that the initial result was not a direct measure of temporal resolution.
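-
The spectral cue can be made concrete with a short calculation. The sketch
below assumes two identical ideal clicks (impulses), for which the magnitude
spectrum is 2|cos(pi f t)| when the clicks are separated by t seconds, whereas
a single click has a flat spectrum. The 50-microsecond separation is an
illustrative value chosen here, not one taken from the experiments.

```python
import math

def double_click_magnitude(freq_hz, gap_s):
    """Magnitude spectrum of two identical ideal clicks separated by gap_s.

    A single ideal click has a flat magnitude of 1 at every frequency;
    the pair has |1 + exp(-j*2*pi*f*gap)| = 2*|cos(pi*f*gap)|.
    """
    return 2.0 * abs(math.cos(math.pi * freq_hz * gap_s))

def first_null_hz(gap_s):
    """Lowest frequency at which the double-click spectrum falls to zero."""
    return 1.0 / (2.0 * gap_s)

gap = 50e-6  # illustrative click separation in seconds (an assumption)
for f in (1000, 5000, 9000):
    # Level relative to the low-frequency value of 2, in dB.
    level_db = 20.0 * math.log10(double_click_magnitude(f, gap) / 2.0)
    print(f"{f:>5} Hz: {level_db:6.2f} dB re the low-frequency level")
print("first spectral null (Hz):", first_null_hz(gap))
```

For short separations the spectrum of the pair departs from that of the single
click mainly at high frequencies, which is why masking the high frequencies
removes the cue.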
-
Two major approaches to dealing with this
problem.
-
Use signals, such as white noise, whose magnitude
spectrum is not changed when their time pattern is altered.
-
Use stimuli whose spectra are altered by changes
in time pattern, but mask the spectral changes.
-
Temporal resolution measured by the discrimination
of stimuli with identical magnitude spectra: Broadband sounds.
-
The detection of gaps in broadband noise.
-
Present subjects with two successive bursts
of noise, one containing a brief interruption and one not, and ask which
contained the gap (two-alternative forced choice).
-
Gap threshold for this task is 2 to 3 msec,
except at very low sound levels where the threshold is larger.
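-
A minimal sketch of how one trial of this task could be constructed. The
function names, the 400-msec burst duration, the 44.1-kHz sampling rate, and
the placement of the gap at the center of the burst are all assumptions made
for illustration; they are not details of any particular study.

```python
import random

def noise_burst(duration_s, fs=44100, gap_s=0.0, rng=None):
    """Gaussian noise burst; if gap_s > 0, silence a gap at its temporal center."""
    if rng is None:
        rng = random.Random()
    n = int(round(duration_s * fs))
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    n_gap = int(round(gap_s * fs))
    start = (n - n_gap) // 2
    for i in range(start, start + n_gap):
        samples[i] = 0.0
    return samples

def make_2afc_trial(gap_s, duration_s=0.4, fs=44100, rng=None):
    """Return (two bursts, index of the burst that contains the gap)."""
    if rng is None:
        rng = random.Random()
    gap_interval = rng.randrange(2)  # 0 = first burst, 1 = second burst
    bursts = [
        noise_burst(duration_s, fs, gap_s if k == gap_interval else 0.0, rng)
        for k in range(2)
    ]
    return bursts, gap_interval

bursts, answer = make_2afc_trial(gap_s=0.003)  # 3-msec gap
print("gap is in interval", answer + 1)
```

The subject's response is scored against the returned index, and the gap
duration is varied across trials to find the threshold.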
-
The discrimination of time-reversed signals.
-
The long-term magnitude spectrum of a sound
is not changed when that sound is played backward in time.
-
Discrimination of a time-reversed sound from
the original thus reflects sensitivity to the difference in the time pattern
of the two sounds.
-
Ronken (1970) presented subjects with two
clicks of different amplitudes (A and B) with a gap between them.
-
Subjects were asked to discriminate between
the orders AB and BA.
-
Subjects could do this down to gaps of 2 to
3 msec.
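-
The claim that time reversal leaves the magnitude spectrum unchanged can be
checked numerically. The sketch below uses a toy eight-sample "click pair"
(amplitudes 1.0 and 0.5, both illustrative) and a direct discrete Fourier
transform; it is a demonstration of the principle, not a model of Ronken's
stimuli.

```python
import cmath

def dft_magnitudes(x):
    """Magnitude of each discrete Fourier transform bin of the sequence x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Click A (amplitude 1.0) followed by click B (amplitude 0.5) after a short gap.
ab = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
ba = ab[::-1]  # the same waveform played backward: B now precedes A

largest_difference = max(
    abs(a - b) for a, b in zip(dft_magnitudes(ab), dft_magnitudes(ba))
)
print("waveforms differ:", ab != ba)
print("largest difference between magnitude spectra:", largest_difference)
```

The two waveforms differ only in their phase spectra, so any ability to tell
them apart must rest on their time patterns.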
-
Temporal modulation transfer functions
(TMTF).
-
Instead of trying to use a single number to
characterize temporal resolution, one can measure threshold for detecting
changes in the amplitude of a sound as a function of the rapidity of the
changes.
-
For example, white noise can be sinusoidally
amplitude modulated and the threshold for detecting the modulation determined
as a function of modulation rate.
-
Typical results for such an experiment are
shown in Fig. 4.1.
-
For modulation rates below 16 Hz, performance
is limited by the amplitude resolution of the auditory system, rather than
temporal resolution, so threshold is independent of modulation rate.
-
For modulation rates from 16 Hz to about 1000
Hz, threshold increases with modulation rate, thus showing the limits of
temporal resolution.
-
Above 1000 Hz, the modulation cannot be detected
at all.
-
The shape of the TMTF varies little with sound
level, but thresholds do increase at low sound levels.
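-
A minimal sketch of the stimulus used to measure a TMTF: Gaussian noise
multiplied by (1 + m sin(2 pi fm t)), where m is the modulation depth and fm
the modulation rate. Expressing the depth in dB as 20 log10(m) is a common
convention assumed here; the sampling rate and function names are likewise
illustrative.

```python
import math
import random

def sam_noise(duration_s, mod_rate_hz, depth, fs=44100, rng=None):
    """White Gaussian noise with sinusoidal amplitude modulation of given depth."""
    if rng is None:
        rng = random.Random()
    n = int(round(duration_s * fs))
    return [
        (1.0 + depth * math.sin(2.0 * math.pi * mod_rate_hz * i / fs))
        * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]

def depth_to_db(depth):
    """Modulation depth m (0 < m <= 1) expressed as 20*log10(m)."""
    return 20.0 * math.log10(depth)

stimulus = sam_noise(duration_s=0.5, mod_rate_hz=16.0, depth=0.1)
print(len(stimulus), "samples; depth =", depth_to_db(0.1), "dB")
```

A TMTF is obtained by finding, for each modulation rate, the smallest depth
at which the modulated noise can be told apart from unmodulated noise.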
-
Temporal resolution measured by the discrimination
of stimuli with identical magnitude spectra: Effects of center frequency.
-
Temporal resolution might be expected to be
poorer at lower frequencies because the auditory filters become narrower
as center frequency decreases and filters with narrower bandwidths have
poorer temporal resolution (longer ringing responses).
-
Fig. 4.2 shows how the gap in a sinusoidal
signal is partially filled in by the ringing response of a simulated auditory
filter.
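-
The link between bandwidth and ringing can be given rough numbers. The
sketch below rests on two assumptions that are not part of these notes: the
equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990),
and the approximation of the auditory filter by a simple resonator whose
envelope decays with time constant 1/(pi B) for a bandwidth of B Hz. The
real auditory filter is not a simple resonator, so the values indicate
orders of magnitude only.

```python
import math

def erb_hz(center_hz):
    """ERB of the auditory filter (Glasberg and Moore, 1990 approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def ringing_time_constant_ms(bandwidth_hz):
    """Envelope decay time constant of a simple resonator of this bandwidth."""
    return 1000.0 / (math.pi * bandwidth_hz)

for fc in (100.0, 400.0, 1000.0, 4000.0):
    bw = erb_hz(fc)
    tau = ringing_time_constant_ms(bw)
    print(f"{fc:6.0f} Hz: bandwidth {bw:6.1f} Hz, ringing ~{tau:5.2f} msec")
```

Under these assumptions the ringing lasts several milliseconds only at the
lowest center frequencies.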
-
We cannot determine whether temporal resolution
depends on center frequency by studying the broadband
sounds discussed to this point.
-
Green (1973) studied discrimination of time-reversed
pairs of brief sinusoidal pulses differing in level by 10 dB.
-
The total stimulus duration required for 75%
correct discrimination was between 1 and 2 msec for center frequencies
of 2 and 4 kHz.
-
This value increased to between 2 and 4 msec
for a center frequency of 1 kHz.
-
This suggests that the response time of the auditory
filters may have played a role below 2 kHz.
-
Detection of temporal gaps in narrowband
sounds (with masking of spectral splatter).
-
Detection of gaps in bands of noise.
-
Gap thresholds for noise bands could be expected
to depend on two factors, as illustrated in Fig. 4.3.
-
Random fluctuations in the noise become more rapid
as bandwidth increases, so random dips in the noise that might
be confused with the gap become less likely (lower gap thresholds).
-
The bandwidth of the noise interacts with
the bandwidth of the auditory filter.
-
When the noise bandwidth is less than that
of the auditory filter, fluctuations are unchanged by the filter and gap
threshold should depend on the confusability of the gap with fluctuations
in the noise.
-
When the noise bandwidth is greater than that
of the auditory filter, fluctuations at the output of the filter are slower
than at the input so gap thresholds should decrease with increasing bandwidth
of the auditory filter (higher center frequencies).
-
Actual results show the expected decrease
in gap threshold with increasing noise bandwidth.
-
However, this decrease continues even when
the noise bandwidth exceeds the bandwidth of any single auditory filter
stimulated by the noise, and center frequency has little effect.
-
It appears that subjects make use of the output
of more than one auditory filter to detect gaps in noise so that there
is little effect of center frequency.
-
Gap thresholds for narrowband noises also
decrease with increasing sound levels up to about 30 dB SL.
-
Detection of gaps in sinusoids.
-
Sinusoids are presented in continuous noise
with a spectral notch at the frequency of the sinusoid to mask spectral
splatter.
-
Results are strongly affected by phase.
-
Shailer and Moore (1987) used three phase
conditions, all of which began the gap at a positive-going zero-crossing,
as shown in Fig. 4.5.
-
In standard phase the sinusoid was
turned back on at a positive-going zero-crossing.
-
In reversed phase the sinusoid was turned
back on at a negative-going zero-crossing.
-
In preserved phase the sinusoid was
turned back on at the phase it would have had if it continued through the
gap.
-
The results for a 2AFC experiment using these
conditions are shown in Fig. 4.6.
-
For the preserved phase condition, gap detection
improved monotonically with gap duration.
-
For the other two conditions results were
nonmonotonic and 180 degrees out of phase with each other.
-
The latter results appear to indicate that
the gap is much more difficult to detect when the sinusoid is turned back
on in phase with the ringing response of the auditory filter, as shown
in Fig. 4.7.
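-
The alternating pattern can be illustrated by computing how far the
restarted sinusoid is out of phase with the ringing. The sketch assumes
(beyond what these notes state) that the filter rings at the signal
frequency and continues from the phase the sinusoid had at the start of the
gap. The function name and the 400-Hz example are illustrative.

```python
def restart_phase_mismatch_deg(freq_hz, gap_s, condition):
    """Phase difference (0 to 180 degrees) between restart and ringing.

    The gap starts at a positive-going zero-crossing (phase 0), and the
    ringing is assumed to keep advancing at freq_hz during the gap.
    """
    ringing_phase = 360.0 * freq_hz * gap_s  # phase reached at the gap's end
    if condition == "standard":      # restart at positive-going zero-crossing
        restart_phase = 0.0
    elif condition == "reversed":    # restart at negative-going zero-crossing
        restart_phase = 180.0
    elif condition == "preserved":   # restart where the sinusoid would have been
        restart_phase = ringing_phase
    else:
        raise ValueError("unknown condition: " + condition)
    diff = (restart_phase - ringing_phase) % 360.0
    return min(diff, 360.0 - diff)

freq = 400.0  # period of 2.5 msec
for gap_ms in (1.25, 2.5, 3.75, 5.0):
    row = [
        round(restart_phase_mismatch_deg(freq, gap_ms / 1000.0, c), 1)
        for c in ("standard", "reversed", "preserved")
    ]
    print(gap_ms, "msec:", row)
```

Under this assumption the standard and reversed conditions are in phase with
the ringing at gap durations half a period apart, whereas the preserved
condition is always in phase, so only the decay of the ringing matters.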
-
When the preserved phase condition is used to estimate
gap threshold (75% correct), the threshold is about 4.5 msec at center
frequencies of 400, 1000, and 2000 Hz.
-
Other studies have found increases in gap
thresholds at lower frequencies (100 and 200 Hz).
-
We may conclude that the auditory filter plays
a role in determining the form of results for standard and reversed phase
conditions, but the ringing response of the auditory filter only appears
to limit gap detection in the preserved phase condition at very low center
frequencies.
-
Modeling temporal resolution.
-
The response of the auditory filter is too
fast to be a limiting factor in most tasks involving temporal resolution,
except at low frequencies.
-
This has led to the idea that there is a process
at levels of the auditory system higher than the auditory nerve that is
sluggish (it smooths the representation of the stimulus over time) and thus
limits temporal resolution.
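-
The idea of a sluggish process can be illustrated with a first-order leaky
integrator applied to the envelope of a sound containing a gap. This is only
a sketch of smoothing in general, not any of the published models; the 8-msec
time constant, the 0.1-msec step, and the function names are arbitrary
assumptions.

```python
import math

def gap_envelope(gap_ms, total_ms=100.0, step_ms=0.1):
    """Envelope equal to 1, except for a central gap where it is 0."""
    n = int(round(total_ms / step_ms))
    n_gap = int(round(gap_ms / step_ms))
    start = (n - n_gap) // 2
    return [0.0 if start <= i < start + n_gap else 1.0 for i in range(n)]

def leaky_integrate(envelope, time_constant_ms, step_ms=0.1):
    """First-order low-pass smoothing of the envelope."""
    alpha = 1.0 - math.exp(-step_ms / time_constant_ms)
    y = envelope[0]
    out = []
    for x in envelope:
        y += alpha * (x - y)  # move a fraction alpha toward the input
        out.append(y)
    return out

for gap_ms in (1.0, 3.0, 10.0):
    smoothed = leaky_integrate(gap_envelope(gap_ms), time_constant_ms=8.0)
    print(f"{gap_ms:4.1f}-msec gap: smoothed envelope dips to {min(smoothed):.2f}")
```

A brief gap produces only a shallow dip after smoothing, so a detector that
needs a dip of some minimum depth would have a gap threshold set by the time
constant of the smoothing.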
-
The author of your text reviews various models
which have been proposed for this purpose.
-
While these models are interesting, we shall
not consider them for the following reasons.
-
The smoothing almost certainly operates on
neural activity, but the models operate on simple transformations of the
stimulus, rather than its neural representation.
-
The various stages of such processing are
purely speculative.
-
There is currently no empirical way to isolate
the processes assumed at one stage of such processing from those assumed
at another.
-
A modulation filter bank?
-
It has been suggested that perception of sounds
that are amplitude modulated depends on feature detectors that are tuned
to specific modulation rates.
-
Each neuron can be considered to be a filter
in the modulation domain and they can be referred to collectively as a
modulation filter bank.
-
Neurons with appropriate properties have been
found in the cochlear nucleus and the inferior colliculus.
-
Use of the modulation filter bank to explain
certain perceptual phenomena is new and still controversial.
-
Masking in the modulation domain with broadband
carriers.
-
Modulation masking refers to an increase
in the threshold for detecting modulation of a carrier produced by additional
amplitude modulation.
-
Houtgast (1989) studied detection of sinusoidal
amplitude modulation of a pink noise carrier when no other modulation was
present and when a masker modulator was added.
-
As shown in Fig. 4.10, when no masker modulator
was present, threshold for detecting the modulation increased with modulation
frequency (TMTF).
-
When a half-octave band of noise centered
at 4, 8 or 16 Hz was added as a masker modulation, thresholds for modulation
detection increased most markedly at the center frequency of the masker.
-
This could be interpreted as selectivity in
the modulation-frequency domain, analogous to the frequency selectivity
in the audio-frequency domain we studied in Chapter 3.
-
A second study by Houtgast was analogous to
Fletcher's band-widening experiment.
-
The masker modulator was a variable-width,
constant-spectral-density noise band centered at 8 Hz.
-
The signal was sinusoidal modulation at 8
Hz.
-
The results shown in Fig. 4.11 indicate that
threshold for detecting the sinusoidal signal modulation increased with
masker bandwidth and then leveled off.
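-
The band-widening logic can be sketched with an idealized rectangular
modulation filter centered on the signal: masker power passed by the filter
grows with masker bandwidth only until the masker is as wide as the filter.
The rectangular shape and the 4-Hz filter bandwidth are hypothetical values
chosen for illustration, not estimates from Houtgast's data.

```python
def masker_power_in_filter(masker_bw_hz, filter_bw_hz, spectral_density=1.0):
    """Power of a constant-spectral-density masker band passed by a
    rectangular filter centered on the same frequency as the masker."""
    return spectral_density * min(masker_bw_hz, filter_bw_hz)

filter_bw = 4.0  # hypothetical modulation-filter bandwidth in Hz
for masker_bw in (1.0, 2.0, 4.0, 8.0, 16.0):
    power = masker_power_in_filter(masker_bw, filter_bw)
    print(f"masker {masker_bw:4.1f} Hz wide: effective masker power {power}")
```

If threshold follows the masker power within the filter, it should rise
with masker bandwidth and then level off, which is the pattern described
above.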
-
Your text reviews several other experiments
which can be interpreted in terms of a modulation filter bank.
-
If these modulation filters exist, the data
indicate that they are not nearly as sharply tuned as the auditory filters
in the audio-frequency domain.
-
Modulation detection interference.
-
The detection or discrimination of amplitude
modulation of a sinusoidal carrier can be impaired by the presence of one
or more modulated sounds with different carrier frequencies.
-
The basis for this phenomenon is still poorly
understood and we shall not consider it in any greater detail.
-
Accounting for TMTFs with a modulation filter
bank.
-
TMTFs were previously explained in terms of
models incorporating a low-pass filter at some stage of the auditory system
after the auditory nerve.
-
An alternative possibility is that TMTFs result
from detecting modulation of a particular frequency by monitoring the output
of a modulation filter tuned close to that frequency.
-
Ability to detect the modulation may be partly
limited by inherent random amplitude fluctuations in the noise at the output
of the modulation filters.
-
Modulation masking data suggest that the bandwidths
of the modulation filters increase with center frequency.
-
Thus, more random modulation would appear
at the outputs of modulation filters with higher center frequencies, making
it progressively harder to detect modulation as the modulation frequency
increases.
-
Duration discrimination.
-
A large number of studies have examined discrimination
of changes in the duration (T) of auditory stimuli or of silent intervals
marked by auditory stimuli.
-
We shall not consider these studies in detail,
but shall briefly summarize their main results.
-
For values of T greater than 10 msec, the
smallest detectable increase in T (ΔT) increases with T and is fairly
independent of the spectral characteristics of the sounds.
-
ΔT increases at low sound
levels and when the auditory markers of a silent interval differ in level
or frequency.
-
Temporal analysis based on across-channel processes
(limits of the ability to compare timing across frequency channels).
-
Huffman sequences are brief broadband sounds,
like clicks, that have identical long-term magnitude spectra, but energy
in a certain frequency region is delayed relative to other frequency regions.
-
Green has found that subjects can detect differences
in the amount of delay of 2 msec or more in frequency regions ranging from
650 to 4200 Hz.
-
The difference is heard as a subtle change in timbre,
not as one part of the sound following another.
-
Subjects require extensive practice to reach this level of
performance, and the task demands considerable concentration.
-
Detection of onset and offset asynchrony in multicomponent
complexes.
-
In order to examine sensitivity to temporal differences
for longer duration stimuli, one can examine ability to detect slight onset
or offset asynchronies in one component of complex signals composed of
many sinusoidal components.
-
As with Huffman sequences, very small asynchronies (a
few msec or less) can be detected.
-
Onset asynchrony is easier to detect than offset asynchrony.
-
Onset and offset asynchronies are easier to detect in
harmonic complexes than in logarithmically spaced complexes, probably because
the former are heard as a single sound and the latter as many different
sounds. It is difficult to compare the timing of sound elements that are
perceived as coming from different sources.