Chapter 4:
Temporal processing in the auditory system
  1. Introduction.
    1. This chapter deals with the limits of the auditory system in detecting changes over time.
      1. Almost all sounds fluctuate over time.
      2. Much of the information in speech and music is carried in the changes themselves.
    2. Distinguish between temporal resolution and temporal integration.
    3. Distinguish between fine structure and envelope of sound.
    4. Temporal resolution in the auditory system must take account of peripheral filtering and involves two main processes.
      1. Analysis of the time pattern occurring within each frequency channel.
      2. Comparison of the time patterns across channels.
      3. This chapter is mainly concerned with within-channel processes.
    5. A major difficulty in measuring temporal resolution in the auditory system is that changes in temporal pattern are usually associated with changes in magnitude spectra, which can also be used to discriminate the change.
      1. For example, subjects can discriminate a single click from a pair of clicks separated by as little as a few tens of microseconds.
      2. The single click and the double click have spectral differences that are most detectable at high frequencies.
      3. When noise is added to mask frequencies above 10 kHz, the threshold value of the gap increases dramatically, indicating that the initial result was not a direct measure of temporal resolution.
    6. Two major approaches to dealing with this problem.
      1. Use signals, such as white noise, whose magnitude spectrum is not changed when their time pattern is altered.
      2. Use stimuli whose spectra are altered by changes in time pattern, but mask the spectral changes.
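The click example above can be made concrete with a short calculation. This is only a sketch under idealized assumptions: both clicks are treated as identical unit impulses, so the single click has a flat magnitude spectrum and the pair has magnitude 2|cos(pi f gap)|. The function names are hypothetical and chosen for illustration.

```python
import math

def double_click_magnitude(freq_hz, gap_s):
    """Magnitude spectrum of two identical unit impulses separated by gap_s.

    The pair is delta(t) + delta(t - gap); its Fourier transform is
    1 + exp(-j*2*pi*f*gap), whose magnitude is 2*|cos(pi*f*gap)|.
    """
    return 2.0 * abs(math.cos(math.pi * freq_hz * gap_s))

def first_null_hz(gap_s):
    """Lowest frequency at which the idealized click-pair spectrum is zero."""
    return 1.0 / (2.0 * gap_s)

# Assumed example gaps (in microseconds); not values taken from the text.
for gap_us in (10, 25, 50):
    gap = gap_us * 1e-6
    print(gap_us, "us gap:",
          "first null near", round(first_null_hz(gap)), "Hz;",
          "magnitude at 1 kHz =", round(double_click_magnitude(1000.0, gap), 4),
          "at 15 kHz =", round(double_click_magnitude(15000.0, gap), 4))
```

Under these assumptions the pair's spectrum is nearly flat at low frequencies and departs from the single click's flat spectrum only toward high frequencies, which is consistent with the masking result described above.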
       
       
  2. Temporal resolution measured by the discrimination of stimuli with identical magnitude spectra: Broadband sounds.
    1. The detection of gaps in broadband noise.
      1. Present subjects with two successive bursts of noise, one containing a brief interruption and one not, and ask which contained the gap (two-alternative forced choice).
      2. Gap threshold for this task is 2 to 3 msec, except at very low sound levels where the threshold is larger.
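A minimal sketch of how one two-interval trial for the gap task might be generated. The sample rate, burst duration, gap position, and function names are all assumptions for illustration; the outline does not specify them, and real experiments also control details (such as onset and offset ramps) that are omitted here.

```python
import random

def noise_burst(duration_s, sample_rate_hz, gap_start_s=None, gap_s=0.0, seed=None):
    """Return Gaussian white-noise samples, optionally silenced during a gap."""
    rng = random.Random(seed)
    n = int(round(duration_s * sample_rate_hz))
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if gap_start_s is not None and gap_s > 0.0:
        start = int(round(gap_start_s * sample_rate_hz))
        stop = min(n, start + int(round(gap_s * sample_rate_hz)))
        for i in range(start, stop):
            samples[i] = 0.0  # the brief interruption
    return samples

def make_2afc_trial(gap_s, rng, duration_s=0.4, sample_rate_hz=44100):
    """Return (interval_1, interval_2, index_of_interval_containing_gap)."""
    gap_interval = rng.randrange(2)  # randomize which interval holds the gap
    intervals = []
    for k in range(2):
        has_gap = (k == gap_interval)
        intervals.append(noise_burst(
            duration_s, sample_rate_hz,
            gap_start_s=duration_s / 2 if has_gap else None,
            gap_s=gap_s if has_gap else 0.0,
            seed=rng.randrange(2 ** 31)))
    return intervals[0], intervals[1], gap_interval

first, second, answer = make_2afc_trial(gap_s=0.003, rng=random.Random(0))
print("samples per interval:", len(first), "| gap is in interval", answer + 1)
```

The subject's task is to report which interval contained the gap; the gap duration would then be varied across trials to find the threshold.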
       
    2. The discrimination of time-reversed signals.
      1. The long-term magnitude spectrum of a sound is not changed when that sound is played backward in time.
      2. Discrimination of a time-reversed sound from the original thus reflects sensitivity to the difference in time pattern of the two sounds.
      3. Ronken (1970) presented subjects with two clicks of different amplitudes (A and B) with a gap between them.
        1. Subjects were asked to discriminate between the orders AB and BA.
        2. Subjects could do this down to gaps of 2 to 3 msec.
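The claim that time reversal leaves the magnitude spectrum unchanged can be checked numerically. The sketch below builds idealized AB and BA click pairs (single-sample clicks with assumed amplitudes 1.0 and 0.5) and compares their discrete Fourier transform magnitudes. The amplitudes, gap, and helper names are illustrative assumptions, not values from Ronken (1970).

```python
import cmath

def dft_magnitudes(x):
    """Magnitude of the discrete Fourier transform of x (direct O(n^2) sum)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n)))
            for f in range(n)]

def click_pair(first_amp, second_amp, gap_samples, length=64):
    """Two single-sample clicks separated by gap_samples."""
    x = [0.0] * length
    x[0] = first_amp
    x[gap_samples] = second_amp
    return x

ab = click_pair(1.0, 0.5, gap_samples=8)   # order AB
ba = click_pair(0.5, 1.0, gap_samples=8)   # order BA (AB reversed in time)

largest_difference = max(abs(p - q) for p, q in
                         zip(dft_magnitudes(ab), dft_magnitudes(ba)))
print("largest magnitude-spectrum difference:", largest_difference)
```

Because the two orders differ only in phase spectrum, any ability to tell them apart must rest on the time pattern rather than on the magnitude spectrum.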
         
    3. Temporal modulation transfer functions (TMTF).
      1. Instead of trying to use a single number to characterize temporal resolution, one can measure threshold for detecting changes in the amplitude of a sound as a function of the rapidity of the changes.
      2. For example, white noise can be sinusoidally amplitude modulated and the threshold for detecting the modulation determined as a function of modulation rate.
      3. Typical results for such an experiment are shown in Fig. 4.1.
        1. For modulation rates below 16 Hz, performance is limited by the amplitude resolution of the auditory system, rather than temporal resolution, so threshold is independent of modulation rate.
        2. For modulation rates from 16 Hz to about 1000 Hz, threshold increases with modulation rate, thus showing the limits of temporal resolution.
        3. Above 1000 Hz, the modulation cannot be detected at all.
        4. The shape of TMTFs varies little with sound level, but thresholds do increase at low sound levels.
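A minimal sketch of the stimulus used in a TMTF experiment: Gaussian noise multiplied by the envelope 1 + m sin(2 pi f_m t), where m is the modulation index. Modulation thresholds are commonly expressed as 20 log10(m); that convention, the sample rate, the duration, and the function names are assumptions here rather than details given in the outline.

```python
import math
import random

def modulation_depth_db(m):
    """Express a modulation index m (0 < m <= 1) in dB as 20*log10(m)."""
    return 20.0 * math.log10(m)

def sam_noise(m, mod_rate_hz, duration_s=0.5, sample_rate_hz=44100, seed=0):
    """Sinusoidally amplitude-modulated Gaussian noise."""
    rng = random.Random(seed)
    n = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        envelope = 1.0 + m * math.sin(2.0 * math.pi * mod_rate_hz * t)
        samples.append(envelope * rng.gauss(0.0, 1.0))
    return samples

stimulus = sam_noise(m=0.5, mod_rate_hz=16.0)
print("samples:", len(stimulus))
print("m = 0.5 corresponds to", round(modulation_depth_db(0.5), 2), "dB")
```

In an experiment, m would be reduced until the modulated noise can no longer be distinguished from unmodulated noise, and this threshold would be measured at each modulation rate.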
         
  3. Temporal resolution measured by the discrimination of stimuli with identical magnitude spectra: Effects of center frequency.
    1. Temporal resolution might be expected to be poorer at lower frequencies because the auditory filters become narrower as center frequency decreases and filters with narrower bandwidths have poorer temporal resolution (longer ringing responses).
      1. Fig. 4.2 shows how the gap in a sinusoidal signal is partially filled in by the ringing response of a simulated auditory filter.
      2. We cannot determine a possible relationship between center frequency and temporal resolution by studying the broadband sounds discussed to this point.
    2. Green (1973) studied discrimination of time-reversed pairs of brief sinusoidal pulses differing in level by 10 dB.
      1. The total stimulus duration required for 75% correct discrimination was between 1 and 2 msec for center frequencies of 2 and 4 kHz.
      2. This value increased to between 2 and 4 msec for a center frequency of 1 kHz.
      3. This suggests that the response time of the auditory filters may have played a role below 2 kHz.
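To illustrate why narrower filters at low center frequencies might limit temporal resolution, the sketch below estimates auditory filter bandwidth with the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), ERB = 24.7(4.37F + 1) with F in kHz. That formula is an assumption brought in for illustration and does not appear in this outline. The reciprocal of bandwidth is used only as a crude relative indicator of ringing time; actual ringing durations depend on filter shape.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Hz), Glasberg and Moore (1990) formula."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def relative_ringing(center_hz, reference_hz=4000.0):
    """Ringing time relative to the reference frequency, assuming time ~ 1/bandwidth."""
    return erb_hz(reference_hz) / erb_hz(center_hz)

for fc in (500.0, 1000.0, 2000.0, 4000.0):
    print(int(fc), "Hz: ERB about", round(erb_hz(fc), 1), "Hz;",
          "ringing relative to 4 kHz about", round(relative_ringing(fc), 2))
```

On this rough estimate the filter at 1 kHz rings a few times longer than the filter at 4 kHz, which is in the direction of the longer durations required at 1 kHz.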
       
  4. Detection of temporal gaps in narrowband sounds (with masking of spectral splatter).
    1. Detection of gaps in bands of noise.
      1. Gap thresholds for noise bands could be expected to depend on two factors, as illustrated in Fig. 4.3.
        1. The rapidity of random fluctuations in the noise increases with bandwidth, making random dips that might be confused with the gap less likely (lower gap thresholds).
        2. The bandwidth of the noise interacts with the bandwidth of the auditory filter.
          1. When the noise bandwidth is less than that of the auditory filter, fluctuations are unchanged by the filter and gap threshold should depend on the confusability of the gap with fluctuations in the noise.
          2. When the noise bandwidth is greater than that of the auditory filter, fluctuations at the output of the filter are slower than at the input, so gap thresholds should decrease with increasing bandwidth of the auditory filter (higher center frequencies).
      2. Actual results show the expected decrease in gap threshold with increasing noise bandwidth.
      3. However, this decrease continues even when the noise bandwidth exceeds the bandwidth of any single auditory filter stimulated by the noise and little effect of center frequency is obtained.
      4. It appears that subjects make use of the output of more than one auditory filter to detect gaps in noise so that there is little effect of center frequency.
      5. Gap thresholds for narrowband noises also decrease with increasing sound levels up to about 30 dB SL.
    2. Detection of gaps in sinusoids.
      1. Sinusoids are presented in continuous noise with a spectral notch at the frequency of the sinusoid to mask spectral splatter.
      2. Results are strongly affected by phase.
      3. Shailer and Moore (1987) used three phase conditions, all of which began the gap at a positive-going zero-crossing, as shown in Fig. 4.5.
        1. In standard phase the sinusoid was turned back on at a positive-going zero-crossing.
        2. In reverse phase the sinusoid was turned back on at a negative-going zero-crossing.
        3. In preserved phase the sinusoid was turned back on at the phase it would have had if it continued through the gap.
      4. The results for a 2AFC experiment using these conditions are shown in Fig. 4.6.
        1. For the preserved phase condition, gap detection improved monotonically with gap duration.
        2. For the other two conditions results were nonmonotonic and 180 degrees out of phase with each other.
        3. The latter results appear to indicate that the gap is much more difficult to detect when the sinusoid is turned back on in phase with the ringing response of the auditory filter, as shown in Fig. 4.7.
      5. Estimated from the preserved phase condition (75% correct), gap threshold appears to be about 4.5 msec for center frequencies of 400, 1000, and 2000 Hz.
      6. Other studies have found increases in gap thresholds at lower frequencies (100 and 200 Hz).
      7. We may conclude that the auditory filter plays a role in determining the form of results for standard and reversed phase conditions, but the ringing response of the auditory filter only appears to limit gap detection in the preserved phase condition at very low center frequencies.
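The nonmonotonic results for the standard and reversed phase conditions can be illustrated by comparing the phase at which the sinusoid restarts with the phase of the filter's ringing at the end of the gap. This sketch assumes, as a simplification, that the ringing continues at the signal frequency with the phase the sinusoid would have had without a gap. The condition labels and function names are hypothetical.

```python
import math

def restart_phase(condition, freq_hz, gap_s):
    """Phase (radians) at which the sinusoid resumes; the gap starts at phase 0."""
    if condition == "standard":    # resumes at a positive-going zero crossing
        return 0.0
    if condition == "reversed":    # resumes at a negative-going zero crossing
        return math.pi
    if condition == "preserved":   # resumes as if it had continued through the gap
        return (2.0 * math.pi * freq_hz * gap_s) % (2.0 * math.pi)
    raise ValueError("unknown condition: " + condition)

def ringing_mismatch(condition, freq_hz, gap_s):
    """Absolute phase difference (0..pi) between restarted tone and assumed ringing."""
    ringing_phase = 2.0 * math.pi * freq_hz * gap_s
    diff = restart_phase(condition, freq_hz, gap_s) - ringing_phase
    wrapped = (diff + math.pi) % (2.0 * math.pi) - math.pi
    return abs(wrapped)

freq = 400.0  # Hz, so one period is 2.5 msec
for gap_ms in (1.25, 2.5, 3.75, 5.0):
    row = [round(ringing_mismatch(c, freq, gap_ms / 1000.0), 3)
           for c in ("standard", "reversed", "preserved")]
    print(gap_ms, "msec gap: mismatch (standard, reversed, preserved) =", row)
```

Under this assumption a mismatch near zero means the tone restarts in phase with the ringing, so the gap leaves little disturbance at the filter output. The standard and reversed conditions reach that point at gap durations half a period apart, while the preserved condition is always in phase and so shows no such alternation.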
       
  5. Modeling temporal resolution.
    1. The response of the auditory filter is too fast to be a limiting factor in most tasks involving temporal resolution, except at low frequencies.
    2. This has led to the idea that there is a process at levels of the auditory system higher than the auditory nerve that is sluggish (smooths the representation of the stimulus over time), and thus limits temporal resolution.
    3. The author of your text reviews various models which have been proposed for this purpose.
    4. While these models are interesting, we shall not consider them for the following reasons.
      1. The smoothing almost certainly operates on neural activity, but the models operate on simple transformations of the stimulus, rather than its neural representation.
      2. The various stages of such processing are purely speculative.
      3. There is currently no empirical way to isolate the processes assumed at one stage of such processing from those assumed at another.
       
  6. A modulation filter bank?
    1. It has been suggested that perception of sounds that are amplitude modulated depends on feature detectors that are tuned to specific modulation rates.
      1. Each such neuron can be considered a filter in the modulation domain; collectively they can be referred to as a modulation filter bank.
      2. Neurons with appropriate properties have been found in the cochlear nucleus and the inferior colliculus.
      3. Use of the modulation filter bank to explain certain perceptual phenomena is new and still controversial.
    2. Masking in the modulation domain with broadband carriers.
      1. Modulation masking refers to an increase in the threshold for detecting modulation of a carrier produced by additional amplitude modulation.
      2. Houtgast (1989) studied detection of sinusoidal amplitude modulation of a pink noise carrier when no other modulation was present and when a masker modulator was added.
        1. As shown in Fig. 4.10, when no masker modulator was present, threshold for detecting the modulation increased with modulation frequency (TMTF).
        2. When a half-octave band of noise centered at 4, 8 or 16 Hz was added as a masker modulation, thresholds for modulation detection increased most markedly at the center frequency of the masker.
        3. This could be interpreted as selectivity in the modulation-frequency domain, analogous to the frequency selectivity in the audio-frequency domain we studied in Chapter 3.
      3. A second study by Houtgast was analogous to Fletcher's band-widening experiment.
        1. The masker modulator was a variable-width, constant-spectral-density noise band centered at 8 Hz.
        2. The signal was sinusoidal modulation at 8 Hz.
        3. The results shown in Fig. 4.11 indicate that threshold for detecting the sinusoidal signal modulation increased with masker bandwidth and then leveled off.
      4. Your text reviews several other experiments which can be interpreted in terms of a modulation filter bank.
      5. If these modulation filters exist, the data indicate that they are not nearly as sharply tuned as the auditory filters in the audio-frequency domain.
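The logic of the band-widening experiment in the modulation domain can be sketched with an idealized rectangular modulation filter centered on the signal modulation frequency. The filter bandwidth used below (4 Hz) is purely hypothetical, as are the function names. The only point is that masker power passing through the filter grows with masker bandwidth until the masker is wider than the filter, and then stays constant.

```python
import math

def masker_power_in_filter(masker_bw_hz, filter_bw_hz, spectral_density=1.0):
    """Masker power passed by a rectangular filter sharing the masker's center frequency."""
    return spectral_density * min(masker_bw_hz, filter_bw_hz)

def power_db(power, reference=1.0):
    """Power ratio in dB."""
    return 10.0 * math.log10(power / reference)

hypothetical_filter_bw_hz = 4.0  # assumed value, not taken from Houtgast (1989)
for masker_bw in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    passed = masker_power_in_filter(masker_bw, hypothetical_filter_bw_hz)
    print("masker bandwidth", masker_bw, "Hz -> passed power",
          round(power_db(passed), 1), "dB re 1 Hz of masker")
```

If signal threshold tracks the masker power inside the filter, threshold should rise with masker bandwidth and then level off, which is the pattern described for Fig. 4.11.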
    3. Modulation detection interference.
      1. The detection or discrimination of amplitude modulation of a sinusoidal carrier can be impaired by the presence of one or more modulated sounds with different carrier frequencies.
      2. The basis for this phenomenon is still poorly understood and we shall not consider it in any greater detail.
    4. Accounting for TMTFs with a modulation filter bank.
      1. TMTFs were previously explained in terms of models incorporating a low-pass filter at some stage of the auditory system after the auditory nerve.
      2. An alternative possibility is that TMTFs result from detecting modulation of a particular frequency by monitoring the output of a modulation filter tuned close to that frequency.
        1. Ability to detect the modulation may be partly limited by inherent random amplitude fluctuations in the noise at the output of the modulation filters.
        2. Modulation masking data suggest that the bandwidths of the modulation filters increase with center frequency.
        3. Thus, more random modulation would appear at the outputs of modulation filters with higher center frequencies, making it progressively harder to detect modulation as the modulation frequency increases.
         
  7. Duration discrimination.
    1. A large number of studies have examined discrimination of changes in the duration (T) of auditory stimuli or of silent intervals marked by auditory stimuli.
    2. We shall not consider these studies in detail, but shall briefly summarize their main results.
      1. For values of T greater than 10 msec, the smallest detectable increase in T (ΔT) increases with T and is fairly independent of the spectral characteristics of the sounds.
      2. ΔT increases at low sound levels and when the auditory markers of a silent interval differ in level or frequency.
       
  8. Temporal analysis based on across-channel processes (limits of the ability to compare timing across frequency channels).
    1. Huffman sequences are brief broadband sounds, like clicks, that have identical long-term magnitude spectra, but energy in a certain frequency region is delayed relative to other frequency regions.
      1. Green has found that subjects can detect differences in amount of delay of 2 msec or greater at frequency regions ranging from 650 to 4200 Hz.
      2. The difference is heard as a subtle change in timbre, not as one part of the sound following another.
      3. Subjects require extensive practice to reach this level of performance, and the task requires considerable concentration.
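Sounds with a flat magnitude spectrum but a frequency-dependent delay can be produced as the impulse response of an all-pass filter. The sketch below uses a second-order digital all-pass section as an illustration. The pole radius r, the pole angle theta, and the function names are assumed values for demonstration and are not parameters from Green's experiments.

```python
import cmath
import math

def allpass_response(omega, r, theta):
    """Frequency response of a second-order all-pass with poles at r*exp(+/-j*theta)."""
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    z1 = cmath.exp(-1j * omega)   # z^-1 evaluated on the unit circle
    z2 = z1 * z1                  # z^-2
    return (a2 + a1 * z1 + z2) / (1.0 + a1 * z1 + a2 * z2)

def group_delay_samples(omega, r, theta, step=1e-4):
    """Group delay (in samples) from a central difference of the phase."""
    ratio = (allpass_response(omega + step, r, theta)
             / allpass_response(omega - step, r, theta))
    return -cmath.phase(ratio) / (2.0 * step)

r, theta = 0.9, math.pi / 4.0   # assumed pole radius and angle (radians/sample)
for omega in (math.pi / 8.0, math.pi / 4.0, math.pi / 2.0, 3.0 * math.pi / 4.0):
    h = allpass_response(omega, r, theta)
    print("omega =", round(omega, 3),
          "| magnitude =", round(abs(h), 6),
          "| group delay =", round(group_delay_samples(omega, r, theta), 2), "samples")
```

The magnitude stays at 1 for every frequency while the delay is concentrated near the pole angle, so changing r or theta alters which frequency region is delayed, and by how much, without altering the magnitude spectrum.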
    2. Detection of onset and offset asynchrony in multicomponent complexes.
      1. In order to examine sensitivity to temporal differences for longer duration stimuli, one can examine ability to detect slight onset or offset asynchronies in one component of complex signals composed of many sinusoidal components.
      2. As with Huffman sequences, very small asynchronies (a few msec or less) can be detected.
      3. Onset asynchrony is easier to detect than offset asynchrony.
      4. Onset and offset asynchronies are easier to detect in harmonic complexes than in logarithmically spaced complexes, probably because the former are heard as a single sound and the latter as many different sounds. It is difficult to compare the timing of sound elements that are perceived as coming from different sources.