While non-stationary stochastic techniques have led to substantial improvements in vocabulary size and speaker independence, most automatic speech recognition (ASR) systems remain overly sensitive to the acoustic environment, precluding robust widespread application. Our approach to this problem has been to model fundamental aspects of auditory perception, which are typically neglected in common ASR front ends, in order to derive a more robust and phonetically relevant parameterization of speech. Short-term adaptation and recovery, sensitivity to local spectral peaks, and an explicit parameterization of the position and motion of those peaks together reduce the error rate of a word recognition task by as much as a factor of four. Current work also investigates the perceptual significance of pitch-rate amplitude-modulation cues in noise.
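To make the peak-based parameterization concrete, the sketch below illustrates one simple way to pick local spectral peaks in successive short-time magnitude spectra and estimate each peak's frame-to-frame frequency motion by nearest-neighbour matching. This is an illustrative toy, not the front end described in the abstract; all function names, thresholds, and the matching rule are assumptions.

```python
# Hypothetical sketch: local spectral peak picking and frame-to-frame
# peak-motion estimation. Not the authors' method; names and thresholds
# are illustrative only.

def local_peaks(spectrum, floor=0.0):
    """Return (bin, magnitude) pairs where a bin exceeds both neighbours."""
    return [(i, m) for i, m in enumerate(spectrum)
            if 0 < i < len(spectrum) - 1
            and m > floor
            and m > spectrum[i - 1] and m > spectrum[i + 1]]

def peak_motion(prev_peaks, curr_peaks, max_jump=3):
    """Match each current peak to the nearest previous peak (in bins,
    within max_jump) and report its frequency shift per frame; peaks
    with no match are treated as new and assigned zero motion."""
    motions = []
    for i, _m in curr_peaks:
        candidates = [(abs(i - j), j) for j, _ in prev_peaks
                      if abs(i - j) <= max_jump]
        if candidates:
            _, j = min(candidates)
            motions.append((i, i - j))   # (peak position, delta bins/frame)
        else:
            motions.append((i, 0))       # new peak: no motion estimate
    return motions

# Two toy spectra: one peak moving from bin 3 to bin 4, one static at bin 8.
frame0 = [0, 1, 2, 5, 2, 1, 0, 2, 6, 2, 0]
frame1 = [0, 1, 2, 3, 5, 2, 0, 2, 6, 2, 0]
print(peak_motion(local_peaks(frame0), local_peaks(frame1)))
# → [(4, 1), (8, 0)]
```

A real front end would of course operate on auditory-filterbank outputs rather than raw bins and would need a more careful birth/death model for peak tracks; the point here is only the shape of a position-plus-motion representation.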