An Auditory Motivated Asymmetric Compression Technique for Speech Recognition

Authors:

Serajul Haque (Dept. of Electr., Electron., & Comput. Eng., Univ. of Western Australia, Crawley, WA, Australia); Roberto Togneri; Anthony Zaknich

The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) utilizes several perceptual features of the human auditory system, one of which is static compression. Motivated by the human auditory system, the conventional static logarithmic compression applied in the MFCC is analyzed using psychophysical loudness perception curves. Following the property of the auditory system that dynamic range compression is higher in the basal regions than in the apical regions of the basilar membrane, we propose a method of unequal (asymmetric) compression, i.e., higher compression applied in the higher frequency regions than in the lower frequency regions. The method is applied and tested in the MFCC and PLP parameterizations in the spectral domain, and in the ZCPA auditory model used as an ASR front-end in the temporal domain. The asymmetric compression is applied as a multiplicative gain on the existing static compression, and its extent is determined from the gradient of the piece-wise linear segment of the perceptual compression curve. The proposed method has the advantage that compression can be adjusted parametrically for improved ASR performance and audibility in noisy conditions through low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition experiments on the Aurora 2 and TIDigits corpora show performance improvements in additive noise conditions.
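
As a rough illustration of the idea described in the abstract, the sketch below applies a frequency-dependent multiplicative gain to the usual logarithmic compression of filterbank energies. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the linear gain profile, and the default gain values are all hypothetical. In the paper the gain is derived from the gradient of the perceptual compression curve, which the abstract does not specify.

```python
import math


def asymmetric_gains(num_bands, low_gain=1.0, high_gain=0.8):
    """Per-band gains, interpolated linearly from the lowest to the highest band.

    ASSUMPTION: the linear profile and the default values are placeholders;
    the abstract does not give the actual gain values.
    """
    if num_bands < 1:
        raise ValueError("num_bands must be at least 1")
    if num_bands == 1:
        return [low_gain]
    step = (high_gain - low_gain) / (num_bands - 1)
    return [low_gain + step * k for k in range(num_bands)]


def asymmetric_log_compress(band_energies, low_gain=1.0, high_gain=0.8,
                            floor=1e-10):
    """Log-compress filterbank energies with a band-dependent gain.

    band_energies is ordered from the lowest to the highest frequency band.
    A gain below 1 shrinks the log-domain dynamic range of a band, i.e.
    compresses it more strongly than the plain logarithm would.
    """
    gains = asymmetric_gains(len(band_energies), low_gain, high_gain)
    # Floor the energies so that log() never sees zero.
    return [g * math.log(max(e, floor)) for g, e in zip(gains, band_energies)]


if __name__ == "__main__":
    # Hypothetical flat spectrum: identical energy in every band.
    energies = [math.exp(2.0)] * 3
    print(asymmetric_log_compress(energies, low_gain=1.0, high_gain=0.5))
```

With identical input energies, the higher bands come out smaller than the lower ones, which corresponds to the relative low-frequency emphasis the abstract mentions. In an MFCC front-end, a step like this would sit between the Mel filterbank and the DCT, replacing the plain logarithm.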

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 19, Issue: 7)