By Topic

An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Qi Li ; Li Creative Technol., Inc., Florham Park, NJ, USA ; Yan Huang

An auditory-based feature extraction algorithm is presented. We name the new features as cochlear filter cepstral coefficients (CFCCs) which are defined based on a recently developed auditory transform (AT) plus a set of modules to emulate the signal processing functions in the cochlea. The CFCC features are applied to a speaker identification task to address the acoustic mismatch problem between training and testing environments. Usually, the performance of acoustic models trained in clean speech drops significantly when tested in noisy speech. The CFCC features have shown strong robustness in this kind of situation. In our experiments, the CFCC features consistently perform better than the baseline MFCC features under all three mismatched testing conditions-white noise, car noise, and babble noise. For example, in clean conditions, both MFCC and CFCC features perform similarly, over 96%, but when the signal-to-noise ratio (SNR) of the input signal is 6 dB, the accuracy of the MFCC features drops to 41.2%, while the CFCC features still achieve an accuracy of 88.3%. The proposed CFCC features also compare favorably to perceptual linear predictive (PLP) and RASTA-PLP features. The CFCC features consistently perform much better than PLP. Under white noise, the CFCC features are significantly better than RASTA-PLP, while under car and babble noise, the CFCC features provide similar performances to RASTA-PLP.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:19 ,  Issue: 6 )
Biometrics Compendium, IEEE