In this paper we investigate the use of formant and anti-formant measurements of nasal consonants for speaker verification. The features are obtained from a pole-zero vocal tract model, estimated by minimizing a logarithmic criterion motivated by the way the human auditory system perceives amplitude. A GMM-UBM approach is used to perform speaker comparisons within the likelihood-ratio framework. Results are compared with systems based on Mel Frequency Cepstral Coefficients (MFCCs) and on formant center frequencies and bandwidths obtained with the Snack Toolkit. The formant- and anti-formant-based system achieves results comparable to those of the MFCC system and outperforms the formant-based system, while offering a more straightforward interpretation in terms of a physical model of speech production.
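The abstract does not specify the model order, the adaptation scheme or the exact scoring rule. The sketch below therefore assumes the classic GMM-UBM recipe: diagonal-covariance Gaussian mixtures, mean-only MAP adaptation of the UBM with a relevance factor of 16, and the average per-frame log-likelihood ratio as the verification score. Each row of `x` could hold one frame of formant and anti-formant frequencies and bandwidths, but any fixed-dimension frame features work. All function names are hypothetical and do not come from the paper.

```python
import numpy as np


def component_log_probs(x, weights, means, variances):
    """log(w_m) + log N(x_t | mu_m, diag(var_m)) for every frame t and component m.

    x: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    Returns an array of shape (T, M).
    """
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (M,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)     # (T, M)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)


def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal GMM (log-sum-exp over components)."""
    lp = component_log_probs(x, weights, means, variances)
    peak = lp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lp - peak).sum(axis=1, keepdims=True)))[:, 0]


def map_adapt_means(x, ubm, relevance=16.0):
    """Assumed scheme: mean-only MAP adaptation of the UBM to enrollment frames."""
    weights, means, variances = ubm
    x = np.atleast_2d(x)
    lp = component_log_probs(x, weights, means, variances)
    # Posterior responsibility of each component for each frame, shape (T, M).
    gamma = np.exp(lp - gmm_log_likelihood(x, weights, means, variances)[:, None])
    n = gamma.sum(axis=0)                                          # soft counts (M,)
    first_moment = gamma.T @ x / np.maximum(n, 1e-10)[:, None]    # (M, D)
    alpha = (n / (n + relevance))[:, None]                        # data-dependent weight
    return weights, alpha * first_moment + (1.0 - alpha) * means, variances


def llr_score(x, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio log p(x|speaker) - log p(x|UBM)."""
    return float(np.mean(gmm_log_likelihood(x, *speaker_gmm)
                         - gmm_log_likelihood(x, *ubm)))


if __name__ == "__main__":
    # Toy data: a one-component 2-D UBM and a "speaker" whose frames are shifted.
    rng = np.random.default_rng(0)
    ubm = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
    speaker = map_adapt_means(rng.normal(2.0, 1.0, size=(200, 2)), ubm)
    target = rng.normal(2.0, 1.0, size=(100, 2))
    impostor = rng.normal(0.0, 1.0, size=(100, 2))
    print("target LLR:", llr_score(target, speaker, ubm))
    print("impostor LLR:", llr_score(impostor, speaker, ubm))
```

In this toy setting the target trial should score well above zero and the impostor trial well below it. A real system would train a multi-component UBM on background speakers and compare the score against a threshold tuned on development data.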
Published in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 22-27 May 2011