By Topic

Speaker Identification and Verification by Combining MFCC and Phase Information

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Nakagawa, S. ; Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Toyohashi, Japan ; Longbiao Wang ; Ohtsuka, S.

In conventional speaker recognition methods based on Mel-frequency cepstral coefficients (MFCCs), phase information has hitherto been ignored. In this paper, we propose a phase information extraction method that normalizes the change variation in the phase according to the frame position of the input speech and combines the phase information with MFCCs in text-independent speaker identification and verification methods. There is a problem with the original phase information extraction method when comparing two phase values. For example, the difference in the two values of π-mathtildeθ1 and mathtildeθ2=-π+mathtildeθ1 is 2π-2mathtildeθ1 . If mathtildeθ1 ≈ 0, then the difference ≈ 2π, despite the two phases being very similar to one another. To address this problem, we map the phase into coordinates on a unit circle. Speaker identification and verification experiments are performed using the NTT database which consists of sentences uttered by 35 (22 male and 13 female) Japanese speakers with normal, fast and slow speaking modes during five sessions. Although the phase information-based method performs worse than the MFCC-based method, it augments the MFCC and the combination is useful for speaker recognition. The proposed modified phase information is more robust than the original phase information for all speaking modes. By integrating the modified phase information with the MFCCs, the speaker identification rate was improved to 98.8% from 97.4% (MFCC), and equal error rate for speaker verification was reduced to 0.45% from 0.72% (MFCC), respectively. We also conducted the speaker identification and verification experiments on a large-scale Japanese Newspaper Article Sentences (JNAS) database, a similar trend as NTT database was obtained.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:20 ,  Issue: 4 )
Biometrics Compendium, IEEE