Whispering is a speech production mode that speakers use to protect their privacy. Because whispered and neutral speech differ in both excitation and vocal tract function, the performance of speaker identification systems trained on neutral speech degrades significantly when they are tested on whispered speech. This paper describes a closed-set speaker identification system for the mismatched neutral/whisper condition. The acoustic characteristics of vowels and voiced consonants differ between whispered and neutral speech, whereas those of unvoiced consonants are relatively similar. To improve system performance, this paper applies a feature extraction algorithm based on a linear frequency scale. Static linear frequency cepstral coefficient vectors are extracted as features from neutral and whispered unvoiced consonants. The resulting closed-set speaker ID system, which uses linear frequency cepstral coefficients computed on unvoiced consonants, achieves an absolute improvement in speaker recognition performance.
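As an illustration of the feature type named above, the following is a minimal sketch of static linear frequency cepstral coefficient extraction. It is not the authors' implementation. The abstract gives no parameter values, so the sampling rate, frame length, hop, FFT size, filter count, and number of coefficients are assumptions, as are the function names `linear_filterbank`, `dct2_matrix`, and `lfcc`. The sketch also assumes that the unvoiced consonant segments have already been located upstream and are passed in as a waveform.

```python
import numpy as np


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are spaced uniformly in Hz (not mel)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points, uniformly spaced from 0 Hz to Nyquist.
    edges_hz = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    bin_hz = np.arange(n_bins) * sample_rate / n_fft
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle = min of the two slopes, clipped at zero outside [lo, hi].
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def dct2_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis: rows are cepstral indices, columns are filters."""
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))


def lfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=24, n_ceps=13):
    """Return one static LFCC vector per frame (all defaults are assumed values)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short segments so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    energies = power @ linear_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    return log_energies @ dct2_matrix(n_ceps, n_filters).T


# Usage on a synthetic stand-in for an unvoiced consonant segment (white noise).
rng = np.random.default_rng(0)
segment = rng.standard_normal(16000)
features = lfcc(segment)  # array of shape (n_frames, n_ceps)
```

The only difference from a mel-frequency front end in this sketch is the filter spacing: the filter edges come from `np.linspace` in Hz rather than from a warped scale, so high-frequency bands are resolved as finely as low-frequency ones. No delta or delta-delta terms are appended, in keeping with the static vectors described above.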