Features used for speech recognition should emphasize linguistic information while suppressing speaker differences. Features used for speaker recognition should instead carry more speaker-specific information while attenuating linguistic information. In most studies, however, identical acoustic features are used for the two different tasks of speaker recognition and speech recognition. In this paper, we propose a new physiological feature extraction method that emphasizes speaker-specific information for speaker identification. To this end, the physiological features of speakers were analyzed from the point of view of speech production. We found that speaker-specific information is encoded in different frequency regions of the speech sound. The speaker-discriminative information in each frequency region was quantified using Fisher's F-ratio. Based on the F-ratio, we propose a non-uniform sub-band processing strategy to extract a new feature that emphasizes the physiological aspects involved in speech production. We combined the new feature with a Gaussian mixture model (GMM) for the speaker identification task and evaluated it on the NTT-VR speaker recognition database. Compared with the MFCC feature, the proposed feature reduced the identification error rate by 20.1%.
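As an illustration of how speaker-discriminative information might be quantified per frequency region, the following minimal sketch computes an F-ratio for each band. It assumes a common form of Fisher's F-ratio: the variance of the per-speaker means divided by the average within-speaker variance. The abstract does not give the exact definition or the band layout used in the paper, so the function name `f_ratio_per_band`, the input layout, and the choice of band energies as input are hypothetical.

```python
import numpy as np


def f_ratio_per_band(band_energies, speaker_ids):
    """Per-band F-ratio (assumed form, not taken from the paper):
    variance of the speaker means divided by the mean within-speaker variance.

    band_energies: array of shape (n_frames, n_bands), e.g. log sub-band energies
    speaker_ids:   array of shape (n_frames,), one speaker label per frame
    """
    band_energies = np.asarray(band_energies, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean and variance of every band, computed separately for each speaker.
    means = np.stack(
        [band_energies[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    variances = np.stack(
        [band_energies[speaker_ids == s].var(axis=0) for s in speakers]
    )

    between = means.var(axis=0)      # spread of the speaker means in each band
    within = variances.mean(axis=0)  # average spread inside each speaker
    # Note: a band with zero within-speaker variance would divide by zero here.
    return between / within
```

Under this definition, a band in which the speaker means differ strongly relative to the variation within each speaker receives a high score, and a band in which all speakers share the same mean receives a score of zero. Such scores could then guide where a non-uniform sub-band analysis allocates finer frequency resolution.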