In this paper, a new method for text-independent speaker recognition is proposed. The method is based essentially on formant frequency positions: the speaker is characterized only by the formant positions of the first voiced speech frame, called the attack state. The fundamental frequency (pitch) is combined with these formants in order to study the effect of this combination on the recognition rate. To validate the approach, two different methods are used to compute the formant positions of the attack state. The first method locates the formant positions in the power spectral domain using the Yule-Walker equations. The second method uses the frequency response of a digital filter corresponding to the transfer function of the vocal tract. Both methods rely on a high-order autoregressive (AR) model of the voice. A multi-layer neural network trained with the back-propagation algorithm is proposed for training on and classifying the extracted data. Two classification methods are used: serial classification and a newly proposed method called cascade classification. For each method, different network structures are tested in order to obtain the best results. Good recognition rates are obtained using this attack state approach. In all tests, the recognition rates are improved by the cascade classification.
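As a rough illustration of the first method, the sketch below fits an AR model to a single frame by solving the Yule-Walker equations with the Levinson-Durbin recursion, then takes the local maxima of the resulting AR power spectrum as formant candidates. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names, the 512-point frequency grid, the simple peak picking, and the synthetic test frame are all hypothetical choices made for illustration.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation sums r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def yule_walker(r, order):
    """Solve the Yule-Walker equations with the Levinson-Durbin recursion.

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def ar_power_spectrum(a, err, fs, n_points=512):
    """AR power spectrum err / |A(e^{jw})|^2 on a grid over [0, fs/2)."""
    freqs, power = [], []
    for m in range(n_points):
        f = m * (fs / 2.0) / n_points
        w = 2.0 * math.pi * f / fs
        big_a = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        freqs.append(f)
        power.append(err / abs(big_a) ** 2)
    return freqs, power


def formant_candidates(frame, fs, order):
    """Frequencies (Hz) of the local maxima of the AR power spectrum."""
    a, err = yule_walker(autocorrelation(frame, order), order)
    freqs, power = ar_power_spectrum(a, err, fs)
    return [freqs[i] for i in range(1, len(power) - 1)
            if power[i - 1] < power[i] > power[i + 1]]


def synthetic_frame(resonances_hz, fs, n=400, radius=0.97):
    """Impulse response of an all-pole filter (illustrative test signal only)."""
    a = [1.0]
    for f in resonances_hz:
        theta = 2.0 * math.pi * f / fs
        section = [1.0, -2.0 * radius * math.cos(theta), radius ** 2]
        # Multiply the running denominator by this second-order section.
        a = [sum(a[i] * section[k - i]
                 for i in range(len(a)) if 0 <= k - i < 3)
             for k in range(len(a) + 2)]
    h = []
    for t in range(n):
        x = 1.0 if t == 0 else 0.0
        h.append(x - sum(a[j] * h[t - j]
                         for j in range(1, len(a)) if t - j >= 0))
    return h
```

In this sketch, `formant_candidates(frame, fs, order)` would be called on the first voiced frame of an utterance with a suitably high `order`. The synthetic frame exists only to check the code against known resonances; real speech would additionally need voicing detection, windowing, and pre-emphasis, none of which are shown here.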