This paper presents our recent work on recognizing human emotion from the speech signal. The proposed recognition system was tested on a language-, speaker-, and context-independent emotional speech database. Prosodic, Mel-frequency cepstral coefficient (MFCC), and formant frequency features are extracted from the speech utterances. We perform feature selection using the stepwise method based on Mahalanobis distance. The selected features are used to classify the utterances into their corresponding emotional classes. Several classification algorithms, including the maximum likelihood classifier (MLC), Gaussian mixture model (GMM), neural network (NN), K-nearest neighbors (K-NN), and Fisher's linear discriminant analysis (FLDA), are compared in this study. The recognition results show that FLDA gives the best recognition accuracy with the selected features.
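As an illustration only, the selection-then-classification idea can be sketched in Python with NumPy. This is not the authors' implementation, and several choices here are assumptions: the data are synthetic stand-ins for prosodic, MFCC, and formant features; the selection is a forward-only simplification of a stepwise procedure (no removal step); the selection criterion is the smallest pairwise Mahalanobis distance between class means under a pooled covariance; and the classifier is a nearest-class-mean rule under that same metric, which corresponds to linear discriminant analysis with equal priors rather than the exact FLDA variant used in the paper. All function names (`pooled_covariance`, `forward_select`, `fit_lda`, `predict_lda`, `demo`) are hypothetical.

```python
import numpy as np


def pooled_covariance(X, y):
    """Within-class (pooled) covariance estimate over all classes."""
    classes = np.unique(y)
    d = X.shape[1]
    S = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        S += diff.T @ diff
    return S / (len(X) - len(classes))


def min_pairwise_mahalanobis(X, y):
    """Smallest squared Mahalanobis distance between any two class means."""
    classes = np.unique(y)
    S_inv = np.linalg.pinv(pooled_covariance(X, y))
    means = [X[y == c].mean(axis=0) for c in classes]
    best = np.inf
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = means[i] - means[j]
            best = min(best, float(diff @ S_inv @ diff))
    return best


def forward_select(X, y, n_features):
    """Greedy forward selection (assumed simplification of stepwise selection)."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        scores = [min_pairwise_mahalanobis(X[:, selected + [f]], y)
                  for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


def fit_lda(X, y):
    """Store class labels, class means, and the inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    S_inv = np.linalg.pinv(pooled_covariance(X, y))
    return classes, means, S_inv


def predict_lda(model, X):
    """Assign each sample to the class whose mean is nearest in Mahalanobis distance."""
    classes, means, S_inv = model
    diffs = X[:, None, :] - means[None, :, :]          # shape (n, classes, d)
    d2 = np.einsum("ncd,de,nce->nc", diffs, S_inv, diffs)
    return classes[np.argmin(d2, axis=1)]


def demo():
    """Synthetic data: 3 classes, 6 features, only features 0 and 3 informative."""
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1, 2], 60)
    X = rng.normal(size=(180, 6))
    X[:, 0] += np.array([0.0, 3.0, 6.0])[y]
    X[:, 3] += np.array([0.0, 3.0, -3.0])[y]
    selected = forward_select(X, y, n_features=2)
    model = fit_lda(X[:, selected], y)
    accuracy = float(np.mean(predict_lda(model, X[:, selected]) == y))
    return selected, accuracy


if __name__ == "__main__":
    chosen, acc = demo()
    print("selected feature indices:", chosen)
    print("training accuracy:", acc)
```

The accuracy printed by `demo` is measured on the same synthetic samples used for fitting, so it says nothing about the results reported in the paper; a real evaluation would use held-out utterances.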
Multimedia Signal Processing, 2004 IEEE 6th Workshop on
Date of Conference: 29 Sept.-1 Oct. 2004