Abstract:
It is believed that modeling the temporal structure of speech data may be useful for the problem of speech emotion recognition (T. Nwe et al., 2003). In this paper, a Gaussian mixture vector autoregressive model is proposed as a statistical classifier for this task. The main motivation for using such a model is its ability to capture both the dependency among extracted speech feature vectors and the multi-modality of their distribution. When applied to the Berlin emotional speech database, the proposed technique achieves a classification accuracy of 76%, versus 71% for the hidden Markov model (HMM), 67% for k-nearest neighbors, and 55% for feed-forward neural networks. The model also discriminates better than the HMM between high-arousal, low-arousal, and neutral emotions.
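To make the idea concrete, the following is a minimal sketch of how a mixture of vector autoregressive components could score a feature sequence and be used as a classifier. It is not the paper's implementation. It assumes an autoregression of order 1, diagonal covariances, and one already-trained model per emotion; parameter estimation is omitted. All names (`sequence_log_likelihood`, `classify`, `toy_models`) and all parameter values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed for simplicity)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def predict(A, b, prev):
    """One-step order-1 prediction A @ prev + b, using plain lists."""
    return [sum(a * p for a, p in zip(row, prev)) + bi for row, bi in zip(A, b)]

def sequence_log_likelihood(frames, model):
    """Sum over t of log sum_k w_k N(x_t; A_k x_{t-1} + b_k, diag(var_k)).

    `model` is a list of components (w, A, b, var). The first frame only
    serves as context; it is not scored in this sketch.
    """
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        terms = [
            math.log(w) + log_gauss_diag(cur, predict(A, b, prev), var)
            for w, A, b, var in model
        ]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def classify(frames, models):
    """Return the label whose model gives the sequence the highest log-likelihood."""
    return max(models, key=lambda label: sequence_log_likelihood(frames, models[label]))

# Toy, made-up parameters: two 2-D single-component models that differ
# only in their temporal dynamics.
toy_models = {
    "smooth": [(1.0, [[0.9, 0.0], [0.0, 0.9]], [0.0, 0.0], [0.1, 0.1])],
    "flip": [(1.0, [[-0.9, 0.0], [0.0, -0.9]], [0.0, 0.0], [0.1, 0.1])],
}
toy_frames = [[1.0, 1.0], [0.9, 0.9], [0.8, 0.8]]
print(classify(toy_frames, toy_models))
```

Each component predicts the current feature vector from the previous one, which captures dependency between frames, while the weighted sum over components allows a multi-modal conditional distribution. These are the two properties the abstract cites as motivation.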
Published in: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
Date of Conference: 15-20 April 2007
Date Added to IEEE Xplore: 04 June 2007