In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In the anchor models system, an emotion class is characterized by its measure of similarity relative to the other emotion classes. Generative models such as Gaussian Mixture Models (GMMs) are often used as front-end systems to generate feature vectors used to train complex back-end systems, such as support vector machines (SVMs) or a multilayer perceptron (MLP), to improve the classification performance. We show that in the context of highly unbalanced data classes, these back-end systems can improve the performance achieved by GMMs provided that an appropriate sampling or importance weighting technique is applied. Furthermore, we show that anchor models based on the Euclidean or cosine distance present a better alternative for enhancing performance, because neither of these techniques is needed to overcome the problem of skewed data. The experiments conducted on the FAU AIBO Emotion Corpus, a database of spontaneous children's speech, show that anchor models significantly improve the performance of GMMs, by 6.2 percent relative. We also show that the introduction of within-class covariance normalization (WCCN) improves the performance of the anchor models for both distances, but to a greater extent for the Euclidean distance, for which the results become competitive with the cosine distance.
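To make the core idea concrete, the following is a minimal sketch of anchor-model scoring, not the paper's implementation: each class is represented by an anchor vector (in the paper, these characterizations come from GMM front-end scores; here random stand-ins are used), an utterance is mapped to its vector of similarities against every anchor, and classification picks the class whose mean anchor-space representation is closest under cosine distance. All names and the data setup are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 emotion classes, 20-dimensional characterizations.
# In the paper these vectors would come from GMM scores on real speech.
n_classes, dim = 5, 20
anchors = rng.normal(size=(n_classes, dim))  # one anchor per emotion class

def anchor_vector(x, anchors):
    """Represent x by its similarity (dot product) to every class anchor."""
    return anchors @ x

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(x, anchors, class_means):
    """Assign x to the class whose anchor-space mean is nearest in cosine distance."""
    v = anchor_vector(x, anchors)
    dists = [cosine_distance(v, m) for m in class_means]
    return int(np.argmin(dists))

# Class means in anchor space (stand-ins for trained class representations).
class_means = [anchor_vector(a, anchors) for a in anchors]

# A test point lying close to class 2's anchor should be assigned class 2.
x = anchors[2] + 0.01 * rng.normal(size=dim)
print(classify(x, anchors, class_means))
```

Note that the decision rule needs no resampling or importance weighting of the training data, which is the property the abstract highlights for skewed class distributions; a Euclidean variant would simply swap `cosine_distance` for `np.linalg.norm(a - b)`.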