
Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice

Authors: A. Metallinou (School of Electrical Engineering, University of Southern California, Los Angeles, CA); Sungbok Lee; S. Narayanan

Emotion expression in human communication is known to be a multimodal process. In this work, we investigate how emotional information is conveyed by the facial and vocal modalities, and how these modalities can be effectively combined to improve emotion recognition accuracy. In particular, the behavior of different facial regions is studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), containing speech and facial-marker data. Each individual modality is modeled with Gaussian mixture models (GMMs). The modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post-classification accuracies as features. Individual-modality recognition results indicate that anger and sadness are recognized with comparable accuracy from the facial and vocal modalities, while happiness appears to be conveyed more accurately by facial expressions than by voice. The neutral state yields the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy than other facial regions. Moreover, classifier combination leads to significantly higher performance, confirming that training detailed single-modality classifiers and combining them at a later stage is an effective approach.
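The per-modality GMM modeling and weighted fusion described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the features are synthetic stand-ins for the facial-marker and speech data, the class separations and fusion weights are arbitrary choices for the demo, and `scikit-learn`'s `GaussianMixture` is used in place of whatever GMM implementation the paper employed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
EMOTIONS = ["anger", "happiness", "neutral", "sadness"]

def make_modality(n_per_class, dim, sep):
    # Synthetic stand-in for facial-marker / vocal features:
    # class k is drawn from a Gaussian centred at k * sep.
    X, y = [], []
    for k, _ in enumerate(EMOTIONS):
        X.append(rng.normal(loc=k * sep, scale=1.0, size=(n_per_class, dim)))
        y += [k] * n_per_class
    return np.vstack(X), np.array(y)

def fit_class_gmms(X, y, n_components=2):
    # One GMM per emotion class, mirroring the per-modality modeling.
    return [GaussianMixture(n_components=n_components, random_state=0)
            .fit(X[y == k]) for k in range(len(EMOTIONS))]

def loglik_matrix(gmms, X):
    # Rows: samples; columns: per-class log-likelihood under that class's GMM.
    return np.column_stack([g.score_samples(X) for g in gmms])

# Two "modalities" with different reliability (face more separable than voice).
Xf_tr, y_tr = make_modality(100, 4, sep=2.0)   # face, training
Xv_tr, _    = make_modality(100, 3, sep=1.0)   # voice, training (same label order)
Xf_te, y_te = make_modality(40, 4, sep=2.0)    # face, test
Xv_te, _    = make_modality(40, 3, sep=1.0)    # voice, test

face_gmms  = fit_class_gmms(Xf_tr, y_tr)
voice_gmms = fit_class_gmms(Xv_tr, y_tr)

# Weighted combination of per-modality log-likelihoods, in the spirit of a
# Bayesian classifier weighting scheme; these weights are illustrative only.
w_face, w_voice = 0.6, 0.4
fused = (w_face * loglik_matrix(face_gmms, Xf_te)
         + w_voice * loglik_matrix(voice_gmms, Xv_te))
pred = fused.argmax(axis=1)
acc = (pred == y_te).mean()
print(f"fused accuracy: {acc:.2f}")
```

Classifying by the class whose GMM assigns the highest (weighted) log-likelihood is the standard generative decision rule; the paper's SVM-based combination would instead feed the single-modality classifier outputs into a second-stage discriminative model.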

Published in:

Tenth IEEE International Symposium on Multimedia (ISM 2008)

Date of Conference:

15-17 Dec. 2008