Abstract:
In this paper, we propose a multi-modal multi-feature (M^{3}F) approach for in-the-wild valence-arousal estimation. In the proposed M^{3}F framework, we fuse visual features from videos with acoustic features from the corresponding audio tracks to estimate valence and arousal. We follow a CNN-RNN paradigm: spatio-temporal visual features are extracted with a 3D convolutional network and/or a pretrained 2D convolutional network, and the resulting feature sequences are modeled with a bidirectional recurrent neural network. We evaluate the M^{3}F framework on the validation set of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020, where it significantly outperforms the baseline method.
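To make the described CNN-RNN pipeline concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a placeholder 3D convolutional backbone extracts spatio-temporal visual features, which are concatenated with per-frame acoustic features and passed to a bidirectional GRU that regresses valence and arousal. All layer sizes, the acoustic feature dimension, and the fusion-by-concatenation choice are assumptions for illustration only.

# Minimal sketch (assumed architecture, not the paper's code) of a multi-modal
# CNN-RNN model for frame-level valence-arousal regression.
import torch
import torch.nn as nn


class M3FSketch(nn.Module):
    def __init__(self, acoustic_dim=40, hidden_dim=128):
        super().__init__()
        # Toy 3D convolutional backbone standing in for the visual feature extractor.
        self.visual_backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the temporal axis, pool space away
        )
        # Bidirectional recurrent model over the fused per-frame features.
        self.rnn = nn.GRU(
            input_size=32 + acoustic_dim,
            hidden_size=hidden_dim,
            bidirectional=True,
            batch_first=True,
        )
        # Two continuous outputs per frame: valence and arousal in [-1, 1].
        self.head = nn.Sequential(nn.Linear(2 * hidden_dim, 2), nn.Tanh())

    def forward(self, frames, acoustic):
        # frames: (batch, 3, time, height, width); acoustic: (batch, time, acoustic_dim)
        v = self.visual_backbone(frames)               # (batch, 32, time, 1, 1)
        v = v.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, time, 32)
        fused = torch.cat([v, acoustic], dim=-1)       # fusion by concatenation (assumed)
        out, _ = self.rnn(fused)
        return self.head(out)                          # (batch, time, 2)


if __name__ == "__main__":
    model = M3FSketch()
    frames = torch.randn(2, 3, 16, 64, 64)   # short RGB clip
    acoustic = torch.randn(2, 16, 40)        # e.g. 40-dim acoustic features per frame
    print(model(frames, acoustic).shape)     # torch.Size([2, 16, 2])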
Published in: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
Date of Conference: 16-20 November 2020
Date Added to IEEE Xplore: 18 January 2021