Abstract:
Automatic emotion recognition has long faced the challenge of lacking large-scale human-labeled datasets for model learning, due to the expensive cost of data annotation and the inevitable ambiguity of emotion labels. To tackle this challenge, previous works have either transferred emotion labels from one modality to another, assuming supervised annotation exists in the source modality, or explored semi-supervised learning strategies that exploit large amounts of unlabeled data within a single modality. In this work, we address multimodal emotion recognition with the acoustic and visual modalities and propose a multimodal network structure for semi-supervised learning based on an improved generative adversarial network, CT-GAN. Extensive experiments on a multimodal emotion recognition corpus demonstrate the effectiveness of the proposed approach and show that utilizing unlabeled data via GANs and combining multiple modalities both benefit classification performance. We also carry out detailed analyses, including the influence of the quantity of unlabeled data on recognition performance and the impact of different normalization strategies in semi-supervised learning.
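To make the approach concrete, below is a minimal sketch (not the authors' code) of a semi-supervised GAN discriminator update of the kind the abstract describes: the discriminator outputs K+1 logits (K emotion classes plus a "fake" class, following Salimans et al., 2016), and a CT-style consistency penalty (Wei et al., 2018) encourages two stochastic forward passes over the same real inputs to agree. The discriminator D, the class count K, and the hyperparameters lambda_ct and margin are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

K = 4  # assumed number of emotion classes; index K is reserved for "fake"

def discriminator_loss(D, x_lab, y_lab, x_unlab, x_fake,
                       lambda_ct=2.0, margin=0.0):
    # D must be a torch.nn.Module containing dropout and kept in
    # train() mode, so repeated forward passes are stochastic.

    # Supervised term: cross-entropy over the K real-emotion logits
    # for the labeled (fused audio-visual) features.
    logits_lab = D(x_lab)
    loss_sup = F.cross_entropy(logits_lab[:, :K], y_lab)

    # Unsupervised terms: unlabeled data should fall into one of the
    # K "real" classes; generated data into the (K+1)-th "fake" class.
    logits_unlab = D(x_unlab)
    logits_fake = D(x_fake)
    # log P(real | x) = logsumexp over real-class logits - logsumexp over all
    log_real = (torch.logsumexp(logits_unlab[:, :K], dim=1)
                - torch.logsumexp(logits_unlab, dim=1))
    log_fake = logits_fake[:, K] - torch.logsumexp(logits_fake, dim=1)
    loss_unsup = -(log_real.mean() + log_fake.mean())

    # Consistency term (CT-GAN style): two dropout-perturbed passes
    # over the same real inputs should produce similar outputs.
    out1, out2 = D(x_unlab), D(x_unlab)  # differ because dropout is active
    ct = F.relu((out1 - out2).pow(2).sum(dim=1) - margin).mean()

    return loss_sup + loss_unsup + lambda_ct * ct
```

The generator would be trained against the same discriminator (e.g., with a feature-matching or non-saturating loss); this sketch only illustrates how labeled, unlabeled, and generated samples can all contribute to the discriminator objective.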
Published in: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 18-21 November 2019
Date Added to IEEE Xplore: 05 March 2020