Abstract:
Existing studies on multi-modal emotion recognition in conversation (MMERC) mainly focus on multi-modal fusion and context modeling for emotion representation, and face limitations in uncovering the intrinsic structure of emotion-related data. The existing geometric consistency regularization (GCR) technique aims to build meaningful latent feature structures across multiple modalities and has been validated on non-conversational data; we therefore explore its application to the MMERC task, which involves conversational data. However, we find that geometric consistency in conversational data varies across different speaker and conversation relations (intra-speaker, inter-speaker, and inter-conversation relations), probably due to differences in speaker expressions and conversation topics. This makes the direct application of GCR less effective. To address this issue, we propose a Multi-Relational Geometric Regularization Framework for MMERC (R4-MMERC). Our framework constructs geometric structures over conversational data and performs dynamically balanced consistency regularization based on multiple speaker and conversation relations (intra-speaker, inter-speaker, and inter-conversation relations). We tested our framework by integrating it with three typical MMERC models on the IEMOCAP benchmark dataset. The results show significant performance improvements, demonstrating the effectiveness of our approach.
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025