Abstract:
In recent years, multimodal sentiment analysis (MSA) has gained prominence with the proliferation of social media. However, prior studies have often disregarded the possibility of spurious correlations between multimodal data and sentiment labels. Neglecting these correlations often results in significant performance degradation, hampering the model's ability to generalize in out-of-distribution (OOD) scenarios. To gain a comprehensive understanding of multimodal knowledge and to enhance generalization across diverse distribution scenarios, we present the Multimodal Debiasing Framework (MulDeF). This model-agnostic framework addresses label bias through causal intervention and tackles multimodal biases through counterfactual reasoning. During the training phase, MulDeF rectifies multimodal representations through frontdoor adjustment, effectively eliminating label bias. To model the conditional expectations required by frontdoor adjustment, we introduce multimodal causal attention (MCA). In the inference phase, MulDeF employs counterfactual reasoning to remove multimodal biases. To further refine the debiasing strategy, we divide multimodal biases into two types: nonverbal bias and verbal bias. Nonverbal bias is addressed at the utterance level: unimodal models are built for the audio and visual modalities to estimate their biases with respect to the sentiment labels. Verbal bias, in contrast, is mitigated at the word level: we mask "harmless" words to generate corresponding counterfactual texts, which the text model then assesses to identify word-level bias. Experimental results validate the effectiveness of MulDeF, which outperforms state-of-the-art methods in OOD settings while achieving competitive results in independent and identically distributed (IID) settings.
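For readers unfamiliar with the two causal tools named above, a minimal sketch of their standard forms follows. The abstract does not give MulDeF's exact parameterization, so the mediator M, the unimodal branches f_a, f_v, f_t, the counterfactual text t^{cf}, and the fusion weights λ are illustrative assumptions rather than the authors' notation. Frontdoor adjustment estimates the interventional distribution through a mediator M (here, loosely, the rectified multimodal representation whose conditional expectations MCA models):

\[
P\big(Y \mid \mathrm{do}(X=x)\big) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P\big(Y \mid x', m\big)\,P(x')
\]

Counterfactual debiasing at inference, written in a generic total-indirect-effect form, subtracts the bias-only predictions (the audio and visual unimodal models, and the text model applied to the masked counterfactual text) from the full multimodal prediction; the weighting and combination shown here are an assumption, not the paper's formula:

\[
\hat{y}_{\mathrm{debiased}} \;=\; \underbrace{f(t, a, v)}_{\text{full prediction}} \;-\; \underbrace{\big(\lambda_a f_a(a) + \lambda_v f_v(v) + \lambda_t f_t(t^{cf})\big)}_{\text{estimated bias effect}}
\]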
Published in: IEEE Transactions on Multimedia (Volume: 27)