MHSDB: A Comprehensive Benchmark for Multimodal Humor and Sarcasm Detection Leveraging Foundation Models | IEEE Conference Publication | IEEE Xplore

Abstract:

Understanding multimodal humor and sarcasm detection remains a key challenge in artificial intelligence. Despite recent advances, inconsistencies in feature extraction, evaluation methods, and experimental setups have hindered fair comparisons across different approaches. To address this issue, we propose the Multimodal Humor and Sarcasm Detection Benchmark (MHSDB), the first unified evaluation platform specifically designed for these tasks. MHSDB combines four datasets in English and Hindi and standardizes feature extraction and evaluation processes to facilitate consistent comparisons. We systematically evaluate mainstream foundation models across audio, video, and text modalities. Unimodal representations are assessed using self-attention mechanisms, while multimodal representations are evaluated through mainstream fusion strategies, including utterance-level and sequence-level approaches. Our experimental results reveal that multimodal approaches outperform unimodal ones in capturing complex contexts and multi-layered semantics. Additionally, specific fusion strategies excel at integrating cross-modal information, achieving state-of-the-art performance, and paving the way for future research on optimizing feature representation and multimodal fusion.
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
