MHSDB: A Comprehensive Benchmark for Multimodal Humor and Sarcasm Detection Leveraging Foundation Models

Abstract:

Understanding multimodal humor and sarcasm detection remains a key challenge in artificial intelligence. Despite recent advances, inconsistencies in feature extraction, evaluation methods, and experimental setups have hindered fair comparisons across different approaches. To address this issue, we propose the Multimodal Humor and Sarcasm Detection Benchmark (MHSDB), the first unified evaluation platform specifically designed for these tasks. MHSDB combines four datasets in English and Hindi and standardizes feature extraction and evaluation processes to facilitate consistent comparisons. We systematically evaluate mainstream foundation models across audio, video, and text modalities. Unimodal representations are assessed using self-attention mechanisms, while multimodal representations are evaluated through mainstream fusion strategies, including utterance-level and sequence-level approaches. Our experimental results reveal that multimodal approaches outperform unimodal ones in capturing complex contexts and multi-layered semantics. Additionally, specific fusion strategies excel at integrating cross-modal information, achieving state-of-the-art performance, and paving the way for future research on optimizing feature representation and multimodal fusion.

Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 06-11 April 2025

Date Added to IEEE Xplore: 07 March 2025

DOI: 10.1109/ICASSP49660.2025.10887877

Conference Location: Hyderabad, India

I. Introduction

As communication methods diversify, effective emotion recognition has become increasingly critical. Among various emotional expressions, humor and sarcasm stand out as particularly complex and prevalent, attracting considerable research interest [1]. Humor often involves irony or exaggeration, while sarcasm typically relies on a delicate interplay of vocabulary, gestures, and tone. Detecting humor and sarcasm through text or speech alone is more challenging than classic emotion recognition tasks [2]–[4], as these phenomena require deeper semantic understanding. Thus, integrating multimodal signals, such as visual cues and speech patterns, becomes vital for capturing the subtleties of these complex emotions.
