IdSarcasm: Benchmarking and Evaluating Language Models for Indonesian Sarcasm Detection


Data processing pipeline for the creation of the Reddit Indonesia sarcasm dataset.

Abstract:

Sarcasm detection in the Indonesian language poses a unique set of challenges due to the linguistic nuances and cultural specificities of the Indonesian social media landscape. Understanding the dynamics of sarcasm in this context requires a deep dive into language patterns and the socio-cultural background that shapes the use of sarcasm as a form of criticism and expression. In this study, we developed the first publicly available Indonesian sarcasm detection benchmark datasets from social media texts. We extensively investigated the results of classical machine learning algorithms, pre-trained language models, and recent large language models (LLMs). Our findings show that fine-tuning pre-trained language models is still superior to other techniques, achieving F1 scores of 62.74% and 76.92% on the Reddit and Twitter subsets, respectively. Further, we show that recent LLMs fail to perform zero-shot classification for sarcasm detection and that tackling data imbalance requires a more sophisticated data augmentation approach than our basic methods.
Published in: IEEE Access ( Volume: 12)
Page(s): 87323 - 87332
Date of Publication: 20 June 2024
Electronic ISSN: 2169-3536
