
Separating Hate Speech from Abusive Language on Indonesian Twitter


Abstract:

Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is frequently used in jokes with no intention of offending individuals or groups, even though it may contain profanities. As a result, the distinction between hate speech and abusive language is often blurred. Because hate speech has legal implications, individuals who spread it may be prosecuted. Previous research has focused on binary classification of hate speech and normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, including logistic regression and BERT models, are used for this text classification task. Model performance is assessed using the F1-Score. The results show that BERT models outperform the other models in terms of F1-Score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-Score of 85.59. This also demonstrates that pretraining the BERT model on social media data improves classification performance significantly. A classification model that can distinguish between hate speech and abusive language would help prevent the spread of hate speech that has legal implications.
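
As a point of reference for the evaluation metric, the following minimal sketch computes per-class and macro-averaged F1 for a three-class labeling task. The label names, the toy predictions, and the choice of macro averaging are assumptions made for illustration only; the abstract does not state which averaging scheme the study used, and none of the values below come from the paper.

```python
from collections import Counter

# Hypothetical label names for the three classes described in the abstract.
LABELS = ("hate_speech", "abusive", "normal")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each class one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); set to 0.0 if the class never occurs.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (an assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy gold labels and predictions, invented for this example.
y_true = ["hate_speech", "abusive", "normal", "abusive", "normal", "hate_speech"]
y_pred = ["hate_speech", "hate_speech", "normal", "abusive", "normal", "abusive"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy example the two confusions are both between the hate speech and abusive classes, which lowers the F1 of those two classes while leaving the normal class unaffected. A per-class breakdown of this kind shows whether a model's errors are concentrated on the hate speech versus abusive language boundary that the study targets.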
Date of Conference: 06-07 July 2022
Date Added to IEEE Xplore: 25 August 2022
Conference Location: Bandung, Indonesia
Department of Computer Science, School of Computer Science, Bina Nusantara University, Indonesia
Department of Statistics, School of Computer Science, Bina Nusantara University, Indonesia
Department of Statistics, School of Computer Science, Bina Nusantara University, Indonesia
Department of Statistics, School of Computer Science, Bina Nusantara University, Indonesia
Department of Mathematics Education, STKIP Surya, Indonesia
Mathematics Education Department, Universitas Ahmad Dahlan, Indonesia

I. Introduction

Hate speech is any speech directed at a person or group that conveys hatred based on a characteristic of that person or group. Ethnicity, religion, disability, gender, and sexual orientation are all commonly used to justify hatred. The propagation of hate speech is a dangerous practice that can lead to prejudice, societal turmoil, and even genocide. In everyday life, hate speech is frequently accompanied by abusive language, particularly on social media [3]. Abusive language is an expression that incorporates offensive words or profanities aimed at individuals or groups. Hate speech that includes harsh words or phrases that provoke emotions is more likely to initiate social conflict [4]. In Indonesia, abusive phrases are mainly derived from unpleasant conditions, such as mental illness, sexual deviation, physical impairment, a lack of etiquette, and other unfortunate circumstances; from animals with negative traits; from astral creatures that regularly interfere with human existence; and from dirty and filthy environments [5]. Because abusive words and phrases stimulate emotions, the spread of hate speech accompanied by abusive language generally increases the prevalence of social conflict [6].
Even though harsh language is sometimes used as a joke rather than as an insult, its use on social media can nevertheless cause conflict owing to misunderstandings among users. Although the two are closely related, abusive language is not necessarily hate speech [7]. Hate speech and abusive language on social media must therefore be monitored, to reduce both conflicts between individuals and the exposure of children to such content on the social media they use [8]. In recent years, researchers have investigated hate speech identification and abusive language detection using various methods [9].
Hate speech has a distinct target, classification, and degree, whereas abusive language is not categorized by any specific target, group, or level [4]. Hate speech is directed at a specific individual or group with a high level of animosity and falls under a variety of categories, including ethnicity, religion, race, sexual orientation, and others [10].
