Abstract:
Social media platforms are heavily used by people to express their views in their native languages. Besides positive views, people often use abusive or offensive language to express their anger or frustration. Resource-rich languages have offensive language detection systems that automatically monitor and block offensive content; however, such systems are very rare for low-resource languages because of the unavailability of datasets in those languages. This article proposes a model that automatically detects offensive language for a very low-resource language, namely Pashto. A Roman Pashto dataset is created by collecting 60 thousand comments from different social media platforms and labeling them manually. The proposed model is trained and tested using three different feature extraction approaches, i.e., bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), and sequence integer encoding. Four traditional classifiers and a deep sequence model are trained on this task. Experimental results show that, among the traditional classifiers, random forest works best, giving 94.07% testing accuracy on a combination of unigrams, bigrams, and trigrams, and a maximum accuracy of 93.90% with TF-IDF. However, the overall highest testing accuracy of 97.21% is achieved using bidirectional long short-term memory (BLSTM). The corpus created in this work is made available to researchers working in this domain.
Published in: IEEE Transactions on Computational Social Systems (Volume: 11, Issue: 4, August 2024)
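
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch (not the authors' released code) of a TF-IDF feature extractor over unigrams, bigrams, and trigrams combined with a random forest classifier, using scikit-learn. The dataset file name and column names are placeholders, not the actual Roman Pashto corpus files.

# Illustrative sketch only: TF-IDF (1-3 grams) + random forest for offensive
# language classification. File and column names below are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load manually labeled comments (hypothetical CSV with "comment" and "label" columns).
df = pd.read_csv("roman_pashto_comments.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["comment"], df["label"], test_size=0.2, random_state=42
)

# TF-IDF over unigrams, bigrams, and trigrams feeding a random forest classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
model.fit(X_train, y_train)
print("Testing accuracy:", accuracy_score(y_test, model.predict(X_test)))

The same train/test split could be reused with a BoW vectorizer or with an integer-encoded sequence model such as a BLSTM; the sketch above only covers the TF-IDF plus random forest configuration mentioned in the abstract.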