Abstract:
The prevalence of adult content on social media has harmful effects on the moral values of young people, so effectively filtering inappropriate content on platforms such as Twitter is essential. Researchers have used machine learning and natural language processing techniques to develop automated systems that identify adult content. However, the use of Transformer-based models to detect adult content in the Indonesian language has yet to be thoroughly explored. Identifying adult content in text is challenging because it is subjective and context-dependent: the same words can be explicit or non-explicit depending on the surrounding text and the intended meaning. This study explores fine-tuned Transformer-based models for identifying adult and sexually explicit content in Indonesian Twitter texts. We fine-tuned five pre-trained Transformer-based models: IndoBERT, IndoBERTweet, mBERT, XLM-RoBERTa, and DistilmBERT. In our experiments, all five models classified adult and non-adult content effectively. XLM-RoBERTa and IndoBERTweet identified adult content in Indonesian tweets more effectively than the other pre-trained models. XLM-RoBERTa performed slightly better, which can be attributed to its larger size and advanced training techniques.
Published in: 2023 6th International Conference on Applied Computational Intelligence in Information Systems (ACIIS)
Date of Conference: 23-25 October 2023
Date Added to IEEE Xplore: 28 December 2023