Abstract:
The rise in the number of active online users has led to a corresponding increase in the number of reported cyber abuse incidents. Such incidents threaten the privacy and liberty of users in the digital space. Conventionally, manual moderation and reporting mechanisms have been used to keep such text from appearing online. However, this approach has several flaws, including dependence on human moderators, increased delays and reduced data privacy. Previous attempts to automate the process have relied on supervised machine learning and traditional recurrent sequence models, which tend to perform poorly on non-English text. Given the growing diversity of users in cyberspace, a flexible solution that can accommodate multilingual text is needed. Furthermore, text in colloquial languages often holds pertinent context and emotion that is lost in translation. In this paper, we propose a generative deep-learning-based approach that uses the bidirectional transformer-based BERT architecture for cyber abuse detection across English, Hindi and code-mixed Hindi-English (Hinglish) text. The proposed architecture achieves state-of-the-art results on the code-mixed Hindi dataset in the TRAC-1 standard aggression identification task, while also achieving very good results on the English task leaderboard. These results are obtained without ensemble-based methods or multiple models, and the approach thus proves to be a better alternative to existing approaches. Deep-learning-based models that perform well on multilingual text can handle a broader range of inputs and can thus prove crucial in cracking down on such social evils.
Published in: TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON)
Date of Conference: 17-20 October 2019
Date Added to IEEE Xplore: 12 December 2019