
BERT for Natural Language Processing in Bahasa Indonesia


Abstract:

Indonesian is the national language of Indonesia. Besides Indonesian, about 700 foreign and local languages are used to communicate in Indonesia. Even though it is spoken by more than 275.7 million people, Indonesian and the local languages of Indonesia still receive little attention in the Natural Language Processing (NLP) community. Currently, the Bidirectional Encoder Representations from Transformers (BERT) model achieves state-of-the-art performance in NLP. This article reviews BERT models for Indonesian and local Indonesian languages. Some of the findings in this article can serve as ideas for developing NLP with the BERT model in Indonesian. The search found 7 pretrained BERT models for Indonesian and local Indonesian languages: 5 are monolingual models in Bahasa Indonesia, 1 is a monolingual model in a local Indonesian language (Sundanese), and only 1 is a multilingual Indonesian-Javanese-Sundanese model. The downstream tasks of these Indonesian and local Indonesian BERT models are sentiment analysis, text classification, and text summarization. There are 3 extrinsic evaluation benchmarks for Indonesian BERT, namely IndoNLU, IndoNLG, and IndoLEM.
Date of Conference: 15-16 December 2022
Date Added to IEEE Xplore: 08 February 2023
Conference Location: Bandung, Indonesia

