Abstract:
Indonesian is the national language of Indonesia. Besides Indonesian, there are 700 foreign and local languages used to communicate in Indonesia. Even though it is used by more than 275.7 million people, Indonesian and the local languages of Indonesia still receive little attention in the Natural Language Processing (NLP) community. Currently, the Bidirectional Encoder Representations from Transformers (BERT) model delivers state-of-the-art performance in NLP. This article reviews BERT models for Indonesian and local Indonesian languages; some of its findings can serve as ideas for developing NLP with BERT in Indonesian. The search found 7 pretrained BERT models for Indonesian and local Indonesian languages: 5 monolingual models for Indonesian, 1 monolingual model for a local Indonesian language (Sundanese), and 1 multilingual Indonesian-Javanese-Sundanese model. The downstream tasks addressed by these models are sentiment analysis, classification, and text summarization. There are 3 extrinsic evaluation benchmarks for Indonesian BERT, namely IndoNLU, IndoNLG, and IndoLEM.
Published in: 2022 2nd International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA)
Date of Conference: 15-16 December 2022
Date Added to IEEE Xplore: 08 February 2023