
Improving Indonesian Informal to Formal Style Transfer via Pre-Training Unlabelled Augmented Data



Abstract:

Neural Machine Translation has already been applied to Indonesian informal-to-formal style transfer, treating the task as translation from a source language to a target language. In the Indonesian informal-to-formal style transfer task, the informal sentence serves as the source and the formal sentence is the target the model must produce. Currently, the STIF parallel dataset is the only manually labelled informal-to-formal dataset, and it is too small to train a high-quality style transfer model. To address this low-resource setting, we adopt the pre-training-on-augmented-data architecture introduced in work on Grammatical Error Correction (GEC) tasks, and we create the augmented dataset with a simpler word-replacement approach. We benchmark several transformer-based pre-trained architectures, including BART, GPT2, and a BERT encoder-decoder. Each model is first pre-trained on the augmented dataset and then fine-tuned on the STIF dataset. We use sacreBLEU to determine which approach yields better style transfer quality. The best result is achieved by the BART model pre-trained on the augmented dataset and fine-tuned on the STIF dataset, with a sacreBLEU score of 53.19.
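The abstract describes generating augmented pre-training pairs by a simple word-replacement approach: formal text is mapped to a synthetic informal variant, and the resulting (informal, formal) pairs serve as parallel pre-training data. A minimal sketch of that idea, assuming a small hypothetical informal/formal lexicon (the paper's actual replacement resource is not specified in this abstract):

```python
import random

# Hypothetical informal -> formal lexicon (illustrative only; not the
# authors' actual replacement resource).
INFORMAL_TO_FORMAL = {
    "gak": "tidak",       # "not"
    "udah": "sudah",      # "already"
    "aja": "saja",        # "just/only"
    "gimana": "bagaimana" # "how"
}

def make_informal(formal_sentence: str, p: float = 1.0, seed: int = 0) -> str:
    """Create a synthetic informal variant of a formal sentence by
    replacing each formal word with its informal counterpart with
    probability p (whitespace tokenization for simplicity)."""
    rng = random.Random(seed)
    formal_to_informal = {v: k for k, v in INFORMAL_TO_FORMAL.items()}
    tokens = formal_sentence.split()
    out = [
        formal_to_informal[t] if t in formal_to_informal and rng.random() < p else t
        for t in tokens
    ]
    return " ".join(out)

# Pairing each synthetic informal sentence with its formal original yields
# augmented parallel data for pre-training the style transfer model.
formal = "dia sudah makan tetapi saya tidak"
print(make_informal(formal), "->", formal)
```

The synthetic informal side is noisy by design; the subsequent fine-tuning on the manually labelled STIF dataset corrects for artifacts of the replacement rules.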
Date of Conference: 14-15 September 2023
Date Added to IEEE Xplore: 04 December 2023
Conference Location: Lombok, Indonesia

