Variational Sentence Augmentation for Masked Language Modeling


Abstract:

We introduce a variational sentence augmentation method that combines a Variational Autoencoder [1] with a Gated Recurrent Unit [2]. The proposed data augmentation method benefits from its latent space representation, which encodes semantic and syntactic properties of the language. After learning this representation, the model generates sentences from the latent space using the sequential structure of the Gated Recurrent Unit. By augmenting an existing unstructured corpus, the model improves Masked Language Modeling in pre-training and, as a result, improves fine-tuning as well. In pre-training, our method increases the prediction rate of masked tokens. In fine-tuning, we show that variational sentence augmentation can help both semantic and syntactic tasks. We conduct our experiments and evaluations on a limited dataset of Turkish sentences, which also constitutes a contribution to low-resource languages.
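The architecture described above pairs a VAE latent space with GRU-based sequence modeling: an encoder GRU compresses a sentence into a latent vector via the reparameterization trick, and a decoder GRU generates tokens conditioned on that latent. The following PyTorch sketch illustrates this general design; it is not the authors' implementation, and all class names, layer sizes, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    """Minimal GRU-based sentence VAE sketch (hypothetical hyperparameters)."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)                    # (batch, seq_len, embed_dim)
        _, h = self.encoder(emb)                    # final hidden state: (1, batch, hidden_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = self.latent_to_hidden(z).unsqueeze(0)  # init decoder hidden from latent
        dec, _ = self.decoder(emb, h0)              # teacher forcing during training
        return self.out(dec), mu, logvar            # token logits + posterior parameters

model = SentenceVAE()
tokens = torch.randint(0, 1000, (4, 12))  # batch of 4 toy token sequences
logits, mu, logvar = model(tokens)
```

At augmentation time, one would instead sample `z` from the prior (or interpolate between encoded sentences) and decode token by token, feeding each predicted token back into the decoder to produce new sentences for the pre-training corpus.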
Date of Conference: 06-08 October 2021
Date Added to IEEE Xplore: 18 November 2021
Conference Location: Elazig, Turkey
