LiDA: Language-Independent Data Augmentation for Text Classification

LiDA: A Language-independent Data Augmentation technique for text classification that works at the sentence embedding level.

Abstract:

Developing a high-performance text classification model in a low-resource language is challenging due to the lack of labeled data, and collecting large amounts of labeled data is costly. One approach to increase the amount of labeled data is to create synthetic data using data augmentation techniques. However, most available data augmentation techniques target English data and are highly language-dependent because they operate at the word or sentence level, for example by replacing words or paraphrasing a sentence. We present Language-independent Data Augmentation (LiDA), a technique that uses a multilingual language model to create synthetic data from the available training dataset. Unlike other methods, our approach works at the sentence embedding level, independent of any particular language. We evaluated LiDA in three languages on various fractions of the dataset, and the results showed improved performance for both LSTM and BERT models. Furthermore, we conducted an ablation study to determine the impact of each component of our method on overall performance. The source code of LiDA is available at https://github.com/yest/LiDA.
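
To make the idea of augmenting at the sentence embedding level concrete, the sketch below creates synthetic training examples by perturbing existing embedding vectors and reusing their labels. The abstract does not specify how LiDA transforms embeddings, so the Gaussian noise used here is only an illustrative stand-in and not the authors' method. The function name `augment_embeddings` and the parameters `copies`, `noise_std`, and `seed` are hypothetical, and the input vectors are assumed to have been produced beforehand by some multilingual sentence encoder.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def augment_embeddings(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    copies: int = 1,
    noise_std: float = 0.01,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Return the original embeddings plus `copies` noisy variants of each.

    Assumption: small Gaussian noise is used as a placeholder transformation;
    it is not taken from the LiDA paper. Each synthetic vector keeps the label
    of the embedding it was derived from.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")

    rng = random.Random(seed)  # local generator, so results are reproducible

    # Start from copies of the originals so the real data is always kept.
    out_vectors: List[Vector] = [list(vec) for vec in embeddings]
    out_labels: List[int] = list(labels)

    # Append the synthetic examples, grouped by source embedding.
    for vec, label in zip(embeddings, labels):
        for _ in range(copies):
            out_vectors.append([x + rng.gauss(0.0, noise_std) for x in vec])
            out_labels.append(label)

    return out_vectors, out_labels
```

Because the operation only touches vectors, nothing in it depends on the language of the original sentences; any classifier that accepts sentence embeddings as input could be trained on the enlarged set.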

Published in: IEEE Access (Volume: 11)

Page(s): 10894 - 10901

Date of Publication: 03 January 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3234019
