Chinese Multilabel Short Text Classification Method Based on GAN and Pinyin Embedding

Model framework. It is mainly composed of two parts: text representation and classification.

Abstract:

With the development of the Chinese Internet, a large amount of Chinese short text data has been generated. Multilabel classification of Chinese short texts enables more effective management and analysis of this data. However, because Chinese short text features are sparse and commonly used multilabel classification models are designed and developed primarily for English, traditional sampling methods can easily lead to poor classification results. In response to these challenges, we propose a Chinese multilabel short text classification method based on a generative adversarial network (GAN) and enhanced with pinyin. First, we use BERT, augmented by pinyin embedding, for text vector representation to enrich text information. Second, multiple hidden layers of BERT are integrated with the generators of the GAN model to comprehensively learn the feature distribution. Finally, an improved sampling method is used to help the model learn better. Experimental results show that the proposed method performs better on Chinese multilabel short text classification tasks.
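
The idea of enriching a character representation with pinyin, and of predicting several labels at once, can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the pinyin embedding is fused with BERT's input, so the sketch assumes an element-wise sum, uses random vectors in place of learned embeddings, and omits BERT, the GAN, and the sampling method entirely. The names `CHAR_TO_PINYIN`, `embed`, and `predict_labels` are hypothetical.

```python
import math
import random

# Hypothetical toy character-to-pinyin table (toneless).
# A real system would use a full pronunciation lexicon.
CHAR_TO_PINYIN = {"他": "ta", "她": "ta", "好": "hao"}

DIM = 4
rng = random.Random(0)  # fixed seed so the toy vectors are reproducible


def make_table(keys, dim=DIM):
    """Give each key a small random vector (stand-in for a learned embedding)."""
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


char_emb = make_table(CHAR_TO_PINYIN)                           # one vector per character
pinyin_emb = make_table(sorted(set(CHAR_TO_PINYIN.values())))   # one vector per syllable


def embed(text):
    """Return one vector per character: character vector + pinyin vector.

    Fusion by summation is an assumption made for this sketch.
    """
    vectors = []
    for ch in text:
        c = char_emb[ch]
        p = pinyin_emb[CHAR_TO_PINYIN[ch]]
        vectors.append([a + b for a, b in zip(c, p)])
    return vectors


def predict_labels(logits, threshold=0.5):
    """Multilabel decision: independent sigmoid per label, keep labels above threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]
```

In this sketch the homophones 他 and 她 share the same pinyin vector, so their combined representations have a common component even though their character vectors differ. `predict_labels` shows why the task is multilabel rather than multiclass: each label is thresholded independently, so a text can receive zero, one, or several labels.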
Published in: IEEE Access (Volume: 12)
Page(s): 83323 - 83329
Date of Publication: 11 June 2024
Electronic ISSN: 2169-3536
