Model framework. It is mainly composed of two parts: text representation and classification.
Abstract:
With the development of the Chinese Internet, a large amount of Chinese short text data has been generated. Multilabel classification of Chinese short texts enables more effective management and analysis of this data. However, because Chinese short text features are sparse and commonly used multilabel classification models are primarily designed and developed for English, traditional sampling methods can easily lead to poor classification results. In response to these challenges, we propose a GAN-based Chinese multilabel short text classification method enhanced with pinyin. First, we use BERT augmented with pinyin embeddings for text vector representation, which enriches the text information. Second, multiple hidden layers of BERT are integrated with the generators of the GAN model to comprehensively learn the feature distribution. Finally, an improved sampling method is used to help the model learn more effectively. Experimental results show that the proposed method performs better on Chinese multilabel short text classification tasks.
Published in: IEEE Access (Volume: 12)
- IEEE Keywords
- Index Terms
- Classification Methods
- Generative Adversarial Networks
- Text Classification
- Short Text
- Chinese Text
- Chinese Classification
- Short Text Classification
- Hidden Layer
- Sparsity
- Text Data
- Chinese Data
- Traditional Sampling
- Chinese Internet
- Generative Adversarial Networks Model
- Short Task
- Text Classification Tasks
- Traditional Sampling Methods
- Amount Of Text Data
- Multiple Genes
- Convolutional Neural Network
- Positive Samples
- Negative Samples
- Traditional Classification Methods
- Chinese Characters
- BERT Model
- Multiple Labels
- Text Representation
- F1 Score
- Convergence Difficulties
- Deep Learning Models