Word-Level and Pinyin-Level Based Chinese Short Text Classification

Short text classification is an important branch of Natural Language Processing. Although CNN and RNN have achieved satisfactory results in text classification tasks, they are difficult to apply to Chinese short text classification because of the data sparsity and homophonic typos problems of Chinese short texts. To solve these problems, a word-level and Pinyin-level based Chinese short text classification model is constructed. Since homophones share the same Pinyin, adding Pinyin-level features solves the homophonic typos problem; moreover, the introduction of additional features alleviates the data sparsity problem of short texts. In order to fully extract the deep hidden features of short texts, a deep learning model based on BiLSTM, Attention and CNN is constructed, and a residual network is used to solve the gradient disappearance problem that arises as the number of network layers increases. Additionally, considering that a complex deep learning network structure increases the text classification time, a Text Center is constructed: when a new text arrives, it can be classified quickly by calculating the Manhattan distance between its embedding vector and the vectors stored in the Text Center. The Accuracy, Precision, Recall and F1 of the proposed model on the simplifyweibo_4_moods dataset are 0.9713, 0.9627, 0.9765 and 0.9696 respectively, and those on the online_shopping_10_cats dataset are 0.9533, 0.9416, 0.9608 and 0.9511 respectively, which are better than those of the baseline methods. In addition, the classification time of the proposed model on simplifyweibo_4_moods and online_shopping_10_cats is 0.0042 and 0.0033 respectively, which is far lower than that of the baseline methods.


I. INTRODUCTION
The continuous development of social media has gradually made it the main platform for netizens 1 to express their views and opinions. A large number of active netizens publish micro-blogs, tweets and other short texts bearing user information every day, which contain abundant valuable information reflecting public opinion, social hot spots and user interests. However, quickly and accurately mining important information from massive texts according to the personalized needs of society and users still faces huge challenges. Research on short text classification technology can help systems ''understand'' and ''manage'' all kinds of short texts more efficiently, which plays an important role in promoting the development of social intelligence.
1 Netizens refers to all people who conduct network activities through computers and the Internet.
The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan.
As an important research content of Natural Language Processing (NLP) [1], text classification has been widely used in search engines [2], [3], information filtering [4], [5], subject tracking [6], [7], mail classification [8], [9] and other fields. The traditional machine learning classification method divides the whole text classification task into feature engineering and classifier. Feature engineering is divided into three parts: text preprocessing, feature extraction and text representation. The ultimate goal is to convert text into a format that can be understood by the computer and encapsulate enough information for classification, that is, strong feature expression ability. Commonly used machine learning classifiers include Naive Bayes (NB) [10], K-Nearest Neighbor (KNN) [11], Decision Tree (DT) [12], Support Vector Machine (SVM) [13], etc. Feature engineering is a very complicated process, and the quality of feature selection directly determines the classification results of machine learning classifiers. With the continuous development of deep learning models, more and more researchers use deep learning models for text classification.
The deep learning method does not require complex feature engineering, but relies on its powerful ability to fit data distributions to automatically learn rules by training on massive data. As Convolutional Neural Network (CNN) [14] can extract text features through the one-dimensional convolution operation and retain the most important features through the pooling operation, many researchers use CNN for text classification [15]. Text data can be regarded as words that have sequence relationships, which are difficult for CNN to capture [16]. Therefore, Recurrent Neural Network (RNN) [17] is used in text classification due to its powerful ability in processing variable-length sequence input. However, RNN suffers from the gradient disappearance problem when the text is too long and has difficulty capturing long-distance global dependencies [18], which is solved by Long Short-Term Memory (LSTM) [19]. Considering that LSTM only propagates forward and can easily ignore important content after the current time node, Bidirectional Long Short-Term Memory (BiLSTM) [20] was proposed to use both forward and backward temporal features for text classification.
Although CNN and RNN have achieved satisfactory results in text classification tasks, most studies target English and there are only a few on Chinese text classification [21], [22], [23]. With the rapid development of Chinese social networks, the classification of Chinese short texts becomes more and more important. Since Chinese is based on pictograms and has many homophones, the homophonic typos problem 2 often occurs in Chinese short texts, which is difficult for traditional text classification models to handle. Pinyin is a system that uses Latin letters to transcribe modern standard Chinese phonetics, and it can therefore be used to solve the homophonic typos problem of Chinese short texts. Hence, a word-level and Pinyin-level based short text classification model, WP-STC, is constructed in this paper. Since WP-STC contains both word-level and Pinyin-level features, it can also effectively alleviate the data sparsity problem 3 of short texts.
2 The homophonic typos problem means that Chinese words are wrongly written as other words with the same Pinyin but different meanings.
Considering that RNNs assign the same attention to all contexts and the text classification results of deep learning model alone are always not satisfactory [15], a deep learning model based on BiLSTM, Attention mechanism and CNN is proposed in this paper. BiLSTM is used to obtain bidirectional temporal features, Attention mechanism is used to assign different weights to the context, and CNN is used to extract local features and reduce dimension. In addition, the residual network is used to solve the gradient disappearance problem with the increase of network layers [25].
Additionally, considering that the complex deep learning network structure will increase the text classification time, the concept of Text Center is innovatively proposed. When there is a new text input, the text classification task can be quickly realized by calculating the Manhattan distance between the embedding vector of it and the vectors stored in the Text Center.
The main contributions of this paper are as follows:
(1) A word-level and Pinyin-level based Chinese short text classification model is proposed, which can well solve the data sparsity and frequent homophonic typos problems of Chinese short texts.
(2) A deep learning model based on BiLSTM, Attention mechanism, CNN and residual network is proposed. BiLSTM is used to obtain bidirectional temporal features, Attention is used to assign different weights to the context, and CNN is used to extract local features and reduce dimension. In addition, the residual network is used to solve the gradient disappearance problem with the increase of network layers.
(3) The concept of Text Center is innovatively proposed, which can greatly reduce the classification time by just comparing the embedding vector of the input text with the vectors stored in the Text Center.
(4) Multi-group comparison experiments on two public baseline datasets prove that the proposed model not only get the best classification Accuracy, Precision, Recall and F1, but also greatly reduces the classification time.
The remainder of this work is organized as follows. Section II introduces the latest research results of text classification. Section III introduces the background of LSTM, the Attention mechanism and CNN. Section IV introduces the construction of the proposed model. Section V introduces the experimental results of the proposed model and the comparison models. Section VI provides the summary and future work.

II. RELATED WORK

A. TEXT CLASSIFICATION
Commonly used text classification methods include machine learning and deep learning.

1) MACHINE LEARNING
Luo et al. [13] implemented the SVM in classifying English text and documents. Experimental results on a set of 1,033 text documents demonstrated that the classifier provided the best results when the feature set size is small. Richard et al. [26] reviewed how the NLP technique of TF-IDF combined with the supervised machine learning model of SVM and word embedding approaches such as Word2vec could be used to categorize/label protocol deviations across multiple therapeutic areas. Annalisa et al. [27] presented an updated survey of 12 machine learning text classifiers applied to a public spam corpus. They proposed a new pipeline to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on NLP) in the preprocessing stage. Abdalla et al. [28] proposed similarity measures with machine learning models and presented benchmarking studies for integration methodology over balanced/imbalanced datasets.

2) DEEP LEARNING
Machine learning methods rely heavily on feature engineering, which is very complex and time-consuming. Since deep learning methods do not need complex feature engineering, more and more researchers use them for text classification.
Cheng et al. [29] proposed a novel text classification model based on a hierarchical self-attention mechanism capsule network, which was composed of the capsule network and hierarchical self-attention network. The experimental results on 5 text classification datasets showed that the proposed model achieved the best classification results compared with other baseline models. Kong et al. [30] proposed a hierarchical BERT with an adaptive fine-tuning strategy (HAdaBERT). HAdaBERT consisted of an attention-based gated memory network as the global encoder and a BERT-based model as the local encoder. Experimental results on different corpora indicated that HAdaBERT outperformed the state-of-the-art pretrained language models. Sergio et al. [31] proposed Stacked DeBERT, which improved the robustness of incomplete data by designing a novel encoding scheme in BERT. Stacked DeBERT took advantage of stacks of multilayer perceptrons for the reconstruction of missing words' embeddings by extracting more abstract and meaningful hidden feature vectors, and bidirectional transformers for improved embedding representation. Liu et al. [32] proposed a Co-attention Network with Label Embedding (CNLE) that jointly encoded the labels and text into their mutually attended representations, which was able to attend to the relevant parts of both. Experiments showed that CNLE achieved competitive results on 2 multi-label and 7 multi-class classification benchmarks. Zhang [33] proposed a news text classification method based on the combination of deep learning (DL) algorithms. Gao et al. [34] introduced four methods to scale BERT to perform document classification on clinical texts several thousand words long.

B. SHORT TEXT CLASSIFICATION
Although machine learning and deep learning models are widely used for text classification, considering the data sparsity of short texts, a simple machine learning or deep learning network cannot classify them well. Therefore, researchers use more complex models for short text classification.
Yang et al. [35] proposed a novel heterogeneous Graph Neural Network (GNN)-based method for semi-supervised short text classification, taking full advantage of limited labeled data and large unlabeled data through information propagation along the graph. Wang et al. [36] proposed SHINE, which used a GNN for short text classification. Wang et al. [37] proposed a short text classification method based on semantic extension and CNN. Škrlj et al. [39] proposed tax2vec, a parallel algorithm for constructing taxonomy-based features, which could well solve the data sparsity problem of short texts. Zhou et al. [40] proposed a semantic extension-based classification algorithm for short texts in which both ordinary 1D convolution and atrous convolution are performed. The model achieved the best classification results and had lower computational complexity than BERT-base. Yu et al. [41] proposed the Deep Pyramid Temporal Convolutional Network for short text classification, which mainly consisted of a concatenated embedding layer, causal convolution, 1/2 max pooling down-sampling and residual blocks.

C. CHINESE SHORT TEXT CLASSIFICATION
Although deep learning models have achieved satisfactory results in short text classification tasks, most of them are for English and there are only a few studies on Chinese short text classification.
Hao et al. [21] proposed a Mutual-Attention CNN framework, which integrated features at the word and character levels for Chinese short text classification. Lyu et al. [22] introduced HowNet 4 as an external knowledge base, and proposed a language knowledge enhancement graph converter to deal with the Chinese word ambiguity problem. Feng et al. [23] proposed a sentiment classification model for short Chinese texts. The model combined word features with part-of-speech features, position features and dependent syntactic features to form three new combined features, which were input into a multi-channel CNN combined with the multi-head attention mechanism. Yang et al. [42] proposed a character-word graph attention network to explore the interactive information between characters and words for Chinese text classification.
The existing Chinese short text classification models are basically based on the characteristics of words or characters, and it is difficult to deal with the homophonic typos problem that often occurs in Chinese short texts. In order to overcome the above problem, word-level and Pinyin-level based Chinese short text classification model, WP-STC, is constructed in this paper.

III. BACKGROUND
A. LSTM

LSTM [19] is a special type of Recurrent Neural Network (RNN) [43] designed to deal with the gradient disappearance problem faced by RNNs. Like other types of RNNs, LSTM generates its outputs based on the input of the current time step and the output of the previous time step, and sends the current output to the next time step. Each LSTM cell consists of a memory cell c_t that maintains its state at arbitrary time intervals and three nonlinear gates, including an input gate i_t, a forget gate f_t, and an output gate o_t, which are designed to regulate the flow of information into and out of the memory cell. An LSTM containing a hidden layer is defined as follows [19]:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊗ c_{t-1} + i_t ⊗ c̃_t
h_t = o_t ⊗ tanh(c_t)

where i_t, f_t, o_t and c_t denote the input gate, forget gate, output gate and memory cell activation vector at moment t, respectively; σ(·) denotes the logistic sigmoid function and ⊗ denotes the element-level multiplication operation; W_* ∈ R^{H×d} and U_* ∈ R^{H×H} denote the weight matrices and b_i, b_f, b_o and b_c denote the bias terms; d and H denote the dimension of the input and the size of the hidden layer, respectively.
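The gate equations above can be sketched as a single LSTM time step in NumPy. The dict-based parameter layout and the toy dimensions (d = 4, H = 3) are illustrative choices for this sketch, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations above.

    params holds input weights W_*, recurrent weights U_* and biases b_*
    for the input (i), forget (f), output (o) gates and candidate state (g)."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate cell state
    c_t = f_t * c_prev + i_t * g_t   # memory cell update
    h_t = o_t * np.tanh(c_t)         # hidden state
    return h_t, c_t

# Toy dimensions: input d = 4, hidden H = 3; random weights, zero initial state.
rng = np.random.default_rng(0)
params = {
    "W": {k: rng.normal(size=(3, 4)) for k in "ifog"},
    "U": {k: rng.normal(size=(3, 3)) for k in "ifog"},
    "b": {k: np.zeros(3) for k in "ifog"},
}
h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), params)
```

Because h_t = o_t ⊗ tanh(c_t) with o_t ∈ (0, 1), every component of the hidden state stays strictly inside (−1, 1).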

B. ATTENTION MECHANISM
The Attention mechanism [38] can highlight important information from contextual information by setting different weights, thus paying more attention to the parts that are similar to the elements of the input and suppressing other useless information.
Let h = (h_1, h_2, · · ·, h_T) denote the input of the Attention network; then the correlation e_tj between the jth input h_j and the current hidden state s_{t-1} is calculated as follows [38]:

e_tj = score(s_{t-1}, h_j)

where score() denotes the weight-focused multiplication. Assuming that v is the trainable parameter, the associated likelihood a_tj and the content vector c_t for time step t are calculated as follows [38]:

a_tj = exp(e_tj) / Σ_{k=1}^{T} exp(e_tk)
c_t = Σ_{j=1}^{T} a_tj h_j

C. CNN

The CNN [14] for text classification includes multiple one-dimensional (1D) convolution layers and pooling layers. 1D convolution is used to extract deep features, and pooling is used to further obtain important features and reduce dimensions. For a Chinese text S containing s embedding vectors, assuming that the length of each embedding vector is e, S is a matrix of size s × e. Multiple linear filters of size h × e are used to perform the convolution operation to generate the feature map M = [m_0, m_1, · · ·, m_{s-h}], where h is the length of the linear filter and 1 ≤ h ≤ s. Let S_{i:j} represent the matrix from the ith word to the jth word in S; then the ith feature sequence of M is generated by the following formula [14]:

m_i = f(W · S_{i:i+h-1} + b)

where f() denotes the activation function ReLU, W denotes the weight and b denotes the bias term. After the convolution operation, max pooling is used to reduce the dimension and extract the key feature [50]:

m̂ = max(M)
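The 1D convolution and max pooling steps can be illustrated on toy numbers; the text matrix, filter values and bias below are arbitrary stand-ins chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def conv1d_text(S, W, b):
    """Slide filter W (h x e) over text matrix S (s x e) with ReLU activation,
    producing a feature map of length s - h + 1 (the m_i of the formula above)."""
    s, e = S.shape
    h = W.shape[0]
    return np.array([max(0.0, np.sum(W * S[i:i + h]) + b)
                     for i in range(s - h + 1)])

S = np.arange(12.0).reshape(4, 3)   # toy text: s = 4 words, embedding size e = 3
W = np.ones((2, 3)) / 6.0           # one linear filter of length h = 2
m = conv1d_text(S, W, b=0.0)        # feature map M, length s - h + 1 = 3
pooled = m.max()                    # max pooling keeps the strongest feature
```

Here the three windows average to 2.5, 5.5 and 8.5, and max pooling keeps 8.5.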

IV. MODEL CONSTRUCTION
WP-STC is constructed in this paper for Chinese short text classification, and its structure is shown in Figure 1. In the first step, the Chinese text of the input layer is transformed into word-level and Pinyin-level features. In the second step, the word-level and Pinyin-level features are transformed into embedding vectors by pre-trained Word2vec [44]. In the third step, a deep learning model, Bi-Att-CNN, based on BiLSTM, Attention and CNN is constructed, and the word-level and Pinyin-level embedding vectors are both fed into Bi-Att-CNN to obtain the hidden vectors. In the fourth step, the word-level and Pinyin-level hidden vectors are added together and fed into the fully connected network, and the softmax function is used to get the classification results. At the same time, the word-level and Pinyin-level hidden vectors are used to build the Text Center of the different text categories. When there is a new text input, the text classification task can be quickly realized by calculating the Manhattan distance between its embedding vector and the vectors stored in the Text Center. The implementation process of the model is described in detail below.

A. ACQUISITION OF WORD-LEVEL AND PINYIN-LEVEL FEATURES
The processes of converting Chinese text in input layer into word-level and Pinyin-level features are shown in Figure 2.
Given a Chinese short text T of length z, T is first segmented into the word-level feature T_word = {w_1, w_2, · · ·, w_b} containing b words by jieba, 5 and then the Pinyin of each word in T_word is obtained by pyPinyin 6 to get the Pinyin-level feature T_Pinyin = {p_1, p_2, · · ·, p_z}. Since every word has exactly one Pinyin, b = z.
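This step can be sketched as follows. In the paper, jieba performs the word segmentation and pyPinyin supplies the Pinyin; to keep the sketch self-contained, a tiny hand-made lexicon stands in for both libraries, and the example sentence and its entries are toy data.

```python
# Stand-ins for jieba segmentation and pyPinyin lookup (toy data only).
TOY_SEGMENTS = {"我喜欢苹果": ["我", "喜欢", "苹果"]}  # "I like apples"
TOY_PINYIN = {"我": "wo", "喜欢": "xihuan", "苹果": "pingguo"}

def to_features(text):
    """Return the word-level feature T_word and the Pinyin-level feature T_Pinyin."""
    t_word = TOY_SEGMENTS[text]                 # word segmentation (jieba's role)
    t_pinyin = [TOY_PINYIN[w] for w in t_word]  # Pinyin lookup (pyPinyin's role)
    return t_word, t_pinyin

t_word, t_pinyin = to_features("我喜欢苹果")
# Every word maps to exactly one Pinyin string, so b = z.
```

With real text, replacing the two toy tables with `jieba.lcut` and a pyPinyin call yields the same two parallel sequences.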

B. ACQUISITION OF EMBEDDING VECTOR
Word2vec [45] is the most commonly used static word vector model, which can convert words and Pinyin into fixed-length vectors. A pre-trained Word2vec [44] is used to transform both T_word and T_Pinyin into the embedding vectors V_word and V_Pinyin. For computational convenience, assume that the length of each word and Pinyin embedding vector is p; then V_word ∈ R^{b×p} and V_Pinyin ∈ R^{z×p}.
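At inference time, a pre-trained static Word2vec model acts as a lookup table from tokens to fixed-length vectors. The sketch below uses random stand-in vectors and an assumed embedding length p = 5; a real model would supply the table.

```python
import numpy as np

# Random stand-in for a pre-trained Word2vec table (toy vocabulary, p = 5).
rng = np.random.default_rng(1)
p = 5
vocab = ["我", "喜欢", "苹果", "wo", "xihuan", "pingguo"]
table = {tok: rng.normal(size=p) for tok in vocab}

def embed(tokens):
    """Stack per-token vectors into a len(tokens) x p embedding matrix."""
    return np.stack([table[t] for t in tokens])

V_word = embed(["我", "喜欢", "苹果"])        # V_word  ∈ R^{b×p}
V_pinyin = embed(["wo", "xihuan", "pingguo"])  # V_Pinyin ∈ R^{z×p}
```

Since b = z, the two matrices have identical shapes, which is what later lets the word-level and Pinyin-level branches be combined by addition.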

C. CONSTRUCTION OF BI-ATT-CNN
The deep learning model based on BiLSTM, Attention and CNN, Bi-Att-CNN, is constructed to get the hidden vectors of V_word and V_Pinyin; its structure is shown in Figure 3. Bi-Att-CNN can well extract the deep hidden features of the embedding vectors from the embedding layer. The specific processes are as follows.
Firstly, V_word and V_Pinyin from the embedding layer are fed into BiLSTM to obtain the forward and backward hidden vectors. Secondly, the forward and backward hidden vectors are concatenated to obtain the bidirectional timing vectors h_word and h_Pinyin.
Thirdly, h_word and h_Pinyin are fed into the Attention network separately, so as to assign different attentions to the context. At first, the hidden representations x_word and x_Pinyin of h_word and h_Pinyin are obtained:

x_word = tanh(W_word h_word + b_word)
x_Pinyin = tanh(W_Pinyin h_Pinyin + b_Pinyin)

where W_word and W_Pinyin denote weights and b_word and b_Pinyin denote bias terms. Next, the importances of the context are calculated based on the similarities between y_word, y_Pinyin and x_word, x_Pinyin, where y_word and y_Pinyin are randomly initialized context vectors.
After obtaining the weights, the softmax function is used to normalize them to obtain the weight vectors r_word and r_Pinyin:

r_word = exp(x_word^T y_word) / Σ exp(x_word^T y_word)
r_Pinyin = exp(x_Pinyin^T y_Pinyin) / Σ exp(x_Pinyin^T y_Pinyin)

Finally, the comment vectors f_word and f_Pinyin containing all contextual attention are obtained by weighted summation with r_word and r_Pinyin:

f_word = Σ r_word h_word
f_Pinyin = Σ r_Pinyin h_Pinyin

Fourthly, the residual network is used to solve the gradient disappearance problem with the increase of network layers by adding the output vectors of the Attention network and the initial embedding vectors:

fr_word = f_word + V_word (24)
fr_Pinyin = f_Pinyin + V_Pinyin (25)

Fifthly, fr_word and fr_Pinyin are fed into the convolutional networks of CNN to obtain the results of the convolution operation, t_word and t_Pinyin:

t_word = relu(W_cnn_word fr_word + b_cnn_word)
t_Pinyin = relu(W_cnn_Pinyin fr_Pinyin + b_cnn_Pinyin)

where W_cnn_word and W_cnn_Pinyin denote weights and b_cnn_word and b_cnn_Pinyin denote bias terms. Sixthly, t_word and t_Pinyin are fed into the max pooling networks of CNN to get the results of the pooling operation, g_word and g_Pinyin. Seventhly, g_word and g_Pinyin are added to get the output vector g = g_word + g_Pinyin of the Bi-Att-CNN network.
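The attention-plus-residual portion of this pipeline can be sketched in NumPy. All shapes here are toy values, and since the paper does not spell out how the attention output and the embedding matrix are shape-matched for the residual addition, the embedding matrix is mean-pooled in this sketch; that pooling is an assumption of the example, not the paper's design.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def attention_with_residual(H, V, W, b, y):
    """Attention over BiLSTM outputs H (T x d), residual-added to embeddings V.

    W, b: projection producing the hidden representation x = tanh(H W + b).
    y:    randomly initialized context vector scoring each time step."""
    X = np.tanh(H @ W + b)      # hidden representation x (T x d)
    r = softmax(X @ y)          # normalized attention weights (length T)
    f = r @ H                   # weighted sum -> attention output f (length d)
    return f + V.mean(axis=0)   # residual add (V mean-pooled; an assumption)

rng = np.random.default_rng(2)
T, d = 4, 6                     # toy sequence length and hidden size
H = rng.normal(size=(T, d))     # stand-in BiLSTM outputs
V = rng.normal(size=(T, d))     # stand-in embedding vectors
out = attention_with_residual(H, V, rng.normal(size=(d, d)),
                              np.zeros(d), rng.normal(size=d))
```

The softmax guarantees the weights r form a probability distribution over the T time steps, which is exactly the normalization step in the equations above.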

D. CLASSIFICATION RESULTS
The output vector g of the Bi-Att-CNN network is fed into the fully connected network, and then the softmax function is used to get the final classification result, label:

label = argmax(softmax(tanh(g W_g))) (30)

where W_g denotes the weight.

E. TEXT CENTER
Since the number of samples used for training is always large, it is very time-consuming to traverse all training samples and compute their distances from a new input text. In order to reduce the computational complexity, the concept of Text Center is proposed. When there is a new text input, the text classification task can be quickly realized by calculating the Manhattan distance [46] between its embedding vector and the vectors stored in the Text Center. The original and improved calculation methods are shown in Figure 4 and Figure 5, respectively. We analyze the calculation processes of the original method and of the Text Center separately. Suppose the sample library is denoted as X = {X_1, X_2, · · ·, X_k}, where k is the size of the sample library, X_i = {x_i1, x_i2, · · ·, x_in}, n is the length of the feature dimension, and X_label ∈ {1, 2, · · ·, a}, where a denotes the total number of text classes. The word vector corresponding to the new short text to be classified is assumed to be Y = {y_1, y_2, · · ·, y_n}. The process of text classification by the original method is as follows.
It is not difficult to see that the time complexity is O(k × n). In order to reduce the amount of computation and speed up classification, the time complexity can be reduced by establishing the Text Center of the sample library. The specific process is as follows.
First, the Text Centers T_1 to T_a are established, one for each of the a categories of text:

T_i = {t_i1, t_i2, · · ·, t_in} = (1 / k_i) Σ_{X_label = i} X (32)

where k_i denotes the number of samples labeled i, i = 1, 2, · · ·, a. The improved similarity is calculated as follows:

TestT_a = |T_a − Y| = |t_a1 − y_1| + |t_a2 − y_2| + · · · + |t_an − y_n|
y_label = argmin_i (TestT_i), i = 1, 2, . . ., a (33)

It is easy to see that only a distance calculations need to be performed to derive the classification result for a new input text.
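The Text Center procedure above reduces to two small functions: computing one class-mean center per category, and assigning a new vector to the nearest center under Manhattan (L1) distance. The 2-D sample library below is a toy illustration.

```python
import numpy as np

def build_text_centers(X, labels, num_classes):
    """One center per class: the mean of that class's sample vectors (Eq. 32)."""
    return np.stack([X[labels == c].mean(axis=0) for c in range(num_classes)])

def classify(y, centers):
    """Label of the nearest center under Manhattan distance (Eq. 33).

    Only `num_classes` distance computations are needed, versus one per
    training sample in the original method."""
    return int(np.argmin(np.abs(centers - y).sum(axis=1)))

# Toy sample library: two classes of 2-D vectors.
X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [12.0, 12.0]])
labels = np.array([0, 0, 1, 1])
centers = build_text_centers(X, labels, 2)   # centers: [0.5, 0.5] and [11, 11]
```

Classifying `[0.2, 0.1]` compares only two centers instead of all four samples, which is the O(a × n) versus O(k × n) saving described above.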

V. EXPERIMENT

A. DATASET
Two Chinese baseline datasets are selected, and the details are as follows.
simplifyweibo_4_moods 7 : The dataset contains more than 200,000 Weibo 8 records labeled with four types of emotions (joyful, angry, disgusted and depressed), with about 50,000 records per emotion.
online_shopping_10_cats 9 : The dataset contains more than 60,000 reviews for products in 10 categories, with about 30,000 positive and 30,000 negative reviews.

B. BASELINE METHODS
KNN [47]: A commonly used machine learning algorithm for text classification.
Decision Tree (DT) [48]: A commonly used machine learning algorithm for sentiment classification.
SVM [49]: A commonly used machine learning algorithm for text classification.
TextCNN [50]: This model extracts local features by convolution operation and dimensionality reduction by pooling operation, and is a commonly used deep learning algorithm for text classification.
LSTM [19]: This model effectively solves the gradient disappearance problem of RNN by setting multiple gates, and is a commonly used deep learning algorithm for text classification.
BiLSTM [20]: This model takes into account the forward and backward temporal content, and is a commonly used deep learning algorithm for text classification.
HGAT [35]: This model is a heterogeneous GNN-based method for semi-supervised short text classification, leveraging full advantage of limited labeled data and large unlabeled data through information propagation along the graph.
SHINE [36]: This model first models the short text as a hierarchical heterogeneous graph consisting of word-level component graphs which introduce more semantic and syntactic information. Then, the model dynamically learns a short document graph to facilitate effective label propagation among similar short texts.
7 https://zhuanlan.zhihu.com/p/80029681
8 Weibo is one of the largest social platforms in China.
9 https://github.com/SophonPlus/ChineseNlpCorpus/raw/master/datasets/
T-SNE [51]: This model proposes a method to generate a word-level emotion distribution vector for short text classification.
BERT-TER [52]: This model proposes a dual-channel system for multi-class short text emotion recognition, and develops a technique to explain its training and predictions.
The baseline methods are reproduced according to the references, and the optimal values of each baseline method after repeated parameter adjustment are obtained as the experimental results.

C. EVALUATION CRITERIA
Accuracy, Precision, Recall and F1 are used as evaluation criteria, which are calculated as follows:

Accuracy = (TP + TN) / (P + N)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)

where P and N denote the number of forward and reverse samples, respectively, TP and TN denote the number of correctly predicted forward and reverse samples, respectively, and FP and FN denote the number of incorrectly predicted forward and reverse samples, respectively [53].
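The four criteria can be computed directly from the confusion-matrix counts; the counts in the usage line below are made-up numbers for illustration.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and F1 from confusion-matrix counts.

    P = tp + fn forward samples, N = tn + fp reverse samples."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts: 90 true positives, 80 true negatives, 10 FP, 20 FN.
acc, p, r, f1 = metrics(tp=90, tn=80, fp=10, fn=20)
```

Note that F1 is the harmonic mean of Precision and Recall, so it is pulled toward the smaller of the two.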

D. EXPERIMENTAL ENVIRONMENT
The deep learning model based on BiLSTM, Attention and CNN, Bi-Att-CNN, is constructed in this paper. There are three layers of BiLSTM networks, with 256, 256 and 128 neurons, respectively. ReLU is used as the activation function, and dropout of 0.5 is applied at the end of each layer. There are two layers of CNN networks. The first layer contains 128 convolutional kernels of size 3, with a pooling template of size 3. The second layer contains 64 convolutional kernels of size 3, with a pooling template of size 3. Two layers of fully connected networks are used: the number of neurons in the first layer is 256, and that in the second layer is equal to the number of text categories in the dataset. The length of the embedding vector obtained through Word2vec is 100. The learning rate is set to 0.0001, the batch size is set to 64, and the number of epochs is set to 100.
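For reference, the hyperparameters listed above can be gathered into a single configuration fragment; the key names are our own shorthand, not from the paper's code.

```python
# Hyperparameters of Bi-Att-CNN as reported in this section (key names assumed).
CONFIG = {
    "bilstm_layers": [256, 256, 128],   # neurons per BiLSTM layer
    "activation": "relu",
    "dropout": 0.5,                     # applied at the end of each layer
    "cnn_layers": [
        {"filters": 128, "kernel": 3, "pool": 3},
        {"filters": 64, "kernel": 3, "pool": 3},
    ],
    "dense_units": 256,                 # second dense layer = num. categories
    "embedding_dim": 100,               # Word2vec vector length
    "learning_rate": 1e-4,
    "batch_size": 64,
    "epochs": 100,
}
```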

E. EXPERIMENTAL RESULTS
1) COMPARISON WITH BASELINE METHODS
In this section, multiple sets of experiments are conducted to verify the classification effects of WP-STC, WP-STC with Text Center and the baseline methods. To facilitate understanding, WP-STC-Center is used to represent WP-STC with Text Center. For all experiments, the ratio of train set, test set and validation set on the two baseline datasets is set to 7:2:1. In addition, in order to make the classification results more accurate, 5-fold cross-validation is adopted in all experiments. WP-STC, WP-STC-Center and the baseline methods are applied to the two baseline datasets, and the classification results are shown in Tables 1 and 2, respectively. It can be seen that the classification results of WP-STC on the two baseline datasets are better than those of all baseline methods; the Accuracy, Precision, Recall and F1 are improved by at least 0.0538, 0.0533, 0.0625 and 0.0579, respectively. However, although WP-STC outperforms the state-of-the-art baseline method, it takes the second longest time, while WP-STC-Center takes the shortest time and has the best classification Accuracy, Precision, Recall and F1. This indicates that the Text-Center-based classification model proposed in this paper can not only improve the classification Accuracy, Precision, Recall and F1, but also reduce the classification time.

2) COMPARISON OF DIFFERENT DISTANCE METHODS
The Text Center is constructed in this paper for fast classification of new input texts. Considering the simplicity and accuracy of Manhattan distance [46], the category of a new input text is obtained by calculating the Manhattan distance between its embedding vector and the vectors stored in the Text Center. Since there are various methods to calculate the distance between vectors, the following experiments are conducted to demonstrate the superiority of Manhattan distance. Besides Manhattan distance, Euclidean distance [54], Minkowski distance [55], Chebyshev distance [56] and cosine similarity [57] are also selected, and the experimental results are shown in Tables 3 and 4, respectively. From Tables 3 and 4, it can be seen that Manhattan distance achieves the optimal Accuracy, Precision, Recall, F1 and classification time on both baseline datasets.
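The candidate distance measures compared in Tables 3 and 4 can be written out directly; note that Minkowski distance generalizes both Manhattan (p = 1) and Euclidean (p = 2). The two vectors below are arbitrary examples.

```python
import numpy as np

def manhattan(a, b):
    return np.abs(a - b).sum()            # L1: sum of absolute differences

def euclidean(a, b):
    return np.sqrt(((a - b) ** 2).sum())  # L2: straight-line distance

def chebyshev(a, b):
    return np.abs(a - b).max()            # L-inf: largest coordinate gap

def minkowski(a, b, p):
    return (np.abs(a - b) ** p).sum() ** (1.0 / p)  # general Lp

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # parallel to a, so cosine similarity is 1
```

Cosine similarity is a similarity (larger is closer) while the others are distances (smaller is closer), so nearest-center lookup must flip the comparison when using it.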

3) IMPORTANCE ANALYSIS OF BI-ATT-CNN
To further demonstrate the contribution of Bi-Att-CNN to the text classification results, a group of comparative experiments is conducted: keeping the other structures in WP-STC and WP-STC-Center unchanged, Bi-Att-CNN is replaced by the baseline methods respectively. The classification results are shown in Figures 10-13. It can be seen that when Bi-Att-CNN is replaced by a baseline method, the classification results are greatly improved compared with the baseline method alone, but are still lower than those of WP-STC and WP-STC-Center using Bi-Att-CNN. This shows not only that the word-level and Pinyin-level classification models constructed in this paper greatly improve the classification results of Chinese short texts, but also that the deep learning model based on BiLSTM, Attention and CNN constructed in this paper outperforms all the baseline methods.

4) CLASSIFICATION RESULTS FOR IMBALANCED DATASETS
The baseline datasets selected in this paper are all balanced datasets, but in practice the forward short texts are often far more numerous than the reverse short texts. Therefore, the classification results on unbalanced datasets are tested by the following experiments. The ratio of forward to reverse short texts in both datasets is set from 1:1 to 10:1, and WP-STC and WP-STC-Center are used for text classification. The classification results are shown in Tables 5-8. From Tables 5-8, it can be seen that the classification results of both WP-STC and WP-STC-Center gradually decrease as the ratio of forward to reverse short texts increases. Although the classification results decrease, they still remain at high values. Even when the ratio of forward to reverse short texts is 10:1, the classification Accuracy, Precision, Recall and F1 of WP-STC on both datasets are all greater than 0.9, and those of WP-STC-Center on both datasets are all greater than 0.91. Therefore, WP-STC and WP-STC-Center are applicable to both balanced and unbalanced datasets.

5) ANALYSIS OF THE EFFECT OF SOLVING HOMOPHONIC TYPOS PROBLEM
WP-STC can solve the homophonic typos problem that often occurs in Chinese short texts by adding the Pinyin-level feature. The following experiments are carried out to verify this conclusion. 1,000 short text comments are crawled from Weibo 10 as a test set, and one word in each short text is manually and randomly selected and changed into a homophone. WP-STC trained on the simplifyweibo_4_moods dataset with and without Pinyin-level features is used to test the classification effect. It can be seen from the experimental results that the classification Accuracy of WP-STC with the Pinyin-level feature is 0.9227, while that of WP-STC without the Pinyin-level feature is 0.7866. Therefore, it can be concluded that WP-STC can well solve the homophonic typos problem of Chinese short texts.

VI. CONCLUSION
Word-level and Pinyin-level based Chinese short text classification model is constructed in this paper, and a deep learning model based on BiLSTM, Attention and CNN is proposed. In order to reduce the classification time, the concept of Text Center is innovatively proposed. Multiple experiments on two baseline datasets demonstrate that WP-STC with Text Center not only outperforms the state-of-the-art text classification model in terms of classification Accuracy, Precision, Recall and F1, but also greatly reduces the classification time.
However, since different Text Centers are constructed for different datasets, the classification results for data outside a given dataset are not ideal when input into its Text Center. Therefore, our future work will focus on improving the robustness of the model, so that the Text Center does not rely on the construction of a single dataset.
10 Weibo is one of the largest social platforms in China.
VOLUME 10, 2022