Network-Based Bag-of-Words Model for Text Classification

The rapidly developing internet and other media have produced a tremendous amount of text data, making it a challenging and valuable task to find a more effective way to analyze text data by machine. Text representation is the first step for a machine to understand the text, and the commonly used text representation method is the Bag-of-Words (BoW) model. To form the vector representation of a document, the BoW model separately matches and counts each element in the document, neglecting much correlation information among words. In this paper, we propose a network-based bag-of-words model, which collects high-level structural and semantic meaning of the words. Because the structural and semantic information of a network reflects the relationship between nodes, the proposed model can distinguish the relation of words. We apply the proposed model to text classification and compare the performance of the proposed model with different text representation methods on four document datasets. The results show that the proposed method achieves the best performance with high efficiency. Using the Eccentricity property of the network as features can get the highest accuracy. We also investigate the influence of different network structures in the proposed method. Experimental results reveal that, for text classification, the dynamic network is more suitable than the static network and the hybrid network.


I. INTRODUCTION
During the last decades, people have witnessed the impact of the advancement of information technology. The rapid development of social media on the internet has been producing more and more information, in which text information plays a significant role. Meanwhile, a typical scenario is how to classify text data into topic sets by computer so that people can conveniently search the data they want. The text classification task, which assigns the documents to the best-suited topic, has drawn much attention from researchers.
The associate editor coordinating the review of this manuscript and approving it for publication was Dominik Strzalka.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A typical text classification workflow includes text preprocessing, feature selection, feature extraction, similarity computation, and classifier determination [1]. Because people understand human language, they can naturally judge whether a document belongs to a particular topic by reading it, but this process is not practical for a computer. Text classification by computer therefore starts with text representation, which transforms text data into a form that is convenient for computer processing. The commonly used text representation method is the bag-of-words (BoW) model [2]-[4]. This model maps a document into a vector $v = [x_1, x_2, \ldots, x_n]$, where $x_i$ denotes the occurrence of the $i$th word of the basic terms. The basic terms are collected from the datasets and are usually the $n$ highest-frequency words. The value of the occurrence feature can be binary, a term frequency, or TF-IDF. A binary value denotes whether the $i$th word is present in a document and ignores the weight of words. The term frequency is the number of occurrences of each word. Generally, a word with high frequency in a document carries a representative idea of that document, except that some words have high frequency across all documents. TF-IDF (term frequency-inverse document frequency) balances the weight of words that always have a high frequency. It assumes that the importance of a word increases proportionally to its frequency in a document but is offset by its frequency in the whole corpus [5], [6]. Though the BoW model is a useful and straightforward method for text representation, it still has some problems. The value of $x_i$, whether in binary, term frequency, or TF-IDF form, is matched and counted without considering the influence of other words. Text processing may therefore lose much context information because correlated words are not handled. To illustrate this limitation, we provide two simple sentences as a toy example: Sen 1, "a cat ate a small white mouse;" Sen 2, "a small white mouse ate a cat." The basic terms are (cat, eat, small, white, mouse), and in both sentences each basic term occurs once. The BoW model projects Sen 1 and Sen 2 to the same vector, i.e., $v_1 = v_2 = [1, 1, 1, 1, 1]$, though the two sentences have opposite meanings.
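The limitation described above can be checked with a minimal sketch. It assumes the two toy sentences have already been lemmatized and stripped of stop-words, and the function name `bow_vector` is hypothetical. The TF-IDF branch uses a smoothed inverse document frequency of the kind offered by scikit-learn, which is an assumption of this sketch and not necessarily the weighting used in the experiments.

```python
import math
from collections import Counter

# Lemmatized, stop-word-free tokens of the two toy sentences
# (assumed output of preprocessing).
sen1 = ["cat", "eat", "small", "white", "mouse"]
sen2 = ["small", "white", "mouse", "eat", "cat"]
docs = [sen1, sen2]
basic_terms = ["cat", "eat", "small", "white", "mouse"]


def bow_vector(tokens, terms, docs, mode="tf"):
    """Map one tokenized document to a BoW vector over `terms`."""
    counts = Counter(tokens)
    n_docs = len(docs)
    vector = []
    for term in terms:
        tf = counts[term]
        if mode == "binary":
            vector.append(1 if tf > 0 else 0)
        elif mode == "tf":
            vector.append(tf)
        elif mode == "tfidf":
            # Smoothed IDF: an assumption of this sketch, not the paper's formula.
            df = sum(1 for d in docs if term in d)
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            vector.append(tf * idf)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return vector


for mode in ("binary", "tf", "tfidf"):
    v1 = bow_vector(sen1, basic_terms, docs, mode)
    v2 = bow_vector(sen2, basic_terms, docs, mode)
    # The last field is True for every mode: word order is invisible to BoW.
    print(mode, v1, v2, v1 == v2)
```

Whatever value type is chosen, the two vectors coincide, because each value depends only on how often a word occurs and not on its neighbors.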
In this paper, we adopt the network model to overcome the limitation of the BoW model mentioned above. The complex network is now attracting much attention in the study of real-world systems [7] (such as social systems, biological systems, and author systems). The advantage of using the network model to analyze text data is that, through network tools, one can gain insight into several features of texts, e.g., complexity [8] and symmetry [9]. By using the network model, we can take more context information of the text into account. To extend the application of the network model to BoW, we propose a network-based strategy: Attribute of Network Extended to BoW (AEBoW). AEBoW maps documents to vectors in which the value of each word is replaced by the value of a network node attribute. The main difference between AEBoW and BoW is that the value of $x_i$ reflects not only the frequency of the $i$th basic term but also the role the term plays in high-level features of the text, e.g., structural and semantic differences. By using the Degree of the network model, the AEBoW model projects Sen 1 and Sen 2 to $v_1 = [1, 2, 2, 2, 1]$ and $v_2 = [1, 2, 1, 2, 2]$, which captures the difference in meaning between the two sentences (see details in section IV.F).
We summarize the main contributions of this paper as follows:
• We propose the AEBoW model to maintain correlated information among the words in the text.
• We demonstrate the efficiency of the AEBoW model by applying it to text classification. We also verify the performance of the proposed model by comparing it with seven text representation methods and the word embedding model (deep learning method) on four different datasets.
• We present the results of the AEBoW model based on three kinds of network tools: the dynamic network, the static network, and the hybrid network.
• By comparing the performance of the AEBoW model based on different kinds of networks, we observe that the dynamic network is more suitable for the AEBoW model.
This paper is organized as follows. In Section II, we introduce some related works, including the studies on text representation and text complex networks. The proposed AEBoW model is presented in Section III. In Section IV, we give the experimental results on the performance of the proposed model and the comparison with different representation methods. We extend the proposed model to more possible applications in Section V. Finally, we provide the concluding remarks in Section VI.

II. RELATED WORK
Because our work aims to incorporate the network model into BoW, this section briefly reviews related work in these two areas.

A. TEXT REPRESENTATION METHODS
In the field of text data mining, text representation is the keystone of machine understanding. Though the BoW model is simple and commonly used, it suffers from sparsity with high dimensionality and from the loss of relations among words. To improve the BoW model, researchers have proposed methods such as latent semantic analysis (LSA) [10] and the topic model [11]. LSA applies the singular value decomposition (SVD) to transform the original BoW representation into vectors of lower dimension. If the original vectors are frequency-based, the transformed vectors are also approximately linearly related to the term frequency. The topic model, which attaches the probability distribution of words to the topic probability distribution, has a more mature mathematical foundation than LSA, but it is still a frequency-based method and may not be able to capture genuine semantic relations. Unlike the BoW model, word embedding maps words into dense, low-dimensional vectors through machine learning methods [12]-[14], e.g., multilayer neural networks. This kind of method can capture relations of words like "king − man + woman ≈ queen." Nevertheless, the mapped vectors are learned from a large corpus of text data, which makes the training process very time-consuming and highly dependent on the quality of the training corpus. There is also a representation model that combines word embedding and deep learning with BoW [15], using pre-trained word embeddings to obtain fuzzy matching for the BoW model. The matching process is based on the whole set of basic terms, which is sometimes redundant (we explain this in section IV). In this paper, the proposed AEBoW model is a combined method, which adopts the simplicity of the BoW model while considering the inner correlation of words through a network tool.

B. THE NETWORK MODEL FOR TEXT ANALYSIS
In recent years, more and more studies have applied the network model to the analysis of human language. A network is constructed from a series of nodes connected by their interrelations. The network model has been used for different complex systems because of its simplicity and generality. Text networks share properties that have been unveiled in other complex systems, such as the small-world structure and scale-free phenomena [16]-[19]. Moreover, network properties have been shown to be a powerful tool for capturing the features of texts. The out-degrees, clustering coefficient, and deviation of network growth are related to text quality [20], while the community structures and weighted edges of the network can be used to detect key segments [21], [22]. The topological properties of networks help enhance the performance of several tasks (author recognition [9], [23], text similarity [24], text summarization [25], text classification [26], and short text analysis [27], [28]). More recently, an image analysis approach based on the network model has been proposed as a supplement to semantic-based applications, as the mesoscopic structure can reveal the visual "calligraphy" of a document [29]. The network model, when applied to text analysis, can capture subtle interactions among words, which provides richer information than the occurrence feature.

III. THE PROPOSED MODEL
In this section, the AEBoW model is presented. The underlying assumption is that the node properties of a complex network reflect the specific relevancy of each node to the other nodes. The structure of the network is influenced in particular by the choice of linguistic units and of the relations that form edges. These choices determine the topological configuration, which in turn affects the relevancy of nodes [18]. We introduce three different sub-structures of text complex networks: the static semantic network, the co-occurrence network, and the hybrid network. Moreover, based on the same text representation model, we compare the performance of these sub-structures in practical use in section IV.
Before going into details about the proposed model, the general steps to deal with specific problems using this model are summarized as follows:
STEP 1: Lemmatize all the words in the training data, and eliminate the stop-words. Lemmatization transforms words into their original forms, e.g., nouns are converted to their singular forms, and verbs are converted to their infinitive forms. Stop-words are words that occur with high frequency but carry little useful semantic content.
STEP 2: For each text sample in the training data and test data, represent the text as a network (the type of network is a hyperparameter). Then get the value of a particular network property at all nodes. Each node in a network is bound to a word in the corresponding text sample.
STEP 3: Represent each sample as a column vector, in which the value of each element is the network property of the corresponding node obtained in step 2. The value for a node that is not included in a text sample is set to '0' in the corresponding column vector. Note that the full bag of words of a big training set is considerably large, which makes the column vector high-dimensional and sparse. One optional solution is to adopt only the most used words in the datasets, which are called basic terms, to reduce dimensionality.
STEP 4: Train the classifier using the vectors of the training data obtained in step 3 as inputs.
The above steps are presented as a flowchart in figure 3.
The following part of this section will go into detail about the proposed model.
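As a rough illustration of STEP 1 and of the selection of basic terms in STEP 3, the following sketch uses tiny stand-in resources. The names `STOP_WORDS`, `LEMMAS`, `preprocess`, and `select_basic_terms` are hypothetical, and the second training sentence is invented for the example; a real pipeline would rely on a full stop-word list and a proper lemmatizer.

```python
from collections import Counter

# Hypothetical stand-ins for a real stop-word list and a real lemmatizer.
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "while", "it", "towards", "and"}
LEMMAS = {"sitting": "sit", "running": "run", "ran": "run", "cats": "cat", "dogs": "dog"}


def preprocess(text):
    """STEP 1: lowercase, strip punctuation, drop stop-words, lemmatize."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word or word in STOP_WORDS:
            continue
        tokens.append(LEMMAS.get(word, word))
    return tokens


def select_basic_terms(token_lists, n):
    """Keep the n most frequent words of the training data as basic terms."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return [term for term, _ in counts.most_common(n)]


train = [
    "A cat is sitting on the table while a dog is running towards it.",
    "The dogs ran and the cats ran.",  # invented sentence, for illustration only
]
token_lists = [preprocess(t) for t in train]
basic_terms = select_basic_terms(token_lists, n=3)
print(token_lists[0])  # ['cat', 'sit', 'table', 'dog', 'run']
print(basic_terms)
```

Restricting the vocabulary to the `n` most frequent words fixes the dimensionality of every document vector in advance, which is what keeps the representation tractable on large corpora.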

A. REPRESENTING TEXTS AS NETWORKS
Generally, the network model can be described as a graph using graph theory [16]. The undirected network that we adopt to represent text is written as $G = (N, E)$, where $N = \{n_1, n_2, \ldots, n_l\}$ denotes the set of nodes (or vertices) and $E = \{e_1, e_2, \ldots, e_k\}$ denotes the set of links between pairs of nodes. We can use an adjacency matrix $A = (a_{ij})_{l \times l}$ to represent graph $G$, in which the element $a_{ij}$ is defined as follows:
$$ a_{ij} = \begin{cases} 1, & \text{if nodes } n_i \text{ and } n_j \text{ are connected by a link}, \\ 0, & \text{otherwise}. \end{cases} \tag{1} $$
Texts that are appropriately represented as networks are inventories of text units with organized relations among them. For example, when the text units are words, the relations among them may be semantic relations or positional relations in actual language use [18]. Different organized relations may lead to different network structures for the same text. If the text network is modeled with words as nodes and the semantic relations of words as edges, the resulting network, called a static linguistic network, contains relatively fixed relationships between nodes. Another kind of network, named the dynamic linguistic network, is modeled with links that follow the natural occurrence of words in texts, reflecting much information about actual language style. This paper introduces the co-occurrence network as a sub-network of dynamic linguistic networks, and the static semantic network as a sub-network of static linguistic networks. The co-occurrence network describes a text as a network in which the nodes (words) are joined when they co-occur within a given distance [17]. The static semantic network, which describes a text as an inventory of semantic relations, is constructed following the rule that two nodes (words) are connected when they belong to the same class of a dictionary [18]; in this paper, this relationship is captured through WordNet [30].
Based on WordNet, words as nodes are linked when they are in the same word set with a hypernymy, meronymy (including the entailment of verbs), or synonymy relationship. As a combination of the above two kinds of networks, we propose the hybrid network, which contains the relations of both the static semantic network and the co-occurrence network. The hybrid network holds the information of both the dynamic network and the static network, which is expected to make it more helpful in text classification work.
The process of text network construction starts with text preprocessing. Firstly, lemmatize the words [8] (e.g., nouns are converted to their singular forms, and verbs are converted to their infinitive forms). Then, eliminate words with little useful semantic content, which are called stop-words, because in some text processing tasks such as classification these words are unhelpful and sometimes misleading [24]. Figure 1 shows three text networks of the following documents. A more detailed process to construct these network models is described in the Supplementary Information.
(1) This handsome man has a beautiful wife.
(2) He owns a medicine factory and a dog.
(3) This beautiful woman likes her poodle.
(4) The pretty girl is the chief of this company.
Figure 1(a) is a co-occurrence network, and figure 1(b) is a static semantic network. In figure 1(b), "handsome" and "beautiful" are synonyms; "dog" has a hypernymy relationship with "poodle." As a mixed form of both types of networks, figure 1(c) shows a hybrid network with static and dynamic relations, which, to some extent, contain complementary information.
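The three kinds of networks can be sketched for the four example documents as follows. This is a minimal illustration under stated assumptions: the token lists stand for the assumed output of preprocessing, words are linked only inside the same document with a window of one position, and `SEMANTIC_PAIRS` is a hypothetical stand-in for WordNet lookups that contains only the two relations named in the text. All function names are hypothetical.

```python
# Assumed output of preprocessing for the four example documents.
docs = [
    ["handsome", "man", "beautiful", "wife"],
    ["own", "medicine", "factory", "dog"],
    ["beautiful", "woman", "like", "poodle"],
    ["pretty", "girl", "chief", "company"],
]

# Hypothetical stand-in for WordNet: only the two relations named in the text.
SEMANTIC_PAIRS = {("handsome", "beautiful"), ("dog", "poodle")}


def empty_network(docs):
    """One node per distinct word, no edges yet (dict: node -> set of neighbours)."""
    return {word: set() for tokens in docs for word in tokens}


def add_edge(net, u, v):
    if u != v:
        net[u].add(v)
        net[v].add(u)


def cooccurrence_network(docs, window=1):
    """Dynamic network: link words at most `window` positions apart in a document."""
    net = empty_network(docs)
    for tokens in docs:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                add_edge(net, u, v)
    return net


def static_semantic_network(docs, pairs):
    """Static network: link words that share a listed semantic relation."""
    net = empty_network(docs)
    for u, v in pairs:
        if u in net and v in net:
            add_edge(net, u, v)
    return net


def hybrid_network(dynamic, static):
    """Hybrid network: union of the edges of both networks."""
    return {word: dynamic[word] | static[word] for word in dynamic}


dynamic = cooccurrence_network(docs)
static = static_semantic_network(docs, SEMANTIC_PAIRS)
hybrid = hybrid_network(dynamic, static)
isolated = sorted(word for word, nbrs in static.items() if not nbrs)
print(sorted(hybrid["beautiful"]))  # ['handsome', 'man', 'wife', 'woman']
print(isolated)
```

With so few semantic relations, most nodes of the static network have no neighbors at all, which is the situation the next subsection addresses.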

B. TO AVOID ISOLATED NODES IN THE STATIC SEMANTIC NETWORK
The above-mentioned static network of text is an ideal model for the static property: from the view of the formation process, the edges between words have already been pre-defined in the corpus (WordNet). However, in some short texts, this kind of static network contains many isolated nodes, e.g., "factory," "medicine," and "own" in figure 1. Not only are these isolated nodes unhelpful in text analysis, but they also cause computing problems in a network model, e.g., the calculation of some properties of the network model requires that the network be connected. To deal with this problem, we make the following assumptions:
1. The static semantic network is not allowed to contain isolated nodes.
2. If the semantic relevancy in WordNet is not enough to avoid the existence of isolated nodes, the nodes with no edges are randomly connected into a circle, i.e., the isolated nodes form a sub-network in which every node has two neighbors.
3. The isolated nodes are connected to the other nodes following the rule that a node is more likely to link to nodes with more neighbors.
With the above assumptions, we construct the static semantic network used in this paper, as shown in figure 2. Note that assumptions 2 and 3 do not have a complete theoretical justification; they are made only to avoid isolated nodes without losing the unique information of other nodes. Assumption 2 guarantees that the isolated nodes are homogeneous (the nodes in the sub-network of isolated nodes all have two neighbors). Assumption 3 retains the disassortativity of text networks [18], which means that weakly linked nodes are more likely to attach to nodes with a large degree. The nodes connected by edges formed from semantic relevancy are the same as in figure 1(b), and the other nodes, which have no neighbors, are connected to the network following the rules described in assumptions 2 and 3.
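One possible reading of assumptions 2 and 3 is sketched below. The details are assumptions of this sketch and not taken from the paper: the function name `connect_isolated_nodes` is hypothetical, each formerly isolated node receives exactly one edge to a non-isolated node, that node is drawn with probability proportional to its degree before the repair, and a fixed random seed is used for reproducibility.

```python
import random


def connect_isolated_nodes(net, seed=0):
    """Return a copy of `net` (dict: node -> set of neighbours) without isolated nodes."""
    rng = random.Random(seed)
    net = {node: set(nbrs) for node, nbrs in net.items()}
    isolated = sorted(node for node, nbrs in net.items() if not nbrs)
    linked = sorted(node for node, nbrs in net.items() if nbrs)
    # Degrees before the repair, used as attachment weights (assumption 3).
    weights = [len(net[node]) for node in linked]

    # Assumption 2: join the isolated nodes into a randomly ordered cycle.
    rng.shuffle(isolated)
    if len(isolated) >= 3:
        ring = list(zip(isolated, isolated[1:] + isolated[:1]))
    else:
        # Two nodes give a single edge; one node cannot form a cycle.
        ring = list(zip(isolated, isolated[1:]))
    for u, v in ring:
        net[u].add(v)
        net[v].add(u)

    # Assumption 3: attach each formerly isolated node to a linked node,
    # chosen with probability proportional to that node's degree.
    if linked:
        for u in isolated:
            v = rng.choices(linked, weights=weights, k=1)[0]
            net[u].add(v)
            net[v].add(u)
    return net


static = {
    "handsome": {"beautiful"}, "beautiful": {"handsome"},
    "dog": {"poodle"}, "poodle": {"dog"},
    "own": set(), "medicine": set(), "factory": set(),
}
repaired = connect_isolated_nodes(static)
print(all(repaired[node] for node in repaired))  # True
```

In this reading, every formerly isolated node ends with two neighbors from the cycle plus one degree-weighted link, so no node is left isolated as long as the network contains at least two isolated nodes or one linked node.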

C. AEBoW: A REPRESENTATION OF THE INTER-CORRELATION AMONG WORDS
The AEBoW (Attribute of Network Extended to BoW) model is a simple extension of the BoW model, where the elements of the mapped vectors take the value of a particular attribute of the network. The attributes of the network, which are also called properties, are the fundamental quantities used to describe the structural properties (or topology) of a network.
For a document (denoted as $d$, with the corresponding network model $g_d$), the representation by the AEBoW is
$$ z_d = \left[ f_a^{g_d}(w_1), f_a^{g_d}(w_2), \ldots, f_a^{g_d}(w_n) \right] \tag{2} $$
In (2), $f_a^{g_d}$ is the function that returns the value of an individual node for the property $a$ in the network model $g_d$; $w_i$ denotes the $i$th word in the basic terms. If $w_i$ does not occur in $d$, the corresponding element is 0. We show the process of the AEBoW model in figure 3. Firstly, the documents are transformed into networks, and the kind of network (static, dynamic, or hybrid) should be pre-defined. The idea of the BoW model is used to collect the words among all the documents in binary form. Then the extracted properties are placed at the corresponding positions. We also list the procedure of AEBoW in Algorithm 1. An illustration of AEBoW by a toy example is shown in figure 4. The pseudo samples $d_1$, "A cat is sitting on the table while a dog is running towards it," and $d_2$, "A cat and a dog were both sitting on the table, and the dog ran away later," are represented as vectors of the AEBoW model. The vector mapped from $d_1$ is [1, 2, 2, 2, 1, 0, 0] because the Degree of the nodes 'cat,' 'dog,' 'sit,' 'table,' and 'run' is 1, 2, 2, 2, 1, respectively, while 'away' and 'later' do not occur in $d_1$. Similarly, the vector of $d_2$ is projected.
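The mapping described in this subsection can be sketched for the Degree property on a co-occurrence network. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the token lists are the assumed result of preprocessing, and adjacent words are linked with a window of one position.

```python
def cooccurrence_network(tokens):
    """Undirected network linking adjacent words (window of 1, an assumption)."""
    net = {word: set() for word in tokens}
    for u, v in zip(tokens, tokens[1:]):
        if u != v:
            net[u].add(v)
            net[v].add(u)
    return net


def degree(net, node):
    """Number of neighbours of `node`."""
    return len(net[node])


def aebow_vector(tokens, basic_terms, node_property=degree):
    """Map one document to a vector of node-property values over the basic terms."""
    net = cooccurrence_network(tokens)
    # Basic terms that do not occur in the document get the value 0.
    return [node_property(net, w) if w in net else 0 for w in basic_terms]


# Toy sample d1 with the seven basic terms of the illustration.
basic_terms = ["cat", "dog", "sit", "table", "run", "away", "later"]
d1 = ["cat", "sit", "table", "dog", "run"]  # assumed preprocessing of d1
print(aebow_vector(d1, basic_terms))  # [1, 2, 2, 2, 1, 0, 0]

# The two sentences from the introduction, which BoW cannot tell apart.
terms = ["cat", "eat", "small", "white", "mouse"]
sen1 = ["cat", "eat", "small", "white", "mouse"]
sen2 = ["small", "white", "mouse", "eat", "cat"]
print(aebow_vector(sen1, terms))  # [1, 2, 2, 2, 1]
print(aebow_vector(sen2, terms))  # [1, 2, 1, 2, 2]
```

Passing a different function as `node_property` swaps the network attribute without touching the rest of the mapping, which mirrors the role of the property $a$ as an input of the framework.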

Algorithm 1 AEBoW Framework
Inputs: Text corpus T including v documents, network property a, and the network type g.
Outputs: Text vectors Z of T.
1. Collect the basic terms B based on the frequency with which words occur in T.
2. for d in T:
       Construct the network g_d of d according to the network type g.
       For each word w_i in B, set the ith element of the vector of d to the value of property a at the node of w_i in g_d, or to 0 if w_i does not occur in d.
       Add the vector of d to Z.
3. return Z

The development of complex networks has produced various indices for the observed properties of real networks, e.g., node degree, betweenness, and clustering [16]. Though there are various property measures, the experimental results show that not all of them are suitable for text classification. The remainder of this sub-section introduces the network properties that perform well in the experimental results.
Degree: The degree $k_i$ of a node $i$ is the number of its neighbor nodes, i.e., the number of edges incident with it in the complex network. The Degree denotes the connectivity of a node, which shows its ability to integrate with other nodes. For an undirected graph with adjacency matrix $A$, the degree $k_i$ of node $i$ is defined as
$$ k_i = \sum_{j=1}^{N} a_{ij} \tag{3} $$
where $N$ is the size of matrix $A$, i.e., the number of nodes in the complex network. In the matrix $A$, the element $a_{ij}$ is a binary value denoting whether node $i$ and node $j$ are connected by an edge.
Eccentricity: The eccentricity $ec_i$ of a node $i$ is the maximum distance from node $i$ to the other nodes in the complex network. For a network $G$, the eccentricity $ec_i$ is defined as
$$ ec_i = \max_{j \in G} l_{ij} \tag{4} $$
where $l_{ij}$ is the shortest distance from node $i$ to node $j$.
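The two definitions above can be sketched directly on a binary adjacency matrix. The function names are hypothetical, and shortest distances are obtained by breadth-first search, which is one standard choice for unweighted networks. Distances are taken over reachable nodes only, so the functions are also defined on a disconnected network.

```python
from collections import deque


def degrees(adj):
    """Row sums of a binary adjacency matrix: k_i = sum_j a_ij."""
    return [sum(row) for row in adj]


def eccentricities(adj):
    """Largest shortest-path distance from each node to the nodes it can reach."""
    n = len(adj)
    result = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result.append(max(dist.values()))
    return result


# Path network cat - sit - table - dog - run (the toy document d1).
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(degrees(A))         # [1, 2, 2, 2, 1]
print(eccentricities(A))  # [4, 3, 2, 3, 4]
```

On this path network, the Degree separates only the end words from the inner ones, while the Eccentricity also distinguishes the central word from its neighbors, which illustrates why a non-local property can carry more positional information.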
In some cases, the text network may be disconnected, which means that the network contains more than one part with no links between the parts. In this paper, for convenience, we assume that the eccentricity $ec_i$ in a network that is not connected is the maximum distance from node $i$ to its reachable nodes.
PageRank: PageRank was initially designed for ranking web pages based on a directed graph [31]. The idea is that the more pages point to a page, and the more important those pointing pages are, the higher the weight of the pointed-to page. The definition is a voting process, which needs recursive computing. The rank of a given node (page) $i$ is defined to be
$$ r_{k+1}(i) = \sum_{j \in P_i} \frac{r_k(j)}{\mathrm{num}(j)} \tag{5} $$
where $P_i$ is the set of nodes that point to $i$, and $\mathrm{num}(j)$ is the number of links that point out from $j$ in graph $G$. To start, we can arbitrarily assign a ranking to all the nodes of graph $G$, e.g., $r_0(i) = 1/l$, $i \in N$, and successively update the ranks of the nodes by (5). In this paper, we adapt this method to the undirected graph by assuming that each undirected edge $(i, j)$ is equal to two directed edges $i \to j$ and $j \to i$.
Accessibility: This concept measures the number of nodes that a node is able to reach after $h$ steps of self-avoiding random walks [38]. It is mathematically defined as
$$ \alpha_h(i) = \exp\left( -\sum_{j} P^{(h)}(i, j) \log P^{(h)}(i, j) \right) \tag{6} $$
where $P^{(h)}(i, j)$ denotes the probability that node $i$ reaches node $j$ after $h$ steps. The accessibility measures the influence of a node in the complex network, i.e., the nodes playing more critical roles usually can access more neighbors.
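The two remaining properties can be sketched as follows, under stated assumptions. For PageRank, the update is the plain voting rule on an undirected network without a damping factor, since the text does not mention one. For Accessibility, the sketch takes the exponential of the entropy of the positions reached after exactly `h` self-avoiding steps; walks that get stuck earlier are dropped and the remaining probabilities renormalized, which is a choice of this sketch. All function names are hypothetical.

```python
import math


def pagerank(net, iterations=100):
    """Iterate r_{k+1}(i) = sum over neighbours j of r_k(j) / degree(j)."""
    nodes = list(net)
    rank = {i: 1.0 / len(nodes) for i in nodes}  # r_0(i) = 1/l
    for _ in range(iterations):
        # Each undirected edge counts as two directed edges, so the nodes
        # pointing to i are its neighbours and num(j) is the degree of j.
        rank = {i: sum(rank[j] / len(net[j]) for j in net[i]) for i in nodes}
    return rank


def accessibility(net, source, h):
    """exp(entropy) of where a self-avoiding walk from `source` is after h steps."""
    reach = {}  # node -> probability of standing there after exactly h steps

    def walk(node, visited, prob, steps_left):
        if steps_left == 0:
            reach[node] = reach.get(node, 0.0) + prob
            return
        options = [v for v in net[node] if v not in visited]
        for v in options:  # a walk with no options left contributes nothing
            walk(v, visited | {v}, prob / len(options), steps_left - 1)

    walk(source, {source}, 1.0, h)
    total = sum(reach.values())
    if total == 0.0:
        return 0.0
    entropy = -sum((p / total) * math.log(p / total) for p in reach.values())
    return math.exp(entropy)


# Small hypothetical network: a triangle a-b-c with a tail node d attached to c.
net = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
ranks = pagerank(net)
print({node: round(value, 3) for node, value in ranks.items()})
print(round(accessibility(net, "c", 1), 3), round(accessibility(net, "d", 2), 3))
```

On a connected undirected network that is not bipartite, this undamped iteration converges to ranks proportional to the node degrees; on a bipartite network such as a simple path it can oscillate between two states, so the number of iterations then matters. For `h = 1` the accessibility of a node equals its degree, because the walk chooses uniformly among the neighbors.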

IV. EXPERIMENTAL RESULTS
In this section, we apply the AEBoW model in text classification. The proposed method is compared with seven text representation methods on four datasets. Furthermore, we also compare AEBoW with the deep learning algorithm at the end of this section.

A. DATASETS DESCRIPTION
There are four datasets used in the experiments. 20Newsgroups is a collection of nearly 20000 news documents on 20 news topics. This dataset was preprocessed in [32], [33].
WebKB is collected from webpages by the World Wide Knowledge Base project [32]. The training data and testing data of these documents were predesignated in [33], [34]: 2803 documents for training and 1396 documents for testing.
Reuters 52 is extracted from Reuters 21578 by [32]. This dataset includes 52 categories, deleting some categories of Reuters 21578 that contain only a few documents.
Amazon Reviews contains 10000 labeled reviews with 2 categories. The original dataset can be found in [35].
We list the details of these datasets in table 1. Note that all the datasets are preprocessed by removing the stop words and lemmatizing.

B. EXPERIMENTAL SETUP
The classification work is done by KNN measure [36], and the similarity distance is computed through the cosine similarity [24]. Classification accuracy [37] is used to evaluate performance. Firstly, we briefly describe the KNN measure, cosine similarity, and classification accuracy.

KNN:
The KNN (k-Nearest-Neighbors) is a simple and effective non-parametric classification method. The idea of KNN, as shown in figure 5, is that a point in space most likely belongs to the class that occurs most often among its k nearest neighbors, which are found using a particular similarity distance. Because the method has no parameter other than k and is a lazy learning method, it is used in many applications.
Cosine similarity: The cosine similarity computes the similarity distance of two vectors in space. For vectors $v_i = [v_{i1}, v_{i2}, \ldots, v_{il}]$ and $v_j = [v_{j1}, v_{j2}, \ldots, v_{jl}]$, the cosine similarity is defined as
$$ \cos(v_i, v_j) = \frac{\sum_{m=1}^{l} v_{im} v_{jm}}{\sqrt{\sum_{m=1}^{l} v_{im}^2} \, \sqrt{\sum_{m=1}^{l} v_{jm}^2}} \tag{7} $$
Classification Accuracy: The classification accuracy (CA) is defined as
$$ CA = \frac{1}{|T|} \sum_{i \in T} E(p_i, g_i) \tag{8} $$
denoting the accuracy of the predicted labels compared with the labels given in the test data. In (8), $T$ is the document set of the test data and $|T|$ is the number of documents in set $T$. $E(p_i, g_i) = 1$ if $p_i = g_i$ ($p_i$ denotes the predicted label of document $i$, while $g_i$ is the given label in the test data corresponding to $i$), and $E(p_i, g_i) = 0$ otherwise.
Train & Test: The training and testing processes both precompute the cosine similarity of documents using (7). Next, the similarity matrix is used as input for the nearest-neighbor search. After the training step, the test data are all labeled with the trained model. Then the CA is obtained using (8).
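The evaluation procedure above can be sketched with the standard library alone. This is an illustration rather than the experimental code: the function names are hypothetical, the vectors and labels are invented toy data, and similarities are computed on the fly instead of being stored in a precomputed similarity matrix.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train_vectors, train_labels, query, k):
    """Majority label among the k training vectors most similar to `query`."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(train_vectors[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def classification_accuracy(predicted, given):
    """Fraction of test documents whose predicted label equals the given label."""
    return sum(p == g for p, g in zip(predicted, given)) / len(given)


# Invented toy data: four training vectors and two test vectors.
train_vectors = [[1, 2, 2, 0], [1, 2, 1, 0], [0, 0, 1, 2], [0, 1, 0, 2]]
train_labels = ["pets", "pets", "tech", "tech"]
test_vectors = [[1, 1, 2, 0], [0, 0, 2, 3]]
test_labels = ["pets", "tech"]

predicted = [knn_predict(train_vectors, train_labels, q, k=3) for q in test_vectors]
print(predicted)                                        # ['pets', 'tech']
print(classification_accuracy(predicted, test_labels))  # 1.0
```

Because the cosine similarity normalizes by vector length, documents of different lengths remain comparable, which matters when the vector values are node properties rather than raw counts.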
We use the following seven text representation methods to compare the performance of the AEBoW model.
BoW: The BoW model is described in section I.
LSA: Latent Semantic Analysis [10] is a method to reduce the dimensionality based on BoW.
LDA: Latent Dirichlet Allocation [39].
Net-local: A complex network method for text classification [26]. We label this method as Net-local, where "local" denotes the local strategy. We only choose the local strategy because the global strategy performs poorly in the experiments, possibly because the dimensionality of its representation vector is too low for big datasets.
AE: The average embedding for text representation [15]. AE represents a document as the average of all embeddings of words in the document.
FBoW & FBoWC: FBoW is a fuzzy bag-of-words model [15], which conducts a fuzzy matching through word embeddings. This method is a word embedding based method. FBoWC is an extension of FBoW, which matches the clusters of word embeddings instead.
The word embedding based methods, including AE, FBoW, and FBoWC, use data that are not lemmatized because the learning of word embeddings can distinguish all word types. The other methods use the data after lemmatization.
The implementation of all the methods mentioned above is based on Python 3.7 in a Windows 10 environment. The configuration of the machine we used is an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz with 16.0 GB of memory. LSA, LDA, and BoW are based on the sklearn module. The word embeddings of AE are looked up in the pre-trained word embedding dictionary [40], and the words that are not in the word embedding dictionary are discarded. The AEBoW, FBoW, FBoWC, and Net-local methods all run with multiple threads as far as memory permits.
The dimensionality of the representation vectors is set to 3000 for AEBoW, BoW, LSA, LDA, FBoW, and FBoWC. For the Net-local model, because the number of chosen properties is 8, we set the dimensionality of each property to 3000, so the concatenated vector has a dimensionality of 24000. The vector projected from AE has a dimensionality equal to that of the word embedding, which is set to 300 in this paper.

C. PERFORMANCE ANALYSIS
Based on the properties of the complex network, including the Degree (D), Eccentricity (E), PageRank (P), and Accessibility (A), we analyze the performance of the AEBoW model. The classification accuracy (CA) is obtained from the dynamic network (co-occurrence network), static network (static semantic network), and hybrid network, respectively. Then the best result for each property is selected. The obtained CA is shown in table 2. With the same environment, we also get the running time of every method. The results are listed in table 3. Note that the time costs are only counted for the vector projecting process, i.e., the counted period is after the data preprocessing and before the classification.
First, we can observe that the BoW model is the fastest method, though its CA is relatively low. The performance gains of the other methods show that time must be sacrificed for accuracy: all of them considerably increase the time cost of text representation while improving performance. AEBoW gets the highest CA on 20Newsgroups, WebKB, and Reuters 52, while FBoWC gets the highest CA on Amazon Reviews. LSA, LDA, FBoW, and FBoWC are all dimensionality reduction methods. Among these methods, LSA has the lowest time consumption; its accuracy, however, is not competitive. LDA is an iterative approach, and its time cost is counted over 100 iterations. The CA of LDA can outperform that of LSA on specific datasets, though its time consumption is always much higher. The FBoW and FBoWC models get better accuracy than LSA and LDA. Though FBoWC is better than FBoW, the increase in CA does not justify the sharp increase in time cost. The time consumption of FBoWC includes two parts. The first part is the k-means clustering (FBOWC-c in table 3), whose cost grows rapidly with the vocabulary size of the datasets. The second part is the similarity computation between the clusters and the words of a document (mean, max, and min in table 3). Compared with FBoW, this process repeats the similarity calculation thousands of times (once per word embedding cluster) in small batches, which increases the time cost. Note that the time costs of FBoWC are counted with four threads (other methods use eight threads) to avoid running out of memory.
AE, FBoW, and FBoWC are word embedding based methods, which all use pre-trained word embeddings during word matching. AE is a simple application of word embedding, which represents a document by simply averaging the embeddings of the words in the document. This simple operation loses much high-level information. The results show that, in some cases, the CA of AE is worse than that of BoW.
AEBoW and Net-local are network-based methods. The main difference between them is that AEBoW uses an individual property as features and uses the BoW idea to collect them. In contrast, Net-local uses different properties that reflect the symmetry of the network and concatenates them as features. Net-local can be seen as a particular case of AEBoW in which several properties are chosen and the top-k features are concatenated. However, using too many local properties does not always improve text classification performance and, on the contrary, reduces efficiency. The results show that the CA of AEBoW is better than that of Net-local, and the time costs of AEBoW are much smaller.
AEBoW, FBoW, and FBoWC are based on the BoW model. The difference is that AEBoW is still a sparse representation like BoW, while FBoW and FBoWC overcome this limitation by fuzzy matching. However, the results show that the dense representation does not always reflect the right discriminative information for text classification. Moreover, the dense representation only shows its advantage when it is used for dimensionality reduction. If the dimensionality is set equal in the experiments, the sparse representation can reduce memory consumption by being stored in sparse form (in Python 3.7, the scipy.sparse module can be used). In contrast, the dense representation cannot use such tools to reduce memory needs. We also tried a lower dimensionality for FBoW and FBoWC, but this reduces performance. We can also observe that FBoW and FBoWC need more time to process data than AEBoW. We ascribe this to the difference in matching approach. The properties of the network model are calculated by matching only the words contained in a document, and some properties, e.g., Degree and Accessibility, only need to match the neighbors. On the contrary, FBoW needs to calculate the similarity between each word in a document and all basic terms. Because the basic terms always contain many more words than a document, the time costs are much higher than those of AEBoW.

D. COMPARISON AMONG THREE KINDS OF NETWORKS
Next, we compare the performance of AEBoW on the three kinds of networks. Figure 6 lists the CA on the four datasets. First, figure 6 shows that the Eccentricity property performs well on all the datasets. It is the only property that produces a high CA on all three kinds of networks; the other properties all perform poorly on the static network. We can also observe that the hybrid network performs slightly better on the WebKB and Amazon Reviews datasets, which indicates that combining the relations of the static network and the dynamic network can improve the performance of AEBoW in some instances. However, there is no such thing as a free lunch: the hybrid network does not always perform the best.
As table 3 shows, AEBoW on the dynamic network has the best efficiency compared with the hybrid network and the static network. At the same time, the dynamic network produces competitive results on all four datasets. Therefore, the dynamic network is more suitable than the static network and the hybrid network for the AEBoW model. Figure 7 shows the CA of every method over the searching range of k. The results are obtained from the WebKB dataset.

E. THE INFLUENCE OF K OF KNN IN TEXT CLASSIFICATION
As shown in figure 7, each method reaches its best accuracy at a different k, which is why we adopt a searching range of k to select the best results. Most methods reach their best performance when k is around 15, while AEBoW is an exception: the Eccentricity gets the highest CA at k = 21.
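Selecting k over a searching range can be sketched as follows. This is a minimal illustration, not the experimental code: the Euclidean distance, the two-cluster toy data, the odd-valued range of k, and the helper names `knn_predict`, `accuracy`, and `best_k` are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)

def best_k(train_x, train_y, val_x, val_y, k_range):
    """Return (k, accuracy) with the highest accuracy; ties go to the smaller k."""
    scores = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in k_range}
    k = max(scores, key=lambda kk: (scores[kk], -kk))
    return k, scores[k]

# Hypothetical toy data: two Gaussian clusters standing in for two topics.
random.seed(0)

def blob(center, n):
    return [[random.gauss(c, 1.0) for c in center] for _ in range(n)]

train_x = blob([0.0, 0.0], 40) + blob([3.0, 3.0], 40)
train_y = ["a"] * 40 + ["b"] * 40
val_x = blob([0.0, 0.0], 20) + blob([3.0, 3.0], 20)
val_y = ["a"] * 20 + ["b"] * 20

k_range = range(1, 31, 2)  # assumed searching range of k
k, acc = best_k(train_x, train_y, val_x, val_y, k_range)
print(f"best k = {k}, accuracy = {acc:.3f}")
```

Because the best k depends on the representation, each method has to be searched separately before the methods are compared.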
From figure 7, we can also observe that the results of LSA and BoW follow nearly the same trend, which reflects that LSA is a linear mapping of BoW combined with dimensionality reduction. Among the four dimensionality reduction methods (LSA, LDA, FBOW, FBOWC), only FBOW and FBOWC achieve a satisfactory improvement over BoW.
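The relation between LSA and BoW can be checked on a small case. The sketch below assumes LSA is computed by a truncated singular value decomposition of the BoW matrix; the count matrix and the choice of two latent dimensions are hypothetical.

```python
import numpy as np

# Hypothetical BoW count matrix: 6 documents (rows) x 5 basic terms (columns).
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 2, 1, 1],
    [1, 0, 0, 1, 2],
    [0, 1, 1, 0, 2],
], dtype=float)

k = 2  # number of latent dimensions kept (assumed for this toy case)

# Truncated SVD: X ~= U_k S_k V_k^T. The LSA document vectors are X @ V_k.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T   # shape (n_terms, k): one fixed linear map for all documents
lsa = X @ V_k    # shape (n_docs, k)

# The projection equals U_k scaled by the singular values.
print(np.allclose(lsa, U[:, :k] * S[:k]))

# Linearity: projecting a sum of BoW vectors equals the sum of the projections.
a, b = X[0], X[3]
print(np.allclose((a + b) @ V_k, a @ V_k + b @ V_k))
```

Since every LSA vector is the BoW vector multiplied by the same matrix, the two representations can be expected to behave similarly under a distance-based classifier such as KNN.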
The accuracy of the Eccentricity remains the best across the three kinds of text networks, followed by PageRank. The results show that some features in AEBoW perform steadily regardless of the kind of network, and that word correlations derived from the network model carry more information for text classification than those obtained without it. Degree and Accessibility are both local structural properties, and their CA is relatively low, indicating that extracting the high-level information of words requires a non-local strategy.

F. HOW DOES AEBOW WORK
The experimental results show that the AEBoW model can outperform the BoW model in specific tasks. In this section, we discuss part of the reason why the properties of the complex network perform better.
In a complex network, nodes affect each other through the links between them. Even two nodes that are not directly linked can influence each other through a path. Adding or deleting an edge affects a series of nodes. This characteristic gives the complex network the ability to capture changes in text structure and semantics in various ways, which makes it suitable for processing text data. To further explain this characteristic without complex mathematical notation, we use Sen 1 and Sen 2 mentioned in section I as a toy example. Different vector forms of these two sentences are listed in table 4. With the BoW model, the two sentences are represented by the same vector because they contain the same words from the basic terms, even though the two sentences have opposite meanings. With the AEBoW model, all four properties of the complex network capture the difference between the two sentences.
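The same effect can be reproduced with a minimal sketch. The two sentences below are hypothetical stand-ins, not Sen 1 and Sen 2 from section I. The sketch also assumes that an edge links each pair of adjacent words and that each vocabulary slot holds the Eccentricity of the corresponding node; the actual AEBoW construction may differ in its details.

```python
from collections import deque

def cooccurrence_graph(tokens):
    """Undirected graph linking each pair of adjacent tokens (assumed window of 2)."""
    graph = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def eccentricity(graph, source):
    """Largest shortest-path distance from source to any reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def bow_vector(tokens, vocab):
    """Plain term counts over the vocabulary."""
    return [tokens.count(term) for term in vocab]

def eccentricity_vector(tokens, vocab):
    """One Eccentricity value per vocabulary term; 0 if the term is absent."""
    graph = cooccurrence_graph(tokens)
    return [eccentricity(graph, term) if term in graph else 0 for term in vocab]

# Hypothetical sentences with the same words but opposite meanings.
sen_a = "i like tea not coffee".split()
sen_b = "i like coffee not tea".split()
vocab = sorted(set(sen_a) | set(sen_b))  # ['coffee', 'i', 'like', 'not', 'tea']

print(bow_vector(sen_a, vocab))           # [1, 1, 1, 1, 1]
print(bow_vector(sen_b, vocab))           # [1, 1, 1, 1, 1]
print(eccentricity_vector(sen_a, vocab))  # [4, 4, 3, 3, 2]
print(eccentricity_vector(sen_b, vocab))  # [2, 4, 3, 3, 4]
```

Swapping two words changes which nodes sit at the ends of the path, so the Eccentricity values of "coffee" and "tea" change even though the word counts do not.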

G. COMPARISON WITH DEEP LEARNING ALGORITHM
The above experiments are all based on KNN. Next, we compare the performance of AEBoW with the word embedding model using deep learning algorithms, which are deployed on TensorFlow 2.0. In this experiment, both AEBoW and the word embedding model are combined with a deep learning algorithm. Note that we use different deep learning algorithms for the two models because the word embedding model has a corresponding deep learning algorithm [12] that is not suitable for AEBoW. The structures of the deep learning algorithms for the two models are listed in table 5. For AEBoW, the inputs are the vectors, followed by three dense layers. Dense layer 1 and dense layer 2 use the Rectified Linear Unit (relu) activation. For the word embedding model, the inputs are the documents after labeling and padding (the words are mapped to symbols and all documents are padded to the same length). The embedding layer maps each word to a vector of dimensionality 300. The outputs of the embedding layer are convolved by a 1D convolution layer with a filter size of 5. The convolution layer has 300 filters with relu activation, and the max-pooling layer downsamples its outputs. After downsampling and flattening, the dense layer is used for classification. Note that the dense layers (except the output layer) and the convolution layer all use biases. Dropout is applied before the output layer with a rate of 0.5.
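To give a sense of the size of the convolution layer described above, the following sketch counts its trainable parameters from the stated settings (embedding dimensionality 300, 300 filters, filter size 5, biases enabled). The padded document length of 100, the stride of 1, and the absence of padding in the convolution are assumptions, since the text does not specify them.

```python
def conv1d_params(in_channels, filters, kernel_size, use_bias=True):
    """Trainable parameters of a 1D convolution layer."""
    weights = kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

def conv1d_output_length(seq_len, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (seq_len - kernel_size) // stride + 1

embedding_dim = 300  # from the text
filters = 300        # from the text
kernel_size = 5      # from the text
seq_len = 100        # hypothetical padded document length

print(conv1d_params(embedding_dim, filters, kernel_size))  # 450300
print(conv1d_output_length(seq_len, kernel_size))          # 96
```

The embedding layer adds a further 300 parameters per vocabulary word, which are trained together with the convolution weights.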
We use the Adam optimization algorithm to update the parameters with a mini-batch size of 32 and a learning rate of 1e-03. The number of training epochs is set to 5, and 10% of the training data are selected for cross-validation. The results are listed in table 6. The AEBoW model is based on the dynamic network.
The main time costs differ between AEBoW and the word embedding model. For AEBoW, the vectors are projected before the deep learning model is trained, whereas for the word embedding model the two steps are carried out simultaneously. Thus the training for AEBoW is much faster than that for the word embedding model. The results further confirm that AEBoW can capture more information from text data.

V. DISCUSSION
From the experimental results, it can be observed that the AEBoW model achieves good results with high efficiency in text classification. We believe that the application of AEBoW is not limited to text classification. Possible application scenarios of this model include text interpretation, text clustering, text summarization, and identification of authorship. Next, we briefly describe each application and give some ideas for text interpretation.
Text Interpretation is the process of extracting high-level semantics from the raw text data. The high-level semantics are the structured indexes for the raw text data.
Text Clustering is an unsupervised method of machine learning to cluster the documents with high similarity into categories. The AEBoW outputs can be directly used for clustering.
Text Summarization aims to capture the key phrases of a document. A key phrase is always a group of words from the original document with complete syntax and content.
Identification of Authorship. Each author has a personal style, which is reflected in the structure, words, or tone of the work. This high-level information can be captured through the AEBoW model.
The following are some ideas about applying AEBoW on text interpretation.
Text interpretation consists of processing the unstructured text and extracting the high-level semantics. In the first step, the computer interprets a free text correctly into its surface-level form. The free text is analyzed through its syntactic structure and lexical meaning, and then the subsequent computation takes place. With AEBoW, the surface-level form of the raw text data can be preprocessed with a network tool, and AEBoW is then applied to obtain extra structural and semantic information. In the second step, a series of indexes and complicated relations are derived from the surface-level information. The network model may further explain the patterns of the surface-level information, and the AEBoW model produces the inputs of the instance object model, which maps the patterns from the surface-level meaning into high-level instance assertions.
It should be noted that the AEBoW model is only a complement to existing methods of text interpretation because AEBoW has limitations in grammar parsing and abduction. AEBoW does not capture grammar or the proper meaning of words. Therefore, a grammar parser and background knowledge need to be introduced.
The AEBoW model is a powerful network-based tool for text analysis, which can potentially be applied to different application scenarios. The introduction of the network model enables AEBoW to capture the high-level structural and semantic meaning of the text. Applications of AEBoW may also need to be complemented by other state-of-the-art methods.

VI. CONCLUSION
In this paper, we have proposed the AEBoW model based on the complex network to represent text. The AEBoW is an improvement on the BoW model, taking the correlation of words reflected in the text network into consideration. The structure of a text network varies when the different relations of words that form an edge are considered. We have introduced the dynamic network (co-occurrence network) and static network (static semantic network). We have also proposed the hybrid network that contains relations in both the dynamic network and the static network. We have compared the performance of AEBoW with seven text representation methods in text classification.
Experimental results revealed that the proposed AEBoW could achieve the best performance with high efficiency. The best feature in AEBoW was the Eccentricity, which is a shortest-path-based property of the text network. Further analysis showed that, with KNN as the classifier, most methods reach their best performance when k is around 15. For the Eccentricity of AEBoW, the best accuracy occurs at k = 21. The comparison of the three kinds of networks showed that the dynamic network is more suitable for text classification.
We have also investigated the performance of AEBoW with a deep learning algorithm. By comparing it with the word embedding model, we confirmed the high efficiency and excellent performance of AEBoW.
The application of AEBoW is not limited to text classification. Future investigations will be concentrated on using the AEBoW in more text analysis, e.g., text interpretation, text clustering, text summarization and identification of authorship.
SHUANG GU was born in Yichun, Heilongjiang, in 1994. She is currently pursuing the Ph.D. degree with the Key Laboratory of Rail Traffic Control and Safety in Beijing Jiaotong University. Her main research interest is complex networks.
LIU YANG was born in Guizhou, in 1990. She received the B.Sc. degree in information and computing science and the master's degree in logistics engineering from Guizhou University. She is currently pursuing the Ph.D. degree with the Key Laboratory of Rail Traffic Control and Safety in Beijing Jiaotong University. Her main research interests are complex networks and risk analysis in transportation.