Topological graph convolutional network based on complex network characteristics

Graph convolutional neural networks have received considerable attention in tasks dealing with graph data, aggregating information from neighboring nodes and passing it along the graph. Many recent studies have examined the impact of topological features on node classification, either by altering aggregation based on node degree or by incorporating topological data analysis into graph convolutional networks; however, graph data itself exhibits many of the topological characteristics studied in complex network theory. In most circumstances, a graph's topological characteristics reveal the similarity of its nodes and facilitate the node classification task. This paper proposes a topological structure feature extraction method based on complex network characteristics, which obtains deeper topological features from the graph structure and, combined with the node features, extracts the information most important for classification from both feature spaces. Experimental evidence establishes that the topological structure obtained by our method can serve as input to a GCN, and good results can be achieved on the classification task even without any external information on the nodes. On graph datasets with connected topologies, the method exhibits a very large increase in accuracy and macro F1-score compared to state-of-the-art baseline models after mixing in the node features.


I. INTRODUCTION
Machine learning and deep learning have evolved rapidly and made great strides, especially in processing structured data such as speech, images, and natural language. However, the vast majority of things in the real world are better represented by a graph than by a sequence or a grid (e.g., social networks, knowledge graphs, and complex file systems) [1]. Graph neural networks were created to better handle such unstructured data.
Graph neural networks have made significant progress in recent years and have been used to perform various tasks on graph data; existing studies have demonstrated their great potential in applications such as node classification [2] [3], graph classification [4], link prediction [5] [6], and recommendation [7] [8]. This paper focuses on graph convolutional neural networks, in which feature information is propagated through the network topology by aggregating neighboring nodes and passing messages, and the learned node embeddings are used in classification tasks [9]. For graph-structured data, the features of each node are important, as are the topological features of the entire network structure; both are key to the node classification task. Graph data itself possesses many topological features, and how to make good use of them in classification tasks is a problem worthy of continued research. Starting from the topological characteristics of the network itself, this paper proposes T-GCN, which obtains node embeddings from the node feature graph, the original graph, and their combination. Technically, in order to better extract the topological features of the graph structure, this paper obtains a topological feature graph from the clustering coefficient. Meanwhile, in order to fully utilize the information in the feature space, this paper uses the k-nearest-neighbor graph formed by the node features as the node feature graph. The node features are propagated over the topological feature graph, the node feature graph, and the original graph, and three specific node embeddings are extracted in the three feature spaces via specific convolutions.
Taking into account the common relationship among the two feature graphs and the original graph, this paper utilizes a convolution with shared parameters to jointly extract the common embeddings in the three feature spaces, and then uses an attention mechanism to learn the weights of the different embeddings and fuse them adaptively, extracting the most useful information.

II. RELATED WORK
Over the past few years, researchers have increasingly begun to extend deep learning to graph-structured data. One of the results is the graph neural network. Existing graph neural networks are mainly divided into two categories: spectral-domain-based graph convolution networks [12] [13] and spatial-based graph convolution networks [3]. Spectral-domain graph convolution is based on spectral graph theory and the convolution theorem, converting data from the spatial domain to the spectral domain for processing, which gives it a more solid theoretical foundation. Bruna et al. [14] proposed a graph convolution using a learnable diagonal matrix in place of a convolution kernel in the spectral domain. Since the eigenvalue decomposition of the Laplacian matrix must be computed in the process, this is a very time-consuming operation, and the parameter complexity of the model is large. For this reason, Michaël Defferrard et al. [15] used Chebyshev polynomials instead of convolution kernels in the spectral domain, simplifying the convolution operation. Thomas N. Kipf et al. [3] proposed the graph convolutional network (GCN), which can be considered a further simplification of ChebNet, using only first-order Chebyshev polynomials and a single parameter per convolution kernel. Although GCN is a simplified spectral-domain approach, it can also be seen as a spatial-domain approach: from the node's point of view, GCN aggregates information from its neighbors when updating the node's representation. There are also many recent approaches that design aggregation and message passing in the spatial domain: Yotam Hechtlinger et al. [16] use a convolution that fixes the number of neighbor nodes and orders the fixed nodes; William L. Hamilton et al. [12] use a convolution based on sampling and information aggregation; Petar Velickovic et al. [17] use an attention mechanism for differentiated aggregation of neighborhood nodes.
The aggregation process in the conventional graph convolution approaches above is a topology learning process in the spectral and spatial domains, but it does not directly take into account the topological characteristics already present in the graph data itself.
In recent studies, a number of researchers have recognized the importance of topological features and incorporated them into their models. DEMO-Net [18] proposes a degree-specific multi-task graph convolution function to learn node representations. AM-GCN [19] considers node features and topology and adaptively fuses both, but it acquires topological features directly from the original graph. Max Horn et al. [20] used persistent homology to integrate global topological information of graphs and compute topological features of structured datasets, designing a TOGL layer that can be easily integrated into any type of GNN. Z-GCNETs [21] use zigzag persistence as a topological feature, combine it along the time dimension, and propose a new topological summary, the zigzag persistence image. Recent work has thus considered topological features in several ways; in fact, several topological features already exist in graph data itself. Can deeper topological features be obtained if the topological features of the graph data itself are exploited and combined into the graph convolutional network? Based on this question, we look at the topological characteristics of complex networks and combine them with node features.

III. THE PROPOSED MODEL
In this section, the key issue is which topological features and node features to consider in the graph convolutional network. The graph itself has many topological features, such as the clustering coefficient and betweenness centrality. The topological feature extraction approach proposed in this paper calculates the clustering coefficient of each node to generate the topological feature graph, uses the k-nearest-neighbor graph as the node feature graph, and adaptively fuses all the features. We first introduce the key concepts and definitions.
A graph is defined as G = (A, X), where A ∈ R^{n×n} is the adjacency matrix, n is the number of nodes, X ∈ R^{n×d} is the node feature matrix, and d is the dimension of the node features. Specifically, A_{ij} = 1 means there exists an edge between nodes i and j; otherwise A_{ij} = 0. Each node belongs to one of C classes. Figure 1 depicts the framework presented in this paper. The main aim is to extract the topological feature information most relevant to the node labels and to make optimal use of the node features. We design a graph-topology feature extraction method that mines the original graph structure for features more relevant to the labels. With two specific convolution modules, X is propagated over the topological feature graph and the node feature graph to learn two specific node embeddings, Z_T and Z_F, respectively; to retain the features of the original graph structure, feature learning is also performed with a specific convolution module to obtain the node embedding Z_P. To capture the features common to the topological feature graph, the node feature graph, and the original graph, we use a common convolution module with shared weight parameters to jointly learn the common node embeddings Z_CT, Z_CF, and Z_CP. Finally, considering the relevance of the different features, we use an attention mechanism to adaptively learn fusion weights for these node embeddings and obtain the embedding Z most relevant to the final classification task.

A. TOPOLOGY GRAPH CONVOLUTION MODULE
First, in order to obtain the topological feature graph of the graph structure, we design a method that derives the topological feature graph G_t = (A_t, X) from the original graph G_p = (A_p, X), where X is the node feature matrix and A_t is the adjacency matrix of the topological feature graph. Specifically, we design the topology capture function using the clustering coefficient of each node. For the original graph structure, the clustering coefficient of node i is

C_i = 2 |{e_jk : v_j, v_k ∈ N_i}| / (k_i (k_i − 1)),    (1)

where v_j and v_k denote neighboring nodes of node i, e_jk denotes an edge connecting v_j and v_k, and k_i denotes the number of edges connected to v_i.
The topological feature adjacency matrix A_t is obtained by multiplying the clustering coefficients of the two nodes joined by each edge:

(A_t)_{ij} = (A_p)_{ij} · C_i · C_j.    (2)

If the clustering coefficient of the current node is 0, we substitute the clustering coefficient of one of its multi-order neighbors. If the clustering coefficients of the node's multi-order neighbors are also 0, we use the Z-score normalization of the node's degree in place of the zero clustering coefficient.
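The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it computes Eq. (1) via the diagonal of A³ and weights edges as in Eq. (2); the paper's fallbacks for zero clustering coefficients (multi-order neighbors, z-scored degree) are omitted for brevity.

```python
import numpy as np

def clustering_coefficients(A):
    """Local clustering coefficient of each node from adjacency matrix A.

    C_i = (# triangles through i) / (k_i * (k_i - 1) / 2); C_i = 0 when k_i < 2.
    (A^3)_ii counts closed length-3 walks, i.e. 2 * triangles through node i.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2.0       # triangles through each node
    denom = deg * (deg - 1) / 2.0              # possible edges among neighbors
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, triangles / denom, 0.0)

def topological_adjacency(A):
    """Sketch of A_t: weight each existing edge (i, j) by C_i * C_j.
    The paper's zero-coefficient fallbacks are intentionally omitted here."""
    C = clustering_coefficients(A)
    return A * np.outer(C, C)

# Toy graph: a triangle (0-1-2) plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
C = clustering_coefficients(A)   # nodes 0 and 1: 1.0; node 2: 1/3; node 3: 0.0
A_t = topological_adjacency(A)
```

On this toy graph the pendant node has coefficient 0, which is exactly the case the paper's multi-order-neighbor fallback is designed to handle.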
Then, we take G_t = (A_t, X) as the input feature space, and the output Z_t^{(l)} of the l-th layer can be expressed as

Z_t^{(l)} = ReLU(D̃_t^{−1/2} Ã_t D̃_t^{−1/2} Z_t^{(l−1)} W_t^{(l)}),    (3)

where W_t^{(l)} is the weight matrix of the l-th layer in the GCN, ReLU is the ReLU activation function, Z_t^{(0)} = X, Ã_t = A_t + I_t, and D̃_t is the diagonal degree matrix of Ã_t. In this way we obtain the embedding Z_T of the topological feature space by convolution.
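The propagation rule of Eq. (3) is the standard GCN layer of Kipf et al. [3]; a minimal NumPy sketch (toy adjacency, identity weights, all names hypothetical) is:

```python
import numpy as np

def gcn_layer(A, Z, W):
    """One GCN propagation step: ReLU(D~^{-1/2} (A + I) D~^{-1/2} Z W)."""
    A_tilde = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_tilde @ d_inv_sqrt @ Z @ W, 0.0)

# Toy example: 3-node path graph, 2-d features, identity weight matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Z1 = gcn_layer(A, X, np.eye(2))   # smoothed, ReLU-activated features
```

Each of the specific convolution modules in this paper applies this same rule, differing only in which adjacency matrix (A_t, A_f, or A_p) it propagates over.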

B. NODE FEATURE GRAPH CONVOLUTION MODULE
In this module, in order to construct the node feature graph and capture the hidden features in the node feature space, we first construct the similarity matrix S ∈ R^{n×n} over the n nodes using cosine similarity, and then construct the node feature graph G_f = (A_f, X) with a predetermined number k of nearest neighbors, where X is the node feature matrix and A_f is the adjacency matrix of the KNN graph.
Cosine similarity is measured by the cosine of the angle between two feature vectors:

S_{jk} = (x_j · x_k) / (|x_j| |x_k|),    (4)

where x_j and x_k are the feature vectors of nodes j and k.
Next, we take G_f = (A_f, X) as the input feature space, and the output Z_f^{(l)} of the l-th layer can be expressed as

Z_f^{(l)} = ReLU(D̃_f^{−1/2} Ã_f D̃_f^{−1/2} Z_f^{(l−1)} W_f^{(l)}),    (5)

where Ã_f = A_f + I_f is the node feature graph adjacency matrix with self-loops, Z_f^{(0)} = X, and the other parameters are the same as in Eq. 3. In this way we obtain the embedding Z_F of the node feature space.
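Building A_f from Eq. (4) amounts to taking each node's top-k most cosine-similar nodes and symmetrizing the result. A hedged NumPy sketch (the function name and symmetrization-by-maximum choice are ours, not the paper's):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor adjacency from cosine similarity.

    Assumes no node has an all-zero feature vector.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                            # pairwise cosine similarity, Eq. (4)
    np.fill_diagonal(S, -np.inf)             # exclude self-matches
    n = X.shape[0]
    A = np.zeros((n, n))
    idx = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar nodes
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                # symmetrize: keep edge if either side picked it

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))              # 6 toy nodes with 4-d features
A_f = knn_graph(X, k=2)
```

Symmetrizing with the maximum keeps an edge whenever either endpoint selects the other; taking the minimum instead (mutual-KNN) would give a sparser graph.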

C. ORIGINAL GRAPH CONVOLUTION MODULE
This module, to keep the basic features of the original graph, takes the original graph G_p = (A_p, X) as input, where A_p is the adjacency matrix of the original graph, and obtains the embedding Z_P of the original graph's feature space by the same form of convolution as above.

D. COMMON GRAPH CONVOLUTION MODULE
In this module, we observe that the feature graphs obtained above are not completely unrelated. The node classification task relies on topological features as well as node features, and their common features are equally important. Therefore, we adopt a parameter-sharing approach to design a common GCN module that extracts the features common to all three graph structures.
First, for the topological feature graph G_t = (A_t, X), we extract the node embedding Z_ct^{(l)} by GCN:

Z_ct^{(l)} = ReLU(D̃_t^{−1/2} Ã_t D̃_t^{−1/2} Z_ct^{(l−1)} W_c^{(l)}),    (6)

where W_c^{(l)} is the common weight matrix of the l-th layer and Z_ct^{(0)} = X. For the node feature graph G_f = (A_f, X), the node embedding Z_cf^{(l)} is likewise extracted by GCN:

Z_cf^{(l)} = ReLU(D̃_f^{−1/2} Ã_f D̃_f^{−1/2} Z_cf^{(l−1)} W_c^{(l)}),    (7)

where W_c^{(l)} is the same common weight matrix and Z_cf^{(0)} = X. With the same treatment, we extract the node embedding Z_cp^{(l)} from the original graph G_p = (A_p, X) by GCN. By sharing the weights we obtain the features common to the feature spaces of the three graph structures. From the three input graphs we thus get three output node embeddings, Z_CT, Z_CF, and Z_CP, from which we obtain the common embedding Z_C.
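The key point of Eqs. (6)–(7) is that one weight matrix W_c is reused across all three graphs. A minimal sketch under our own assumptions (random toy graphs, a single layer, and a simple mean to combine Z_CT, Z_CF, Z_CP into Z_C, since the paper does not pin down the exact combination):

```python
import numpy as np

def gcn_layer(A, Z, W):
    """ReLU(D~^{-1/2} (A + I) D~^{-1/2} Z W), as in Eq. (3)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_tilde @ d_inv_sqrt @ Z @ W, 0.0)

def random_adj(rng, n):
    """Random symmetric 0/1 adjacency with empty diagonal (toy data)."""
    M = np.triu(rng.integers(0, 2, (n, n)).astype(float), 1)
    return M + M.T

rng = np.random.default_rng(0)
n, d, h = 6, 4, 3
X = rng.standard_normal((n, d))
A_t, A_f, A_p = (random_adj(rng, n) for _ in range(3))

W_c = rng.standard_normal((d, h))   # ONE weight matrix shared by all three views
Z_ct = gcn_layer(A_t, X, W_c)       # Eq. (6)
Z_cf = gcn_layer(A_f, X, W_c)       # Eq. (7)
Z_cp = gcn_layer(A_p, X, W_c)       # same treatment on the original graph
Z_c = (Z_ct + Z_cf + Z_cp) / 3.0    # hypothetical mean combination
```

Because the same W_c filters all three graphs, gradients from every view update it jointly, which is what pushes it toward features the views share.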

E. ATTENTION MECHANISM
From the modules above we obtain four node embeddings: Z_T, Z_F, Z_P, and Z_C. To integrate them effectively, we use an attention mechanism that adaptively learns the corresponding weights α_t, α_f, α_p, α_c ∈ R^{n×1}, denoting the attention weights of the n nodes in Z_T, Z_F, Z_P, and Z_C, respectively.
Here we focus on node i. Let Z_T^i ∈ R^{1×h} denote the embedding of node i in Z_T. We apply a nonlinear transformation and then a shared attention vector a ∈ R^{h×1} to get the attention value w_T^i:

w_T^i = a^T · tanh(W · (Z_T^i)^T + b),    (8)
where W ∈ R^{h×h} is the weight matrix and b ∈ R^{h×1} is the bias vector. Similarly, we can compute w_F^i, w_P^i, and w_C^i, the attention values for Z_F, Z_P, and Z_C, according to the same formula. Finally we use the softmax function to normalize w_T^i, w_F^i, w_P^i, w_C^i and get the final weights:

α_T^i = softmax(w_T^i) = exp(w_T^i) / (exp(w_T^i) + exp(w_F^i) + exp(w_P^i) + exp(w_C^i)).    (9)

α_F^i, α_P^i, and α_C^i are computed in the same way. For the n nodes, the embedding weights of each feature are then

α_T = [α_T^i], α_F = [α_F^i], α_P = [α_P^i], α_C = [α_C^i] ∈ R^{n×1}.    (10)

Finally, we fuse the four embeddings to obtain the final embedding

Z = α_T · Z_T + α_F · Z_F + α_P · Z_P + α_C · Z_C.    (11)
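Eqs. (8)–(11) can be sketched in NumPy as follows. This is an illustrative implementation under our own conventions (W, b, a are random rather than learned, and `attention_fuse` is a name we introduce); the per-node softmax runs across the four views:

```python
import numpy as np

def attention_fuse(embeddings, W, b, a):
    """Fuse per-view node embeddings (each n x h) with view-level attention.

    Per node i and view v: score = a^T tanh(W z_i^v + b)   -- Eq. (8)
    softmax over views gives alpha (n x V)                 -- Eqs. (9)-(10)
    Z = sum_v alpha[:, v] * Z_v                            -- Eq. (11)
    """
    scores = np.stack([np.tanh(Z @ W.T + b) @ a for Z in embeddings], axis=1)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    Z = sum(alpha[:, v:v + 1] * embeddings[v] for v in range(len(embeddings)))
    return Z, alpha

rng = np.random.default_rng(0)
n, h = 5, 3
views = [rng.standard_normal((n, h)) for _ in range(4)]    # Z_T, Z_F, Z_P, Z_C
W, b, a = rng.standard_normal((h, h)), rng.standard_normal(h), rng.standard_normal(h)
Z, alpha = attention_fuse(views, W, b, a)                  # Z: n x h, alpha: n x 4
```

Each node gets its own weight distribution over the four views, so nodes whose topology is informative can lean on Z_T while others lean on Z_F or Z_P.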

F. OBJECTIVE FUNCTION
Our model achieves reasonably good results without adding any extra loss terms; the objective function is simply the cross-entropy error.
According to the final embedding Z obtained by Eq. 11, we obtain the predicted labels Ŷ for the n nodes:

Ŷ = softmax(W · Z + b),    (12)

where softmax normalizes over all classes. If L denotes the training set, with true label Y_l for each l ∈ L and predicted label Ŷ_l, then the cross-entropy loss of all training nodes for node classification is denoted L_t:

L_t = − Σ_{l∈L} Σ_{c=1}^{C} Y_{lc} ln Ŷ_{lc}.    (13)

IV. EXPERIMENTS
A. EXPERIMENTAL SETUP

TABLE 1. Statistics of the datasets.

Dataset     | Nodes | Edges  | Classes | Features | Training | Test
Citeseer    | 3327  | 4732   | 6       | 3703     | 120      | 1000
UAI2010     | 3067  | 28311  | 19      | 4973     | 380      | 1000
ACM         | 3025  | 13128  | 3       | 1870     | 60       | 1000
BlogCatalog | 5196  | 171743 | 6       | 8189     | 120      | 1000
Flickr      | 7575  | 239738 | 9       | 12047    | 180      | 1000

Flickr: a network of publicly accessible Flickr photographs. Edges are created by photographs from the same place, images uploaded to the same gallery, group, or collection, images with similar tags, images taken by friends, and so on. All nodes are grouped into nine categories depending on user interests, with edges expressing the links between them.
Baselines: We compare our methodology against state-of-the-art methods, including two word-embedding methods and six graph neural network approaches.
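Eqs. (12)–(13) are a linear classifier plus masked cross-entropy over the training nodes. A self-contained NumPy sketch (random stand-ins for Z and the classifier parameters; all names are ours):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, as in Eq. (12)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y_onehot, probs, train_idx):
    """Eq. (13): -sum over training nodes l and classes c of Y_lc * ln(Yhat_lc)."""
    return -np.sum(Y_onehot[train_idx] * np.log(probs[train_idx] + 1e-12))

rng = np.random.default_rng(1)
n, h, C = 5, 4, 3
Z = rng.standard_normal((n, h))                  # stand-in for the fused embedding
W_out, b_out = rng.standard_normal((h, C)), np.zeros(C)
Y_hat = softmax(Z @ W_out + b_out)               # predicted class distribution per node
Y = np.eye(C)[rng.integers(0, C, n)]             # random one-hot labels
loss = cross_entropy(Y, Y_hat, train_idx=np.arange(3))  # only the 3 training nodes count
```

Only the rows indexed by `train_idx` contribute, matching the semi-supervised setting where most node labels are withheld.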

VOLUME 4, 2016
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited.
• GCN [3]: a semi-supervised graph convolutional network model that learns node representations from their neighbors.
• KNN-GCN: as a contrast, instead of using the original graph as the input graph for GCN, we utilize the sparse k-nearest-neighbor graph generated from the original graph adjacency matrix and denote it KNN-GCN.
• Cluster-GCN: also for comparison, we employ Cluster-GCN, which uses the topological feature graph generated from the original graph adjacency matrix as the input to the GCN.
• GAT [17]: a graph neural network model that aggregates node features via an attention mechanism.
• DEMO-Net [18]: a degree-specific graph neural network for node classification.
• AM-GCN [19]: extracts specific and common embeddings from node features, topology, and their combinations, and uses an attention mechanism to learn adaptive importance weights for fusing the embeddings into the final node embeddings.
Parameter settings: To analyze and demonstrate the model's performance, we randomly choose 1000 nodes as the test set and 20 nodes from each class in the dataset as the training set. All baselines are initialized using the same values provided in their papers. Our model requires training six 2-level GCNs simultaneously with the same input hidden dimension (nhid1) and output hidden dimension (nhid2), where nhid1 ∈ {512, 768} and nhid2 ∈ {32, 128, 256}. We utilize the Adam optimizer with a learning rate of 0.0001 ∼ 0.005, a dropout rate of 0.5, weight decay ∈ {5e−3, 5e−4}, and k ∈ {2, ..., 9} for the KNN graph. After ten runs with identical partitions, we aggregate the results for all approaches and assess the models using accuracy (ACC) and macro F1-score (F1).

B. NODE CLASSIFICATION
The results on the node classification task are compared in Table 2; our observations follow:
• In comparison with the baseline methods, T-GCN shows better performance on all datasets. T-GCN improves accuracy because its adaptive fusion can extract more valuable information from all three feature spaces.

C. VISUALIZATION
For a more intuitive comparison and to further demonstrate the effectiveness of our model, we performed a visualization task on the BlogCatalog dataset: we output the node embeddings of the test set and colored the BlogCatalog node classes after t-SNE [27] dimensionality reduction, as shown in Figure 2. Figure 2 shows that GCN and GAT classify poorly, while AM-GCN identifies the classes more clearly, though its classification is not as good as T-GCN's in certain areas. T-GCN outperforms the others in the visualization: it learns node embeddings with a more compact structure, the highest intra-class similarity, and the clearest distinctions between classes.

D. THE RELATIONSHIP BETWEEN TOPOLOGY AND CLASSIFICATION
As the analysis in IV-B shows, T-GCN performs better in terms of accuracy and F1 score; however, the improvement varies between datasets. To analyze the relationship between our method and the classification task, we examined some basic topological characteristics of the experimental datasets, shown in Table 3.
We can see from Table 3 that BlogCatalog and Flickr, which show large performance gains, are fully connected graphs and are disassortative, while Citeseer and ACM, which show smaller gains, are non-connected graphs and are assortative. In terms of the clustering coefficient, the overall average clustering coefficient is not the determining factor in the effectiveness of T-GCN; rather, the number of nodes with a clustering coefficient of 0 has a greater impact on the method, implying that the stronger the clustering of the data, the better the effect. Citeseer and ACM contain more connected subgraphs. To analyze these two datasets further, we plotted the size distribution of the connected subgraphs of Citeseer and ACM, as shown in Figure 3.
Both datasets share one feature: each has one large connected subgraph together with many connected subgraphs of few nodes. The graph is more fragmented as a whole, and the overall aggregation is low. The topological feature information collected with the approach described in this paper is then insufficient, resulting in only a minor performance increase.
After studying these datasets, it is clear that the strategy proposed in this paper can significantly enhance performance on well-connected, disassortative data, but it does not help much on datasets with low aggregation. This also demonstrates that the strategy is better suited to datasets with complicated topologies and high connectedness.

E. TOPOLOGICAL FEATURES STUDY
We utilized the clustering coefficient to obtain topological structural characteristics in this paper, but is the value of the clustering coefficient actually useful, and could decent performance be obtained by simply superimposing the original graph structure? Based on this question, we replaced the topological feature graph with the original graph in the structure of Figure 3; the experimental findings in Table 4 compare the overlaid original graph with our structure. Employing the topological feature graph outperforms merely overlaying the original graph, indicating that the topological features obtained are deeper and the clustering coefficient values are valid. Topological characteristics cannot be represented simply by superimposing the structure of the original graph; a more sophisticated method of acquisition is required.

F. COMPUTATIONAL COMPLEXITY ANALYSIS
In this section, we analyze the computation of the clustering coefficient and the time consumed by the training process, to assess the advantages and disadvantages of the model in terms of computational complexity. The most commonly used method for computing clustering coefficients is intersecting adjacency lists, which has a time-complexity upper bound of O(|V| · d_max²), where |V| denotes the number of nodes and d_max is the maximum vertex degree. The core computation of our method is therefore not expensive: the clustering coefficients can be computed in approximately linear time, which is precisely why we chose them to obtain topological features. For some global topology metrics, such as betweenness centrality, the computational time complexity is too high, so we discard such metrics.
We measured the time to build the topological feature graph and plotted it against the total time for the training loss to stabilize, as shown in Figure 4. The "other time" in the figure indicates the difference between the time at which the training loss stabilizes and the time spent constructing the topological feature graph. As Figure 4 shows, constructing the topological feature graph takes a relatively small fraction of the whole training-and-stabilization time. Of course, based on the complexity analysis of the clustering coefficients, the time to construct the topological feature graph becomes relatively large for bigger graphs, growing approximately linearly with the number of edges in the dataset.

G. PARAMETER STUDY
In this section, we explore the sensitivity of the parameters on the Citeseer and BlogCatalog.
Analysis of k in the k-nearest-neighbor node feature graph: when constructing the node feature graph, we explore the influence of the top-k neighborhoods in KNN. In Figure 5, we investigate the performance of T-GCN for k ranging from 2 to 9. For both Citeseer and BlogCatalog, the accuracy and F1 score first increase and then decrease, reaching higher values around k = 9. This could be because bigger values give the node feature graph more structural information.
Analysis of the effect of the learning rate: different hyperparameter settings can affect the performance of the model. Here we explore the effect of the learning rate on model performance while keeping the other hyperparameters fixed. On Citeseer and BlogCatalog, the learning rate was set to 0.0001, 0.0003, 0.0005, 0.001, 0.005, 0.007, and 0.02; the experimental results are shown in Figure 6. For the Citeseer dataset, the best performance is achieved at a learning rate of 0.0005, where both the accuracy and the F1 score exceed those of the other learning rates. For the BlogCatalog dataset, a learning rate of 0.005 gives the best performance, with both accuracy and F1 score higher than at the other learning rates.

V. CONCLUSION
In this paper, we rethink the use of topological structural characteristics in graph convolutional neural networks. We examine how to use the topological structure information in the network data itself and, based on this fundamental idea, create an extraction method that obtains topological features from clustering coefficients. Combined with the features of the graph nodes themselves, these are applied together in the graph convolution process, making fuller use of both topological and node features. After employing the topological features produced by this method, our approach showed outstanding performance on datasets with well-connected network topology. For non-connected datasets, our approach has corresponding limitations, with only a small improvement in some metrics. To address this issue, we will consider combining multiple topological metrics to obtain topological features in future work. Extensive trials have proven the approach's superiority, with the method outperforming state-of-the-art models on real datasets.