Predicting the Robustness of Real-World Complex Networks

Many real-world natural and social systems can be modeled as complex networks. Because random failures and malicious attacks can severely damage the structure of complex networks, it is critical to ensure their robustness and maintain their functions. Generally, connectivity robustness and controllability robustness are adopted to evaluate the performance of networked systems against external attacks and/or failures. A sequence of values is measured to dynamically indicate the network robustness under iterative node or edge removal. Calculating the robustness of large-scale real-world networks is usually time-consuming, whereas deep learning provides an efficient way to estimate network robustness. In this paper, a multi-convolutional neural network (CNN) method called Real-RP is designed to predict the robustness of real-world complex networks. Unknown real-world networks are first classified into known network categories, and their robustness is then predicted by a predictor for that specific category, trained on a substantial number of synthetic networks. Experimental results show that: 1) real-world complex networks can be classified by a CNN with high precision, and 2) the robustness of real-world networks can be predicted with lower average errors than existing methods.

The structure of network data makes it hard to extract effective node features for a CNN. That is, the structure of a network is generally very irregular and the data have very high dimensions, so the data do not maintain key properties such as translation invariance. To handle these characteristics, some studies learn feature representations for nodes. Specifically, lower-dimensional representations are generated by compacting the higher-dimensional raw graph data, and downstream classification or regression tasks are then performed on the lower-dimensional representations. Patchy-SAN [26] is a typical algorithm of this kind.

Although the CNN-based prediction approach performs well on synthetic network robustness prediction, it requires full knowledge of the complex network. Experiments indicate that the robustness of a network depends on its topology and structure, and real-world networks differ greatly in both aspects.

Current single-CNN predictors achieve a certain degree of accuracy; however, experiments on real-world networks show that the error values do not meet expectations. In this paper, we propose a new method, named Real-RP, to predict the robustness of real-world networks. We first classify the topology of a real-world network into one of nine refined network topology categories, so that the downstream task can obtain more topology knowledge of the network. A corresponding regressor is then applied to predict the robustness of the input network. Compared with synthetic networks, real-world networks come with less prior knowledge and have indistinct features. Thus, our contributions also include proposing an efficient classification criterion and accurately estimating unknown networks.
On the basis of a correct estimation, the robustness of real-world networks is accurately predicted.

The rest of the paper is organized as follows. Section II reviews the measures of network connectivity and controllability robustness against destructive node-removal attacks. Section III introduces the details of the Real-RP method. Section IV presents experimental results with analysis and comparison. Section V summarizes the investigation.

This section describes the mathematical model of connectivity robustness and controllability robustness. Connectivity robustness records the changes in connectivity under a series of malicious attacks, while controllability robustness records the changes in controllability. Malicious attacks include node removal and edge removal. In this paper, node removal is used by default. During node removal, nodes are selected at random and deleted one at a time until only one node is left.
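As an illustration of the node-removal process described above, the following minimal Python sketch records a connectivity curve while nodes are deleted in random order. It is a sketch under stated assumptions, not the authors' implementation: the connectivity value after each removal is assumed to be the size of the largest connected component divided by the number of remaining nodes, the network is treated as undirected, and the names `largest_component_size` and `connectivity_robustness_curve` are hypothetical.

```python
import random


def largest_component_size(adj, alive):
    """Size of the largest connected component among the nodes in `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def connectivity_robustness_curve(adj, seed=0):
    """Remove nodes in random order until one is left; record one value per step.

    ASSUMPTION: the value after i removals is (largest component size) / (N - i).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    alive = set(adj)
    curve = [largest_component_size(adj, alive) / len(alive)]
    for node in order[:-1]:  # N - 1 removals, so exactly one node survives
        alive.remove(node)
        curve.append(largest_component_size(adj, alive) / len(alive))
    return curve


# Undirected 5-node ring used as toy data (hypothetical example).
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
curve = connectivity_robustness_curve(ring, seed=1)
print(len(curve), curve)
```

For a network with N nodes the curve has N points: the value before any attack plus one value after each of the N − 1 removals.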

Consider a linear time-invariant networked system described by $\dot{x} = Ax + Bu$, where $A$ and $B$ are constant matrices of compatible dimensions, $x$ is the system state vector, and $u$ is the external control input.

We use the fraction of driver nodes, $n_D$, as the measure of controllability for a networked system. It is computed from the rank of the network adjacency matrix $Adj$, where $\mathrm{rank}(\cdot)$ denotes the rank of a matrix. The controllability robustness is then obtained by recomputing this measure after each attack, where $i$ is the number of attacks and $N$ is the size of the networked system.

To compare prediction precision, a scalar error value is needed. The values at corresponding positions of the two curves are subtracted to obtain an error curve, and the error curve is then averaged to give the error between the attack simulation and each method:

$$\xi = \frac{1}{N_1} \sum_{i=1}^{N_1} \left| R_{pred}(i) - R_{sim}(i) \right|$$

where $\xi$ represents the error value, $N_1$ is the size of the sampled robustness curve, and $R_{pred}$ and $R_{sim}$ represent the predicted robustness and the attack-simulation robustness, respectively.
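The two quantities discussed above can be sketched in a few lines of Python. The sketch relies on two assumptions that are not confirmed by the text: the number of driver nodes is taken from the widely used rank-based estimate $N_D = \max\{1, N - \mathrm{rank}(A)\}$, and the curve error is taken as the mean absolute difference between two equally sampled curves. The function names are hypothetical, and the exact-arithmetic rank routine is only meant for small examples.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def driver_node_fraction(adj):
    """n_D under the ASSUMED estimate N_D = max(1, N - rank(A))."""
    n = len(adj)
    return max(1, n - matrix_rank(adj)) / n


def mean_curve_error(r_pred, r_sim):
    """Average absolute difference between two equally sampled curves."""
    if len(r_pred) != len(r_sim):
        raise ValueError("curves must have the same number of samples")
    return sum(abs(p - s) for p, s in zip(r_pred, r_sim)) / len(r_pred)


# Directed path 0 -> 1 -> 2 (toy data): rank 2, so one driver node out of three.
path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(driver_node_fraction(path))
print(mean_curve_error([1.0, 0.5, 0.0], [1.0, 0.25, 0.25]))
```

Recomputing `driver_node_fraction` on the remaining sub-matrix after each node removal would give one point of a controllability curve per attack.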

In this section, we briefly review the predictors PCR and Patchy-SAN for network robustness prediction.

In PCR [24], a VGG-based CNN structure with several convolutional layers and fully connected layers is designed to process network adjacency matrices, which can be converted to gray-scale images and then used directly as the input to the CNN. The last fully connected layer maps the features into a vector as the output of the whole model. PCR updates the model parameters through supervised learning and generalizes well within its task domain. However, PCR uses very little knowledge of the input network. If more network knowledge, such as network topologies and logical boundaries, were put to better use, the prediction could be more precise. On this basis, we consider the effect of prior knowledge and add it to the robustness prediction of real-world networks.
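To make the image view of an adjacency matrix concrete, the short sketch below builds an N x N pixel array from an edge list. The pixel mapping (255 where an edge exists, 0 elsewhere) and the function name `adjacency_to_grayscale` are assumptions made for illustration; the text does not specify how PCR encodes pixel intensities.

```python
def adjacency_to_grayscale(n, edges, directed=True):
    """Build an n x n image (list of rows) with 255 where an edge exists.

    ASSUMPTION: unweighted edges map to white pixels (255), absent edges to 0.
    """
    image = [[0] * n for _ in range(n)]
    for src, dst in edges:
        image[src][dst] = 255
        if not directed:
            image[dst][src] = 255  # undirected networks give a symmetric image
    return image


# Toy directed network 0 -> 1 -> 2 (hypothetical data).
img = adjacency_to_grayscale(3, [(0, 1), (1, 2)])
for row in img:
    print(row)
```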

Complex network data have distinctive continuous and discrete attributes that differ from those of general image data. Patchy-SAN [26] was proposed to learn low-dimensional feature representations from a high-dimensional network. That is, it transforms macroscopic structural information and other node-level information into low-dimensional data, which can be easily processed by downstream CNN models.

First, only the nodes with higher priority, rather than all nodes, are used in the following processing steps. All nodes are arranged in descending order according to an importance labeling procedure, and a subset of them is then selected for processing. Each selected node is assembled into a local sub-network of the same size. For each selected node, a procedure similar to breadth-first traversal is used to find the nodes with higher labeling values as its neighborhood in the sub-network. Next, a normalization procedure is applied to every sub-network, in which every node in the sub-network is assigned a label so that all nodes can be ranked according to their significance. Finally, the normalized sub-networks, which share the same data structure, can be efficiently processed by downstream tasks.
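The selection, assembly, and normalization steps above can be sketched as follows. This is a simplified illustration rather than the Patchy-SAN algorithm itself: node importance is assumed to be plain degree (ties broken by node id), normalization is reduced to ordering each neighborhood by that importance and padding it with `None`, and the function name `select_and_assemble` is hypothetical.

```python
from collections import deque


def select_and_assemble(adj, num_selected, field_size):
    """Pick the top-ranked nodes and build one fixed-size neighborhood for each.

    ASSUMPTION: importance = degree, with ties broken by node id.
    """
    def rank_key(v):
        return (-len(adj[v]), v)

    ranked = sorted(adj, key=rank_key)
    fields = {}
    for root in ranked[:num_selected]:
        field, seen, queue = [root], {root}, deque([root])
        # Breadth-first expansion, visiting more important neighbors first.
        while queue and len(field) < field_size:
            node = queue.popleft()
            for nb in sorted(adj[node], key=rank_key):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
                    field.append(nb)
                    if len(field) == field_size:
                        break
        # Simplified normalization: root first, the rest ordered by importance,
        # padded so that every neighborhood has the same length.
        field = [root] + sorted(field[1:], key=rank_key)
        field += [None] * (field_size - len(field))
        fields[root] = field
    return fields


# Toy undirected network (hypothetical data): hub 0 plus one extra edge 1-2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
print(select_and_assemble(adj, num_selected=2, field_size=3))
```

Because every neighborhood has the same length and ordering rule, the results can be stacked into a regular array and passed to a CNN.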
The feature-extraction capabilities of the predictors reviewed above are not sufficient for real-world network prediction. Real-RP is designed for predicting the robustness of real-world networks, which is hard for the two algorithms above. Given some prior knowledge, real-world networks can first be classified into several categories, and the connectivity robustness can then be predicted using the corresponding predictor. Given several commonly used network types, users are able to prepare a specific CNN predictor for each type, rather than treating these networks in some general form. For a classification task, the larger the distances between categories are, the easier it is to classify them.

For a predictor, the last layer is a fully connected layer, which reshapes the feature information into the desired vector.

More details are shown in Table 1.

There are seven convolution blocks, followed by two fully connected layers that process their output. In supervised training, the mean-squared error between the predicted robustness and the simulated value is employed as the loss function, where $N$ represents the size of the network, $\hat{R}(i)$ is the predicted robustness, $R(i)$ is the simulated robustness, and $\lVert \cdot \rVert$ represents the Euclidean norm.

We designed a series of experiments to verify the improvement in predicting the connectivity robustness and controllability robustness of real-world networks. The models are trained on synthetic networks and then tested on both synthetic and real-world networks. We applied nine types of synthetic network models: Erdos-Renyi (ER) random-graph [29], Barabasi-Albert (BA) scale-free [30], [31], generic scale-free (SF) [32], onion-like generic scale-free (OS) [14], Newman-Watts small-world (SW-NW) [33], Watts-Strogatz small-world (SW-WS) [5], q-snapback (QS) [34], random triangle (RT), and random hexagon (RH) networks.

Specifically, the nine types of synthetic networks are generated by the corresponding mathematical models. An ER network is generated based on the ER random model, whose basic idea is to connect each pair of the $N$ nodes with probability $P$ until the network has the required number of edges. BA and SF networks both follow a power-law degree distribution. The difference is that a BA network is generated according to the preferential attachment scheme [30], while an SF network is generated according to a series of predefined node weights $w_i = (i + \mu)^{-\sigma}$, where $i = 1, 2, \ldots, N$, $\sigma \in [0, 1)$, and $\mu \ll N$. Here, $N$ represents the number of nodes in the network to be generated. Then, for each edge, two nodes $i$ and $j$ are picked with probability proportional to their weights as the source and target nodes.

VOLUME 10, 2022

FIGURE 1. CNN structures in Real-RP. The input is the adjacency matrix of a network with N nodes, followed by a total of 8 feature maps (FMs) processed by convolutional layers. After the FMs comes a special layer for the downstream task: for a classifier, the last layer is a softmax layer; otherwise, it is a fully connected layer. Here $N_c = 9$ represents the number of network classes.
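The weight-based construction of SF networks can be sketched as below. The sketch assumes node weights of the form $w_i = (i + \mu)^{-\sigma}$ and draws both endpoints of each edge with probability proportional to those weights. Rejecting self-loops and duplicate edges, treating edges as directed, and the default parameter values are all assumptions made for illustration; `generate_sf_edges` is a hypothetical name.

```python
import random


def generate_sf_edges(n, num_edges, sigma=0.5, mu=0.1, seed=0):
    """Sample directed edges whose endpoints are drawn proportionally to weights.

    ASSUMPTIONS: w_i = (i + mu) ** (-sigma) for i = 1..n; self-loops and
    duplicate edges are rejected and redrawn.
    """
    if num_edges > n * (n - 1):
        raise ValueError("too many edges for a simple directed graph")
    rng = random.Random(seed)
    weights = [(i + mu) ** (-sigma) for i in range(1, n + 1)]
    nodes = list(range(n))  # node k carries weight w_{k+1}
    edges = set()
    while len(edges) < num_edges:
        src, dst = rng.choices(nodes, weights=weights, k=2)
        if src != dst:
            edges.add((src, dst))
    return sorted(edges)


# Small example with hypothetical settings.
edges = generate_sf_edges(10, 20, seed=3)
print(len(edges), edges[:5])
```

Low-index nodes have larger weights and are therefore picked more often, which is what produces the heavy-tailed degree distribution.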

In the next step, SW-NW adds new edges without removing existing edges [33], while SW-WS removes existing edges and then adds new edges through rewiring operations [5].

In a QS network, there is a main backbone chain with multiple snap-back edges [34]. Here, there is only one layer, that is, $r = 1$. The out-degree of the $i$th node, $d_{out}(i)$, $i = 1, 2, \ldots, N$, is calculated by Eq. (8). RT and RH are both generated according to the Henneberg growth mechanism. The difference is that RT is made of randomly generated triangles, while RH is made of hexagons.

In the following experiments, when training the classifier, we generated 1000 samples for every type of network, for a total of 9000 samples. For each sample, the network adjacency matrix is the data X and its topology category is the label Y. In this paper, $N_c$ is set to 9, indicating that the classification result is a 9-dimensional vector whose entries represent the probability of being classified into each category. A softmax operator is applied as the last layer of the classifier, and its outputs are continuous values from 0 to 1. Here we set a threshold $\theta = 0.8$. Networks with a maximum classification probability greater than $\theta$ are considered classifiable and are processed by the regressor of the corresponding category. Otherwise, the network is processed by a common regressor.
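The routing rule described above can be sketched as follows. The softmax and the threshold $\theta = 0.8$ follow the text; the ordering of the category labels, the raw scores used in the example, and the names `softmax` and `choose_regressor` are assumptions made for illustration.

```python
import math

# ASSUMPTION: the order of the nine categories in the classifier output.
CATEGORIES = ["ER", "BA", "SF", "OS", "SW-NW", "SW-WS", "QS", "RT", "RH"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_regressor(logits, theta=0.8):
    """Return the category whose regressor should be used, or 'common'."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > theta:
        return CATEGORIES[best]
    return "common"


# A confident prediction is routed to its category regressor;
# an undecided one falls back to the common regressor.
print(choose_regressor([10.0] + [0.0] * 8))
print(choose_regressor([0.0] * 9))
```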

For each regressor, we prepare 6000 samples for training. For each network sample with N nodes, we attack N − 1 times so that only one isolated node remains. Finally, we obtain the robustness curve as the label. All CNN models are deployed on a computing platform with a Tesla V100-16G GPU.

We predict both the connectivity robustness and the controllability robustness of directed and undirected networks using PCR [24], Patchy-SAN [26], and Real-RP. Fig. 2 and Fig. 3 show the connectivity robustness and controllability robustness prediction results on directed networks. Fig. 4 and Fig. 5 show the prediction performance on undirected networks.

In all figures, the nine types of networks mentioned above are tested, and $\delta$ represents the proportion of attacked nodes among the total N nodes. $R_{lcc}(\delta)$ and $R_{ctrl}(\delta)$ denote the predicted connectivity robustness and controllability robustness, respectively, when a fraction $\delta$ of the nodes has been removed.

Real-RP shows excellent performance on QS, SW-NW, SW-WS, and RT. For the other five types, the error values of Real-RP are still lower, but the Kruskal-Wallis H-test shows no significant difference between Real-RP and PCR. Moreover, in predicting both controllability robustness and connectivity robustness, Real-RP performs significantly better than Patchy-SAN on all types.

The same experiments are carried out on undirected networks. Fig. 4 and Fig. 5 show the prediction performance on undirected networks of those nine types, and Table 3 presents the corresponding results.

After training the models on synthetic networks, we tested the prediction accuracy on real-world networks. Here, we randomly collected nine real-world networks from the Reddit-multi datasets [22]. These real-world networks have about 500 nodes each and completely different topologies. Table 4 shows the details of the nine selected real-world networks, including network scales and average degrees. Notice that the real-world networks all have different scales. In the process, we randomly remove or add several nodes to adjust each network's scale to exactly N = 500.

Adding more network categories brings higher precision in predicting the robustness of real-world networks. In this paper, the work aims to improve the capability to classify real-world networks. On this basis, we verify the classification results by comparing feature models, which include four features: heterogeneity (ho), average clustering coefficient (avg_cc), average betweenness (avg_bet), and average path length (avg_path).

TABLE 7. Features of four real-world networks. To verify the classification result, each feature is compared with the average value of the synthetic graph features. '+' means the feature value is larger than the average value of the synthetic graphs, and '−' means it is smaller.

The feature comparison for the four networks is shown in Fig. 9. We divide the feature values into two parts: larger than the average value over all synthetic networks, and smaller than it. If a feature of both the real-world network and the synthetic networks of the corresponding category is larger than the average value, we consider the classification result to be positive. As shown in Fig. 7, REAL1 and REAL2 both have much higher ho than other categories, which is consistent with existing knowledge [30], [31], [32].
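The sign-based check described above can be sketched in a few lines. All numbers below are hypothetical and serve only to show the mechanics. Treating a feature as consistent whenever the real-world network and its assigned category fall on the same side of the synthetic average is an assumption that slightly generalizes the rule stated in the text, and the function names are hypothetical.

```python
def feature_signs(features, synthetic_mean):
    """Map each feature to '+' if it exceeds the synthetic average, else '-'."""
    return {
        name: "+" if value > synthetic_mean[name] else "-"
        for name, value in features.items()
    }


def classification_checks(real, category_mean, synthetic_mean):
    """Per feature, report whether the real network and its category agree in sign.

    ASSUMPTION: agreement means both lie on the same side of the average.
    """
    real_signs = feature_signs(real, synthetic_mean)
    cat_signs = feature_signs(category_mean, synthetic_mean)
    return {name: real_signs[name] == cat_signs[name] for name in real_signs}


# Hypothetical feature values, not taken from the paper.
synthetic_mean = {"ho": 1.0, "avg_cc": 0.10, "avg_bet": 0.004, "avg_path": 3.0}
real = {"ho": 2.5, "avg_cc": 0.05, "avg_bet": 0.006, "avg_path": 2.4}
category = {"ho": 2.0, "avg_cc": 0.08, "avg_bet": 0.003, "avg_path": 2.8}

print(feature_signs(real, synthetic_mean))
print(classification_checks(real, category, synthetic_mean))
```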