A Graph Deep Learning-Based Fault Detection and Positioning Method for Internet Communication Networks

In modern smart cities, the scale of the urban backbone networks that provide the Internet communication environment is constantly increasing. When faults occur, detecting and locating them usually takes considerable effort. As a result, automatic fault detection and positioning with intelligent algorithms has become a practical demand in this area. In this paper, the whole urban backbone network is viewed as a graph-level object involving massive numbers of nodes and edges. On this basis, a two-stage graph deep learning-based fault detection and positioning method for Internet communication networks is proposed. In the first stage, a graph neural network is employed to extract graph-level features from Internet communication networks, with the aim of obtaining a proper feature representation of the core characteristics of backbone networks. In the second stage, the fault detection and positioning algorithm is formulated to output the final results. Finally, experiments are conducted to assess the performance of the proposal. The results show that the proposed method performs well in abnormal node detection and achieves high accuracy in fault positioning. The accuracy of the proposed two-stage graph deep learning algorithm is much higher than that of the KNN algorithm, eventually reaching 96.5%, which is slightly lower than that of the pure graph deep learning algorithm, while the accuracy of the IRBFG algorithm reaches only 92%.


I. INTRODUCTION
In modern society, the scale of Internet access is gradually increasing. According to the latest report released by Deep Learning IC, nearly half have become netizens [1]. The continuing popularization of the Internet also makes computer networks more and more important in our lives. Computer networks have begun to penetrate all aspects of life, such as shopping, medical treatment, and work [2]. It can be said that, without the Internet, we can hardly move a single step [3]. However, with the popularization and continuous growth of the Internet, networks have grown quite large, and network topologies have become extremely complex [4]. This makes network failures more likely to happen, and the losses they cause are increasingly large [5]. In many cases, even the failure of a small network node will affect the normal working of many other nodes [6].

The associate editor coordinating the review of this manuscript and approving it for publication was Catherine Fang.
Clustering has always been an active research direction and has become one of the important means of solving the fault diagnosis problem [7]. Commonly used clustering algorithms include the k-means algorithm and the expectation maximization (EM) algorithm [8], [9]. However, these algorithms are built on a convex sample space and are not suitable for clustering problems of arbitrary shape [10]. The graph deep learning algorithm is a new clustering algorithm proposed in recent years [11]. Since graph deep learning has no strict requirements on the shape of the data distribution [12], it can avoid the singularity caused by high-dimensional feature vectors [13]. Some scholars have applied the graph deep learning method to fault identification and diagnosis [14]. For example, literature [15] used the minimum-maximum tangent criterion to construct the objective function of fault data graph segmentation [16], and used k-means to improve the process of finding the optimal segmentation point in order to realize quick fault state identification [17].
In reference [18], to address high data dimensionality and nonlinearity, adaptive local linear embedding was adopted to carry out nonlinear dimensionality reduction of the original data. Then, recursive calls to the canonical cut were applied to cluster the low-dimensional data. However, the clustering effect is affected because the collected data are not analyzed beforehand when solving the fault diagnosis problem. In addition, the number of clusters needs to be determined manually during clustering, which limits the algorithm's performance and application range. Moreover, in today's heterogeneous wireless network environment, diagnosis methods based on manual analysis consume a lot of manpower and material resources. The exploration of more efficient and intelligent fault diagnosis technologies for heterogeneous networks is therefore bound to become one of the important topics in this area.
The proposed method introduces graph deep learning into the field of network fault detection to solve the problem of fault pattern recognition, which existing methods have not addressed. By combining the semi-supervised idea with the automatic graph deep learning algorithm, pairwise constraints are introduced heuristically. The pairwise constraint information is then propagated based on data similarity propagation, and the original similarity matrix is adjusted globally. On this basis, automatic graph deep learning is carried out to improve the clustering performance. Experimental results on real-world data show that the automatic graph deep learning algorithm in this paper performs better than the comparison algorithms. The algorithm is also used to detect network faults, which shows that it is feasible for network fault recognition.
This paper is organized as follows. The first part is the introduction. The second part reviews related work. The third part presents the architecture of the communication network fault detection and location method based on deep learning. The fourth part is the experimental verification. The fifth part is the conclusion.

II. RELATED WORK
When a system becomes more and more complex, it is difficult to establish a mathematical analytic model of the detected object [19]. In this case, signal-based processing is very useful [20]. The signal-based processing method uses relevant equipment to collect, identify, and process signals in the form of numerical calculation [21]. Literature [22] focuses on the intelligent detection of cell disruption. Faced with the complexity and vulnerability of heterogeneous networks, the authors observed and analyzed the changes of key performance indicators in the time domain, and adopted the K-nearest neighbor classification algorithm to realize automatic detection of network anomalies. However, the algorithm does not take into account the correlation between base stations or the correlation of parameters in the time domain.
In literature [23], a comprehensive Quality of Experience index, combining voice quality and the success rate of wireless access, is selected to judge whether there is an anomaly in the network. An algorithm combining self-organizing mapping and K-means is used to classify the abnormal data points. Literature [24] proposes a modeling technique for heterogeneous network fault recognition. First, the causes of each fault in heterogeneous networks are enumerated, and a fault tree model is built according to these causes. Finally, the faults are ranked based on probabilistic reasoning to locate the fault. However, as the number of nodes in heterogeneous networks increases, the relationships between network components become more complex, so obtaining an accurate prior probability distribution becomes a challenge, which directly affects the accuracy of probabilistic reasoning.
Fault detection technology based on analytical models [25], [26] is the earliest fault detection technology and has been studied most comprehensively and systematically. Based on measurements of the input value x and the output value y, the analytical model method constructs a corresponding mathematical model to generate a feature representation of the network system. Fault detection is realized by comparing the features calculated by the mathematical model with the measured features. When a system becomes more and more complex, it is difficult to establish a mathematical analysis model of the detected object [27]. In this case, signal-based processing is very useful. The signal-based processing method uses relevant equipment to collect, identify, and process signals in the form of numerical calculation. The base stations in a hierarchical heterogeneous network are diverse, and the backhaul modes adopted by the various types of base stations are heterogeneous.
Since the public network is also responsible for carrying the traffic of broadband network users, data congestion occurs easily on the public network, which reduces the QoS of users [28]. Therefore, when designing and planning the backhaul network, both cost and QoS should be considered. The backhaul network is generally composed of a wireless network and a wired network [29]. Other low-power base stations may communicate wirelessly with the core network in the form of clusters of base stations.
Literature [30] proposes a cell service interruption monitoring method based on the K-nearest neighbor machine learning classification algorithm. To improve network performance and user experience in future wireless networks, heterogeneous characteristics will become more and more pronounced. Under such a complex network structure, monitoring cell service interruption faces more difficulties. Because of the dense distribution of base stations, if a low-power base station suffers a service interruption, the user can easily switch to a neighboring base station or the macro base station. To detect cell service interruption more effectively, this paper uses machine learning to find the parameter variation characteristics of base station faults.
Jiang et al. [31], [32] gave a set of key parameters for network cell service interruption detection. They pointed out that too many parameters affect the mapping relationship between parameters and faults, and therefore proposed a dimension reduction scheme based on the kernel method to generate low-dimensional features. Literature [33] pointed out that there has been relatively little research on service interruption diagnosis of nanopicotone base stations. In that work, a two-stage diagnostic framework was proposed to detect suspicious symptoms in the network based on the collaborative filtering of association information [34]. In the detection stage, the data-based sequential cooperative detection algorithm [35] completes the fault diagnosis. The results show that the detection accuracy is good, but the detection delay is large.
Literature [36] used a Markov chain to complete fault diagnosis. It took the difference ratio between the change trend of the real monitoring value and the trend predicted by the model as the discrimination standard. In literature [37], a dual Markov chain model was used to improve the monitoring accuracy of the original model. However, both methods judge the final diagnosis results based on threshold values, and how to determine reasonable threshold values still needs to be studied.
To sum up, the heterogeneous network environment is an important solution to the future surge in network capacity. The heterogeneous network structure is complex, and there are many constraints between network components. Although industry and academic institutions have done some research on heterogeneous network fault diagnosis, network fault diagnosis that ensures the quality of network operation remains a future research hotspot.

III. METHODOLOGY
A. GRAPH DEEP LEARNING FOR FEATURE EMBEDDING OF INTERNET COMMUNICATION NETWORKS
The deep learning model is essentially a neural network with more hidden layers, usually more than eight or nine. Figure 1 shows the architecture of the simplest communication network fault detection and location method.
Each neuron is described by the following quantities: x_i is the input from the ith neuron, w_i is the connection weight of the ith neuron, δ is the bias of the current neuron, and t is the activation function. The error back propagation algorithm, also known as the back propagation algorithm, is the most commonly used algorithm for updating model parameters in neural networks. Given the training data set D = {(x, y)}, the following derivation takes the connection weight W from the last hidden layer to the output layer as an example. For the training sample (x, y), q is the number of neurons in the last hidden layer. Let ŷ denote the output of the neural network for this sample; the mean square error of the network at (x, y) is then measured between ŷ and y.

According to their coverage and transmission power, low-power base stations can be divided into Femto base stations, Pico base stations, and relay nodes. With the installation and use of various types of low-power base stations, a dense heterogeneous network architecture is gradually formed, as shown in Figure 2.
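The neuron model and the mean square error described in this subsection can be illustrated with a minimal sketch. It assumes a sigmoid activation, a bias that is subtracted from the weighted sum, and an error of the form one half of the summed squared differences; none of these choices is stated explicitly in the text, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation (an assumed choice for the activation function t)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs minus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) - bias
    return activation(z)


def mean_square_error(predicted, target):
    """Half the sum of squared differences over the output neurons (assumed form)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predicted, target))


def output_weight_gradient(hidden, predicted, target):
    """Gradient of the error with respect to each hidden-to-output weight.

    For sigmoid outputs: dE/dw[h][j] = (y_hat_j - y_j) * y_hat_j * (1 - y_hat_j) * b_h,
    where b_h is the output of hidden neuron h.
    """
    return [
        [(p - t) * p * (1.0 - p) * b for p, t in zip(predicted, target)]
        for b in hidden
    ]
```

In a back propagation step, each weight from the last hidden layer to the output layer would be moved against this gradient, scaled by a learning rate.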

B. FAULT DETECTION AND POSITIONING
The workflow for fault detection and positioning is illustrated in Figure 3. This section describes its mathematical process.
The output of the jth neuron in layer l, denoted x_j^l, is calculated from the weighted outputs of the neurons in the previous layer. The error between the output value of the neural network and the actual value is called the cost function, and a dedicated expression of the cost function is used for classification problems.

Fault diagnosis is an important part of virtual network fault management. By monitoring the parameters of network links and nodes, link and node symptoms caused by faulty components are discovered in time and reported to the management system. The main purpose of fault diagnosis is to obtain the fault probability hypothesis accurately and quickly from the network parameters, and to provide a warning for the fault-driven reallocation of virtual network resources, so as to ensure the QoS of network services.
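For the layer output and the classification cost function described at the start of this subsection, one common form is given below as a hedged illustration. It assumes the activation t and bias δ of Section III-A, a subtracted bias, and a cross-entropy cost over m samples and C classes with one-hot labels; the index ranges and the choice of cross-entropy are assumptions rather than the expressions of the original derivation.

```latex
% Output of neuron j in layer l; the minus sign on the bias is an assumption.
x_j^{l} = t\!\left(\sum_{i} w_{ij}^{l}\, x_i^{l-1} - \delta_j^{l}\right)

% Cross-entropy cost (assumed form) with one-hot labels y_{kc}
% and predicted class probabilities \hat{y}_{kc}.
J = -\frac{1}{m}\sum_{k=1}^{m}\sum_{c=1}^{C} y_{kc}\,\log \hat{y}_{kc}
```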
With the deepening of network virtualization, network fault diagnosis becomes more and more important. This section aims to use historical fault data parameters of physical networks and virtual networks to predict the probability of network faults in a future period. According to the structure of the neuron, the activation vector of the input gate is computed from the following quantities: w is the weight of transfer between neuron connections, x(t) is the input vector of the network, and x(t − 1) is the corresponding quantity at the previous time step. The output of the memory storage structure is then obtained. At time t, a historical data set of length n can be obtained, where x(t) is the input parameter matrix and n represents the length of the associated data used for fault prediction.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
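The historical data set of length n available at time t can be sketched as a sliding window over the recorded parameter vectors. The function names and the list-of-lists data layout are hypothetical, chosen only for illustration.

```python
def history_window(series, t, n):
    """Return the n most recent parameter vectors ending at time index t (inclusive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if t < n - 1 or t >= len(series):
        raise IndexError("not enough history for this window")
    return series[t - n + 1 : t + 1]


def all_windows(series, n):
    """Build one window per time index that has at least n records of history."""
    return [history_window(series, t, n) for t in range(n - 1, len(series))]
```

Each window would serve as one input sample for the fault prediction model, so a larger n gives the model more context at the cost of fewer usable samples.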
The network fault diagnosis model is denoted F, and it takes the network parameter feature matrix x(t) at time t as input. In the environment of a multi-memory storage structure, the input value is calculated accordingly.

The computer network fault detection model based on graph deep learning works by constantly adjusting the network fault parameters during detection. When a problem occurs in a computer network system, the normal operation of the computer system is affected. The computer network fault detection method based on graph deep learning is mainly intended to inspect the entire computer system automatically before the network fails, looking for the places where a fault may occur. The constraint information of the point pairs in the data set can be represented by an n × n constraint information matrix K, whose elements encode the constraint on each point pair. Similarly, the Lagrange multiplier method can be used to convert the problem into its dual problem.

Through the above steps, the forward propagation and loss function of graph deep learning are calculated. To obtain the optimal network parameters, the network parameters are updated based on the back propagation algorithm. The specific implementation steps of the network fault diagnosis method based on LSTM are as follows:
Step 1: Based on the analysis of network parameters, collect the network data, complete the pre-processing, and divide the data into a training data set and a test data set.
Step 2: Apply one-hot coding to the network faults.
Step 3: Establish the graph deep learning fault diagnosis model, and randomly initialize the parameters of each layer of the network.
Step 4: Calculate the output label through forward propagation, calculate the loss function of the LSTM neural network, propagate the error backward with the back propagation algorithm, and update the weights of the network. Repeat this step for a predefined number of iterations.
Step 5: Save the network parameters and test the performance of the trained graph deep learning model with the samples in the test data set.
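The one-hot coding of Step 2 can be sketched as follows. The fault type names in the usage note are hypothetical placeholders, and ordering the classes alphabetically is an assumption made only so that the encoding is deterministic.

```python
def one_hot_encode(labels):
    """Map each fault label to a 0/1 vector with a single 1 at its class index."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = [
        [1 if index[label] == i else 0 for i in range(len(classes))]
        for label in labels
    ]
    return classes, vectors
```

For example, encoding the hypothetical labels `["link_down", "normal", "link_down"]` yields two classes and one vector per sample, which can then be compared with the output layer of the model in Step 4.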
In addition, to prevent overfitting, early stopping is used during training. The stopping condition is that the loss of the model on the test data set no longer decreases; training is then stopped and the parameter values of the previous round are output.
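The early stopping rule can be sketched as a small training loop. The callables `train_step` and `eval_loss` are hypothetical stand-ins for one round of training and for the loss evaluation on the held-out data, and the `patience` parameter is an added assumption; with `patience=1` the loop stops as soon as the loss fails to decrease and returns the parameters of the best earlier round.

```python
def train_with_early_stopping(train_step, eval_loss, max_iterations, patience=1):
    """Run training rounds until the evaluation loss stops decreasing.

    train_step(iteration) returns the parameters after that round;
    eval_loss(params) returns the loss of those parameters on held-out data.
    """
    best_loss = float("inf")
    best_params = None
    stale_rounds = 0
    for iteration in range(max_iterations):
        params = train_step(iteration)
        loss = eval_loss(params)
        if loss < best_loss:
            best_loss, best_params, stale_rounds = loss, params, 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # loss no longer decreases: keep the earlier parameters
    return best_params, best_loss
```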

IV. CASE VERIFICATION
The experimental data used here is simulated. To better compare graph deep learning with error baseline learning, it is assumed that 80,000 records of the experimental data are normal and 30% are abnormal, that is, 30% of the error baseline is satisfied. The training set is kddcup.data_10_percent_corrected, and the test set is corrected. Because the training and test sets contain too much data, only 100,000 records are randomly selected from kddcup.data_10_percent_corrected as the training set, of which 80,000 are normal and 20,000 are abnormal. Similarly, 30,000 records are randomly selected from corrected as the test set, of which 24,000 are normal and 6,000 are abnormal. When selecting data, records for every exception type are included as far as possible. Only the distinction between normal and abnormal is of concern here, regardless of the specific exception type.
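The record selection described above can be sketched as a random draw with fixed numbers of normal and abnormal records, followed by relabeling to a binary target. The sketch assumes each record is a `(features, label)` pair whose label has already been cleaned to the string `"normal"` for normal traffic; the function name, the seed, and this data layout are hypothetical, and the sketch does not enforce coverage of every exception type.

```python
import random


def sample_binary_subset(records, n_normal, n_abnormal, seed=0):
    """Draw n_normal normal and n_abnormal abnormal records and relabel them as 0/1."""
    rng = random.Random(seed)
    normal = [r for r in records if r[1] == "normal"]
    abnormal = [r for r in records if r[1] != "normal"]
    chosen = rng.sample(normal, n_normal) + rng.sample(abnormal, n_abnormal)
    rng.shuffle(chosen)
    # Only normal versus abnormal matters, so the specific exception type is dropped.
    return [(features, 0 if label == "normal" else 1) for features, label in chosen]
```

With the sizes given in the text, the training subset would be drawn with `n_normal=80000` and `n_abnormal=20000`, and the test subset with `n_normal=24000` and `n_abnormal=6000`.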
The comparison results of standard deep learning and graph deep learning are shown in Table 1. The dimension reduction results of principal component analysis are shown in Figure 4. The bars represent the contribution rate of each single principal component, and the broken line represents the cumulative contribution rate. The higher the sequence number, the smaller the single contribution of the principal component, and the smaller its impact on the cumulative contribution rate. Generally, a cumulative contribution rate of not less than 96% is considered a satisfactory result. When the number of principal components is 7, the cumulative contribution rate reaches 98.45%.
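The choice of the number of principal components from the cumulative contribution rate can be sketched as follows. The function name is hypothetical, and the input is assumed to be the list of eigenvalues (component variances) produced by a principal component analysis.

```python
def components_for_threshold(eigenvalues, threshold=0.96):
    """Return the smallest number of components whose cumulative contribution
    rate reaches the threshold, together with that cumulative rate."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total  # single contribution rate of this component
        if cumulative >= threshold:
            return count, cumulative
    return len(ordered), cumulative
```

Because the components are sorted by decreasing eigenvalue, each additional component adds less to the cumulative rate than the one before it.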
When no pairwise constraint is provided (the number of constraint pairs is 0), the CRI of all algorithms is small. As the number of constraint pairs increases, the CRI of the graph deep learning algorithm on the New-thyroid data set decreases, but the CRI of the other algorithms on all the data sets increases to some extent. This indicates that the clustering performance of most algorithms gradually improves as the number of constraint pairs increases. The improved graph algorithm has better clustering performance than the classical graph algorithm. Across data sets, the proposed algorithm performs better than the other algorithms, especially on the (2) class data sets. This also shows that the algorithm in this paper is not very sensitive to the data set and has high stability and robustness, as shown in Figure 5.
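The constraint pairs whose number is varied in this experiment can be represented by the n × n constraint information matrix K. The sketch below assumes the common encoding of 1 for a must-link pair, -1 for a cannot-link pair, and 0 for an unconstrained pair, which is not confirmed by the text. It applies the constraints to a similarity matrix directly and does not implement the propagation of constraint information; all names are hypothetical.

```python
def build_constraint_matrix(n, must_link, cannot_link):
    """Return an n x n matrix: 1 for must-link pairs, -1 for cannot-link pairs, 0 otherwise."""
    K = [[0] * n for _ in range(n)]
    for i, j in must_link:
        K[i][j] = K[j][i] = 1
    for i, j in cannot_link:
        K[i][j] = K[j][i] = -1
    return K


def adjust_similarity(S, K):
    """Overwrite the similarity of constrained pairs; unconstrained pairs keep their value."""
    adjusted = [row[:] for row in S]
    for i in range(len(S)):
        for j in range(len(S)):
            if K[i][j] == 1:
                adjusted[i][j] = 1.0   # must-link: treat the pair as maximally similar
            elif K[i][j] == -1:
                adjusted[i][j] = 0.0   # cannot-link: treat the pair as dissimilar
    return adjusted
```

With zero constraint pairs K is all zeros and the similarity matrix is left unchanged, which corresponds to the unconstrained baseline in the comparison.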
The experimental results of the computer network fault detection method based on graph deep learning and of the traditional network fault detection method are shown in Figure 6. The horizontal coordinate is the fault detection time, and the vertical coordinate is the fault detection accuracy. As can be seen from Figure 6, the method based on graph deep learning detects computer network faults quickly and accurately, whereas the accuracy of the traditional method increases with detection time, so it needs a long detection time to ensure accuracy. The experimental results show that the proposed detection method is effective.
Figure 7 compares the fault diagnosis accuracy of five different algorithms after averaging over multiple iterations. Fault diagnosis accuracy refers to the percentage of correctly diagnosed cases in the total number of diagnoses. As can be seen from the figure, when the number of iterations is small, the accuracy of KNN is higher than that of the other algorithms. This is because the graph deep learning algorithm has far more weight and bias parameters than the KNN algorithm, and multiple iterations are needed to update these initial parameters, so the accuracy is low at the beginning. As the number of iterations increases, the accuracy of the proposed two-stage graph deep learning algorithm becomes much higher than that of the KNN algorithm, eventually reaching 96.5%, which is slightly lower than that of the pure graph deep learning algorithm, while the accuracy of the IRBFG algorithm reaches only 92%. The accuracy of the traditional SBM algorithm is very low because it assumes that the prior probability follows a beta distribution, and it is very difficult to obtain an accurate prior conditional probability in complex heterogeneous networks.

As shown in Figure 8, when the number of historical data records is 200, the collected network state data cannot accurately reflect the changing trend of the network state, and the training accuracy is very unstable. As the number of historical data records increases, the accuracy on the training set gradually increases, the stability improves, and the accuracy on the test set improves significantly. When the number of historical data records reaches 400 and 500, performance barely improves further.

V. CONCLUSION
A two-stage fault diagnosis algorithm based on graph deep learning is proposed. The algorithm first reduces the dimension of the network characteristic parameters by combining mutual information, and selects the optimal combination of features. Then, by calculating the distribution similarity of time series data, the suspicious fault cells in the network are preliminarily screened. Finally, the fault diagnosis model based on graph deep learning is used to locate the fault causes of the suspicious fault cells. On the basis of analyzing the fault causes of heterogeneous wireless networks, and in view of the complex structure of heterogeneous wireless networks, the many factors affecting network faults, and the limited wireless resources, the minimum redundancy and maximum correlation method is used to select the network parameters that have a large impact on nodes, and the change characteristics of the timing distribution of network parameters are monitored to reduce consumption. Simulation results show that the method has low detection delay and high fault diagnosis accuracy. When designing the network fault detection system, this paper normalized all the data to the range [0, 1] without considering the specific meaning of the data, and ignored the fact that different features may influence the final results to different degrees. If different attributes were normalized to different ranges, or different weights were assigned to different attributes according to the specific meaning of the data, the detection effect might be better.

102268 VOLUME 11, 2023

She has been with the Nanjing Brain Hospital since 2016, where she is currently an Assistant Engineer. From 2016 to 2022, she participated in the hospital grade three evaluation work, the online testing of the hospital integrated operation management system, the implementation and use of the hospital integrated operation management system, procurement data analysis, and computer parts maintenance. She has hosted the "Supplibao Software Vendor Side User Operation Training." Her research interest includes the hospital electronic information engineering network construction.

FIGURE 1. Technical architecture diagram of communication network fault model of deep learning.

FIGURE 2. Network architecture of communication network fault detection and location.

FIGURE 3. Flowchart of deep learning communication network fault detection and location model.

FIGURE 4. Dimension reduction results of principal component analysis.

FIGURE 5. Comparison of CRI indexes of deep learning algorithm results in various graphs.

FIGURE 6. The verification results of deep learning algorithm are shown in this paper.

FIGURE 7. Comparison of fault diagnosis accuracy of different algorithms.

FIGURE 8. Model training accuracy under different time steps.

XIAOYU WANG was born in Nanjing, Jiangsu, China, in 1984. He received the double bachelor's degree from the Nanjing University of Political Science, in June 2009. He is currently an on-the-job Postgraduate in the Provincial Party School of Jiangsu Province. He has been with the Nanjing Brain Hospital since 2009, where he is also an Intermediate Engineer. From December 2019 to December 2021, he was responsible for the research and procurement, project implementation, and acceptance of projects such as HIS System and Platform Docking and Transformation, HIS+EMR Provincial Platform View, Sunshine Monitoring Platform Docking of High-Value Medical Consumables, and Brain Function Information Management Platform. He has published a paper in a magazine sponsored by Jiangsu Radio, Film and Television Group: "Application of HIS System in Hospital Informatization Construction." His research interest is the construction of hospital electronic information engineering network.

ZIXUAN FU graduated from the Binjiang College, Nanjing University of Information Science and Technology, in June 2010.

XIAOFEI LI was born in Dalian, Liaoning, China, in 1981. She received the dual bachelor's degrees in applied chemistry and in software engineering from the Dalian University of Technology, in July 2004 and July 2005, respectively. Since August 2005, she has been with Dalian COSCO Shipping Engineering Company Ltd. Since March 2011, she has been with the Nanjing Brain Hospital. She was an assistant engineer, an intermediate engineer, and a senior engineer, in 2006, 2012, and 2021, respectively. She was the Chief of the Equipment Section, in 2014; the Information Section, in 2018; and the Equipment Section again, in 2020. During her tenure, she has presided over the revision of the medical equipment management system of the Nanjing Brain Hospital many times, organized the re-evaluation of department-level hospitals, and established and improved the management system of the Medical Equipment Department. She has introduced MRI, CT, DSA, surgical navigation systems, surgical microscope, color Doppler ultrasound diagnostic instrument, and other important instruments and equipment for the hospital. In 2015, she presided over and participated in the on-line implementation of the medical materials management and fixed asset management modules of the hospital HRP system. In 2016, the management of high-value medical consumables in the hospital was promoted to a new level, and the scanning billing and closed-loop management of high-value medical consumables in the hospital were realized. During her tenure as the Head of the Information Department, she presided over the formulation of the hospital information construction planning scheme and the construction of a number of important hospital information systems, such as the hospital information integration platform, electronic medical record systems, image cloud platform, and OA systems. She has published two articles in Chinese core journals. Ms. Li is a member of the Clinical Medical Engineering Branch, Nanjing Medical Association; the Nanjing Medical Consumables Management Quality Control Center; the Information Branch, Nanjing Medical Association; and the Hospital Logistics Supply Chain (SPD) Big Data Application Branch, China Health Information and Health Medical Big Data Society.

TABLE 1. Comparison of standard deep learning and graph deep learning experiments.

TABLE 2. Experimental results of running time.

The experimental results of deep learning are shown in Table 2. The first row of the table uses only the 31-dimensional data after numerical conversion and uses standard SVM for training. The second row uses the 31-dimensional data after numerical conversion and normalization, and uses standard SVM for training. The third row uses the 23-dimensional data after numerical conversion, normalization, and dimensionality reduction, and uses standard SVM for training. The fourth row also uses the 23-dimensional processed data, and uses the graph deep learning algorithm proposed in this paper for training. The historical data number refers to the number of samples entered into the model at a time. With the full sample, range factor 4, and varying numbers of historical data records, the change of accuracy in the training process is shown in Figure 8.