A Deep Learning-Based Fault Diagnosis of Leader-Following Systems

This paper develops a multisensor data fusion-based deep learning algorithm to locate and classify faults in a leader-following multiagent system. First, sequences of one-dimensional data collected from multiple sensors of followers are fused into a two-dimensional image. Then, the image is employed to train a convolution neural network with a batch normalisation layer. The trained network can locate and classify three typical fault types: the actuator limitation fault, the sensor failure and the communication failure. Moreover, faults can exist in both leaders and followers, and the faults in leaders can be identified through data from followers, indicating that the developed deep learning fault diagnosis is distributed. The effectiveness of the deep learning-based fault diagnosis algorithm is demonstrated via Quanser Servo 2 rotating inverted pendulums with a leader-follower protocol. From the experimental results, the fault classification accuracy can reach 98.9%.

training of a great amount of data. Therefore, it is important to reduce data transmission and to design an appropriate data preprocessing method in deep learning-based fault diagnosis of multiagent systems. To reduce data transmission, distributed fault diagnosis has become a recent research interest, e.g., [31]-[35]. Identifying faults in one agent by training on data from its neighbour is a crucial objective of distributed fault diagnosis. Moreover, multisensor data fusion can reduce training complexity and training time. Image fusion can transform a number of one-dimensional sensor data sequences into a two-dimensional image. The image keeps rich information on the data; hence, the accuracy of fault diagnosis can be guaranteed. Training on the image rather than repeatedly on the one-dimensional data can enhance the real-time performance of the system. Image fusion was employed for CNN data preprocessing in [27], [28] and [30]. However, the systems under consideration in the abovementioned work are single systems. To the best of the authors' knowledge, there is no existing work on distributed fault diagnosis of multiagent systems through image fusion-based CNNs.
Multiagent systems with leader-following protocols are widely used in engineering fields, such as unmanned helicopters [36], [37], multi-inverted pendulums [38], battery packs [39] and liquid-level systems [40]. In the abovementioned work, faults were not considered in [36], [38], [39]. Only communication faults were discussed in [37]. Distributed fault detection was designed in [40]; however, fault classification was not a concern. The objective of this paper is to use an image fusion-based deep CNN to design distributed fault classification for a leader-following multiagent system considering actuator faults, sensor faults and communication faults. Specifically, the one-dimensional historical data collected from followers are converted into two-dimensional image information by a multisensor data fusion technique. Then, a deep CNN is trained on the image data. Through the training, the type and location of faults can be identified. The faults under investigation include communication interruption faults, sensor failure faults and actuator limitation faults. Furthermore, the three types of faults can exist in both the leader and the followers. Finally, a real experiment on a leader-following inverted pendulum demonstrates the effectiveness of the developed algorithms. The contributions of this article can be summarised as follows:
1. Image fusion-based deep learning for fault diagnosis of leader-following systems is a novel topic. Compared with a one-dimensional CNN, the relevance of different sensors can be preserved through a two-dimensional image fusion-based CNN. Therefore, the accuracy of fault diagnosis can be enhanced via the developed technique.
2. Actuator faults, sensor faults and communication faults in both leaders and followers are investigated in this article, which is a challenge for fault diagnosis. Many existing results only consider actuator and sensor faults or only communication faults. Classifying the three types of faults together is another contribution of this paper.
3. The developed fault diagnosis algorithms depend on data from followers rather than from both leaders and followers. In contrast to existing work on distributed fault diagnosis, such as [24]-[27], communication between leaders and followers is not required. Therefore, the designed fault diagnosis is fully distributed.
The organization of the paper is as follows: The system and faults under consideration are introduced in Section II. Multisensor fusion-based deep learning for the distributed fault diagnosis algorithm is developed in Section III. Section IV demonstrates real experimental work to apply the developed technique to Quanser Servo 2 rotating inverted pendulums. Section V presents the conclusion and future work.

II. THE SYSTEM DESCRIPTION AND THE PROBLEM STATEMENT

A. MULTI-AGENT SYSTEMS WITH UNKNOWN COMMUNICATION
The system under consideration is a networked homogeneous multiagent system under the leader-follower control protocol, and its topology is shown in FIGURE 1, where 0 represents the leader and 1, 2, . . . , N represent the followers. The communication among the N followers can be denoted by an undirected graph G with adjacency matrix A, where D = diag{d_1, . . . , d_N} is the degree matrix and L = D − A represents the Laplacian matrix of G. The above definitions are generally used in leader-following multiagent systems, such as in [41], [42]. We assume that there is a pre-existing control protocol such that the overall system is stable with the desired performance (e.g., consensus, robustness, etc.) in the absence of faults; the design of such control protocols has been widely investigated in other works, such as [41]-[44]. The aim of this paper is to design a fault diagnosis technique to detect and classify faults accurately. The design of the control protocol is not a concern of this article.
In the design of the fault diagnosis method, the physical model and communication are unknown. In other words, the physical model and communication are internal to the agents but are not available to the fault classifiers.
Remark 1: This method adopts distributed fault diagnosis, and each follower has a fault diagnosis device (NN_i, i = 1, 2, . . . , N), as shown in FIGURE 1. In the authors' previous work [49], a residual-triggered mechanism was proposed, where each agent has a predictor and a fault classifier. The predictor output and the actual output form a residual. When the residual generated in a follower exceeds the threshold, the fault diagnosis device on the corresponding follower is triggered to classify the fault. When the residual of the leader exceeds the threshold, the residual network in follower 1 is enabled by default. Therefore, the fault diagnosis task with N agents can be divided into a leader-follower distributed fault diagnosis mode with N − 1 pairs. Taking the formation mode of an artificial potential field as an example, a strong repulsion field is generated between followers only at a very short distance. Under normal circumstances, this influence is small, and only the leader produces a large attractive field on the follower. Therefore, this paper focuses on enhancing the fault classifier in the residual-triggered fault diagnosis developed in [49], namely, for the case in which a leader-follower pair is triggered.

B. THE TYPE OF FAULTS
Communication between agents can also be abnormal in networked systems, e.g., through the interruption of interactions or false data injection between agents. Therefore, the identification and classification of communication faults is crucial for leader-following systems.
In this paper, the faults under consideration include sensor failure, actuator limitation faults and communication interruption. Specifically, the sensor fault f_s(t) is modelled as f_s(t) = 0 for t < t_0 and f_s(t) = −y(t) for t ≥ t_0, where t_0 denotes the time at which the sensor fault occurs and y(t) is the system output without a sensor fault. This type of fault indicates that the sensor output remains zero, which can occur when the signal is open circuited [45].
An actuator limitation fault f_a(t) is modelled as a bound on the actuator output, where u(t) is the real input of the actuator and A is a constant that the actuator output cannot exceed due to abnormal hardware or software conditions. This type of fault occurs due to actuator stuck failures [46] and is recognised as one of the most important factors that reduce system performance.
A communication fault f_c(t) under investigation is an interruption fault that takes effect from time t_1, where t_1 is the time at which the interruption occurs and q(t) is the normal communication signal. This type of fault can occur due to a cyber-attack known as denial-of-service [47].
Remark 2: This paper adopts distributed fault diagnosis, which is used to identify faults between the leader and a follower. The main contribution of this paper is to investigate the fault diagnosis of leader-follower tracking. Moreover, the communication between followers is weak or absent. As a result, communication faults between followers are not considered in this paper. ''Follower communication failure'' means that the communication in which the follower sends its status to the leader is interrupted.
The actual signal is formulated as Y(t) = y(t) + f_σ(t), where Y(t) represents the real signal of the system, y(t) represents the theoretical signal of the system, and f_σ(t) represents one of the three types of fault. Take f_s(t) as an example. When there is no fault, f_s(t) = 0. When a fault occurs, f_s(t) = −y(t). As a result, the real output of the sensor equals 0.
Another challenge in fault diagnosis of multiagent systems is that the amount of training data is large. Moreover, the relation between data from different sensors should be included in the data collection. In the rest of the paper, a multisensor fusion-based deep learning fault classification technique is proposed to identify and locate sensor faults, actuator faults and communication faults accurately. To further reduce data transmission, the designed fault classifier can recognise a fault of the leader through the output of the follower without requiring their communication information or a precise mathematical model. A 2D image is generated through multisensor fusion. Then, a deep learning algorithm is designed to classify and locate faults. The diagnosis process is described in FIGURE 2. The diagram of the overall design can be found in FIGURE 3.
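The three fault types of this subsection can be illustrated with a short NumPy sketch. It is a minimal illustration rather than the authors' implementation: the function names and the test signal are hypothetical, the communication interruption is assumed to zero the transmitted signal in the same way as the sensor failure, and the actuator limitation is assumed to be a symmetric saturation at the constant A.

```python
import numpy as np

def sensor_failure(y, t, t0):
    """Sensor output stays at zero from t0 onwards (f_s = -y for t >= t0)."""
    f_s = np.where(t >= t0, -y, 0.0)
    return y + f_s

def communication_interruption(q, t, t1):
    """Transmitted signal is lost from t1 onwards (assumed f_c = -q for t >= t1)."""
    f_c = np.where(t >= t1, -q, 0.0)
    return q + f_c

def actuator_limitation(u, limit):
    """Actuator output cannot exceed the constant A (assumed symmetric saturation)."""
    return np.clip(u, -limit, limit)

t = np.arange(0.0, 1.0, 0.005)       # 0.005 s sampling time, as in the experiment
y = np.sin(2 * np.pi * t) + 2.0      # hypothetical fault-free signal
faulty = sensor_failure(y, t, t0=0.5)
```

Before `t0` the faulty signal equals the fault-free one; from `t0` onwards it is identically zero, which matches the open-circuit interpretation given above.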

III. DEEP LEARNING-BASED FAULT CLASSIFICATION
This section introduces the image-based sensor fusion method and the main types of faults. The image-based sensor fusion method takes advantage of the BP neural network in feature extraction and fault recognition accuracy.

A. IMAGE FUSION BASED ON MULTI-SENSOR-SIGNALS
Data preprocessing is the premise for deep learning-based fault diagnosis. Traditional data preprocessing methods for deep learning are known as normalisation and regularisation. Many modern systems, especially multiagent systems, have a number of sensors. Normalising and regularising data from multiple sensors considerably increases computational complexity and the relation among sensors can be missed. Therefore, this paper is motivated to develop an image fusion technique to transform data from a number of one-dimensional sensors into two-dimensional grey images. The fusion of multisensor data into a grey image for deep learning can improve the accuracy, richness, and efficiency of fault diagnosis. FIGURE 4 illustrates the process of converting one-dimensional data into two-dimensional image grey values.
Each sensor obtains M sampling points through the sliding window technique (see Section IV). The M sampling points are then converted into pixel values through Equation (5). Finally, the N sensor signals are fused into an image of size N × M. When noise is incorporated into the original signal, its influence on fault diagnosis depends on the size of the noise. If the noise is small, it will not affect the converted pixel values, because there is a rounding operation in the conversion. If the noise is large, it will result in wrong pixels, which reduces the performance of fault diagnosis. Normally, the noise incorporated into the data belongs to the former case. The grey value of each pixel is obtained via Equation (5), in which i = 1, 2, . . . stands for time iT, where T is the sampling time, and j represents the j-th sensor; signal(i, j) is the data collected from the j-th sensor at time iT. Meanwhile, i and j correspond to the row and column indices in the 2-D image, respectively; Value(i, j) stands for the grey value of the image at (i, j), and Round(·) is a rounding function, which ensures that every data point is converted into an integer grey value, so that no fractional value is left without a corresponding grey value.
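The conversion from sensor sequences to a grey image can be sketched as follows. The sketch assumes a min-max scaling of each sensor channel to the range 0-255 before rounding. This is a common choice for signal-to-image conversion, but the scaling is an assumption here and the function name is hypothetical. Rows index time and columns index sensors, following the convention for i and j above.

```python
import numpy as np

def fuse_to_grey_image(signals):
    """Map an (M, N) array of N sensor sequences to an 8-bit grey image.

    Row i holds the samples taken at time iT, column j holds sensor j.
    Assumption: each sensor channel is min-max scaled to [0, 255].
    """
    signals = np.asarray(signals, dtype=float)
    lo = signals.min(axis=0, keepdims=True)
    hi = signals.max(axis=0, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant channels
    value = np.round((signals - lo) / span * 255.0)
    return value.astype(np.uint8)

window = np.array([[0.0, 10.0],
                   [1.0, 20.0],
                   [4.0, 30.0]])            # M = 3 samples, N = 2 sensors
image = fuse_to_grey_image(window)
```

Because of the rounding step, perturbations that are small relative to one grey level usually leave the pixel unchanged, which is the noise argument made in the text.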

B. CONVOLUTIONAL NEURAL NETWORK-BASED DEEP LEARNING
Through the aforementioned image fusion, a 2-D image containing rich information of the system can be generated. Then, deep learning-based fault diagnosis can be designed by the fused image. The existing deep learning method for image recognition is to reshape the image to be nearly square before recognition. However, the adjacent features can be lost by reshaping. The sensor returns continuous sequence data, and local visual field features are important for feature extraction of convolution. The reshaping process reduces the accuracy of fault diagnosis. Therefore, this paper proposes a deep learning fault diagnosis algorithm without reshaping the image to increase the accuracy of fault diagnosis.
To train on a large-scale dataset, a CNN is used for deep learning. The CNN used in this paper is composed of an input layer, a convolution layer, a pooling layer, a batch normalisation layer, a fully connected layer and an output layer. The main function of the input layer is to preprocess the input data and convert data of different sizes into the same format and size. The processed image is called the feature map. Two methods of data preprocessing are used, i.e., padding and normalisation.

1) PADDING
Padding fills one or more rings of zeros around the image, which ensures that the height and width of the data do not decrease when the convolution is carried out. More importantly, the image edge information can be preserved. Without padding, the edge features are lost in the next layer, which affects the subsequent convolution. Generally, in convolutional neural networks, P is used to represent the number of padding rings. The schematic diagram of padding is shown in FIGURE 5.
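The effect of padding on the feature map size can be checked with a few lines of NumPy. The helper `conv_output_size` is a hypothetical name; it uses the standard relation (size − kernel + 2P) / stride + 1 for the output length along one axis.

```python
import numpy as np

def conv_output_size(size, kernel, padding, stride=1):
    """Output length along one axis for a convolution with zero padding P."""
    return (size - kernel + 2 * padding) // stride + 1

image = np.arange(12, dtype=float).reshape(3, 4)
# One ring of zeros around the image, i.e. P = 1.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0.0)

same = conv_output_size(40, kernel=3, padding=1)     # size preserved with P = 1
shrunk = conv_output_size(40, kernel=3, padding=0)   # size decreases without padding
```

With a 3 × 3 filter and stride 1, one ring of zeros keeps an axis of length 40 at length 40, whereas without padding it shrinks to 38.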

2) NORMALISATION
Normalisation reduces the value of each pixel to the range from 0 to 1 after preprocessing the data, which is conducive to data processing. The common normalisation methods are function transformation, Z score standardisation and max min standardisation.
The convolution layer is composed of a number of convolution kernels, which are also known as filters. The pixels of the image after padding and normalisation are input to the convolution layer. A schematic diagram of the convolution calculation process is presented in FIGURE 6 and Equation (6).
In FIGURE 6, the element a_{0,0} in the upper left corner of the feature map is calculated as follows:
\[ a_{0,0} = f\Big(\sum_{m}\sum_{n} W_{m,n}\, x_{m,n} + W_b\Big) \tag{6} \]
where x_{i,j} represents the grey value of the pixel at row i and column j, W_{m,n} represents the weight at row m and column n of the filter, and W_b represents the bias. f(·) is a leaky rectified linear unit (leaky ReLU) activation function, which is shown in Equation (7) and FIGURE 7. In Equation (7), c is a fixed constant with c ∈ (1, +∞). The main function of leaky ReLU is to add nonlinearities to the network model so that the network has a better feature recognition ability. It is widely accepted as a remedy for the vanishing gradient problem in the training process. Equation (7) is calculated as follows:
\[ f(x) = \begin{cases} x, & x > 0 \\ x/c, & x \le 0 \end{cases} \tag{7} \]
Remark 3: There are two common activation functions, namely the sigmoid and ReLU activation functions. When the output of the convolution layer is large, the gradient of the sigmoid function vanishes, which slows the convergence of the model. The ReLU function remains linear on (0, +∞). Therefore, the ReLU activation function alleviates the slow convergence of the neural network caused by vanishing gradients. However, the output of neurons can be negative, and all negative activations become zero under the ReLU function. Compared with the traditional ReLU function, which discards negative activations, leaky ReLU, a variant of ReLU, preserves them, hence enhancing the training performance.
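The convolution step and the leaky ReLU activation can be sketched in NumPy as below. This is an illustrative single-filter, single-channel version without padding; the function names and the value of the constant c are assumptions chosen for the example. As is usual in CNNs, the filter is applied without flipping (cross-correlation).

```python
import numpy as np

def leaky_relu(x, c=100.0):
    """Leaky ReLU with a fixed constant c > 1: x for x > 0, x / c otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, x / c)

def conv2d_valid(x, w, b, c=100.0):
    """Slide filter w over image x (stride 1, no padding), add bias b, activate."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    a = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over the local receptive field plus the bias.
            a[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return leaky_relu(a, c)

x = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4 x 4 grey image
w = np.ones((3, 3))                            # hypothetical 3 x 3 filter
feature_map = conv2d_valid(x, w, b=-50.0, c=10.0)
```

The upper left element of `feature_map` is negative before activation, so it is scaled by 1/c rather than set to zero, which is the behaviour that distinguishes leaky ReLU from ReLU.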
To obtain a more distinct feature map, a batch normalisation layer [48] is added after the convolution layer. The function of a batch normalisation layer is similar to the normalisation of data in the input layer. It normalises the features learned by the previous convolution layer, which is also conducive to feature learning by the next convolution layer. However, the output of the convolution layer is different from the input of the network. The output of the convolution layer is the extracted feature, while the input of the network is the non-extracted data. Therefore, the output of the convolution layer cannot be normalised directly via the method used for the input. For this reason, batch normalisation, which enables the network to learn and recover the feature distribution, is employed in this paper.
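A minimal sketch of the training-time batch normalisation computation is given below, assuming feature maps stored as (batch, channels, height, width). The learnable scale `gamma` and shift `beta` are what allow the network to recover a suitable feature distribution after normalisation. The running statistics used at inference time are omitted, and the array sizes are hypothetical.

```python
import numpy as np

def batch_norm(features, gamma, beta, eps=1e-5):
    """Normalise each channel over the batch and spatial axes, then scale and shift."""
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalised = (features - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * normalised + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=5.0, size=(8, 6, 4, 40))  # hypothetical conv output
out = batch_norm(features, gamma=np.ones(6), beta=np.zeros(6))
```

With `gamma = 1` and `beta = 0`, every channel of `out` has approximately zero mean and unit variance; during training these two parameters are updated together with the filter weights.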
Pooling is implemented for downsampling. Through pooling, useless information in the calculated feature map can be removed and the amount of data can be reduced to enhance the operation speed. Among widely used pooling methods, maximum pooling can effectively preserve and enhance the feature information in downsampling. Therefore, maximum pooling is used in the applied convolutional networks. An example of the method is shown in FIGURE 8.
The fully connected layer receives the feature map obtained after a series of operations, such as convolution, in flattened form. Different feature planes can be mapped to the same feature plane by the linear transformation of the fully connected layer, which is conducive to combining the features trained by different filters for analysis.
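Maximum pooling followed by flattening can be sketched as below. The pooling window of 2 × 2 with stride 2 is an assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping maximum pooling over size x size blocks of a 2-D feature map."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]          # drop rows/columns that do not fill a block
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(feature_map)               # keeps the largest value of each block
flattened = pooled.reshape(-1)                 # vector passed to the fully connected layer
```

Each 2 × 2 block is reduced to its largest entry, so the data volume drops by a factor of four while the strongest responses are kept.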
The output layer applies the softmax activation function to solve multiclassification problems. This layer maps the outputs of multiple neurons into the (0,1) interval, which can be regarded as the probability that the current output belongs to a category. Then, multiclassification can be carried out. The calculation method is given in (8), where a_i is the value of the i-th input node of the output layer and A_i is the corresponding output of the CNN:
\[ A_i = \frac{e^{a_i}}{\sum_{k} e^{a_k}} \tag{8} \]
All of the nodes in the previous layer are connected to each neuron node of the next layer by a weight. The overall structure of the convolutional neural network is shown in FIGURE 9.
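The softmax mapping of the output layer can be sketched as follows. Subtracting the maximum before exponentiation is a standard numerical safeguard that does not change the result; the seven scores are hypothetical values, one per fault label.

```python
import numpy as np

def softmax(a):
    """Map the output-layer inputs a_i to probabilities A_i that sum to one."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())        # shift for numerical stability
    return e / e.sum()

scores = np.array([0.2, 3.1, -1.0, 0.5, 0.0, 1.2, -0.3])   # hypothetical a_1..a_7
probabilities = softmax(scores)
predicted_label = int(np.argmax(probabilities)) + 1         # labels numbered from 1
```

The predicted fault type is the label with the largest probability.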
Remark 4: For the convolutional neural network, we modify the structural parameters of the classical LeNet-5 network, i.e., the first layer is composed of six 3 × 3 filters with a stride of 1, and the second layer consists of sixteen 3 × 3 filters with a stride of 1. To increase the performance of fault diagnosis, an additional layer with forty 3 × 3 filters is added to LeNet-5 in the developed technique.

A. SYSTEM AND FAULT DESCRIPTION
In this section, the developed deep learning fault diagnosis is applied to Quanser Servo 2 rotating inverted pendulums to demonstrate its effectiveness. The agents are connected through a leader-follower control protocol. The hardware-in-loop diagram is shown in FIGURE 10. We conduct fault diagnosis by collecting the data of the follower. The parameter of sensor shown in

B. DATA SAMPLE
The Quanser Servo 2 inverted pendulum is connected with MATLAB Simulink through USB, so that a hardware-in-loop experiment can be implemented. The real-time data of the inverted pendulum are collected from Simulink with a sampling time of 0.005 s. The storage of the software is limited, so the maximum length of the recorded data is fixed. However, sufficient data should be obtained to train the neural network such that its generalisability is satisfactory. To generate sufficient data from limited software storage, a sliding window data sampling approach is employed to augment the data. Specifically, a sampling window of length f moves over the collected data of length L, and the moving step is S.
In each fault scenario, real inverted pendulum data are collected over 29 s with a sampling time of 0.005 s. The number of sampling points is 5800, i.e., L = 5800. The number of sampling points in the sampling window is 800, i.e., f = 800. The sampling step is S = 1. Therefore, 5000 groups of data (n = 5000) are generated, which increases the amount of data by 114.28 times. However, the calculation ability of the software is limited. Therefore, the sampling time within the sliding window after data expansion is extended to 0.1 s, which means that the sample length is 40 [49].
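The sliding window sampling can be sketched as below. The function name and the `keep_every` parameter are hypothetical; keeping every 20th point inside a window corresponds to extending the sampling time from 0.005 s to 0.1 s. Counting every admissible start position gives (L − f)/S + 1 windows, so the exact number of groups depends on whether the final window is kept.

```python
import numpy as np

def sliding_windows(data, window, step=1, keep_every=1):
    """Cut overlapping windows from data along its first axis.

    data: array of shape (L,) or (L, N) with N sensors.
    window: number of raw samples per window (f).
    step: shift between consecutive windows (S).
    keep_every: keep every k-th sample inside each window (downsampling).
    """
    data = np.asarray(data)
    starts = range(0, len(data) - window + 1, step)
    return np.stack([data[s:s + window:keep_every] for s in starts])

recording = np.zeros((5800, 4))     # placeholder for 29 s of data from 4 sensors
samples = sliding_windows(recording, window=800, step=1, keep_every=20)
```

Each element of `samples` holds 40 points per sensor, i.e. one 4-channel sequence of length 40 that can then be fused into an image.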

C. EXPERIMENTAL RESULTS AND ANALYSIS
Different types of faults are injected into the system, and the data of 4 sensors in the follower are recorded. In each fault scenario, the data after expansion are used to train the neural network, so that the features of the fault can be extracted. The data are divided into three parts: 80% (training set) of the data is used to train the network and update the model weight parameters, and 8% (validation set) of the data is used to evaluate the model performance. The remaining 12% (test set) is used to test the accuracy of the final neural network. In this experiment, the fault classification performances of the BP neural network, the traditional CNN with image fusion, and the batch normalisation embedded CNN with image fusion are compared.
Remark 5: In this paper, 2-D images are considered in the data conversion. Higher dimensions are not considered because there are only four sensors for data acquisition in the experimental equipment, and it is difficult to increase the dimension of the data. Moreover, if the dimension of the data is increased, the sensors that obtain the data should have the same characteristics. For example, if a three-channel RGB image (n × n × 3) is used, the fault characteristics in the three channels should be similar to each other. Therefore, an increased dimension is not applicable in this experiment.

1) BP NEURAL NETWORK FAULT DIAGNOSIS
A BP neural network is trained on the original image. The image of size 4 × 40 × 1 is flattened into 160 inputs. The physical meaning of the data from sensor 1 (θ), sensor 2 (α), sensor 3 (θ̇) and sensor 4 (α̇) can be found in TABLE 1. The pair of sensors 1 and 3 is independent of the pair of sensors 2 and 4; sensor 1 is the integral of sensor 3, and sensor 2 is the integral of sensor 4. The structure of the BP network is shown in TABLE 3.
After 571 seconds of training, the training curves are obtained. The accuracy curve and loss function curve of BP network-based fault classification are shown in FIGURE 12 and FIGURE 13, respectively. To illustrate the performance of BP neural network-based fault diagnosis, the fault misclassification matrix is drawn in FIGURE 14.
The coordinate values from 1 to 7 in FIGURE 14 are the label numbers in TABLE 2, representing different fault types of the leader-following system. The number in each shaded cell is the number of samples whose actual label matches the predicted label. The BP neural network can roughly identify the seven types of faults, and the accuracy for types 3, 4 and 6 is good. Nevertheless, the accuracy for the other faults is low, especially for type 2 faults. The overall identification accuracy is only 85%.

2) CNN-BASED FAULT DIAGNOSIS
It can be seen that the fully connected network cannot achieve a satisfactory diagnosis performance, especially when the characteristics of the faults are close to each other. Therefore, the multisensor data fusion-based CNN algorithm is implemented to achieve fault classification.
First, the one-dimensional time domain signal data of different fault types are converted into two-dimensional images, and the different fault images are shown in FIGURE 15. The converted images are divided into three parts: 80% (training set) of the data is used to train the network and update the model weight parameters, and 8% (validation set) of the data is used to evaluate the model performance. The remaining 12% (test set) is used to test the accuracy of the final neural network. Then, we use the convolutional neural network, which has outstanding performance in image recognition, as the training network, and the specific network structure parameters are shown in TABLE 4.
After 280 seconds of training, the final network accuracy reaches 91.5%. The fault accuracy curve is shown in FIGURE 16, and the loss function curve is shown in FIGURE 17. By connecting the trained neural network to the inverted pendulum model for the experiment, the confusion matrix in FIGURE 18 is generated. Comparing FIGURE 18 with FIGURE 14, we find that the convolutional neural network has better fault diagnosis performance than the fully connected neural network. There is a certain misjudgment for individual faults, but the error rate is in an acceptable range. Compared with the fully connected neural network, the fault classification accuracy of the image fusion-based CNN has improved substantially. The accuracy is acceptable even for some faults with similar features.

3) CNN FAULT DIAGNOSIS WITH BATCH NORMALISATION
However, there are still fault data that cannot be identified accurately by traditional convolutional neural networks. To improve the recognition accuracy, we are motivated to further improve the feature extraction capability of the convolutional network. Therefore, we add a batch normalisation layer after each convolution layer, as introduced in Section III. The hierarchy of the network is documented in It can be seen from FIGURE 21 that, after 318 seconds of training, the recognition accuracy of the network has reached 99%, which is a considerable improvement compared with the BP network and the traditional CNN. Based on the developed image fusion-based CNN with batch normalisation, the fault classification accuracy of the seven types of faults is satisfactory.
Remark 6: The time complexity can be reflected in FIGUREs 12-20, and the space complexity can be reflected in TABLEs 3-5. In the complexity expression (11), M represents the edge length of the feature map and K represents the side length of each filter. C represents the number of outputs of the current layer, and l denotes the index of the current layer. D represents the depth of the network. We can conclude the following from the experimental parameters: time_bp > time_bn-cnn > time_cnn and space_bp > space_bn-cnn > space_cnn. The time complexity determines the training or prediction time of the model, and the space complexity determines the number of model parameters. The computational complexity of the BN CNN is medium among the three methods, and its accuracy is the best.
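For orientation, a widely used estimate of the time and space complexity of a convolutional network, written with the symbols of Remark 6, is sketched below. It is a standard textbook form given here as an assumption. The per-layer subscripts M_l and K_l and the term C_{l-1} (number of outputs of the previous layer) are notation introduced for this sketch.

```latex
\[
\mathrm{Time} \sim O\!\left( \sum_{l=1}^{D} M_l^{2}\, K_l^{2}\, C_{l-1}\, C_l \right),
\qquad
\mathrm{Space} \sim O\!\left( \sum_{l=1}^{D} K_l^{2}\, C_{l-1}\, C_l \;+\; \sum_{l=1}^{D} M_l^{2}\, C_l \right)
\]
```

In this form the first space term counts the filter weights and the second counts the stored feature maps, which is why the number of model parameters is governed by the filter sizes and channel counts listed in the network structure tables.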
It should be noted that at present, only two rotating inverted pendulums exist in our laboratory. Therefore, only one leader and one follower are used in the experiment. However, we are currently setting up heterogeneous manipulators (robotic arms with 4-6 degrees of freedom) and multiple unmanned aerial vehicles, which will be used in future research. Moreover, improvement of the network structure to enhance fault diagnosis accuracy is also under further investigation.

V. CONCLUSION
This research presents a distributed deep learning fault diagnosis technique for leader-following systems, where CNNs with batch normalisation and image fusion methods are integrated to enhance training efficiency and accuracy. Three typical faults, namely communication faults, sensor faults and actuator faults, are investigated in this article. For the leader-following system with communication coupling, the fault diagnosis of the leader can be achieved by observing the follower. Real experimental work has illustrated that the recognition rate of the developed fault diagnosis method is significantly superior to that of the BP neural network and the traditional CNN. In future research, multiagent systems with other topologies and improvements to the CNN to enhance fault diagnosis accuracy will be considered.