A Pipeline Leak Detection and Localization Approach Based on Ensemble TL1DCNN

There is an increasing need for timely pipeline leak detection and localization methods, since a pipeline leak can lead not only to the loss of the transported goods but also to considerable environmental and economic damage. With the rapid development of hardware and software, pipeline leak detection and localization algorithms have been widely researched and applied in many fields. However, traditional methods usually rely on manual feature extraction, which is inefficient and time-consuming. The convolutional neural network is an effective method for extracting features automatically. In this paper, a novel ensemble transfer learning one-dimensional convolutional neural network (TL1DCNN) for pipeline leak detection and localization is proposed. The TL1DCNN plays the role of the base learner, and the results of a set of base learners are integrated to accomplish the detection and localization task. Firstly, one-dimensional convolutional neural network (1DCNN) models with different parameters are pretrained on source-domain data, and a small learning rate is then used to retrain these models on target-domain data to obtain the TL1DCNN base learners. Then, four ensemble strategies with different numbers of base learners, whose ensemble weights are optimized by the particle swarm optimization algorithm, are obtained by minimizing the sum of similarity. A dataset simulated with a pipeline network model is used to evaluate the effectiveness of the proposed approach. Indicators such as classification accuracy, precision, recall, F_score and the confusion matrix are used to compare the proposed approach with traditional methods and other deep learning methods. The experimental results show that the performance of the proposed approach is superior to that of the compared methods.


FIGURE 1. Classification of leak detection methods.
fiber [5]-[10]. However, when hardware-based methods are applied on a large scale, expensive equipment needs to be installed, which results in high economic costs. Since some variables (such as pressure, volume and mass) usually need to be measured for software-based methods, pipeline hydraulic dynamics or machine learning methods are adopted to build models that detect and locate pipeline leaks [11]-[15]. Although many pipeline leak detection and localization tasks can be solved by traditional software-based methods, there are still many limitations. For example, the extraction of suitable features usually relies on strong professional knowledge and experience, which is an important factor for obtaining the best performance. Moreover, manually extracted features capture only shallow information, which is difficult to use in complex systems. These disadvantages limit the classification and regression performance of traditional software-based methods.
More recently, deep learning has brought breakthroughs in computer vision, natural language processing and fault diagnosis [16]. The convolutional neural network (CNN) has attracted wide interest since the AlexNet model [17] won the ImageNet competition in 2012. Deep features can be extracted automatically from raw data without prior knowledge by a CNN, which generally consists of convolutional layers, pooling layers and fully connected (FC) layers. In the early stage, some researchers converted one-dimensional signals to two-dimensional images and then applied a two-dimensional convolutional neural network (2DCNN) for fault diagnosis [18]-[21]. Compared with the direct use of a one-dimensional convolutional neural network (1DCNN) model, the conversion of one-dimensional signals to two-dimensional signals may lose useful information and increase the computational cost. As a result, some scholars have recently paid more attention to the 1DCNN. Ince et al. [22] proposed a timely and accurate motor condition monitoring and incipient fault detection system based on a 1DCNN trained directly on raw data. Wang et al. [23] employed a multi-attention 1DCNN supported by an attention mechanism for wheelset bearing fault diagnosis. Kang et al. [24] used a 1DCNN to extract features from the original time series signals collected by accelerometers and fused a multilayer perceptron and a support vector machine, the latter of which produced the leak detection results from the extracted features. However, the application of CNNs has some shortcomings. For example, the training data and testing data should be in the same feature space and are assumed to follow the same distribution. Moreover, a large amount of labeled data is required. These assumptions are often difficult to satisfy in the real world. Recently, transfer learning [25], [26] has been utilized to avoid these problems and has been applied to fault diagnosis in many fields [27]-[30].
Nevertheless, transfer learning alone sometimes struggles to achieve outstanding performance on tasks with a small amount of data and many feature dimensions. Ensemble learning, which combines several learners, can reduce the risk of poor generalization and of falling into a local minimum [31]. As a result, ensemble deep learning algorithms have been applied to fault diagnosis in the literature [32], [33]. In addition, many satisfactory results can be obtained by ensemble methods that combine several base learners with different training data, parameters or structures [24], [34].
To accurately detect and locate pipeline leaks from small datasets, this paper proposes an approach in which the 1DCNN plays the role of the base learner and the results of several base learners are combined and optimized to achieve pipeline leak detection and localization. Additionally, the proposed approach eases the requirement of obtaining sufficient fault sample data and improves the accuracy of leak detection.
The contributions of this paper are summarized as follows: (1) A transfer learning one-dimensional convolutional neural network (TL1DCNN) for pipeline leak detection and localization is proposed. Several classification probability vectors from different TL1DCNNs are combined with optimized weights to improve the performance of pipeline leak detection and localization.
(2) To verify its effectiveness, the proposed approach is tested on a simplified water distribution network for leak detection and localization, where the TL1DCNNs are trained with different combinations of hyperparameters. The results show that the performance of the TL1DCNN model is greatly affected by the hyperparameters. In addition, several traditional leak detection approaches and other deep learning approaches are selected for comparison with the proposed approach. (3) The proposed method can process the signals collected by sensors to identify a leak and to locate the leak section roughly. (4) The double descent curve of deep learning models is verified with the leakage data.
The remainder of this paper is organized as follows: In the next section, the TL1DCNN for feature extraction and the ensemble strategy are described in detail. Section 3 analyzes and discusses the results and compares our method with other methods. Finally, the conclusion is given in Section 4.

A. TRANSFER LEARNING
In many fields such as fault detection, it is hard to obtain enough sample data to build models. Transfer learning is employed to overcome this problem: knowledge gained in solving one problem is applied to a different but related problem. Transfer learning techniques can be divided into inductive transfer learning, transductive transfer learning and unsupervised transfer learning. Pan and Yang [26] gave the following definition: given a source domain D_S with learning task T_S and a target domain D_T with learning task T_T, inductive transfer learning aims to improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where T_S ≠ T_T. For example, suppose the task is to identify pictures of cats, but only a few cat pictures and many dog pictures can be collected. When transfer learning is employed, the dog pictures are set as the source data and the cat pictures as the target data, and identifying dogs and cats are the source and target tasks, respectively. Many machine learning algorithms are used for transfer learning; in particular, deep learning models such as CNNs and RNNs have been widely applied to transfer learning systems in recent years [35], [36].
In this work, transfer learning is used to learn useful parameters from source-domain data, which can then be transferred to the target task with a small amount of target-domain data under other working conditions. The left part of Figure 2 shows the transfer learning process. The training process is expressed as follows: Step 1: The best-performing pretrained CNN model is obtained from the source-domain data and optimized by adjusting the hyperparameters.
Step 2: The parameters of the target model are initialized by the pretrained CNN model, and the number of neurons in the output layer is changed from the number of source label categories to the number of target label categories. Then, the target model is retrained with a small learning rate to fine-tune the model parameters until the value of the loss function becomes stable.
Step 3: Several TL1DCNNs with different parameters are obtained for the ensemble by repeating Step 1 and Step 2.
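The three steps above can be sketched with a minimal numpy network. This is an illustrative stand-in, not the paper's actual model: the layer sizes, random data, learning rates and epoch counts are all made up, and a tiny one-hidden-layer network replaces the full 1DCNN; only the transfer pattern (reuse the feature layer, replace the output layer, retrain with a smaller learning rate) mirrors the procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, W1, W2, lr, epochs):
    # One hidden layer (ReLU) + softmax output, trained by plain
    # gradient descent on the cross-entropy loss.
    for _ in range(epochs):
        H = np.maximum(0, X @ W1)            # hidden features
        P = softmax(H @ W2)                  # class probabilities
        G = P.copy()
        G[np.arange(len(y)), y] -= 1         # dE/dlogits for cross-entropy
        G /= len(y)
        dW1 = X.T @ ((G @ W2.T) * (H > 0))   # backprop through ReLU
        dW2 = H.T @ G
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2

# Step 1: pretrain on the (larger) source task with 7 categories.
Xs = rng.normal(size=(700, 20)); ys = rng.integers(0, 7, 700)
W1 = rng.normal(scale=0.1, size=(20, 32))
W2s = rng.normal(scale=0.1, size=(32, 7))
W1, W2s = train(Xs, ys, W1, W2s, lr=0.1, epochs=50)

# Step 2: transfer W1, replace the output layer (7 -> 19 categories),
# and fine-tune with a much smaller learning rate on target data.
Xt = rng.normal(size=(100, 20)); yt = rng.integers(0, 19, 100)
W2t = rng.normal(scale=0.1, size=(32, 19))
W1, W2t = train(Xt, yt, W1, W2t, lr=0.01, epochs=50)
```

Step 3 would simply repeat this procedure with different hyperparameter combinations to collect several base learners.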

B. TL1DCNN MODEL
The CNN is used as the feature extraction module for pipeline leak detection and localization, because CNNs have proven to perform well in computer vision, object recognition, fault diagnosis, etc. Unlike 2DCNNs, which usually process two-dimensional pictures, in this study one-dimensional time series signals are input into the 1DCNN model for feature extraction. Local features are extracted from the input data by the convolutional layers. Experience has shown that bottom layers usually capture general features such as edges and shapes, and upper layers build higher-level features from these local features [16]. The convolution operation is described by the following expression:

S_i(t) = ∑_{j=1}^{n} W_i(j) · X(t + j − 1) + b_i    (1)

where i, n and W denote the index, size and parameters of the convolutional kernel, respectively, X and S represent the input and output of the current convolutional kernel, and the bias of the i-th kernel is denoted by b_i. Krizhevsky et al. [17] regarded the ReLU activation function as a key factor in the success of the AlexNet model. The mathematical representation of ReLU is:

f(x) = max(0, x)    (2)

As the network depth increases, the eigenvalue distribution of each layer approaches the saturation interval of the activation function's output range, which leads to vanishing gradients. Batch Normalization (BN) returns the current eigenvalue distribution to a standard normal distribution, so that the eigenvalues fall in the interval where the activation function is more sensitive to its input. The feature maps output by the convolutional layers are downsampled by the pooling layers, which generally perform maximum pooling or average pooling. Thus, the size of the data and the number of parameters to be calculated can be reduced exponentially.
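The convolution and ReLU operations described above can be illustrated with a minimal numpy sketch; the input signal and kernel values here are made up for demonstration, and a real 1DCNN layer would apply many kernels over batches of signals.

```python
import numpy as np

def conv1d_relu(X, W, b):
    # Valid 1D convolution of signal X with K kernels of size n, then ReLU.
    # X: (L,), W: (K, n), b: (K,) -> feature maps of shape (K, L - n + 1)
    K, n = W.shape
    L = len(X)
    S = np.empty((K, L - n + 1))
    for i in range(K):                      # loop over kernels
        for t in range(L - n + 1):          # slide the kernel along the signal
            S[i, t] = np.dot(W[i], X[t:t + n]) + b[i]
    return np.maximum(0.0, S)               # ReLU: f(x) = max(0, x)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.array([[-1.0, 1.0]])   # a difference kernel: responds to a rising signal
b = np.array([0.0])
maps = conv1d_relu(x, W, b)
```

Each output value is the local dot product of the kernel with a window of the input plus the bias, clipped at zero, which is exactly the "local feature" a convolutional layer extracts.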
The FC layers are placed at the top of the CNN model after several alternating convolutional and pooling layers.
The key features extracted before the FC layers are passed through the softmax function, and the predicted probability vector is expressed as ŷ. The prediction error E is calculated by the cross-entropy cost function:

E = −(1/m) ∑_{m} ∑_{n} y_n log(ŷ_n)    (3)

where y_n and ŷ_n denote the label and the predictive output, respectively, n is the category index and m is the sample index. The purpose of training is to update the weights and biases of the network through the learning rate so as to minimize the prediction error. The update direction and step size of each epoch are calculated by the Adam optimizer.
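A short numeric sketch of the softmax output and the cross-entropy error for a single sample; the logit values are arbitrary illustrations, not outputs of the actual model.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y, y_hat):
    # E = -sum_n y_n * log(y_hat_n) for one sample; averaging over the
    # m samples of a batch gives the batch loss.
    return -np.sum(y * np.log(y_hat))

logits = np.array([2.0, 1.0, 0.1])   # raw FC-layer outputs (illustrative)
y_hat = softmax(logits)              # predicted probability vector
y = np.array([1.0, 0.0, 0.0])        # one-hot label
E = cross_entropy(y, y_hat)
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the true class, so driving E down pushes that probability toward 1.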

C. ENSEMBLE TL1DCNN APPROACH
Better performance can be achieved by combining multiple base learners in an ensemble. The key to an ensemble method is the difference between base learners: each base learner should be good but different. Generally, to make an ensemble method more effective, the data, parameters, structure or model type should differ. A parameter difference means that the base learners are trained from different combinations of model parameters, even with the same training set [34].
For a two-class classification problem with y ∈ {−1, +1} and real function f(x), assume the error rate of each base learner h_i(x) is e:

P(h_i(x) ≠ f(x)) = e    (4)

If N base learners are integrated by voting, the result is:

H(x) = sign( ∑_{i=1}^{N} h_i(x) )    (5)

Assuming that the error rates of the base classifiers are independent of each other, by the Hoeffding inequality the ensemble error rate is:

P(H(x) ≠ f(x)) = ∑_{k=0}^{⌊N/2⌋} C(N, k) (1 − e)^k e^{N−k} ≤ exp(−N(1 − 2e)² / 2)    (6)

As shown in (6), the ensemble error rate decreases rapidly as the number of base learners increases. The probability vectors from n TL1DCNNs with different parameters are multiplied by n weights optimized by particle swarm optimization (PSO), and the output of the proposed method is obtained by summing the n products. This process is shown in the right part of Figure 2, where equation (7) is represented in the lower part and equation (8) in the upper part:

Y = ∑_{i=1}^{n} w_i P_i    (7)

Result = argmax(Y)    (8)

where Y denotes the probability vector calculated by the ensemble method, w_i and P_i represent the i-th weight parameter and the probability vector from the i-th base learner, n is chosen by balancing the computational cost against the model performance, and Result is the category output by the proposed method.
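The voting-error bound and the weighted combination described above can be checked numerically with a short sketch; the error rate, weights and probability vectors below are illustrative values, and the real weights would come from PSO rather than being chosen by hand.

```python
import numpy as np
from math import comb, exp, floor

def majority_vote_error(e, N):
    # P(H(x) != f(x)): the vote fails when at most floor(N/2) of the
    # N independent base learners are correct.
    return sum(comb(N, k) * (1 - e)**k * e**(N - k)
               for k in range(floor(N / 2) + 1))

# The ensemble error drops quickly as base learners are added (e = 0.2)
# and stays below the Hoeffding bound exp(-N * (1 - 2e)**2 / 2).
e = 0.2
for N in (1, 5, 11, 21):
    assert majority_vote_error(e, N) <= exp(-N * (1 - 2 * e)**2 / 2) + 1e-12

def ensemble_predict(probs, weights):
    # Y = sum_i w_i * P_i ; Result = argmax(Y)
    Y = sum(w * P for w, P in zip(weights, probs))
    return int(np.argmax(Y))
```

For example, two base learners disagreeing on a two-class sample can be tipped either way by the weights, which is exactly the degree of freedom the PSO search exploits.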

III. EXPERIMENTAL RESULTS AND ANALYSIS
The proposed method is implemented in Python 3.7 with TensorFlow 2.1 and runs on Windows 7 with an Intel i5 CPU.
The following metrics are calculated to quantify the classification performance of the different methods:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F_score = 2 · Precision · Recall / (Precision + Recall)

where TP, TN, FP and FN are the abbreviations of True Positive, True Negative, False Positive and False Negative, respectively.
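These metrics can be computed directly from the four counts, treating one class as positive at a time; the example labels below are made up for illustration.

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    TP = np.sum((y_pred == positive) & (y_true == positive))
    TN = np.sum((y_pred != positive) & (y_true != positive))
    FP = np.sum((y_pred == positive) & (y_true != positive))
    FN = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

acc, pre, rec, f1 = classification_metrics([1, 1, 0, 0, 1, 0],
                                           [1, 0, 0, 1, 1, 0])
```

For the 19-category task, these per-class scores would be averaged over the classes to obtain the overall figures reported in the tables.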

A. EXPERIMENTAL DATA
The simulated dataset used for the proposed ensemble TL1DCNN method comes from Flowmaster V7, a well-known pipeline simulation software package. Flowmaster is a general 1D computational fluid dynamics code for modeling and analyzing fluid mechanics in complex piping systems of any scale. It uses a method-of-characteristics solver, which can accurately handle fluid transients in complex systems and systems-of-systems [37]. The structure of the pipeline is shown in Figure 3. In this case, the considered leak is the form caused by a pipe burst, characterized by a short duration but usually a large flow.
The pipeline system is composed of a pump providing water pressure of 10 ± 0.5 bar, a main pipeline with a length of 1 kilometer and a diameter of 400 millimeters, and two branch pipes with a length of 500 meters and a diameter of ... . It is usually assumed in many published papers that the velocity of the negative pressure wave is 1000 m/s [38], [39]; as a result, the negative pressure wave transmission speed is set to 1000 m/s in the simulation. The opening of ball valves is used to simulate the size of the leak, and a flowmeter is placed between the ball valve and the pipe to measure the leak flow. The pressure signals are collected along the pipeline while varying the pump water pressure, the leak aperture diameter, the leak position and the ball valve opening time. In this way, 500 pressure points are collected from each sensor, and each sample has 2,000 points obtained by connecting the signals from the four sensors end to end. Generally, a large leak is defined as one whose ratio of leak flow to pipe flow is larger than 5% [40]. Because the Flowmaster software generates noiseless data, the collected data can easily be labeled as a large or small leak by setting the leak flow rate. Under actual conditions, however, the collected signal always contains some noise, such as measurement noise and environmental noise. To simulate real working conditions and improve the robustness of the model, zero-mean standard normal noise is added to the collected pressure data. There are 9 leak sections, each 100 meters long, and leaks are divided into large and small according to whether the leak flow is larger than 5% of the pipeline flow.
In this case, there are 19 categories: one normal condition, 9 large leak conditions (Large S1-S9) and 9 small leak conditions (Small S1-S9). Random Gaussian noise between −0.5 bar and 0.5 bar is added to the data to simulate the real environment. There are 3,144 source samples, with 2,358 as the source training set and 786 as the source validation set, labeled S1 to S3 and no leak. The target dataset of 1,137 samples is composed of 717 samples with labels S4 to S9 and 420 samples extracted randomly from the source dataset. The target dataset is divided into a target training set of 730 samples, a target validation set of 122 samples and a target test set of 285 samples. Figure 4 shows the pressure signals of each category, and the information about the datasets is given in Table 1.

B. EXPERIMENTAL RESULTS
The source-domain task (T_S) is to establish a classification model of the source-domain data (D_S) with 7 categories, and the target-domain task (T_T) is to establish a classification model of the target-domain data (D_T) with 19 categories. The knowledge of the source task model can help train the target task because the data volume of D_S is much larger than that of D_T. Before the model is established, the data are preprocessed. Firstly, the data from each sensor are spliced end to end, and then standard normal noise is added to the spliced data. Finally, to speed up the convergence of the weight parameters, the data are standardized before being input to the neural network.
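The preprocessing steps can be sketched as follows; the pressure values here are random stand-ins for the Flowmaster signals, and the noise scale is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Four sensors, 500 pressure points each (simulated stand-ins here).
sensors = [rng.normal(10.0, 0.1, 500) for _ in range(4)]

# 1) Splice the four 500-point signals end to end into one 2,000-point sample.
sample = np.concatenate(sensors)

# 2) Add zero-mean Gaussian noise to mimic measurement/environmental noise.
noisy = sample + rng.normal(0.0, 0.05, sample.shape)

# 3) Standardize (zero mean, unit variance) to speed up convergence.
standardized = (noisy - noisy.mean()) / noisy.std()
```

In practice the standardization statistics would be computed on the training set and reused for the validation and test sets, so that no test-set information leaks into training.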

1) MODEL CAPACITY
Traditional computational learning theory holds that, for a given algorithm with a fixed amount of data, the performance of the model first improves and then worsens as the number of parameters increases, so the test risk follows a U-shaped curve. In deep learning, however, this rule is not entirely correct, and much of the literature argues that over-parameterization is necessary [41]-[43]. Goodfellow et al. [44] pointed out that there is little theoretical analysis of the general non-convex optimization problems in deep learning. The AlexNet model [17], which won the 2012 ImageNet competition, trained 60 million parameters on only 1.2 million images, so the computational learning theory of deep learning must be reconsidered.
Belkin et al. [45] proposed a double descent risk curve to modify the U-shaped curve and gave a formula for the location of the curve's peak: the product of the number of samples and the number of categories. They explained that model performance improves with over-parameterization beyond the peak because more parameters give the model more options to choose smoother and simpler functions. Therefore, the performance of the classifier can be improved by increasing the model capacity. Ba and Caruana [46] also showed that a flexible model can be mimicked by a simple model; in other words, the function inside a flexible model may not really be flexible, and model expressiveness is not the same as model complexity.
The pipeline leakage data are used to verify the conclusions of the above literature. The average sample loss first decreases and then increases as the number of parameters grows; after the peak, it decreases again and even becomes better than the sweet spot of the U-shaped curve, as shown in Figure 5. Therefore, the parameter magnitude of the subsequent models is set to 10^5.

2) SELECTION OF THE NUMBER OF PRESSURE SENSORS
In this study, it is found that the number and locations of the pressure sensors are very important for balancing the computational consumption and the model accuracy. Thus, sensor layouts with different numbers and locations are compared. Table 2 presents the 5 cases of sensor layout.
TL1DCNNs with the same structure and hyperparameters are built for Case 1 to Case 5. The confusion matrices of these five cases are shown in Figure 6. The accuracies of Case 1 to Case 4 are 74.04%, 78.60%, 81.75% and 84.91%, respectively, while the accuracy of Case 5 reaches 90.18%, more than 5% higher than each of the previous four cases. The likely reason is that if no sensor is installed on a certain pipeline and its signal is similar to that of other nearby pipelines, the model's prediction can be misled. Therefore, four sensors are selected to achieve acceptable accuracy while saving as much computational resource as possible.

3) SELECTION OF OPTIMAL TL1DCNN PARAMETERS
To find the model parameters with the best performance on the validation and test sets, the number of layers, learning rate, number of epochs and batch size are varied and the prediction accuracies compared. The combinations of these parameters are listed in Table 3. The model with the second set of parameters achieves the highest test-set accuracy at a small training cost, and this accuracy is similar to the training-set accuracy, which indicates that overfitting and insufficient model expressiveness are overcome to a certain extent. Thus, the parameters of No. 2 are used in the subsequent analysis.

4) SELECTION OF ENSEMBLE MODELS
Similarity is defined as the ratio of the number of identical predictive labels of two models to the total number of validation samples. The output of each of the five TL1DCNN models, whose structures and accuracies are shown in Table 4, is a probability vector, denoted P_1 to P_5. The epochs of the five TL1DCNN models are shown in Figure 7. The first index of a probability vector points to the no-leak label, the second to Small S1 and the third to Large S1; by the same rule, the fourth to the nineteenth indices point to Small S2 through Large S9, respectively. For a probability vector, the predictive label of a sample is the label pointed to by the index with the maximum value, represented as I. The predictive result and the similarity of two models A and B can be defined as:

I = argmax_y P[y]

Sim(A, B) = (1/n) ∑_{k=1}^{n} 1(I_A^k = I_B^k)

where P[y] and n denote the y-th element of the P vector and the number of samples, respectively. The similarity matrices of the five models on the validation dataset are shown in Table 5. Among all strategies with the same number of base learners, the combined strategy with the smallest sum of similarity on the validation dataset is selected to form an ensemble case. According to this algorithm, four cases are obtained, namely, Case 1: TL1DCNN 3, TL1DCNN 4; Case 2: TL1DCNN 3, TL1DCNN 4, TL1DCNN 5; Case 3:
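The similarity computation and the minimum-sum-of-similarity selection can be sketched as follows; the probability vectors are toy stand-ins for P_1 to P_5, constructed so that two models always agree and a third always disagrees.

```python
import numpy as np
from itertools import combinations

def similarity(Pa, Pb):
    # Fraction of samples on which two models predict the same label
    # (labels are the argmax indices of the probability vectors).
    return float(np.mean(np.argmax(Pa, axis=1) == np.argmax(Pb, axis=1)))

def least_similar_subset(prob_list, k):
    # Among all k-subsets of base learners, keep the one with the
    # smallest sum of pairwise similarities (the most diverse ensemble).
    best, best_sum = None, float("inf")
    for combo in combinations(range(len(prob_list)), k):
        s = sum(similarity(prob_list[i], prob_list[j])
                for i, j in combinations(combo, 2))
        if s < best_sum:
            best, best_sum = combo, s
    return best

# Toy example: models 0 and 1 always agree, model 2 always disagrees.
P0 = np.array([[0.9, 0.1]] * 4)
P1 = np.array([[0.8, 0.2]] * 4)
P2 = np.array([[0.1, 0.9]] * 4)
picked = least_similar_subset([P0, P1, P2], k=2)
```

With only five base learners, exhaustively enumerating the subsets is cheap; a larger pool would call for a greedy or heuristic selection instead.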

C. RESULT ANALYSIS
To evaluate the performance of the proposed ensemble TL1DCNN model, it is compared with traditional methods, with an ensemble TL1DCNN using equal weights, and with a single TL1DCNN. Usually, empirical mode decomposition (EMD) and wavelet decomposition (WT) are employed to denoise the original signal. Then, features such as the mean, variance, standard deviation, root mean square, skewness, kurtosis, form factor, crest factor, impulse factor and margin factor [47], [48], defined in (15)-(24), are extracted from the preprocessed data. Finally, a support vector machine (SVM) and a back propagation neural network (BP) are used as the classifiers for comparison with the proposed method. As a result, four traditional methods, EMD-SVM, EMD-BP, WT-SVM and WT-BP, are considered for comparison.
where n denotes the number of samples and X_mea, X_var, X_std, X_rms, X_ske, X_kur, X_fof, X_crf, X_imf and X_maf denote the mean, variance, standard deviation, root mean square, skewness, kurtosis, form factor, crest factor, impulse factor and margin factor, respectively. The denoising effects are shown in Figure 9 and the extracted features in Table 6. The features are extracted with the above formulas from the denoised data and then used as inputs to the SVM and BP classifiers. On unbalanced data, BP tends to predict the labels of the minority classes as those of the majority classes; to overcome this shortcoming, the no-leak data are resampled to balance the categories. The comparison between the proposed method and the other methods is shown in Table 7. It should be noted that the 2DCNN method [19] in Table 7 was originally applied to a long straight pipeline, a different type of pipeline from the network considered here, and that the 2DCNN was also trained with the negative pressure wave data simulated by Flowmaster V7.
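The ten time-domain features can be computed as below. Note that the factor definitions follow common conventions and may differ in detail from equations (15)-(24) in the paper; the test signal is a made-up square wave chosen so every feature has a simple closed-form value.

```python
import numpy as np

def time_domain_features(x):
    # Ten statistical features used as classifier inputs.
    mea = x.mean()
    var = x.var()
    std = x.std()
    rms = np.sqrt(np.mean(x**2))
    ske = np.mean((x - mea)**3) / std**3          # skewness
    kur = np.mean((x - mea)**4) / std**4          # kurtosis
    abs_mean = np.mean(np.abs(x))
    fof = rms / abs_mean                          # form factor
    crf = np.max(np.abs(x)) / rms                 # crest factor
    imf = np.max(np.abs(x)) / abs_mean            # impulse factor
    maf = np.max(np.abs(x)) / np.mean(np.sqrt(np.abs(x)))**2   # margin factor
    return np.array([mea, var, std, rms, ske, kur, fof, crf, imf, maf])

feats = time_domain_features(np.array([1.0, -1.0, 1.0, -1.0]))
```

Each denoised pressure signal is reduced to this 10-dimensional vector before being fed to the SVM or BP classifier, which is exactly the manual feature-engineering step that the 1DCNN replaces.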
For the task of identifying no leak, small leak and large leak, the proposed method achieves better performance than other methods. At the same time, compared with the BP model, the proposed method does not need to deal with the problem of unbalanced data. In addition, the WT-SVM achieves better classification performance than the other three common methods.
Additionally, the proposed method is compared with the ensemble TL1DCNN with equal weights and with a single TL1DCNN. For small leaks, the precision, recall and F_score of the equal-weight ensemble TL1DCNN are 0.74%, 1.71% and 0.63%, respectively, which are 88.13%, 85.68% and 86.67% lower than those of the proposed method. For large leaks, the equal-weight ensemble TL1DCNN achieves 92.58%, 96.51% and 94.24%, which are 1.49%, 0.80% and 1.56% lower than the proposed method. For small leaks, the single TL1DCNN achieves 88.88%, 87.87% and 87.64% in these indexes, which are 0%, 0.48% and 0.34% higher than the proposed method. For large leaks, the single TL1DCNN achieves 90.37%, 94.11% and 91.44% in precision, recall and F_score, respectively, while the proposed method exceeds it by 3.20% in all these indexes; the behavior is similar for large leaks across all these methods. The proposed ensemble TL1DCNN method shows a great improvement for small leaks. Detailed information is given in (e) of Figure 6, in Figure 10 and in Table 8.
The base learners of the proposed method have good classification ability, and there are differences between them. The weights of the base learners are different and optimized, so the learner that contributes most to classification is given a greater weight.

IV. CONCLUSION
A pipeline leak detection and localization method based on the ensemble TL1DCNN is proposed in this study. The main contribution is that transfer learning is integrated into the 1DCNN, PSO is used to optimize the weights of several selected TL1DCNN base learners, and an ensemble TL1DCNN is finally obtained for pipeline leak detection and localization. Compared with traditional data-driven methods, features are extracted automatically rather than manually, and the performance of the model is improved by the ensemble approach. Moreover, the required dataset size is reduced by the transfer learning technique. Owing to these advantages, the proposed TL1DCNN model, trained on a small dataset, achieves good performance for pipeline leak detection and localization.
In this study, different combinations of model structure, learning rate, epoch number and batch size are evaluated to obtain the optimal parameters, and the one of the four ensemble strategies that achieves the best performance is selected. The effectiveness of the proposed method is evaluated on simulated data, on which it achieves 90.5%, 91.8% and 90.4% in precision, recall and F_score, respectively. Furthermore, the proposed method outperforms the enumerated traditional approaches, the equal-weight ensemble TL1DCNN and the single TL1DCNN. These results show that the proposed method has good prospects in the field of pipeline leak detection and localization.
MENGFEI ZHOU received the M.S. degree in chemical engineering from the Zhejiang University of Technology, Hangzhou, China, in 2004, and the Ph.D. degree in control science and engineering from Zhejiang University, Hangzhou, in 2010. He is currently an Associate Professor with the Zhejiang University of Technology. His research interests include process control, machine learning, and fault detection and diagnosis, especially leak detection and localization in pipelines.