A Recognition Method for Multi-Radial-Distance Event of Φ-OTDR System Based on CNN

This paper proposes a multi-radial-distance event classification method based on deep learning. To the best of our knowledge, this is the first time that Φ-OTDR can tell how far a target event is from the sensing fiber through a deep learning approach. The temporal-spatial data matrix collected by the system is filtered by three different band-pass filters to form RGB images, which serve as the input of an Inception_V3 network pre-trained on the ImageNet dataset. The passbands of the three band-pass filters are selected by searching for the maximum Euclidean distance in the frequency domain, so that the three filters enhance the effective features of the data samples in advance. The simulated annealing (SA) algorithm is applied to search for the maximum Euclidean distance. A field experiment including five kinds of events at four different radial distances, 17 subclasses in total, has been carried out. The classification results show that the classification accuracy reaches 86% and that the method can tell both the event type and the radial distance.


I. INTRODUCTION
Distributed optical fiber sensing is a technology that collects signals through multiple sensing units along a single sensing optical fiber. The phase-sensitive optical time-domain reflectometer (Φ-OTDR) is a typical distributed optical fiber sensing system that can detect and locate external mechanical vibrations over long distances [1]. It is widely used in many fields, such as long-distance cable breakage monitoring, structural health monitoring, voice signal and human footprint detection, and city-wide behavior monitoring [2]-[5]. In early research, great efforts were made to optimize the performance parameters of the system, such as spatial resolution, sensitivity, dynamic range, signal-to-noise ratio and sensing dimensions [6]-[11]. Jiajing et al. [10] proposed a distributed acoustic source localization technique based on array signal processing, which enlarges the positioning dimension and realizes 2D and 3D sensing. Wu et al. [11] proposed a collaborative energy-based source localization method to estimate the vertical offset distance of a specific vibration source. As sensing performance has improved, event classification ability has become one of the bottlenecks for application and promotion. Sun et al. [12] extracted feature vectors from the morphological features of temporal-spatial signals and achieved over 90% classification accuracy for 3 types of vibration events. Wang et al. [13] extracted signal feature vectors by wavelet energy spectrum analysis and classified vibration events with a relevance vector machine. Xiong et al. [14] proposed a vibration event recognition method based on a nearest neighbor classification support vector machine. These methods require manually extracted signal features and a specifically designed recognition algorithm.
The associate editor coordinating the review of this manuscript and approving it for publication was Muguang Wang.
However, in different application scenarios, environmental noise, the quality of the laser light source and other unpredictable factors can make some signal features ineffective and degrade recognition performance, so the generalization ability of the above methods is limited. A convolutional neural network (CNN) does not require a manually designed algorithm for the classification task: it extracts features according to the sample distribution and fits the classifier automatically. Therefore, CNNs are suitable for vibration event recognition in the Φ-OTDR system. Wu et al. [15] proposed a classification method based on a one-dimensional CNN and a support vector machine, whose average recognition accuracy for five typical acoustic signals in an oil transportation monitoring application exceeds 98%. Wang et al. [16], [17] directly used the temporal-spatial data matrix of the Φ-OTDR as the CNN input and proposed a lightweight CNN that classifies 5 events with an accuracy of 96%. Li et al. [18] proposed a method of constructing an event classification network based on transfer learning, which can train the classification network quickly under relatively poor hardware conditions and achieves 96% classification accuracy on eight types of events. The sensing fiber is sensitive enough to detect vibration events that occur several meters away from it. However, previous research exploited only the detection ability near the sensing fiber (i.e., events occurring right above the buried fiber). Because soil does not transmit all frequencies equally, the detected signal of an event away from the sensing fiber is distorted. If such distorted data are added directly to the training dataset without accounting for the frequency-dependent attenuation, the classification accuracy of a deep network drops sharply.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper proposes a multi-radial-distance event recognition method for the Φ-OTDR distributed optical fiber sensing system, based on deep learning and taking the frequency transmission differences into account. Besides identifying different types of vibration events, we make full use of the sensing capability of the fiber to classify vibration events that occur at different radial distances from it. The radial and axial directions are defined in Fig.1. To the best of our knowledge, this is the first time that Φ-OTDR can tell how far a target event is from the sensing fiber through a deep learning approach. The temporal-spatial data matrix collected by the system is filtered by three different band-pass filters to enhance the event features and form RGB images, which are the input of an Inception_V3 network pre-trained on the ImageNet dataset [19], [20]. The passbands of the three band-pass filters are selected by searching for the maximum Euclidean distance between the frequency-domain representations of the vibration event classes. The simulated annealing (SA) algorithm is applied to search for the maximum Euclidean distance [21]. A field experiment covering five kinds of events at four different radial distances, 17 subclasses in total, has been carried out. The results show that the classification accuracy, which covers both the event type and the radial distance, is improved from 33.57% to 86.82%, and that data filtered by three different frequency bands are more discriminative than data filtered by a single frequency band.

II. THE RECOGNITION AND PASSBAND SELECTING METHODOLOGY
A. THE RECOGNITION METHODOLOGY
For completeness, the principle of Φ-OTDR is described below, and the main structure of the system is shown in Fig.2. A light pulse is injected into the sensing fiber and the Rayleigh backscatter traces (RBTs) returned from the fiber are recorded. These RBTs then compose a temporal-spatial data matrix. The row direction of the data matrix stands for the distance along the sensing fiber, and the column direction stands for the pulse repetition sequence, which can be treated as the temporal direction. Vibrations along the sensing fiber are detected and located by moving-average and differential methods. Because of the width of the probe pulse, a single vibration creates light intensity fluctuations over a range of positions in the received RBTs. Therefore, the spatial neighborhood around each vibration in the temporal-spatial data matrix (for example, a 40 m range covering both sides of the vibration) is extracted as an event data sample for subsequent classification by the CNN.
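The text names moving-average and differential processing for detection without giving its details. The NumPy sketch below shows one common variant under that assumption: successive RBTs are averaged to suppress noise, consecutive averaged traces are differenced to cancel the static Rayleigh pattern, and the position with the largest accumulated change is taken as the vibration location. The helper names, the 10-trace window and the half-width of 20 samples (about 20 m if one sample corresponds to roughly 1 m of fiber) are illustrative assumptions, and the data are synthetic.

```python
import numpy as np


def locate_vibration(traces, window=10):
    """Return the fiber index with the strongest temporal fluctuation.

    traces: 2-D array (num_traces, num_positions); each row is one RBT.
    """
    # Moving average along the temporal axis (down each column) suppresses
    # uncorrelated noise between successive RBTs.
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="valid"), 0, traces
    )
    # Differencing successive smoothed traces removes the static Rayleigh
    # pattern; only disturbed positions keep large changes.
    fluctuation = np.abs(np.diff(smoothed, axis=0)).sum(axis=0)
    return int(np.argmax(fluctuation))


def extract_sample(traces, center, half_width=20):
    """Cut the spatial neighborhood [center - half_width, center + half_width)."""
    start = max(center - half_width, 0)
    stop = min(center + half_width, traces.shape[1])
    return traces[:, start:stop]


# Synthetic data: 0.2 s of RBTs at a 10 kHz repetition rate over 200 positions.
rng = np.random.default_rng(0)
fs, num_traces, num_positions = 10_000, 2_000, 200
t = np.arange(num_traces) / fs
static = rng.uniform(0.5, 1.5, num_positions)  # frozen Rayleigh pattern
traces = static + 0.01 * rng.standard_normal((num_traces, num_positions))
# A hypothetical 50 Hz vibration at index 120, smeared over about +/-5 points.
positions = np.arange(num_positions)
envelope = np.clip(1 - np.abs(positions - 120) / 6, 0, None)
traces += 0.2 * np.sin(2 * np.pi * 50 * t)[:, None] * envelope[None, :]

location = locate_vibration(traces)
sample = extract_sample(traces, location)
print(location, sample.shape)
```

The differencing step is what makes the static speckle-like Rayleigh pattern, which is large but constant over time, drop out of the detection statistic.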
The recognition methodology is shown in detail in Fig.3 and includes five main stages: signal detection, data preprocessing, database preparation, offline training and online testing. The data matrix containing the vibration and its neighborhood is filtered in the temporal domain by three band-pass filters with different frequency bands and then scaled to the input size of the CNN. Labeled RGB images are constructed by concatenating the filtered data matrices and are then stored in a database. The Inception_V3 network pre-trained on ImageNet is trained offline on the prepared dataset; pre-training speeds up the training process and reduces the number of training samples required. Field data then go through the same process and are tested online to obtain their event types and radial distances.
In this study, not only the different types of vibration events but also their radial distances need to be identified. Fig.4 shows the spectra of the same event type, 'Digging with a shovel', at different radial distances. As shown in Fig.4, the effective frequency range lies below 310 Hz, and because the four curves belong to the same event type they are not clearly distinguishable over the whole band; they differ only in certain frequency bands. To distinguish the radial distances of vibrations of the same event type, three band-pass filters with different passbands are set up to capture the relative differences among signals at different radial distances. Filtering with these three filters produces a data matrix with a depth of three, which can be regarded as an RGB image. Three filters with different passbands highlight the differences between frequency bands while making full use of the whole effective frequency band.

B. THE PASSBAND SELECTION METHODOLOGY
Suitable passband selection improves the discrimination between event samples at different radial distances and hence the final classification. In this paper, the Euclidean distance is used to select the best filter bands. The Euclidean distance between two points $x$ and $y$ in $m$-dimensional space is defined as
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}.$$
Suppose the distance along the sensing fiber is $L$, the pulse repetition rate is $N$, and the vibration location is $l \in [1, L]$. Then the time-domain signal at location $l$ can be expressed as a column vector
$$R_l = [r_1(l), r_2(l), \ldots, r_N(l)]^{T},$$
where $r_n(l)$ is the value of the $n$-th RBT at location $l$. The frequency-domain representation of $R_l$ after the fast Fourier transform can be expressed as
$$F_l = \mathrm{FFT}(R_l).$$
A frequency-domain matrix containing all kinds of events can then be written as
$$M^{l}_{C\times N} = [F_l^{1}, F_l^{2}, \ldots, F_l^{C}]^{T},$$
where $C$ is the number of classes. The matrix $M^{l}_{C\times N}$ contains the frequency information of all event subclasses. The appropriate band-pass filters are selected by finding the maximum Euclidean distance among the spectra of all recorded samples. The Euclidean distance $D$ of $M^{l}_{C\times N}$ is evaluated under three filter bands with edges $b_1 < b_2 < b_3 < b_4$, i.e., $[b_1, b_2]$, $[b_2, b_3]$ and $[b_3, b_4]$. $b_1$ and $b_4$ are determined by the selected effective frequency range, for example 30 Hz and 310 Hz in Fig.4. A larger Euclidean distance means more obvious discrimination between event types.
To obtain the maximum Euclidean distance, the simulated annealing (SA) algorithm is applied. SA is a general optimization algorithm and an extension of local search that needs no auxiliary information (such as gradients) beyond the values of the objective function. The main idea of SA is to walk randomly in the search interval and to use the Metropolis sampling criterion so that the random walk gradually converges toward the optimal solution. Let $D_{\mathrm{MAX}}$ and $D$ be the current optimal target value and the new target value. The Metropolis sampling criterion is as follows: if $D > D_{\mathrm{MAX}}$, then $D$ becomes the current optimal value; otherwise, $D$ is accepted or discarded at random, and the probability of $D$ being accepted as the current optimal value is
$$P = \exp\left(-\frac{D_{\mathrm{MAX}} - D}{T}\right),$$
where the temperature $T$ is a control parameter that determines how quickly the random process moves toward a local or global optimal solution. The parameters required by the SA algorithm and their definitions are listed in Table 1. The steps for searching the maximum value of $D$ with the SA algorithm are shown in Fig.5.
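The exact three-band form of $D$ is not spelled out in the text, so the Python sketch below assumes one plausible objective: each class spectrum (a row of $M^{l}_{C\times N}$) is reduced to its summed magnitude in the three bands, and $D$ is the sum of pairwise Euclidean distances between these 3-element vectors. The SA loop applies the Metropolis criterion with geometric cooling (coefficient K) and random integer moves of the inner edges $b_2$ and $b_3$, keeping every band at least 10 Hz wide. The function names, the move size, the equal-width starting point and the toy spectra are hypothetical. The defaults mirror the paper's schedule ($T_0 = 10^{10}$, $T_{\mathrm{end}} = 10^{6}$, K = 0.8, 100 iterations per temperature), which suits objectives of order $10^8$; the toy call rescales the temperatures to its much smaller $D$.

```python
import math
import random

import numpy as np


def band_distance(spectra, freqs, edges):
    """Hypothetical objective D: summed pairwise Euclidean distance between the
    classes' band-energy vectors for band edges (b1, b2, b3, b4)."""
    bands = zip(edges[:-1], edges[1:])
    # energy[c, k] = spectral magnitude of class c summed over band k
    energy = np.stack(
        [spectra[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) for lo, hi in bands],
        axis=1,
    )
    diffs = energy[:, None, :] - energy[None, :, :]
    # Each unordered class pair appears twice in the full distance matrix.
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).sum() / 2)


def anneal_band_edges(spectra, freqs, b1=30, b4=310, min_width=10, t0=1e10,
                      t_end=1e6, k=0.8, iters_per_temp=100, step=20, seed=0):
    """Search integer inner edges (b2, b3) maximizing band_distance with SA."""
    rng = random.Random(seed)

    def neighbour(b2, b3):
        # Random local move, clipped so every band is at least min_width wide.
        b2 = min(max(b2 + rng.randint(-step, step), b1 + min_width), b4 - 2 * min_width)
        b3 = min(max(b3 + rng.randint(-step, step), b2 + min_width), b4 - min_width)
        return b2, b3

    def score(b2, b3):
        return band_distance(spectra, freqs, (b1, b2, b3, b4))

    current = (b1 + (b4 - b1) // 3, b1 + 2 * (b4 - b1) // 3)  # equal-width start
    current_d = score(*current)
    best, best_d = current, current_d
    temperature = t0
    while temperature > t_end:
        for _ in range(iters_per_temp):
            candidate = neighbour(*current)
            d = score(*candidate)
            # Metropolis criterion: accept improvements; accept worse moves
            # with probability exp(-(D_current - D) / T).
            if d > current_d or rng.random() < math.exp(-(current_d - d) / temperature):
                current, current_d = candidate, d
            if current_d > best_d:
                best, best_d = current, current_d
        temperature *= k  # geometric cooling with coefficient K
    return best, best_d


# Hypothetical spectra: 5 classes on a 1 Hz grid, one class boosted at 150-180 Hz.
gen = np.random.default_rng(1)
freqs = np.arange(0.0, 500.0, 1.0)
spectra = gen.uniform(0.0, 1.0, size=(5, freqs.size))
spectra[2, 150:180] += 5.0
(b2, b3), d_max = anneal_band_edges(spectra, freqs, t0=10.0, t_end=0.01)
print(b2, b3, d_max)
```

The sketch keeps the best point ever visited separately from the current point of the random walk, so a worse move accepted by the Metropolis rule never loses the best band edges found so far.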

III. DATA COLLECTION AND CLASSIFICATION EXPERIMENTS
A. DISTRIBUTED OPTICAL FIBER SENSING SYSTEM AND DATA COLLECTION
The composition of the distributed optical fiber sensing system used to collect data is shown in Fig.2. An ultra-narrow linewidth laser (NLL) with a linewidth of 3 kHz is used as the light source, and its continuous light is cut into probe pulses by an acousto-optic modulator (AOM). An erbium-doped fiber amplifier (EDFA) compensates for the insertion and transmission losses. The amplified probe pulse is injected into a kilometer-long sensing optical fiber through a circulator. The sensing fiber is a G.652 single-mode optical fiber with an external polyvinyl chloride coating and two wires for support and protection. After passing through the circulator, the RBTs are detected directly by a photodetector (PD). A data acquisition card (DAC) with a sampling frequency of 100 MHz records the changes in signal strength over time, and the data are then processed on a computer (PC). Five different kinds of events at the same location along the sensing fiber are tested and recorded: Background (No. I), Jumping (No. II), Digging with a shovel (No. III), Walking (No. IV) and Brick fall (No. V). For the last four kinds of events, four experiments were carried out at 1 m radial-distance intervals, i.e., at radial distances of 0 m, 1 m, 2 m and 3 m, respectively. The experiments were conducted by different people at different times, which improves the robustness of the experimental data. Limited by the experimental conditions, the sensing fiber is buried in the soil at a depth of about 5 cm. The real soil condition and the experimental environment are shown in Fig.7. To reduce the influence of coherent fading on event classification, two different probe pulse widths, 100 ns and 200 ns, are applied, giving spatial resolutions (SR) of 10 m and 20 m, respectively. Data acquired with different pulse widths are treated equally and labeled as the same event.
The pulse repetition rate is 10 kHz. RBTs within 1 s and within 20 m on each side of the vibration position (40 m in total) are extracted as one sample. The composition of each dataset is detailed in Table 2. The ratio of the training set to the validation set is 1:3.
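The acquisition parameters above fix the physical size of each sample. The short Python sketch below works through that arithmetic, assuming a group index of 1.468 for the G.652 fiber (a typical value for standard single-mode fiber in the C-band, not stated in the text). It shows that 100 ns and 200 ns pulses give roughly 10 m and 20 m spatial resolution, that 100 MHz sampling corresponds to about 1 m of fiber per sample, and that a 10 kHz repetition rate leaves room for about 10 km of fiber.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s
GROUP_INDEX = 1.468       # assumed group index of G.652 fiber (not from the text)


def spatial_resolution(pulse_width_s, n=GROUP_INDEX):
    """Fiber length covered by one probe pulse; the round trip halves it."""
    return C_VACUUM * pulse_width_s / (2 * n)


def metres_per_sample(sampling_rate_hz, n=GROUP_INDEX):
    """Fiber length between two consecutive DAC samples of one RBT."""
    return C_VACUUM / (2 * n * sampling_rate_hz)


def max_fiber_length(repetition_rate_hz, n=GROUP_INDEX):
    """Longest fiber whose RBT returns before the next pulse is launched."""
    return C_VACUUM / (2 * n * repetition_rate_hz)


for tau in (100e-9, 200e-9):
    print(f"{tau * 1e9:.0f} ns pulse -> {spatial_resolution(tau):.1f} m resolution")

dz = metres_per_sample(100e6)
traces_per_sample = int(10_000 * 1.0)   # 10 kHz repetition rate x 1 s
positions_per_sample = round(40 / dz)   # 40 m spatial window
print(f"{dz:.2f} m per sample, sample matrix {traces_per_sample} x {positions_per_sample}")
print(f"maximum fiber length at 10 kHz: {max_fiber_length(10e3) / 1e3:.1f} km")
```

Under these assumptions, each one-second sample is a matrix of 10,000 RBTs by roughly 39 to 40 spatial points before it is resized for the CNN.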
Details of the five kinds of events are as follows.

1) BACKGROUND
Environmental noise collected by the system without artificial external disturbance.

2) JUMPING
A person jumps at one location at a rate of about twice per second. The 'Jumping' event is divided into four subclasses: right above the sensing optical fiber (No. II. i), 1 m radial distance (No. II. ii), 2 m radial distance (No. II. iii) and 3 m radial distance away from the sensing optical fiber (No. II. iv).

3) DIGGING WITH A SHOVEL
A small shovel is used to dig the ground at one location at a rate of about twice per second. The 'Digging with a shovel' event is also divided into four subclasses: right above the sensing optical fiber (No. III. i), 1 m radial distance (No. III. ii), 2 m radial distance (No. III. iii) and 3 m radial distance from the sensing optical fiber (No. III. iv).

4) WALKING
A person walks around a fixed location at a rate of about two steps per second. The 'Walking' event has four subclasses: right above the sensing optical fiber (No. IV. i), 1 m radial distance (No. IV. ii), 2 m radial distance (No. IV. iii) and 3 m radial distance away from the sensing optical fiber (No. IV. iv).

5) BRICK FALL
An 8.55 kg brick falls freely from a height of 1 m at one location. The 'Brick fall' event includes four subclasses: right above the sensing optical fiber (No. V. i), 1 m radial distance (No. V. ii), 2 m radial distance (No. V. iii) and 3 m radial distance away from the sensing optical fiber (No. V. iv).

B. DATA PREPROCESSING
Each RBT collected by the DAC forms a row of the data matrix, and each sample data matrix consists of the RBTs collected in one second. In this way, the horizontal rows of the data matrix represent the spatial domain and the vertical columns represent the temporal domain. Since only the AC component carries the vibration information, and the large DC component reduces the visibility of the AC component, a high-pass filter with a cutoff frequency of 30 Hz is applied to remove the DC component. Because of the low-pass characteristics of soil, frequencies higher than 310 Hz are rarely detected; therefore, $b_1$ and $b_4$ are set to 30 Hz and 310 Hz. The SA parameters are: initial temperature $T_0 = 10^{10}$, cooling coefficient $K = 0.8$, termination temperature $T_{\mathrm{end}} = 10^{6}$, and number of iterations per temperature $L = 100$. To avoid an excessively narrow passband, the minimum passband width is set to 10 Hz; therefore, the feasible range of $b_2$ and $b_3$ is from 39 Hz to 309 Hz, with a step of 1 Hz. The fitness evolution curve of the SA search for the Euclidean distance between subclasses in the frequency domain is shown in Fig.7. Fig.7 shows that the maximum Euclidean distance between classes in the frequency domain is $1.247 \times 10^{8}$, and the corresponding $b_2$ and $b_3$ are 118 Hz and 177 Hz. This process takes only 4.01 s for 2700 iterations. Thus, the three filters are applied with passbands of 30 Hz to 117 Hz, 118 Hz to 177 Hz, and 177 Hz to 310 Hz. The data matrix passes through these three filters separately in the temporal domain. The three filtered data matrices are concatenated into a data matrix with a depth of three, which is displayed in RGB format. To match the size of the input layer of Inception_V3, the RGB images are resized to 299 × 299 × 3. The RGB images, labeled with their subclasses, are then stored in the training and validation datasets. Some typical RGB images of each subclass are shown in Fig.8.
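The filtering-and-stacking step can be sketched in Python with SciPy as follows. This is a minimal sketch, assuming zero-phase fourth-order Butterworth band-pass filters, linear-interpolation resizing and per-channel min-max normalization; none of these design choices is specified in the text, and `to_rgb` is a hypothetical helper. The three passbands are the ones reported above, the temporal sampling rate is the 10 kHz pulse repetition rate, and random data stand in for a 1 s, 40-position sample.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import butter, sosfiltfilt

FS = 10_000  # pulse repetition rate = temporal sampling rate, Hz
BANDS = [(30, 117), (118, 177), (177, 310)]  # passbands reported in the text, Hz


def to_rgb(sample, bands=BANDS, fs=FS, size=299, order=4):
    """sample: (num_traces, num_positions); rows are RBTs, axis 0 is time."""
    channels = []
    for low, high in bands:
        # Zero-phase Butterworth band-pass along the temporal axis (assumed design).
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered = sosfiltfilt(sos, sample, axis=0)
        # Resize to the Inception_V3 input resolution with linear interpolation.
        factors = (size / filtered.shape[0], size / filtered.shape[1])
        resized = zoom(filtered, factors, order=1)
        # Min-max normalize each channel to [0, 1] (assumed normalization).
        span = resized.max() - resized.min()
        channels.append((resized - resized.min()) / (span if span > 0 else 1.0))
    # Stack the three filtered matrices as the R, G and B channels.
    return np.stack(channels, axis=-1)


rng = np.random.default_rng(0)
sample = rng.standard_normal((10_000, 40))  # stand-in for a 1 s, 40 m sample
image = to_rgb(sample)
print(image.shape)
```

Filtering before resizing keeps the band-pass operating on the full 10 kHz temporal resolution; resizing first would alias the 30 Hz to 310 Hz content before it could be separated.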

C. EVENTS TYPE AND RADIAL DISTANCE CLASSIFICATION
The Inception_V3 network pre-trained on the ImageNet dataset has a strong feature extraction ability, which greatly reduces the training time. Since there are 17 event subclasses in total, the output fully connected layer is rebuilt with 17 neurons. The learning rate is 0.001, the number of training epochs is 200, the optimizer is SGD, and the batch size is 16. The details of the training and validation sets are shown in Table 2.
After training, the classification accuracy on the validation set for each subclass and general type is obtained and shown in Table 3. The accuracy and loss curves of the training and validation processes are shown in Fig.9 and Fig.10, which show that the network converges stably and is well trained after 150 epochs. Table 3 shows that the average classification accuracy is 86.82% for events within a 3 m radial distance. If only the samples within a 2 m radial distance are considered, the average classification accuracy improves to 88.80%. Because each subclass represents a radial distance, the network can tell both the event type and how far away the event is. To test the effectiveness of the proposed method, two other methods are applied for comparison. The first method uses a single-channel grayscale image (narrow-bandwidth data) generated from the original data matrix by a narrow band-pass filter from 190 Hz to 210 Hz, where the spectra in Fig.4 differ most. The second method uses a single-channel grayscale image (broad-bandwidth data) generated by a broad band-pass filter from 30 Hz to 310 Hz, which covers the whole effective frequency range. The broad-bandwidth method is commonly used in previous research on event classification. The same network is used for training, and the other parameters, including the sizes of the training and validation sets, are kept the same. The average validation accuracy of the three methods is shown in Table 4. Because radial distance distorts the vibration waveform and small-amplitude waveforms are drowned by noise, the classification accuracy of the broad-bandwidth method drops to only 33.57% for multi-radial-distance event recognition. The narrow-bandwidth method performs better (80.75%) than the broad-bandwidth method because the narrow frequency band contains the largest difference and less noise.
The proposed three-channel method performs best (86.82% classification accuracy), because the relative change between the three frequency bands provides more discriminative differences and is not obviously affected by fluctuations in the backscattered light intensity, which are common in the Φ-OTDR system due to the slow drift of system parameters.
To show the classification ability for event type and radial distance separately, the inner general class error (IGE) and the cross general class error (CGE) are used. They are defined as
$$\mathrm{IGE} = \frac{FN_{\mathrm{inner}}}{N_{\mathrm{val}}}, \qquad \mathrm{CGE} = \frac{FN_{\mathrm{cross}}}{N_{\mathrm{val}}},$$
where $FN_{\mathrm{inner}}$ is the number of samples with wrong subclass predictions, $FN_{\mathrm{cross}}$ is the number of samples with wrong general class predictions and $N_{\mathrm{val}}$ is the number of samples in the validation set. IGE reflects the error in radial distance recognition and CGE the error in event type recognition. The IGE and CGE of each method are shown in Table 5.
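A small Python sketch of how IGE and CGE could be computed from validation predictions. It assumes labels written like 'III.ii' (general class, then subclass), with the background class simply 'I', and it interprets $FN_{\mathrm{inner}}$ as samples assigned the correct general class but the wrong subclass, so that IGE isolates radial-distance errors and CGE isolates event-type errors. Both the label format and this interpretation are assumptions, and the labels below are made up for illustration.

```python
def general_class(label):
    """Map a subclass label such as 'III.ii' to its general class 'III'."""
    return label.split(".")[0]


def inner_and_cross_errors(true_labels, predicted_labels):
    """Return (IGE, CGE) as fractions of the validation set size N_val."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    fn_inner = fn_cross = 0
    for true, pred in zip(true_labels, predicted_labels):
        if general_class(true) != general_class(pred):
            fn_cross += 1  # wrong event type (general class)
        elif true != pred:
            fn_inner += 1  # right event type, wrong radial distance
    n_val = len(true_labels)
    return fn_inner / n_val, fn_cross / n_val


truth = ["I", "II.i", "II.ii", "III.iii", "IV.iv", "V.i"]
preds = ["I", "II.ii", "II.ii", "IV.iii", "IV.iv", "V.i"]
ige, cge = inner_and_cross_errors(truth, preds)
print(ige, cge)
```

Under this interpretation every misclassified sample is counted exactly once, so IGE + CGE equals the overall subclass error rate.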
As can be seen from Table 5, the three-channel method has the lowest IGE and CGE, which shows that it is superior to the other two methods in both event type classification and radial distance identification.

IV. DISCUSSION
For different field applications, the optimal filter frequency bands need to be chosen again. The SA algorithm offers a practical way to find the best filter frequency bands for a given dataset. Once the training dataset is determined, the SA algorithm needs to be run only once before network training, which takes 4.10 s for our dataset. If the network is used for other event types, the optimal bands should be searched again.
In the experiments, the vibration events are placed at 1 m intervals, and the results show that the proposed method can distinguish a 1 m difference. For other distances, such as 0.5 m, 1.5 m and 2.5 m, the amplitude of the vibration detected by the fiber may change slightly; the method is expected to remain applicable as long as the differences between subclasses can still be detected and distinguished.
Because the proposed method can tell both the event type and the radial distance, it can be used for early warning in special areas or to help find fibers buried at unknown locations. In future studies, the relationship among radial distance resolution, event recognition accuracy and the soil environment in which the optical fiber is buried will be investigated.

V. CONCLUSION
This paper has proposed a multi-radial-distance event recognition method for the Φ-OTDR distributed optical fiber sensing system based on deep learning. To the best of our knowledge, this is the first time that Φ-OTDR can tell how far a target event is from the sensing fiber through a deep learning approach. The temporal-spatial data matrix from Φ-OTDR is filtered by three filters with different frequency bands, and the results are stacked and scaled to form an RGB image that serves as the input of the CNN. The data matrix filtered by three different band-pass filters offers the relative differences between the bands, which helps the network focus earlier on the differences between subclasses at different radial distances. The passbands of the three filters are selected by searching for the maximum Euclidean distance in the frequency domain with the SA algorithm. A field experiment with five kinds of events at four radial distances, 17 subclasses in total, has been carried out. The results show that the proposed method improves the classification accuracy from 33.57% to 86.82%.