Data-Driven Distributed Optical Vibration Sensors: A Review

Distributed optical vibration sensors (DOVS) have attracted much attention recently since it can be used to monitor mechanical vibrations or acoustic waves with long reach and high sensitivity. Phase-sensitive optical time domain reflectometry (Φ-OTDR) is one of the most commonly used DOVS schemes. For Φ-OTDR, the whole length of fiber under test (FUT) works as the sensing instrument and continuously generates sensing data during measurement. Researchers have made great efforts to try to extract external intrusions from the redundant data. High signal-to-noise ratio (SNR) is necessary in order to accurately locate and identify external intrusions in Φ-OTDR systems. Improvement in SNR is normally limited by the properties of light source, photodetector and FUT. But this limitation can also be overcome by post-processing of the received optical signals. In this context, detailed methodologies of SNR enhancement post-processing algorithms in Φ-OTDR systems have been described in this paper. Furthermore, after successfully locating the external vibrations, it is also important to identify the types of source of the vibrations. Pattern classification is a powerful tool in recognizing the intrusion types from the vibration signals in practical applications. Recent reports of Φ-OTDR systems employed with pattern classification algorithms are subsequently reviewed and discussed. This thorough review will provide a design pathway for improving the performance of Φ-OTDR while maintaining the cost of the system as no additional hardware is required.


Data-Driven Distributed Optical Vibration Sensors: A Review I. INTRODUCTION
O VER the past few decades, distributed fiber optical sensors (DOFS) have been intensively studied owing to the large-scale monitoring range, fully distributed manner, accurate localization and low cost [1]- [4]. Among all DOFS, fiber sensors based on detecting backscattering light is the best choice for truly distributed measurement. Till now, Rayleigh, Brillouin (including spontaneous and stimulated Brillouin scattering) and Raman scatterings based DOFS are the three widely adopted technologies [5]- [9]. Among them, phase-sensitive optical time domain reflectometry (-OTDR) based on detecting backscattering Rayleigh light has been used to measure vibration, intrusion, acoustic wave, strain and temperature [10]- [15]. Especially, -OTDR has been demonstrated as a promising technique to realize dynamic measurement due to its overwhelming advantages of high sensitivity, real-time measurement, broadband response and multiple intrusions detection capability. During the past years, -OTDR has been widely used in many fields, such as perimeter and pipeline security monitoring (distributed optical vibration sensing, DOVS), soundwave and seismic wave detection (distributed acoustic sensing, DAS) etc. [16], [17]. In practical applications of -OTDR, response bandwidth is important since high frequency response is necessary for DAS. However, the inherent frequency response of the system is restricted by the repetition rate of the probe pulse (and the maximum value is determined by the sensing range). To overcome this limitation, several solutions have been put forward including -OTDR incorporating with Mach-Zehnder interferometer (MZI) and Michelson interferometer (MI) [18]- [21]. For these hybrid schemes, the correspondence between the vibration location and the vibration frequency is not clear although the frequency spectrum mapping method has been used to mitigate this drawback. To better realize high-frequency measurement, multiple-frequency probe pulses are injected into the sensing fiber with a certain time delay [22]- [24]. Moreover, quantitative analysis of the external events is also critical for the field applications. To realize this, digital coherent detection [25], use of chirped pulse [26], I/Q demodulation [27] and digital signal processing [28] have been implemented to retrieve the phase information.
Because of its truly distributed manner, -OTDR is always driven by data in very large size in practical applications. The relatively low signal-to-noise ratio (SNR) restricts the system performance, which needs to be improved to reduce the nuisance alarm rates in real-world scenarios. To improve SNR and extract the events accurately, light amplification approaches have been performed, such as distributed Brillouin amplification [29], pulse width modulation Brillouin amplification [30], distributed Raman amplification and hybrid amplification [31]- [34]. All of them can improve the SNR and greatly extend the sensing range but they increase the complexity and the cost of the systems. On the other hand, post-processing of raw data using advanced algorithms is an alternative efficient approach to improve the system SNR. More importantly, these post-processing algorithms enhance the signal quality effectively without adding any extra hardware [35]- [41]. Recently, many algorithms based on one-dimensional and two-dimensional processing have been used to enhance the system performance, such as adaptive temporal matched filtering (ATMF) [35], wavelet denoising (WD) [36], [37], curvelet transform [38], empirical mode decomposition (EMD) [39], compressive sensing [40], two-dimensional edge detection (2D-ED) [41], two-dimensional adaptive bilateral filtering (2D-ABLF) [42] and nonlocal mean (NLM) [43].
Pattern classification algorithms can automatically identify different types of external intrusion events according to the signal characteristics extracted from the raw data, which have been widely applied in many fields such as automated driving, handwriting recognition and biomedical applications. Conventional -OTDR is capable of locating the external intrusions along the fiber link but fall short in distinguishing different types of intrusion events. To overcome this limitation, pattern classification algorithms can be utilized to recognize the types of intrusion events in -OTDR systems. It has been demonstrated that original data from -OTDR can be represented by their time-domain features [44]- [48], frequency-domain features [49]- [52], time-frequency domain features [53]- [62] and time-space domain features [16], [63]. Many machine learning algorithms, such as support vector machine [50], [51], multi-layer perceptrons [58], [64] and convolutional neural network [65]- [67], are used as pattern classifiers.
In this paper, we review the recent advancements in postprocessing algorithms of the raw data in -OTDR. Methods covered in this review can be divided into two categories: SNR enhancement and pattern classification approaches. For SNR enhancement methods, we carefully analyze their operation principles, denoising performances and individual advantages. Furthermore, pattern classification algorithms for -OTDR are reviewed in this work. Feature extraction methods exploiting different domains of the demodulated signals are presented and discussed. Recent reports of machine learning based intrusion event recognition algorithms are reviewed. With a hybrid of SNR enhancement and pattern classification algorithms, intrusion signals can be demodulated from the redundant raw data of DOVS more accurately. This advanced version of data-driven DOVS system is suitable for sophisticated applications, such as border breach monitoring, pipeline leakage detection etc.
II. PRINCIPLE AND THE DATA STRUCTURE OF PHASE-SENSITIVE OTDR Phase-sensitive OTDR has been proposed to measure mechanical vibrations and acoustic wave along the sensing fiber. According to the 1-D backscatter impulse model as illustrated in Fig. 1, the amplitude of the Rayleigh backscattering trace is the superposition result of the light scattered within the probe pulse width. As narrow-linewidth probe pulse is normally adopted in phase-sensitive OTDR, the trace is jagged and the amplitude of the trace is sensitive to the external measurands [14], [15]. When a probe pulse with a pulse width W is injected into the sensing fiber, the power of the scattered light can be described as [14], [15]: where P(t) is the sum of the inherent transmission attenuation trace P 1 (t) and the interference trace P 2 (t). α and n f are the inherent attenuation coefficient and the effective refractive index of the sensing fiber, respectively. a j and τ j are the amplitude and transmission delay of the j-th scattering wave respectively. We have τ j = 2n f z j /c and z j is the distance from the scattering center to the input end. J is the total number of scatterers and it has a much large value since the size of each scatterer should be smaller than one-tenth of the wavelength. c is the velocity of light in vacuum, r ect[x] = 1 when 0 ≤ x ≤ 1 and equals to zero otherwise. ϕ j k is the phase difference between the j-th and k-th scatters and it can be calculated as 4πνn f (z j − z k )/c. Compared to conventional OTDR, which is characterized by P 1 (t) only, the backscattering trace of phase-sensitive OTDR also includes the coherence trace P 2 (t), as shown in Eq. (1). In phase-sensitive OTDR, the capability of detecting the weak perturbation is provided by P 2 (t). According to Eq. (1), P 2 (t) is mainly determined by the phase difference ϕ j k although it is also dependent on other parameters. Hence, the stability and coherence of the laser are critical for phase-sensitive OTDR, especially for high-sensitive and lowfrequency measurement.
In phase-sensitive OTDR, both direct and coherent detection schemes have been adopted to collect raw traces. For both systems, the raw data can be regarded as a two-dimensional array x[M, N], as depicted in Fig. 2. There is a one-to-one corresponding relation between the probe pulses (the green pulse train) and the backscattering traces (the blue traces). M is the number of the collected traces and it is determined by the probe pulses. N is the number of sampling points in each trace and it is determined by the sampling rate (S) and the measurement period (T ), N = ST. Assuming that a vibration is applied on the sensing fiber at the location of z k (corresponding to k-th element in N), the vibration signal (along Y direction) can be extracted and it is presented with a one-dimensional array v[M, 1] = x [M, k].
In practical applications, the ideal signal x[M, N] is polluted by noise and is modified into x n [M, N] and it is expressed by where δ n is the noise including the background noise, the thermal and shot noise of the photodetector. The background noise includes the amplified spontaneous emission (ASE) noise and the noise caused by finite extinction ratio of the light modulator [68], [69].
III. SIGNAL-TO-NOISE RATIO ENHANCEMENT WITH POST-PROCESSING ALGORITHMS Although the power of Rayleigh scattering is larger than other scattering signals, such as spontaneous Brillouin and Raman scattering, it still suffers from low SNR in practical applications. Hence, post processing of the backscattering Rayleigh signal is required to improve system SNR. In this work, six SNR enhancement post-processing algorithms including temporal matched filtering (ATMF), wavelet denoising (WD), empirical mode decomposition (EMD), two-dimensional edge detection (2D-ED), two-dimensional adaptive bilateral filtering (2D-ABLF) and nonlocal mean algorithms (NLM) are discussed in detail. These algorithms can be divided into two categories: one-and two-dimensional denoising. According to the properties of these algorithms, ATMF, EMD, 2D-ED only improve the localization SNR but the other algorithms (WD, 2D-ABLF and NLM) enhance the signal SNR and localization SNR simultaneously.

A. Adaptive Temporal Matched Filtering (ATMF)
Different from conventional temporal filtering algorithm, the adaptive temporal matched filtering algorithm (ATMF) can maximize SNR through tuning the parameters of the filter according to the signal of interest and the actual noise. I. Ölçer et al. improve the SNR of phase-sensitive OTDR with the ATMF [35]. In their work, great localization SNR has been achieved. Besides, they claim that the SNR improvement is realized without sacrificing system bandwidth. In phase-sensitive OTDR, assuming that the target signal is x[M, N], as shown in Fig. 3. After being processed with the ATMF, the filtered signal y(k) at the corresponding location k is expressed by [70], [71] where Q is an arbitrary value between 0 and M, W k and X k are the filter weight and target signal, H represents the complex conjugate vector or matrix transpose. Both of W k and X k are Q * 1 matrixes and they are expressed by In the ATMF, the optimal weight filter [72] is described as: where In Eq. (6), R k is the noise covariance matrix statistically estimated from the received signal. R k is a Q * Q matrix and it is calculated with the neighboring ranges close to the vibration location, indicated by the green rectangle in Fig. 3. It is assumed that there is no any other perturbation in the selected neighboring region X k/H 0 . L s is the length of the summation and it is an arbitrary number. It should be noted that Ls is along the distance direction. S k (w) is the vibration signal which is applied on the position k and it is a Q * 1 matrix. w = 2π f / f S , f is the vibration frequency and f S is the repetition rate of the probe pulse. α is an arbitrary real number and it can be calculated with Eq. (6) to normalize the noise power at the output.
According to Eq. (6), we can find that the frequency of the vibration signal must be known to form S k (w). So the search process for the perturbation signal is required before forming the optimal filter. Through optimizing the size of the weights W k , the maximum SNR can be achieved for the optimal weights size Q. The procedure of ATMF algorithm is illustrated in Fig. 4 and is summarized as follows: 1. Collect the original phase-sensitive OTDR data; 2. Determine the data size in one pulse period (i.e., N for the target signal x[M, N]) and the length of the processing array Q (between 0 and M). 3. Determine a training strategy to estimate the covariance matrix R k from the background noise and the temporal vector S k (w) according to Eq. (6). 4. Compute the adaptive weight vector W k according to the Eq. (5). 5. Calculate the output y(k) of the filter with this weight vector according to Eq. (3). According to the theoretical analysis in [35] and the logic flow chart shown in Fig. 4, the filter weights are calculated along X and Y directions, but the filtering process is only performed in Y direction. So AMTF is essentially a onedimensional filtering algorithm. Moreover, we can find that the dimension of the input matrix is reduced after being processed with the ATMF. Especially, a M * N matrix is converted into a 1 * N matrix if Q = M. That indicates the ATMF is only good for improving the localization SNR. Although the time domain signal along Y direction can be improved with a sliding window (Q < M), SNR improvement is not significant. Worse still, this approach loses some useful information in the process and only retrieves the signal with low frequency. Hence, this algorithm can only apply to raw signals with low vibration frequency.
Therefore, the ATMF approach can maximize the localization SNR through removing the out-of-band noise for the target signal. However, it cannot preserve the effective signal (along Y direction as shown in Fig. 3) and improve its SNR. So, a hybrid denoising algorithm is needed to enhance the signal SNR as well as localization SNR.

B. Wavelet Denoising (WD)
Wavelet transform is a useful analysis tool for signal processing in both time domain and frequency domain. Different from the fixed frequency resolution of the Fourier transform, wavelet transform can provide a variable resolution in frequency domain. That greatly improves the frequency accuracy in time-frequency analysis. Owing to the distinguished advantages, wavelet transform has been widely used in data compression [73], signal and image processing [74], pattern recognition [75], as well as in mathematics, physics, biology and engineering applications. Among the various applications, wavelet denoising can improve the signal performance of the DOVS due to the fact that it works in full frequency spectrum and retains the signal well. In the past, great research efforts have been made to investigate this method [36], [37]. Through stretching and translating the mother wavelet function, a target signal can be decomposed into a series of base functions with the wavelet transform. After being processed via the continuous-time wavelet transform (CWT), the transform result of the target signal x(t) is expressed by where x(t) is the original signal and s and u are the stretch and translation parameters, respectively. ψ * (t) is the conjugate function of the mother wavelet.
In practical applications, the discrete-time wavelet transform (DWT) instead of CWT is adopted, because DWT analyzes the signal with less number of scales in comparison with CWT. As for phase-sensitive OTDR, Z. Qin et al. improve the signal SNR with the DWT algorithm [36]. In theory, DWT is obtained by sampling the CWT with a stretching factor as powers of 2. The expression of the DWT is described as where j , k are the stretching and translation factors for DWT, respectively. To simplify the acquisition process of the DWT, a filter bank consisting of the low-pass and high-pass quadrature mirror filters has been proposed to compute the DWT [76], [77]. Fig. 5 shows the schematic diagram of DWT and the inverse DWT based on the filter banks. After passing through a series of filter banks, the target signal is decomposed into the approximation coefficients (the output of the low-pass filter, a k ) and the detail coefficients (the output of high-pass filter, d k ). Hence, wavelet denoising based on DWT consists of the following three steps: 1. Apply the DWT on the raw noisy traces (target signals) to obtain the wavelet coefficients (including approximation a k and detail coefficients d k ); 2. Choose an appropriate threshold rule to modify the wavelet coefficients; 3. Reconstruct the denoised signal with the inverse DWT;  In [36], 20-Hz and 8-kHz vibration can be extracted with 50-ns wide probe pulses. However, the effective performance enhancement (e.g. SNR) is not presented. The setting of the threshold for the coefficient modification is critical and that determines the denoising performance of the wavelet denoising essentially. So, the optimal wavelet coefficient modification methods should be carefully designed. One-dimensional wavelet denoising algorithm can be performed along two different directions, as shown in Fig. 6. In their experiment, onedimensional wavelet denoising is executed but the direction of the denoising process is not explained. Therefore, SNR can be further enhanced by adopting two-dimensional wavelet denoising.

C. Empirical Mode Decomposition (EMD)
To accurately analyze the details of the non-stationary and non-linear signal, Hilbert-Huang transform is proposed by Huang et al. [78]. Especially, the empirical mode decomposition (EMD) algorithm is an important part of the Hilbert-Huang transform. By utilizing the EMD algorithm, Qin et al. separate the vibration signal from background noise to improve the localization SNR [39]. The EMD algorithm decomposes a target signal x(t), which contains noise into a series of intrinsic mode functions (IMFs) and a residual component r(t). After being decomposed with the EMD algorithm, the target signal is expressed by where N is the number of the IMFs and it is changeable. The IMF generation is illustrated in Fig. 7. The decomposition is performed along the Y direction as shown in Fig. 2. In Fig. 7, the blue curve is the target signal x(t). The decomposition process is described as follows: 1. Identify the local extremums of x(t) and the local maximums (E max (t)) and local minimums (E min (t)) are marked as red and green dots, respectively; 2. Construct the upper envelope of local maximums (the red curve) and the lower envelope of local minimums (the green curve) by cubic splines interpolation, as shown in Fig. 7; 3. Calculate the average value m(t) with the upper and lower envelopes and it is depicted in Fig. 7 (the black curve); 4. Subtract m(t) from x(t) to obtain h(t): 5. Assess whether h(t) satisfies the IMF conditions (which will be explained in the subsequent content). If h(t) doesn't satisfy the IMF conditions, repeat step 1 to 4 for h(t). If h(t) satisfies the IMF conditions, denotes h(t) as the first order IMF. Then, calculate the difference between x(t) and h(t) to acquire the first residual component r (t). When r (t) is a monotonic function or only one extremum retains, the decomposition process ceases. Otherwise, substitute x(t) with r (t) and repeat the steps 1 to 5 to obtain remaining IMFs. In step 5, there are two necessary conditions for IMF: a) The average value of the average curve (the black curve) is close to zero; b) The total number of the local extremums (N max + N min ) is close to the number of the intersection of the IMF curves with zero point (N zero ), the difference between them should be To explain the decomposition process clearly, the logic flow chart is illustrated in Fig. 8. As explained in [79], the number of the IMFs is only determined by the target signal x(t). Meanwhile, the frequency of IMFs decreases gradually with respect to its order [80]. Hence, the high frequency noise is mainly concentrated on the first order IMF and can be reduced through filtering or thresholding some IMFs. To remove the noise in phasesensitive OTDR, the Pearson correlation coefficient (PCC) has been proposed to calculate the correlation between the IMFs and the target signal x(t). The i-th order PCC is described as where x j and h j are the target signal and the i-th IMF, respectively.x andh are the corresponding arithmetic means of the two variables. M is the number of the Rayleigh scattering traces, as defined in Fig. 2.
Through calculating the PCC between the target signal and each IMF, the PCC distribution along the IMF can be obtained. It is verified that the maximum PCC is located in the first order IMF for the noise signal while the peak appears at other orders of IMFs for the actual vibration signal. Thus, it is proposed in [39] a denoising procedure to reduce the noise and it is as follows: 1. Process the raw Rayleigh backscattering traces with the EMD algorithm according to the procedure in Fig. 9; 2. Obtain the PCC distribution along IMFs through calculating the PCC between raw data and each IMFs; 3. Acquire the location of the maximum PCC; 4. If the maximum PCC appears at the IMF1, only the residual component r(t) is retained, otherwise keep the raw data. Similar to the above-mentioned methods, the EMD algorithm is also a one-dimensional algorithm and it executes along Y direction, as depicted in Fig. 10.
To verify the performance of the proposed method, validation experiment has been performed in [39]. Superior performance has been achieved and the maximum SNR of 42.52 dB has been obtained for a 100-Hz vibration. Besides, the authors compare the proposed method with moving average, wavelet denoising and the EMD-soft denoising methods. The proposed method achieves the best localization SNR improvement among those methods. However, the raw vibration signal has not been improved although ultra-high localization SNR can be achieved with the proposed EMD algorithm. Hence, the proposed method is powerful for the vibration localization but cannot improve the performance of the vibration signal. In addition, the algorithm might be unstable since the judgement strategy of the denoising is too rough. The algorithm stability and the signal performance can be improved if the judgement strategy is optimized.

D. Two-Dimensional Edge Detection (2D-ED)
In image processing, edge detection method is an effective approach to extract the abrupt change if there is a discontinuity or a steep intensity gradient in the image (intensity function) [83]. For DOVS, if there is a significant change at the event location in comparison with the surroundings, similar edge detection method can be an efficient tool for event extraction purpose. T. Zhu et al. apply the two-dimensional edge detection (2D-ED) algorithm to phase-sensitive OTDR to improve the localization SNR [41]. In their work, Sobel operator (a kind of orthogonal gradient operator) has been adopted to acquire the vibration location and improve the signal SNR. For a continuous function f (x, y) at the position (x, y), its gradient can be expressed as a vector [81], [82].
G x , G y are the two derivatives along the horizontal (x direction) and vertical coordinates (y direction) respectively. |∇ f (x, y)| and φ(x, y) are the magnitude and direction angle of the vector. The masks of Sobel operator are commonly given as two 3x3 matrices: After that, the relative change in a neighborhood (with size of 3 × 3) can be calculated by convolving with the raw data, as shown in Fig. 11. The expression is described as (⊗ represents convolution operation): To obtain better performance enhancement, the authors attempt larger matrices. In their work, operators with 4 × 4 and 5 × 5 matrixes have been tested to verify the effect of the matrix size on the performance enhancement. The operators with 4 × 4 and 5 × 5 matrixes are as follows: In their work, validation tests have been implemented and a piezoelectric transducer (PZT) was used to apply a vibration on the test fiber. In the 2D-ED process, the raw data is firstly converted into gray image and then convolves with the operator S m y at each point. Note that, a burst mainly appears at one direction such that the raw data is only computed with one operator. The test results show that the highest localization SNR of 8.4 dB can be achieved when the operator with 4 × 4 matrix is adopted. Moreover, the authors claimed that the 2D-ED algorithm can improve the spatial resolution (SR) and the test results indicate that the original SR of 5.1 m has been improved to 3.4 m. Although the 2D-ED method with Sobel operators improves localization SNR, the maximum improvement is less than 1.9 dB. Besides, the 2D-ED algorithm cannot improve the SNR of the vibration signal. Even worse, it distorts the vibration signal [42]. Therefore, 2D-ED is a good method to extract the vibration events but not a good approach to improve the signal performance. The 2D-ED algorithm might need to cooperate with other denoising algorithms. In a hybrid scheme, the 2D-ED algorithm aims to locate the vibration and other algorithms can improve the SNR of the vibration signal.

E. Two-Dimensional Adaptive Bilateral Filtering (2D-ABLF)
Bilateral filtering algorithm is a neighborhood filtering algorithm and it retrieves the effective signal through weighted average method [83]- [85]. To make full use of the redundant information of the target signal and to remove the noise completely, 2D bilateral filtering algorithm has been proposed in image denoising. H. He et al. apply the 2D bilateral filtering algorithm to phase-sensitive OTDR to improve the signal SNR and realize fast signal processing [42]. After denoising with the 2D bilateral filtering algorithm, the target signal containing noise X[i, j] is modified to Y [i, j] and the denoising expression is described as where i and j are the horizontal and vertical coordinates of one pixel, m and n are the horizontal and vertical distances of the neighboring pixels to the target pixel, P is the half width of the neighborhood, and W [i,j;m,n] is the weight coefficient, which is the product of geometric (W G [i,j;m,n]) and radiometric (W R [i,j;m,n]) weights, as shown in Fig. 12(a). The weight coefficient is expressed by As shown in Eq. (19) and Fig. 12, the effective weight coefficient W is determined by both the geometric (W G ) and radiometric (W R ) weights. The geometric weight is determined by the Euclidean distance between the center pixel [i,j] and the neighboring pixel [i + m,j + n]. The radiometric weight calculates the intensity difference between the center pixel Y [i,j] and the neighboring pixel Y [i + m,j + n] in the neighborhood window (as illustrated in Fig. 12(b), p = (i, j ) and q = (i + m, j + n)). In the conventional 2D bilateral filtering algorithm, the two weights are defined as where σ G and σ R are the geometric standard deviation and gray level standard deviation, respectively. It is difficult to obtain optimal σ G and σ R simultaneously.
Considering that the empirical study of σ R is more critical than σ G to achieve the optimal denoising performance [85], the authors reduce the number of the variables. In their work,

the original W G (Gaussian function) is replaced by a Lorentz function and it is simplified into
The simple modification makes the 2D algorithm more practical since there is only one variable (σ R ) needed to be optimized. Besides, they adaptively install the σ R according to the relation between the optimal σ R and the noise standard deviation σ n [85]. To further improve the signal performance, the raw data has been processed with multiple iterations in their work. According to their verifications, only very small improvement can be achieved when the iteration times is greater than two, but the processing time increases greatly. Therefore, twice iteration is performed in their work. It should be noted that the processing time is acceptable in real-time monitoring system although two iterations is used.

F. Nonlocal Means (NLM)
Nonlocal means algorithm is a modification of the conventional neighborhood filtering algorithm in image denoising. Different from the conventional algorithms, a similarity window is proposed to compute the weight [86]- [88]. Hence, the NLM algorithm has better denoising performance but it takes much more time in comparison with the conventional neighborhood filtering algorithm (e.g. bilateral filtering algorithm). M. A. Soto et al. improve the SNR greatly in BOTDA and ROTDR with the NLM algorithm [43]. Meanwhile, they claim that this approach is also suitable for other DOFS schemes such as phase-sensitive OTDR. So the NLM algorithm is a good choice for SNR improvement in DOFS. It should be noted that the processing time becomes more critical in DOVS, which is different from the static measurement of temperature or strain via BOTDA or ROTDR.
The NLM algorithm reduces the noise by taking advantage of the high redundancy of the raw data. Especially, the raw data x[M,N] measured by DOFS can be regarded as a 2D image. Similar to 2D-ABLF, the NLM algorithm is also a weighted average method . For a pixel p = (i, j) of the target signal of x(M,N), where i and j are the horizontal and vertical coordinates of the pixel respectively. The output of the filtering is described as [86] where w(p,q) is the weight factor and it depends on the similarity between the pixels p and q(q = (i , j )), as depicted in Fig. 13. Moreover, 0 ≤ w( p, q) ≤ 1 and p,q w( p, q) = 1. Different from conventional neighborhood average methods, the similarity between two pixels p and q is computed with the neighborhoods surrounding the pixels of interest (i.e. all pixels in the similarity window), not only the values x( p) = x(i,j) and x(q) = x(i',j') assigned to the pixels [86]- [88]. The weight w(p,q) is calculated with the following expression.
where the Euclidean distance is computed with all pixels in the two neighborhoods. It should be noted that the weighting factors w(p,q) are independent on the geometry and only depend on the similarity of the data around pixels p and q, which makes this method non-local. Besides, the size of the similarity window must be close to the smallest feature of the image [86]- [88]. In the processing of DOFS, the window size cannot be less than the spatial resolution of the sensor.
In [43], the noise in the time-domain traces is removed clearly by adopting the NLM algorithm in BOTDA and ROTDR. Over 20dB and 13dB SNR enhancements are realized in BOTDA and ROTDR traces. Although a subsequent work performed by the same research group proves the effective SNR enhancement is only ∼3dB in BOTDA as a fitting process is executed [89], performance improvements of the NLM algorithm in other DOFS are more effective (e.g. ROTDR, OTDR etc.). Thus, the NLM algorithm is also a promising tool to improve the system performance without any hardware change. Compared to 2D-ABLF, NLM is more robust according to the theoretical analysis. But it must be noted that the NLM algorithm has relatively high computational complexity and its application scenarios should be carefully chosen.

IV. PATTERN CLASSIFICATION METHODS FOR
INTRUSION EVENTS RECOGNITION Generally, pattern classification consists of two stages, namely the feature extraction and classification stages. Fig. 14 illustrates a typical DOVS perimeter monitoring system based on pattern classification algorithm. The targeted types of intrusion events are determined to trigger alarms, while the other intrusions are regarded as environmental noise. Based on this strategy, pattern classification algorithms can distinguish the targeted intrusions and the uninterested intrusions, thus dramatically reducing false alarm rate compared to conventional DOVS. In this section, we review recent developments of pattern classification technique employed in DOVS. To explain this technique clearly, we discuss it in two separate aspects, namely the feature extraction methods and the pattern classification algorithms.

A. Feature Extraction
As mentioned in section II, the original raw signal of a phase-sensitive OTDR system is a superposition of all backscattering lights and it is extremely sensitive to external vibrations. But the signal is also greatly affected by background noise. Hence, a feature extraction procedure is needed to extract feature vectors from the original time-domain traces. The extracted feature vectors preserve the signal characteristics while also reducing the dimensionality of the original data. External vibrations can be represented by time-domain features, frequency-domain features, time-frequency domain features and time-space domain features.
Time-domain features are directly extracted from the temporal traces of phase-sensitive OTDR. H. Zhu et al. set an empirical value as threshold and calculate the rate of measured trace crossing through the threshold as a feature, namely the level crossing rate [44]. In [45], Qu et al. use the local maximum amplitude and short-time average energy as the feature. F. Jiang et al. use spatial kurtosis as a time-domain feature, which denotes the rapid change of the measured light intensity over time [46]. Singular spectrum analysis (SSA) is employed by H. Wu et al. to analyze phase-sensitive OTDR signal [47]. Besides, Bi et al. inject two pulses with orthogonal polarization states into the sensing fiber simultaneously and perform correlation between the two measured backscattered traces [48]. The calculated average and variance of the correlation coefficients are used as features.
Compared to time-domain features, frequency-domain features might be more reliable for representing external vibration events. For frequency domain analysis, fast Fourier transform (FFT) is a fundamental tool to extract spectrum information and it has been widely used in phase-sensitive OTDR systems [10], [49]. Cao et al. select total energy, ratio of energy over low frequency band to total energy, and ratio of peak amplitude value to mean value of the spectrum as three features for further classification [50]. A. Papp et al. divide the spectrum into 10 frequency bands after FFT and use the sum of coefficients in ten frequency bands as features [51]. Apart from FFT, power spectrum analysis is also suitable for extracting frequency domain features. Li et al. employ power spectrum analysis and calculate the total energy from each sampling point for feature extraction purpose to locate intrusions [52]. Different from analyzing signal in time domain or frequency domain individually, time-frequency domain features reflect more useful information about dynamic intrusion signals. To extract such features, short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) have been adopted [53]- [62]. In 2007, C. Madsen et al. first observed the unique spectrograms from four different types of intrusion by adopting STFT in -OTDR, though no further feature extraction were performed [53]. Yu et al. show similar results by applying STFT to three different intrusion events [54]. Tejedor et al. apply STFT to form feature vectors in a fixed time frame and use calculated energy over frequency bands for each frame as feature elements [55]. One year later, the authors improve their feature extraction method by employing three time-frames with different durations, instead of one fixed time frame [56]. Except STFT, WT is also used to perform time-frequency domain analysis [36]. Wu et al. apply multi-scale wavelet decomposition (WD) to the signals in -OTDR and the coefficients of each decomposition scale are used as feature elements [57], [58]. Another WT method called wavelet packet decomposition (WPD) has also been adopted [59], [60]. Different from WD, WPD utilizes a symmetrical lowpass-highpass decomposition tree [61]. To improve the poor resolution of the STFT method, X. Hui et al. utilize HHT to process the -OTDR signal [62]. HHT consists of two step, namely empirical mode decomposition (EMD) and Hilbert spectral analysis (HSA). Here, the EMD decomposes the original signal into intrinsic mode functions (IMFs). By performing Hilbert transform with each IMF, the energy-frequency-time distribution (i.e. Hilbert-Huang spectrum) can be obtained.
Besides, time-space domain analysis is another approach to extract underlying patterns from original data. The time-space distribution of phase-sensitive OTDR signal is illustrated in Fig. 2. Instead of only analyzing signal from one sampling point, the time-space domain features represent the characteristics of intrusion signal from a segment of neighboring sampling points. F. Pang et al. calculate normalized sliding variance in time-space domain to recognize rising and falling edges in their proposed train-tracking system [16]. Sun et al. extract morphological features from time-space domain intrusion signals [63]. Based on local amplitude, several regions are first defined in the time-space domain and 10 morphological features are extracted from each region, such as roundness, pixel number and minimum interval between regions.

B. Classification Algorithms
Classification algorithm assigns each input feature vector to a pre-determined class. In a -OTDR based DOVS system, the classification algorithm determines the types of external vibration according to the corresponding input feature vectors [50]. In this section, we review recent reports on phasesensitive OTDR systems employed with pattern classification algorithm.
Threshold-based decision tree is the simplest classification method. The decision thresholds are set manually in several cases [44]- [46], [48], [52], [57], [90]. As illustrated in Fig. 15, the input feature vector is assigned to a class after two threshold-based decisions. In [44], level crossing rate is used as the feature vector and multiple thresholds are set for different intrusion types. After comparing the input level crossing rate with the thresholds, the original data is assigned to one intrusion class. In [46], the kurtosis of original signal is extracted as the feature. Similarly, the kurtosis is compared with a preset threshold to determine whether an intrusion event is present. In [90], a threshold-based decision is first made to locate the intrusion. In this case, the threshold-based decision tree algorithm reduces the data size by discarding data collected in undisturbed fiber positions before the classification stage.
In the training stage, a machine learning algorithm builds a mathematical model based on training samples. According to this mathematical model, the algorithm makes predictions or decisions on newly input data in the classification stage [91]. In our case, the machine learning algorithm classifies the input feature vectors into different types of intrusions. Here the classification model should be firstly acquired in the training stage. For the training task, there are two ways to perform it in the machine learning algorithms: supervised and unsupervised machine learning. If each training sample data is labeled with its desired output intrusion type, the algorithm is referred to as supervised machine learning. Otherwise the algorithm is an unsupervised method. Supervised machine learning algorithm builds a model based on the input-output paired samples in the training stage while unsupervised machine learning algorithm finds underlying patterns or commonalities from input data.
Supervised machine learning algorithms such as support vector machine (SVM), multi-layer perceptrons (MLP), convolutional neural network (CNN) and other types of artificial neural networks (ANN) have been studied in DOVS recently. SVM is a frequently-used binary classification algorithm [92]. In the training stage, two classes of sample feature vector each with its corresponding label are used as the input data. As shown in Fig. 16, SVM maps the input data into a high dimensional feature space via the kernel function and finds the optimized hyperplane between two classes. The smallest distance of the sample feature vectors to the hyperplane among all feature vectors is referred as the support vector. Then in the classification stage, input data with no label is mapped into the same high dimensional feature space. A decision is carried out based on which side of the hyperplane the input data falls into. If the kernel function is selected properly, a linearly inseparable problem may become linearly separable in the new feature space. In [50], an SVM classifier is proposed Fig. 16. An illustration of a hyperplane separating two classes of feature vectors with the largest margin possible in a two-dimensional feature space. The red and green dots represent two classes of data points. The dotted line represents the hyperplane, and the black solid lines represent the maximum margins between the hyperplane and the samples from two classes. Support vectors are pointed out as well.
for -OTDR based pattern classification scheme for the first time. However, this report lacks detailed descriptions about its algorithm. In [51], a train-tracking system based on -OTDR is proposed. An SVM classifier is adopted to classify input feature vectors into two types of classes. In [93], four types of events including human tapping, striking, shaking and crushing are distinguished by adopting SVM. As mentioned above, conventional SVM is only suitable for binary pattern classification. To realize multi-class classification, two approaches, namely the one-against-one and one-against-all methods, are commonly used [94]. However, the authors didn't specify about the implementations of multi-class SVM classifier in this report.
Relevant Vector Machine (RVM) has identical decision function to SVM, though RVM employs a probabilistic Bayesian framework and offers probabilistic classification result [95]. RVM has several advantages over SVM, such as imposing no restrictions over kernel functions and providing a sparser model, but the computational complexity of RVM is higher than that of SVM. Q. Sun et al. adopt RVM to realize three-class (walking, digging and vehicle passing) pattern classification in DOVS [63]. In the three RVM classifiers, each class works as a binary classifier, i.e. walking-digging classifier, walking-vehicle classifier and digging-vehicle classifier. Each input feature vector is fed into the three classifiers simultaneously. Then the most popular classification result among all three output results is selected as the final decision. This procedure is called the one-againstone approach. Recently, Wang et al. report another RVM-based DOVS pattern classification system [96]. However, the structure of multi-class RVM classifier was not described clearly.
MLP are also extensively studied for multi-class classification problems. MLP structure consists of one input layer, one output layer and one or more hidden layers in between, as shown in Fig. 17(a). In MLP, each node is connected to all nodes in the next layer with unique weight factors. For each node except for the ones in the input layer, a corresponding activation function is also assigned, as shown in Fig. 17(b) [97]. The output of the nodes in hidden layers is expressed by where x i (1 < i < n) denotes the i-th input value, x 0 denotes the bias and is equal to 1, w i denotes the corresponding weight, f (x) is the activation function.
In the MLP training stage, all activation functions and weights are manually selected. Then the training samples are fed into the classifier. A pre-determined error function is used to calculate the error between the training results and the corresponding labels. Then the classifier is optimized by tuning the weights to minimize the error. Back propagation (BP) algorithm is usually used to update the weights and optimize the classifiers [100]. Each hidden layer maps its input vectors into a new feature space. Through readjusting the weights continuously in the training stage, a linearly inseparable classification problem in the original feature space might become linearly separable in a new feature space. For -OTDR based intrusion systems, different structures of MLP have been reported [56], [58], [60], [64]- [67], [99]. Wu et al. utilize a three-layer BP-MLP for the recognition of three types of events [58]. In their work, a hyperbolic tangent sigmoidal function is set as the activation function in the hidden layer nodes. The number of nodes in the hidden layer is determined empirically. In the output layer, linear function is set as the activation function for the three nodes in this layer, each node represents one type of intrusion events. Similarly, a three-layer MLP is also employed in [56]. The number of nodes in the hidden layer is selected empirically as well. The number of nodes in the output layer is set as the same number of the intrusion categories. It should be noted that the output values of the MLP are used to form a new feature vector and fed into a Gaussian mixture model for further classification, instead of directly assigning a class based on the output values.
Convolutional neural network (CNN) is a state-of-the-art neural network structure and it is originally developed for image processing purpose. Different from the conventional ANN approaches, the input of CNN is two-dimensional images rather than 1D sequences [101]. Typical CNN structure consists of convolutional layers, pooling layers and fully connected layers (see Fig. 17(c)). In the convolutional layers, multiple convolutional filters slide across the whole input image to form new feature maps. Then a pooling layer is needed for down-sampling the feature maps. After the convolutional and pooling layers, fully connected layers are attached at the end of the network to process the results into the final classification decision. For a -OTDR based vibration detection system, CNN is applicable if the feature vector extracted from the original signal can be regarded as 2-D images [65]- [67], [102]. Xu et al. employ CNN to process the spectrograms of original signal [67]. In their work, four types of time-varying vibrations including walking, digging, shaking and vehicle passing are recorded by -OTDR. After processing with STFT, the spectra of four classes are obtained and used as the input images for CNN. By properly adjusting the convolutional filters and weights in the training stage, CNN is capable of identifying different patterns underlying in the input spectrograms of different intrusion types, hence providing good prediction result. Besides, a similar CNN is proposed to process the time-MFCC image [102]. In this work, Mel-frequency cepstral coefficient (MFCC) method is used to extract time-frequency domain feature from the original signal. With this modification, prediction accuracy higher than 90% for five types of intrusion is achieved.
J. Tejedor et al. from Spain have thoroughly and extensively studied intrusion recognition algorithms for a -OTDR based pipeline monitoring system [55], [56], [103]. In [55] and [103], Gaussian Mixture Module (GMM) algorithm is employed as the classifier. GMM works under the assumption that a linear combination of several Gaussian basis functions can accurately represent any probabilistic distribution function that one class of data may take [104]. On-site experiments are carried out to record the intrusion signals under real-world conditions. To ensure the generalization capability of the training database, intrusions are applied at different positions along the whole fiber and the recorded vibration signal are cross-validated with each other in the pattern classification stage. They adopt energy over frequency bands calculated by STFT as their feature elements. The GMM model is trained and tested with two separate databases. In the testing stage, the threat detection accuracy of 80% and false alarm rate of 10% are achieved although the machinery activity identification accuracy is 46.6% [103]. Besides, the authors propose a novel classification network that employs contextual feature extraction [56]. Three temporal windows with the size of 5 s, 12.5 s and 20 s are applied over the original feature vector to extract the fast-varying and slow-varying patterns. These contextual features are later fed into three networks with identical structure, each being a combination of MLP and GMM. The final classification decision is a combination of the three contextual classification results. In this test, machinery activity identification accuracy is around 55%, achieving slight improvements over [103]. However, both the threat detection accuracy (∼75%) and false alarm rate (∼35%) deteriorate.
In Table 1, we have summarized the feature extraction methods and pattern classification algorithms employed in -OTDR systems. After a thorough review of pattern classification methods in -OTDR system, we believe that this field is still in the early stage of development. Most of the works dealt with no more than 5 types of intrusion events and employed basic feature extraction methods. Many of the experiments were performed under laboratory conditions instead of field tests. Intrusion events were applied to only couple of fixed positions, instead of the whole FUT. In the future, more advanced features and specifically modified classification algorithms can be explored for recognition accuracy improvement of -OTDR systems.
V. CONCLUSIONS In this paper, we review the advancements of SNR enhancement and pattern classification algorithms employed in -OTDR systems. Detailed methodologies of several SNR enhancement techniques are first described. ATMF, EMD and 2D-ED methods are capable of improving the localization SNR, thus providing better intrusion localization accuracy. In contrast, WD, 2D-ABLF and NLM methods are utilized to improve the overall system SNR as well as the localization SNR, therefore achieving better vibration demodulation capability. In addition, pattern classification algorithms adopted in -OTDR systems are reviewed and discussed in this work. Feature extraction approaches exploiting time-and frequencydomain of the demodulated vibration signals are introduced. Then machine learning based intrusion event recognition algorithms are presented. By the use of pattern classification algorithms, the intrusion detection accuracy and false alarm rate are greatly improved simultaneously. In real-world scenarios, advanced post-processing algorithms are required to accurately detect the targeted types of external vibrations in DOVS systems. In this context, this review paper may serve as a roadmap for designing DOVS systems with enhanced sensing capability.
Shuaiqi Liu (Student Member, IEEE) received the B.Eng. degree in opto-electrical engineering from Nanjing University, China, in 2018. He is currently pursuing the Ph.D. degree under a collaboration program between the Southern University of Science and Technology and University of Macau.
His research interests include distributed optical fiber sensing, nonlinear fiber optics, and optical signal processing.