Spatial Resolution Enhancement of Brillouin Optical Correlation-Domain Reflectometry Using Convolutional Neural Network: Proof of Concept

Brillouin optical correlation-domain reflectometry (BOCDR) is a fiber-optic distributed sensing technique with single-end accessibility and high spatial resolution. In BOCDR, the measured Brillouin gain spectrum (BGS) distribution is generally given by a convolution of the intrinsic BGS distribution and the beat-power spectrum. In most conventional implementations, the Brillouin frequency shift (BFS) distribution is directly obtained using the measured BGS distribution. Determining the BFS distribution on the basis of the intrinsic BGS distribution will give potentially higher spatial resolution, which can be achieved by deconvolution of the measured BGS distribution. In this work, we employ a convolutional neural network to perform this deconvolution processing in BOCDR and show its potential for spatial resolution enhancement. A spatial resolution which is 5 times higher than the nominal value is demonstrated.


I. INTRODUCTION
Optical fiber sensors based on Brillouin scattering have attracted immense interest in the past decades because of their distributed strain and temperature measurement capabilities [1]- [7], which find promising applications in damage detection and structural health monitoring. Brillouin scattering refers to the phenomenon in which pump light injected into an optical fiber interacts with acoustic phonons, generating backscattered Stokes light propagating in the direction opposite to that of the pump light [1]. The frequency downshift of the Stokes light with respect to that of the pump light, commonly referred to as Brillouin frequency shift (BFS), provides information on the strain and temperature changes along the length of the optical fiber. Numerous distributed sensing techniques have been developed, each of The associate editor coordinating the review of this manuscript and approving it for publication was Eyhab Al-Masri . which can be classified into either of the following categories: ''analyzers''-in which light is injected into both ends of the fiber under test (FUT) to induce stimulated Brillouin scattering [2], [4], [6], or ''reflectometers-in which light is injected into only one end of the FUT to exploit spontaneous Brillouin scattering [3], [5], [7]. These categories are further subdivided into one of the following spatially-resolving domains: time [2], [3], frequency [4], [5], and correlation [6], [7].
Distributed sensing in BOCDR is achieved through the synthesis of optical coherence functions (SOCF) [7], [17]. Laser output is divided into reference light and pump light, VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ both of which are sinusoidally frequency-modulated to produce correlation peaks (sensing points) along the FUT. By tuning the modulation frequency so that only one correlation peak is left within the range of the FUT, the Brillouin scattering signal occurring at the position of interest can be spatially resolved. The peak frequency monitored by the electrical spectrum analyzer (ESA) provides the BFS, and correspondingly, the strain or temperature change applied to the FUT. In particular, the BFS shifts by 480 MHz/% tensile strain or by 1.18MHz/K temperature change applied to standard silica single-mode fibers (SMFs) [7], [17]. Since the BFS is directly related to the intrinsic distribution of the Brillouin gain spectrum (BGS), while the measured BGS distribution is given by the convolution of the intrinsic BGS distribution and a so-called ''beat-power spectrum'' [17], determining the intrinsic BGS distribution (and therefore the BFS distribution) from the measured BGS distribution requires deconvolution of the latter. Conventionally, this has been achieved by calculating the Fourier transform of both the measured BGS distribution and the beat-power spectrum, and then by applying deconvolution in the frequency-domain, which, in the absence of noise, is merely a point-by-point division of the two signals in the Fourier domain [18]. The derived signal is subsequently inverse Fourier-transformed to obtain the deconvolved signal associated with the intrinsic BGS distribution. Performing this deconvolution, however, requires knowing in advance both the functions of the measured BGS distribution and the beat-power spectrum; otherwise, derivation of the intrinsic BGS distribution from the measured BGS distribution will be extremely difficult, if not impossible.
The spatial resolution z and the measurement range dm of BOCDR are given by [17] respectively, where f m is the modulation frequency, f is the modulation amplitude of the laser frequency, v B is the Brillouin gain bandwidth, and V g is the group velocity of the light propagating through the fiber core. Spatial resolution in conventional systems pertains to the distance between the correlation peak and the point at which the full width of the beat spectrum broadens twice as wide as v B , whereas measurement range is defined as the distance between neighboring correlation peaks [6], [7], [17]. Clearly, there is a tradeoff relationship between z and dm, which both vary with fm and f . In a conventional setup, where f m = 1.0 MHz, f = 1.0 GHz, and v B = 30 MHz, the spatial resolution is approximately 1.0 m and the measurement range is about 100 m. Note that even with the same modulation parameters, longer measurement range and higher spatial resolution can be obtained using several schemes, such as temporal gating [19], double modulation [20], and chirp modulation [21]; however, additional devices such as electro-optic modulators introduce complexity and extra cost to the system.
Using a machine learning algorithm to improve the signal processing can be advantageous in terms of simplicity and cost, as it simply requires the neural network to be properly trained, after which the BFS and the corresponding strain or temperature change occurring at a section of the optical fiber can be conveniently and directly obtained from the measured BGS without the need for additional equipment or deconvolution as in conventional systems. Several studies have previously proposed the use of machine learning algorithms to improve the signal processing in distributed Brillouin sensors. In particular, artificial neural networks [22], [23], support vector machines [24], autoencoders [25], and deep neural networks [25], [26] have been proposed for more robust extraction of strain and temperature changes in time-domain analyzers.
In this paper, we propose a deep learning algorithm based on convolutional neural network (CNN) to directly obtain the intrinsic BFS distribution from the measured BGS distribution in BOCDR using regression. We also report potential improvement of the shortest strained length that can be realized for BFS extraction, which, throughout this paper, will be referred to as spatial resolution. Although Yao et al. [27] and Chang et al. [28] also proposed a CNN-based machine learning algorithm to obtain the BFS distribution from the measured BGS distribution, [27] did not report any specific results, whereas [28] reported that the spatial resolution of their system remained independent of the CNN, having used input data with varying BGS traces. Here, we also report an improvement of the spatial resolution by at least five times that of the nominal value.

II. NETWORK ARCHITECTURE
A convolutional neural network is a class of deep neural network widely used in image processing and computer vision. It has been successfully applied to image recognition [29]- [34], semantic segmentation [35]- [38], video analysis [39]- [43], and many other applications. Its working principle is similar to that of regular multilayer neural networks, but with tied weights wherein only neurons within each local region are fully connected to individual neurons in the next layer, taking advantage of the fact that structural information is contained within local regions of the image, rather than the whole [44]. This translationally invariant neural network generally comprises alternating convolution and pooling layers, optionally followed by fully connected layers [45], [46].
Convolutional layers convolve the input with a set of learnable filters that act as feature extractors [47]. The first convolutional layer detects low-level features such as e.g. edges, lines, corners, and endpoints (in the case of images); whereas higher-level convolutional layers detect higher-level features. The neurons of these filters adjust their weights according to features present in a region, and keep these same set of weights to detect similar features in various locations within the feature map, preventing overfitting by reducing the computation complexity and the number of parameters to be adjusted during training [48]. Pooling layers are occasionally periodically inserted between convolutional layers to reduce the spatial size of the feature map, and more importantly, the number of parameters to be adjusted during training. They coarse-grain the input while providing a limited amount of rotational and translational invariance, preserving locality and spatial structure [46]. Max pooling is one of the most common pooling operations used [48] which reduces the dimensions of the feature map by replacing a small region with a node having the maximum value in that region. Fully connected layers are then occasionally inserted after convolution and pooling layers to receive the outputs and map the features extracted by the previous layers [49]. In hindsight, convolutional neural networks are functionally analogous to fully connected networks, besides the fact that the former has tied parameters [48]. In this paper, a CNN is used to directly derive the BFS distribution from the measured BGS distribution of a BOCDR system when strains of varying magnitude are applied to different sections along the FUT.
After repeated trials with varied architecture for optimal performance, the CNN model is eventually designed to be similar to that shown in Fig. 1. The proposed CNN comprises four parts. The first part consists of 32 3 × 3 convolution filters which create 32 feature maps, followed by a max pooling layer of kernel size 2 × 2 which halves the dimensions of the input, and then succeeded by three 32 convolution filters (each of size 3 × 3). All of the convolutional layers are activated by a rectified linear unit (ReLU) function, defined by y = max(0, x), to introduce nonlinearity. ReLU function is one of the most frequently used activation functions in regression as it converges fast and is sparsely activated [47]. Batch normalization (BN) is then introduced to adaptively normalize the input values of the following layer [49], and help improve the training speed and efficiency by solving the internal covariate shift (ICS) problem [50] which occurs as a result of the variation in the distribution of each layer's inputs caused by changes in the preceding layers' parameters during training [47], [50]. The second part comprises four sets of two consecutive 64 convolution filters of size 3 × 3, followed by BN. The third part consists of seven alternating convolution layers and max pooling layers. The convolutional layers in this part have 128 3 × 3 filters, whereas the max pooling layers have a kernel size of 2 × 2. The last convolutional layer of this part is then followed by a fully connected network with three hidden layers having 10-N -N neurons in each layer, where N is the number of position channels: 151 or 301. BN is again inserted between the first and second hidden fully connected layers to improve the training speed and performance.

III. SIMULATION A. GENERATING TRAINING INPUT
The conceptual diagrams of the MATLAB program for simulating the measured BGS distributions (used as inputs) as well as their corresponding BFS distributions (used as labels for training) are shown in Fig. 2(a) (setup) and Fig. 2(b) (algorithm). A time-domain method was used to calculate the beat spectrum to lower the complexity of the algorithm. First, an array for storing the frequency values of the modulated reference light changing with time as the dependent variables, f 0 (t), was created, followed by a matrix f B (t,x), which stores the frequency values of the Stokes light backscattered from each sampling point. The matrix for beat signal frequency values was then obtained from the difference between the two as: f beat (t,x) = f 0 (t)-f B (t,x). Subsequently, for each sampling point, the probability density function of the time-varying frequency of the beat signal falling within a certain frequency range was calculated by cumulating the number of sampling points within each frequency interval, converting the time-domain signal to the power spectrum, i.e., the beat spectrum. Finally, from an array storing the strain value of each position, we created the intrinsic BGS distribution (with no bandwidth) g int (f , x 0 ), defined as where f int is the intrinsic BFS at position x = x 0 , which was then 2-D convolved with the beat spectrum to produce the measured BGS distribution. Note that the Brillouin bandwidth is 1-D convolved with the beat spectrum in advance. Uniformly distributed strains between 0.10% and 1.40% at equal steps of 0.05% were applied to sections between 0.2-mand 4.0-m-long, situated at various positions along the FUT, generating 339,062 data sets which were divided into 20% testing and 80% training sets-of which 20% was used for validation. Both the testing and training sets cover strained lengths varying between 0.2 m to 4.0 m uniformly. The strain values were chosen for practical reasons, since silica singlemode fibers (SMFs) are known to be reliable only for strains around 1% [51]. Note that for this study, we only considered the case where the FUT is strained along a single section. Also note that owing to the lack of public real data sets for training, this paper covers results for purely simulated clean data as a proof-of-concept. It is the author' plan to train the proposed CNN on both simulated and real clean and noisy data in the future.
Two separate data sets with varying dimensions were generated: 301 × 90 and 151 × 90, to investigate the correlation between the spatial resolution enhancement and the performance of the CNN mode-measured in terms of mean absolute error (MAE) and median symmetric accuracy (MSA). Here, 90 pertains to the number of frequency channels, whereas 151 (or 301) pertains to the number of position channels. The number of position channels was chosen such that at least one channel is activated for a 0.2 m-long strained section, as in the case of the 151 × 90 data set. For the case of the 301×90 data set, on the other hand, a 0.2 m-long strained section corresponds to three activated channels. We present the CNN model's performance for both data sets in Section V to examine the effect of increasing the number of activated channels on the mode's performance. Additionally, we investigate the variation of the mode's performance as a function of the applied strain and strained length for the 301 × 90 data set and present our results in Section V.

B. TRAINING THE CNN
The measured BGS distributions, used as training inputs (Fig. 3(a)), and their corresponding BFS distributions, used as labels (Fig. 3(b)) for training, were normalized prior to feeding into the CNN so that the input values scaled between 0 and 1. Normalization is known to help improve the numerical stability and increase the training efficiency of the network, while maintaining the general distribution of the data [52]. The two-dimensional measured BGS distributions and their corresponding BFS distributions were then fed into the CNN (Fig. 1), and trained for 200 epochs with a batch size of 64. The weights were initialized using a random normal initializer restricted to a mean of 0 and a standard deviation of 0.05, and optimized using Adam algorithm with the following parameters: learning rate = 0.001, β1 = 0.9, β2 = 0.999, and ε = 0.1. Adam is a stochastic gradient descent method based on the adaptive estimation of first-and second-order moments, whereby each parameter's learning rate is maintained and separately adapted as the network learns [53]. During training, outputs were obtained through forward propagation, and errors were minimized through back propagation, whereby the weights were adjusted to minimize the log-cosh loss function, given by where yi is the actual value and p i is the predicted value. This function is twice differentiable everywhere and is not strongly affected by incorrect predictions [54], making it popular in regression problems. Note that the CNN was trained thrice to ensure the results' independence from lucky selection of weights and that the results presented in Section V are from the same training iteration of the CNN.   approximately 48 hours, while training the same number of 301 × 90 input data took twice as long. Using a GPU with faster clock speed is expected to render shorter training time.
After training, the model was used to predict the BFS distributions of the simulated measured BGS distribution testing data set, which were kept independent of the training data to prevent biases in the neural network's predictions. The model took approximately 38.8 seconds to predict the BFS distributions from the simulated measured BGS distributions of 4,844 testing data from the 301×90 data set, an improvement of about 137 times from the 1.1 second deconvolution time for each similar-sized simulated measured BGS distribution using the same machine.

IV. MODEL EVALUATION
To evaluate the performance of the proposed algorithm, we used mean absolute error (MAE) and median symmetric accuracy (MSA) as evaluation metrics. MAE is one of the most commonly used measures of error in regression, which calculates the arithmetic average of the absolute error between the actual and forecast values, and is given by where pi is the predicted value and yi is the actual value [55]. While related studies [22], [24]- [26], [28], [47] on deep neural network-assisted signal processing in Brillouin optical fiber sensors use root mean squared error (RMSE) as accuracy metric, RMSE is known to penalize larger errors more heavily [56]; hence our decision to utilize MAE as performance metric in this study instead. MAE, however, is scale-dependent, as it follows the scale of the data predicted or measured. To augment this, we additionally present our errors in terms of MSA, given by where MSA is symmetric (i.e., it gives the same accuracy for overand under-forecasts) unlike the more commonly known mean absolute percentage error (MAPE) [57], [58], which is known to penalize overprediction more heavily. Moreover, MSA is not scale-dependent and is resistant to outliers, allowing us to interpret the model performance in terms of percentage, without bias [58]. Note that both the MAE and MSA presented throughout this paper were derived by considering only the strained section i.e. the peak of the BFS distribution. This is to better quantify the ability of the CNN to approximate the peak of the strained section for various strains and strained lengths.

A. CNN PERFORMANCE AS A FUNCTION OF SPATIAL RESOLUTION
The 25-m-long FUT was divided into 301 position channels for the 301 × 90 data set, and correspondingly, into 151 position channels for the 151 × 90 data set to investigate the relation between the mode's performance and the spatial resolution. The peak of a 0.2-m-long strained section translates to three activated position channels for the 301 × 90 data set, and to one activated position channel for the 151 × 90 data set. Comparison between the BFS distributions of the two data sets shows that for a 0.2-m-long strained section, the CNN model predicts the BFS distribution more accurately for the 301 × 90 data set compared to the 151 × 90 data, regardless of the magnitude of the applied strain. This is evident in Fig. 4(a) and Fig. 4(b), which show the actual and predicted BFS distributions when varying strains are applied to a 0.2-m-long section centered at 18 m from the proximal end of the FUT. Comparison between the MAE and MSA of the strained section of the 151 × 90 data set and that of the 301 × 90 data set (Fig. 4(c) and Fig. 4(d), respectively) as a function of the strain applied at 18 m from the proximal end of the FUT also shows that both accuracy metrics of the 151 × 90 data set can reach as much as five times that of the 301 × 90 data set. The shortest strained length achievable for the 301 × 90 data set is 0.2 m, corresponding to three activated position channels, for which the CNN was able to approximate the main peak of the strained section with an MSA of not more VOLUME 9, 2021  than 27% for an applied strain of 0.20% at 18 m from the proximal end of the FUT (Fig. 4(d)). Meanwhile, the MSA of the 151 × 90 data set for the same strained length, corresponding to one activated position channel, rendered as high as 48% for an applied strain of 1.35% at 18 m from the proximal end of the FUT, suggesting a correlation between the regression accuracy and the number of activated position channels. Note that the same CNN architecture was trained for both data sets (301 × 90 and 151 × 90), and although the parameters were initialized with a random normal initializer for each training iteration and data set, the distribution of initialization values was controlled by setting the mean to zero and the standard deviation to 0.0-to reduce the effect of weights initialization on the results. To further improve the spatial resolution, a higher number of position channels may be considered, at the expense of computation complexity and cost.
The following discussion will be constrained to the same training iteration of the 301 × 90 data set, which is the focus of our study.  Fig. 5(a), the former's predicted BFS distribution better approximates that of the actual, indicating the model's ability to better approximate BFS distributions for low strains applied to longer sections, compared to low strains applied to shorter sections. This is also evident in Fig. 6(a) and Fig. 6(b) where both the MAE and MSA decrease with an increase in strained length for all strains. The boxplots of MAE and MSA as a function of the strained length for 0.15% and 0.85% strains, in Figs. 7(a)-(d), also show similar trends, implying that the CNN performs better for longer strained lengths, with the MSA reaching values below 10% at strained lengths above a certain threshold (>1.8 m for 0.15% strain data set, and >1.7 m for 0.85% strain data set).

B. CNN PERFORMANCE AS A FUNCTION OF STRAINED LENGTH
As the data used in training the CNN was uniformly distributed, the 0.2-m-long section data is only made up of an extremely small proportion of the training data, with a majority of the data being made up of longer sections (>0.2-m-long sections). This could be one of the reasons for the high MSA of the 0.2-m-long section data ( Fig. 7(b) and Fig. 7(d)). Only at relatively small strains (<0.20%) was the predicted main peak not discernible; for higher strains applied to 0.2-m-long sections, the main peak was visible but with a much lower amplitude, accounting for the high MSA reported. Nevertheless, the CNN model was shown to be capable of directly obtaining the BFS distribution from the measured BGS distribution for strained sections as short as 0.2 m, with the highest MAE and MSA reported to be 0.225 (arb. unit) and 33.1%, respectively, for the 0.2-m-long strained section. This translates to a maximum RMSE of 329.8 µε between the peaks of the actual and approximated BFS distributions. Comparatively, this RMSE is larger than that reported by Wang et al. [26], whose deep neural network (DNN)-assisted BOTDA for a 24 km large-effective area fiber (LEAF) with a spatial resolution of 2 m accrued a maximum RMSE of 132.2 µε. But do note that the RMSE we reported herein only covers the strained section, which inevitably produces a much higher RMSE than what would have been derived if we calculated the RMSE of the full FUT for which there are 301 samples instead of 3 to divide the sum of the squares of the difference between the actual and predicted values with, corresponding to the activated position channels of the 0.2-m-long strained section data set. Chang et al. [28] also proposed a CNN-assisted BOTDA for BFS extraction, although there was no explicit mention of the maximum error obtained. They did, however, indicate their results in terms of normalized RMSE.
Increasing the proportion of the 0.2-m-long sections in the training data, as well as the training iterations, is expected to improve our CN's performance in deriving the BFS distribution at shorter strained lengths.

C. CNN PERFORMANCE AS A FUNCTION OF STRAIN
Examination of the strain dependency of MAE in Fig. 6(a) shows an increase in the accuracy metric with an increase in the applied strain for shorter strained lengths. The MSA ( Fig. 6(b)) for shorter strained lengths, on the contrary, shows a parabolic trend whereby relative maxima are obtained at the lower and upper extremes of the applied strain. This is evident in Fig. 8(b) where the MSA of the 0.2-m-long strained section presents an initial decrease from a relative maximum at an applied strain of 0.20% to a minimum at 0.50%, increasing thereafter. At low strains applied to the 0.2-m-long section, the peak of the predicted BFS distribution is obscured by other peaks, as in Fig. 5(a). Meanwhile, a distinct main peak is observable at higher strains, such as that shown in Fig. 5(c), with the predicted BFS distribution more closely resembling that of the actual. At higher strains, however, the difference between the peaks of the actual and predicted BFS distributions become more significant, such that a higher MAE and MSA is derived at strains beyond 0.50%. Fig. 5(c) and 5(d) show the predicted and actual BFS distributions when 0.15% and 0.85% strains are applied to a 2.2-m-long section centered at 0.2 m from the proximal end of the FUT. As shown, the predicted BFS distribution for longer strained sections closely approximates that of the actual, regardless of the magnitude of strain applied. At longer strained sections, such as that of the 4.0-m-long strained section, the MSA does not follow a specific trend, although it is evident from Fig. 8(d) that both the MSA and its variance are consistently and significantly lower than that of the 0.2-m-long strained section in Fig. 8(b), and that lower than 10% MSA can be achieved for strains above a certain threshold (>0.30%). Do note that the MAE and MSA presented in this paper only spans the strained section, and that a lower MAE and MSA is expected if the whole length of the FUT, e.g. 301 position channels, is considered.
Notably, the variance in both MAE and MSA for the 0.2-m-long strained section ( Fig. 8(a) and Fig. 8(b), respectively) is observed to increase with the applied strain; whereas the variance in both MAE and MSA for the 4.0-m-long strained section (Fig. 8(c) and Fig. 8(d), respectively) is observed to decrease with the applied strain. A closer inspection of the MAE and MSA as functions of the applied strain for both 0.2-m-and 4.0-m-long strained sections shows that the MAE of the 0.2-m-long strained section can reach as high as 10 times that of the 4.0 m-long strained section, while the MSA of the former can reach as high as 5 times that of the latter. This is indicative of the CN's ability to better estimate the actual BFS distribution at longer strained lengths. Training the CNN with a larger data set may improve the model's performance and reduce errors. We expect the model can also be applied to real data; but to increase its performance and reduce errors, the CNN will have to be trained with a combination of simulated and real data, at the expense of computation cost and training time.

VI. CONCLUSION
This paper provided some specific results of BOCDR assisted by a deep learning technique. A CNN was successfully trained and utilized in obtaining the BFS distribution from the measured (convolved) BGS distribution in BOCDR. We showed that the spatial resolution can be potentially enhanced through this method by at least 5 times compared to the nominal value calculated using the convolved BGS distribution. Our method is also applicable to Brillouin optical correlation-domain analysis (BOCDA) [6], the measured BGS distribution of which is given in the same manner as in BOCDR. We expect the network to be useful in deriving the BFS distributions from real measured BGS distributions, though it may be necessary to train the CNN with both real and simulated measured BGS distributions to improve the mode's performance and reduce the errors. We anticipate that this work will motivate future studies focused on the improvement of signal processing in BOCDR and BOCDA.