GPR Data Reconstruction Using Residual Feature Distillation Block U-Net

Owing to uneven ground surfaces, mismatch between the trigger interval and the sampling speed, or other electromagnetic interference, missing traces are a common occurrence in on-ground ground penetrating radar (GPR) surveys. Effective reconstruction of missing GPR traces is regarded as a crucial step for improving both the signal-to-noise ratio of the raw data and the resolution of GPR imaging. In this article, we propose a novel deep-learning framework based on the residual feature distillation block U-Net (RFDB-U-Net) to mitigate the transmission loss problem of the conventional U-Net. Specifically, by employing an information distillation network built on multiple feature extraction connections, the RFDB can exploit the residual information of each layer for feature learning. Moreover, a skip connection is attached to the residual units to compensate for features lost in the convolution process. In particular, merging the RFDB with a lightweight U-Net keeps the overall network light. The superior performance of the proposed framework is verified through reconstruction accuracy and evaluation metrics in tests on synthetic data, laboratory data, and on-site field data.


I. INTRODUCTION
Ground penetrating radar (GPR) is a geophysical method for shallow subsurface detection that exploits electromagnetic signals. By virtue of its unique features, such as nondestructive testing, rapid data acquisition, and excellent resolution, it has shown great promise for various subsurface sensing applications [1], [2]. However, the completeness of GPR data acquired in the common on-ground mode is susceptible to the unevenness of the ground surface, the decay of electromagnetic wave energy, and especially the mismatch between the triggering interval and the sampling speed. These factors can lead to random or regular data loss and reduce the signal-to-noise ratio (SNR) of the raw GPR data. To mitigate this, developing an effective approach for reconstructing lost signals has become a hot topic in the literature. The projection onto convex sets (POCS) method has been widely used in seismic surveys to address false frequencies caused by an insufficient spatial sampling rate or missing traces [3]. Given the fundamental similarity in acquisition mode and wave equations between GPR testing and seismic reflection surveys, advanced techniques from seismic exploration have been carried over to GPR data processing. Focusing on the sparse sampling issue, Yi et al. [4] introduced an iterative reconstruction method combining POCS with frequency-wavenumber (F-K) zone-pass filtering for missing GPR traces and demonstrated its advantages in reducing the data acquisition density. Missing traces in GPR data may also have a significant impact on the performance of clutter removal, making GPR detection and target imaging practically impossible. To resolve this issue, some studies have exploited the matrix completion property of randomized low-rank and sparse decomposition; for example, the well-known Go Decomposition (GoDec) has been applied to clutter removal in data with missing traces.
It was reported that the GoDec method is superior to principal component analysis in the case of missing data [5]. Although several other solutions have also been proposed for different cases of the missing-trace problem, effective reconstruction of sparsely sampled data remains a high priority [6]. Given the close relationship between sparse sample reconstruction and compressed sensing (CS), sparse GPR data can be reconstructed from very few nonadaptive linear measurements by exploiting the sparsity of the data in a specific domain [7], [8]. For instance, the iterative shrinkage-thresholding algorithm (ISTA) and the fast iterative shrinkage-thresholding algorithm (FISTA) [9] are effective schemes that combine sparse sample reconstruction with the CS problem, and ISTA has been successfully applied to GPR data reconstruction under the general circumstances of incomplete data and phase distortion [10], [11], [12].
However, these methods must satisfy certain linearity assumptions [13], sparsity constraints [6], [10], or even stricter sampling conditions [11] in order to deliver good reconstruction performance. For example, the sparsity-promoting interpolation method can efficiently reconstruct missing data only by exploiting the sparse or compressible nature of the data [14]. In contrast, machine-learning (ML) techniques do not rely on such constraints [15], [16], and can be trained by utilizing the features at all levels of the network to achieve the desired result. In this context, ML has drawn tremendous attention for geophysical data reconstruction [17], [18], [19], [20]. For example, methods based on the pyramid context encoder network have been used to reconstruct missing GPR data [21]. Deep networks and generative adversarial networks have also been used for GPR data recovery in the case of extreme column deletions [22], [23], [24]. A recursive neural network trained on the same data has been proposed for missing-trace reconstruction [18]. Recently, U-Net has been introduced to perform end-to-end reconstruction, relying on its unique encoder-decoder architecture [25]. For cases with consecutively missing traces, the training process of U-Net has been further extended to the multistage U-Net [26].
Under certain circumstances, comprehensively characterizing the data with a single network is difficult because its training process often suffers from vanishing gradients. To overcome this, a deep residual learning framework was introduced [27] by adding a residual module to the U-Net, which shows better performance in sparsely sampled data reconstruction [28]. However, dramatically increasing the network depth to improve accuracy leads, in turn, to a redundant and cumbersome network structure. In this case, a lighter U-Net is preferable with respect to reconstruction efficiency [29].
Yet, in terms of feature continuity and the reconstruction accuracy of local feature information, the conventional U-Net still has deficiencies, since the residual information of each layer cannot be fully utilized. In fact, because the U-Net framework differs in its sensitivity to different feature information, the transmission loss problem inevitably occurs in the training process. To improve the feature continuity and the local details of the data, this article exploits both the high sensitivity of the deep residual distillation network to feature information and the lightness of U-Net, and proposes a novel deep-learning framework based on the residual feature distillation block U-Net (RFDB-U-Net). In addition, for the first time, we consider both the efficiency and the accuracy of reconstruction to facilitate subsequent processing of GPR data, and in particular to ensure trace integrity with respect to possible small-scale anomalies in full waveform inversion or migration imaging.

A. U-Net
U-Net was initially applied to the segmentation of medical images [30]. Inspired by the U-Net concept and aiming to minimize the model size, researchers in other fields subsequently made improvements and proposed a widely applicable lightweight U-Net [29], as shown in Fig. 1(a). The network is composed of encoding and decoding paths. Each module in the encoding path consists of a convolutional layer, a maximum pooling layer, and the rectified linear unit (ReLU) activation function.

B. RFDB-U-Net
The limitation of the conventional U-Net mainly stems from its inherent inability to fully utilize the residual information at each layer, which is rich in feature continuity and local details. To mitigate this, we focus on this architectural deficiency of U-Net and make a series of improvements to propose a novel deep-learning framework based on the RFDB-U-Net, as shown in Fig. 1(b). More specifically, the two-dimensional (2-D) convolutional block is replaced by the RFDB block, which primarily consists of the shallow residual block (SRB) layer, the contrast-aware channel attention (CCA) layer, the batch normalization (BN) layer, and the skip connection. The essential contributions of the proposed RFDB-U-Net are as follows: SRB is used to enhance the extraction of effective residual features of the data; CCA is added to integrate the information after reconstruction; and BN is used to accelerate the convergence of the network and to prevent vanishing gradients, with the skip connection added to supplement the feature information lost during the convolution process. More importantly, a structural similarity index measure (SSIM) term is added to the original loss function to restore the GPR data with high accuracy while maintaining the feature continuity of the data.
The core function of the RFDB framework is mainly implemented through the SRB block. The SRB block consists of a network block with residual connections of kernel size 3 × 3, an identity mapping, and an activation unit, which can better exploit the adaptive capability of parameter learning.
The CCA is used to adjust the weight of each channel, taking data $X = [x_1, x_2, \ldots, x_c]$ with $c$ feature maps of spatial size $H \times W$ as input. The contrast output can be expressed as

$$z_c = H_{GC}(x_c) = \sqrt{\frac{1}{HW}\sum_{(i,j)\in x_c}\left(x_c^{i,j} - \frac{1}{HW}\sum_{(i,j)\in x_c} x_c^{i,j}\right)^2} + \frac{1}{HW}\sum_{(i,j)\in x_c} x_c^{i,j} \tag{1}$$

where $z_c$ is the $c$th element of the contrast output and $H_{GC}(\cdot)$ denotes the global contrast information evaluation function.
In this article, we employ a variant of the ReLU activation function, known as the parametric rectified linear unit (PReLU) activation function, in our deep neural networks. The PReLU activation function is defined as

$$f(y_i) = \max(0, y_i) + \alpha_i \min(0, y_i) \tag{2}$$

where $y_i$ is the input of the $i$th channel and $\alpha_i$ is a learnable parameter. When $\alpha_i = 0$, the PReLU activation function degenerates into the traditional ReLU activation function. A nonzero $\alpha_i$ gives the network more flexibility and keeps a nonzero gradient for negative inputs, which helps to avoid the vanishing-gradient problem commonly encountered with the traditional ReLU activation function.
By combining the strengths of CCA, SRB, and U-Net, the RFDB-U-Net structure is formed to realize a lightweight network that performs end-to-end GPR data recovery.
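The two building blocks described above can be illustrated with a minimal NumPy sketch. It assumes that the global contrast of a channel is the sum of its standard deviation and its mean, as in the usual CCA definition, and it applies a single scalar slope for PReLU; the function names and the toy feature map are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def global_contrast(x):
    """Per-channel contrast statistic: standard deviation plus mean.

    x has shape (C, H, W); the result has shape (C,).
    """
    mean = x.mean(axis=(1, 2))
    std = x.std(axis=(1, 2))  # population standard deviation (divides by H*W)
    return std + mean

def prelu(y, alpha):
    """PReLU: identity for positive inputs, slope alpha for negative inputs."""
    return np.maximum(0.0, y) + alpha * np.minimum(0.0, y)

# Hypothetical feature map with two channels of size 2 x 2.
x = np.array([[[1.0, 1.0], [1.0, 1.0]],    # constant channel: std 0, mean 1
              [[0.0, 2.0], [0.0, 2.0]]])   # std 1, mean 1
z = global_contrast(x)

y = np.array([-2.0, 0.0, 3.0])
relu_like = prelu(y, 0.0)    # alpha = 0 reduces PReLU to ReLU
leaky = prelu(y, 0.25)       # negative inputs are scaled by alpha
```

In a full CCA layer, the vector `z` would typically pass through a small gating network to produce channel weights; that step is omitted here.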

C. Dataset Design and Network Parameter Optimization
In this article, we present a novel GPR dataset for deep learning, which includes 2400 sets of synthetic data and 100 sets of real measurement data. To generate the synthetic data, we used stochastic methods [31], [32] to randomly establish physical models, as illustrated in Fig. 2, to simulate different geological scenarios. The depth of the model was randomly selected between 3 and 6 m, and the distance was randomly selected between 6 and 12 m. The number of layers in the model was generated within the range of 2-5 to represent various types of background material distribution, such as sand layers, soil layers, and bedrock, and different underground environments, such as urban roads and field roads. The relative permittivity of the layers ranges from 3 to 9.
There were 3-7 anomalies randomly distributed in the medium. The relative permittivity of these anomalies was randomly generated in the range of 1-20 to simulate different materials, shapes, and sizes (0.1-0.5 m in diameter) of underground abnormal bodies, including PVC pipelines, metal pipelines, and irregular cavities. Through these stochastic physical models, our dataset provides diverse, representative, and challenging scenarios for deep-learning tasks in subsurface imaging and analysis. To obtain the final training dataset, we randomly used various types of wavelets, including Ricker wavelet, Gauss wavelet, and Blackman-Harris wavelet, with center frequencies randomly selected from 400, 600, and 900 MHz. We used the stochastic model shown in Fig. 2(a) to perform forward simulation and generate complete GPR data.
To simulate missing data, 20%-60% of the traces in each GPR record were randomly reset to the background value of the GPR data [5], [29], [33], [34]. The final dataset is shown in Fig. 2, where the data on the bottom represent the GPR data with missing traces (i.e., input GPR data), and the data on the top represent the predicted GPR data (i.e., output GPR data). We collected the field GPR data on a test site, where the acquisition rate and the sampling interval were mismatched, using GSSI (Geophysical Survey Systems, Inc.) antennas with center frequencies of 400 and 900 MHz; the ground truth of the underground targets was known.
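The trace-removal step can be sketched as follows. This is a minimal illustration that assumes each column of the B-scan is one trace and that the background value is zero; the function name `drop_traces` and the random test data are hypothetical.

```python
import numpy as np

def drop_traces(bscan, missing_ratio, background=0.0, rng=None):
    """Return a copy of bscan with a random subset of traces set to background.

    bscan is a 2-D array of shape (samples, traces); each column is one trace.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_traces = bscan.shape[1]
    n_missing = int(round(missing_ratio * n_traces))
    missing = rng.choice(n_traces, size=n_missing, replace=False)
    corrupted = bscan.copy()
    corrupted[:, missing] = background
    kept = np.ones(n_traces, dtype=bool)
    kept[missing] = False  # True where the trace is preserved
    return corrupted, kept

rng = np.random.default_rng(0)
full = rng.standard_normal((256, 100)) + 5.0   # hypothetical complete B-scan
ratio = rng.uniform(0.2, 0.6)                  # missing ratio drawn from 20%-60%
sparse, kept = drop_traces(full, ratio, background=0.0, rng=rng)
```

The pair (`sparse`, `full`) then plays the role of one input/target pair, and the boolean mask `kept` records which traces were left untouched.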
In addition, to ensure the integrity and randomness of data features and avoid mismatch of input data size, the data pairs of the dataset are preprocessed by using a modified sliding block selection method [35], as illustrated in Fig. 3. The number of rows and columns of the data pairs are randomly selected from 256, 384, 512, and 768, respectively, and the size of the data pair was adjusted accordingly. We then used a sliding window with the size of 256 × 256 to capture the features of the data pair, which ensures each window contains rich abnormal waveform features. The sliding step size was set as 32. The overall processing flow is shown in Fig. 4.
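A plain sliding-window extraction with the stated window size and step can be sketched as below. This shows only the basic windowing; the modifications of the method in [35], such as selecting windows that contain anomalous waveform features, are not reproduced, and the function name and the 384 × 512 example size are assumptions.

```python
import numpy as np

def sliding_patches(data, window=256, step=32):
    """Collect all window x window patches of a 2-D array with a fixed stride."""
    rows, cols = data.shape
    patches = []
    for r in range(0, rows - window + 1, step):
        for c in range(0, cols - window + 1, step):
            patches.append(data[r:r + window, c:c + window])
    return np.stack(patches)

# Hypothetical data pair member resized to 384 rows by 512 columns.
section = np.zeros((384, 512), dtype=np.float32)
patches = sliding_patches(section)
# (384 - 256) / 32 + 1 = 5 row positions and (512 - 256) / 32 + 1 = 9 column
# positions, giving 45 patches of size 256 x 256.
```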

D. Network Parameter Optimization
To evaluate the performance of the network and ensure that it does not overfit, a validation set was created. This validation set consists of 15 randomly selected synthetic data pairs and 15 randomly selected real-world data pairs. After every 50 training epochs, the validation loss was recorded and used to monitor the network performance. The network is implemented and trained with TensorFlow 2.10 in an Ubuntu 22.04 environment on a system equipped with an NVIDIA P4000 GPU. The Adam optimizer is employed during training, with a mini-batch size of 32.
The Adam algorithm is an adaptive parameter update algorithm that controls the effective step size and the update direction by using first and second moment estimates of the gradient. This enhances the stability of the optimization and helps to prevent the updates from oscillating or vanishing. In addition to storing an exponentially decaying average of past squared gradients $s_t$, Adam also keeps an exponentially decaying average of past gradients $v_t$, similar to momentum [36]

$$v_t = \beta_1 v_{t-1} + (1 - \beta_1) g_t \tag{3}$$

$$s_t = \beta_2 s_{t-1} + (1 - \beta_2) g_t^2 \tag{4}$$

where $g_t$ is the gradient at step $t$, and $v_t$ and $s_t$ are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. These variables are used to update the parameters of the model, with the gradients obtained from the backpropagation algorithm. Specifically, we set $\beta_1 = 0.9$ and $\beta_2 = 0.99$. Based on our experimental results and the literature, these values have been found to be effective in controlling the optimization process [37]. However, because $v_t$ and $s_t$ are initialized as zero vectors, they are biased toward zero, especially during the initial steps and when the decay rates are small (i.e., $\beta_1$ and $\beta_2$ are close to 1), which causes a deviation from the optimal solution. To address this issue, we use the bias-corrected first moment estimate $\hat{v}_t$ and second moment estimate $\hat{s}_t$ to counteract these biases:

$$\hat{v}_t = \frac{v_t}{1 - \beta_1^t} \tag{5}$$

$$\hat{s}_t = \frac{s_t}{1 - \beta_2^t} \tag{6}$$

The corrected estimates are then used to update the parameters of the model. The update equation for the Adam algorithm incorporating this bias correction is given by

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{s}_t} + \delta}\,\hat{v}_t \tag{7}$$

where $\theta_t$ denotes the model parameters at step $t$, $\alpha$ represents the learning rate, and $\delta$ represents a small constant that prevents division by zero. In this article, we set the initial values $\alpha = 0.001$ and $\delta = 10^{-8}$ based on our experimental results and the literature [37].
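For illustration, one Adam update with bias correction can be written in a few lines of NumPy. This is a sketch of the standard algorithm using the hyperparameters quoted above, not the TensorFlow optimizer used for training; the function name `adam_step` and the toy objective $f(\theta) = \theta^2$ are assumptions.

```python
import numpy as np

def adam_step(theta, grad, v, s, t, alpha=0.001, beta1=0.9, beta2=0.99, delta=1e-8):
    """Apply one Adam update and return the new parameters and moments.

    t is the 1-based step counter used for bias correction.
    """
    v = beta1 * v + (1.0 - beta1) * grad          # first moment estimate
    s = beta2 * s + (1.0 - beta2) * grad ** 2     # second moment estimate
    v_hat = v / (1.0 - beta1 ** t)                # bias-corrected first moment
    s_hat = s / (1.0 - beta2 ** t)                # bias-corrected second moment
    theta = theta - alpha * v_hat / (np.sqrt(s_hat) + delta)
    return theta, v, s

# Toy objective f(theta) = theta^2 with gradient 2 * theta.
theta0 = np.array([1.0])
v0 = np.zeros_like(theta0)
s0 = np.zeros_like(theta0)
theta1, v1, s1 = adam_step(theta0, 2.0 * theta0, v0, s0, t=1)
```

At the first step the bias correction exactly undoes the zero initialization, so the parameter moves by approximately the learning rate regardless of the gradient magnitude.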

E. Training of Deep-Learning Model
The loss function plays a crucial role in the training of deep neural networks: it compares the output of the network with the ground truth and provides the gradients for the weight updates. The choice of the loss function can have a significant impact on network performance. Mean square error (MSE) is a commonly used loss function in deep learning. However, the use of MSE may lead to over-smoothed data, resulting in varying degrees of distortion of the GPR profile. To address this issue, we propose a hybrid loss function that combines the MSE loss function with the SSIM loss function. The MSE loss function measures the element-wise error between the data, while the SSIM loss function captures the overall structural similarity between them. This combination provides a more robust and accurate representation of the data and improves network performance.
The MSE shown in (8) and the SSIM shown in (9) are used as the mixed loss function for network training

$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2 \tag{8}$$

where $x$ represents the original data, $y$ represents the reconstructed data, and $N$ represents the number of data matrix elements, and

$$\mathrm{SSIM}(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\cdot\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\cdot\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{9}$$

where $\mu_x$ and $\mu_y$ represent the mean values of $x$ and $y$, respectively, $\sigma_x$ and $\sigma_y$ represent the standard deviations of $x$ and $y$, respectively, and $\sigma_{xy}$ represents the covariance of $x$ and $y$. $C_1$, $C_2$, and $C_3$ are constants that prevent the denominators from being zero. The hybrid loss function can then be expressed as

$$\mathrm{Loss}(x, y) = \mathrm{MSE}(x, y) + \mathrm{SSIM}(x, y). \tag{10}$$
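A hybrid loss of this kind can be sketched in NumPy as follows. Several choices here are assumptions rather than the authors' implementation: SSIM is computed from global statistics of the whole array instead of local windows, the constants follow the commonly used defaults for data scaled to [0, 1], and the SSIM term enters the loss as 1 − SSIM so that minimizing the loss increases similarity, which is the usual convention and differs from the literal sum written in the text.

```python
import numpy as np

def mse(x, y):
    """Mean square error over all elements."""
    return np.mean((x - y) ** 2)

def ssim_global(x, y, c1=1e-4, c2=9e-4, c3=4.5e-4):
    """SSIM from global means, standard deviations, and covariance.

    The constants are hypothetical defaults for data scaled to [0, 1].
    """
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure

def hybrid_loss(x, y):
    # Assumption: the SSIM term is used as (1 - SSIM) so that lower is better.
    return mse(x, y) + (1.0 - ssim_global(x, y))

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                        # hypothetical target data
noisy = clean + 0.1 * rng.standard_normal((64, 64))
loss_same = hybrid_loss(clean, clean)               # identical data
loss_noisy = hybrid_loss(clean, noisy)              # degraded reconstruction
```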
In this article, we focus on demonstrating the effectiveness of incorporating the RFDB module in improving the performance of the original network. To this end, we train both the U-Net and RFDB-U-Net models. Training runs for a total of 1000 epochs, and the training and validation losses are illustrated in Fig. 5. The results show that all the loss curves exhibit an overall decreasing trend, indicating that neither model has fallen into an overfitting state. Furthermore, the loss of the RFDB-U-Net model shows a flatter trajectory and faster convergence than that of the U-Net model, with less drastic oscillations. In addition, both models performed well on the validation set, indicating that the networks have adequate generalization capacity. The results demonstrate that the RFDB module improves the overall capability of the network.
During the training process, the RFDB module enhances the ability of the lightweight network to capture detailed features and improves the network performance, but the floating-point operation count (FLOPs) and the number of trainable parameters increase as a result, as shown in Table I. Reducing this cost will be the focus of our future network optimization efforts.

F. Evaluation Metrics
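For the synthetic data, this article uses the SNR and SSIM image evaluation indexes to evaluate the quality of the reconstructed GPR data. The expression of SSIM is defined in (9); its value normally lies in [0, 1], with 1 indicating identical data (negative values can occur only when the two data sets are anticorrelated).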
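The SNR is defined as

$$\mathrm{SNR} = 10\log_{10}\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} f(i,j)^2}{\sum_{i=1}^{M}\sum_{j=1}^{N}\left[f(i,j) - \hat{f}(i,j)\right]^2} \tag{11}$$

where $f(i,j)$ represents the original data, $\hat{f}(i,j)$ represents the processed data, and $M$ and $N$ represent the size of the data matrix.
Generally, a higher SNR indicates a less noisy B-scan of higher quality.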
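As a small worked example, the SNR can be computed as the ratio of the energy of the original data to the energy of the residual, expressed in decibels. This sketch assumes that common definition; the function name `snr_db` and the 4 × 4 test arrays are hypothetical.

```python
import numpy as np

def snr_db(reference, processed):
    """SNR in dB: energy of the reference over energy of the residual."""
    residual = reference - processed
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))

reference = np.ones((4, 4))     # hypothetical original data, energy 16
processed = reference.copy()
processed[0, 0] = 0.0           # a single wrong element, residual energy 1
value = snr_db(reference, processed)   # 10 * log10(16), about 12.04 dB
```

A perfect reconstruction gives a zero residual and therefore an unbounded SNR, so in practice the metric is only evaluated when the two arrays differ.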
Since no complete reference data are available for the laboratory data and the field data, we introduce the Brenner gradient and entropy focusing functions to evaluate the sharpness of the reconstructed images based on autofocusing metrics [38], [39]. The basic principle of image sharpness evaluation is that a focused image has high contrast, clear edges, and abundant high-frequency content in the frequency domain. The Brenner gradient focusing algorithm is based on the high-frequency components of the image. That is, a fully focused image has a high degree of clarity and abundant high-frequency components, particularly at the edges of the image. In contrast, a defocused image is blurred, which corresponds to the attenuation of high-frequency components in the frequency domain and a lack of detail and sharpness at the edges. This method is widely applied by virtue of its real-time capability, as reported in [40]. The Brenner gradient focusing function is a fast, rudimentary edge detector that measures the difference between a pixel and a neighbor that is typically two pixels away ($m = 2$), which can be expressed as

$$F_{\mathrm{Brenner}} = \sum_{i=1}^{I-m}\sum_{j=1}^{J}\left[s(i+m, j) - s(i, j)\right]^2 \tag{12}$$

where $s(i,j)$ is the grayscale pixel value at coordinates $(i,j)$, and $I$ and $J$ represent the number of pixels in the $i$ and $j$ directions, respectively.
Entropy can be used to describe the abundance of information. The entropy focusing evaluation function is based on the diversity of the gray-level distribution in the focused image. For an image of size $I \times J$ with $L$ grayscale levels, let the probability of occurrence of the $k$th grayscale level be $P_k$. The entropy focusing evaluation function is defined as

$$F_{\mathrm{entropy}} = -\sum_{k=1}^{L} P_k \log_b P_k \tag{13}$$

where $P_k \in (0, 1)$ and $\sum_{k=1}^{L} P_k = 1$. The value of $b$ is set to 2, based on our experimental results and the literature [41]. Under this measure, the focused image is taken to be the one with the maximum entropy.
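Both focusing measures are straightforward to compute. The sketch below assumes that the Brenner difference is taken along the first array axis and that the image holds gray levels in the range 0-255; the function names and the small test images are hypothetical.

```python
import numpy as np

def brenner(image, m=2):
    """Brenner gradient: sum of squared differences between pixels m rows apart."""
    diff = image[m:, :] - image[:-m, :]
    return float(np.sum(diff ** 2))

def entropy(image, levels=256, base=2.0):
    """Entropy of the gray-level histogram, using logarithms of the given base."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                       # empty levels contribute nothing
    return float(-np.sum(p * np.log(p) / np.log(base)))

# Hypothetical 4 x 4 images: a sharp step edge and a smoothed version of it.
sharp = np.array([[0.0] * 4, [0.0] * 4, [255.0] * 4, [255.0] * 4])
smooth = np.array([[0.0] * 4, [85.0] * 4, [170.0] * 4, [255.0] * 4])

b_sharp = brenner(sharp)     # 8 differences of 255 each
b_smooth = brenner(smooth)   # 8 differences of 170 each
h_sharp = entropy(sharp)     # two equally likely levels give 1 bit
```

The sharp edge yields the larger Brenner value, which matches the interpretation that stronger high-frequency content indicates better focus.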

A. Setup
An underground pipe and cavity model is built and illustrated in Fig. 6. The model dimensions were set as 4.0 m × 8.0 m, and it consists of three layers, from top to bottom: an air layer, a soil layer (containing six pipes and an irregular cavity), and a substrate medium. The relative permittivities of the soil layer and the substrate medium were set as 4 and 9, respectively. As only cavity and pipeline models were of concern in this article, conductivity was not considered and was set to zero. The simulation was carried out using the finite difference time-domain method [42], [43] with a Ricker wavelet source at a central frequency of 400 MHz and a time window of 60 ns. The time and space intervals were set as 0.04 ns and 0.02 m, respectively. In line with on-site testing scenarios, traces were deliberately dropped at random, and two percentages of missing traces (20% and 50%) were considered. Of the simulation results, only the section with 50% missing traces is presented, as shown in Fig. 7. Table II lists the reconstruction metrics of the proposed method, ISTA [44], [45], FISTA [44], [45], POCS, and the deep-learning-based U-Net [29] in terms of SSIM and SNR. The results show that, for both the 20% and the 50% missing-trace data, the reconstructions of the traditional methods differ considerably from the original complete data, with different degrees of signal loss, whereas both SSIM and SNR indicate the superior performance of the proposed RFDB-U-Net.
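The source parameters quoted above can be checked with a short sketch that builds a 400 MHz Ricker wavelet on the stated time grid and compares the time step with the Courant stability limit. The wavelet delay of one period, the use of the 2-D limit for a square grid, and the variable names are assumptions made for illustration; they are not stated in the text.

```python
import numpy as np

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def ricker(t, fc, delay=None):
    """Ricker wavelet with center frequency fc, peaking at t = delay."""
    delay = 1.0 / fc if delay is None else delay   # hypothetical onset delay
    arg = (np.pi * fc * (t - delay)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

dt = 0.04e-9       # time interval, s
dx = 0.02          # space interval, m
window = 60e-9     # time window, s
fc = 400e6         # center frequency, Hz

n_steps = int(round(window / dt))   # 1500 samples per trace
t = np.arange(n_steps) * dt
source = ricker(t, fc)

# Courant limit for a 2-D square grid, evaluated at the vacuum wave speed,
# which is the fastest speed in the model (air layer).
dt_max = dx / (C0 * np.sqrt(2.0))   # about 0.047 ns
stable = dt < dt_max
```

With these values the chosen time step of 0.04 ns lies below the limit of roughly 0.047 ns, so the quoted discretization is consistent with a stable explicit scheme under the stated assumptions.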

B. Results and Discussions
As shown in Fig. 7, all methods achieve some degree of reconstruction with 50% missing traces. The data processed by the deep-learning methods [Fig. 7(e) and (f)] are cleaner than those processed by the traditional methods [Fig. 7(b) and (d)], which introduce extra noise during reconstruction. To compare the details of the hyperbolas, we zoom in on the hyperbolic portion of the data, highlighted by the red box in Fig. 7. The detail indicates that our method yields more continuous and smoother events, especially for the faint-amplitude reflections. This is because the proposed method makes use of more residual information from each layer for feature learning; in addition, the skip connection added to the residual units further compensates for features lost in the convolution process, which helps to extract more hyperbolic details and more continuous waveforms. To some extent, the proposed method represents a more optimized and adaptive version of the U-Net, so the reconstructed data match the ground truth more closely, since the extracted information is more prominent and contains more hyperbolic details.

A. Setup
To validate the practicality of the proposed method, we further designed a laboratory model (Case 2) to simulate the missing-trace problem that may be encountered in the acquisition process. Fig. 8 shows the sand tank laboratory in the Geoscience Building of Central South University.
Four anomalies were buried in the tank, and their parameters are listed in Table III. For data acquisition, we used the 400 MHz antenna of a GSSI SIR-4000 radar in distance measurement mode; the number of sampling points per trace and the time window were set to 1024 and 30 ns, respectively. The sampling rate was set to 100 traces/m and the length of the survey line was 4 m, so a total of 400 traces were collected, as shown in the B-scan in Fig. 9(a). Owing to the surface fluctuation and discontinuity near the iron sphere (No. IV), together with the mismatch between the sampling rate and the trace interval, the traces near the iron sphere (No. IV) were missing from the raw data.
Furthermore, to validate the effectiveness of the proposed method on complicated on-site field cases (Case 3), we selected field GPR data collected during road testing in Zhengzhou, China, as shown in Fig. 10.
For data acquisition in distance measurement mode, the GSSI SIR-4000 GPR instrument with the 400 MHz antenna was again used. We set the number of sampling points to 512 and the time window to 35 ns. The sampling rate was 50 traces/m and the length of the survey line was 18.4 m, so a total of 920 traces were collected. Owing to the unevenness of the ground surface and the mismatch between the trace interval and the sampling rate, trace loss also occurs in the B-scan of the common-offset gather, as shown in Fig. 11(a).

B. Results and Discussions
For the laboratory experiment (Case 2), as shown in Fig. 9, all methods achieve some degree of reconstruction. As before, the details of the hyperbola near the iron sphere (No. IV) are highlighted by the red dotted box in Fig. 9. The results demonstrate that the traditional methods fail to reconstruct the missing traces, while the deep-learning methods show considerable improvements in reconstructing the continuous events as well as the complete hyperbola. Because the CCA and SRB layers are combined, the proposed method is superior to U-Net in its potential to capture more residual information from each layer and to further strengthen the extraction and optimization of residual features.
Considering that the laboratory data and the on-site field data are rather different from the simulated GPR data, we finally evaluate the reconstruction performance using the Brenner gradient focusing function and the entropy focusing evaluation function. Since more residual information has been extracted to reconstruct the anomalous features, the proposed method provides a more focused image by improving the contrast between the detailed features and the background [39].
As shown in Fig. 12 (Case 2), with the highest evaluation index, the proposed method quantitatively outperforms other methods in the laboratory case.
For the on-site field test (Case 3), the reconstruction results of the different methods are compared within the red boxes in Fig. 11(b) and (f), which confirms that the proposed method is capable of reconstructing the "B-scan truth" with more details. The U-Net method, as shown in Fig. 11(e), improves the continuity of the reflection events in comparison to the traditional methods.
However, its extraction and reconstruction abilities are somewhat limited, mainly because of its lightweight architecture. In contrast, the RFDB-U-Net algorithm enhances the extraction of residual features by incorporating the RFDB module and, especially, by using a hybrid loss function. These improvements allow the algorithm to capture more hyperbolic features hidden in the profile and lead to a smoother representation of anomalous waveforms. These visual evaluations are also supported by quantitative measures of the sharpness and edge information of the image, as shown in Fig. 13. The highest autofocusing value is obtained by the RFDB-U-Net, which indicates the best reconstruction performance among the compared methods and reconfirms the practicability of the proposed method in the complex environment of field data acquisition.

V. CONCLUSION
The major contribution of the proposed framework is the incorporation of the RFDB module into a lightweight U-Net architecture, since the SRB and CCA modules are the most critical components of the RFDB to preserve valid information lost in different layers. In addition, a new combination of MSE and SSIM is introduced as a hybrid loss function to improve the continuity of valid information in the presence of extensive missing GPR traces.
We compare the reconstruction performance of the proposed framework with state-of-the-art methods using synthetic data, laboratory data, and on-site field data with missing traces. The results provide practical evidence for the proposed framework and demonstrate its ability to improve the detail and accuracy of the reconstructed features, which could offer high potential for 3-D GPR scanning tasks and fine GPR inversion imaging in complex interference environments.