Ultrasonic Logging Image Denoising Based on CNN and Feature Attention

Ultrasonic logging in the high-temperature, high-pressure environment of an oil well produces various kinds of noise that blur the logging image. This paper presents a novel end-to-end denoising model (ULNet) based on CNN and feature attention to remove this noise from ultrasonic logging images. Our method comprises feature attention, feature enhancement based on a residual model, and reconstruction of the ultrasonic logging image. Feature enhancement based on a residual model integrates global and local features to increase the expressive ability of the denoising model. Feature attention is used to distinguish the channel feature weights and is effective for blind denoising of real images. Kernel dilation and skip connections are used to reduce the computational cost during training. The noise mapping results are used to reconstruct a clean image. Comprehensive quantitative and qualitative evaluations on datasets collected at six oil wells in China show that this model is a feasible and effective means of denoising ultrasonic logging images. Overall, ULNet shows potential for practical ultrasonic logging image denoising.


I. INTRODUCTION
Oil is the world's most important source of energy and the lifeblood of the industrialized nations. Petroleum hydrocarbons are brought to the surface through oil wells bored into the Earth, and a well must be tested after it is completed [1]. Well logging methods include radioactivity logs [2], resistivity logs [3], and ultrasonic logs [4]. Ultrasonic logging is widely used in the exploration and development of oil and gas resources for its low cost, high sensitivity, high penetrating power, and real-time detection capabilities [5], [6]. It uses a rotating transducer to transmit a high-frequency ultrasonic pulse to the borehole wall, and information about the borehole is captured by receiving the echo reflected from the wall. However, non-ideal logging instruments and the complexity of the logging environment disturb the echo signal with different types of noise, degrading the image quality considerably. These problems make the borehole information difficult to interpret.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohamed M. A. Moustafa .
Removing noise from an acquired image is a necessary step in ultrasonic logging image analysis. Therefore, an efficient, flexible and practical ultrasonic logging image denoising algorithm is an active area of research.
In recent years, ultrasonic logging image denoising has attracted the attention of many researchers, and quite a few approaches have been proposed. These methods can be classified as hardware-based and software-based. Hardware-based methods include those employing composite transducers [7], [8] or phased arc array transmitters with azimuthal detection capability [9]. These methods can improve the acquisition accuracy and sensitivity of the transducers, but the logging image is still inevitably corrupted by noise due to the influence of the downhole environment. Software-based methods include techniques such as the Gaussian filter (GSF) [10]-[13] and the total variation (TV) method [14]-[18]. A Gaussian filter is a straightforward sliding-window spatial filter that replaces the center value of the window with a Gaussian-weighted mean of the adjacent pixel values together with the center value itself. A Gaussian filter effectively removes Gaussian noise and is computationally efficient. However, it blurs the edges of logging images and corrupts their feature information. Total variation denoising reduces the total variation of the image (the integral of its absolute gradient) while keeping the result close to the original image, which removes unnecessary detail while preserving significant features such as boundaries. However, it involves many manually selected parameters, and selecting suitable values is subjective and empirical. Recently, convolutional neural networks (CNNs) have made great progress in computer vision tasks.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To improve the efficiency of the denoising task, a deep CNN can be regarded as a modular part that plugs into classical optimization methods for recovering the latent clean image, which has proved very effective for noisy images. The deep-learning-based image denoising methods discussed in [19]-[21] show that deep CNNs are very competitive in both performance and efficiency, so we use a deep CNN for ultrasonic logging image denoising.
In this paper, motivated by practical observations of ultrasonic logging image denoising, we focus on four challenges: (1) low resolution, (2) blind denoising, (3) self-adaptive parameters, and (4) computational cost. Firstly, ultrasonic logging images have low resolution and unclear boundaries, which makes denoising more difficult. Feature enhancement based on a residual model integrates global and local features to increase the expressive ability of the denoising model, which helps it achieve satisfactory performance. Secondly, in the real world, images are easily corrupted and the noise is complex, so the noise standard deviation generally cannot be obtained. The attention mechanism [22] uses the current stage to guide the previous stage in learning the noise information, which is very useful for images with unknown noise, such as blind-noise and real noisy images [23], [24]. Thirdly, previous denoising methods for ultrasonic logging images may require manual intervention to improve results because they involve many manually selected parameters, which limits their practical application. The end-to-end architecture trains the denoising model without manual intervention. Finally, in image recognition or classification, residual models [25]-[28] enlarge the receptive field in two ways, by increasing the depth or the width of the network, both of which result in higher computational cost and more memory consumption. To solve this problem, the proposed network has a small depth but provides a wide receptive field through kernel dilation. Current deep-learning-based denoising models are far from achieving all of these aims, so we present a fast and flexible ultrasonic logging image denoising method, named ULNet, based on CNN and feature attention.
The main contributions of our work are summarized as follows:
• A fast and flexible denoising network based on CNN and feature attention, namely ULNet, is proposed for ultrasonic logging image denoising. This is the first paper to use a deep learning model for ultrasonic logging image denoising. The ULNet model provides state-of-the-art results using an end-to-end framework and can deal with noise at different levels as well as spatially variant noise.
• We utilize an attention mechanism to enhance the weights of important features from the maps, which enhances the expressive ability of the denoising model. It is very useful for images with unknown noise, such as blind-noise and real noisy images.
• A sparse representation of the noise based on kernel dilation and skip connections is used to reduce the network depth, which improves denoising performance and efficiency.
The rest of the paper is organized as follows. Section II discusses the principles of ultrasonic logging. Section III presents our proposed method for denoising ultrasonic logging images using CNN and feature attention. Section IV reports the experiments and results of the proposed method. Section V presents the conclusion.

II. PRINCIPLES OF ULTRASONIC WELL LOGGING
Ultrasonic well logging records the amplitude and the travel time of the sound wave to create a 360° image of the borehole wall [29], [30]. The principles of ultrasonic well logging are shown in Figure 1. Figure 1 (a) shows a schematic diagram of the logging process, and an amplitude and time diagram of ultrasonic logging is shown in Figure 1 (b). As shown in Figure 1, a rotating transducer [31], [32] transmits ultrasonic pulses perpendicular to the borehole wall and receives the echoes reflected from the wall at the same time. The transducer is raised by one depth point per revolution as it rotates. The ultrasonic echo amplitude is used to generate a reflection coefficient image of the borehole wall, which documents the acoustic impedance of the wall [33]. The arrival time of the ultrasonic echo is taken as the reflection time of the borehole wall and is used to calculate the diameter of the borehole.
A captured ultrasonic logging image, however, is a degraded observation of the latent image. The noise from the degradation process is caused by two types of factors. First, ultrasonic signals are sensitive to the high-temperature and high-pressure environment in oil wells, which creates noise. Second, a reflected ultrasonic echo may not be received because of the eccentricity of the logging instrument, resulting in data loss. In the latter case, noise can be removed using an amplitude eccentricity estimate [34] and ellipse fitting based on least squares [35]. In this paper, we consider only the former case: noise generated during ultrasonic logging imaging.

III. PROPOSED METHOD
In this section, we introduce the proposed denoising network, ULNet, which is composed of feature extraction, feature enhancement units (FEU) based on a residual model, and reconstruction, as shown in Figure 2. ULNet is an end-to-end architecture that performs denoising with a single model and handles both spatially variant and invariant noise. Firstly, "Conv (convolution) + ReLU (rectified linear unit)" is adopted for the first convolution layer, which obtains initial features from the input image. Secondly, feature enhancement units (FEU) based on a residual model are cascaded for the main feature learning. Finally, the output features of the last layer are fed to the reconstruction layer to reconstruct the clean image.
Specifically, the design of the ULNet architecture balances performance and efficiency for ultrasonic logging image denoising. We used three strategies to improve performance when removing noise from ultrasonic logging images. First, dilated and standard convolutions are used to enlarge the receptive field. Second, the FEU uses the global and local features of ULNet to enhance its expressive ability. Third, the attention mechanism enhances the weights of important features from the maps, which is very effective for images with unknown noise, such as blind-noise and real noisy images. These techniques are introduced in the following subsections.
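As a rough illustration of why kernel dilation widens the receptive field without adding layers, the following sketch computes the receptive field of a stack of stride-1 convolutions. The three-layer stacks and the dilation rates 1, 2 and 4 are assumptions chosen only for illustration; they are not ULNet's actual configuration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical stacks: three standard 3x3 layers versus three 3x3 layers
# with dilation rates 1, 2 and 4 (rates chosen only for illustration).
standard = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(standard, dilated)  # 7 15
```

With the same number of layers and weights, the dilated stack in this sketch sees a 15 × 15 neighborhood instead of 7 × 7.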

A. FEATURE EXTRACTION
Let $I_N$ denote the input noisy image and $I_O$ the denoised output image. The first convolution layer extracts the initial features from the noisy image as
$$F_0 = C_0(I_N)$$
where $F_0$ is the initial feature of the noisy image and $C_0(\cdot)$ represents the convolution function applied to the noisy image. $F_0$ is then input to the residual layers for feature learning:
$$F_r = C_{fl}(F_0)$$
where $F_r$ are the learned features and $C_{fl}(\cdot)$ is the main feature learning stage based on the residual model, which is composed of feature enhancement units (FEU) linked together.

B. ATTENTION MECHANISM
An attention mechanism extracts features suitable for image processing applications. A denoising model that weights all channel features equally is unsuitable in many cases and cannot process images with unknown noise. The attention mechanism guides the CNN when training a denoising model by generating a different attention weight for each channel-wise feature.
In the frequency domain, an image generally includes two parts: high-frequency regions (edges and textured areas) and low-frequency regions (smooth or flat areas). A CNN model uses only local information rather than global contextual information, so global average pooling is employed to represent the statistics of the whole image. Let $F_c$ denote the output features of the final convolution layer, $c$ the number of channels, and $h \times w$ the size of the feature maps. Global average pooling reduces the features from $h \times w \times c$ to a descriptor $K_a$ of size $1 \times 1 \times c$:
$$K_a = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} F_c(i, j)$$
where $F_c(i, j)$ is the feature value at position $(i, j)$ in the feature maps. Inspired by the attention mechanism discussed in [22], we use a self-gating mechanism to obtain the channel correlation from the descriptor retrieved by global average pooling. The self-gating mechanism learns the nonlinear synergies between channels as well as their mutually exclusive relationships. Therefore, sigmoid operators and soft shrinkage are utilized to realize the gating mechanism. Let $S(\cdot)$ and $\phi(\cdot)$ denote the sigmoid and soft-shrinkage operators, respectively. The gating mechanism is
$$R_c = S(L_D(\phi(L_U(K_a))))$$
where $L_U$ is the channel reduction operator and $L_D$ is the channel upsampling operator. The output $K_a$ of the global pooling layer is convolved with a downsampling convolution layer and then activated by the soft-shrinkage function. To distinguish the channel features, the result is then fed into an upsampling convolution layer followed by sigmoid activation. In addition, the output of the sigmoid, $R_c$, is adaptively rescaled by the input $F_c$ of the channel features:
$$\hat{F}_c = R_c \times F_c$$

C. FEATURE ENHANCEMENT
In this section, feature enhancement based on a residual model with short skip connections and local skip connections is introduced in detail.
It is known that in a very deep network the influence of the shallow layers on the deep layers may weaken as the depth grows. To solve this problem, feature enhancement is proposed in ULNet for image denoising. The feature enhancement unit (FEU) is composed of three parts. Firstly, the input features are divided into two branches and passed to two dilated convolutions, then concatenated and passed through another convolution. Secondly, the features are learned by a residual block of two convolutions and compressed by an enhanced residual unit of three convolution layers. The feature compression improves the processing speed. The final layer of the residual block flattens the features using a 1 × 1 kernel. Thirdly, the output of the feature attention unit is passed to the input of the next unit. We therefore use the FEU as the basic module to construct our denoising network. The n-th FEU is given as
$$F_n = \mathrm{FEU}_n(F_{n-1})$$
where $F_n$ is the output feature of $\mathrm{FEU}_n$ and $F_0$ is the initial feature from the first convolution layer. Directly cascading the residual modules does not achieve better performance.
Hence the input of the feature learning module is passed to the final output of the stacked modules through a long skip connection as
$$F_a = F_0 + C_{fl}(F_0; W, b)$$
where $W$ and $b$ are the weights and biases learned by the model. $F_a$ is passed to the reconstruction layer, which outputs the same number of channels as the input of the network.
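The feature attention that closes each FEU (Section III-B) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the 1 × 1 convolutions are written as bias-free matrix products, the weights are random stand-ins for trained parameters, and the soft-shrinkage threshold `lam=0.5` is an assumed value that the paper does not specify.

```python
import numpy as np


def soft_shrink(x, lam=0.5):
    """Soft shrinkage: move values toward zero by lam and zero out |x| <= lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feature_attention(features, w_reduce, w_expand):
    """Rescale each channel of `features` (h, w, c) by a learned gate.

    w_reduce: (c, c // r) weights of the 1x1 channel-reduction convolution.
    w_expand: (c // r, c) weights of the 1x1 channel-upsampling convolution.
    """
    descriptor = features.mean(axis=(0, 1))       # global average pooling, shape (c,)
    hidden = soft_shrink(descriptor @ w_reduce)   # channel reduction + soft shrinkage
    gate = sigmoid(hidden @ w_expand)             # channel upsampling + sigmoid, in (0, 1)
    return features * gate                        # channel-wise rescaling


rng = np.random.default_rng(0)
c, r = 64, 16                                     # 64 channels, reduction factor 16
features = rng.random((40, 40, c))                # stand-in feature maps
w_reduce = rng.standard_normal((c, c // r)) * 0.1  # hypothetical weights
w_expand = rng.standard_normal((c // r, c)) * 0.1  # hypothetical weights
out = feature_attention(features, w_reduce, w_expand)
print(out.shape)  # (40, 40, 64)
```

Because the gate lies strictly between 0 and 1, each channel is attenuated by its own factor, which is how the unit expresses channel-wise importance.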

D. LOSS FUNCTION
ULNet is trained under the degradation model $y = x + n$, where $y$ is the noisy image, $x$ is the original image, and $n$ is the noise. ULNet is used to predict the residual image $n$, which satisfies $n = y - x$. We then use $m$ training pairs $\{I_N^i, I_c^i\}_{i=1}^{m}$ and the mean square error (MSE) to train the denoising network, where $I_N^i$ is the noisy input and $I_c^i$ is the ground truth. This process can be formulated as
$$L(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left\| \mathrm{ULNet}(I_N^i) - I_c^i \right\|_2^2$$
where ULNet is our network, $w$ and $b$ represent the set of all learned network parameters, $I_N$ is the noisy image, and $I_c$ is the ground truth.
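The relationship between the degradation model and the MSE objective can be checked numerically. The sketch below is illustrative only: the patches are random stand-ins, the noise level 25 is expressed on a [0, 1] intensity scale by assumption, and the "network" is a hypothetical oracle that predicts the residual exactly.

```python
import numpy as np


def mse_loss(denoised, clean):
    """Mean squared error over a batch of images."""
    return float(np.mean((denoised - clean) ** 2))


rng = np.random.default_rng(0)
clean = rng.random((8, 40, 40))                     # x: stand-in ground-truth patches
noise = rng.normal(0.0, 25.0 / 255.0, clean.shape)  # n: Gaussian noise, level 25 on a [0, 1] scale
noisy = clean + noise                               # y = x + n

# A hypothetical perfect network would predict the residual n exactly,
# so subtracting it from y recovers x and the loss vanishes.
predicted_residual = noise
restored = noisy - predicted_residual

print(mse_loss(restored, clean))  # essentially zero
print(mse_loss(noisy, clean))     # close to the noise variance (25 / 255) ** 2
```

Without any denoising, the MSE equals the noise variance up to sampling error, which gives a baseline against which a trained model can be compared.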

E. NETWORK RECONSTRUCTION
The output features of the last layer are fed to the reconstruction module, which also consists of one convolution layer:
$$I_O = C_r(F_r)$$
where $C_r(\cdot)$ denotes the reconstruction layer, $I_O$ is the denoised output image, and $F_r$ are the learned features.

F. IMPLEMENTATION OF ULNET
Our ULNet model includes five FEU blocks. The convolution filter size of each layer is 3 × 3, except for the last layer, whose size is set to 1 × 1. To keep the output feature maps the same size as the input, we use zero-padding. The number of channels is fixed at 64 for each convolution layer, except in the feature attention downscaling, where the channels are reduced by a factor of 16, leaving only four feature maps. The final convolution layer outputs either three or one feature maps, depending on the input.

IV. EXPERIMENTS

A. THE ULTRASONIC LOGGING IMAGE DATASETS
We selected 4000 ultrasonic logging images of 256 × 256 pixels from the six oil wells as the test datasets. All models were trained on Gaussian noise with levels set to 15, 25, 40 and 50. The noise signal was added to the original signal to produce a corrupted signal following the model
$$w(x, y) = s(x, y) + n(x, y)$$
where $s(x, y)$ is the original signal or image, $n(x, y)$ denotes the noise introduced into the image to produce the corrupted image $w(x, y)$, and $(x, y)$ represents the pixel location.
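The additive noise model can be sketched as below. This assumes, as is conventional but not stated in the paper, that the noise level is the standard deviation of zero-mean Gaussian noise on a 0-255 intensity scale; the flat image is a stand-in for a real logging image.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Return w = s + n with n ~ N(0, sigma^2), on a 0-255 intensity scale."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    return image.astype(np.float64) + noise


rng = np.random.default_rng(0)
image = np.full((256, 256), 128.0)  # flat stand-in for a logging image
noisy = {sigma: add_gaussian_noise(image, sigma, rng) for sigma in (15, 25, 40, 50)}

# The empirical standard deviation of w - s should be close to each noise level.
for sigma, w in noisy.items():
    print(sigma, float((w - image).std()))
```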
Different areas of an image contain different kinds of detailed information. Hence, we divided the noisy training images into 158000 patches of 40 × 40 pixels. Training on patches yields more robust features and improves the efficiency of training a denoising model. Because noise in the real world is varied and complex, we used 1000 real noisy images of 256 × 256 pixels from the datasets to train a real-noise denoising model. To accelerate training, the 1000 real noisy images were divided into 111600 patches of 50 × 50 pixels. Additionally, each training image was randomly transformed in one of eight ways: the original image; rotation by 90°, 180°, or 270°; the original image flipped horizontally; and each of the 90°, 180°, and 270° rotations flipped horizontally.
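The patch extraction and eight-way augmentation described above can be sketched as follows. The non-overlapping stride of 40 and the mode numbering are assumptions made for illustration; the paper does not state the stride used to obtain its patch counts.

```python
import numpy as np


def extract_patches(image, size, stride):
    """Cut a 2-D image into size x size patches on a regular grid."""
    h, w = image.shape
    return [
        image[r:r + size, c:c + size]
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]


def augment(patch, mode):
    """Apply one of eight modes: rotation by 0/90/180/270 degrees (modes 0-3),
    or the same rotations followed by a horizontal flip (modes 4-7)."""
    rotated = np.rot90(patch, k=mode % 4)
    return np.fliplr(rotated) if mode >= 4 else rotated


rng = np.random.default_rng(0)
image = rng.random((256, 256))                        # stand-in training image
patches = extract_patches(image, size=40, stride=40)  # stride is an assumed value
augmented = [augment(p, int(rng.integers(8))) for p in patches]
print(len(patches), augmented[0].shape)  # 36 (40, 40)
```

Since the patches are square, every mode preserves the patch shape, so augmented patches can be batched together directly.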

B. EXPERIMENTAL ENVIRONMENT
The ULNet model was trained on a server, but the actual operation was performed on a personal computer. Therefore, the experimental environment in our paper is divided into two parts (see Table 1): the server and the personal computer.

C. EXPERIMENTAL RESULTS AND ANALYSIS
We conducted qualitative and quantitative experiments to evaluate and demonstrate the performance of the proposed ULNet model for removing noise. The experimental results produced by Gaussian filtering (GSF), TV algorithm, nonlocal mean filtering (NLM) [36] and DnCNN [20] were compared with our proposed method. A visual inspection of recovered clean images subjectively reveals signs of clarity and completeness. An objective, empirical evaluation used peak signal to noise ratio (PSNR), structural similarity index (SSIM) values and the runtime of denoising of an image to test the denoising effects of the proposed model in relation to other denoisers. In addition, real noisy images were used to further assess the practicability of ULNet.
In this paper, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values were employed in the objective empirical evaluation. The PSNR and SSIM were calculated as the error metrics and compared against other competitive state-of-the-art algorithms. For a fair comparison, we used the default settings of the comparative methods provided by the corresponding authors. Assume that the original image $X$ has size $M \times N$ and that $Y$ is the denoised image. PSNR is then defined as
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{f_{max}^2}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( X(i, j) - Y(i, j) \right)^2} \right)$$
where $f_{max}$ is the maximum intensity of the input image; for the common 8-bit gray-level image with 256 possible gray levels, $f_{max} = 255$. PSNR measures the denoising effect, but other quantitative indexes can also be employed to evaluate the structural similarity between the original and denoised images. SSIM is a quality assessment that measures the similarity between two images. Suppose $x$ and $y$ are two non-negative image signals for calculating the SSIM:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$
where $\mu_x$ and $\mu_y$ are the mean intensities of $x$ and $y$, and $\sigma_x$ and $\sigma_y$ are their standard deviations. The term $\sigma_{xy}$ is the covariance of images $x$ and $y$, and $c_1$, $c_2$ are constants. The local parameters $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ are calculated within a local 8 × 8 square window, which slides from pixel to pixel over the whole image.
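Both metrics can be sketched directly from their definitions. The code below is a plain NumPy sketch rather than the evaluation code used in the paper; the constants $c_1 = (0.01 \times 255)^2$ and $c_2 = (0.03 \times 255)^2$ are the commonly used defaults and are an assumption here, since the paper only says they are constants. The 32 × 32 test images are random stand-ins.

```python
import numpy as np


def psnr(original, denoised, f_max=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diff = original.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(f_max ** 2 / mse))


def ssim_window(x, y, c1, c2):
    """SSIM of a single window, computed from local means, variances, covariance."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim(x, y, window=8, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Mean SSIM over a window that slides pixel by pixel (c1, c2 are assumed defaults)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    h, w = x.shape
    scores = [
        ssim_window(x[r:r + window, c:c + window], y[r:r + window, c:c + window], c1, c2)
        for r in range(h - window + 1)
        for c in range(w - window + 1)
    ]
    return float(np.mean(scores))


rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(32, 32)).astype(np.float64)  # stand-in image
noisy = clean + rng.normal(0.0, 25.0, size=clean.shape)         # noise level 25
print(psnr(clean, noisy), ssim(clean, noisy))
```

The pure-Python sliding window is slow on full 256 × 256 images; it is written this way only to mirror the definition.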

1) QUALITATIVE EVALUATION
To verify the denoising effect, Gaussian white noise (GWN) was added to the borehole wall images, and the denoising effect of each method was assessed visually on the denoised images. The performance of the proposed ULNet is illustrated using test results on model well images and ultrasonic logging images. Figure 3 shows examples of the denoising results. Images I and II of Figure 3 are model well images, acquired by the ultrasonic logging instrument CBIL, which uses transducers. Images III and IV of Figure 3 are ultrasonic logging images, acquired by the piezoceramic ultrasonic imaging logging instrument BHTV from the Changqingyi well. As Figure 3 shows, our method is visually superior to the other methods, yielding satisfactory denoising results. A closer inspection of the ultrasonic logging images reveals that our model generates textures closest to the ground truth, with fewer artifacts and more details.
In the second column of Figure 3, although the traditional Gaussian filtering method removes the noise, the image is clearly over-smoothed. In the third and fourth columns of Figure 3, the non-local mean filtering and TV algorithms retain image details overall while denoising, but some local detail features are lost. It is worth noting that when the perforated borehole wall (image IV in Figure 3) was processed by the TV algorithm at noise levels 40 and 50, part of the perforation on the right was lost and the whole image was distorted.
Compared with the other algorithms, the proposed method and DnCNN remove the noise while preserving the finer details and structures. As an effective denoising model, the proposed method protects the texture features of the image and retains the line features of the ultrasonic logging image, so it can be used in actual ultrasonic logging image denoising. Our method was therefore visually more effective than the others. A quantitative evaluation is provided in the next subsection.

2) QUANTITATIVE EVALUATION
We evaluated the denoising performance of ULNet on 600 images in six groups of 100 images each. The size of each image is 256 × 256 pixels, and each group came from one of the six test oil wells. The average PSNR and SSIM results of the different methods on the datasets are shown in Table 2 and Table 3.
Compared to the traditional denoising method (Gaussian filter), the TV and NLM methods have a notable PSNR and SSIM gain. According to [21], deep learning techniques have received much attention in the area of image denoising and achieve better performance than previous methods on optical images. We used DnCNN and ULNet (our method) to remove noise from the ultrasonic logging images. The experimental results in Table 2 and Table 3 show that the deep-learning-based denoising methods further improve noise removal for ultrasonic logging images. The proposed approach achieves the best PSNR and SSIM results among the competing methods. Our method outperforms all the competing methods on the datasets for all noise levels, which we attribute to its better expressive and generalization ability. In the next subsection, we discuss the advantages of our method in terms of operational efficiency.

3) OPERATING EFFICIENCY
In addition to visual quality, another important aspect of an image restoration method is the testing speed. Table 4 shows the run times of different methods for denoising images of sizes 3000 × 256 and 300 × 512 with Gaussian noise level 25. Since GSF, NLM, DnCNN and our ULNet are well suited to parallel computation on GPU, we also give the corresponding run times on GPU. As in [37], we do not count the memory transfer time between CPU and GPU.
As illustrated in Table 4, the proposed ULNet runs relatively fast on both CPU and GPU, and its running time is very competitive with other popular methods. Further, ULNet has lower complexity than state-of-the-art methods such as DnCNN. Taking denoising performance and flexibility into consideration, ULNet is very competitive for practical applications. The next subsection describes how it performs in a real-world situation.

4) REAL NOISY IMAGES
In the real world, noise is complex: there are many noise sources and images are easily corrupted. An evaluation on real noisy images can indicate the effectiveness of algorithms in real-world applications. However, such an evaluation is difficult to conduct for the following reasons. (I) Both the ground-truth clean image and the noise level are unknown for real noisy images. (II) Real noise comes from various sources and is spatially variant (non-Gaussian) and signal dependent; hence, the assumption that noise is spatially invariant, employed by many methods, does not hold for real noisy images.
Since there is no ground-truth image for a real noisy image, a visual comparison is employed to evaluate the performance of ULNet. As shown in Figure 4, we visually compared the result of our method with the other methods on the real noise images collected at the Changqingyi oil well.
It is clear that GSF, TV and NLM perform poorly in removing the noise from the local areas (no. 1 and no. 2) shown in Figure 4. In Figure 4 (b), although GSF removes the noise, the image is over-smoothed. In Figure 4 (c) and 4 (d), TV and NLM fail to remove the noise because they are efficient and capable on synthetic noise but not on the real noise of ultrasonic logging images. In contrast, DnCNN and our proposed ULNet obtain better visual results than GSF, TV and NLM in both noise removal and detail preservation, although some noise remains in the magnified area of the DnCNN result. The results therefore indicate the feasibility of employing our method for practical image denoising applications.

V. CONCLUSION
In this paper, we proposed a new CNN model, ULNet, for denoising ultrasonic logging images with synthetic and real noise. Our model is an end-to-end architecture that requires no separate sub-nets or manual intervention. Our solution includes three modules: feature extraction, enhancement attention modules based on a residual network, and image reconstruction. The results on synthetic images with GWN demonstrated that ULNet not only produces state-of-the-art results when the input noise level matches the ground-truth noise level, but also robustly controls the trade-off between noise reduction and detail preservation. The running time comparisons showed the faster speed of ULNet over other competing methods. The results on real noisy images further demonstrate that ULNet can deliver perceptually appealing denoising results. Considering its flexibility, efficiency and effectiveness, ULNet provides a practical solution for ultrasonic logging image denoising.