Multi-Scale Dilated Convolution Neural Network for Image Artifact Correction of Limited-Angle Tomography

Limited-angle computed tomography (CT) arises in some medical and industrial applications. It is a challenging problem because some scan views are missing, and the directly reconstructed images often suffer from severe distortions. For this kind of problem, we analyze the features of limited-angle CT images and propose a multi-scale dilated convolution neural network (MSD-CNN) to correct the artifacts and restore the image. In this network, a dilated convolution layer and a multi-scale pooling layer are combined to form a block that is used throughout the encoder-decoder process. Since dilated convolutions support an exponential expansion of the receptive field without losing resolution or coverage, the extracted artifact features possess a multi-scale characteristic. Furthermore, to improve the effectiveness and accuracy of the training step, we employ a preprocessing method that extracts image patches. Numerical experiments verify that the proposed method outperforms some conventional methods, such as Unet-based deep learning and TV- and $L_{0}$-based optimization methods.


I. INTRODUCTION
As a non-destructive imaging technology, computed tomography (CT) has been widely used in industrial and medical fields. When restricted by specific scan equipment or a radiation-dose requirement, limited-angle CT problems often arise. Limited-angle CT is a highly ill-posed reconstruction problem with incomplete scan data. Traditional algorithms such as FBP (FDK) [1], [2] and ART (SART) [3], [4] result in significant artifacts in the reconstructed images.
In order to reduce the artifacts and improve the reconstructed image quality, conventional methods for limited-angle tomography often employ prior knowledge as constraints in the reconstruction process. One popular choice is the sparse gradient constraint, which is based on the fact that the reconstructed image can be approximately treated as piecewise constant. The sparsity can then be measured by the total variation (TV, the $L_1$-norm of the image gradient), the $L_0$-norm of the image gradient and so on. Such a measurement can be further employed as a regularization term and incorporated into an optimization model. The TV-based optimization model can be effectively solved by the adaptive steepest descent-projection onto convex sets method (ASD-POCS) [5]-[7], the prior image constrained compressed sensing method [8]-[11], the soft-threshold filtering approach [12] and so on. Recently, solution methods for the gradient $L_0$-norm minimization problem have been studied [13]-[16]. Because the $L_0$-norm is superior to the $L_1$-norm in sparsity expression, the gradient $L_0$-norm based optimization method outperforms the TV-based one, especially for edge maintenance. In recent years, some researchers have introduced the idea of deep learning to limited-angle CT problems as well as the low-dose problem [17]-[20]. According to the differences in the processing steps, these methods can be classified into three categories: image post-processing, projection pre-processing and reconstruction processing.
• Image post-processing: image post-processing, which combines traditional methods with deep learning, performs end-to-end training to learn full-angle images from limited-angle images. The artifacts of limited-angle CT images have a certain directionality and a global distribution. In 2016, Hanming Zhang et al. first applied deep learning to the limited-angle CT problem and achieved satisfactory results when the missing-angle range is small [21], [22]. Jawook Gu et al. studied the characteristics of limited-angle image artifacts and proposed a method to restore the image by learning multi-scale wavelet coefficients [23].
• Projection pre-processing: the incomplete projection data can be learned to achieve a complete version before the reconstruction process, i.e., full-angle projection data are learned from limited-angle projection data. Anirudh et al. proposed a system consisting of a one- and a two-dimensional convolutional neural network to recover the missing projection data. This method improved the reconstruction quality and was applied to baggage CT systems [24], [25]. The disadvantage of projection pre-processing is that small changes in the projection domain have a great impact on the reconstructed result.
• Reconstruction processing: reconstruction processing employs deep learning within the reconstruction process itself to reconstruct images from limited-angle projection data. The process is staged and the learning process is more complex. Hammernik proposed a method that can directly reconstruct images from limited-angle projections [26]. This method first learned the missing weights of the data in the projection domain and correspondingly corrected the intensity changes. Then, by employing a variational network, the stripe artifacts are effectively suppressed.
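The sparse-gradient measures discussed earlier (TV and the gradient $L_0$-norm) can be illustrated with a minimal NumPy sketch. This is not the paper's solver, only the two sparsity measures applied to a toy piecewise-constant image:

```python
import numpy as np

def anisotropic_tv(img):
    # L1-norm of the image gradient: sum of absolute horizontal
    # and vertical finite differences.
    dx = np.abs(np.diff(img, axis=1)).sum()
    dy = np.abs(np.diff(img, axis=0)).sum()
    return dx + dy

def gradient_l0(img):
    # Gradient L0 "norm": number of non-zero gradient entries.
    dx = np.diff(img, axis=1)
    dy = np.diff(img, axis=0)
    return np.count_nonzero(dx) + np.count_nonzero(dy)

# A piecewise-constant image has a small TV and very sparse gradients,
# which is the prior these regularizers exploit.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
print(anisotropic_tv(img))  # 8 unit jumps across the single edge
print(gradient_l0(img))     # 8 non-zero gradient entries
```

Both quantities vanish on constant regions and penalize only edges, which is why they preserve piecewise-constant structure while suppressing streak artifacts.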
Considering the impact of the datasets and the processing procedure, in this paper we propose an end-to-end deep neural network for the limited-angle CT problem. The network contains two subnetworks for main structure correction and fine artifact removal, respectively. Experimental results show that the proposed network outperforms the conventional methods for limited-angle CT, even when the scan angle range is restricted to 130°.

II. METHOD
The post-processing method for limited-angle artifact correction is to establish a mapping between a limited-angle and a full-angle reconstructed image via a deep neural network. The parameters of the network are obtained by supervised learning on a training dataset. In this work, we design a new end-to-end network structure, where the CT image reconstructed from limited-angle projection data is employed as the network input, and the one reconstructed from complete projection data is adopted as the network label. Without loss of generality, we define the input image as F, the label image as G, and the output image as P, all with the same size N × N, i.e., F, G, P ∈ R^{N×N}. We establish the relationship

P = Φ(F), (1)

where the mapping function Φ: R^{N×N} → R^{N×N} represents the image recovery process. Eq. (1) is then converted to an optimization model,

min_Φ ‖Φ(F) − G‖², (2)

To develop a deep learning based solution method for optimization problem (2), we propose a multi-scale dilated convolution neural network (MSD-CNN). As illustrated in Fig. 1, the entire network is based on an ''encoder-decoder'' structure with a refining stage. Dilated convolution (or atrous convolution) [27] was originally developed for wavelet decomposition [28]. The main idea of dilated convolution is to insert ''holes'' (zeros) between pixels in convolutional kernels to increase image resolution, thus enabling dense feature extraction in deep CNNs [29]. These zeros can be considered as ''gaps'', and different gap widths correspond to different dilation rates. Under such a definition, the dilated convolution on 2D data is illustrated in Fig. 2. For a given 2D image, a dilation rate of 1 is a normal convolution; a rate of 2 skips one pixel between sampled inputs; and a rate of 4 skips three pixels. In the example of Fig. 2, the red dots are the inputs to a 3 × 3 filter, and the yellow area is the receptive field captured by each of these inputs, where the receptive field is the area on the initial input captured by each input to the next layer.
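The supervised learning setup described above can be sketched in a few lines of PyTorch. Here a single 3 × 3 convolution is only a hypothetical stand-in for the mapping Φ (the full MSD-CNN replaces it), and the mean-squared error is assumed as the discrepancy measure in the optimization model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for Phi; the actual MSD-CNN is the full
# encoder-decoder network with a refining stage.
phi = nn.Conv2d(1, 1, kernel_size=3, padding=1)
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

F_in = torch.rand(4, 1, 64, 64)  # limited-angle reconstructions (input F)
G = torch.rand(4, 1, 64, 64)     # full-angle reconstructions (label G)

P = phi(F_in)                    # prediction P = Phi(F), Eq. (1)
loss = loss_fn(P, G)             # || Phi(F) - G ||^2, the training objective
opt.zero_grad()
loss.backward()                  # supervised learning of the parameters of Phi
opt.step()
```

One optimization step reduces the discrepancy between the predicted image P and the label G; iterating over the training dataset yields the learned mapping.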
Dilated convolution provides a method to exponentially increase the receptive field of the network while only linearly enlarging the convolution kernels. Thus, it is suitable for applications that care about integrating knowledge of a wider context at low cost. For example, in the semantic segmentation framework, Yu and Koltun [30] employed serialized layers with increasing dilation rates to enable context aggregation. More recently, dilated convolution has been applied to a broader range of tasks, such as object detection [31] and audio generation [32].
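The exponential growth of the receptive field can be checked directly in PyTorch. The sketch below (an illustration, not the paper's exact layer configuration) stacks 3 × 3 convolutions with dilation rates 1, 2 and 4; each layer keeps only 9 weights and, with `padding=rate`, preserves the spatial resolution:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 32, 32)
rf = 1  # receptive field of a single pixel before any convolution
for rate in (1, 2, 4):
    # padding=rate keeps the 32x32 resolution unchanged.
    conv = nn.Conv2d(1, 1, kernel_size=3, dilation=rate, padding=rate)
    x = conv(x)
    # A stride-1 3x3 conv with dilation r adds 2*r pixels to the field.
    rf += 2 * rate
    print(rate, tuple(x.shape), rf)
# receptive field grows 3 -> 7 -> 15 while each kernel stays 3x3
```

Doubling the rate at each layer thus doubles (plus one) the receptive field, without any pooling or loss of resolution.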
Multi-scale pooling (MSP) uses pooling layers (or convolution kernels) of different sizes to extract image features at various scales, so as to prevent the loss of network information and to improve the network training. As shown in Fig. 3, assuming the input image has size M × N and channel number n, the channels are summed after 2 × 2, 4 × 4 and 6 × 6 Pool&Conv operations with stride 2 and padding 0, 1, 2 respectively, and n-channel images of size M/2 × N/2 are obtained. MSP is adapted from spatial pyramid pooling, which was initially applied in the field of object detection [33]. By extracting features through pooling at different sizes and generating fixed-size feature vectors, MSP avoids the information loss in candidate regions.

In our neural network, we combine a dilated convolution layer and a multi-scale max pooling layer to form a block named the MSD block. These blocks act in both the encoding and decoding stages. The left part of the MSD-CNN is the encoding stage, which contains 7 convolution layers (including BN and ReLU layers) and 4 MSD blocks. After 4 down-sampling operations, the resolution of the image is reduced to N/2⁴ × N/2⁴. In the decoding stage, 4 up-sampling operations (deconvolution) are performed (including BN and ReLU layers). The result of each up-sampling step is combined with the image of the same resolution, i.e., the result of the corresponding MSD block in the encoding stage. Finally, the N × N output is obtained. The refining stage is a shallow convolution network consisting of four 3 × 3 convolution layers and one 1 × 1 convolution layer. Its input is a dual-channel image that combines the result of the decoding stage with the original input image, and the final output is the corrected limited-angle CT image [34]. Although there are 3 stages in the neural network, the MSD-CNN can be learned as a whole without pre-training.
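The three (kernel, padding) pairs of the MSP layer are chosen so that, at stride 2, all branches produce the same M/2 × N/2 output and can be summed channel-wise, since out = ⌊(M + 2p − k)/2⌋ + 1 = M/2 for (2, 0), (4, 1) and (6, 2). A minimal PyTorch sketch (max pooling only; the Pool&Conv details of Fig. 3 are simplified):

```python
import torch
import torch.nn as nn

# Three pooling branches at different scales, all halving the resolution.
pools = [nn.MaxPool2d(kernel_size=k, stride=2, padding=p)
         for k, p in [(2, 0), (4, 1), (6, 2)]]

x = torch.rand(1, 8, 64, 64)            # n = 8 channels, M = N = 64
features = [pool(x) for pool in pools]  # each branch yields (1, 8, 32, 32)
y = sum(features)                       # channel-wise summation of the scales
print(tuple(y.shape))                   # (1, 8, 32, 32)
```

Because every branch sees a different pooling window, the summed feature map mixes fine and coarse context at no extra resolution cost.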

A. DATASET GENERATION AND PREPROCESSING
In this paper, the training and testing datasets are from an open-source dataset called the Liver Tumor Segmentation Challenge (LiTS) (https://competitions.codalab.org/competitions/1559). The LiTS dataset contains 200 three-dimensional (3D) CT scans of the human body, which cover the chest to the pelvic region. In this experiment, we choose the chest (lung) as the target object and generate a limited-angle dataset according to the following steps: 1) randomly select 10 lung-region slices (2D images) from each 3D CT scan as label images, where the size of each slice is 512 × 512; 2) perform a parallel projection on each slice with a scan angle range of [0°, 150°] or [0°, 130°] to obtain the limited-angle projection data, with an angular sampling rate of 0.5°; 3) apply the SART algorithm to the projection data obtained in step 2) to reconstruct the limited-angle CT images. The limited-angle CT image obtained in step 3) is the input image of the network, and the 2D lung slice obtained in step 1) is the label image. We randomly choose 160 of the 200 3D CT scans as the source of the training set and the remaining 40 as the source of the test set. Thus, the dataset includes 160 × 10 = 1600 training images and 40 × 10 = 400 testing images. Fig. 4 shows an example of the dataset, where the first row is a label image and two zoomed-in patches, and the second row correspondingly shows the network input, i.e., the limited-angle CT image.
From the limited-angle CT images, it is obvious that the large structural defects and artifacts usually affect a local region rather than the entire image. Hence, we utilize a strategy that divides the image into small patches for training. As a result, both the training efficiency and accuracy are improved.
As shown in Fig. 5, a sliding window is used to extract patches from the input and label images: a 256 × 256 patch pair is obtained every 128 pixels. Each 512 × 512 image is thus divided into nine 256 × 256 patches for local training. Therefore, after the extraction operation, the sample number of the dataset is 14400 (1600 × 9). In the test experiments, however, we use the entire image instead of patches to avoid a merging operation.
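The sliding-window extraction above can be sketched as follows. With a 256-pixel window and a 128-pixel stride, the window positions along each axis of a 512 × 512 image are 0, 128 and 256, giving 3 × 3 = 9 patches:

```python
import numpy as np

def extract_patches(img, patch=256, stride=128):
    # Slide a patch x patch window over the image with the given stride.
    patches = []
    for i in range(0, img.shape[0] - patch + 1, stride):
        for j in range(0, img.shape[1] - patch + 1, stride):
            patches.append(img[i:i + patch, j:j + patch])
    return patches

img = np.random.rand(512, 512)
patches = extract_patches(img)
print(len(patches))  # 9 patches per 512 x 512 image
```

The same indices are used for the input and the label image, so each sample remains a spatially aligned patch pair.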

B. ENVIRONMENT AND HYPER-PARAMETER
We use PyTorch as the deep learning framework. The specific software and hardware environments are shown in Table 1. The detailed network parameter settings are shown in Table 2.

C. EXPERIMENTAL RESULTS
We design three experiments to validate the MSD-CNN structure and the artifact correction performance respectively. Some image quality assessments (IQAs) are employed for quantitative comparison, such as the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM) [35] and the Universal Image Quality Index (UIQI). The UIQI is calculated as follows,

Q = 4σ_xy x̄ ȳ / ((σ_x² + σ_y²)(x̄² + ȳ²)),

where x̄ and ȳ represent the means of the reference image and the reconstructed image, σ_x² and σ_y² the variances of x and y, and σ_xy the covariance of x and y. If the two images are identical, Q = 1, and the closer Q is to 1, the better the image quality.
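The UIQI defined above can be computed with a short NumPy function. This sketch evaluates Q over the whole image as a single window (the original UIQI is often averaged over local sliding windows, which is omitted here for brevity):

```python
import numpy as np

def uiqi(x, y):
    # Universal Image Quality Index, single global window.
    mx, my = x.mean(), y.mean()          # means of reference and test image
    vx, vy = x.var(), y.var()            # variances
    cov = ((x - mx) * (y - my)).mean()   # covariance
    return 4 * cov * mx * my / ((vx + vy) * (mx**2 + my**2))

a = np.random.rand(64, 64) + 0.5  # offset keeps the mean non-zero
print(uiqi(a, a))                 # identical images give Q = 1
```

For identical images the covariance equals the common variance and the means coincide, so the numerator and denominator cancel to Q = 1, matching the stated property.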
First, we compare the MSD-CNN with different module combinations, i.e., without dilated convolution layers (undilated), without MSP layers (MSP-free) and without the refining stage (unrefined). For each scan angle range ([0°, 150°] and [0°, 130°]), the neural networks are trained with the same aforementioned parameters and datasets. The images predicted by these neural networks are shown in Figs. 6 and 7. The corresponding IQA values are listed in Table 3. The loss curves of the training and testing datasets with the different MSD-CNN modules are shown in Fig. 8.
According to Figs. 6 and 7, it is obvious that the proposed original MSD-CNN recovers local details and fine structures better than the other modifications. The quantitative evaluations in Table 3 show consistent performances. The loss curves of the training and testing datasets with the different MSD-CNN modules also prove the superiority of the MSD-CNN. Thus, the MSD-CNN visually and numerically outperforms the modifications in image prediction. The results indicate that the dilated convolution layer, the MSP layer and the refining stage are all essential to the MSD-CNN.
Then, in the second experiment, we compare the MSD-CNN with a classical network, i.e., Unet, and a single-channel variant of the network. The predicted images are shown in Figs. 9 and 10, and the corresponding IQA values are listed in Table 4. The average IQAs on the testing sets with the different network structures are listed in Table 5. The loss curves of the training and testing datasets with the different network structures are shown in Fig. 11.
According to Figs. 9 and 10, the Unet and single-channel networks blur some small structures and produce black shadows, while the MSD-CNN performs well. The quantitative evaluations in Tables 4 and 5 show consistent performances. The loss curves of the training and testing datasets with the different network structures also prove the superiority of the MSD-CNN. Thus, the MSD-CNN visually and numerically outperforms the other two network structures in image prediction.

Finally, in the third experiment, we employ some traditional approaches as comparisons, including the SART and the $L_0$- and TV-based regularization methods, to verify the superiority of the MSD-CNN in artifact correction for limited-angle CT. Figs. 12 (example 1) and 13 (example 2) show the results of the different methods. It is noticeable that the $L_0$- and TV-based methods can recover most of the main image boundaries, but the restored fine details are not satisfactory; some ''block artifacts'' and blurring effects still exist in their results. In contrast, the images processed by the proposed neural network show a dramatic improvement in structure restoration and artifact removal. Figs. 14 (example 1) and 15 (example 2) show the processed CT results of the different methods with the same limited-angle range of [0°, 130°]. They indicate that the MSD-CNN still works well with a narrow scan range: both main and detailed structures are satisfactorily restored, and the artifacts are greatly suppressed and even almost completely removed. It is obvious that the conventional methods with TV and $L_0$ regularizations fail to restore some important image structures, and blurring artifacts still remain.
From Table 6, it is noticed that the images processed by the MSD-CNN consistently obtain better quality assessments than the conventional methods as the scan angle range varies.
The scan angle range of Fig. 15 is [0°, 130°], which leads to more obvious artifacts in the reconstructed image than the 150° case. Hence, the quality (especially in the soft-tissue regions) of the images predicted by the MSD-CNN decreases for the 130° case compared with the 150° case (Fig. 13). However, compared with the other conventional methods and deep learning networks (Unet) at the same scan angle setting, the proposed method has the best performance in structure maintenance. It is noticed that even with a narrow scan angle (130°), the image processed by the MSD-CNN still performs better than the other methods with a 150° scan angle (Figs. 15 and 13, and Table 6).
To further demonstrate the effectiveness of the proposed network, we calculate the average IQA values over all the output images, as seen in Table 7. The corresponding results show that the proposed method achieves the desired performance and effectively overcomes the limited-angle-caused blurring.

IV. CONCLUSION
In this paper, we propose a multi-scale dilated neural network (MSD-CNN) for the artifact correction of limited-angle CT problems.
The design of the MSD-CNN is based on an ''encoder-decoder'' structure with several dilated convolution layers and multi-scale pooling layers. Meanwhile, a shallow convolution network is appended to refine the limited-angle artifact correction. These modules give the MSD-CNN a superior capability to deal with the artifacts caused by limited-angle tomography. In addition, considering the local characteristics of the limited-angle artifacts, we design a method to extract image patches, which retains the original image information and meanwhile increases the size of the training dataset. Thus, the training efficiency and accuracy are further improved. We verify the MSD-CNN with different module combinations and compare it with some conventional networks and methods, such as Unet and TV- and $L_0$-based optimization methods. The results demonstrate the necessity of each MSD-CNN module and their combined superiority. Moreover, they show that the proposed MSD-CNN outperforms the conventional networks and methods for artifact correction of limited-angle tomography.