Learning Image From Projection: A Full-Automatic Reconstruction (FAR) Net for Computed Tomography

X-ray computed tomography (CT) is essential for medical diagnosis and industrial nondestructive testing. The aim of CT is to recover or reconstruct an image from projection data. However, the reconstructed image usually suffers from complex artifacts and noise when the sampling is insufficient, as in sparse-view or low-dose CT. In order to deal with such issues, a full-automatic reconstruction (FAR) net is proposed for CT reconstruction via deep learning techniques. Different from the usual networks in deep learning reconstruction, the proposed neural network is an end-to-end network in which the image is predicted directly from projection data. The main challenge for such a FAR-net is the space complexity of CT reconstruction in a fully connected (FC) network. For a CT image of size <inline-formula> <tex-math notation="LaTeX">$N\times N$ </tex-math></inline-formula>, a typical memory requirement for image reconstruction is <inline-formula> <tex-math notation="LaTeX">$O(N^{4})$ </tex-math></inline-formula>, which is unacceptable for conventional computing devices, e.g. a GPU workstation. In this paper, we utilize a series of smaller fully connected layers (FCL) to replace the huge Radon transform matrix based on sparse non-negative matrix factorization (SNMF) theory. By applying such an approach, the FAR-net is able to reconstruct images of size <inline-formula> <tex-math notation="LaTeX">$512\times 512$ </tex-math></inline-formula> on a single workstation. The results of numerical experiments show that the FAR-net is able to reconstruct the CT image from projection data with quality superior to conventional methods such as optimization-based approaches. Meanwhile, the factorization of the inverse projection matrix is validated in both simulated and real experiments.


I. INTRODUCTION
X-ray computed tomography (CT) has been widely used in medical diagnosis and industrial nondestructive testing due to its great ability to visualize interior structure. Considering the damage of radiation to patients, it is of great significance to reduce the dose in practical applications. There are effective strategies to reduce the dose of CT, such as decreasing the number of scanning angles (sparse-view CT) and lowering the current of the X-ray tube (low-dose CT). However, these strategies lead to a variety of issues in reconstructed images, such as artifacts and noise, especially in the case of sparse-view CT. Many studies have demonstrated, through theory or experiments, that the early traditional methods [1], such as filtered back-projection (FBP), the algebraic reconstruction technique (ART) [2], the simultaneous algebraic reconstruction technique (SART) [3], and expectation maximization (EM) [4], fail to deal with these issues. In order to further improve the quality of reconstructed images, various methods have been proposed based on compressive sensing (CS) theory [5]. Most of them are built on optimization models which import some prior knowledge as constraint terms. For example, some studies utilize an L1 regularization term on the gradient of the image to deal with different issues, such as sparse-view, low-dose, limited-angle and interior tomography. Such regularization is also known as total variation (TV) [6]-[11], which is based on the assumption that the gradient of CT images is sparse. Inspired by the sparsity of images, many further approaches have been proposed, e.g. dictionary learning; moreover, a neural network mimics an organism better than a linear operator, and is much more intelligent than a linear system solver [34] as well.
Hence, in practice, it is of great importance to build an end-to-end network which translates an image from a modality that is difficult to understand into a corresponding image that can be recognized by humans [35], [36]. Würfl et al. have demonstrated that image reconstruction can be expressed in terms of a neural network and have shown that FBP can be mapped identically onto a deep neural network architecture [37]. In this method, the parameters of the FCL are fixed and pre-calculated by the discrete formulation of the FBP algorithm, and the parameters of the convolution layer are initialized by the Ram-Lak filter. Argyrou et al. proposed an artificial neural network reconstruction approach [38], and Zhu et al. proposed mapping the sensor domain to the image domain via automated transform by manifold approximation (AUTOMAP) [39]. Such a neural network requires an impractical amount of memory space, which hampers its application in practice. More specifically, the network needs to train a tremendous number of parameters: the space complexity of the network parameters is O(n^4), where n is the width or height of the reconstructed image. Because of memory limitations, the scheme can only reconstruct low-resolution images. For example, to reconstruct an image with a resolution of 512 × 512 using the AUTOMAP method, the number of parameters (single-precision floats) is 512^4, requiring about 1 TB of memory, which is a heavy hardware cost, e.g. in GPUs. Li et al. proposed iCT-Net to reconstruct CT images from sinogram data [40], which achieves back projection by introducing a fixed and separate rotation layer calculated from the CT system's configuration. Li et al. also proposed a hierarchical approach to deep learning and applied it to tomographic reconstruction [41]. With those strategies, the number of parameters of the mapped network decreases dramatically.
In this paper, in order to retain the advantages and overcome the shortcomings of end-to-end neural networks for image reconstruction, we propose an approach to reduce the size of the FCL based on the idea of sparse non-negative matrix factorization (SNMF) [42]. Non-negative matrix factorization (NMF) was originally utilized for parts-based representations in machine learning [43]. For a given non-negative data matrix V, NMF finds an approximate factorization V ≈ W · H into non-negative factors W and H [44]. In particular, if we impose sparseness constraints on the factors, a better-quality factorization is achieved by NMF. By applying the SNMF idea, the space complexity of the network is reduced to O(n^3), and the memory requirement decreases to an acceptable size. For example, we are able to reconstruct CT images with a resolution of 512 × 512 for sparse-view scanning on a single workstation with multiple GPUs. Moreover, in the case of sparse-view scanning, we integrate an artifact suppression (AS) module into our network to further suppress the residual artifacts and noise. Hence, our network learns features from data and reconstructs images without manual intervention, and we name it the Full-Automatic Reconstruction net (FAR-net). The numerical experiments show that FAR-net can be implemented on a single workstation and predicts the CT image directly from projection data with quality superior to traditional algorithms such as TV-based approaches.
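As a back-of-the-envelope check of the memory argument above, the sketch below compares the O(n^4) parameter count of one full-size fully connected layer with that of a factorized stack of smaller layers. The hidden width k·n and layer count are illustrative assumptions, not the paper's exact configuration.

```python
def full_fc_params(n):
    # one dense layer mapping an n*n input vector to an n*n output: O(n^4) weights
    return (n * n) ** 2

def factorized_fc_params(n, k, layers):
    # stack of `layers` dense layers whose hidden width is k*n instead of n*n,
    # giving O(n^3) weights per boundary layer (widths are illustrative assumptions)
    width_in, width_out, hidden = n * n, n * n, k * n
    return width_in * hidden + (layers - 2) * hidden * hidden + hidden * width_out

n = 512
print(full_fc_params(n))                       # 68_719_476_736 weights (512**4)
print(factorized_fc_params(n, k=5, layers=3))  # roughly 50x fewer weights
```

Even this crude count shows why a single huge FCL is infeasible at 512 × 512 while a factorized stack fits on ordinary GPU hardware.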
The remainder of the paper is organized as follows. In Section II, we describe the proposed FAR-net, which contains two components: the reconstruction network and the AS module. In Section III, numerical experiments are carried out to verify not only that the CT reconstruction problem can be mapped into an FC network of reduced dimensions, but also that the two-network approach is effective for the sparse-view CT problem. Finally, we summarize the paper in Section IV.

II. METHOD

A. METHOD OVERVIEW
Under ideal conditions, the mathematical model of CT is usually described as a discrete linear system

AX = b, (1)

where A ∈ R^{M×N} denotes the projection matrix, X denotes the reconstructed image and b denotes the projection data. Solving for X from b is an inverse problem. Here the system matrix is A_{(u×v)×(w×h)}, i.e. M = u × v and N = w × h, where u, v are the number of detector bins and scanning views, and w, h are the width and height of the reconstructed image. For such a problem, it is difficult to obtain X directly since the system matrix A is too huge for its inverse to be computed. It is noteworthy that A is a sparse non-negative matrix, since the elements of A denote the lengths of the intersections between the x-rays and the image pixels. Hence, for a typical medical image, the number of elements in A reaches about 2^15 × 2^18 = 2^33. When the projection data are complete and noiseless, analytical methods such as FBP can solve the problem well. However, when the projection data are obtained from insufficient sampling views, the reconstructed image will have severe artifacts and its quality will degrade significantly. Obtaining a high-quality image X from b can be regarded as the search for a transform function f : b → X, and this transform can be realized by a neural network.
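The discrete model above can be illustrated at toy scale. The matrix below is random and sparse, merely a stand-in for a real Radon system matrix, and the least-squares inversion is exactly the operation that becomes infeasible at realistic sizes (N = 512^2):

```python
import numpy as np

# Toy version of the discrete CT model b = A @ x: A is sparse and non-negative,
# x is the flattened image, b is the projection data. (Illustrative only --
# a random sparse matrix, not a real Radon system matrix.)
rng = np.random.default_rng(0)
n = 16                          # tiny n x n image
N = n * n
M = 2 * N                       # detector bins x views
A = rng.random((M, N)) * (rng.random((M, N)) < 0.05)  # ~5% non-zeros
x_true = rng.random(N)
b = A @ x_true                  # forward projection

# least-squares inversion as a sanity check; at realistic sizes even forming
# A explicitly, let alone inverting it, is what becomes infeasible
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = np.linalg.norm(A @ x_hat - b)
```

Because b lies exactly in the range of A here, the residual is at machine-precision level; with under-sampled or noisy data this direct approach degrades, which motivates the learned inverse.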
As mentioned above, it is an enormous challenge to reconstruct a CT image from projection data via a deep neural network without any manual intervention due to the unacceptable memory requirement. In this paper, inspired by matrix factorization, we propose a Full-Automatic Reconstruction (FAR) net to predict the CT image from sparse-view projection data directly on a single workstation. The FAR-net is motivated by the following observations:
• a sparse matrix can be approximately decomposed into the product of a series of matrices based on SNMF theory;
• sparse-view CT images usually suffer from heavy artifacts and noise, and a CNN has great potential to remove such artifacts and noise;
• a deep neural network is ideal for capturing various types of information from a large amount of training data [27].
According to the first observation, we propose the FAR-net to realize the CT reconstruction process. Following the second observation, in the case of sparse-view or incomplete data, additional artifact suppression is also necessary. So, for the case of sparse-view CT, we add a post-processing module (the artifact suppression (AS) module) after the FAR-net; the combined network is named FAR-net +. From a macro perspective, the processing of our method is illustrated in Figure 1. The FAR-net and FAR-net + are end-to-end networks. The additional AS module is designed to remove the artifacts and noise caused by sparse-view reconstruction, and it is not a necessary module in the full-angle case.

B. FULL-AUTOMATIC RECONSTRUCTION NETWORK
According to NMF theory [42], a given non-negative matrix V can be factorized as

V_{M×N} ≈ W_{M×c} · H_{c×N}, (2)

where c < min(M, N) and both W and H are non-negative. To measure the quality of the approximation in Eq. (2), different cost functions can be used, such as the squared L2 norm

E_{L2} = ||V − W · H||_2^2, (3)

or the Kullback-Leibler divergence

E_{KL} = Σ_{ij} ( V_{ij} log( V_{ij} / (W H)_{ij} ) − V_{ij} + (W H)_{ij} ). (4)

Lee et al. found algorithms that minimize E_{L2} and E_{KL} and proved their convergence [43]. Furthermore, Hoyer indicated explicitly that incorporating sparseness constraints on the factors improves the result of the factorization [42]. If we decompose W and H further, the theory still applies, since both sub-factorizations remain SNMF problems. Hence, we can factorize a sparse non-negative matrix within a few steps, e.g. 2 or 3 layers. Then, considering the huge projection matrix A in CT imaging, its inverse can be approximately represented by a series of smaller matrices:

A^{-1} ≈ W_k ··· W_2 · W_1. (5)

In particular, CT reconstruction can be considered as two processes: filtering and back projection (FBP). The back projection process can also be regarded as a linear transformation, and the sparsity of this linear transformation matrix R is consistent with that of the projection matrix A. The filtering process can be realized by convolution layers. Hence, we are able to learn the inverse of R based on Eq. (5) via an FCL neural network composed of several smaller middle layers. As shown in Figure 2, these middle layers effectively reduce the number of network parameters and the memory requirement. Based on such a structure, we propose the reconstruction neural network shown in Figure 3 (FAR-net), which includes some hyper-parameters that need to be determined, such as k, which represents the number of neurons in the middle layers and reflects the matrix sparsity. Although Eq. (5) has not been strictly proved in theory, numerical experiments indicate that the weight matrices of the trained network are approximately equal to the inverse matrix.
Therefore, the network is able to predict the CT image from projection data directly.
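For reference, the classical Lee-Seung multiplicative updates for the L2 cost can be sketched as follows. This is a minimal illustration of the NMF idea, not the paper's training procedure, and the sparseness constraint is omitted:

```python
import numpy as np

def nmf(V, c, iters=1000, eps=1e-9, seed=0):
    """Factor non-negative V (M x N) as W @ H with W (M x c), H (c x N),
    using Lee-Seung multiplicative updates for the L2 cost ||V - W H||^2."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, c)) + eps
    H = rng.random((c, N)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update keeps H >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update keeps W >= 0
    return W, H

# sanity check on data that is exactly factorizable at rank c = 5
rng = np.random.default_rng(1)
V = rng.random((40, 5)) @ rng.random((5, 30))
W, H = nmf(V, c=5)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative form of the updates is what guarantees the factors stay non-negative without any explicit projection step.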

C. ARTIFACTS SUPPRESSION NEURAL NETWORK
Generally speaking, sparse-view CT images usually suffer from complex artifacts as well as noise, and the FAR-net alone is not sufficient for predicting high-quality images in this condition. Hence, an additional AS module is proposed for sparse-view tomography. As shown in Figure 4, our network takes full advantage of residual blocks [45] and the U-net architecture [46]. More specifically, the residual block is shown in the lower left corner of Figure 4. The bypass connection in the residual block makes it possible to recover images with higher quality and avoids the vanishing gradient problem during back-propagation. Similarly, the U-net architecture preserves the details of high-frequency features. Since a typical CNN has pooling layers, information may be lost after passing through these layers. To avert this, high-frequency features from the contracting path are combined with the up-sampled output to recover the details [27]. In addition, our network adds its own features: it performs down-sampling and up-sampling across different scale spaces to exchange information between them, whereas U-net has less communication between data at different scales, with only one down-sampling and one up-sampling step between adjacent scales.

III. NUMERICAL EXPERIMENTS
In this section, various experiments are carried out to evaluate the FAR-net as well as the matrix factorization.

A. VALIDATION OF MATRIX FACTORIZATION
We first perform numerical experiments to validate Eq. (5). In order to simplify the problem, we remove the convolution layers of the FAR-net in this test (only the FC network remains). However, it is difficult to estimate directly whether the learned matrix, defined as Â = W_k ··· W_2 · W_1, is the inverse of A. So we calculate E = W_k ··· W_2 · W_1 · A = Â A as the evaluation objective, which should be equal or approximately equal to the identity matrix I. To train and validate the FC network (the simplified version of FAR-net), we prepare a large dataset comprising pairs of input data and label data. Firstly, we generate a non-negative sparse matrix A_{N×N} randomly, with sparsity 2√N/N. Then, we select 100 images from Pascal VOC [47] and resize them to N × N. Each row X_i of an image is regarded as ground-truth data, and b_i (= A · X_i) is regarded as the input data. Hence, the total number of training and testing data pairs is 100 × N. It should be noted that the operator is only a matrix multiplication rather than a Radon transform in this experiment. The configuration of the FC network is displayed in Table 1. In this test, 4 aspects of the FC network in FAR-net are studied: the depth of the network, whether the non-negativity constraint (rectified linear unit activation) is used, the number of back-propagation iterations, and the dimension of A. The results with different matrix dimensions show that the inverse of a sparse non-negative matrix (SNM) can be learned by an FC neural network with several smaller layers, whose dimensions are smaller than those of the inverse matrix. Furthermore, we can draw the following conclusions from the numerical experiments.
• Figure 5 shows the MSE loss curves over 3000 iterations for different numbers of middle layers. It is obvious that increasing the number of middle layers slightly accelerates convergence and attains a lower loss value.
• The products of the weight matrices with the matrix A are shown in Figure 6, where the dimension of A is N = 256, 512, 1024, 2048. The figure illustrates that the matrix E is approximately equal to the identity matrix I.
• From Table 2, under the normalized mean absolute distance (NMAD), it can be seen that the factorization achieves better performance with the non-negativity constraint, which is consistent with the conclusion of SNMF.
• Figure 7 shows the variation of NMAD with the dimension of the matrix A. We can see that NMAD decreases as the matrix dimension increases. In fact, when the reconstructed image size is 512 × 512, the dimension of the matrix A for the reconstruction problem can be regarded as 2^9 × 2^9 = 2^18.
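The evaluation above can be mimicked at toy scale: factor the inverse of a small sparse non-negative matrix into two smaller "layers" and check that their product with A is close to the identity. The SVD-based split below is a stand-in assumption for the trained FC weights, and the added identity keeps the toy matrix well conditioned; neither choice comes from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
# sparse non-negative test matrix; the identity shift is only to keep this
# toy matrix well conditioned (an assumption for the illustration)
A = rng.random((N, N)) * (rng.random((N, N)) < 0.25) + np.eye(N)

# stand-in for the trained factors: split A^-1 into two "layers" via its SVD
U, s, Vt = np.linalg.svd(np.linalg.inv(A))
W1 = np.diag(np.sqrt(s)) @ Vt        # first "layer"
W2 = U @ np.diag(np.sqrt(s))         # second "layer"

E = W2 @ W1 @ A                      # should approximate the identity I
nmad = np.abs(E - np.eye(N)).sum() / np.abs(np.eye(N)).sum()
```

Here E matches I to numerical precision, so NMAD is tiny; the trained network replaces this analytic split with layers learned purely from (b_i, X_i) pairs.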

B. FAR-NET FOR FULL ANGLE RECONSTRUCTION
To demonstrate that the trained FAR-net is able to reconstruct images directly from projection data, the following experiment illustrates the performance of FAR-net.

1) DATASET AND CONFIGURATION
In the numerical experiments for full-angle CT, we select the TCGA-ESCA cancer CT image dataset [48] as the experimental dataset. We chose 4302 images of size 512 × 512 pixels from the dataset and resized them to 256 × 256. Of these images, 4001 (including 11 human tomographic images) are regarded as the ground truth of the training dataset. Using the ground-truth images, we simulate parallel-beam projections to obtain the projection data, which are regarded as the input dataset of FAR-net. The remaining images (including 2 human tomographic images) are regarded as the test dataset; they are generated in the same way and are not included in the training dataset. The parameters of the parallel-beam CT projection are set as follows: the total number of views is 180, with an interval of 1 degree over the scan range [0°, 180°), and there are 300 rays for each view. Here we only consider the parallel-beam configuration. For fan-beam CT, previous studies indicate that it is feasible to convert fan-beam data to parallel-beam data with networks. The configuration of the FAR-net is displayed in Table 3.

2) RESULT
In order to illustrate the performance of FAR-net and to select a suitable network parameter k, reconstructions from noise-free and noisy projection data are shown in Figure 8 and Figure 9, respectively. The noisy projection data are simulated as

p_noisy = −ln( Poisson(I_0 · e^{−p}) / I_0 ), (6)

where p and p_noisy denote the noise-free and noisy projection data, respectively. Here, we simulate Poisson noise with incident intensity I_0 = 1.0 × 10^6. As shown in Figure 8, compared with the classic FBP reconstruction algorithm, the proposed method reconstructs images of higher quality once k is large enough (k > 5). In the noisy case, Figure 9 reveals that FAR-net is generally advantageous in terms of noise suppression and feature preservation. Thus the FAR-net is a real-time and more accurate reconstruction algorithm. Table 4 lists two distance measures for the reconstructed images shown in Figure 8 and Figure 9. The two measures further verify the advantages and performance of our FAR-net.
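The noisy projections follow the standard photon-counting simulation: transmit I_0·exp(−p) photons, apply Poisson counting noise, and log-transform back to line integrals. A sketch with I_0 = 1.0e6 as in the text; the clipping floor that avoids log(0) is our own assumption:

```python
import numpy as np

def add_poisson_noise(p, I0=1.0e6, seed=0):
    """Simulate low-dose projections: transmit I0*exp(-p) photons,
    apply Poisson counting noise, and log-transform back to line integrals."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(I0 * np.exp(-p)).astype(float)
    counts = np.maximum(counts, 1.0)     # floor to avoid log(0) (assumption)
    return -np.log(counts / I0)

p = np.linspace(0.0, 4.0, 300)           # one noise-free projection row
p_noisy = add_poisson_noise(p)
```

Note that the noise level grows with p: rays through dense material transmit fewer photons, so their line integrals are the noisiest, which is what makes low-dose reconstruction hard.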
To test the potential of our method in practical applications, experiments are carried out on a rat paw. The real parallel-beam projection data were acquired at a synchrotron radiation facility (SSRF-BL13W1), whose high monochromaticity means the reconstructed image does not exhibit serious beam-hardening artifacts. The scanning parameters of the parallel-beam CT projection are set as follows: the total number of views is 180, and the angle interval between two adjacent projection views is 1°. The experimental data are selected from the full-angle data; the size of the projection data is 180 × 300, and the reconstructed image size is 256 × 256. Since the neural network needs large amounts of data and we are limited by data collection, the training dataset does not include any real data. Figure 10 shows the results reconstructed by FBP with full-angle data and by FAR-net with k = 5, 10, 20, 30, respectively. It can be seen that the image is reconstructed correctly by the proposed FAR-net. Our method achieves promising gains on real data, which further demonstrates its effectiveness.

C. SPARSE-VIEW CT RECONSTRUCTION
In this subsection, the FAR-net and FAR-net + are evaluated with the medical image dataset and compared with conventional approaches for sparse-view CT.

1) DATASET AND CONFIGURATION
In the numerical experiments for sparse-view CT, we again select the TCGA-ESCA cancer CT image dataset [48] as the experimental dataset. The division into training and test sets is consistent with the full-angle FAR-net reconstruction experiment. The parameters of the parallel-beam CT projection are set as follows: the total number of views is 60, with an interval of 3 degrees over the scan range [0°, 180°), and there are 600 rays for each view. The configurations of the FAR-net and the AS module are displayed in Table 5.

2) RESULTS
The images in the testing dataset are predicted by the FAR-net + from sparse-view projection data. For comparison, the FBP algorithm, an optimization-based algorithm and deep learning methods are also used to reconstruct images from the sparse-view projection data. In the optimization-based method, the regularization term of the objective function is the anisotropic TV (Rudin-Osher-Fatemi model) of the image:

min_X (1/2)||AX − b||_2^2 + λ( ||∇_x X||_1 + ||∇_y X||_1 ). (7)
We solve this problem using the Split Bregman method [49]: we introduce auxiliary variables d_x ≈ ∇_x X and d_y ≈ ∇_y X together with a penalty parameter μ; then Eq. (7) is equivalent to

min_{X, d_x, d_y} (1/2)||AX − b||_2^2 + λ( ||d_x||_1 + ||d_y||_1 ) + (μ/2)( ||d_x − ∇_x X − b_x^k||_2^2 + ||d_y − ∇_y X − b_y^k||_2^2 ),

where the proper values of b_x^k and b_y^k are chosen through the Bregman iteration.
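In Split Bregman, the d-subproblems decouple elementwise and have the closed-form soft-thresholding ("shrink") solution, which is what makes the method efficient; a minimal sketch:

```python
import numpy as np

def shrink(v, t):
    """Elementwise soft-thresholding: the minimizer of t*|d| + 0.5*(d - v)**2,
    used for the d_x, d_y updates in Split Bregman."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
d = shrink(v, 0.5)   # entries within the threshold are set to zero
```

Each outer iteration thus alternates a quadratic solve for X with these cheap shrinkage steps for d_x and d_y, followed by the Bregman variable updates.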
For the neural network methods, U-net and the comprehensive net [50] are chosen as deep-learning baselines for comparison with the proposed method. Moreover, the two network structures proposed here are combined with other network structures and methods for performance evaluation. The images reconstructed from sparse-view projections by FBP and the corresponding references are set as input and label for training U-net and the AS module, respectively. Hyper-parameters such as the learning rate and batch size are kept consistent with the AS module. In the rest of the paper, we use FBP+U-net, FAR-net+U-net and FBP+AS module to denote these composite approaches. Figure 11 displays the loss curves of the neural network methods over all iterations for the noise-free dataset. The FAR-net + demonstrates the best performance, and the loss value of FAR-net(5) (with 5 layers in the FCL) is significantly lower than that of FAR-net(3) (with 3 layers in the FCL). Figure 12 shows example images reconstructed from 60 views with FBP, the TV-based method, FBP+U-net, the comprehensive net, FAR-net(3), FAR-net(5), FAR-net+U-net, FBP+AS module and FAR-net +, respectively. Images reconstructed by FBP suffer from heavy streak artifacts caused by the under-sampled projections, with the artifacts distributed across the whole image. The FAR-net effectively reduces these linear artifacts, although its image boundaries are somewhat fuzzy. The other pre- or post-processing methods suppress image artifacts and improve image quality to various degrees. Overall, the results reconstructed by our proposed FAR-net + and by the FBP+AS module are superior to the classical TV-based method and the competing network methods. However, as shown in Figure 12(c), the TV-based method suffers from a blocky effect and also smooths away some important small structures, while FAR-net alone may blur edges.
The images reconstructed by the network methods show no obvious blocky effect and give the best performance. Figure 15 shows the zoomed-in region of interest (ROI) marked in Figure 12. The red arrows and circles indicate several noticeable structural differences between the methods. Our proposed FAR-net + and FBP+AS module have an advantage in recovering the high-contrast bone edges, whereas the other methods, such as the TV-based method, FBP+U-net, the comprehensive net and FAR-net+U-net, lose the bone edge contrast to varying degrees.
The image quality indices PSNR and SSIM are displayed in Figure 13 and Figure 14. The proposed FAR-net + is not only superior to the conventional (TV-based) method but also better than the deep-learning-based pre- and post-processing methods, and slightly better than the FBP+AS module. Our method for sparse-view CT consists of two stages: the first stage (FAR-net) already obtains higher-quality reconstructions than FBP, so the second stage can further improve the image quality. Furthermore, we test these methods under noisy conditions. Figure 16 shows the loss curves of the neural network methods plotted over all iterations for the noisy dataset. The results are similar to the noise-free case. Figure 17 presents sample images obtained using the best FAR-net + and the competing methods. The images demonstrate that the FAR-net +, the comprehensive net and the FBP+AS module enable better noise suppression and structural fidelity than the competing methods, while the blocky effect is still present in the image obtained by the TV-based method. Figure 18 shows the regions of interest marked by red arrows in Figure 17. As indicated by the red arrows, the images predicted by FAR-net + and FBP+AS module clearly have advantages in both contrast and edge preservation. Blurred structures and over-smoothing at the edges appear in the images reconstructed by the TV-based method. For the FBP+U-net method, similar to the noise-free condition, some artifacts remain and low-contrast details are lost. Furthermore, the above observations are confirmed by the image quality indices: the PSNR and SSIM are shown in Figure 19 and Figure 20. The FBP+AS module and FAR-net + score higher in noise suppression and structural fidelity than the other comparison methods.
Comparing the different methods in the above experiments, the proposed FAR-net + predicts images with better performance both visually and in the evaluation indices. This is mainly due to the good reconstruction capability of the deep FAR-net and the good performance of the AS module in removing artifacts. Firstly, among FAR-nets with different FCL depths, the deeper FAR-net is better at preserving edges and keeping structures smooth. Secondly, in the experiments with U-net and the AS module, whichever input these two networks receive (the FBP result or the FAR-net result), the AS module performs better than U-net in detail and in the evaluation indices, which shows that the AS module improves the result significantly.
According to the above analysis of the results, the proposed FAR-net predicts images with better performance. Furthermore, the deep learning method has a higher efficiency for image reconstruction. Table 6 lists the number of parameters and the time consumption of the different reconstruction methods. Although our proposed method has many parameters, the FAR-net + consumes only 0.0845 seconds to predict an image from projection data.
Similarly, to test our method in practical applications, experiments are carried out on the rat paw, using the same data as in the full-angle FAR-net experiment. The sparse-view data are selected from the full-angle data; the size of the sparse-view data is 60 × 600, and the reconstructed image size is 512 × 512. Since the neural network needs large amounts of data and we are limited by data collection, the training dataset does not include any real data. Figure 21 shows the results reconstructed by FBP with full-angle data, FBP with sparse-view data, the TV-based method and FAR-net with hyper-parameter k = 30. It can be seen that the image is reconstructed correctly by the proposed FAR-net. With FAR-net, the streak artifacts introduced by the FBP method, shown in Figure 21(b), are suppressed to a certain extent, and FAR-net is clearly better than the TV-based method at keeping small structures.

IV. CONCLUSION
In this paper, we propose a neural network that maps the CT reconstruction process and is able to predict CT images from sparse-view projection data automatically. Different from most related works, which treat neural networks as black boxes, the FAR-net is directly motivated by sparse non-negative matrix factorization, and all its parameters are learned from training samples rather than pre-calculated. FAR-net + is a complete reconstruction framework for sparse-view CT in which the whole net is divided into two sub-structures: the FAR-net and the additional AS module. This strategy makes the network deeper and realizes a coarse-to-fine learning process, and the size of the whole net is reduced from O(n^4) to O(n^3), which can be accommodated on a single workstation with multiple GPUs. Numerical experiments show that the FAR-net and FAR-net + can be effectively applied to the reconstruction process and show outstanding advantages in terms of noise suppression, artifact reduction, and edge and feature preservation. Compared with conventional methods, the FAR-net and FAR-net + demonstrate superior performance in both image quality and computational efficiency.