Image-Domain Based Material Decomposition by Multi-Constraint Optimization for Spectral CT

As a new generation computed tomography (CT) technology, spectral CT has great potential in many aspects, especially in the identification and decomposition of materials. To achieve higher accuracy of materials decomposition, we propose a multi-constraint based nonlocal total variation (NLTV) method, named as MCNLTV. Because image-domain based material decomposition belongs to the two-step material decomposition method, the Filter Back-Projection (FBP) algorithm or SART algorithm is used to reconstruct spectral CT images in the first step. Then the material attenuation coefficient matrix is obtained from the reconstruction results. In the second step, MCNLTV regularization is utilized to obtain the material decomposition image. Both simulation experiments and real data experiments are carried out. Experiment results show that the proposed method can obtain higher accuracy of material decomposition than traditional total variation based material decomposition (TVMD), ROF-LLT regularization and direct inverse transformation (DI) for spectral CT.


I. INTRODUCTION
In traditional computed tomography (CT) system, the energy integration detector (EID) is used to collect the projection data [1]. The so-called energy integration detector records the attenuation information of objects by accumulating X-ray photons emitted by different energies. It collects projections from the polychromatic spectrum and cannot identify the energy of each photon. As a result, different materials could have similar or identical linear attenuation coefficients in reconstructed CT images. This means that it fails to discriminate and decompose material components with the conventional CT system [2]- [4]. To further explore the capability of CT system, dual-energy CT (DECT) and the spectral CT have been developed. Different from traditional CT, DECT reconstructs images and analyzes the distribution of materials by utilizing projection data obtained from two groups of X-ray sources with different energy [5], [6]. Dual-energy CT can only decompose two The associate editor coordinating the review of this manuscript and approving it for publication was Ge Wang . or three materials and reduce metal artifacts to some extent [7], [8]. Compared with DECT, spectral CT can simultaneously obtain multiple channel-wise projections and then reconstruct corresponding material separation images, as well as the optimal comparative noise ratio of different tissues and organs [9]. In terms of the advantages in aspects of dose reduction, tissue contrast improvement, quantitative tissue analysis, beam hardening artifacts reduction, material discrimination, and decomposition, spectral CT is stronger than conventional CT and DECT [10].
In a spectral CT system, the photon-counting detector (PCD) as a core component is to collect multiple projections with different energy channels by one scan [11], [12]. In theoretical, the PCD can effectively reduce the electronic noise in acquired data and suppress the data noise by counting each entering photon of the detector [13]. It can also distinguish the energy of incident photons to obtain more X-ray energy information and a higher discrimination degree than existing DECT, which can effectively control noise amplification in the reconstruction process or other jobs [14]. In fact, because of the influence of X-ray fluorescence, charge sharing, K-escape, and pulse pileups, the accuracy of the material decomposition is decreased.
To achieve higher precision of material decomposition, it is important to consider more regularized priors to develop superior material decomposition methods. Generally speaking, there are two main material decomposition methods for spectral CT: Direct methods and indirect methods [9]. 1) The direct methods obtain the material composition directly from the X-ray emitting spectrum [11], [15]- [17]. But in practice, it is hard to estimate the X-ray spectrum. 2) The indirect methods can be divided into two classes: image-domain based methods and projection-domain based methods. The projection-domain based methods mean that a projection is firstly decomposed to specified material sinograms, and then image reconstruction is carried out. However, it is difficult to get the X-ray transmission spectrum model in practice. The image-domain based methods [18], [19] firstly reconstruct images from the spectral CT dataset and then decompose the materials from the reconstructed results. There are many iterative reconstruction methods developed for the first step, but less work concerned about the second step, such as TDL [20], L 0 TDL [21], SSCMF [22], NLCTF [23], ASSIST [24], L 0 -PICCS [25], SISTER [26]. In this work, we focus on the second step of the image-domain based methods.
In this article, we propose a material decomposition method that uses multi-constraint optimization by nonlocal total variation (NLTV [15]) regularization, volume conservation [25], [27] and we consider the air pixels. Compared with traditional total variation (TV [11]), NLTV exploits the non-local self-similarity of the image that makes it better to protect image details. In conventional TV, the gradient is always computed as the difference between the target pixel and its nearest neighbors [28]. However, in NLTV, all pixels in the image are utilized as neighbors to calculate the weighted gradient. The weights depend on the similarity of the patch structures between the target patch and the neighbors [28]. The method is originally proposed for natural image processing to restore repetitive patterns and textures, and now it has been applied in many aspects. Specifically, to better restore an image from the corrupted image, Huanyu et al. [29] refined NLTV and obtained more image details, clear edges and preserved image contrast. Liu et al. [30] introduced NL operator into RTV and used it for image smoothing. It can effectively suppress noise, preserve contrast and edge. Nie et al. [31] used NLTV for PoISAR data spot reduction. It is valid to preserve the resolution, textures and details. Ren et al. [32] utilized the NLTV for single image super-resolution image reconstruction, and got high-quality images that have sharper edges and lower-level noise. Lv et al. [33] explored the Bayesian inverse problem in CT image reconstruction utilizing NL operator. The numerical experiments demonstrate the performance of that method is competitive against existing and adapted methods. Compared with the previous work, we creatively used NLTV for image-domain based material decomposition.
The rest of the paper is organized as follows. In section II, the derivation of the mathematical model based on multiconstraint optimization will be given. In section III, simulation and real data experiments are carried out. In section IV, we discuss some issues and make a summary of this study.

II. MULTI-CONSTRAINT OPTIMIZATION MODEL A. MATERIAL DECOMPOSITION MODEL
When the photon counting detector is carrying out, assuming the photon number of each detector cell in the energy window E n is p i,l . It can be expressed as: where [E i , E i+1 ] is the integral along the energy window E i to energy window E i+1 , q i,l (E) indicates the original X-ray photon numbers emitting from X-ray source, h∈l f n (E, r) dr represents the attenuation coefficient at the r th position of the l th path in the energy window E i . In Eq. (1), the summation of the mass-attenuation coefficient f n (E, r) can be represented as: where θ n represents the mass-attenuation coefficient with the n th material which can be found on the internet, u n is the components for n th material. For the image domain based material decomposition, u n can be known according to Eq. (2), and f n can be obtained from spectral CT reconstruction. To obtain the material component maps, Eq. (2) can be rewritten as matrix form: where θ mn indicates averaged attenuation coefficients of m th material at n th energy window, u n is the material component maps, f M represents the spectral CT reconstruction results. Eq. (3) can be expressed as following where U(U ∈ R I ×J ×n ) and F(F ∈ R I ×J ×m ) are two 3order tensors and they indicate the reconstructed images and material component maps, respectively. U (3) and F (3) are mode-3 unfolding of tensor U and tensor F. To recover the material maps, direct inversion (DI) is employed [34]. The Eq. (4) can be written as follows where θ −1 L is the pseudo inverse of θ, which is equivalent to θ T θ −1 θ T . Therefore, Eq. (5) can rewrite as following: However, in practice, the reconstructed image is often destroyed by noise. Noise should be introduced into the model as following where represents the noise. That means we cannot solve the equation by direct inversion. Therefore, we converted Eq. (7) into the following problem arg min To improve the ability of denoising, the regularization prior is a feasibility strategy to constrain the feasible domain during the iteration process. So Eq. (8) can be further modified as: where · F is the F-norm(Frobenius norm), 1 2 is the data fidelity term, R(U) is the regularization term. λ is the regularization factor to balance the data fidelity term and the regularization term.

B. MCNLTV METHOD
As described earlier, NLTV can be used as a regularization term to constrain the problem [35]. So R(U) can be written as It exploits the typically existing non-local similarity throughout natural images to improve the quality of the reconstruction image. Assuming the patch size is p × p, so for each vector form target patch P i P i ∈ R p 2 ×1 , which centered at the location i, we search the similar patch P j P j ∈ R p 2 ×1 in a local area around i with size r × r. The similarity weight between the two patches is defined as: where G a is the Gaussian kernel with standard deviation a, h o is a global filter parameter. For the convenience of subsequent understanding, some other definitions are given below [35].
1) The non-local gradient ∇ w NLTV u(x) is a vector defined as: 2) Assuming a graph divergence of vector v is defined by the standard adjoint operator with the gradient operator, and the definition of graph divergence div w NLTV v(i) can be written as follows: 3) The graph Laplacian operator is defined by Then, the NLTV can be defined as the following where U i is the center pixel of P i and represents the i th pixel in the reconstructed image U, A 1 denotes the index set for all the pixels of U, A 2 represents the index set for the non-local similar pixels of U i . According to the above equations, we defined a non-local gradient vector of U i as ∇ w NLTV U i , so Eq. (12) and Eq. (15) can be modified as compact form as follows: The whole non-local gradient can be represented as Eq. (15) can be written with a simple form as follows: The NLTV regularization model can be involved into arg min It is difficult to determine U from Eq. (19) To further improve the precision of material decomposition, we considered more constraints. Specifically, if the air is treated as basic material, the summation of the pixel (including air pixels) with the same location in different material images should be equal to one. One constraint is introduced, as shown in Eq. (21). Moreover, to solve the objective function of Eq. (20), we introduce a tensor S to replace with the tensor U. That means more constraints can be utilized. So Eq. (20) can be furtherly formulated as the following formula. That is the mathematical model of the proposed multi-constraint optimization method arg min where γ is the optimization factor and A represent the air matrix. As a multi-constraint problem, Eq. (21) can be converted into an unconstrained problem as follows: arg min β is the Bregman penalization parameter. The variable b k is determined by Bregman iteration. Thus, the problem is converted into four sub-problem as follows: (a) U-related subproblem: As for Eq. 24(a), there is a two-step strategy to solve the problem, and the first step is to solve the following Eq. (25). arg min Eq. (25) is a constraint convex programmable optimization problem without air pixel value constraint. Considering Eq. (25) with pixel level, it can be further written as where Assuming its derivative equals zero, it can be equal to the following minimization problem: arg min We can find that it is a constraint least square problem, which can be solved easily. The second step is to find the air region by setting a threshold on the U k+1 . because the air pixel value is a small positive value. The given threshold is set as 0.99 in this work. If the pixel value within U k+1 is larger than the given threshold. It can be thought of as a single material whether it's air or something else without noise. The pixel value can be computed by direct inversion.
(b) S-related sub-problem: From Eq. (24.b), the objective function can be solved by the derivative method. Again, (28) can be written as: It can be further involved in Eq. (30) can be solved by Gaussian-Seidel method.
(c) d-related subproblem: The updating of vector d k+1 can be solved by using soft shrinkage operator: It can be represented by the following equation: According to the above contents, the method can be summarized as the following steps. The first step is obtaining the reconstructed image by FBP or SART algorithm. Then, utilizing the result to formulate the material attenuation matrix. Finally, we implement material decomposition procedure. The pseudo-code of the entire algorithm flow is shown as follows: Algorithm 1 MCNLTV Input:λ, β, γ ,K and other parameters;

III. EXPERIMENTS AND RESULTS
In this article, the simulated and clinical experiments are used to verify the effectiveness of the proposed multi-constrained optimization model for material decomposition. The results of the proposed method are compared with those obtained by DI, TVMD and ROF-LLT [36] methods. The root means square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) are used to quantitatively evaluate the decomposed results.

A. SIMULATION STUDY
A numerical mouse thorax phantom injected with a 1.2% iodine contrast agent was used for the simulation. This thorax phantom consists of three basic materials, bone, soft tissue, and iodine contrast agent, as shown in Fig.1 (a). A polychromatic 50KVp X-ray source is assumed and the spectrum is divided into 8 energy bins which is shown in Fig.1(b), ([16, 22) keV, [22,25) keV, [25,28) keV, [28,31) keV, [31,34) keV, [34,37) keV, [37, 41) keV and [41, 50) keV.). The PCD includes 512 bins and the distance between each bin is 0.1mm. The distances from the X-ray source to the PCD and the object is 180 mm and 132mm, respectively. From a full scan, the PCB can collect 640 × 8 projections. In this experiment, the photon number of each emitting X-ray path is 7 × 10 4 . To better reflect the ability of the proposed method to suppress noise, the Poisson noise was added to the reconstruction results. In this study, the images for decomposition were firstly reconstructed by FBP algorithm. The images consist of 256 × 256 × 8 pixels and are shown in Fig.1(c).
The material decomposition results are shown in Fig.2. Table 1 shows the quantitative evaluation indicators of the four methods (RMSE, PSNR, and SSIM). The results of the second behavior, DI, shown in the figure, indicate that some of the results of the iodine contrast agent are misclassified in the bone breakdown results. In soft tissue decomposition results can be seen the larger noise, which has a greater impact on the results. There is a distinct misclassification in the decomposition of iodine contrast agents. In general, the results obtained by the DI decomposition method are not very ideal.
The 3 rd row in the Fig.2 is the result obtained by TVMD after 40 iterations. It can be seen from the DI's result that the decomposition results of bone are wrongly decomposed with iodine contrast agent had disappeared, which indicated by the image structures with arrow ''1'' and ''2''. The results obtained by TVMD are more accurate and reliable compared with the result of DI. From the quantitative indicators, it can be seen that TVMD results are better than those of DI. Therefore, overall, the TVMD method is more precise than DI. In the graph of the fourth row, the enlarged image of bone result shows that the ROF-LLT method is clearer than TVMD in the image details. Moreover, soft tissue result shows that it does a better job of suppressing noise and the details are even better than MCNLTV. And, the result of iodine contrast agent performs better than other two methods.
As for the results of bone, it can be seen that the magnified area is much clearer than other methods, it is also closer to reference image than TVMD and ROF-LLT, and the details in the Fig.2 are better than TVMD which can also be seen from the area indicated by arrows ''1'' and ''2''. The result of the iodine contrast agent is more obvious, and there is almost no misclassification. In general, the results obtained by the MCNLTV method are more accurate. The quantitative indicators in Table 1 also show that the MCNLTV can get better results in terms of the smallest RMSE value, highest PSNR value, and highest SSIM value.
To further compare the accuracy of the decomposed materials from three methods, the profiles in Fig. 3 are extracted and shown in Figs.4 and 5. Compared with other methods, the pixel value calculated by the MCNLTV method is closer to the reference image.

B. CLINICAL MOUSE EXPERIMENT
A mouse injected with contrast agent (gold nanoparticles, GNP) was scanned by a MARS micro spectral CT system for the clinical experiment. The spectral CT system contains a micro x-ray source and a flat-panel PCD. The source tube  with 13 energy channels operated at 120 KVp and 175 mA. The energy of every channel was above a given energy threshold and increased with the increment of the energy channel index. The distance between the X-ray source and the object is 158mm, and that from the X-ray source to PCD is 255mm. The PCD consists of 512 bins covering a length of 56.32 mm. The projection data with 13 energy bins is collected from different 371 views by employing multiple scans. The size of reconstructed image is set as 512 × 512 × 13 and Fig.6 shows reconstructed images from four representative energy bins (1 st , 4 th , 9 th and 13 th ) using the SART algorithm.  To evaluate the performance of the material decomposition results with different algorithms, Fig.7 shows the material decomposition results using different methods. Regarding as the bone component, the results by DI suffer from serious noise, which can result in inaccurate classification accuracy. It can be seen that it is difficult to distinguish bones in the figure. In the results of TVMD, the noise can be reduced with material decomposition accuracy improvement, but some details of the image are still missing. About the ROF-LLT results, it has a similar result with TVMD. In the MCNLTV results, noise and misclassification were further mitigated with many small details and features. In terms of soft tissue component, DI results also suffer from high noise and wrong decomposed pixels. But the results show the TVMD has a better performance on noise reduction. About the ROF-LLT results, it performs worse than TVMD. In the results of MCNLTV, the classification is more accurate, the denoising effect is better, and the details of the image have been greatly improved. Moreover, MCNLTV can effectively suppress the outlier artifacts in the GNP result. Fig.8 further shows the comparison of four methods with four ROIs (ROI-A, B, C, D). As shown in ROI-A, the DI results are suffering from a lot of noise and error decomposition results, which make it difficult to distinguish the details. Although the noise and error classification are reduced in the TVMD results, the structures of the ROIs are still poor. And the ROF-LLT results are like TVMD. In MCNLTV, the noise reduction effect is further enhanced, and the structure and boundary details are improved, as the image structures indicated by arrow ''4''. In ROI-B, it can be observed that the MCNLTV can protect the bony structures and edge better than the other two methods. In ROI-C, from the tissue structure ''5'', it can be seen that the MCNLTV can retain accurate structures with higher quality. The edge is also better than the comparisons. The same thing with ROI-D, the tissue structures are clearer and more complete. The regions indicated by arrow ''6'' and arrow ''7'' also show that the MCNLTV results are better than other comparisons.

IV. DISCUSSION AND CONCLUSION
In this article, a non-local total variational optimization model based on multi-constraint optimization was proposed to improve the accuracy of material decomposition for spectral CT. This model introduces NLTV regularization to image domain material decomposition and adds more constraint conditions to guarantee more accurate results. The method makes full use of the non-local similarity of the image and deals with the air pixels.
The experiments show that MCNLTV can obtain a higher accuracy decomposition than traditional DI, TVMD and ROF-LLT regularization. In terms of decomposed images, MCNLTV has better performance on denoising and preserving image details than other methods. From the simulation experiment, we can see MCNLTV can effectively decompose the iodine contrast agent materials, but it is not good enough for soft tissue materials decomposition when compared with the ROF-LLT method. And further experimental verification is needed in the future.
From the real data experiment, the proposed method shows the advantages of decomposition accuracy, preserving image details and noise reduction. However, there are still some limitations. First, this method contains many parameters, which are difficult to select in practical application. Second, this method is time-consuming due to calculate its non-local weight. Third, this approach still loses a bit of detail like the small soft tissue in the simulative results. As for the next work, we would contribute our efforts from the following aspects. First, it is necessary to develop an automatic strategy to optimize the parameters. Second, more experiments should be taken in the case of the imaged object contain three or more materials. Third, how to accelerate the implementation should be investigated in our future work.
In summary, the method MCNLTV we proposed has a good performance on image-domain material decomposition for spectral CT.