Super-Energy-Resolution Material Decomposition for Spectral Photon-Counting CT Using Pixel-Wise Learning

Spectral photon-counting CT offers novel potentialities to achieve quantitative decomposition of material components, in comparison with traditional energy-integrating CT or dual-energy CT. Nonetheless, achieving accurate material decomposition, especially for low-concentration materials, is still extremely challenging for current sCT, due to restricted energy resolution stemming from the trade-off between the number of energy bins and undesired factors such as quantum noise.We propose to improve material decomposition by introducing the notion of super-energy-resolution in sCT. The super-energy-resolution material decomposition consists in learning the relationship between simulation and physical phantoms in image domain. To this end, a coupled dictionary learning method is utilized to learn such relationship in a pixel-wise way. The results on both physical phantoms and in vivo data showed that for the same decomposition method using lasso regularization, the proposed super-energy-resolution method achieves much higher decomposition accuracy and detection ability in contrast to traditional image-domain decomposition method using L1-norm regularization.


I. INTRODUCTION
S PECTRAL photon-counting X-ray computed tomography (sCT) provides additional X-ray spectrum information about material components in an organ or tissue, due to its ability of separating photons into different energy bins by using photon-counting detector (PCD). This ability makes possible efficient material decomposition of which the objective is to quantitatively measure different materials existed in a pixel.
It is a huge challenge to obtain high accuracy of material decomposition in clinical applications, especially for low-concentration materials [1], [2]. This is linked to the principle itself of sCT. With respect to conventional CT or dual-energy CT (DECT) to some extent, sCT divides the whole energy range into so-called energy bins. More there are energy bins, the higher energy resolution (narrower energy-bin width) and more spectral information. However, more energy bins mean at the same time less photons in each energy bin and consequently higher quantum noise. Many aspects can influence the accuracy of material decomposition, including exposure time, energy bin width, acceptable flux rate, etc. Although there is complex interdependency between the parameters, we will focus on energy bin that is a basic parameter for spectral CT in contrast to conventional CT. In other words, energy resolution of sCT is limited by a trade-off between the total number of energy bins and the image quality at each bin, which restricts the accuracy of material decomposition and ability of detecting low-concentration materials.
Existing material decomposition methods can be divided into three categories: projection-domain [3]- [6], image-domain [7]- [12] and one-step material decomposition [13]- [15], according to the sequence of material decomposition and image reconstruction, i.e. before, after or during. Most decomposition methods of all the three ways try to improve performance of material decomposition by exploiting the high correlation between spectral and spatial features embedded in the multi-energy images. However, almost no work can be found in the literature to enhance energy resolution of sCT by software. Actually, for DECT, there is an imaging method that aims to enhance the visibility of contrast agent by reconstructing the image at one chosen computed energy, called virtual monochromatic image (mono-E or VMS) [16]. The performance of mono-E from DECT is determined by the material decomposition that is processed only on actual images, which is noisy and depends on the chosen computed energy.
In this paper, we propose to introduce the notion of super-energy-resolution (SER) into sCT in order to improve the accuracy of material decomposition in image domain. The idea is to learn the relationship between spectral features of simulated data and those of physical data, then applying it to other physical data. To do it, theoretical sCT images containing abundant spectral energy information, designated as SER images, are simulated according to physical phantoms. The mapping between actual sCT images and SER images are learned by utilizing a coupled dictionary learning (CDL) method in a pixel-wise way. SER images synthesized based on the learned mapping are then used for material decomposition in image domain.

II. MODEL AND METHOD
Common models of image-domain material decomposition are firstly shown in this section. Then, SER for sCT is introduced and modeled. At last, the proposed SER-based material decomposition is described.

A. MODELS OF IMAGE-DOMAIN MATERIAL DECOMPOSITION
Tomography reconstruction, which is a preliminary step for image-domain material decomposition methods, is separately processed at each individual bin. Reconstruction at the i-th energy bin tries to achieve the spatial image, i.e. the linear attenuation coefficients (LAC) µ(⃗ x, i) at pixel ⃗ x: where s i (u) represents the recorded photons along the u-th ray in the i-th energy bin, C the mapping function between s i (u) and LAC, D the data-fidelity term, and R the regularization. The detailed relationship between µ and s, i.e. C in equation (1), is described by:s wheres i (u) denotes the mean value of s i (u), n 0 (E) the incident photon fluence from X-ray source, L(u) the u-th ray, µ(⃗ x, E) the LAC at energy E, and d i (E) the detector absorption efficiency or detector response function that can give the probability of a photon of energy E to be measured within bin i [3]. For X-ray CT images, noise level is negatively related to the square root of the total number of detected photons. In other words, sCT suffers from stronger quantum noise for narrower bin. Material decomposition is then performed on the reconstructed multi-energy images: where subscript α denotes the α-th basis material, µ mα (i) the calculated i-th energy-bin effective mass attenuation coefficient (MAC), B the number of energy bins, M the number of basis materials, and ρ α (⃗ x) the mass density at pixel ⃗ x. We rewrite image-domain material decomposition into its matrix form as: where Y ∈ R B×N P designates the reconstructed multienergy spatial images of µ with N P designating the number of pixels or voxels, M ∈ R B×M the decomposition matrix of µ m , X ∈ R M×N P the decomposed basis material images of mass densities ρ and N the noise.
Consequently the aforementioned trade-off between the number of energy bins and noise level leads to limited energy resolution in sCT. In practice, the number of energy bins of sCT is generally scheduled by manufacturers, e.g. 5 bins for Phillips sCT prototype [17].

B. SUPER-ENERGY-RESOLUTION IMAGING IN SCT
We propose to improve the energy resolution by means of fully exploiting both simulation data and physical data. To do it, the notion of SER is introduced in sCT.
SER images are virtual multi-energy images with much more bins but noise-reduced compared to the actual sCT images. Fig. 1 illustrates the relationship between actual sCT, denoted by low-energy-resolution (LER) imaging, and SER imaging by respectively plotting the multi-energy values of corresponding pixels in the LER and SER images. For one pixel in the reconstructed LER images (e.g. yellow asterisks in the left of Fig. 1, designated as y L ∈ R B L ), it has 5 values for 5 energy bins, i.e. B L = 5. While for the corresponding SER pixel (yellow points in the right of Fig. 1, designated as y S ∈ R B S ), it can have much more values, e.g. 50 values for B S = 50 (30 -80 keV). Note that the choice of 50 bins corresponds to the most precise µ m we can get from NIST [18]. That is, µ is sampled every 1 keV. The proposed SER imaging is achieved by introducing the assumption that if containing the same basis materials and concentrations, the mapping between LER and SER images is determined and can be learned. We realize it by learning the relationship between physical data and simulation data and then applying it to other physical data, as illustrated in Fig. 2. The physical data are measured and reconstructed as LER images, while the corresponding SER versions are produced by simulation with same shapes and components but more bins. Then, LER and SER spectral features are extracted and their relationship will be learned by a mapping which can be utilized to synthesize SER images for other in vivo data.

C. SER IMAGE SYNTHESIS BASED ON COUPLED DICTIONARY LEARNING
We learn the mapping between LER and SER spectral features by utilizing CDL method in a pixel-wise way.

1) Pixel-wise training strategy
It is inefficient to directly learn the mapping from the whole LER and SER images or patches. In general, learning morphological features of certain organ or more is another challenge that demands abundant images with various spatial information for training. It will be expensive to produce these kinds of physical data currently. Meanwhile, complex material components can make the learning more difficult.
We solve it by training the mapping from LER and SER pixel pairs, rather than the whole images or patches. With this strategy, the training samples only contain spectral information. Correspondingly, the physical phantoms used in our work can have simple shapes, but abundant material components.
Meanwhile, to cope with the difficulty of extracting features from noisy LER pixels, we enhance the learning by adding conventional CT images. The conventional CT images of physical phantoms are measured on the same sCT system but reconstructed with full spectrum (i.e. a single energy reconstruction), thus having lower noise level compared to with single energy bin. The conventional CT and sCT pixels are then concatenated together as one LER sample for training. As a result, now y L ∈ R (B L +1) is a combined LER pixel.

2) Learn the mapping by CDL
CDL was used in the field of super spatial resolution [19], [20]. In the theory of dictionary learning, features of a pixel y can be represented as a sparse combination of atoms inside an over-complete dictionary D. That is, y = Dα, where α is the sparse codes with very few nonzero entries: where ∥α∥ 0 designates L 0 norm of α. Generally, solving optimization problem in (5) is NP-hard, and it can be instead solved using L 1 norm, as: where λ denotes the Lagrange multiplier. LER and SER pixel sets can then be separately represented using a LER dictionary D L and a SER dictionary D S : where Y L ∈ R (B L +1)×N t and Y S ∈ R B S ×N t respectively designate the LER and SER pixel sets with N t pixels in the training set, D L ∈ R (B L +1)×N D and D S ∈ R B S ×N D the LER and SER dictionaries containing N D atoms, Λ L and Λ S the corresponding sparse codes (i.e. sets of α in equation (5) and (6)), and ∥Y − DΛ∥ 2 F the data fidelity VOLUME 4, 2016 term determined by Frobenius-Norm ∥·∥ F . Then, the connection between LER and SER features is established by jointly training D L and D S , formulated as: Meanwhile, a transform matrix is utilized to decrease the impacts of dramatic gap between simulation and physical data: where W denotes the transform matrix that was firstly introduced in super spatial resolution for stable crossstyle image synthesis [20]. The enhanced training model then becomes: We solve the above model by alternating-directionmethod-of-multipliers (ADMM), which alternately updates coupled dictionaries D, sparse codes Λ and transform matrix W.
It should be noted that atoms in the dictionaries learned using a pixel-wise strategy are energydependent, in contrary to spatial features in traditional super-spatial-resolution applications. Figure 3 illustrates the difference between super spatial resolution and super energy resolution by showing the learned features in practice. Atoms learned from a spatial image are morphological features, while for SER, atoms in the dictionary are energy-dependent and look more like µ curves.
After training, we can readily synthesize SER pixel y S from its LER version y L with the learned coupled dictionaries and transform matrix, by solving the following optimization:

D. MATERIAL DECOMPOSITION USING SER IMAGES
The synthesized SER images are then utilized for material decomposition in image domain. For existing sCT, decomposition matrix M is generally initialized by the aforementionedμ m [7] according to:μ where µ mα (E) designates the theoretical MAC value and E i the width of the i-th bin. Note thatμ m (i) is an estimate of the true value of µ m (i) corresponding to the i-th bin. As a result, thinner bins can produce more preciseμ mα (i), i.e. more precise M. A more precise decomposition matrix suitable for the SER images is then needed. For LER, µ m of each basis material (one  To assess the decomposition of the synthesized SER images, we choose a common image-domain decomposition algorithm using L 1 norm (lasso) regularization. Mathematically, we formulate the SER-based material decomposition using lasso as:

A. EXPERIMENTAL SETTING AND EVALUATION METRICS
The experiments were carried on real data from a Philips sCT prototype [21]- [23]. This medical approved prototype has a 500mm FOV and a pixel pitch of 0.274mm at iso-center. The photon-counting detector has a 1848x64 grid. However we used a 2x8 binning mode to obtain 924×8 pixels with a size of 1x2mm. The experiments were performed at tube voltage 120 kVp and current 220 mA under rotation time of 1s with a small focal spot. Reconstructed images from one single bin are shown in Fig. 5a. Fig. 5b depicts the cylindrical inserts of the physical phantom. The upper image in Fig. 5a is for training, while the lower two regions-of-interests (ROIs) of phantoms (i.e. all the areas inside red contours) are for training and testing, respectively. The reconstructed images for training and testing have different noise levels due to the presence of extensional fat rings added to the testing phantom exterior. Re- movable tubes containing different dilutions of iodine or gadolinium are inserted in the ROIs, as illustrated in Fig. 5b. The number refers to the concentration of each tube which covers from 0.1 mg/cc to 15 mg/cc. In our experiments, both iodine and gadolinium were measured for training and testing. Meanwhile, we use another in vivo dataset to further evaluate the proposed method. In our experiments, a rabbit was injected by gadolinium dilution, the image of which 5 and 15 minutes after the injection are respectively given in Fig. 6. For quantitative evaluation, tubes with known concentrations of gadolinium (2, 5, 10 mg/cc) are also placed beside the rabbit. The image of 5 min after injection is a little blurred due to movement. All the spatial images in physical datasets were separately reconstructed at each energy bin via a common reconstruction method: conjugate gradient method (with Reconstruction Toolkit (RTK) [24]). Note that the decomposition method of Philips prototype is projection-domain based, while our proposed SERbased method is performed in image domain. What's more, our reconstruction software (RTK) is also different from that used by Philips. In other words, we apply our own algorithms on prepped photon count VOLUME 4, 2016 data from the scanner, whereas other correction steps from the Philips image chain (e.g. proper handling of bad detectors) are not used. Hence the results will be different from the Philips' reconstruction results. The detector bin response function d i (E) in model (2) was provided by Philips. All phantoms were reconstructed into 9 slices and the size of each slice is 900*900. We learned the coupled dictionaries composed of 1024 (N D ) atoms from 400K (N t ) random LER-SER pixel pairs. On a machine equipped with Nvidia Tesla V100 and Intel I9 CPU, our method took 18h for training and 5 minutes for testing of one slice.
Both SER images synthesis and material decomposition were quantitatively evaluated. Normalized Root Mean Square Error (NRMSE) (i.e. NRMSE = RMSE/x for the image x) and peak signal-to-noise ratio (PSNR) were utilized to assess the quality of synthesized SER images. Note that NRMSE and PSNR were respectively calculated on the whole images of iodine and gadolinium, in order to assess the total synthesis performance of all concentrations. The smaller the NRMSE and the higher the PSNR, the more accurate the synthesis.
Mean and standard error of the mean (SEM) were calculated for each dilution to assess decomposition accuracy. Meanwhile, linear regression between the nominal (y-axial) and measured concentrations (using the calculated mean values, x-axial) was performed for all dilutions, as: y = ax + b, where a and b represent the slope and y-intercept, respectively. To further assess the detection ability of contrast agents, we also calculated the limit of detection (LOD) for both iodine and gadolinium. The LOD was described by the Clinical and Laboratory Standards Institute [25] as: wherex blank denotes mean water signal detected in the images of decomposed contrast agents, δ blank the standard deviation of the measurement, and LOB the limit of blank. LOB denotes the highest apparent contrast agent concentration expected to be found when replicates of a sample contain no contrast agent, which was calculated using tubes of water in each phantom. LOD is taken as the lowest concentration of a contrast agent in a sample that can be detected.

B. TRAINING
NRMSE and PSNR (dB) of synthesized SER images at each energy bin are illustrated in Fig. 7. SER images of iodine have relatively higher accuracy than gadolinium, e.g. smaller NRMSE and higher PSNR, although both of them have good synthesis performance. Note that low-energy bins have relatively larger NRMSE and smaller PSNR, especially for bins of k-edge (i.e. energy bins containing k-edge). Also, we need to recover more details at the first two bins: 30 -50 keV and 51 -61 keV.  The material decomposition results based on synthesized SER images are shown in Fig. 8. Visually, the proposed SER-based decomposition presents good detection ability according to the edge-preserving performance in training. It can substantially preserve edge information even for low-concentration iodine and gadolinium e.g. 0.5 mg/cc, as illustrated in the ROIs. Note that all the ROIs are shown with grayscale images in order to highlight morphological information.
To quantitatively assess the decomposition, linear regressions of iodine and gadolinium for all the concentrations are illustrated in Fig. 9. Note that there is no additional scatter correction for all the results of linear regression in this work. Both iodine and gadolinium have clearly high accuracy with respect to theoretical concentrations. The correlations between the measured and prepared concentrations are strongly linear for all dilutions (all R 2 ≥ 0.95). The slopes are close to 1 for both iodine (a = 0.96) and gadolinium (a = 0.97). The y-intercept values are very low, e.g. b = -0.26 for iodine and -0.38 for gadolinium. Additionally, the decomposition is very reliable because both iodine and gadolinium are always underestimated and their SEMs  are small. Table 1 lists LOD for both iodine and gadolinium, to further assess the detection limit. Both iodine and gadolinium can be detected for concentrations lower than 0.5 mg/cc in training. More specifically, the proposed SER method has better detection ability for iodine than for gadolinium.

C. TESTING
The LER-based material decomposition method using the same lasso regularization was compared to the proposed SER-based method. Both physical phantoms with more noise and artifacts and in vivo data were tested. Fig. 11 shows the NRMSE and PSNR (dB) of synthesized SER images at each energy bin for testing. As in the case of training, SER images of iodine have relatively higher accuracy than gadolinium, e.g. smaller NRMSE and higher PSNR. Compared to training, although testing has relatively higher NRMSE and smaller RSNR, the difference is small.
The material decomposition results of physical phantoms with more noise and artifacts are shown in Fig. 10. In contrast to LER method, the proposed SER method shows better performance in quantification and detection compared to LER method. Firstly, the proposed SER method gives much better morphological accuracy in terms of edge-preserving performance and much less noise, even for low-concentration materials, e.g. 0.5 mg/cc for both iodine and gadolinium. The edges are substantially preserved by SER method. While for LER method, the selected areas of low-concentration materials are overwhelmed by noise and cannot be detected. Secondly, more quantitatively as shown in Fig. 12, linear regressions of both I and Gd illustrate that SERbased method has much smaller errors. SER method presents more precise slope (for iodine: a LER = 0.71 and a SER = 0.87; for gadolinium: a LER = 1.24 and a SER = 0.94) and y-intercept value (for iodine: b LER = 1.71 and b SER = −0.08; for gadolinium: b LER = 0.92 and b SER = −0.51). Meanwhile, SER method is more reliable with smaller SEMs and almost all decomposed values always exhibit the same underestimation trend. As can be observed in Table 1 that lists LOD values of different methods for both iodine and gadolinium, our SER method has much stronger detection ability for both iodine and gadolinium compared to LER method by producing smaller LOD values in testing. It has better detection ability for iodine than for gadolinium, as in the case of training.
Meanwhile, the decomposition results of different methods on in vivo data are given in Fig. 13. We identify calcium (Ca) inside spine as the basis material iodine in terms of their similar MAC under our energythresholds arrangement. Clearly, LER method has less accurate separation between gadolinium and iodine and has more noise for the two basis materials. In contrast, the SER method has better detection ability for both iodine (in the spine) and gadolinium (in tubes and organs). Mean and SEM of gadolinium in the tubes are listed in Table 2. Obviously, SER method leads to more accurate decomposition for gadolinium of concentrations covering from 10 mg/cc to 2 mg/cc.

IV. DISCUSSION
Our main goal is to improve the energy resolution of sCT in order to improve the k-edge quantification accuracy and detection ability. To achieve that we introduced the notion of SER in sCT to achieve SER imaging, by fully exploiting abundant spectral features embed- ded in both simulation and physical data. Preliminary results revealed that the proposed SER method enables accurate decomposition and high detection ability even for low-concentration materials. The SER-based material decomposition method is reliable. Firstly, even for noisier conditions in training, the proposed SER method still enables high decomposition accuracy and detection ability. In our experiments, we trained the dictionaries on the phantoms with less noise and artifacts, but tested on the ones with stronger noise and more artifacts. However, the results still show better decomposition accuracy and detection ability of In this work, we did not consider the mixtures of multiple elements in the experiments. Nevertheless, materials detection mainly relies on the subsequent decomposition method if the input is SER images. At the same time, the ability of the decomposition method with L1 norm to detect mixtures has been demonstrated in [26], [27]. Hence, our method would work for the mixtures of interested elements (I or Gd).
Concerning the partial volume effect, which arises when more than one material type occurs in a voxel (or pixel) having a limited spatial resolution, it is a very interesting and pertinent issue in sCT. In our method, material decomposition is processed pixel by pixel, which implies that only material category and density are accounted for and spatial neighborhood is not involved. As a result, partial volume effect would be minimized. Nevertheless, it would be interesting to study how a decomposed material quantitatively changes with different spatial resolutions, different types of mixtures, different material proportions in the voxel, etc.
The interest of projection-domain and one-step methods lies in the fact that they generate directly material images from projection data without needing to first reconstruct spatial images. In contrast, image-domain decomposition has its own advantages of being able to both exploit spatial images and generate material images. On the other hand, projection data is not always available or not practical to use in clinical practice because of commercial confidentiality or practical constraints. In short, image-domain methods are hardware-independent and thus provide much flexibility than the other 2 methods.
Each single synthesized SER image is a virtual monochromatic image (mono-E). The mono-E image form DECT is generally reconstructed from a pair of decomposed basis material images (e.g. a soft-tissuelike basis material and a contrast agent). The performance of mono-E is determined by the material decomposition that is processed only on actual images (LER images for sCT in the present study). In contrast, SER VOLUME 4, 2016 images are directly synthesized using spectral features learned from both simulation and physical data. As a result, our method could contribute to more accurate mono-E imaging.

V. CONCLUSION
We have proposed a SER imaging method to improve image-domain material decomposition in sCT. The method is based on generating SER images from LER images. This is achieved through learning mapping between simulation and physical data in a pixel-wise way and applying it to other physical data. The results on both physical phantoms and in vivo data showed that the SER-based material decomposition method shows clearly higher accurate material decomposition and detection ability compared to the LER-based method. In the future work, the performance of other imagedomain decomposition methods using SER images will be investigated. It would be also interesting to compare image-domain decomposition with projection-domain methods such as that used by Philips sCT prototype.