Introduction
X-ray computed tomography (CT) has become a powerful inspection tool for medical applications [1]. Spectral CT, which exploits the energy dependence of X-ray absorption, adds specificity to diagnostic information by revealing the presence and distribution of substances, and has been used in many clinical applications, including gout diagnosis [2], automatic bone removal [3], and stone composition characterization [4]. Virtual monochromatic images can effectively reduce beam hardening artifacts in images reconstructed by spectral CT. Beam hardening arises because lower-energy photons are absorbed more readily than higher-energy photons in a broad X-ray spectrum, so the average energy of the spectrum increases as the photons pass through the object. Structures with high attenuation coefficients (such as dense bone and metal implants in the body) can greatly degrade image quality, causing incomplete reconstruction or severe artifacts [5]. Recent research [6] highlights the potential benefits of virtual monochromatic images derived from dual-energy acquisitions. The ability to obtain accurate attenuation coefficients of materials and to remove beam hardening artifacts is a major benefit of virtual monochromatic images [7].
Over the past decade, virtual monochromatic images have been synthesized in either the projection (raw data) domain or the image domain [8]. A straightforward approach to raw-data-based dual-energy processing is to decompose the projection data into two basis materials of arbitrary density, allowing reconstruction of material-specific images (dual-material decomposition). Although virtual monochromatic CT images synthesized in the projection domain have the theoretical advantage of fully eliminating beam-hardening artifacts, existing research has not demonstrated this in practice [6]. Decomposition in the projection domain is also extremely sensitive to motion, because it requires an exact match of the projection angles. Goodsitt et al. [9] evaluated the accuracy of CT values and effective atomic numbers in virtual monochromatic images acquired from a projection-based fast-kilovoltage-switching dual-energy system; their results show that inaccurate CT values persist, especially for dense materials at low energy. Moreover, the data collected by a dual-source CT system are usually inconsistent in the projection domain and can only be decomposed in the image domain [10]. Currently, image-based post-processing is the most established and clinically used technique for dual-energy decomposition, and all raw-data-based dual-energy techniques and applications can be translated into image space.
The experiments of Apfaltrer et al. [11] showed that virtual monochromatic image datasets can significantly increase the contrast-to-noise ratio (CNR) of reconstructed images, suggesting that clinical applications of low-keV monochromatic reconstruction could reduce the iodine contrast dose required for adequate image quality. However, Petersilka et al. [12] reported that virtual monochromatic images at lower energies also contain higher levels of noise, requiring special algorithms to correct for increased artifacts and scatter. Monochromatic extrapolation may skew the CT values of low-density tissues and cannot correctly describe the attenuation coefficients around metal artifacts [13]. Studies by Komlosi et al. [14] and Guggenberger et al. [15] also showed that high-energy monochromatic images can reduce artifacts caused by metal implants, but individualized monochromatic energies should be used for different metals. This significant noise burden at lower energies has been a limiting factor in the practical application of virtual monochromatic imaging. Over the years, several algorithms have been widely applied to high-quality CT image reconstruction, such as total variation [16] and hybrid regularization [17]. Pathak et al. [18] introduced an anisotropic diffusion model to address the data fidelity term and mixed noise simultaneously. Valenti [19] explored the field of discrete tomography, which reconstructs images directly from very few projections in the presence of instrumental and quantization noise. However, these denoising methods consider each energy bin separately and cannot effectively exploit the correlation between different energy bins, even though the images reconstructed from different energy bins are highly correlated, since all projection data are acquired from the same object. Sukovic and Clinthorne [20] presented a penalized weighted least-squares image reconstruction method to handle the non-Poisson noise introduced by amorphous silicon detectors in dual-energy CT. Gao et al. [21] proposed a multienergy CT reconstruction based on a prior rank, intensity, and sparsity model. These methods operate on a priori measured spectral data and therefore introduce additional computational error and complexity. Moreover, all the aforementioned algorithms are general-purpose preprocessing steps applied before virtual monochromatic imaging, and their effectiveness still depends on the choice of the optimal energy value. Recently, the noise-optimized virtual monochromatic imaging (VMI+) algorithm [22] was introduced, which is specifically designed to improve monochromatic image quality at low-keV levels. Preliminary studies of the VMI+ technique have shown excellent depiction of hepatic vessels [23] and improved quantitative image quality in spectral CT angiography of the aortic and lower-extremity vasculature [24]. Small differences in attenuation can be diagnostically important, but the optimal energy for synthesizing a virtual monochromatic image depends on many uncertain factors, including patient size, data acquisition schemes, and noise levels in low-energy bins [10]. In summary, minimizing the dependence of quantitative measurement and artifact removal on patient size and composition is an urgent problem for virtual monochromatic imaging algorithms.
Recently, deep learning (DL) has generated excitement in the field of computer vision [25]. DL can efficiently learn high-level features from input data through a multilayered framework. This advance is now entering the medical field [26], for example, in organ segmentation [27], nucleus detection [28], and tissue classification [29]. In many tasks related to image reconstruction, such as super-resolution [30] and denoising [31], it is known that minimizing a per-pixel loss between the output image and the ground truth can lead to blurring or visually unattractive results. The generative adversarial network (GAN) was first proposed in 2014 by Goodfellow et al. [32]. It is a generative model that attempts to generate realistic images using a min-max optimization framework in which two independent networks (a generator and a discriminator) compete with each other.
Despite these successes, GANs still face significant difficulties in training [37]. The generator may suffer from unstable gradients and mode collapse, which is one of the main motivations for adopting the Wasserstein formulation in this work.
Moreover, the loss function is an important and indispensable part of network learning, especially in reconstruction tasks. Several studies have explored different loss functions and their combinations to accomplish specific learning tasks efficiently [39]. The perceptual loss introduced by Yang et al. [40] has been shown to provide better results for CT denoising. Seitzer et al. [41] combined the mean squared error (MSE) loss with the perceptual loss to reconstruct MRI images. Inspired by these successful cases in image translation, we treat virtual monochromatic imaging as a translation from polychromatic images to monochromatic images and explore the use of a generative adversarial network to inherently model spatial and spectral correlations in the virtual monochromatic imaging inverse problem without projection data. In particular, a Wasserstein generative adversarial network with hybrid loss (WGAN-HL) provides a good estimate of the distance between the polychromatic and monochromatic image distributions.
The rest of this paper is organized as follows. The proposed method and the impact of each loss function on virtual monochromatic image quality are described in Section II. Experiments and results are presented in Section III. Relevant issues are discussed in Section IV. Finally, the conclusion is drawn in Section V.
Methods
A. The Virtual Monochromatic Imaging Principle
The attenuation of polychromatic X-rays in an object can be expressed as follows:\begin{equation*} I(l)=\int \nolimits _{0}^{E_{\max }} I_{0}(E)e^{-\int _{l}\mu (E,Z)dl}dE\tag{1}\end{equation*} where $I_{0}(E)$ is the incident spectrum, $E_{\max }$ is the maximum photon energy, and $\mu (E,Z)$ is the energy- and material-dependent linear attenuation coefficient along the ray path $l$. The attenuation coefficient can be expressed as a linear combination of two basis materials:\begin{equation*} \mu (E)=b_{1}\mu _{1}(E)+b_{2}\mu _{2}(E)\tag{2}\end{equation*} where $b_{1}$ and $b_{2}$ are the basis-material coefficients and $\mu _{1}(E)$, $\mu _{2}(E)$ are the attenuation coefficients of the two basis materials. Evaluating (2) at a single energy yields a virtual monochromatic image.
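For illustration, the following minimal Python sketch synthesizes a virtual monochromatic image from basis-material coefficient images according to (2); the attenuation values and image contents are purely illustrative placeholders, not values from the paper.

```python
import numpy as np

def virtual_monochromatic(b1, b2, mu1_E0, mu2_E0):
    """Eq. (2) evaluated at a single energy E0: mu(E0) = b1*mu1(E0) + b2*mu2(E0).

    b1, b2  : basis-material coefficient images from a dual-energy decomposition
    mu1_E0  : linear attenuation coefficient of material 1 at E0 (1/cm)
    mu2_E0  : linear attenuation coefficient of material 2 at E0 (1/cm)
    """
    return b1 * mu1_E0 + b2 * mu2_E0

# Illustrative example with water and iodine as basis materials (placeholder data).
b_water = np.ones((512, 512))
b_iodine = np.zeros((512, 512))
mu_water_70, mu_iodine_70 = 0.193, 1.95          # illustrative values at 70 keV
mono_70 = virtual_monochromatic(b_water, b_iodine, mu_water_70, mu_iodine_70)
hu_70 = 1000.0 * (mono_70 - mu_water_70) / mu_water_70   # convert to Hounsfield units
```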
In short, the virtual monochromatic imaging principle of the network is modeled as in Fig.1. Hardening artifacts may appear in reconstructed polychromatic images of a homogeneous material, manifested as unequal CT values along an X-ray path. In contrast, the ideal monochromatic images should show a homogeneous CT value in this case, depending only on the energy. Let the polychromatic images $Y$ be modeled as a degradation $F$ of the ideal monochromatic images $X$ plus a noise term $a$:\begin{equation*} Y=F(X)+a\tag{3}\end{equation*}
The network is then trained to approximate the inverse mapping $F^{-1}$, recovering an estimate $\hat {X}$ of the monochromatic images:\begin{equation*} F^{-1}Y=\hat {X}\thickapprox X\tag{4}\end{equation*}
In this study, two polychromatic images from different energy bins (30–80 keV and 80–120 keV) are stacked together as the network input, and the ideal monochromatic images at 40 keV, 55 keV, 70 keV, and 100 keV serve as the matched gold-standard targets.
B. Wasserstein Generative Adversarial Network (WGAN)
A GAN is composed of two parts: a generative model $G$ and a discriminative model $D$. The WGAN replaces the original GAN objective with the Wasserstein distance and stabilizes training with a gradient penalty term:\begin{align*} \min \limits _{G} \max \limits _{D} L_{WGAN} (D,G)=&-E_{X\sim P_{\text {r}}} [D(X)]+E_{Y\sim P_{\text {p}}} [D(G(Y))] \\&+\lambda E_{\hat {X}\sim p_{\text {g}}}[(\|\nabla _{\hat {X}} D(\hat {X})\|_{2}-1)^{2}] \tag{5}\end{align*} where $P_{\text {r}}$ is the distribution of real (gold-standard) monochromatic images, $P_{\text {p}}$ is the distribution of input polychromatic images, $\hat {X}$ denotes the samples on which the gradient penalty is evaluated, and $\lambda$ controls the strength of the penalty.
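A minimal PyTorch sketch of the gradient-penalty term and the resulting critic loss in (5) is given below; the penalty weight and the interpolation-based sampling of $\hat {X}$ follow common WGAN-GP practice and are assumptions, not details taken from the paper.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """lam * E[(||grad_xhat D(xhat)||_2 - 1)^2], with xhat sampled on straight
    lines between real and generated images (common WGAN-GP practice)."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()

def critic_loss(D, real, fake, lam=10.0):
    # Discriminator (critic) part of Eq. (5): -E[D(X)] + E[D(G(Y))] + penalty.
    return -D(real).mean() + D(fake).mean() + gradient_penalty(D, real, fake, lam)
```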
C. Network Structure
Here, the proposed network structure is introduced in detail. As shown in Fig.2, the polychromatic images from the dual energy bins are stacked together as the input to the generator, and the difference between the generated images and the target gold-standard images is reduced according to the hybrid loss. At the same time, the training of the discriminator is governed by the discriminator loss, which measures the Wasserstein distance. The specific components are described separately as follows.
The overall structure of the proposed WGAN-HL network for virtual monochromatic imaging.
1) Generator
As shown in Fig.3, the generator G is a fully convolutional network consisting of 12 layers; it starts with a convolutional layer followed by four residual blocks. Each residual block contains two convolutional layers with a skip connection, and the chain of blocks can be expressed as:\begin{equation*} z_{j} =z_{0} +\sum \limits _{i=0}^{j-1} {T(z_{i},\{W_{i} \})}\tag{6}\end{equation*} where $z_{i}$ denotes the feature maps after the $i$-th block, $T(\cdot)$ is the residual mapping, and $W_{i}$ are the corresponding convolution weights.
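A minimal PyTorch sketch of such a residual block and a generator skeleton follows; the channel counts, kernel sizes, and number of output channels are illustrative assumptions, not the exact values of Fig.3.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolutions with a skip connection. Chaining such blocks yields
    z_j = z_0 + sum_i T(z_i, {W_i}) as in Eq. (6)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, z):
        return z + self.body(z)          # skip connection

class Generator(nn.Module):
    """Fully convolutional generator: a head convolution, four residual
    blocks, and a tail convolution mapping back to the image channels."""
    def __init__(self, in_ch=2, out_ch=4, feat=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResidualBlock(feat) for _ in range(4)])
        self.tail = nn.Conv2d(feat, out_ch, 3, padding=1)

    def forward(self, y):                # y: stacked dual-energy polychromatic images
        return self.tail(self.blocks(self.head(y)))
```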
Details of the generator, discriminator, and hybrid loss. The variable k denotes the kernel size, n the number of filters, and s the stride.
2) Discriminator
As shown in Fig.3, the generated monochromatic images and the gold-standard monochromatic images are the inputs of the discriminator $D$, which is trained to distinguish the generated images from the real ones.
3) Hybrid Loss Function
The main challenge in this virtual monochromatic imaging task is that the generated images do not exactly match the target images. The presence of noise causes a non-constant shift between pairs of images, even if the pixels are precisely aligned. The proposed hybrid loss function in Fig.2 and Fig.3 supervises network learning by measuring the difference between the generated images and the target images, taking into account factors such as texture details, features, and CT-value distributions. The network is required to remove noise while fitting the attenuation coefficient relationship between the polychromatic images and the ideal monochromatic images. The image reconstruction errors are back-propagated to update the kernel weights of the generator.
D. Loss Function for Virtual Monochromatic Imaging
To improve the accuracy of the CT values in the generated images and to optimize the quality for clinical diagnosis, both the per-pixel loss and the content differences are taken into account, divided into the following four components as shown in Fig.3:
1) L1 Loss
To ensure that each voxel has an accurate CT value, a good correspondence between each pixel of the generated images and the target images is required. The MSE or the mean absolute error can serve this purpose; here the $L_{1}$ (mean absolute error) loss is adopted:\begin{equation*} L_{1} =\frac {1}{hwd_{2}}\Vert G(Y)-X\Vert _{1}\tag{7}\end{equation*} where $h$, $w$, and $d_{2}$ denote the image dimensions.
2) Chromatic Loss
To measure the difference in CT values between the virtual monochromatic images and the target images, a Gaussian blur is applied to both images and the Euclidean distance between them is calculated [47]. This is equivalent to using a fixed Gaussian kernel as an additional convolutional layer followed by an MSE function. Compared with the traditional MSE, this method ignores texture differences between the two images and evaluates only the shift in the CT values. The chromatic loss can be expressed as:\begin{equation*} L_{\textrm {chro}} (M,N)=\vert \vert M_{\textrm {c}} -N_{\textrm {c}} \vert \vert _{2}^{2}\tag{8}\end{equation*}
where the blurred image $M_{\textrm {c}}$ (and similarly $N_{\textrm {c}}$) is obtained by convolving with a Gaussian kernel $G(u,v)$:\begin{equation*} M_{\textrm {c}} (p,q)=\sum \limits _{u,v} {M(p+u,q+v)\cdot G(u,v)}\tag{9}\end{equation*}
Hence, a constant, non-trainable Gaussian kernel is applied to both images inside the loss, so that only the low-frequency shift of the CT values contributes to the penalty.
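A minimal sketch of this chromatic loss, assuming single-channel images and an illustrative kernel size and standard deviation (not the paper's values):

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=11, sigma=3.0):
    """Fixed 2-D Gaussian kernel (size and sigma are illustrative choices)."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    g = torch.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

blur_kernel = gaussian_kernel()

def chromatic_loss(m, n, kernel=blur_kernel):
    """Eqs. (8)-(9): blur both single-channel images with the same fixed,
    non-trainable Gaussian kernel and compare the results with an MSE, so
    texture differences are suppressed and only the low-frequency CT-value
    shift is penalized."""
    pad = kernel.size(-1) // 2
    m_c = F.conv2d(m, kernel, padding=pad)
    n_c = F.conv2d(n, kernel, padding=pad)
    return F.mse_loss(m_c, n_c)
```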
3) Adversarial Loss
As mentioned in a recent study [39], minimizing the MSE alone leads to an over-smoothed appearance. The adversarial loss encourages the generator to output monochromatic images that are indistinguishable from real monochromatic images by training the network to minimize the Wasserstein distance with the regularization term:\begin{equation*} L_{adv}=\min \limits _{G}\max \limits _{D}L_{WGAN}(D,G)\tag{10}\end{equation*}
As a result, minimizing this loss pushes the generator to focus on texture information and to generate images as close to the target as possible. Furthermore, this adversarial loss is shift-invariant by definition, since no pixel alignment is required.
4) Perceptual Loss
The perceptual loss measures the distance between the feature maps of the generated images and the target images extracted by a pre-trained VGG-19 network:\begin{equation*} L_{VGG}=E_{Y,X}\left[{\frac {1}{hwd_{2}}\|VGG(G(Y))-VGG(X)\|^{2}_{F}}\right]\tag{11}\end{equation*}
Since the pre-trained VGG-19 network expects three-channel RGB inputs (it was trained on color images), the grayscale CT images are expanded to the required channel size before being fed into the VGG network.
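A sketch of this perceptual loss, assuming single-channel inputs replicated to three channels and an arbitrarily chosen VGG-19 feature layer (the paper does not specify which layer is used):

```python
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen feature extractor from a pre-trained VGG-19; the cut-off layer is an
# assumption for illustration.
vgg_features = vgg19(pretrained=True).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def perceptual_loss(generated, target):
    """Eq. (11): mean squared distance between VGG-19 feature maps.
    Grayscale CT slices are replicated to three channels to match the RGB
    input expected by VGG-19."""
    g3 = generated.repeat(1, 3, 1, 1)
    t3 = target.repeat(1, 3, 1, 1)
    return F.mse_loss(vgg_features(g3), vgg_features(t3))
```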
5) Total Loss
The total loss is a weighted combination of the four components:\begin{equation*} L_{total}=\lambda _{1}L_{1}+\lambda _{2}L_{chro}+\lambda _{3}L_{adv}+\lambda _{4}L_{VGG}\tag{12}\end{equation*} where $\lambda _{1}$ to $\lambda _{4}$ balance the contributions of the individual loss terms.
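A sketch of the combined generator objective, reusing the chromatic_loss and perceptual_loss sketches above; the weights are placeholders, not the tuned values of the paper, and the chromatic term inherits the single-channel assumption.

```python
import torch

def generator_loss(G, D, y, x, lambdas=(1.0, 0.1, 1e-3, 0.01)):
    """Eq. (12): weighted sum of L1, chromatic, adversarial, and VGG losses.
    The lambda values are illustrative placeholders."""
    lam1, lam2, lam3, lam4 = lambdas
    g = G(y)
    l1 = torch.mean(torch.abs(g - x))       # Eq. (7), mean absolute error
    l_chro = chromatic_loss(g, x)           # Eqs. (8)-(9)
    l_adv = -D(g).mean()                    # generator part of Eq. (10)
    l_vgg = perceptual_loss(g, x)           # Eq. (11)
    return lam1 * l1 + lam2 * l_chro + lam3 * l_adv + lam4 * l_vgg
```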
E. Network Training Process
The training process of the proposed WGAN-HL is shown in Fig.4. Within each iteration of the generator, the discriminator is updated several times, following the usual WGAN training schedule; a minimal sketch of this alternating schedule is given below.
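The sketch reuses critic_loss and generator_loss from the earlier snippets; the number of critic updates per generator update, the optimizer settings, and the epoch count are assumptions, not the paper's exact values.

```python
import torch

def train(G, D, loader, epochs=600, n_critic=4, lr=1e-4):
    """Alternate several critic updates with one generator update per batch."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.9))
    for _ in range(epochs):
        for y, x in loader:                 # polychromatic input, monochromatic target
            for _ in range(n_critic):       # critic updates
                opt_d.zero_grad()
                loss_d = critic_loss(D, x, G(y).detach())
                loss_d.backward()
                opt_d.step()
            opt_g.zero_grad()               # one generator update with the hybrid loss
            loss_g = generator_loss(G, D, y, x)
            loss_g.backward()
            opt_g.step()
```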
Experiments
In this section, the details of the experiments and the quality measures used to evaluate the proposed method are presented. A large number of simulated phantom datasets were generated, and the WGAN-HL network was then evaluated from three aspects: CT value measurement, beam hardening artifact removal, and metal artifact removal. Finally, real datasets were used for testing to further demonstrate the practical value of the proposed method.
A. Simulation Experimental Datasets and Setup
The performance of deep-learning-based methods is highly dependent on the size of the training dataset, and large-scale datasets can improve network performance. However, large numbers of spectral CT datasets are difficult to acquire, and, more importantly, ideal monochromatic images matched with polychromatic images are not easy to obtain clinically. To improve the performance of the proposed method, the following strategies were adopted. Realistic human phantoms from the XCAT tool developed at Duke University [49] were used to generate a large number of matched datasets for training and testing; to prevent over-fitting, 20 patients with different ages, ethnicities, weights, and heights were selected randomly, as shown in Appendix A.
In addition, breathing motion was simulated by XCAT to increase the similarity to clinical conditions, with 16 samples taken during each breathing cycle. To account for errors that might occur during detection and imaging, the edge-on X-ray detector we previously proposed [50] was adopted, fully modeling X-ray absorption, scattering, and different random noise levels in the photon absorption process within the detector. The X-ray photons follow the spectrum of a GE_Maxiray_125 tube operated at 120 kVp. Siddon's ray-driven algorithm [51], summarized in Table 1, was used to simulate the fan-beam geometry, and the images were reconstructed with the FBP algorithm. Furthermore, an overlapping-patch strategy was used, because it not only significantly enlarges the training dataset but also pushes the WGAN-HL to learn more details [52]; a minimal sketch of this strategy is given below.
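The sketch below illustrates overlapping patch extraction; the patch size and stride are illustrative, not the values used in the paper.

```python
import numpy as np

def extract_patches(image, patch=64, stride=32):
    """Slide a window with overlap (stride < patch) over a 2-D slice and
    stack the resulting patches to enlarge the training set."""
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, stride)
               for j in range(0, w - patch + 1, stride)]
    return np.stack(patches)
```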
In the experiment, the two polychromatic images were stacked as one set of network inputs, and the four ideal monochromatic images were the matched gold-standard targets. In total, 1185 pairs of stacked polychromatic slices and ideal monochromatic slices were generated from 17 of the 20 patients for training, with the data of the remaining patients reserved for testing.
To assess the capability and quality of the proposed network, the traditional basis-material decomposition method using water and iodine was used for comparison. In addition to the CT value measurement and the beam hardening effect, the benefit of metal artifact removal was compared between the proposed network and a common algorithm based on beam hardening correction (BHC) [53]. For quantitative comparison, five common metrics were selected: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), feature similarity index (FSIM) [54], the mean CT value, and the standard deviation (SD). The first evaluates the pixel-wise differences in CT value between the generated monochromatic images and the ideal monochromatic images, measuring decomposition accuracy; the second compares the structural similarity of the content in two sets of images and is known for its improved correlation with human perception; the third is an improvement on SSIM that pays more attention to the similarity of local features; the last two show the statistical errors and the robustness of the proposed network. All indicators were calculated on the original CT values.
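A sketch of the PSNR/SSIM evaluation on the original CT values using scikit-image; FSIM is not included in scikit-image and would need a separate implementation.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(generated, target):
    """PSNR and SSIM against the ideal monochromatic image, plus the mean
    CT value and standard deviation of the generated image."""
    data_range = float(target.max() - target.min())
    psnr = peak_signal_noise_ratio(target, generated, data_range=data_range)
    ssim = structural_similarity(target, generated, data_range=data_range)
    return psnr, ssim, float(generated.mean()), float(generated.std())
```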
B. Network Training
During the training phase, the well-known Adam optimization method [55] was adopted to optimize the proposed network with a mini-batch of 16 slice patches per iteration. The learning rate was set to be
C. Network Convergence
To visualize the convergence of the network, the training curves for each loss function are plotted in Fig.5. To avoid chance effects in the experimental results, the training losses of a total of 6 groups of experiments are shown as scatter points. The solid line in each graph represents the average convergence curve of the 6 experiments, and the standard deviation is displayed every 50 epochs. The four plots show that all loss functions decreased rapidly, indicating that the generated images became increasingly correlated with the targets. After about 600 epochs, the curves became smooth and each loss function essentially converged to a minimum. The standard deviations also gradually decreased as training progressed, indicating the stability and robustness of network training. Later in training, the chromatic loss fluctuated more severely; we conjecture that WGAN-HL pays more attention to texture details at this stage, and that early in training some high-frequency artifacts may affect the overall CT value distribution. An important indicator for assessing network performance is the Wasserstein distance defined in (5). Fig.5(c) shows that the W-distance decreased as the number of epochs increased, although the decay rate became smaller, which further indicates the robustness of the proposed network. In the end, all loss functions converged well. The total training time was about 35 hours.
The curve of the hybrid loss function convergence. (a) L1 loss, (b) chromatic loss, (c) Wasserstein distance, and (d) perceptual loss.
D. Virtual Monochromatic Imaging Performance
To show the effectiveness of the proposed method, qualitative and quantitative comparisons are presented for three representative aspects: CT value measurement with denoising, beam hardening artifact removal, and metal artifact reduction. Fig.6, Fig.8, and Fig.10 respectively show the CT value distributions of human lung slices, head slices that are prone to beam hardening, and hip slices containing metal implants; Fig.7, Fig.9, and Fig.11 show the corresponding zoomed regions-of-interest (ROIs). Four typical monochromatic energies, 40keV, 55keV, 70keV, and 100keV, are compared in order to demonstrate the image translation ability and robustness of the proposed method.
Results from the lung CT slices. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) target monochromatic image at 40keV, (d) target monochromatic image at 55keV, (e) target monochromatic image at 70keV, (f) target monochromatic image at 100keV, (g) decomposed monochromatic image at 40keV, (h) decomposed monochromatic image at 55keV, (i) decomposed monochromatic image at 70keV, (j) decomposed monochromatic image at 100keV, (k) WGAN-HL monochromatic image at 40keV, (l) WGAN-HL monochromatic image at 55keV, (m) WGAN-HL monochromatic image at 70keV, and (n) WGAN-HL monochromatic image at 100keV. The display window is [−500,450]HU.
The ROIs in Fig.6. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) target monochromatic image at 40keV, (d) target monochromatic image at 55keV, (e) target monochromatic image at 70keV, (f) target monochromatic image at 100keV, (g) decomposed monochromatic image at 40keV, (h) decomposed monochromatic image at 55keV, (i) decomposed monochromatic image at 70keV, (j) decomposed monochromatic image at 100keV, (k) WGAN-HL monochromatic image at 40keV, (l) WGAN-HL monochromatic image at 55keV, (m) WGAN-HL monochromatic image at 70keV, and (n) WGAN-HL monochromatic image at 100keV. The display window is [−500,450]HU.
Results from the head CT slices. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) target monochromatic image at 40keV, (d) target monochromatic image at 55keV, (e) target monochromatic image at 70keV, (f) target monochromatic image at 100keV, (g) decomposed monochromatic image at 40keV, (h) decomposed monochromatic image at 55keV, (i) decomposed monochromatic image at 70keV, (j) decomposed monochromatic image at 100keV, (k) WGAN-HL monochromatic image at 40keV, (l) WGAN-HL monochromatic image at 55keV, (m) WGAN-HL monochromatic image at 70keV, and (n) WGAN-HL monochromatic image at 100keV. The display window is [−500,350]HU.
The ROIs in Fig.8. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) target monochromatic image at 40keV, (d) target monochromatic image at 55keV, (e) target monochromatic image at 70keV, (f) target monochromatic image at 100keV, (g) decomposed monochromatic image at 40keV, (h) decomposed monochromatic image at 55keV, (i) decomposed monochromatic image at 70keV, (j) decomposed monochromatic image at 100keV, (k) WGAN-HL monochromatic image at 40keV, (l) WGAN-HL monochromatic image at 55keV, (m) WGAN-HL monochromatic image at 70keV, and (n) WGAN-HL monochromatic image at 100keV. The display window is [−500,350]HU.
Results from the hip CT slices. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) BHC for (a), (d) BHC for (b), (e) target monochromatic image at 40keV, (f) target monochromatic image at 55keV, (g) target monochromatic image at 70keV, (h) target monochromatic image at 100keV, (i) decomposed monochromatic image at 40keV, (j) decomposed monochromatic image at 55keV, (k) decomposed monochromatic image at 70keV, (l) decomposed monochromatic image at 100keV, (m) WGAN-HL monochromatic image at 40keV, (n) WGAN-HL monochromatic image at 55keV, (o) WGAN-HL monochromatic image at 70keV, and (p) WGAN-HL monochromatic image at 100keV. The display window is [−1000,1000]HU.
The ROIs in Fig.10. (a) Input polychromatic image at 30–80keV, (b) input polychromatic image at 80–120keV, (c) BHC for (a), (d) BHC for (b), (e) target monochromatic image at 40keV, (f) target monochromatic image at 55keV, (g) target monochromatic image at 70keV, (h) target monochromatic image at 100keV, (i) decomposed monochromatic image at 40keV, (j) decomposed monochromatic image at 55keV, (k) decomposed monochromatic image at 70keV, (l) decomposed monochromatic image at 100keV, (m) WGAN-HL monochromatic image at 40keV, (n) WGAN-HL monochromatic image at 55keV, (o) WGAN-HL monochromatic image at 70keV, and (p) WGAN-HL monochromatic image at 100keV. The display window is [−1000, 1000]HU.
1) CT Value Measurement
Fig.6(a)-(b) show the dual-energy polychromatic lung CT slices, with a significant difference in contrast between the two images. There is some noise inside the images, which is clearly visible in the ROIs in Fig.7(a)-(b). The first row, Fig.6(c)-(f), shows the noise-free monochromatic images under ideal conditions, which are clear in content and serve as the gold-standard targets. The image-domain water-iodine decomposition method is adopted here for comparison: the decomposition matrix is calculated from the dual-energy images, and the reconstructed images correspond to the same monochromatic energies as the proposed WGAN-HL network. As expected, there is an obvious difference in contrast compared with the target images, especially for the low-energy images. This result indicates that the decomposition method introduces a shift in the CT values, which interferes with the assessment of tissue density. Meanwhile, Fig.6(k)-(n) show the images generated by the WGAN-HL network; they are visually similar to the standard images and almost indistinguishable from them. More details of the ROIs can be seen in Fig.7. It is clear from Fig.7(g) that the noise in the images obtained by the decomposition algorithm is amplified at low energy, whereas it is relatively reduced in the higher-energy images of Fig.7(h)-(j). However, the images with minimal noise are not necessarily the most favorable for clinical diagnosis. In contrast, Fig.7(k)-(n) show that the generated images have less noise at every energy, while the contrast varies with energy as expected from theory. Clinically, the appropriate energy value can therefore be selected according to need without interference from noise.
Three indicators are chosen to evaluate the proposed network: PSNR, SSIM, and FSIM, with the ideal monochromatic images as the gold standard. For the images reconstructed from the dual-energy bins, the monochromatic images at the corresponding average energies are taken as references. As shown in Table 2, the WGAN-HL network is superior to the traditional decomposition method. The improvement is most striking for low-energy images, which are clinically the most diagnostic and also the most difficult to reconstruct with high quality. It is not surprising that the decomposition-based algorithm degrades low-energy image quality, although it performs well at high energy. The generated monochromatic image at 100keV scores the highest PSNR, SSIM, and FSIM, since images reconstructed at high energy theoretically contain less noise and fewer artifacts. However, the generated 40keV image achieves the greatest improvement relative to basis-material decomposition. By encouraging the generated images to stay close to the targets, WGAN-HL obtains clearer texture information with less noise, improving image quality compared with the other methods.
2) Beam Hardening
For beam hardening, head slices are selected, as shown in Fig.8, because they contain more bone material with high attenuation coefficients. As expected, Fig.8(a)-(b) show that the dual-energy reconstructed image in the low-energy bin exhibits more artifacts between the two high-attenuation structures, affecting the observation of the content. In comparison, the high-energy image shows no obvious hardening artifacts, but the contrast declines. As indicated by the red circle and arrow in Fig.9(c)-(f), the ideal target images are clean with clear texture. The traditional decomposition-based method is applied here as well and is found to be quite unstable across energy values. As observed in Fig.8(g) and Fig.9(g), the image decomposed at 40keV has stronger noise, even worse than the original input polychromatic images. Fig.8(h) and Fig.9(h) show that the monochromatic image at 55keV still contains a small amount of visible artifacts compared with our network in Fig.9(l). As the energy increases, the image quality improves, but the contrast fades; clinically, a compromise between contrast and signal-to-noise ratio must be made, which makes it difficult to obtain ideal images. For WGAN-HL, in contrast, the change in energy has little effect on artifact removal.
The monochromatic images in Fig.9(k)-(n) produced by the WGAN-HL network perform as well as the gold standard; the difference in contrast at different energies is still visible, while all the images are free of artifacts. The results of the image quality assessment are shown in Table 3.
3) Metal Artifacts
From Fig.10 and Fig.11, we observe that the performance of the different algorithms in removing metal artifacts is similar to that described above. The effect of metal artifacts in polychromatic images far exceeds normal beam hardening and causes considerable data loss. For example, in Fig.10(a)-(b), the polychromatic images contain a large number of streak artifacts covering the entire tissue in addition to data loss, and the low-energy image is worse than the high-energy one, as shown in Fig.11(a)-(b). First, the BHC algorithm was applied, with the results in Fig.10(c)-(d) and Fig.11(c)-(d); it removes some of the artifacts but not enough to distinguish tissue details. Compared with BHC, the water-iodine material decomposition algorithm shows no visible improvement in Fig.10(i). In Fig.10(j)-(l), although the streak artifacts decrease as the energy increases, the missing data are not fully recovered at any energy. In contrast, Fig.11(m)-(p) show that the proposed WGAN-HL network recovers the lost tissue details. It not only performs better than the other algorithms in artifact removal, but also obtains more accurate CT values for material characterization; bones are visualized more clearly, and the soft tissues around the metal are better preserved. As expected from Table 4, the decomposition algorithm yields lower PSNR, SSIM, and FSIM at lower energies for metal artifact removal, showing that the CT values near the metal are not accurate. WGAN-HL achieves the best preservation of anatomical features among all methods, with about a 40dB improvement in PSNR for the 40keV images; its performance is almost unaffected by energy, and it achieves better visual quality.
In summary, the proposed WGAN-HL network was compared with the water-iodine decomposition method at four monochromatic energies, and the BHC algorithm was additionally used as a control experiment for metal artifact reduction. The superiority of our method in quantitative CT value measurement, beam hardening removal, and metal artifact reduction is clear. It not only obtains accurate monochromatic attenuation coefficients, but also removes the noise and artifacts that interfere with visual interpretation, which is of great significance for diagnosis. Readers are encouraged to focus on the ROI areas to better evaluate the results.
E. Statistical Analysis
To quantitatively evaluate the statistical advantages of the proposed method, the mean CT values and SDs in the ROIs are calculated, as shown in Table 5. The gold standard for comparison is still the ideal monochromatic images, and a smaller difference represents better robustness. As expected, the decomposition method yields the largest errors in mean values and SDs, especially for low-energy images. In short, compared with the other method, the proposed network has obvious advantages, with mean values and SDs extremely close to those of the gold-standard images.
In addition, the average PSNR, SSIM, FSIM, mean values, and SDs of the three cases in the test sets are shown in Fig.12, where the images in Fig.6, Fig.8, and Fig.10 belong to Case 1, Case 2, and Case 3, respectively. The results are consistent with those above, demonstrating that our method achieves good robustness.
The average results of the images in the test dataset. (a) PSNR of Case1, Case 2, and Case 3, (b) SSIM of Case1, Case 2, and Case 3, (c) FSIM of Case1, Case 2, and Case 3, (d) mean values and SDs of Case 1, (e) mean values and SDs of Case 2, and (f) mean values and SDs of Case 3.
F. Comparison With Different Networks
To further illustrate the advantages of the WGAN-HL network, the impact of different network structures was compared, taking a 40keV reconstructed image in Case 3 as an example. The compared models are CNN-L1, CNN-VGG, WGAN, WGAN-HL, and the ideal target image; the meanings of the notations are listed in Table 8. Since deep learning has not previously been used for virtual monochromatic imaging, the network parameters of the compared models were set consistently with the WGAN-HL configuration. The results are shown in Fig.13. The red line in Fig.13(a) marks a randomly selected X-ray path, and Fig.13(b) plots the CT value at each voxel location along it. CNN-VGG achieves the worst result here, because the metal edges are so prominent that the VGG-19 network ignores other details. The overall CT value distribution of CNN-L1 is the closest to the target image, but many texture details are lost and the image appears overly smooth. A plain WGAN network captures similar content information, but the CT values deviate somewhat and some high-frequency artifacts appear near the metal. The curve of WGAN-HL, in contrast, essentially coincides with that of the target image, and WGAN-HL is superior to the other networks in all aspects.
G. Real Data Test
In the previous sections, the perfectly matched polychromatic and monochromatic images used for training and testing were obtained from simulation experiments. Although the real human body and the detector were imitated as closely as possible, real spectral CT datasets were used here for testing in order to evaluate the practical value of our network. We used scanned images of titanium phantoms and scaffolds from a Medipix All Resolution System (MARS) spectral CT with Medipix MXR cadmium telluride detectors. The projections were collected in four different energy channels with the tube operated at 80 kVp.
The images from MARS. (a) The Ti phantom at 15–80 keV, (b) the Ti phantom at 50–80 keV, (c) the Ti scaffold at 15–80 keV, and (d) the Ti scaffold at 50–80 keV. The color bar represents linear attenuation coefficients (cm⁻¹).
To be consistent with the energy bins and detection mode of the test data, the simulation method of the previous experiments was also used here: metal polychromatic images in the 15–80 keV and 50–80 keV energy bins and the matching ideal monochromatic images were simulated. The test was carried out on the real titanium phantoms and scaffolds after the network was fully trained. Since the test images are raw images without corresponding gold-standard monochromatic images, no quantitative image quality assessment is performed here, and only the visual results are shown in Fig.15.
The results of the test data. (a) Decomposed Ti phantom at 40 keV, (b) decomposed Ti phantom at 55 keV, (c) decomposed Ti phantom at 70 keV, (d) decomposed Ti phantom at 100 keV, (e) WGAN-HL Ti phantom at 40 keV, (f) WGAN-HL Ti phantom at 55 keV, (g) WGAN-HL Ti phantom at 70 keV, (h) WGAN-HL Ti phantom at 100 keV, (i) decomposed Ti scaffold at 40 keV, (j) decomposed Ti scaffold at 55 keV, (k) decomposed Ti scaffold at 70 keV, (l) decomposed Ti scaffold at 100 keV, (m) WGAN-HL Ti scaffold at 40 keV, (n) WGAN-HL Ti scaffold at 55 keV, (o) WGAN-HL Ti scaffold at 70 keV, and (p) WGAN-HL Ti scaffold at 100 keV. The color bar represents linear attenuation coefficients (cm⁻¹).
We observe that the scanned images are contaminated by severe cupping and streak artifacts; the effects are reduced in the narrower energy range, but a large amount of noise remains, as shown in Fig.14. The monochromatic images formed by two-material decomposition still contain a lot of noise, and the artifacts cannot be removed at low energy, as shown in Fig.15(a)-(d) and (i)-(l). In contrast, the images from WGAN-HL maintain a good CNR in Fig.15(e)-(h) and (m)-(p), even though the network was trained on simulated datasets and tested on real images. These experiments show that the trained network has practical applicability as long as the training and test data cover the same energy ranges.
Discussion
Deep learning has achieved great success in computer vision in recent years, in tasks such as image super-resolution, segmentation, and translation. Its progress in the medical field has also been remarkable and has undoubtedly advanced medicine, and many potential applications remain to be explored.
In this paper, we explored the use of neural networks for virtual monochromatic imaging in spectral CT and achieved encouraging results. Instead of a simple CNN, a well-behaved Wasserstein generative adversarial network with a hybrid loss function was used, and residual units were added to the generator to learn local and global features by reusing feature maps in later layers. The network finds a trade-off between data distributions through the competition between the generator and the discriminator. After training, WGAN-HL meets our demanding requirements and performs well in both quantitative and qualitative assessments at 40keV, 55keV, 70keV, and 100keV. Its performance was also evaluated for CT value measurement, beam hardening removal, and metal artifact reduction, demonstrating the good robustness of the proposed method.
The widely used decomposition algorithm was also used for comparison. It performs poorly at low energy and may even amplify the noise, which is undesirable. In addition, BHC served as a baseline for metal artifact removal; although its image quality assessment is acceptable, its data recovery is inferior to that of our network, and it cannot provide image information at different energies. Our network performs best at high energy, but the relative quality improvement is even larger for low-energy images, and the generated monochromatic images are not disturbed by noise. Clinically, any monochromatic image can therefore be selected for diagnosis according to actual needs. Furthermore, network structures using only a single loss were compared, and their results are inferior to WGAN-HL. Finally, the test results on the real datasets demonstrate the practical applicability of our method.
Although the proposed network achieves high-quality virtual monochromatic images, there is still room for improvement. Some structures in the generated monochromatic images do not perfectly match the ideal monochromatic images. Even with the hybrid loss function, WGAN-HL struggles to balance the different losses, and it is difficult to converge to the minimum of every loss simultaneously. A possible way to enhance the correlation is to design a new network with multiple independent training paths, a study we have already started.
Conclusion
In conclusion, we have presented a WGAN-based method with a hybrid loss for virtual monochromatic imaging and demonstrated its good robustness. The WGAN provides an optimal-transport-based measure of the distance between data distributions, and the hybrid loss removes the noise that interferes with visual interpretation without losing content, while fitting the relationship between the polychromatic and monochromatic images. As a result, unlike the traditional method based on material decomposition, the WGAN-HL method achieves better performance on monochromatic images at any energy, because it effectively exploits the contextual information provided by neighboring voxels and inherently captures spectral and spatial correlations. The method alleviates the need to personalize the optimal monochromatic energy for each patient in the clinic. In the future, the network structure will be updated to approach the gold standard as closely as possible, and we are also interested in extending this method to different spectral CT scanners.
ACKNOWLEDGMENT
The authors would like to thank Prof. W. Paul Segars of Duke University for providing the phantom datasets.
Appendix A
See Table 6.
Appendix B
Appendix C
See Table 8.