Fluorescence Molecular Tomography Reconstruction of Small Targets Using Stacked Auto-Encoder Neural Networks

As a noninvasive and quantitative method, fluorescence molecular tomography (FMT) has many potential applications in the biomedical field. In both theory and practice, it can resolve molecular processes in small animals in vivo in three dimensions (3D). This paper addresses the problems of reconstruction error and speed using stacked auto-encoders (SAE). A finite element method (FEM) solution to the Laplace-transformed time-domain coupled diffusion equations is employed as the forward model, and the reconstruction model is formulated under the SAE framework. Numerical simulation experiments were conducted to compare the reconstruction results of SAE and the algebraic reconstruction technique (ART). We demonstrate that the proposed reconstruction algorithm retrieves the positions and shapes of the targets more accurately than ART. This advantage of SAE is especially evident in the reconstruction of small targets with radii of 2 mm and 3 mm.


I. INTRODUCTION
Fluorescence molecular tomography (FMT) has attracted widespread interest due to its non-invasive nature and its sensitivity in applications such as drug discovery [1], cancer detection [2], and gene expression visualization [3], among others. However, FMT reconstruction is highly ill-conditioned because of the lack of sufficient external measurements required for complex images [4]-[6].
To alleviate the ill-conditioned nature of FMT, many approaches have been successfully proposed to improve the quality of FMT-based image reconstruction. Such methods have exploited a priori knowledge including anatomical information [7], optical properties [8], the permissible region [9], and sparsity of the target distribution [10]. In addition, optimization techniques such as the Newton method [11] and the conjugate gradient method [12] have been applied to improve FMT-based image reconstruction. (The associate editor coordinating the review of this manuscript and approving it for publication was Yi Zhang.)

These methods either utilize structural prior knowledge from other imaging modalities or penalize the reconstruction with sparse regularization terms. Although they improve reconstruction quality, the reconstructed images still suffer from poor spatial resolution and blurred borders, especially in the reconstruction of small heterogeneous targets [13], [14]. In addition, the many restrictions on the use of these methods hinder the wider adoption of this technology. Furthermore, their long reconstruction times need to be addressed, because they have delayed the adoption of FMT in many practical time-sensitive applications. Another problem that has not been adequately addressed in the literature is the effect of target size on the quality of FMT reconstruction: the small targets that are the focus of FMT are usually poorly reconstructed. This paper is therefore motivated by the need to solve these two problems by designing a fast and efficient reconstruction algorithm for small targets.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

In this study, based on the intuition that tomography is a function-approximation problem, we adopt a learning approach to recover the distribution of fluorescent yield. Specifically, a stacked auto-encoder (SAE), a neural network architecture, was applied to FMT data. Unlike conventional machine-learning dimensionality reduction and 3D deep encoder-decoder networks [15]-[17], we exploit the redundancy inherent in the projections and extract a subspace representation through the auto-encoder. This reduces the parameter space of the networks required to complete the reconstruction.
Compared with the classical algebraic reconstruction technique (ART), the proposed method reduced the computational burden and accurately retrieved the positions and shapes of the targets. The performance of the proposed method in FMT reconstruction for small target was validated in numerical simulation experiments.

II. METHODOLOGY
A. FORWARD MODEL
The radiative transfer equation (RTE) is usually employed to model the propagation of photons through turbid media such as biological tissue. However, discretization of the RTE leads to a heavy computational burden because of the introduction of angular directions in addition to the spatial variables. The FMT problem, which describes photon migration in highly scattering media, is non-linear and can be linearized by using the coupled diffusion equations (DEs) with Robin-type boundary conditions as a lower-order approximation. For time-domain FMT with point excitation sources, the coupled diffusion equations provide a description of the forward problem [18], [19]:
$$-\nabla\cdot[\kappa_x(r)\nabla\Phi_x(r,t)] + \Big[\mu_{ax}(r)+\frac{1}{c}\frac{\partial}{\partial t}\Big]\Phi_x(r,t) = S_x(r,r_s,t),$$
$$-\nabla\cdot[\kappa_m(r)\nabla\Phi_m(r,t)] + \Big[\mu_{am}(r)+\frac{1}{c}\frac{\partial}{\partial t}\Big]\Phi_m(r,t) = S_m(r,r_s,t), \qquad (1)$$
where the subscripts x and m denote the excitation and emission wavelengths, respectively; $\Phi_{x,m}$ denotes the photon flux density; c is the speed of light in the medium; $r \in \Omega$ is the domain under consideration; $\kappa_{x,m} = 1/\{3[\mu_{ax,am} + (1-g)\mu_{sx,sm}]\}$ is the diffusion coefficient in biological tissue, with $\mu_{ax,am}$ the absorption coefficient, $\mu_{sx,sm}$ the scattering coefficient, and g the anisotropy parameter; $\eta\mu_{af}(r)$ denotes the fluorescent yield, and the lifetime $\tau(r)$ is the other fluorescence property. In (1), the source term of the excitation equation, $S_x(r, r_s, t) = \delta(r - r_s, t)$, is an impulse function at the excitation position $r_s$, located one mean transport length $1/[\mu_a + (1-g)\mu_s]$ beneath the surface, and at $t = 0$. The source term of the emission equation,
$$S_m(r, r_s, t) = \eta\mu_{af}(r)\int_0^t \Phi_x(r,t')\,\Gamma(r)\,e^{-(t-t')\Gamma(r)}\,dt', \qquad \Gamma(r) = 1/\tau(r),$$
originates from the propagation of the excitation light in the fluorescent target, and may be distributed throughout the medium. To solve these equations, Robin-type boundary conditions are adopted on the boundary $\partial\Omega$ of the domain $\Omega$.
After FEM discretization of (1) and application of the Laplace transform, the forward model reduces to the linear system
$$\Phi = AX,$$
where X is an N × 1 vector representing the unknown fluorescent source distribution, $\Phi$ is an M × 1 measurement vector, and A is an M × N weight matrix that depends on the geometry and the optical properties of the turbid medium. Based on this theory, the experiment pairs the surface photon-density data with the fluorescence source distribution to be reconstructed, and these data pairs are used as the training set of the neural network.
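As a minimal illustration of how such training pairs can be formed, the sketch below builds a stand-in weight matrix and pairs simulated boundary measurements with the true yield vector. The random A and the synthetic target are placeholders, not the paper's FEM-derived quantities:

```python
import numpy as np

# Illustrative dimensions: M = 240 measurements (16 sources x 15 detectors)
# and N = 7651 FEM nodes, as in the paper's simulation setup.
M, N = 240, 7651

rng = np.random.default_rng(0)
A = rng.random((M, N)) * 1e-3      # stand-in for the FEM weight matrix
X = np.zeros(N)                    # unknown fluorescent yield per node
X[100:120] = 0.004                 # a small synthetic target

phi = A @ X                        # boundary measurements: Phi = A X

# A training pair couples surface data (network input) with the true
# yield distribution (network target).
pair = (phi, X)
```

In the actual experiments, A would come from the FEM forward solver and X from the placed phantom targets.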

B. STACKED AUTO-ENCODER-BASED IMAGE RECONSTRUCTION
An auto-encoder (AE) is a three-layer neural network consisting of an input layer, a hidden layer, and an output layer. It is a self-supervised learning algorithm whose target output is the same as its input. The bottleneck structure of an auto-encoder forces the reconstruction segment of the architecture to work from extracted features, so the network learns the essential features required for the mapping. The stacked auto-encoder provides a deep architecture by stacking together a number of AE units; its structure is shown in Fig. 1.
The key to training the SAE model is to solve for the network parameters, mainly the weights, using a suitable unconstrained optimization algorithm. For deep neural networks, layer-by-layer greedy training is generally used because it yields a better initial value of the network parameters, reduces the chance that they fall into a local minimum, and accelerates the convergence of the subsequent supervised training. Layer-by-layer greedy training means that only one layer of the network is trained at a time: first the parameters of the first hidden layer are trained, then the output of that layer is used as the input for training the parameters of the second hidden layer, and so on until all auto-encoders have been trained. The training process of the SAE-based fluorescence tomography network model consists of two steps. The first step uses two auto-encoding networks to complete the pre-training process by layer-by-layer greedy training.
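The greedy procedure above can be sketched as follows: train one auto-encoder on the raw data, then train the next on the first one's hidden codes. The tanh/linear activations, learning rate, and plain gradient descent are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_autoencoder(data, hidden, epochs=100, lr=0.05):
    """Train a single AE by gradient descent on the MSE reconstruction loss."""
    n, d = data.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(data @ W1 + b1)            # encoder
        out = h @ W2 + b2                      # linear decoder
        err = out - data                       # reconstruction error
        gW2 = h.T @ err / n; gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h**2)         # back-prop through tanh
        gW1 = data.T @ dh / n; gb1 = dh.mean(0)
        W1 -= lr*gW1; b1 -= lr*gb1; W2 -= lr*gW2; b2 -= lr*gb2
    return W1, b1                              # keep only the encoder

# Greedy stacking on toy data: AE1 on the raw inputs, AE2 on AE1's codes.
X = rng.normal(size=(50, 20))
W1, b1 = train_autoencoder(X, hidden=8)
H1 = np.tanh(X @ W1 + b1)
W2, b2 = train_autoencoder(H1, hidden=4)
```

The encoders (W1, b1) and (W2, b2) would then initialize the stacked network before supervised fine-tuning.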
Layer-by-layer greedy training obtains good network parameters between adjacent layers; these parameters are then fine-tuned to obtain globally optimal network parameters. Therefore, in the second step, the initialized weights and biases are passed to the full network for further training, and back-propagation is used to adjust the weights and biases of all hidden layers. After repeated iterations, the globally optimal parameters of the entire network are obtained. The training process of the SAE-based fluorescence tomography network model is shown in Fig. 2. The input layer SD represents the source-detector pairs generating N = 480 input data (the schematic diagram is shown in Fig. 3); M = 200 and J = 60 neurons were set in hidden layers 1 and 2, respectively; the output layer PX 1 represents the value of fluorescence pixel 1 of the simulation model, with a total of 7651 pixels corresponding to K = 7651 output data.
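With the layer sizes stated above (480 inputs, hidden layers of 200 and 60 units, 7651 outputs), a shape-level sketch of the stacked network's forward pass is given below; the sigmoid hidden activations and linear output layer are assumptions, since the paper does not state them:

```python
import numpy as np

# Layer sizes from the text: 480 source-detector inputs, hidden layers
# of 200 and 60 units, and 7651 output pixels.
sizes = [480, 200, 60, 7651]
rng = np.random.default_rng(2)
weights = [rng.normal(0, 0.01, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    # Sigmoid on hidden layers, linear output (an assumed activation choice).
    for W, b in zip(weights[:-1], biases[:-1]):
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return x @ weights[-1] + biases[-1]

y = forward(np.ones(480))
```

The output vector has one entry per FEM node, i.e. one fluorescence-yield value per pixel of the simulation model.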

III. SIMULATION AND RESULTS

A. FMT SIMULATION MODEL AND EXPERIMENTAL DESIGN
In this section, numerical simulation experiments are conducted to demonstrate the performance of the proposed method. The algorithm is validated using simulated data generated from the forward diffusion model with the corresponding optical and fluorescent properties. The reconstructions obtained from the proposed method are evaluated and then compared with the classical ART method at the same signal-to-noise ratio (SNR).
The schematic diagram in Fig. 3 shows the phantom used for the investigations: a 2-D circular turbid medium with radius R = 20 mm. It contains a background and a circular fluorescent target; the target, embedded in the background medium, has higher absorption coefficients. Sixteen (16) source-detector pairs are distributed one mean transport length inside the boundary. As the 16 sources illuminate the surface in turn, the remaining 15 detectors located on the surface of the medium collect the photons in parallel, giving a total of 16 × 15 measurements. A finite element mesh with 7651 nodes and 15000 triangular elements is used for the forward calculations.
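The measurement geometry above can be sketched as follows; for simplicity the optode positions are placed on the boundary circle itself rather than one transport length inside it:

```python
import numpy as np

R = 20.0                 # phantom radius in mm
n_sd = 16                # co-located source-detector pairs on the boundary

# Evenly spaced angular positions around the circular phantom.
angles = 2 * np.pi * np.arange(n_sd) / n_sd
pos = np.stack([R * np.cos(angles), R * np.sin(angles)], axis=1)

# Each source fires in turn; the other 15 positions act as detectors.
pairs = [(s, d) for s in range(n_sd) for d in range(n_sd) if d != s]
```

Excluding the source's own position yields the 16 × 15 = 240 measurements per acquisition.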
In our work, the optical properties of the background and targets are both set to µ_ax,m = 0.5 mm⁻¹ and µ_sx,m = 1.0 mm⁻¹ for the excitation and emission wavelengths, and the fluorescent properties of the background are ηµ_af = 0.001 mm⁻¹ and τ = 100 ps. These values are in the range of the optical properties of in vivo muscle [21].
To test the ability of the algorithm to reconstruct target locations, we define Yield as the target's fluorescence yield, (Xc, Yc) as its center, and r as its radius. The fluorescence yield of the target ranges from 0.002 mm⁻¹ to 0.005 mm⁻¹ in steps of 0.001 mm⁻¹. The radius of the target ranges from 2 mm to 5 mm in steps of 1 mm. The abscissa of the target's center ranges from −10 mm to 10 mm in steps of 5 mm, as does the ordinate. The specific parameter distribution of the targets is shown in Table 1. We placed targets with different parameters at different positions in the tissue and measured the fluorescence intensity at the corresponding tissue boundaries, thus generating 400 samples of pure simulation data. Fluorescence molecular tomography was then performed to validate the efficiency of the proposed method and of ART.
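The parameter sweep above implies a full factorial design whose size matches the reported 400 samples; the exact enumeration in Table 1 is not reproduced here, so the factorial assumption is ours:

```python
import itertools

yields = [0.002, 0.003, 0.004, 0.005]   # fluorescence yield, mm^-1
radii  = [2, 3, 4, 5]                   # target radius, mm
xs     = [-10, -5, 0, 5, 10]            # center abscissa Xc, mm
ys     = [-10, -5, 0, 5, 10]            # center ordinate Yc, mm

# Full factorial combination: 4 * 4 * 5 * 5 parameter settings.
grid = list(itertools.product(yields, radii, xs, ys))
```

Each tuple (Yield, r, Xc, Yc) defines one simulated phantom, giving one forward-model sample.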
To simulate the real case more closely, we add 40 dB Gaussian noise to the pure simulated measurements. The 400 samples of pure simulated measurements and the 400 samples of noisy simulated measurements are normalized together as the input data set of the stacked auto-encoder (SAE) method. The corresponding 800 samples of true fluorescence yield, normalized at each node obtained by the finite element method, are taken as the output data set of the neural network. Therefore, the input of the neural network totals 800 samples. We selected 400 pure samples and 350 noisy samples for training, 42 noisy samples for testing, and 8 noisy samples for validation. The 42 noisy test samples were also reconstructed using the ART algorithm; in our study, the number of ART iterations is 15 and the relaxation parameter is 0.5. The network was trained and tested in MATLAB. The loss function of the SAE network is the pixel-level mean squared error between the reconstructed and true results. It was trained for 100 epochs with a batch size of 75. All processing was performed on a personal computer with a 2.81 GHz Intel Core i5 CPU, 16 GB RAM, and a GTX 1060 3 GB GPU; training to convergence took about 2 h.
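The noise-addition step can be sketched as follows, assuming additive white Gaussian noise scaled to the requested SNR (the paper does not spell out the scaling convention):

```python
import numpy as np

def add_awgn(signal, snr_db, rng=np.random.default_rng(3)):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    p_signal = np.mean(signal**2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)

# Toy measurement vector of the paper's length (240 entries).
clean = np.sin(np.linspace(0, 4 * np.pi, 240))
noisy = add_awgn(clean, snr_db=40)

# The empirical SNR of the corrupted signal should be close to 40 dB.
snr_est = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
```

The same routine with snr_db set to 30 or 35 produces the harder noise levels used in the later comparisons.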

B. COMPARISON OF SAE NEURAL NETWORK AND ART RECONSTRUCTION
To evaluate the performance of the SAE neural network, the following experiments were performed. Experiment 1: A qualitative evaluation of the proposed reconstruction method was performed by analyzing a selection of representative reconstructed images. Using the same sample, the reconstructions of a small target by the proposed algorithm and by classical ART are shown in Fig. 4, where the sample target fluorescence yield is 0.004 mm⁻¹, located at Xc = −10 and Yc = 0.
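For reference, the classical ART baseline (15 iterations with relaxation parameter 0.5, as stated in Section III-A) is a Kaczmarz-type row-action method; a minimal sketch, demonstrated on a tiny consistent system rather than the FMT matrix:

```python
import numpy as np

def art(A, b, n_iter=15, relax=0.5):
    """Kaczmarz-type ART: sweep the rows, projecting onto each hyperplane."""
    x = np.zeros(A.shape[1])
    row_norms = (A**2).sum(axis=1)
    for _ in range(n_iter):
        for i in range(A.shape[0]):
            if row_norms[i] == 0:
                continue
            # Relaxed projection of x onto the hyperplane A[i] . x = b[i].
            x += relax * (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

# On a small consistent system, ART approaches the true solution.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([0.3, 0.7])
x_rec = art(A, A @ x_true, n_iter=50, relax=0.5)
```

For the ill-conditioned FMT system, the same update is simply stopped after 15 sweeps, trading accuracy for stability.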
For the same sample, the reconstructed fluorescent-yield images obtained using SAE recover the target size and position better than the traditional ART method: the reconstructed target sizes in the SAE yield images are approximately the same as the original. The SAE demonstrates its superiority when the radius of the target is 2 mm or 3 mm, as depicted in Fig. 4. When the radius of the target becomes larger (e.g., 4 mm), both algorithms successfully reconstruct the target size and position in the yield images; however, the comparative results reveal that the targets reconstructed using the stacked auto-encoders have fewer artifacts.
To further validate the efficiency of the proposed algorithm against classical ART under different SNRs, noisy samples were obtained by adding noise at 30 dB and 35 dB signal-to-noise ratio (SNR) to the 400 pure data samples. Using the same sample with the same SNR, the reconstructions by the proposed algorithm and by traditional ART are shown in Figs. 5 and 6. Experiment 2: We provide a quantitative analysis of the SAE neural network reconstruction. The model was quantitatively evaluated using the barycenter error (BCE) between the reconstructed and true fluorescence targets, defined as the 2-norm of the distance between the true and reconstructed barycenters:
$$BCE_k = \|c_{trk} - c_{rek}\|_2.$$
The barycenter $c_k$ is defined as
$$c_k = \frac{\sum_{p_i \in T_k} x_i\, p_i}{\sum_{p_i \in T_k} x_i},$$
where $T_k$ is the k-th target heterogeneity in the target heterogeneous set, $p_i$ is the i-th pixel point in $T_k$, and $x_i \in \{0,1\}$ is the reconstruction strength of $p_i$. Forty-two (42) test images (SNR = 30 dB) were used for the evaluation. The test set contains three image groups: 14 with 2 mm radius, 14 with 3 mm radius, and 14 with 4 mm radius. All were reconstructed using the proposed SAE method and the classical ART method, and the barycenters of the reconstructed images were recorded. A threshold of 0.3 was set as the quantitative index of reconstruction quality, and the number of reconstructed images exceeding it was compared between the two methods. As can be seen from the table, for the SAE neural network the percentage of reconstructed images with BCE > 0.3 is not affected by the change in the radius of the heterogeneous object. This contrasts with ART reconstruction, where the percentage of reconstructed images with BCE > 0.3 decreases from 92.9% to 50%. This implies that while SAE provides better reconstruction at all object radii tested, ART performance depends on the radius of the object under test: ART performs better with larger objects.
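The barycenter error can be computed as in the following sketch; the weighted-centroid form of the barycenter is our reading of the definition above:

```python
import numpy as np

def barycenter(pixels, strengths):
    """Weighted centroid of a target's pixel coordinates."""
    pixels = np.asarray(pixels, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    return (strengths[:, None] * pixels).sum(axis=0) / strengths.sum()

def bce(c_true, c_rec):
    """Barycenter error: 2-norm distance between true and reconstructed centers."""
    return np.linalg.norm(np.asarray(c_true) - np.asarray(c_rec))

# Four unit-strength pixels whose centroid is (0.5, 0.5).
pix = [(0, 0), (1, 0), (0, 1), (1, 1)]
c = barycenter(pix, [1, 1, 1, 1])
err = bce(c, [0.5, 0.8])   # reconstructed center offset by 0.3 in y
```

An image whose err exceeds the 0.3 threshold would be counted as a failed reconstruction in Tables 2-4.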
Visual inspection also indicates that SAE reconstruction, especially for small object radii, provides clearer images with fewer artifacts. For heterogeneous bodies with a radius of 4 mm, although the ART algorithm shows a certain reconstruction ability, it is still inferior to the SAE algorithm in terms of image clarity. The experiments therefore show that, compared with the traditional ART reconstruction algorithm, the SAE neural network performs better in reconstructing small heterogeneous targets with radii of 2 mm and 3 mm.
Robustness to noise is important in evaluating reconstruction algorithms. The SAE and ART algorithms were tested on noisy data with different SNRs, 35 dB and 40 dB. The same quality index, the percentage of reconstructed objects with BCE > 0.3, was used in the comparative evaluation, and the results are shown in Tables 3 and 4. The results in Table 3 indicate that SAE provides a noise-robust approach to reconstruction even for small-radius targets. When the noise level was relaxed (40 dB), the results in Table 4 indicate an improvement in the performance of ART; however, the improvement for the small target size (2 mm) was not as significant as that demonstrated by SAE.

IV. DISCUSSION AND CONCLUSION
In this study, we proposed a deep learning approach that uses stacked auto-encoders to solve the inverse problem of FMT. Its relative superiority over the classical ART method, with regard to reconstructed position accuracy and robustness to noise, was demonstrated for FMT reconstruction of small heterogeneous targets. The evaluation was performed quantitatively and qualitatively with simulation data. The simulation results showed that the proposed method outperforms ART and accurately reconstructs small targets with radii of 2 mm and 3 mm without any iterative procedure or linear approximation. Moreover, the targets reconstructed using the stacked auto-encoders also have fewer artifacts when the radius of the target becomes larger (we report the 4 mm case). It is well known that the relatively low spatial resolution of FMT reconstruction results from the ill-posedness of the FMT inverse problem. The SAE neural network completely abandons forward photon-propagation modeling and model-based inverse reconstruction, which distinguishes it from classical methods such as ART. The auto-encoder learns a subspace representation of the data and at the same time filters the noise; in essence, the deep architecture learns a mapping from the denoised subspace representation to the reconstructed image space. Given the small sample size used to train the SAE, we conjecture that the subspace learning required for this problem is not complex: the number of parameters in the network was not large enough to demand a significantly larger training set. Within the SAE learning framework, the inverse problem of FMT is easily solved, and the reconstruction can be completed in a few hundred milliseconds. This indicates that the proposed method is promising for dynamic imaging applications.
However, the reconstructed fluorescent yield of the proposed method was still larger than the true value, so new strategies that may improve the reconstructed fluorescent yield will be pursued in our future work. We also plan to extend the proposed method to multi-target reconstruction and to the whole mouse, and to test the performance of the algorithm in phantom and mouse experiments.
After her graduation, she was an Assistant Researcher with the Tianjin Medical University Cancer Institute and Hospital, focusing on data analysis. Her main research direction is time-domain DOT/FDOT imaging methods based on the high-order spherical harmonic approximation model of the radiative transfer equation.
JINHAI WANG was born in 1966. He received the bachelor's degree from Xi'an Jiaotong University, the master's degree from Tianjin Textile Institute, and the Ph.D. degree from Tianjin University.
He is currently working with the Tianjin University of Technology, as a Professor and a Master Instructor, and the Sensor and Measurement Technology Laboratory, Tianjin Medical Electronic Diagnosis and Treatment Technology Engineering Center. He has published more than 100 articles. His main research directions are embedded system development and application, information detection and intelligent processing, medical electronic diagnosis, treatment technology and instruments, and modern sensing technology and systems.
PHILIP O. OGUNBONA received the bachelor's degree in electronic and electrical engineering from the University of Ife, Nigeria, and the D.I.C. (electrical engineering) and Ph.D. degrees from Imperial College London.
After his graduation, he worked for many years at the University of Wollongong, Australia, and the Motorola Multimedia Center Laboratory. He is currently a Professor and the Co-Director of the Advanced Multimedia Research Lab, University of Wollongong, Australia. His research interests include computer vision, pattern recognition, medical imaging, and machine learning.