Boundary-Preserved Deep Denoising of Stochastic Resonance Enhanced Multiphoton Images

Objective: With the rapid growth of high-speed deep-tissue imaging in biomedical research, there is an urgent need to develop a robust and effective denoising method to retain morphological features for further texture analysis and segmentation. Conventional denoising filters and models can easily suppress the perturbative noise in high-contrast images; however, for low photon budget multiphoton images, a high detector gain will not only boost the signals but also bring significant background noise. In such a stochastic resonance imaging regime, subthreshold signals may be detectable with the help of noise, meaning that a denoising filter capable of removing noise without sacrificing important cellular features, such as cell boundaries, is desirable. Method: We propose a convolutional neural network-based denoising autoencoder method — a fully convolutional deep denoising autoencoder (DDAE) — to improve the quality of three-photon fluorescence (3PF) and third-harmonic generation (THG) microscopy images. Results: The average of 200 acquired images of a given location served as the low-noise answer for the DDAE training. Compared with other conventional denoising methods, our DDAE model shows a better signal-to-noise ratio (28.86 and 21.66 for 3PF and THG, respectively), structural similarity (0.89 and 0.70 for 3PF and THG, respectively), and preservation of the nuclear or cellular boundaries (F1-score of 0.662 and 0.736 for 3PF and THG, respectively). It shows that DDAE is a better trade-off approach between structural similarity and preserving signal regions. Conclusions: The results of this study validate the effectiveness of the DDAE system in boundary-preserved image denoising. Clinical Impact: The proposed deep denoising system can enhance the quality of microscopic images and effectively support clinical evaluation and assessment.


I. INTRODUCTION
Real-time visualization of living cells in their tissue environment is crucial for many applications in life sciences and medical devices [1], [2]. For instance, deep-tissue cellular imaging with transgenic labeling of reporters could reveal niche environments and functional interactions of multiple cells in the context of hematopoiesis [3], tumor metastasis [4], [5], [6], and neuronal connection [7], [8]. In vitro or non-invasive in vivo imaging flow cytometry (IFC) through real-time microscopy of the intra-fluidic channel or intravascular blood cells could have the potential to benefit patients and caregivers in detecting physiological aberrancy more efficiently [9], [10], [11], [12], [13]. Several studies have successfully employed IFC to provide quantitative image data of cellular targets to support decision-making in clinical diagnosis [14]. Results show that IFC can significantly improve the classification accuracy of routine white blood cell counts by their sizes, shapes, structures, and nucleus-to-cytoplasm ratios.
To obtain sharp sectioning images of cells in tissues, researchers developed confocal or multiphoton fluorescence microscopy to improve axial resolution [1], [15]. For instance, two-photon fluorescence microscopy has been widely applied in many deep-tissue studies, such as in brain research. Its near-infrared (800-1300 nm) laser excitation can greatly reduce the photobleaching of probes, distortion of the wavefront, scattering of photons, and maintenance of the subcellular resolution of images [1], [16], [17]. For sufficient contrast of fluorescence imaging in deep tissues, the excitation intensity must be high, which can lead to background interference originating from the diffused fluorescence photons caused by multiple scattering. To obtain high-acuity images at depths greater than 700 µm, three-photon contrasts were excited by a 1700-nm high pulse energy laser. Three-photon microscopy can significantly improve emission localization, reducing the out-offocus background compared to two-photon microscopy [18]. Various existing fluorescent dyes can still be used in threephoton fluorescence (3PF) microscopy. However, to realize in vivo imaging and medical testing such as virtual optical biopsy of immune cells, in vivo label-free microscopy is critical. Many researchers have focused on the development of third-harmonic generation (THG) microscopy for label-free imaging of tissues [11], [19], [20], [21], [22]. At an excitation wavelength of 1230 nm, THG microscopy can noninvasively obtain cellular morphology and visualize subcellular organelles in deep tissues without labeling. It can deliver an alternative contrast modality to complement multiphoton fluorescence microscopy, which provides information on cellular morphologies in three-dimensional tissue culture [17], [23].
Although there have been several studies on both 3PF and THG microscopies, their low signal-to-noise ratio remains a crucial issue for the delineation and segmentation of cells. The signal counts of third-order nonlinear optical microscopy are low and comparable to the noise counts resulting from the high bias voltages of the detection units. The type of noise includes signal-dependent Poisson noise and detector-dependent Gaussian noise. The former involves a random process of photon emission and the discrete nature of the photo-excited charges. The latter typically results from flicker or thermal noise in electronic systems. Under such low photon-budget conditions, the signals may have stochastic resonance effects, where sub-threshold signals can be boosted over the threshold with the help of detector noise [24]. The signal pattern within cells carries features of noise, making it difficult to extract the true signals from the background. Therefore, it is crucial to find an effective noise-filtering method to enhance image contrast while retaining structural information for further segmentation and texture analysis. Typically, researchers have applied Gaussian and median filters to remove Poisson or Gaussian noise. Poisson noise can be transformed by a stabilizing method such as Anscombe transformation into Gaussian white noise and a Gaussian noise filter can be utilized to alleviate it [25]. The nonlocal mean method performs noise filtering on image patches with similar textures. Block-matching and collaborative filtering methods, such as block-matching and 3D filtering (BM3D), have been proposed and widely used [26]. The feasibility of these enhancement approaches has been validated in various clinical examinations, including breast cancer [27], pancreatic neuroendocrine tumors [28], and hematology [29]. In addition, some researchers have used the Bayesian method to extract noise information from prior knowledge of images, train the likelihood function to predict the residual images V, and remove them from noisy observations [25], [30]. These methods perform well in many image-denoising problems; however, conventional signal processing assumes that noise is a linear addition of signals by y = S + V , where S represents the signals and V represents the noise. This model is appropriate when noise V is perturbative to S; however, in the stochastic resonance regime, V is comparable to or larger than S and the detection threshold T is larger than S in many-pixel images. The representation of overall signals should be y Therefore, the conventional de-speckle filter, median filter, or Gaussian filter may decrease the details of cellular morphology. For signal-noise entangled stochastic resonance images, these methods may not work well because of the difference in noise modeling, that is, the noisy observation is not simply a superposition of signals and noise. Hence, for low photon-budget multiphoton microscopy images acquired in high-speed or deep-tissue imaging, it remains a challenging task to enhance the signal-to-noise ratio without sacrificing structural information.
Several machine learning and deep learning-based denoising methods have been proposed and demonstrated better filtering results compared to traditional filters. In particular, deep convolutional neural network (CNN)-based algorithms have been widely applied in image classification [30], segmentation [31], and denoising [32], [33]. In the field of acoustics, researchers have proposed CNN-based methods and denoising autoencoder (DAE) architectures to perform speech denoising [34], [35], [36], where the DAE method successfully filtered background noise and improved the perceptual evaluation of speech quality (PESQ). Reference [33]. Inspired by the success of CNN and DAE methods, in this work we propose a fully convolutional deep denoising autoencoder (DDAE) method to reduce noise in low photon budget multiphoton microscopy images, especially for 3PF and THG images. In our experiments, we confirm that the DDAE model outperforms the Gaussian filter, median filter, and the benchmark BM3D algorithm in terms of signal-tonoise ratio and structural similarity. Moreover, DDAE effec-VOLUME 10, 2022 tively preserves stochastic resonance-enhanced features so that regions of nuclei or cells can be delineated more correctly, which is important for further segmentation.

A. CELL CULTURE, CELL STAINING, AND ACQUISITION OF MULTIPHOTON IMAGES
RAW 264.7 -a murine macrophage cell line -was plated on bottom glass dishes (Nest Scientific, 801001) and cultured in Dulbecco's modified Eagle's medium (DMEM) containing 10% fetal bovine serum (FBS), 100 U/ml penicillin, and 100 µg/ml streptomycin. For 3PF imaging, no further treatment was added. For THG imaging, three hours after plating, 50 ng/ml lipopolysaccharide (LPS; Sigma-Aldrich) was used to elicit inflammatory macrophages in the M1 state. After 24 h of cytokine stimulation, the medium was replaced with a normal medium for cell imaging. For 3PF imaging, 2 µg/mL Hoechst 33342 (Thermo Fisher Scientific) was used to stain cell nuclei for 5 min. Unloaded Hoechst 33342 was removed by washing the cells with normal medium. For THG imaging, there was no cell labeling. Lipid granules in M1-activated macrophages can produce strong THG signals.
Time-lapse 3PF and THG images were acquired using an inverted multiphoton microscope (A1MP + ; Nikon, Japan) and a near-infrared (800-1300 nm) femtosecond laser (InSight X3, Spectra-Physics, Mountain View, California) with a 100-fs pulse width and 80-MHz repetition rate was used as the excitation source. The operation wavelength for the 3PF and THG images was 1250 nm, which has the least on-focus phototoxicity and deepest penetration depth for biomedical samples. The laser light first transmitted an 820 nm edged the multiphoton dichroic beam splitter and was then focused through a water-immersed 40× and 1.15 NA objective. To avoid photobleaching, the Hoechst blue-labeled cells were excited at an average power of 11 mW (100 GW/cm 2 instantaneous intensity), whereas 37 mW (335 GW/cm 2 instantaneous intensity) was required to obtain detectable signals. All the pairs of the resonant scanner and galvanometer mirrors generated three-photon signals that were epi-collected by the same objective, reflected by the 820 nm edged multiphoton dichroic beam splitter, which was further reflected by a 495-nm edged dichroic beam splitter in the non-descanned detection unit, filtered by a 415-485 nm bandpass filter, and finally detected by the same photomultiplier tubes. Then, the laser was raster scanned to perform point-by-point excitation and detection, forming 512 × 512 pixel images at a 15-Hz frame rate. All the images were subsequently exported to the TIFF format for denoising and deep learning processes.

C. FULLY CONVOLUTIONAL DEEP DENOISING AUTOENCODER MODEL
We used the Keras framework to implement the DDAE model with a fully convolutional neural network architecture. A 5-fold cross-validation method was used to validate the proposed DDAE model. We divided all data into five groups. One group was considered as the testing set, and the remaining four groups were used for training. During training, we randomly selected one group as the validation set, while the other three groups were used to train the DDAE models. The cross-validation approach repeated five iterations until all groups have been tested. Next, we repeated the 5-fold crossvalidation three times, with random selection of grouped data for each round. The final reported average results across all folds from all runs were used to assess the reliability of the proposed DDAE.
We trained the model with 50 epochs for the THD and 3PF, and early stopping with the patience of 10 epochs was applied for model training to avoid overfitting. The loss for training DDAE was the mean-squared error. The architecture of DDAE and its details are shown in Table 1. As shown in the table, different numbers of filters are investigated in this work.

D. ANALYZING THE CONTRAST AND STRUCTURAL SIMILARITY OF RESTORED IMAGES
To evaluate the signal-to-noise ratio and the restoration of structural information, we used peak signal-to-noise ratio (PSNR) analysis [37] and the structural similarity index (SSIM) [38].

E. ANALYZING THE RESTORATION OF NUCLEAR AND CELLULAR BOUNDARIES
To assess how well the denoising filter retained the stochastic resonance-enhanced features in cells, we analyzed the precision, recall, specificity and F-measure of nuclear and cellular boundaries from the denoised image Img O and the low-noise answer image Img Ans . The contrasts of Img O and Img Ans were first enhanced through histogram equalization (see Fig. 1), and then binarized using the intensity auto-threshold method -IsoData -thereby obtaining the binary images of bImg O and bImg Ans , respectively. The boundaries of the binarized answer images could precisely depict the boundaries of the cells and nuclei [ Fig. 2(a)]. bImg AND is the overlap of bImg O and bImg Ans , representing the true positive of the nucleus (in the 3PF images) or cell (in the THG images) regions. We also use a NOT OR gate to the bImg O and bImg Ans to obtain the true negative of the nucleus bImg NOR, and inverse bImgAns (as bImg IAns ) to represent the actual negative. Finally, we used the pixel counts of the five binary images, count O , count Ans , count AND , count NOR , and count IAns to compute the precision, recall, specificity and F-measure of the denoised images, using the following functions: We also created pseudo-color images to visualize the retained regions of nuclei or cells (blue, Fig. 1), false negative regions of bImg Ans -bImg AND (green, Fig. 1), and false positive regions of bImg O -bImg AND (magenta, Fig. 1). We then combined these three pseudo-color images to obtain the superposition image [ Fig. 2(b)], such that the mismatch of boundaries could be visualized.

III. RESULTS
To evaluate the denoising performance of the different approaches, we performed 1250 nm excited 3PF and THG microscopy on RAW 264.7. The 3PF contrast mostly labeled the nuclei, and the THG contrast revealed lipid granules within cells. The excitation intensity used for the 3PF microscopy was too low to generate sufficient THG signals and for 3PF imaging, we did not activate the proliferation of lipid granules in the RAW cells. There was no crosstalk between the 3PF and THG signals.
At each observation location in a petri-dish, we acquired 200 images at a 15-Hz frame rate, with fixed excitation power, using the same detection channel and bias voltage. The acquired images served as low-photon-budget images to be denoised. To evaluate the denoising performance, we obtained a low-noise answer image of the corresponding location from the average of the acquired 200 images. The low-noise answer image also served as a low-noise answer for DDAE training. Several studies have utilized a similar approach to obtain ground-truth images [39], [40], [41] and we referred to the same principles to select the required number of images for ground-truth estimation [42].
For each imaging modality, we selected 31 locations on each petri-dish and acquired 31 image batches. All the batches were used for 5-fold cross-validation to validate DDAE. Then, we compared the results with those processed by traditional denoising methods such as Gaussian filter, median filter, and BM3D. We measured the quality of the results using PSNR and SSIM to represent the fidelity of the signal and structures. BM3D. We measured the quality of results by PSNR and SSIM to represent the fidelity of signal and structures.

A. DENOISING OF THREE-PHOTON FLUORESCENCE IMAGES
Three testing cases of 3PF images were sampled from each testing batch. Given that the images were acquired at a 1/15-s frame time, they contained a lot of salt-andpepper noise [ Fig. 3(a), upper row]. The stochastic resonanceenhanced signals made the cells faintly discernible [ Fig. 3(c)]. By applying the trained DDAE model, they showed a great improvement in image contrast [ Fig. 3(a), bottom row]. The nuclear boundaries were well preserved [ Fig. 3(d), yellowdashed contour]. For cells with relatively low signal levels in low-noise answer images [ Fig. 3(b), indicated by blue arrows], noise cannot boost them in the high frame-rate image, and the DDAE cannot restore them in such a situation. Compared with traditional filtering methods such as the Gaussian filter, median filter, and BM3D (Fig. 4), the results of the 3PF DDAE denoising surpass all of them on the PSNR and SSIM scores ( Table 2 and Appendix Table 4). In general, the DDAE method achieves a higher signal-to-noise ratio of 28.86 PSNR and retains more structural information of 0.89 SSIM. Among the traditional methods, by choosing the optimal sigma value, the performance of the BM3D filter is the best with 38.66 PSNR. It shows a relatively clear cellular outline but loses several intracellular details with 0.88 SSIM. As for the results of the median filter, although it can retain a few significant signal spots, it cannot show nuclear boundaries. The Gaussian filter method generally performs better than the median filter, but it contained significant noise and vague nuclear boundaries in the results. In brief, 3PF DDAE model can improve the image quality better than most traditional denoising methods, and users can obtain results in a few seconds without time-consuming steps, such as by trying optimal sigma values.

B. DENOISING OF THIRD HARMONIC GENERATION IMAGES
Similarly, we built THG DDAE models by applying 31 batches to 5-fold cross-validation, composed of 200 THG images acquired at 31 different locations on the petri-dish.  Instead of the nuclei, the THG images revealed granules in the cytoplasm (Fig. 5), which delineated the outline of the cells. The results (Fig. 6) show that the THG DDAE model also outperforms most traditional methods in terms of both the PSNR and SSIM scores (Table 2 and Appendix Table 5), except BM3D. In addition, among all of the denoising methods, the PSNR of BM3D and the SSIM of Gaussian filter are the best, which achieve 26.38 and 0.52, respectively. The BM3D method performs relatively well in terms of its signal-to-noise ratio, but becomes vague at the cellular boundaries. In the Gaussian filter, we find that it can retain the general structure of cells, but there is still significant noise in the background. Similar to the case in 3PF images, the median filter retains the structures and signals of the cells. In general, the THG DDAE model was the better choice for denoising THG microscopy images in terms of PSNR (=21.66) and SSIM (=0.70).

C. PRESERVATION OF NUCLEAR AND CELLULAR BOUNDARIES
To understand whether the DDAE can preserve more stochastic resonance-enhanced signals than the other denoising filters, following the procedures described in the Methods section, we identified pixels that represent the regions of nuclei or cells in both denoised and low-noise answer images. We then computed their precision, recall rates, and F1 scores (Table 3 and Appendix Table 5) to evaluate how well the boundary information was preserved after denoising ( Figs. 7 and 8). For the 3PF cases, we found that the 58-59% precision rate of the DDAE mode is higher than that of most of the filters, except for a few cases of median filters (σ = 3 or 5) in extreme situations that have very low recall rates of 8% and 10%. The 76-78% recall rate of DDAE performed better than most BM3D and median filters but not as good as Gaussian Filters (84-91%). This indicates that DDAE generates more false negatives (green in Fig. 7) than BM3D in the analysis of nuclei boundaries. Balanced with the F1score, which takes precision and recall as equal weighting, the DDAE16 model has an average score of 0.667, which is much higher than that of the other filters. For the THG cases, the signal level and dynamic range of the low-noise answer image were much lower than those of 3PF. We obtained an average 65-66% precision rate for DDAE, which is lower than part of the median filter results. However, DDAE's average VOLUME 10, 2022  83% recall rate is higher than theirs. This result indicates that DDAE may result in more false-positive pixels and fewer false-negative pixels in the THG images of the cytoplasm. Balanced with the F1-score, the 0.736 scores of the DDAE model again outperformed the other filters. These results indicate that for low photon-budget images containing stochastic resonance-enhanced signals, DDAE can retain nuclear or cellular boundaries more accurately for further segmentation.

IV. DISCUSSION AND CONCLUSION
For low photon-budget multiphoton biomedical imaging, it is crucial to find a balanced denoising method to preserve stochastic resonance-enhanced regions and retain cell boundary features for further segmentation. The despeckle denoising strategy removes most of the noise-boosted signals and corrodes the cell regions, whereas the low-pass spatial filtering strategy sacrifices resolution and expands cellular  regions. Nonlocal mean algorithms such as BM3D and Gaussian filters average the patches with similar textures and achieve state-of-the-art performance; however, these methods involve either time-consuming optimization processes or manually chosen parameters, which result in low com-putational efficiency when pursuing high performance. This problem becomes an analysis bottleneck for many highthroughput 3D microscopies.
Our results for the compact DDAE model revealed its capability to realize both feature preservation and noise filtering.   After several epochs of training, the validation loss was significantly reduced (Appendix Fig. 9 and 10). The optimization and parameter selection process can be accomplished in advance during the training of the DDAE model, and the denoising process requires only a few seconds. Even at such a high denoising speed, the DDAE model outperformed other conventional methods in preserving signal regions but sacrificing a bit of denoising ability in PSNR. It shows that DDAE under low-phonon budget conditions is a better trade-off approach between structural similarity and the preservation of the nuclear or cellular boundaries. This may be owed to the nonlinear transformation characteristics of deep learning. The stochastic noise in low photon-budget multiphoton imaging is not a perturbative interference to signals. It can nonlinearly boost subthreshold signals above the detection limit. Hence, methods such as Gaussian filters and BM3D may be unable to handle such a situation well within a short computation time. In contrast, the deep learning model could learn the nonlinear features from the training dataset, effectively suppress the noise, and correctly preserve the cellular boundaries. This pilot study explored the effectiveness of the DDAE system for boundary-preserved image denoising. We plan to explore advanced deep learning models to further improve denoising performance.
This study also investigated the effects of three filter sizes (m = 8, 16, and 32) on the denoising performance of DDAE. The results show that the performance of DDAE models with different filter sizes varies while using different images and evaluation metrics. For example, DDAE with m = 32 achieves the best PSNR and SSIM in 3PF images but has relatively limited performance in THG images (Appendix Table 4 and 5). However, the differences between DDAE models are extremely low (<1%). It shows the proposed DDAE model with smaller filter sizes (m = 8) requiring less computation complexity is sufficient and more suitable to enhance image quality and preserve nuclear and cellular boundaries.
Previous studies have demonstrated the need for effective image denoising tools in clinical applications. For example, [14] white blood cell counting is a routine practice for medical diagnosis, which requires high-SNR white blood cell images of size, shape, structure, and nucleus-to-cytoplasm ratio to accurately classify and identify cells based on morphological phenotypes. The proposed DDAE could restore nuclear and cellular boundaries, which supports nucleus segmentation in white blood cell counting [43]. Texture analysis after segmentation could further improve the accuracy and reliability of white blood cell counting.
In future works, because the DDAE model has the potential to improve image quality from various types of image sources, we plan to implement our DDAE model on different types of microscopy images for practical medical technology applications. To reduce the operation time, we plan to develop a lightweight DDAE using advanced machine-learning techniques at a low photon budget condition and under limited hardware resources, such as by using pruning and tinyML. In addition, Lehtinen et al. showed that without clean dataset, the deep learning model can still perform good image restoration [44]. We will also attempt to train the neural network with cleaner images excited by a higher laser power and detected at a lower gain. A reliable denoising method can speed up applications, such as high-throughput image segmentation, fast 3D morphodynamic analysis, and long-term cell tracking. We expect that this denoising method can be organically integrated into the process of imaging analysis, help improve the processing efficiency of many high-throughput microscopies, and achieve online denoising on hardware.