Deep DIH: Single-Shot Digital In-Line Holography Reconstruction by Deep Learning

Digital in-line holography (DIH) is broadly used to reconstruct the 3D shapes of microscopic objects from their 2D holograms. One of the technical challenges in the reconstruction stage is eliminating the twin image that originates from the phase-conjugate wavefront. Twin image removal is typically formulated as a non-linear inverse problem, since the scattering process involved in generating the hologram is irreversible. Conventional phase recovery methods rely on multiple holograms captured at different distances from the object plane, combined with iterative algorithms. Recently, end-to-end deep learning (DL) methods have been used to reconstruct the object wavefront (as a surrogate for the 3D structure of the object) directly from a single-shot in-line digital hologram. However, a massive number of data pairs is required to train such a DL model to an acceptable reconstruction precision. In contrast to typical image processing problems, well-curated datasets for in-line digital holography do not exist. The trained models are also highly influenced by the objects' morphological properties and hence can vary from one application to another. Therefore, data collection can be prohibitively laborious and time-consuming, which is a critical drawback of using DL methods for DH. In this article, we propose a novel DL method that takes advantage of the main characteristic of auto-encoders for blind single-shot hologram reconstruction based solely on the captured sample, without the need for a large dataset of samples with available ground truth to train the model. The simulation results demonstrate the superior performance of the proposed method compared to the state-of-the-art methods used for single-shot hologram reconstruction.


I. INTRODUCTION
Digital holography is a powerful imaging technique that records information about the three-dimensional (3D) surface of an object in a two-dimensional (2D) image captured by a visual sensor. It is mainly used for the investigation of micro-scale and nano-scale objects, and it is used in a wide range of application areas such as chemistry [1], biomedical microscopy [2], nano-material fabrication [3], [4], and nano-security [5].
Digital holography can be used in several modalities, including in-line transmission imaging of mostly transparent objects [6]. The sample modulates the wavefront phase of the emitted linearly polarized laser beam, and the 3D structure of the object can then be reconstructed from the recovered phase information, as shown in Fig. 1.
Regardless of the object type, there are two implementation approaches for digital holography: off-axis holography [7] and in-line holography [8]. In off-axis holography, the laser beam is split into two waves, a reference wave denoted by R and an object wave denoted by O, where only the latter passes through the object. The two waves are combined with a small relative incidence angle θ at the exit of the interferometer to create the hologram intensity $I_H(x,y) = |R|^2 + |O|^2 + R^*O + RO^*$, where $X^*$ denotes the complex conjugate of X. The relative angle causes the real and twin images to form at separable locations in Fourier space, and this spatial separation facilitates phase recovery through filtering in the Fourier domain. However, this method faces practical implementation problems, as it requires accurate synchronization between the reference and object waves, which becomes prohibitively hard for nanoscale imaging. An accurate characterization of the reference wave based on the Fresnel-Kirchhoff integral is also required for numerical phase reconstruction [6], [9]. Digital in-line holography (DIH) uses only a single laser beam, with numerical reconstruction by the angular spectrum algorithm for phase retrieval. Other advantages of DIH include the elimination of objective lenses, simple sample preparation with no need for sectioning and staining, and high-speed imaging capabilities [2].
Fig. 1. The amplitude and phase of the wavefront can be extracted from the recorded hologram using numerical methods. The phase information represents the surface depth or the thickness of the object and can be used to reconstruct a 3D view of the object. This figure shows how digital holography records the 3D information of an object in a 2D form.
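The Fourier-domain separation that off-axis holography relies on can be illustrated with a short NumPy sketch. It assumes a unit-amplitude plane reference wave tilted along x with an integer carrier frequency, a smooth hypothetical phase object, and a circular sideband mask; `offaxis_hologram`, `recover_object`, `carrier`, and `radius` are illustrative names, not part of the original work.

```python
# Illustrative sketch; function and parameter names are hypothetical.
import numpy as np


def offaxis_hologram(obj, carrier):
    """Record I_H = |R + O|^2 = |R|^2 + |O|^2 + R*O + RO* with a tilted plane wave R.

    R = exp(i 2 pi carrier x / N) is tilted along x, so the cross term R*O is
    shifted to frequency -carrier and RO* to +carrier.
    """
    n_cols = obj.shape[1]
    x = np.arange(n_cols)[None, :]
    ref = np.exp(2j * np.pi * carrier * x / n_cols)
    return np.abs(ref + obj) ** 2, ref


def recover_object(hologram, ref, carrier, radius):
    """Keep only the R*O sideband with a circular mask, then undo the tilt."""
    n_rows, n_cols = hologram.shape
    ky = np.fft.fftfreq(n_rows) * n_rows  # integer frequency indices along y
    kx = np.fft.fftfreq(n_cols) * n_cols  # integer frequency indices along x
    KY, KX = np.meshgrid(ky, kx, indexing="ij")
    mask = (KX + carrier) ** 2 + KY ** 2 <= radius ** 2
    sideband = np.fft.ifft2(np.fft.fft2(hologram) * mask)  # approximately R* O
    return sideband * ref  # R R* O = O because |R| = 1


# Hypothetical phase object with a weak, slowly varying phase along x.
N = 64
xx = np.tile(np.arange(N), (N, 1))
obj = 0.3 * np.exp(1j * 0.5 * np.cos(2 * np.pi * 2 * xx / N))
holo, ref = offaxis_hologram(obj, carrier=16)
estimate = recover_object(holo, ref, carrier=16, radius=6)
error = np.max(np.abs(estimate - obj))
```

The mask radius must be large enough to contain the object's bandwidth but small enough to exclude the DC term and the conjugate sideband; in-line holography has no carrier, which is why this filtering step is unavailable there.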
To further explore the physical model behind twin image removal, we investigate the process of in-line holography, as shown in Fig. 2. Given an object field ρ(x, y) and the propagation kernel h(x, y), the scattered wave O(x, y) can be described as [10]:
$$O(x,y) = \iint_{\Sigma} \rho(x',y')\, h(x-x', y-y')\, dx'\, dy', \tag{1}$$
where Σ represents an aperture window. The kernel h(x, y) depends on the light wavelength λ and the propagation distance z between the image plane and the hologram. Its transfer function in the frequency domain is:
$$H(f_x, f_y) = \exp\!\left(jkz\sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2}\right), \tag{2}$$
where k = 2π/λ is the wave number. In addition to the diffracted wave O(x, y), there exists a non-scattered reference wave R(x, y). The hologram $I_H(x, y)$ records the intensity of the mixed waves captured by the light sensor and can be expressed as:
$$I_H(x,y) = |R+O|^2 = |R|^2 + |O|^2 + R^*O + RO^* = |R|^2 + |O|^2 + U + U^*, \tag{3}$$
where we define $U(x, y) = R^*(x,y)\,O(x,y)$ for notational convenience.
arXiv:2004.12231v2 [eess.IV] 24 Jun 2020
The captured hologram includes both the object field O(x, y) and its conjugate $O^*(x, y)$, representing the virtual and real images, respectively [6]. This leads to the twin image problem during reconstruction: when one focuses on one of the holographic terms, its out-of-focus conjugate smears the reconstructed image. The unscattered field $|R|^2$ can be assumed to be one without loss of generality and removed from the hologram, and the term $|O|^2$ can be regarded as a noise term n(x, y). Therefore, the problem of reconstructing the object field boils down to removing the twin image [10], which has been the center of attention in many prior works [2], [11]-[13]. Defining the transformation T: ρ(x, y) → U(x, y), the image reconstruction can be recast as the following standard inverse problem:
$$I_H(x,y) = T(\rho(x,y)) + T^*(\rho(x,y)) + n(x,y) = 2\,\mathrm{Re}\{T(\rho(x,y))\} + n(x,y). \tag{4}$$
Since both U(x, y) and its conjugate $U^*(x, y)$ are consistent with Equation 4, either could be the solution to this problem, so the reconstruction of digital in-line holography is under-determined. Moreover, standard linear inverse-problem solvers cannot be applied directly to Equation 4, as it involves a non-linear transformation and symmetric diffraction in opposite directions.
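The in-line forward model described above (object field, angular-spectrum transfer function, and hologram intensity) can be made concrete with a small NumPy sketch. It assumes a unit plane reference wave (R = 1), sets evanescent components of the transfer function to zero, and uses hypothetical helper names. The final check illustrates the twin-image ambiguity: the conjugate object ρ* propagated by −z produces exactly the same linearized hologram 2Re{T(ρ)} as ρ propagated by +z.

```python
# Illustrative sketch; all function names are hypothetical.
import numpy as np


def transfer_function(shape, wavelength, z, dx):
    """H(fx, fy) = exp(j k z sqrt(1 - (lambda fx)^2 - (lambda fy)^2)).

    Evanescent frequencies (negative square-root argument) are set to zero.
    """
    fy = np.fft.fftfreq(shape[0], d=dx)
    fx = np.fft.fftfreq(shape[1], d=dx)
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    k = 2.0 * np.pi / wavelength
    phase = np.exp(1j * k * z * np.sqrt(np.maximum(arg, 0.0)))
    return np.where(arg > 0, phase, 0.0)


def propagate(field, wavelength, z, dx):
    """T(rho): convolution with h evaluated as a product with H in Fourier space."""
    H = transfer_function(field.shape, wavelength, z, dx)
    return np.fft.ifft2(np.fft.fft2(field) * H)


def inline_hologram(rho, wavelength, z, dx):
    """Full hologram with R = 1: I_H = |1 + O|^2 = 1 + |O|^2 + O + O*."""
    return np.abs(1.0 + propagate(rho, wavelength, z, dx)) ** 2


def linearized_hologram(rho, wavelength, z, dx):
    """Noise-free linearized model: 2 Re{T(rho)} (|R|^2 removed, |O|^2 dropped)."""
    return 2.0 * propagate(rho, wavelength, z, dx).real


rng = np.random.default_rng(0)
rho = 0.1 * (rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32)))
lam, z, dx = 532e-9, 1.2e-2, 4e-6  # wavelength, distance, pixel size (meters)
O = propagate(rho, lam, z, dx)
holo = inline_hologram(rho, lam, z, dx)
# Twin image: conj(rho) propagated by -z equals O*, so it produces the same
# linearized hologram as rho propagated by +z.
twin = linearized_hologram(np.conj(rho), lam, -z, dx)
same = np.allclose(twin, linearized_hologram(rho, lam, z, dx))  # True
```

Because the recorded data cannot distinguish the two candidates, any reconstruction method needs an additional prior to prefer the in-focus object over its conjugate.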
There exist several means for solving the twin image problem. Recording a collection of holograms at different propagation distances and reconstructing the object field with the Transport of Intensity Equation (TIE) method has yielded promising results [14], [15]. Most conventional phase retrieval methods use the following TIE imaging equation to recover the phase term φ(x, y) [12], [16], [17]:
$$-\frac{2\pi}{\lambda}\frac{\partial I(x,y)}{\partial z} = \nabla\cdot\big(I(x,y)\,\nabla\phi(x,y)\big), \tag{5}$$
where I(x, y) is the hologram intensity, λ is the wavelength, and ∇ is the gradient operator in the lateral dimensions (x, y) [12]. When the intensity is constant (or normalized), the following simplified equations can be used to recover φ(x, y) [11], [12], [18]:
$$-\frac{2\pi}{\lambda}\frac{\partial I(x,y)}{\partial z} = I\,\nabla^2\phi(x,y), \qquad \phi(x,y) = -\frac{2\pi}{\lambda I}\,\nabla^{-2}\frac{\partial I(x,y)}{\partial z}. \tag{6}$$
Several extensions of the TIE method have since been proposed in the literature for different applications, including volume holography [19] and holographic X-ray imaging [20]. One technical difficulty in solving Equation 5 and Equation 6 is the need for multiple images at fine-tuned distances from the focal plane (i.e., ∆z, 2∆z, ...) to precisely quantify the gradient term ∂I(x, y)/∂z using the least-squares method [21], hybrid linearization methods [22], [23], or iterative methods [11]. Therefore, methods that can recover phase information from only one measurement have obvious practical advantages.
Phase retrieval (PR) is one of the most commonly used numerical approaches; it performs double-sided constrained iteration with a specific support region. Mathematically, the in-line hologram contains an undesirable component that can be traced to the loss of phase information, and PR permits the separation of the real object distribution from the twin-image interference. The Gerchberg-Saxton (GS) algorithm [13], [24]-[26] and the hybrid input-output (HIO) algorithm [27], [28] perform iterative phase retrieval with the following steps:
• Step 1: Let $\rho^{(n)}$ be a trial scattering density in the n-th iteration cycle.
• Step 2: Let $G^{(n)}$ be the Fourier transform of $\rho^{(n)}$.
• Step 3: Replace all Fourier amplitudes of $G^{(n)}$ by the experimentally observed amplitudes, and apply the inverse Fourier transform to obtain $\rho'^{(n)}$.
• Step 4: Impose constraints on the object plane in the support region γ, which is usually designed based on a known prior.
In the GS algorithm, the object plane is constrained by the support region γ as:
$$\rho^{(n+1)}(x,y) = \begin{cases} \rho'^{(n)}(x,y), & (x,y)\in\gamma,\\ 0, & (x,y)\notin\gamma, \end{cases} \tag{7}$$
while the HIO algorithm uses a relaxation factor β, which feeds back information from previous iterations to reduce the probability of stagnation:
$$\rho^{(n+1)}(x,y) = \begin{cases} \rho'^{(n)}(x,y), & (x,y)\in\gamma,\\ \rho^{(n)}(x,y) - \beta\,\rho'^{(n)}(x,y), & (x,y)\notin\gamma. \end{cases} \tag{8}$$
Although PR shows excellent performance in object reconstruction, the double-sided constrained iteration with a specific support region severely limits the reconstruction area.
Recently, deep learning based approaches [29]-[31] have been proposed for end-to-end digital hologram reconstruction and proven effective by utilizing the outstanding learning capability of deep convolutional neural networks (CNNs). As universal approximators, CNNs are widely used to solve inverse problems in computer vision. The general workflow of such methods is to first train a CNN on labeled data pairs (holograms and the corresponding twin-image-free phase and amplitude), and then use the trained CNN to predict unlabeled data. Deep learning based methods are typically data-driven approaches that require massive numbers of data pairs to train the CNN. In most natural image processing tasks, such data pairs are easily accessible. Unfortunately, digital holography is often deployed in biomedical imaging, where acquiring large amounts of data is costly, since both capturing holograms and generating the corresponding ground truth are difficult. Meanwhile, CNNs are regarded as black boxes, since the training and inference steps are opaque and hard to explain. As a result, when a trained CNN produces an incorrect reconstruction of a hologram, it is difficult to diagnose and correct the failure.
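Steps 1 to 4 with the GS and HIO object-plane update rules can be sketched in NumPy for the classic Fourier-domain setting. This is an illustrative sketch, not the implementation of [24]-[28]: it assumes a unitary FFT, a boolean support mask γ, a random initial phase, and a hypothetical function name `phase_retrieval`; for in-line holography, the Fourier transform would be replaced by the propagation operator. With the GS (error-reduction) update, the Fourier-domain error is known to be non-increasing, which the example checks.

```python
# Illustrative sketch; the function name and defaults are hypothetical.
import numpy as np


def phase_retrieval(measured_amp, support, n_iter=100, beta=0.9,
                    method="gs", seed=0):
    """Iterate Steps 1-4 with the GS update or the HIO update.

    measured_amp: observed Fourier amplitudes; support: boolean mask gamma.
    Returns the final estimate and the Fourier-domain error of each iteration.
    """
    rng = np.random.default_rng(seed)
    # Step 1: trial density from the measured amplitudes with random phases.
    random_phase = np.exp(2j * np.pi * rng.random(measured_amp.shape))
    rho = np.fft.ifft2(measured_amp * random_phase, norm="ortho") * support
    errors = []
    for _ in range(n_iter):
        G = np.fft.fft2(rho, norm="ortho")                    # Step 2
        errors.append(float(np.sum((np.abs(G) - measured_amp) ** 2)))
        G_prime = measured_amp * np.exp(1j * np.angle(G))     # Step 3
        rho_prime = np.fft.ifft2(G_prime, norm="ortho")
        if method == "gs":                                    # Step 4, GS: zero outside gamma
            rho = np.where(support, rho_prime, 0.0)
        else:                                                 # Step 4, HIO: feedback outside gamma
            rho = np.where(support, rho_prime, rho - beta * rho_prime)
    return rho, errors


# Hypothetical test object: random complex values inside a square support.
rng = np.random.default_rng(1)
support = np.zeros((32, 32), dtype=bool)
support[8:20, 8:20] = True
truth = (rng.random((32, 32)) + 1j * rng.random((32, 32))) * support
amp = np.abs(np.fft.fft2(truth, norm="ortho"))
est_gs, err_gs = phase_retrieval(amp, support, method="gs")
est_hio, err_hio = phase_retrieval(amp, support, method="hio")
```

The only difference between the two methods is the rule applied outside the support: GS discards those values, while HIO keeps a β-weighted feedback term that helps it escape stagnation.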
In [10], a compressive sensing (CS) approach to reconstructing a twin-image-free hologram was proposed. The CS method is able to remove the twin image from a single-shot hologram and does not need massive training pairs. As a physics-driven method, the CS method relies on the sparsity difference between the reconstructed object and the twin image, filtering out the diffuse conjugate signal by imposing sparsity constraints on the object plane. The total variation (TV) norm is suitable for removing the twin image, since the in-focus object has sharp edges while the out-of-focus twin image is diffuse. A two-step iterative shrinkage/thresholding (TwIST) algorithm is used in [10] to address the twin image removal problem by minimizing an objective function formed by the mean squared error (MSE) and the TV norm:
$$\hat{\rho} = \arg\min_{\rho}\ \frac{1}{2}\big\|I_H - 2\,\mathrm{Re}\{T(\rho)\}\big\|_2^2 + \tau\|\rho\|_{TV}, \qquad \|\rho\|_{TV} = \sum_i \sqrt{|\Delta_i^x\rho|^2 + |\Delta_i^y\rho|^2}, \tag{9}$$
where τ is the relative weight between the TV norm and the MSE term, and $\Delta_i^x$ and $\Delta_i^y$ refer to the horizontal and vertical first-order gradients at pixel i.
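The TV norm and the resulting CS objective can be evaluated with a few lines of NumPy. This sketch only evaluates the objective (TwIST itself is not reproduced); the forward operator is passed in as a callable, which for the model above would be 2Re{T(ρ)}, and the zero-difference handling of the last row and column is an assumption. The function names are hypothetical.

```python
# Illustrative sketch; function names and boundary handling are assumptions.
import numpy as np


def tv_norm(rho):
    """Isotropic TV: sum_i sqrt(|dx_i rho|^2 + |dy_i rho|^2) with forward differences.

    The last column/row gets a zero difference (replicated boundary).
    """
    d_x = np.diff(rho, axis=1, append=rho[:, -1:])  # horizontal gradient
    d_y = np.diff(rho, axis=0, append=rho[-1:, :])  # vertical gradient
    return float(np.sum(np.sqrt(np.abs(d_x) ** 2 + np.abs(d_y) ** 2)))


def cs_objective(rho, hologram, forward, tau):
    """0.5 * ||I_H - forward(rho)||_2^2 + tau * ||rho||_TV."""
    residual = hologram - forward(rho)
    return 0.5 * float(np.sum(np.abs(residual) ** 2)) + tau * tv_norm(rho)


# A vertical step edge of height 1 across 4 rows has TV norm 4.
step = np.zeros((4, 4))
step[:, 2:] = 1.0
print(tv_norm(step))                                   # 4.0
print(cs_objective(step, step, lambda r: r, tau=0.1))  # 0.4
```

The example makes the trade-off in τ tangible: every unit of edge length costs τ, so a large τ penalizes the object's true edges along with the twin image's fringes.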
Based on the idea proposed in [32], a reconstruction with a denser edge matrix commonly suffers from a stronger out-of-focus twin image and has a larger TV norm. The CS method has been shown to be more effective than PR, reconstructing a clearer, twin-image-free hologram. However, it still has a couple of problems. Using the TV norm to remove the twin image requires a trade-off in the relative weight τ: large values of τ blur the reconstruction, while small values of τ remove the twin image only weakly. Also, imposing sparsity constraints on an image restoration problem leads to edge distortion. In this paper, we propose a novel deep learning method based on fitting an untrained auto-encoder to the possible solutions of a single captured hologram by minimizing a physics-driven objective function. This method performs noise reduction and twin image removal simultaneously and does not require massive data to train the model. In the presented approach, we do not explicitly suppress or remove the twin image in the reconstruction. Instead, we directly fit the CNN to search for the intensity and phase of the target 3D object that are consistent with the captured hologram. We show that neural networks equipped with convolutional layers naturally tend to produce a cleaner result. Experimental results demonstrate the feasibility and the superior performance of the proposed method over the existing CS methods.

II. DEEP LEARNING SCHEME
A deep network with an encoder-decoder architecture, also called an auto-encoder, maps a high-dimensional input x into a low-dimensional latent code $z = f_{encode}(x)$ and reconstructs a high-dimensional output $\hat{x} = f_{decode}(z)$ from the latent code. The common formulation used in supervised image restoration is to minimize the error between the output $\hat{x}$ and the ground truth y. In [33], an unsupervised blind image restoration method called Deep Image Prior (DIP) showed that fitting a randomly initialized CNN to a single corrupted image can recover the clean image, since CNNs naturally learn the uncorrupted and realistic part of the image first. Inspired by DIP, we consider using the same scheme to remove the twin image in the reconstructed object plane. However, another problem arises: the virtual and real object planes are highly coupled in both the spatial and frequency domains, so a CNN fitted in this way would generate an output that still contains the twin image. As mentioned in [10], the twin image term is denser than the object term. Here we investigate a novel learning procedure that uses the physical model in the objective function during training, as shown in Fig. 3.
Assume an auto-encoder with randomly initialized weights w whose output reconstruction can be expressed as ρ = f(x, w), where x is a fixed input. The objective function can then be formulated as:
$$\hat{w} = \arg\min_{w}\ \big\|I_H - 2\,\mathrm{Re}\{T(f(x,w))\}\big\|_2^2, \qquad \hat{\rho} = f(x,\hat{w}), \tag{10}$$
where we propagate the reconstruction to the hologram plane with the transformation T and minimize the error between the captured hologram and the forward-propagated result. When minimizing this objective, the network effectively searches for possible results in parameter space. As the experiments in Section IV show, the network tends to first generate the primary instance, which is the rough shape of the reconstructed object, and then gradually recovers details at different levels of the object. In other image recovery tasks such as denoising and super-resolution, this behavior eventually causes the network to overfit the degraded component of the corrupted image. In hologram reconstruction, however, both the twin image and the clean object could be solutions of the non-linear inverse problem. Therefore, after generating the main body of the object, the network continues to generate the real details instead of the twin image.
Fig. 3. This figure shows the learning procedure of the proposed method. After a fixed input is fed into the network, the network generates a reconstructed result. The reconstructed result is propagated to the hologram plane by the transmission, which depends on the optical parameters. The network updates its weights by minimizing the pixel error between the forward-propagated result and the captured hologram.
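The physics-driven objective above can be sketched in NumPy together with its gradient with respect to the network output ρ; in a PyTorch implementation, automatic differentiation would push this gradient through f(x, w) into the weights w by the chain rule. The sketch assumes the linearized forward model 2Re{T(ρ)}, an angular-spectrum transfer function with evanescent terms removed, and a mean (rather than summed) squared error; `physics_loss` and `physics_grad` are hypothetical names, and the gradient is checked against finite differences.

```python
# Illustrative sketch; function names and the mean-squared loss are assumptions.
import numpy as np


def transfer_function(shape, wavelength, z, dx):
    """Angular-spectrum transfer function; evanescent terms set to 0."""
    fy = np.fft.fftfreq(shape[0], d=dx)
    fx = np.fft.fftfreq(shape[1], d=dx)
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    phase = np.exp(2j * np.pi / wavelength * z * np.sqrt(np.maximum(arg, 0.0)))
    return np.where(arg > 0, phase, 0.0)


def physics_loss(rho, hologram, H):
    """Mean of (I_H - 2 Re{T(rho)})^2, with T(rho) = IFFT(FFT(rho) * H)."""
    pred = 2.0 * np.fft.ifft2(np.fft.fft2(rho) * H).real
    return float(np.mean((hologram - pred) ** 2))


def physics_grad(rho, hologram, H):
    """dL/dRe(rho) + 1j * dL/dIm(rho) = (4 / M) * T^H(residual)."""
    residual = 2.0 * np.fft.ifft2(np.fft.fft2(rho) * H).real - hologram
    back = np.fft.ifft2(np.fft.fft2(residual) * np.conj(H))  # adjoint T^H
    return 4.0 / rho.size * back


rng = np.random.default_rng(0)
H = transfer_function((16, 16), 532e-9, 1.2e-2, 4e-6)
rho_true = 0.1 * (rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16)))
hologram = 2.0 * np.fft.ifft2(np.fft.fft2(rho_true) * H).real
rho = np.zeros_like(rho_true)  # stand-in for the network output f(x, w)
grad = physics_grad(rho, hologram, H)
```

Descending this gradient directly on ρ has no preference between the object and its twin, since both reach zero loss; in the proposed method, the convolutional parameterization f(x, w) is what supplies that preference.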

III. IMPLEMENTATION OF AUTO-ENCODER
We use the wavelet transform as the downsampling method, as an alternative to pooling or strided convolution. According to previous work [34], using a wavelet transform can impose sparsity on the reconstructed object plane. Therefore, we use the Haar wavelet and its inverse transform as the downsampling and upsampling operations in our network. The Haar wavelet decomposes the input image or feature map into four sub-bands using four convolutional filters (one low-pass filter $f_{LL}$ and three high-pass filters $f_{LH}$, $f_{HL}$, and $f_{HH}$), defined as:
$$f_{LL} = \begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix},\quad f_{LH} = \begin{bmatrix} -1 & -1\\ 1 & 1\end{bmatrix},\quad f_{HL} = \begin{bmatrix} -1 & 1\\ -1 & 1\end{bmatrix},\quad f_{HH} = \begin{bmatrix} 1 & -1\\ -1 & 1\end{bmatrix}.$$
The four sub-bands are obtained by convolution as $x_{LL} = (f_{LL}\otimes x)$, $x_{LH} = (f_{LH}\otimes x)$, $x_{HL} = (f_{HL}\otimes x)$, and $x_{HH} = (f_{HH}\otimes x)$, where ⊗ denotes the convolution operator (applied with stride 2). The inverse Haar transform at pixel (i, j) can be written as:
$$\begin{aligned} x(2i-1,2j-1) &= \tfrac14\big(x_{LL}(i,j) - x_{LH}(i,j) - x_{HL}(i,j) + x_{HH}(i,j)\big),\\ x(2i-1,2j) &= \tfrac14\big(x_{LL}(i,j) - x_{LH}(i,j) + x_{HL}(i,j) - x_{HH}(i,j)\big),\\ x(2i,2j-1) &= \tfrac14\big(x_{LL}(i,j) + x_{LH}(i,j) - x_{HL}(i,j) - x_{HH}(i,j)\big),\\ x(2i,2j) &= \tfrac14\big(x_{LL}(i,j) + x_{LH}(i,j) + x_{HL}(i,j) + x_{HH}(i,j)\big). \end{aligned}$$
We first build an auto-encoder with an "hourglass" architecture, as shown in Fig. 4. The encoder $f_e(\rho)$ maps the fixed network input onto a lower-dimensional manifold, and the decoder $f_d(f_e(\rho))$ recovers the desired object from the latent code. Notably, in our experiments we found that if skip connections are used in the CNN, the network learns an identity mapping from input to output instead of searching for a plausible result.
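The Haar analysis filters and the inverse formulas can be checked with a short NumPy sketch that applies the four 2 × 2 filters with stride 2 by slicing each 2 × 2 block. The function names are hypothetical, and the layer wiring of the authors' network (for example, how sub-bands are stacked as channels) is not reproduced here.

```python
# Illustrative sketch; function names are hypothetical.
import numpy as np


def haar_dwt(x):
    """One-level 2D Haar analysis: the four 2x2 filters applied with stride 2.

    In each 2x2 block, a, b, c, d are x(2i-1, 2j-1), x(2i-1, 2j), x(2i, 2j-1),
    and x(2i, 2j) in the 1-based notation of the text.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    x_ll = a + b + c + d   # f_LL = [[1, 1], [1, 1]]
    x_lh = -a - b + c + d  # f_LH = [[-1, -1], [1, 1]]
    x_hl = -a + b - c + d  # f_HL = [[-1, 1], [-1, 1]]
    x_hh = a - b - c + d   # f_HH = [[1, -1], [-1, 1]]
    return x_ll, x_lh, x_hl, x_hh


def haar_idwt(x_ll, x_lh, x_hl, x_hh):
    """Inverse Haar transform following the four reconstruction formulas."""
    dtype = np.result_type(x_ll.dtype, np.float64)
    out = np.empty((2 * x_ll.shape[0], 2 * x_ll.shape[1]), dtype=dtype)
    out[0::2, 0::2] = (x_ll - x_lh - x_hl + x_hh) / 4
    out[0::2, 1::2] = (x_ll - x_lh + x_hl - x_hh) / 4
    out[1::2, 0::2] = (x_ll + x_lh - x_hl - x_hh) / 4
    out[1::2, 1::2] = (x_ll + x_lh + x_hl + x_hh) / 4
    return out


image = np.random.default_rng(0).random((8, 8))
bands = haar_dwt(image)       # four 4 x 4 sub-bands
restored = haar_idwt(*bands)  # perfect reconstruction of the 8 x 8 input
```

Unlike max pooling, this downsampling is invertible, so no information is discarded between encoder and decoder; the high-pass bands are zero wherever the input is locally constant, which is where the sparsity comes from.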

IV. SIMULATION RESULTS
In this section, several comparison experiments with the CS method used in [10] are conducted on simulated holograms to verify the feasibility of the presented methodology. We implement our model using the PyTorch framework [36] on a GPU workstation with an NVIDIA Quadro RTX 5000 graphics card. The Adam optimizer [37] is adopted with a fixed learning rate of 0.0005. We train the network for 1500 to 3500 epochs, depending on the hologram. For the CS method, we set the relative weight of the TV norm between 0.01 and 0.1 and the number of iterations between 150 and 350, depending on the hologram.
Three metrics are used to evaluate the reconstruction quality. The mean squared error (MSE) measures the average of the squared pixel-wise errors between the ground truth image y and the reconstructed image $\hat{y}$ of size M × N, and is defined as:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(y(i,j) - \hat{y}(i,j)\big)^2.$$
Peak signal-to-noise ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. PSNR is most easily defined via the MSE:
$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),$$
where MAX is the maximum possible pixel value of the image. The Structural Similarity Index (SSIM) [38] is a perceptual metric that quantifies image quality degradation caused by processing such as data compression or by losses in data transmission. SSIM has been shown to be more consistent with the human visual system than PSNR and MSE, since it quantifies changes in structural information by inspecting the relationship among image contrast, luminance, and structural components.
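MSE and PSNR translate directly into NumPy. This sketch assumes images scaled to [0, 1], so MAX defaults to 1.0 (for 8-bit images it would be 255); the function names are hypothetical.

```python
# Illustrative sketch; function names and the default MAX = 1.0 are assumptions.
import numpy as np


def mse(y, y_hat):
    """Average squared pixel-wise error between ground truth y and reconstruction y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def psnr(y, y_hat, max_val=1.0):
    """PSNR = 10 log10(MAX^2 / MSE) in dB; infinite for identical images."""
    err = mse(y, y_hat)
    return float("inf") if err == 0 else float(10.0 * np.log10(max_val ** 2 / err))


truth = np.zeros((4, 4))
recon = np.full((4, 4), 0.1)  # every pixel off by 0.1, so MSE = 0.01
```

For this example, PSNR is 10 log10(1 / 0.01) = 20 dB; each tenfold reduction of the MSE adds 10 dB.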
The SSIM between two images x and y is given by:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
where $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\sigma_{xy}$ are the local means, standard deviations, and cross-covariance of images x and y, and $C_1$ and $C_2$ are two constants that stabilize the division when the denominator is small.
Fig. 5 compares the amplitude reconstruction of the proposed method with that of the CS method on a simulated USAF resolution chart. The image is resized to 1000 × 1000 pixels. The illumination wavelength is set to 532 nm, and the complementary metal-oxide-semiconductor (CMOS) sensor has a pixel size of 4 µm. The distance between the object and the sensor is 1.2 cm. The proposed method produces a higher-quality reconstruction that is more similar to the ground truth. In addition, a Canny edge detector is used to extract the edge matrices of the enlarged areas in the two reconstructions and the ground truth image. The edge matrices show that the presented method has a better denoising capacity than the CS method, even without any hand-crafted prior such as the TV norm.
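The SSIM formula can be sketched for a single window covering the whole image. This is a simplification: SSIM as defined in [38] evaluates the formula over local windows and averages the result. The constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$, with L the dynamic range, are the common defaults and are an assumption here; the function name is hypothetical. For the windowed version, scikit-image provides `skimage.metrics.structural_similarity`.

```python
# Illustrative sketch: global (single-window) SSIM; the name and constants are assumptions.
import numpy as np


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (one global window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + 0.1 * rng.normal(size=clean.shape), 0.0, 1.0)
```

SSIM equals 1 only for identical images; added noise lowers the structure term through the reduced covariance relative to the variances.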
Further experiments show that the proposed method also substantially improves the restoration of detailed textures and phase information. A simulation on the cell image shown in Fig. 6 demonstrates that, by taking advantage of the natural strengths of CNNs for image processing, the proposed method restores more explicit details in images with more complex structures, as well as more precise phase information. To simulate a hologram with a phase component, we use the grayscale version of the RGB image as the reference amplitude and its green channel as the reference phase. The 500 × 500 hologram is generated with the same wavelength and object-to-sensor distance on a sensor with a pixel size of 1.67 µm. Compared with the CS method, our result preserves more of the structural texture of both the ground truth amplitude and phase. Another simulation with the same configuration on a human dendrite image provides further proof of the phase reconstruction ability of our method. The reconstruction results are shown in Fig. 7. Our method reconstructs an image with clearer detailed texture in both amplitude and phase, giving the result a higher SSIM and PSNR with respect to the ground truth image.
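The construction of the simulated complex object from an RGB image can be sketched as follows. The BT.601 luma weights used for the grayscale conversion and the linear mapping of the green channel to a phase range [0, max_phase] are assumptions, since the text does not specify them; the function name is hypothetical. The resulting field would then be propagated with an angular-spectrum model to form the hologram.

```python
# Illustrative sketch; the luma weights and phase scaling are assumptions.
import numpy as np


def complex_object_from_rgb(rgb, max_phase=np.pi):
    """Amplitude from the grayscale image, phase from the green channel.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    """
    rgb = np.asarray(rgb, dtype=float)
    amplitude = rgb @ np.array([0.299, 0.587, 0.114])  # grayscale (BT.601 weights)
    phase = max_phase * rgb[..., 1]                    # green channel mapped to phase
    return amplitude * np.exp(1j * phase)


rgb = np.random.default_rng(0).uniform(0.1, 0.9, size=(500, 500, 3))
obj = complex_object_from_rgb(rgb)  # 500 x 500 complex object field
```

Keeping the phase range below 2π avoids wrapping, so the simulated ground-truth phase can be compared directly with the reconstructed phase.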
To reveal why the presented method can fit the network to obtain the desired result, an experiment on a Pi image is conducted. The reconstruction results at different training iterations are shown in Fig. 8. During optimization, the CNN tends to first restore the general shape of the object and then add details to it. This characteristic explains why our method works: compared with the object, the twin image usually has a more obscure shape. Therefore, when the network is used to restore the object from a captured hologram, it converges to the clean object, which is a solution to the inverse problem, before the twin image is recovered.

V. OPTICAL EXPERIMENTS
To verify the performance of the proposed method on real-world data, a series of optical experiments is conducted in the laboratory. Fig. 9 illustrates the configuration of the lensless Gabor DHM system used in our experiments. The light source is a Thorlabs single-mode fiber-coupled laser. A pigtailed light beam is emitted into a single-mode fiber terminated at an FC/PC bulkhead. The sample is placed between the light source and an image sensor (Imaging Source DMM 27UJ003-ML, pixel size 1.67 µm) at an object reconstruction distance z. Hologram reconstruction in practice is harder than in simulation because of discrepancies between the actual and the preset optical parameters of the experiment. Meanwhile, ambient light and airborne dust lead to high noise in real holograms. Therefore, an algorithm applied to real-world data is expected to be robust to noise and to optical parameter errors.
Fig. 10 shows the reconstruction result on a positive USAF high-resolution test target (i.e., the stripes and digits are thicker than the background). An illuminating plane wave with a wavelength of 406 nm is used, and the distance between the target and the image sensor is set to around 1110 µm. A multi-height TIE-based algorithm is used for comparison, with ten captured holograms and a step size of 15 µm between adjacent hologram planes. In previous deep learning based work [29]-[31], multi-height TIE-based algorithms, which have been shown to perform excellently, are used to produce the ground truth of the training pairs. The reconstructed amplitude and phase show the strong denoising and twin image removal capability of the proposed method: from a single-shot hologram, it produces results of comparable quality to the multi-height method. The enlarged area shows that our approach retains high-quality details to a great extent while removing the twin image. The effectiveness of twin image removal is quantified by a mean edge factor, calculated as
$$\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} A_{i,j},$$
where $A_{i,j}$ is the edge matrix obtained by the Canny edge detector. We choose the Canny edge detector for computing the edge matrix since it is more sensitive than the Sobel operator. The mean edge factor is 0.0990 for the multi-height method and 0.1210 for our deep learning based method.
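The mean edge factor is the fraction of pixels marked as edges. A minimal sketch, assuming the edge matrix comes from a Canny detector such as OpenCV's `cv2.Canny`, which marks edges with 255 rather than 1, so the map is binarized first; the function name and the thresholds in the comment are hypothetical.

```python
# Illustrative sketch; the function name and Canny thresholds are hypothetical.
import numpy as np


def mean_edge_factor(edge_map):
    """(1 / NM) * sum_{i,j} A_ij, with A binarized (any nonzero pixel is an edge)."""
    A = np.asarray(edge_map) != 0
    return float(A.mean())


# With OpenCV (not imported here), the edge map could be obtained as
#   edges = cv2.Canny(image_uint8, 50, 150)   # hypothetical thresholds
edges = np.zeros((8, 8), dtype=np.uint8)
edges[2, :] = 255  # one full row of edge pixels
edges[:, 5] = 255  # one full column of edge pixels
print(mean_edge_factor(edges))  # 15 / 64 = 0.234375
```

Lower values correspond to sparser edge matrices; without binarization, a raw 0/255 Canny map would inflate the factor by a factor of 255.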
We also show the reconstructions of our method at different training iterations in Fig. 11 to examine the explanation proposed in Section II. The results show that our interpretation of why the presented method works also holds for real-world data.
An experiment on a sectioned dysplastic tonsillar mucosa tissue is conducted to verify the potential of our method for biomedical use. Holographic imaging of tissue could serve as a preliminary analysis ahead of clinical histological diagnosis. The hologram is captured with an illuminating plane wave at a wavelength of 0.635 µm and an object-to-sensor distance of 857 µm. Fig. 12 shows the captured hologram and the reconstruction. The reconstructed phase shows the relative depth of the tissue structure, which could be used to reconstruct the 3D surface of the tissue. Another experiment, on a non-keratinizing squamous cell carcinoma shown in Fig. 13, also demonstrates the effectiveness of the proposed method on biomedical samples.

VI. CONCLUSION
In summary, a deep learning method for single-shot reconstruction of in-line digital holograms is proposed in this paper. Because of the physical symmetry of holography, both the object image and the twin image can be solutions for the hologram. With a given prior, the auto-encoder is able to reconstruct the object image. Both simulated and optical hologram experiments demonstrate the effectiveness and potential of the proposed method. Although the deep learning based method is relatively time-consuming, it is cost-effective compared to the complex experimental setup of multi-height phase retrieval.

Fig. 2. The twin-image issue: the scattered object wave interferes with the unscattered reference wave in in-line holography.

Fig. 4. This figure shows the deep convolutional auto-encoder with the hourglass architecture used in this paper. We apply Batch Normalization [35] after each convolutional layer except the last three layers to stabilize training.

Fig. 5. The reconstructed intensity of the USAF resolution chart, the enlarged area from each reconstruction, and the edge matrix obtained by the Canny edge detector. (A) Ground truth. (B) Our method. (C) CS method. Although the proposed method does not use the TV norm as a prior to remove the twin image, it still reconstructs a clearer image with a sparser edge matrix than the CS method [10].

Fig. 6. Cell image reconstructions and the evaluation metrics for amplitude (phase): (A) The RGB image and simulated hologram. (B) The reference amplitude and phase. (C) Our method. (D) CS. Our method reconstructs an image with clearer detailed texture in both amplitude and phase, giving the result a higher SSIM and PSNR with respect to the ground truth image.

Fig. 7. Human dendrite image reconstructions and the evaluation metrics for amplitude (phase): (A) The RGB image and simulated hologram. (B) The reference amplitude and phase. (C) Our method. (D) CS. In this experiment, the proposed method performs much better than the CS method in recovering phase information.

Fig. 8. Pi image restoration at 100, 200, 500, 1000, and 1500 training epochs. The rough shape of the object is restored first; more details and sharp edges are restored later.

Fig. 10.(A) The captured hologram of the USAF positive high-resolution test target.(B) Multi-height reconstruction.(C) The proposed deep learning reconstruction.

Fig. 11. USAF positive high-resolution test target restoration at 100, 300, 500, 1000, 1500, 2500, and 500 training epochs. The reconstruction still follows the same pattern: the rough shape is restored first and details are restored later.

Fig. 12. Optical experimental hologram of USAF Resolution Chart and reconstructions. (A) The captured hologram. (B) Amplitude reconstruction with our method. (C) The reconstructed quantitative phase with our method.

Fig. 13. Optical experimental hologram of a non-keratinizing squamous cell carcinoma and reconstructions. (A) The captured hologram. (B) Amplitude reconstruction with our method. (C) The reconstructed quantitative phase with our method.