Resolution analysis in a lens-free on-chip digital holographic microscope

Lens-free on-chip digital holographic microscopy (LFOCDHM) is a modern imaging technique in which the sample is placed directly on, or very close to, the digital sensor and is illuminated by a partially coherent source located far above it. The scattered object wave interferes with the reference (unscattered) wave at the sensor plane, producing a digital hologram that can be processed in several ways to numerically reconstruct an in-focus image with a back-propagation algorithm. Because it requires no lenses or other intermediate optical components, LFOCDHM offers a large effective numerical aperture (NA) close to unity across the native wide field of view (FOV) of the imaging sensor in a cost-effective and compact design. However, unlike conventional coherent diffraction-limited imaging systems, whose performance is defined by the limiting aperture, typical lens-free microscopes achieve an imaging resolution far below the ideal coherent diffraction limit. At least five major factors may contribute to this limitation, namely, the sample-to-sensor distance, the spatial and temporal coherence of the illumination, the finite size of the equally spaced sensor pixels, and the finite extent of the image sub-FOV used for the reconstruction; these factors have not been systematically and rigorously explored until now. In this work, we derive five transfer function models that account for all these physical effects and for their combined influence on the imaging resolution of LFOCDHM. We also examine how these theoretical models can be used to optimize the optical design or to predict the theoretical resolution limit of a given LFOCDHM system. A series of simulations and experiments confirms the validity of the theoretical models.


Introduction
High-throughput optical microscopy is essential to various biomedical applications such as cell cycle assays, drug development, digital pathology, and high-content biological screening [1,2]. In conventional whole slide imaging (WSI) systems, capturing an image with both high resolution and a large field of view (FOV) requires mechanical scanning and stitching to expand the limited FOV of a high-magnification objective [3], which not only complicates the imaging process but also significantly increases the overall cost of the system. Recently developed computational microscopy techniques, such as synthetic aperture interferometric microscopy [4][5][6][7][8][9], Fourier ptychography microscopy (FPM) [10][11][12][13][14][15][16], and lens-free on-chip microscopy [17][18][19][20], provide new opportunities to create high-resolution, wide-FOV images without any mechanical scanning and stitching. Among these approaches, lens-free on-chip microscopy has the unique advantage of achieving a large effective numerical aperture (NA) of ∼1 across the native FOV of the imaging sensor (tens of mm²), based on a so-called unit-magnification configuration in which the samples are placed as close as possible to the imaging sensor [21,22]. Because no lenses or other optical components are required between the object and sensor planes, lens-free on-chip microscopy significantly simplifies the imaging system while circumventing the optical aberrations and chromatic effects that are inevitable in conventional lens-based imaging systems [23,24]. There are two typical designs for a lens-free on-chip microscope: the so-called contact-mode shadow-imaging microscope [17,25] and the lens-free on-chip digital holographic microscope (LFOCDHM) [21,26].
In contact-mode shadow-imaging microscopes, the distance between the sample and the sensor needs to be very small (typically less than 10 µm), and the captured shadows of the objects can be regarded as a two-dimensional absorption image of the specimen [27]. However, such a small distance is very difficult to achieve in practice because of the protective glass covering the surface of the camera sensor. In LFOCDHM, the distance between the objects and the sensor chip can be sizeable, and diffraction patterns are generated by the interference of the light scattered by each object with itself or with the unscattered background light. The diffraction patterns are digitally processed to reconstruct an image of the specimen, and the associated twin-image artifacts must be eliminated or partially removed with computational phase retrieval algorithms [28,29]. The following analysis examines LFOCDHM exclusively.
Despite the advantages mentioned above, LFOCDHM systems generally suffer from low imaging resolution, which falls far short of the demands of recent biomedical research, particularly for visualizing cellular or subcellular details of biological structures and processes. Unlike conventional coherent diffraction-limited imaging systems, whose performance is defined by the limiting aperture, typical LFOCDHM systems achieve an imaging resolution far below the ideal coherent diffraction limit. According to the Nyquist-Shannon sampling theorem, because the recorded holographic fringes are not magnified, the resolution of the holographic reconstruction is fundamentally limited by the sampling resolution of the imaging device. In other words, the physical pixel size is an important limiting factor of these lens-free imaging systems [27]. Because of spatial aliasing (undersampling), the imaging sensor fails to record the holographic oscillations corresponding to the high-spatial-frequency information of the specimen. To address this problem, pixel super-resolution (SR) methods have been proposed, in which a hologram with a smaller effective pixel size is synthesized from multiple low-resolution (LR) measurements by dedicated computational algorithms [17,18,25,26,30]. With these pixel SR methods, the imaging resolution of LFOCDHM systems can be improved from the Nyquist-Shannon limit (half-pitch lateral resolution of ∼2 µm, effective NA of ∼0.1−0.2) to an effective NA of ∼0.4−0.5 [17,18,26,31]. Even so, the achieved resolution is still less than half of the ideal coherent diffraction limit (NA ∼ 1).
The reason is that, besides the pixel size of the sensor, at least four additional factors significantly limit the performance of LFOCDHM systems, namely, the sample-to-sensor distance, the spatial and temporal coherence of the illumination, and the finite extent of the image sub-FOV used for the reconstruction. This is not unexpected and has been discussed by other authors (see, for example, Refs. [18,27]). However, either only qualitative analyses were presented [32,33], or only one or two of these factors were considered [33][34][35][36]. The quantitative analyses [34][35][36] focus mainly on the discrete nature of the sensor, whereas the other basic parameters, e.g., the sample-to-sensor distance [37], the spatial and temporal coherence of the illumination [33], and the finite extent of the image sub-FOV [38], are only sporadically mentioned in the literature on off-axis and in-line digital holographic microscopy. Thus, the influence of these five factors on the imaging resolution of LFOCDHM has not been systematically and rigorously explored until now.
In this work, we systematically study the effect of five major factors on the imaging resolution of an LFOCDHM system, i.e., the sample-to-sensor distance, the spatial and temporal coherence of the illumination, the finite size of the equally spaced sensor pixels, and the finite extent of the image sub-FOV used for the reconstruction. We derive five transfer function models that account for all these physical effects and their interactions on the imaging resolution of LFOCDHM. We further combine all these effects into a unified transfer function, which is the product of the five sub-transfer functions. We examine how these theoretical models can be used to predict the theoretical resolution limit of a given LFOCDHM system, and how they can guide the selection of system parameters to optimize the imaging resolution when designing a new LFOCDHM system. A series of simulations and experiments confirms the validity of the theoretical models.

Typical optical setup for LFOCDHM
In the lens-free holographic microscope depicted in Fig. 1(a), the source can simply be a laser [20,39,40], an LED (or an array of LEDs) [41][42][43][44], or even a smartphone screen [17]. The coherent or partially coherent light illuminates the specimen, and the scattered and transmitted light then co-propagate in the same direction, forming interference fringes on the imaging device. Ideally, the sample would be placed directly on the sensor array, which would capture the shadows of the objects and avoid twin-image artifacts. However, because of the protective glass covering the surface of the camera sensor, there is always a certain distance between the sample plane and the detector plane (typically 0.3−2 mm) [22,26,45]. Since this distance is much larger than the wavelength, the object information (both amplitude and phase) is encoded into the diffraction patterns and must be computationally reconstructed with phase retrieval and numerical back-propagation algorithms.
Theoretical analysis of resolution in LFOCDHM

As illustrated in the schematic diagram of the lens-free holographic microscope in Fig. 1(b), and neglecting noise, the achievable resolution of LFOCDHM is determined by the maximum visualized radius R of the diffraction patterns, which corresponds to the cut-off frequency of the system transfer function. This transfer function can be decomposed into five sub-transfer functions, and the lowest of their cut-off frequencies limits the maximum imaging resolution of LFOCDHM. The five sub-transfer functions correspond, respectively, to the defocus distance; the limited temporal coherence (spectral width ∆λ) and spatial coherence (diameter ∆s of the light-emitting area) of the source; the finite pixel size (∆p); and the finite extent of the image sub-FOV used for the reconstruction (side length ∆L). The absorption and phase transfer functions resulting from propagation are denoted ATF_P and PTF_P, and the temporal coherence, spatial coherence, pixel size, and reconstructed region transfer functions are denoted TCTF, SCTF, PSTF, and RRTF, respectively. The latter four sub-transfer functions are mutually independent and jointly affect the final imaging results. In this subsection, we adopt the weak object approximation to simplify the mathematical formulation and linearize the phase retrieval problem [46,47]. The complex transmittance of a weak object can be represented as

$$ t(\mathbf{x}) \approx a_0 + a(\mathbf{x}) + i a_0 \phi(\mathbf{x}) \tag{1} $$

where a(x) is the absorption distribution about its mean value a_0, φ(x) is the phase distribution, and x represents the two-dimensional coordinate (x, y) in the spatial domain. Taking the Fourier transform of both sides of Eq. 1, the Fourier spectrum of t(x) is

$$ T(\mathbf{u}) = a_0 \delta(\mathbf{u}) + A(\mathbf{u}) + i a_0 \Phi(\mathbf{u}) \tag{2} $$

where u is the two-dimensional coordinate in the frequency domain, δ(u) is the Dirac delta function, and A(u) and Φ(u) are the Fourier spectra of the absorption and phase distributions, respectively. Before reaching the digital camera, the complex wavefront propagates over the distance z_2 in air (refractive index ≈ 1); with the angular spectrum method [48], this is modeled as multiplication by a phase factor in the Fourier domain:

$$ W_{cam}(\mathbf{u}) = T(\mathbf{u}) P(\mathbf{u}) = a_0 \delta(\mathbf{u}) P(\mathbf{u}) + A(\mathbf{u}) P(\mathbf{u}) + i a_0 \Phi(\mathbf{u}) P(\mathbf{u}) \tag{3} $$

where P(u) = exp(ikz_2 √(1 − λ²|u|²)) represents the effect of defocus. Finally, by computing the autocorrelation of W_cam(u), using its complex conjugate W*_cam(u) = a_0 δ(u) P*(u) + A*(u) P*(u) − i a_0 Φ*(u) P*(u), we obtain the intensity spectrum

$$ I(\mathbf{u}) \approx a_0^2 \delta(\mathbf{u}) + 2 a_0 A(\mathbf{u}) \cos\left[k z_2\left(\sqrt{1-\lambda^2|\mathbf{u}|^2} - 1\right)\right] + 2 a_0^2 \Phi(\mathbf{u}) \sin\left[k z_2\left(1 - \sqrt{1-\lambda^2|\mathbf{u}|^2}\right)\right] \tag{4} $$

In Eq. 4, we neglect the higher-order terms between A(u) and Φ(u) to linearize the problem [49]. Thus, up to the constant factors 2a_0 and 2a_0², the absorption transfer function (ATF_P) and phase transfer function (PTF_P) of LFOCDHM with defocus distance z_2 can be written as

$$ ATF_P(\mathbf{u}) = \cos\left[k z_2\left(\sqrt{1-\lambda^2|\mathbf{u}|^2} - 1\right)\right] \tag{5} $$

$$ PTF_P(\mathbf{u}) = \sin\left[k z_2\left(1 - \sqrt{1-\lambda^2|\mathbf{u}|^2}\right)\right] \tag{6} $$

ATF_P(u) and PTF_P(u) at a wavelength of 600 nm are shown in Fig. 2 for various defocus distances, with their responses normalized to 0−1.
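As a minimal sketch (not the authors' code), the weak-object transfer functions of Eqs. (5) and (6) and their paraxial counterparts can be evaluated numerically. The function names `atf_ptf` and `atf_ptf_paraxial` are hypothetical; the sketch drops the constant prefactors 2a_0 and 2a_0², takes the radial frequency |u| in cycles per metre, and, as a simplifying assumption, returns zero for evanescent frequencies |u| > 1/λ.

```python
import math

def atf_ptf(u, wavelength, z2):
    """Exact weak-object transfer functions for defocus by z2.

    ATF_P = cos[k z2 (sqrt(1 - lambda^2 u^2) - 1)]
    PTF_P = sin[k z2 (1 - sqrt(1 - lambda^2 u^2))]
    Constant prefactors (2*a0, 2*a0**2) are omitted; u = |u| in cycles/m.
    """
    s = 1.0 - (wavelength * u) ** 2
    if s < 0:
        # Evanescent components: treated as not reaching the sensor (simplification)
        return 0.0, 0.0
    k = 2.0 * math.pi / wavelength
    theta = k * z2 * (1.0 - math.sqrt(s))
    return math.cos(theta), math.sin(theta)

def atf_ptf_paraxial(u, wavelength, z2):
    """Paraxial approximation: the defocus phase becomes pi * lambda * z2 * u^2."""
    theta = math.pi * wavelength * z2 * u ** 2
    return math.cos(theta), math.sin(theta)

if __name__ == "__main__":
    lam, z2 = 600e-9, 2e-6          # 600 nm illumination, 2 um defocus
    for frac in (0.05, 0.2, 0.5):   # |u| as a fraction of 1/lambda
        u = frac / lam
        print(frac, atf_ptf(u, lam, z2), atf_ptf_paraxial(u, lam, z2))
```

The exact and paraxial forms agree closely at low frequencies and drift apart as |u| approaches 1/λ, which is where the paraxial simplification becomes inaccurate.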
Fig. 2. The absorption transfer function ATF_P(u) (a) and phase transfer function PTF_P(u) (b) for various defocus distances. λ = 600 nm; the spatial frequency coordinate is normalized against the resolution limit 1/λ.

The sample-to-sensor distance z_2 varies from 1 µm to 3 µm. The simulation results in Fig. 2 show that as the defocus distance increases, ATF_P(u) starts to decrease at lower frequencies and declines more rapidly. Moreover, a larger defocus distance also introduces faster oscillation with more zero-crossings. The low responses around these zero-crossings pose severe difficulties for reconstructing the information at the corresponding frequencies, because that information can no longer be transferred into intensity; such rapid oscillation should therefore be avoided as much as possible. Thus, for ATF_P(u), a smaller defocus distance benefits the reconstructed intensity image. For phase imaging, however, Fig. 2(b) shows that the response of PTF_P(u) around zero frequency is always very low, indicating that low-frequency phase can hardly be transferred into intensity by defocusing. As the defocus distance grows, the response at low frequencies gradually increases; in other words, a large defocus distance is conducive to recovering low-frequency phase information. Nevertheless, the accompanying rapid oscillation also introduces a large number of zero-crossings. Thus, when phase objects are reconstructed from a single sample-to-sensor distance, the choice of defocus distance faces a fundamental tradeoff between the reconstruction quality of low-frequency information and the loss of frequency components. In general, therefore, multiple sample-to-sensor distances are required to construct a synthetic phase transfer function with high responses over a wider range of spatial frequencies, synthesized from the transfer functions at the defocus distances z_2^i (i = 1, …, N_total), where N_total is the total number of defocus planes. Under the same simulation conditions (λ = 600 nm, p = 300 nm, spatial frequency normalized against the resolution limit 1/λ), the synthesized transfer functions ATF_syn(u) and PTF_syn(u) are shown in Fig. 3(a).
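The growth in the number of zero-crossings with defocus can be checked with a short frequency sweep. This is an illustrative sketch using the Fig. 2 setting (λ = 600 nm, z_2 = 1, 2, 3 µm); `count_zero_crossings` is a hypothetical helper that counts sign changes of the exact ATF_P and PTF_P (constant prefactors dropped) on a uniform frequency grid, stopping at 0.999/λ, just short of 1/λ where the defocus phase changes very steeply.

```python
import math

def transfer_functions(u, wavelength, z2):
    """Exact ATF_P and PTF_P (constant prefactors dropped), u = |u| in cycles/m."""
    k = 2.0 * math.pi / wavelength
    theta = k * z2 * (1.0 - math.sqrt(1.0 - (wavelength * u) ** 2))
    return math.cos(theta), math.sin(theta)

def count_zero_crossings(wavelength, z2, n=20000, u_max_frac=0.999):
    """Count sign changes of (ATF_P, PTF_P) for 0 < |u| <= u_max_frac / lambda."""
    counts = [0, 0]
    prev = None
    for i in range(1, n + 1):
        u = u_max_frac * i / (n * wavelength)
        cur = transfer_functions(u, wavelength, z2)
        if prev is not None:
            for j in range(2):
                if prev[j] * cur[j] < 0:  # sign change between neighbouring samples
                    counts[j] += 1
        prev = cur
    return tuple(counts)

if __name__ == "__main__":
    lam = 600e-9
    for z2 in (1e-6, 2e-6, 3e-6):
        atf_zeros, ptf_zeros = count_zero_crossings(lam, z2)
        print(f"z2 = {z2 * 1e6:.0f} um: ATF_P {atf_zeros}, PTF_P {ptf_zeros} zero-crossings")
```

For these parameters the sweep finds 3, 6 and 10 zero-crossings of ATF_P (3, 6 and 9 of PTF_P) at z_2 = 1, 2 and 3 µm, in line with the trend described for Fig. 2.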
The simulation results in Fig. 3(a) show that multi-height measurements can significantly reduce the number of zero-crossings by synthesizing the transfer function. However, recovering the very low-frequency (near-zero-frequency) phase component remains quite difficult. In practical experiments, because of the cover glass of the sensor, the defocus distance usually exceeds 400 µm, and ATF_P(u) and PTF_P(u) oscillate extremely rapidly, as shown in Fig. 3(b). However, such a large distance effectively narrows the low-response frequency ranges, which helps to recover the frequency components near the zero-crossings. Thus, once the defocus distance reaches several hundred microns, appropriately increasing z_2 can improve the reconstruction quality to some extent. In general, when components of the lens-free imaging system such as the light source and the sensor are predetermined, multi-height measurements can optimize the synthetic transfer functions and thereby improve the intensity and phase reconstruction quality. For single-height measurements at the relatively large defocus distances used in practice, however, the influence of the defocus distance on the reconstruction result can be neglected because of the rapid oscillation of the transfer functions. In the remainder of this work, all simulations and experiments use single-height measurements to avoid the influence of multi-height selection on the reconstruction quality.
Fig. 3. (a) The synthesized absorption transfer function ATF_syn(u) and synthesized phase transfer function PTF_syn(u) with various defocus distances (z_2 = 1, 2, 3 µm); (b) the absorption transfer function ATF_P(u) and phase transfer function PTF_P(u) with z_2 = 400 µm; ATF_syn(u) and PTF_syn(u) with various defocus distances (z_2 = 400, 410, 420 µm).

Influence of temporal coherence of the illumination on imaging resolution
In this section, we analyze the influence of the temporal coherence of the illumination on imaging resolution, which is captured by the temporal coherence transfer function (TCTF).
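Here, we assume that temporal coherence is the only factor affecting the reconstruction resolution. In practical experiments, an ideal light source is difficult to obtain, and an LED source usually has a certain spectral width (affecting temporal coherence) and a finite luminous area (affecting spatial coherence). Suppose that the central wavelength λ, the spectral width ∆λ, and the spectral distribution S_λ(λ_i) are predetermined, and that all other system parameters are close to their ideal values (i.e., they do not affect the imaging resolution). If we further invoke the paraxial approximation [47], the two transfer functions in Eqs. (5) and (6) simplify to

$$ ATF_P(\mathbf{u}) \approx \cos\left(\pi \lambda z_2 |\mathbf{u}|^2\right), \qquad PTF_P(\mathbf{u}) \approx \sin\left(\pi \lambda z_2 |\mathbf{u}|^2\right) \tag{7} $$

If the spectral width of the illumination source is also taken into account, the absorption and phase transfer functions of LFOCDHM with sample-to-sensor distance z_2 and spectral width ∆λ become spectrally weighted superpositions of Eq. 7:

$$ ATF_{P+T}(\mathbf{u}) = \frac{\sum_i S_\lambda(\lambda_i)\cos\left(\pi \lambda_i z_2 |\mathbf{u}|^2\right)}{\sum_i S_\lambda(\lambda_i)}, \qquad PTF_{P+T}(\mathbf{u}) = \frac{\sum_i S_\lambda(\lambda_i)\sin\left(\pi \lambda_i z_2 |\mathbf{u}|^2\right)}{\sum_i S_\lambda(\lambda_i)} \tag{8} $$

In most cases, the spectral distribution S_λ can be approximated by a Gaussian function

$$ S_\lambda(\lambda_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(\lambda_i - \lambda)^2}{2\sigma^2}\right], \qquad \sigma = \frac{\Delta\lambda}{6} \tag{9} $$

with mean λ and standard deviation ∆λ/6. This choice ensures that the normalized intensity at wavelengths outside [λ − ∆λ/2, λ + ∆λ/2] falls to about 0.011 (= e^{−4.5}) or less and can be ignored. By incorporating the effect of temporal coherence, the transfer functions can be further expressed as integrals over the full spectral range:

$$ ATF_{P+T}(\mathbf{u}) = \int S_\lambda(\lambda_i)\cos\left(\pi \lambda_i z_2 |\mathbf{u}|^2\right) d\lambda_i, \qquad PTF_{P+T}(\mathbf{u}) = \int S_\lambda(\lambda_i)\sin\left(\pi \lambda_i z_2 |\mathbf{u}|^2\right) d\lambda_i \tag{10} $$

Eq. 10 does not yield a simple analytical expression for the cut-off frequency. To obtain a theoretical cut-off frequency limit, we therefore consider an idealized spectral distribution and assume that S_λ(λ_i) is a rectangular function of width ∆λ centred at λ, in which case ATF_{P+T} and PTF_{P+T} become

$$ ATF_{P+T}(\mathbf{u}) = \cos\left(\pi \lambda z_2 |\mathbf{u}|^2\right)\operatorname{sinc}\left(\frac{z_2 \Delta\lambda |\mathbf{u}|^2}{2}\right), \qquad PTF_{P+T}(\mathbf{u}) = \sin\left(\pi \lambda z_2 |\mathbf{u}|^2\right)\operatorname{sinc}\left(\frac{z_2 \Delta\lambda |\mathbf{u}|^2}{2}\right) \tag{11} $$

where sinc(x) = sin(πx)/(πx). According to Eq. 11, the finite spectral width introduces an additional sinc term into the transfer functions.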
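The sinc factor can be checked numerically by averaging the paraxial ATF over a flat spectrum and comparing the result with the closed form cos(πλz_2|u|²)·sinc(z_2∆λ|u|²/2). This is a minimal sketch with hypothetical helper names; it assumes the rectangular spectrum and the paraxial model described above, uses a simple midpoint rule, and uses the normalized sinc, sinc(x) = sin(πx)/(πx).

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def atf_flat_spectrum_numeric(u, lam0, dlam, z2, n=2001):
    """Average the paraxial ATF cos(pi*lambda*z2*u^2) over a flat (rectangular)
    spectrum of width dlam centred on lam0, using the midpoint rule."""
    total = 0.0
    for i in range(n):
        lam_i = lam0 - dlam / 2 + (i + 0.5) * dlam / n
        total += math.cos(math.pi * lam_i * z2 * u ** 2)
    return total / n

def atf_flat_spectrum_closed(u, lam0, dlam, z2):
    """Closed form of the same average: cos(pi*lam0*z2*u^2) * sinc(z2*dlam*u^2/2)."""
    return math.cos(math.pi * lam0 * z2 * u ** 2) * sinc(z2 * dlam * u ** 2 / 2)

if __name__ == "__main__":
    lam0, dlam, z2 = 660e-9, 20e-9, 200e-6   # values in the range used for Fig. 4(a)
    for u in (1e5, 3e5, 6e5):                # spatial frequency in cycles/m
        print(u, atf_flat_spectrum_numeric(u, lam0, dlam, z2),
              atf_flat_spectrum_closed(u, lam0, dlam, z2))
    u_cut = math.sqrt(2 / (z2 * dlam))       # first zero of the sinc envelope
    print("first zero of the envelope at |u| =", u_cut)
```

The envelope's first zero, at z_2∆λ|u|²/2 = 1, is the first cut-off frequency used in the temporal coherence analysis.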
Since the temporal coherence of the light source plays an equally important role in ATF_P(u) and PTF_P(u), we use TCTF(u) to represent the overall influence of the finite spectral width:

$$ TCTF(\mathbf{u}) = \operatorname{sinc}\left(\frac{z_2 \Delta\lambda |\mathbf{u}|^2}{2}\right) \tag{12} $$

The temporal coherence transfer functions TCTF(u) for different spectral widths ∆λ and defocus distances are shown in Fig. 4. In Fig. 4(a), with z_2 = 200 µm, λ = 660 nm, and ∆λ varying from 10 nm to 30 nm, the frequency response decreases more rapidly and reaches zero earlier (at the so-called first zero-crossing, or first cut-off frequency) as ∆λ widens. The response at frequencies above the first cut-off frequency may overshoot slightly, but these frequency components are difficult to recover because the response fluctuates strongly. Conversely, for a given defocus distance z_2, higher temporal coherence (smaller ∆λ) provides a wider high-response frequency region and a higher cut-off frequency, which improves the imaging resolution. In actual experiments, ∆λ is usually a predefined parameter while the defocus distance z_2 is flexible, so the frequency response curves resemble those in Fig. 4(b): when the parameters of the light source are fixed, the first cut-off frequency gradually decreases as the defocus distance increases. From Eq. 12, the first cut-off frequency is at |u| = √(2/(z_2 ∆λ)), and the corresponding reconstructed half-pitch resolution is

$$ q = \frac{1}{2|\mathbf{u}|} = \sqrt{\frac{z_2 \Delta\lambda}{8}} \tag{13} $$

To verify the resolution limit resulting from the finite spectral width ∆λ, we simulate a resolution target with z_2 = 500 µm and λ = 660 nm, as shown in Fig. 5. The line profiles in Fig. 5 show that every element of the resolution target can be recovered when the light source is perfectly coherent, but the high-frequency elements become progressively blurred as ∆λ increases.
More specifically, when ∆λ is 5.2 nm, the theoretical half-pitch resolution is q = 0.57 µm, which agrees well with the simulation result in Fig. 5. For ∆λ = 26 nm, the elements of group 3 can be distinguished easily, but those of group 2 are barely discernible. According to Eq. 13 (theoretical resolution q = 1.27 µm), group 2 of the target should be completely indistinguishable, so the slightly discernible elements may result from the non-zero response of the transfer function beyond the first cut-off frequency, as shown in Fig. 4. In summary, the temporal coherence of the illumination affects the ultimate imaging resolution of the LFOCDHM system. Increasing the temporal coherence of the source, by using a laser or by inserting a narrow band-pass filter in front of the source, directly reduces its influence on the resolution. When the light source of the system is fixed (∆λ is constant), the sample-to-sensor distance z_2 should be smaller than 2λ²/∆λ (which guarantees q < λ/2), so that the temporal coherence of the source does not influence the final resolution and the reconstructed resolution is limited only by the ideal coherent diffraction limit (λ/2). For example, when the spectral width of the illumination source is about 20 nm and the ideal half-pitch resolution limit is 0.5 µm, the sample-to-sensor distance z_2 should ideally be smaller than 100 µm. However, for imaging phase objects, z_2 should not be too small, so that the phase transfer function retains a sufficient response, which is crucial for accurately recovering low-frequency phase information. As mentioned earlier, owing to sensor manufacturing technology, the defocus distance z_2 cannot go below 300 µm. When z_2 cannot be made small enough, a light source with higher temporal coherence (narrower spectral width ∆λ) should be used to guarantee diffraction-limited imaging resolution.
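The temporal-coherence limit and the distance criterion quoted above reduce to two one-line formulas. The sketch below (hypothetical function names, SI units) assumes the rectangular-spectrum model, q = √(z_2∆λ/8), so that requiring q ≤ q_target gives z_2 ≤ 8q_target²/∆λ, which reduces to 2λ²/∆λ for q_target = λ/2.

```python
import math

def temporal_coherence_limit(z2, dlam):
    """Half-pitch limit set by the spectral width: q = sqrt(z2 * dlam / 8)."""
    return math.sqrt(z2 * dlam / 8.0)

def max_sample_to_sensor_distance(q_target, dlam):
    """Largest z2 for which the temporal-coherence limit stays <= q_target."""
    return 8.0 * q_target ** 2 / dlam

if __name__ == "__main__":
    # Fig. 5 setting: z2 = 500 um
    for dlam in (5.2e-9, 26e-9):
        q = temporal_coherence_limit(500e-6, dlam)
        print(f"dlam = {dlam * 1e9:.1f} nm -> q = {q * 1e6:.2f} um")
    # 20 nm bandwidth, 0.5 um half-pitch target
    print(max_sample_to_sensor_distance(0.5e-6, 20e-9) * 1e6, "um")
```

With the Fig. 5 parameters this reproduces the 0.57 µm and 1.27 µm values quoted above, and the 20 nm example gives z_2 ≈ 100 µm.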

Influence of spatial coherence of the illumination on imaging resolution
In this section, we analyze the influence of the spatial coherence of the illumination on imaging resolution, which is captured by the spatial coherence transfer function (SCTF). In addition to the temporal coherence of the light source, its spatial coherence also affects the reconstructed resolution. As before, we assume that the reconstructed resolution is affected only by the spatial coherence of the light source. We also assume that the sample is illuminated by a spatially incoherent, delta-correlated light source (any two different points in the source plane are uncorrelated), so that the acquired hologram can be interpreted as an incoherent superposition of the partial holograms arising from all source points. In other words, the influence of spatial coherence can be modeled as a convolution of the ideal in-line hologram I(x) (arising from the central point source) with a properly rescaled source intensity distribution S_s(x_s) [50].
Here, x represents the coordinates in the imaging sensor plane and x_s the coordinates in the illumination plane. Without loss of generality, the scaling factor (z_1/z_2)² can be neglected. According to Eq. 14, assuming that the illumination source is circular with a diameter of ∆s, the spatial coherence transfer function SCTF(u) (Eq. 15) is the normalized Fourier transform of the source intensity distribution rescaled by z_2/z_1. Simulation results of SCTF(u) for different source sizes and defocus distances are shown in Figs. 6(a) and 6(b). In Fig. 6(a), ∆λ → 0, λ = 660 nm, z_1 = 5 mm, z_2 = 200 µm, and ∆s = 3.3, 33, 165 µm are used to analyze the resolution limit resulting from spatial coherence. The curves of SCTF(u) in Fig. 6(a) show that the effect of spatial coherence on the reconstruction resolution decreases as the illumination area gets smaller: as ∆s increases, the response of the transfer function decays earlier and reaches the first cut-off frequency sooner.
In actual experiments, once the illumination source is chosen, the diameter ∆s of the luminous area cannot be changed. In that case, to improve the spatial coherence, we can instead increase the ratio z_1/z_2, which reduces the effective illumination area. In our simulations, the system parameters are ∆λ → 0, λ = 660 nm, z_1 = 3, 5, 7 mm, z_2 = 200 µm, and ∆s = 33 µm; the frequency response curves are shown in Fig. 6(b). These curves show that a larger z_1/z_2 increases the first cut-off frequency and thus improves the reconstruction resolution. From Eq. 15, the first cut-off frequency is |u| = z_1/(z_2 ∆s), and the corresponding reconstructed half-pitch resolution is

$$ q = \frac{1}{2|\mathbf{u}|} = \frac{z_2 \Delta s}{2 z_1} \tag{16} $$

According to Eq. 16, this resolution depends on three parameters: z_1, z_2, and ∆s. In Fig. 7, z_1 = 30 mm and z_2 = 500 µm are used to verify the resolution limit. As ∆s gradually increases, the reconstruction resolution degrades correspondingly. For example, when ∆s = 68 µm, the theoretical resolution is 0.57 µm and the corresponding simulation result is 0.66 µm, which is worse than that obtained with ideal illumination (∆s → 0). If ∆s further increases to 153 µm, the resolution degrades to 1.32 µm, which agrees with the theoretical value of 1.28 µm.
The above analysis shows that spatial coherence may affect the ultimate imaging resolution of the LFOCDHM system through the ratio z_2/z_1 and the source size ∆s. Thus, in lens-free experimental setups that use an LED as the light source, there are several ways to improve the spatial coherence and reduce its effect on imaging resolution. On the one hand, we can insert a small pinhole in front of the source to reduce the source size. On the other hand, we can reduce the ratio z_2/z_1 to reduce the effective size of the source. As mentioned earlier, the sample-to-sensor distance z_2 cannot be too small, so we can increase the source-to-sample distance z_1 instead. All these experimental measures aim to avoid the effect of poor spatial coherence on the reconstruction resolution and to guarantee diffraction-limited imaging resolution (i.e., q in Eq. 16 smaller than λ/2). For example, when the diameter ∆s of the illumination source is about 200 µm and the ideal half-pitch resolution limit is 0.5 µm, the ratio z_2/z_1 must theoretically be smaller than 1/200. Since z_2 is usually larger than 400 µm for actual imaging objects, z_1 must then be larger than 80 mm to guarantee a sufficient response of the transfer function. Consequently, for an established lens-free microscopic imaging system, the effect of spatial coherence can be minimized by increasing z_1.
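Likewise, the spatial-coherence limit q = z_2∆s/(2z_1) and the minimum source distance it implies can be sketched as follows. Function names are hypothetical and all lengths are in metres; the sketch simply restates the first-cut-off relation |u| = z_1/(z_2∆s) used in the text.

```python
def spatial_coherence_limit(z1, z2, ds):
    """Half-pitch limit set by the source diameter ds: q = z2 * ds / (2 * z1)."""
    return z2 * ds / (2.0 * z1)

def min_source_distance(z2, ds, q_target):
    """Smallest source-to-sample distance z1 keeping the limit <= q_target."""
    return z2 * ds / (2.0 * q_target)

if __name__ == "__main__":
    # Fig. 7 setting: z1 = 30 mm, z2 = 500 um
    for ds in (68e-6, 153e-6):
        q = spatial_coherence_limit(30e-3, 500e-6, ds)
        print(f"ds = {ds * 1e6:.0f} um -> q = {q * 1e6:.2f} um")
    # 200 um source, z2 = 400 um, 0.5 um half-pitch target
    print(f"z1 >= {min_source_distance(400e-6, 200e-6, 0.5e-6) * 1e3:.0f} mm")
```

These values match the theoretical 0.57 µm and about 1.28 µm quoted for Fig. 7 and the 80 mm bound in the example above.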

Influence of sensor pixel size on imaging resolution
In a lens-free imaging system, the pixel size is a key factor influencing the achievable spatial resolution. Assume that the actual pixel size and pixel count of the camera are ∆p and m × n, respectively, and that the finest feature to be reconstructed corresponds to a half-pitch resolution of ∆p/w, which is w (w ≥ 1) times smaller than the actual sampling interval of the camera. The reconstructed image then has M × N pixels. Ideal pixel aliasing can be interpreted as a procedure in which the ideal image is first binned and then sub-sampled. Specifically, the pixel binning effect can be modeled as

$$ I_{bin}(\mathbf{x}) = I(\mathbf{x}) \otimes \frac{1}{w^2}\operatorname{rect}\left(\frac{x}{w}\right)\operatorname{rect}\left(\frac{y}{w}\right) \tag{17} $$

where I(x) is the ideal image, x is the two-dimensional coordinate on the camera plane (in units of the reconstructed pixel size ∆p/w), and ⊗ denotes convolution. In the frequency domain, this process can be represented as

$$ O_{bin}(\mathbf{u}) = O(\mathbf{u}) \cdot PSTF(\mathbf{u}) \tag{18} $$

where O_bin(u) and O(u) are the Fourier transforms of I_bin(x) and I(x), respectively, and PSTF(u) is the transfer function corresponding to pixel binning, which takes the form

$$ PSTF(\mathbf{u}) = \operatorname{sinc}(w u_x)\operatorname{sinc}(w u_y) \tag{19} $$

When u_x = ±r_x/w or u_y = ±r_y/w with w > 1 (r_x and r_y are positive integers not greater than w/2, and the frequency is normalized to −1/2 ∼ 1/2), PSTF is zero, indicating that the corresponding spectral information is lost. Thus, the normalized first cut-off frequency is 1/w. Given the assumption that the ideal theoretical half-pitch resolution is ∆p/w, the resolution limit after aliasing is given by Eq. (20). For the second step, the sampling process takes the ideal image at uniform intervals of w pixels. One way to model sampling is to multiply I(x) by a sampling function S_w(x) equal to a train of impulses w units apart [51]. That is,

$$ I_{Sam}(\mathbf{x}) = I(\mathbf{x}) S_w(\mathbf{x}), \qquad S_w(\mathbf{x}) = \sum_{m}\sum_{n} \delta(x - m w)\,\delta(y - n w) \tag{21} $$

where I_Sam(x) is the image after sampling and S_w(x) is the two-dimensional sampling function.
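The binning step and the zeros of PSTF can be illustrated in one dimension. This sketch assumes a separable box-average binning model, so that one axis of PSTF is sinc(w·u_x) with frequency normalized to [−1/2, 1/2] on the fine grid; the helper names are hypothetical. A cosine at normalized frequency 1/w averages to zero over every w-pixel bin, which is exactly the first zero of PSTF.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def pstf_1d(u, w):
    """One axis of the separable binning transfer function, u in [-1/2, 1/2]."""
    return sinc(w * u)

def pstf_zeros(w):
    """Positive normalized frequencies r/w (r = 1 .. floor(w/2)) where PSTF vanishes."""
    return [r / w for r in range(1, w // 2 + 1)]

def bin_1d(signal, w):
    """Box-average non-overlapping groups of w fine-grid samples (1D pixel binning)."""
    return [sum(signal[i:i + w]) / w for i in range(0, len(signal) - w + 1, w)]

if __name__ == "__main__":
    for w in (1, 2, 3, 4):
        print("w =", w, "zeros at", pstf_zeros(w))
    w = 4
    fine = [math.cos(2 * math.pi * n / w) for n in range(32)]  # frequency 1/w
    print("binned:", [round(v, 12) for v in bin_1d(fine, w)])
```

For w = 1 there is no zero inside the band, for w = 2 the only zero sits at the band edge (1/2), and for w = 4 an additional interior zero appears at 1/4, matching the behaviour described for Fig. 8.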
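In Fourier space, Eq. 21 can be written as

$$ O_{Sam}(\mathbf{u}) = O(\mathbf{u}) \otimes \frac{1}{w^2}\sum_{k}\sum_{l} \delta\left(u_x - \frac{k}{w}\right)\delta\left(u_y - \frac{l}{w}\right) = \frac{1}{w^2}\sum_{k}\sum_{l} O\left(u_x - \frac{k}{w},\, u_y - \frac{l}{w}\right) \tag{22} $$

In discrete numerical calculation, the dimensions of the captured image differ from those of the original image, so the sampling process can be written in matrix form: […], where M_right is an n × N matrix. Concretely, M_left = […]. When I_A is the (M/2w) × (M/2w) identity matrix, A_1 and A_2 can be denoted by […]. This process shows that high-frequency information is mixed into the low-frequency domain.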
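The sampling step and its aliasing can also be sketched in matrix form. Since the explicit forms of M_left, A_1 and A_2 are not given here, the sketch assumes the simplest decimation matrices (output row i selects fine-grid sample i·w), with a hypothetical M_left of size m × M and M_right of size n × N applied as M_left · I · M_rightᵀ; these matrices and helper names are illustrative assumptions, not the authors' definitions. A pattern at 0.3 cycles per fine pixel, sampled with w = 2, reappears at 0.4 cycles per coarse pixel, i.e., high-frequency content folds into the low-frequency band.

```python
import math

def decimation_matrix(rows, cols, w):
    """Hypothetical selection matrix: output row i picks fine-grid index i*w."""
    return [[1.0 if j == i * w else 0.0 for j in range(cols)] for i in range(rows)]

def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def sample_image(image, w):
    """I_sam = M_left @ I @ M_right^T with M_left (m x M) and M_right (n x N)."""
    big_m, big_n = len(image), len(image[0])
    m_left = decimation_matrix(big_m // w, big_m, w)
    m_right = decimation_matrix(big_n // w, big_n, w)
    return matmul(matmul(m_left, image), transpose(m_right))

if __name__ == "__main__":
    w, f_fine = 2, 0.3                      # 0.3 cycles per fine pixel
    image = [[math.cos(2 * math.pi * f_fine * x) for _ in range(8)] for x in range(8)]
    sampled = sample_image(image, w)
    # On the coarse grid f_fine * w = 0.6 exceeds Nyquist (0.5) and folds to 0.4
    alias = [math.cos(2 * math.pi * 0.4 * k) for k in range(len(sampled))]
    print([round(row[0], 6) for row in sampled])
    print([round(v, 6) for v in alias])
```

The two printed rows coincide, so after sampling the 0.6 cycles/coarse-pixel component cannot be distinguished from its 0.4 cycles/coarse-pixel alias.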
To illustrate the information aliasing and spectrum loss caused by the finite pixel size, simulation results with down-sampling factors w = 1, 2, 3, 4 are shown in Fig. 8. Alternatively, w can be regarded as the resolution up-sampling factor of a pixel SR reconstruction algorithm working from LR intensity measurements. The PSTF(u) line profiles show that as w increases, more criss-crossed frequency gaps appear, indicating that the information around these frequencies is exceptionally difficult to recover. When w = 2, PSTF(u) falls to zero only at the highest frequency (the periphery of the Fourier spectrum). When w > 2, PSTF(u) also vanishes at interlaced regions inside the spectrum. The lower right of Fig. 8 shows the Fourier spectrum O_bin(u) after pixel binning with w = 4, where the red rectangular area (M/w × N/w) has the same dimensions as the captured image. The whole process shows that high-frequency information is mixed into the low-frequency domain within the red rectangle, and the aliasing problem becomes more serious as w increases. For the typical pixel sizes of current image sensors (0.8−5 µm), pixel aliasing is a key factor directly limiting the imaging resolution of LFOCDHM systems. When the object is to be reconstructed (by pixel SR algorithms [18,26,45,52]) at a resolution w times higher than that limited by the original pixel size, the number of captured raw LR images (the theoretical amount of information) increases by a factor of w² [53].

Influence of the finite extent of reconstructed sub-FOV on imaging resolution
As mentioned in the introduction, one of the most important advantages of the LFOCDHM is its large effective numerical aperture (∼ 1) over a very large FOV, because the sample-to-sensor distance is much smaller than the size of the imaging sensor. In practice, however, the processing capability and memory of the computer are limited, so each raw image is usually divided into several subregions for holographic reconstruction, and the reconstructed sub-images are then stitched together to obtain the whole-FOV image. Because the selected reconstruction area has a limited extent (we assume a square sub-FOV of side length $\Delta L$), some high-angle diffraction patterns corresponding to the high-frequency components of the object fall outside the reconstructed area, which reduces the imaging resolution. We attribute the effect of the finite extent of the reconstructed sub-FOV on the Fourier spectrum to the transfer function RRTF, whose cut-off frequency is

$$|u| = \frac{\Delta L/2}{\lambda\sqrt{z_2^2 + (\Delta L/2)^2}}.$$

Thus, the reconstructed half-pitch resolution is determined by the effective NA of the LFOCDHM system, $\mathrm{NA}_{\mathrm{eff}} = (\Delta L/2)/\sqrt{z_2^2 + (\Delta L/2)^2}$ (as shown in Fig. 1), and the restricted half-pitch resolution is

$$q = \frac{\lambda}{2\,\mathrm{NA}_{\mathrm{eff}}} = \frac{\lambda\sqrt{z_2^2 + (\Delta L/2)^2}}{\Delta L}. \tag{23}$$

According to Eq. 23, to achieve the half-pitch resolution $q$, the side length of the reconstructed sub-FOV should meet the following requirement:

$$\Delta L \geq \frac{2\lambda z_2}{\sqrt{4q^2 - \lambda^2}}. \tag{24}$$

In the simulation, we use $\lambda = 600$ nm, $z_2 = 200$ µm, and $\Delta p = 1$ µm, and calculate the sub-FOV sizes corresponding to the theoretical half-pitch resolutions $q = 1, 2, 4$ µm to verify the influence of the reconstructed area on the resolution. Figure 9 shows that when the side length is $\Delta L_1 = 126$ µm, the finest half-pitch resolution is about 1 µm. As $\Delta L$ decreases, the resolution gradually degrades; for example, when the side length is $\Delta L_2 = 61$ µm, the half-pitch resolution degrades to 2 µm. As also shown in Fig. 9, the required size of the reconstructed area grows rapidly as the targeted half-pitch resolution improves, and according to Eq. 24 it diverges as $q$ approaches $\lambda/2$. For example, when the sample-to-sensor distance is 400 µm, achieving an imaging resolution close to the diffraction limit (e.g., NA ∼ 0.8) requires a reconstructed sub-FOV with a side length of at least 2845 µm, which again poses a serious challenge to computational efficiency and memory (especially when a pixel SR algorithm is used).

Fig. 9. From the first to fifth row: simulation results with different reconstructed area sizes ($\Delta L_1 = 126$ µm, $\Delta L_2 = 61$ µm, $\Delta L_3 = 30$ µm). Last row, left: the reconstructed area size as a function of the restricted half-pitch resolution; right: the relative size (normalized data size) of the reconstructed region for different half-pitch resolutions.
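Eqs. (23) and (24), $q = \lambda\sqrt{z_2^2 + (\Delta L/2)^2}/\Delta L$ and its inverse $\Delta L \geq 2\lambda z_2/\sqrt{4q^2 - \lambda^2}$, are easy to evaluate numerically. The Python sketch below is a minimal illustration, assuming a square sub-FOV of side $\Delta L$ and all lengths in micrometres; the function names are hypothetical and not taken from any published code for this work.

```python
import math


def rrtf_half_pitch(delta_l: float, z2: float, wavelength: float) -> float:
    """Half-pitch limit imposed by a square sub-FOV of side delta_l (Eq. 23).

    All lengths in micrometres. The effective NA is
    (ΔL/2) / sqrt(z2^2 + (ΔL/2)^2), and the coherent half-pitch is λ / (2 NA).
    """
    half = delta_l / 2.0
    na_eff = half / math.hypot(z2, half)
    return wavelength / (2.0 * na_eff)


def min_sub_fov(q: float, z2: float, wavelength: float) -> float:
    """Smallest sub-FOV side length that supports half-pitch q (Eq. 24)."""
    if q <= wavelength / 2.0:
        raise ValueError("q must exceed the coherent diffraction limit λ/2")
    return 2.0 * wavelength * z2 / math.sqrt(4.0 * q**2 - wavelength**2)


if __name__ == "__main__":
    # Simulation parameters from the text: λ = 600 nm, z2 = 200 µm.
    for q in (1.0, 2.0, 4.0):
        print(f"q = {q:.0f} µm -> ΔL >= {min_sub_fov(q, 200.0, 0.6):.1f} µm")
```

The computed side lengths (about 125.8, 60.7, and 30.1 µm) round to the $\Delta L_1$, $\Delta L_2$, and $\Delta L_3$ used in Fig. 9. The same function gives `rrtf_half_pitch(198, 547, 0.62)` ≈ 1.74 µm and `rrtf_half_pitch(110, 547, 0.62)` ≈ 3.10 µm, the values reported in the reconstructed-region experiment with $z_2$ = 547 µm.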
Furthermore, within each reconstructed sub-FOV, only a limited central region achieves the expected resolution; regions closer to the border have lower imaging resolution, because part of the high-angle diffraction from object points near the border falls outside the sub-FOV. Thus, when reducing the influence of the finite extent of the reconstructed sub-FOV in actual experiments, the selection of the reconstructed area faces a fundamental tradeoff between the loss of high-frequency diffraction and the practicality of implementing the reconstruction algorithm. It should also be noted that when a pixel SR algorithm is used to achieve an expected sub-pixel resolution, the reconstructed area should be larger than the theoretical one calculated by Eq. 24 to guarantee that such a resolution is theoretically achievable.

Comprehensive influence of multiple factors on imaging resolution
Based on the above analysis, the comprehensive absorption and phase transfer functions accounting for all of the above factors can be written as $\mathrm{ATF}(u) = \mathrm{ATF}_P \cdot \mathrm{TCTF} \cdot \mathrm{SCTF} \cdot \mathrm{PSTF} \cdot \mathrm{RRTF}$ and $\mathrm{PTF}(u) = \mathrm{PTF}_P \cdot \mathrm{TCTF} \cdot \mathrm{SCTF} \cdot \mathrm{PSTF} \cdot \mathrm{RRTF}$. Although an individual transfer function may show a small nonzero response at frequencies beyond its first cut-off frequency, this contribution to the imaging resolution can be neglected: the final imaging resolution is co-determined by multiple parameters, and after the transfer functions are multiplied together, the overall response at these high frequencies in $\mathrm{ATF}(u)$ and $\mathrm{PTF}(u)$ is very small. Therefore, the final imaging resolution limit is determined by the minimum of the first cut-off frequencies of these sub-transfer functions. For a given LFOCDHM system whose parameters are all fixed, we can calculate the resolution limit governed by each transfer function, Eqs. (13), (16), (20), and (23), compare them with the ideal coherent diffraction limit $\lambda/2$, and take the largest of these values as the ultimate theoretical half-pitch resolution. Note that pixel SR methods are not considered in the above analysis. When SR methods are used, the theoretical resolution limit is the largest value among Eqs. (13), (16), and (23), the effective pixel size $\Delta p/w$ (where $w$ is the up-sampling factor), and $\lambda/2$. In this work, we only consider the cases in which no pixel SR methods are employed; the results can be easily extended to cases involving pixel SR methods.
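The claim that the overall limit is set by the smallest first cut-off frequency can be checked numerically by multiplying sampled envelopes and locating the first zero of the product. The sketch below assumes sinc-shaped envelopes, $\mathrm{sinc}(z_2\Delta\lambda u^2/2)$ for a rectangular source spectrum and $\mathrm{sinc}(\Delta s\,z_2 u/z_1)$ for a one-dimensional uniform source; these forms and all parameter values are illustrative assumptions, not quotations of Eqs. (13) and (16).

```python
import numpy as np

# Hypothetical parameters (µm), chosen only for illustration.
Z1 = 50_000.0      # source-to-sample distance
Z2 = 800.0         # sample-to-sensor distance
D_LAMBDA = 0.02    # spectral width
D_S = 200.0        # source width


def tctf_envelope(u):
    """Assumed temporal-coherence envelope (rectangular spectrum, paraxial)."""
    return np.sinc(Z2 * D_LAMBDA * u**2 / 2.0)  # np.sinc(x) = sin(pi x) / (pi x)


def sctf_envelope(u):
    """Assumed spatial-coherence envelope (1D uniform source of width D_S)."""
    return np.sinc(D_S * Z2 * u / Z1)


def first_zero(response, u):
    """Return the first sampled frequency at which the response is <= 0."""
    idx = int(np.argmax(response <= 0.0))
    return float(u[idx]) if response[idx] <= 0.0 else float("inf")


u = np.linspace(0.0, 1.0, 200_001)            # spatial frequency, cycles/µm
product = tctf_envelope(u) * sctf_envelope(u)

u_temporal = np.sqrt(2.0 / (Z2 * D_LAMBDA))   # analytic first zero of the TCTF envelope
u_spatial = Z1 / (D_S * Z2)                    # analytic first zero of the SCTF envelope
u_product = first_zero(product, u)             # numerical first zero of the product

print(f"first zeros: temporal {u_temporal:.4f}, spatial {u_spatial:.4f}, "
      f"product {u_product:.4f} cycles/µm")
print(f"half-pitch limit 1/(2u) = {1.0 / (2.0 * u_product):.3f} µm")
```

With these numbers the spatial-coherence envelope has the lower first zero (0.3125 cycles/µm), so it alone sets the half-pitch limit of about 1.6 µm, even though the temporal-coherence envelope is already below unity at that frequency.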
For example, in a typical experimental configuration, the sample-to-sensor distance is 450 µm and the source-to-sample distance is about 10 cm. The illumination source has a central wavelength of 600 nm, a spectral width of 10 nm, and a luminous area of $100^2\pi$ µm², and the sensor has a pixel size of 1.67 µm and an imaging area of 6466 × 4615 µm². According to Eqs. (13), (16), (20), and (23), when no pixel SR methods are employed, the final resolution is limited by the pixel size, whereas when pixel SR methods are adopted, the reconstruction is constrained principally by the spectral width $\Delta\lambda$. Thus, in a conventional experimental system, the pixel size is the key limiting factor for high-resolution object reconstruction, but pixel SR methods can effectively overcome this loss of spatial resolution. In addition, the spectral width of the source is usually another major limiting factor for resolution improvement, and it is difficult to solve or alleviate with numerical methods alone.
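This comparison can be scripted as a simple resolution budget using the example parameters above. The Python sketch assumes closed forms for Eqs. (13), (16), and (20), which are not reproduced in this section: $\sqrt{z_2\Delta\lambda/8}$ for temporal coherence (first zero of a sinc envelope for a rectangular spectrum), $\Delta s\,z_2/(2z_1)$ for spatial coherence (a one-dimensional uniform source of width $\Delta s$), and $\Delta p/w$ for the pixel limit. The temporal and spatial forms reproduce the theoretical values quoted in the experimental sections (1.936 µm, 2.371 µm, and 5.81 µm), but they remain assumptions. It further assumes that the $100^2\pi$ µm² luminous area is a disc of diameter $\Delta s$ = 200 µm, that the sub-FOV is a square spanning the 4615 µm short side of the sensor, and that $w$ = 4 is a hypothetical SR up-sampling factor.

```python
import math


def temporal_limit(z2, d_lambda):
    # Assumed form of Eq. (13): half-pitch at the first zero of sinc(z2*Δλ*u^2/2).
    return math.sqrt(z2 * d_lambda / 8.0)


def spatial_limit(z1, z2, d_s):
    # Assumed form of Eq. (16): 1D uniform source of width Δs.
    return d_s * z2 / (2.0 * z1)


def subfov_limit(delta_l, z2, wavelength):
    # Eq. (23): λ * sqrt(z2^2 + (ΔL/2)^2) / ΔL.
    return wavelength * math.hypot(z2, delta_l / 2.0) / delta_l


def resolution_budget(wavelength, d_lambda, z1, z2, d_s, pixel, delta_l, w=1):
    """Return (limiting factor, half-pitch in µm, all limits); w = 1 means no pixel SR."""
    limits = {
        "temporal coherence": temporal_limit(z2, d_lambda),
        "spatial coherence": spatial_limit(z1, z2, d_s),
        "pixel size": pixel / w,  # assumed form of Eq. (20), with SR up-sampling w
        "sub-FOV extent": subfov_limit(delta_l, z2, wavelength),
        "diffraction limit": wavelength / 2.0,
    }
    factor = max(limits, key=limits.get)  # largest half-pitch = lowest cut-off
    return factor, limits[factor], limits


# Example system from the text (µm); d_s = 200 and delta_l = 4615 are assumptions.
params = dict(wavelength=0.6, d_lambda=0.01, z1=100_000.0, z2=450.0,
              d_s=200.0, pixel=1.67, delta_l=4615.0)

for w in (1, 4):  # w = 4 is a hypothetical SR up-sampling factor
    factor, q, limits = resolution_budget(**params, w=w)
    print(f"w = {w}: limited by {factor} ({q:.3f} µm)")
    for name, value in limits.items():
        print(f"    {name:<18s} {value:.3f} µm")
```

Without SR ($w$ = 1) the pixel term (1.67 µm) is the largest; with $w$ = 4 the temporal-coherence term (0.75 µm) takes over, consistent with the statement that the spectral width then becomes the main limitation.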

Optimization of the imaging resolution for an LFOCDHM system
Our theoretical models can also be used to optimize the optical design and improve the imaging resolution when designing an LFOCDHM system. We recommend the following procedure.

During the system construction stage:
1. Choose the light source with the best possible temporal and spatial coherence.
2. For a source with low temporal coherence, such as an LED, use a narrow band-pass filter to increase the temporal coherence.
3. For a source with low spatial coherence and a large light-emitting area, insert a small pinhole in front of the source to increase the spatial coherence.
4. Use an imaging sensor with the smallest possible pixel size to reduce aliasing.

During the data acquisition stage:
1. Minimize the sample-to-sensor distance $z_2$ to reduce the influence of the temporal coherence of the source.
2. Maximize the ratio between the source-to-sample distance $z_1$ and the sample-to-sensor distance $z_2$ to reduce the influence of the spatial coherence of the source.
3. Minimize the sample-to-sensor distance $z_2$ to reduce the influence of the finite extent of the reconstructed sub-FOV.
4. For imaging phase objects, use the multi-height phase retrieval algorithm with large sample-to-sensor distances $z_2$ to guarantee reliable phase recovery, especially for low-frequency components.

It should be emphasized that $z_1$ affects only the spatial coherence, whereas $z_2$ affects the temporal coherence, the spatial coherence, and the required size of the reconstructed region.

During the data processing stage:
1. Choose the largest possible reconstructed sub-FOV to reduce the influence of its finite extent.
2. Choose the reconstructed sub-FOV so that the targeted object lies at its center.

… sample that is mounted on a slide holder, and a CMOS image sensor chip (DMM 27UJ003-ML, The Imaging Source, Germany) is placed below the sample. To quantify the effect of the above-mentioned factors on the reconstruction results, we separately change the temporal [Fig. 10(c)] and spatial [Fig. 10(d)] coherence of the light source, the pixel size of the imaging sensor, and the reconstructed region.

Influence of temporal coherence on imaging resolution
To quantify the change in spatial resolution caused by each of the above factors separately, we first change the temporal coherence of the light source by inserting different optical band-pass filters (spectral bandwidths $\Delta\lambda$ = 20 and 30 nm) into the experimental system. The partially coherent illumination is provided by a light-emitting diode (LED) placed far from the sample plane ($z_1$ on the order of 20 cm) to eliminate the effect of spatial coherence. Figure 11(a) shows the raw image directly captured by the camera, and Fig. 11(b) shows the reconstructed region, which is large enough not to affect the spatial resolution. The central wavelength of the illumination source is ∼ 520 nm, and the resolution target is ∼ 1499 µm ($z_2$) away from the sensor. When the spectral width is 20 nm, the theoretical half-pitch resolution calculated from Eq. 13 is 1.936 µm, and the actual reconstruction resolution is ∼ 1.953 µm, corresponding to the 1st element in group 8 of the resolution target, as shown in Figs. 11(c,d,e). Similarly, Figs. 11(f,g,h) show that with the spectral width $\Delta\lambda$ = 30 nm the reconstruction resolution is about 2.461 µm (5th element in group 7), while the theoretical resolution is around 2.371 µm, which lies between the 5th and 6th elements in group 7. Thus, the reconstructed results agree well with the theoretical values calculated by Eq. 13. Note that in our experiment, we directly back-propagate the image from the sensor plane to the object plane with the angular spectrum method, and no phase retrieval procedure is used to eliminate the twin-image artifacts in the background of the reconstructed images.
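The numbers in this comparison can be reproduced with a few lines of Python. The sketch assumes that Eq. 13 takes the paraxial form $q = \sqrt{z_2\Delta\lambda/8}$, i.e., the first zero of the envelope $\mathrm{sinc}(z_2\Delta\lambda u^2/2)$ obtained by averaging the free-space transfer function over a rectangular spectrum of width $\Delta\lambda$; this form reproduces the 1.936 µm and 2.371 µm quoted above, but it is an assumption here. The USAF 1951 conversion uses the standard relation of $2^{\text{group} + (\text{element}-1)/6}$ line pairs per mm, so the half-pitch (line width) is $1000/(2 \times \text{lp/mm})$ µm.

```python
import math


def temporal_half_pitch(z2_um, d_lambda_um):
    """Assumed paraxial form of Eq. 13.

    The envelope sinc(z2*Δλ*u^2/2) first vanishes at u = sqrt(2 / (z2*Δλ));
    the half-pitch is half the corresponding period, 1 / (2u).
    """
    u_cut = math.sqrt(2.0 / (z2_um * d_lambda_um))  # cycles per µm
    return 1.0 / (2.0 * u_cut)


def usaf_half_pitch(group, element):
    """Line width (half-pitch) in µm of a USAF 1951 target element."""
    lp_per_mm = 2.0 ** (group + (element - 1) / 6.0)
    return 1000.0 / (2.0 * lp_per_mm)


# Experimental values from the text: z2 ≈ 1499 µm, Δλ = 20 nm and 30 nm.
for d_lambda in (0.020, 0.030):
    print(f"Δλ = {d_lambda * 1000:.0f} nm: theory "
          f"{temporal_half_pitch(1499.0, d_lambda):.3f} µm")

for group, element in ((8, 1), (7, 5), (7, 6)):
    print(f"group {group}, element {element}: {usaf_half_pitch(group, element):.3f} µm")
```

Group 8, element 1 (1.953 µm) and group 7, element 5 (2.461 µm) are the reported experimental resolutions, and group 7, element 6 (about 2.192 µm) is finer than the 30 nm prediction, consistent with the theoretical value lying between the 5th and 6th elements. Under this assumed form, the limit does not depend on the central wavelength.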

Influence of spatial coherence on imaging resolution
Next, we change the spatial coherence of the source by inserting pinholes of different diameters ($\Delta s$ = 1.0 and 1.3 mm) to verify the correctness of Eq. 16. The luminous area of an LED is usually on the order of several hundred microns; to show the influence of spatial coherence on resolution more intuitively, a diffuser is placed between the source and the pinhole so that the luminous area equals the pinhole size. The center wavelength $\lambda$ is ∼ 620 nm and the sample-to-sensor distance is $z_2$ = 465 µm. Figure 12 shows the reconstruction results, recovered by back-propagating the captured image to the object plane with the angular spectrum method. For $\Delta s$ = 1 mm, the reconstructed results at different source-to-sample distances $z_1$ are shown in Figs. 12(b1-b3). When $z_1$ is 4 cm, the theoretical half-pitch resolution is 5.81 µm, and the actual reconstructed resolution is ∼ 6.20 µm, corresponding to the 3rd element of group 6. The 4th element in group 6, whose half-pitch of 5.52 µm is below the theoretical limit, can hardly be distinguished, as shown in Fig. 12(b1). For $\Delta s$ = 1.3 mm, the experimental results also agree well with the theoretical values, as shown in Figs. 12(d1-d3). The line profiles along different resolution elements are illustrated in Figs. 12(f1-f3). Moreover, for a fixed $z_1$, a smaller $\Delta s$ provides higher resolution. Thus, in actual experiments, we can simply increase the source-to-sample distance $z_1$ to reduce the influence of spatial coherence, which is equivalent to reducing $\Delta s$.
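Eq. 16 is not reproduced in this section; the sketch below assumes it has the form $q = \Delta s\,z_2/(2z_1)$, the first zero of $\mathrm{sinc}(\Delta s\,z_2 u/z_1)$ for a one-dimensional uniform source of width $\Delta s$, because a source point at lateral offset $s$ displaces the hologram on the sensor by approximately $s\,z_2/z_1$. With $\Delta s$ = 1 mm, $z_1$ = 4 cm, and $z_2$ = 465 µm it gives the 5.81 µm quoted above. It also makes explicit why increasing $z_1$ is equivalent to reducing $\Delta s$: under this form only the ratio $\Delta s/z_1$ enters. The helper `min_source_distance` is hypothetical.

```python
def spatial_half_pitch(d_s_um, z1_um, z2_um):
    """Assumed form of Eq. 16.

    A source point at offset s shifts the hologram by about s * z2 / z1, so a
    uniform source of width Δs produces the envelope sinc(Δs * z2 * u / z1),
    whose first zero lies at u = z1 / (Δs * z2).
    """
    u_cut = z1_um / (d_s_um * z2_um)  # cycles per µm
    return 1.0 / (2.0 * u_cut)


def min_source_distance(d_s_um, z2_um, q_um):
    """Smallest z1 for which the assumed spatial-coherence limit is at most q."""
    return d_s_um * z2_um / (2.0 * q_um)


z2 = 465.0  # sample-to-sensor distance from the text, µm
print(f"Δs = 1.0 mm, z1 = 4 cm:   {spatial_half_pitch(1000.0, 40_000.0, z2):.2f} µm")
# Same Δs/z1 ratio gives the same limit: a larger pinhole can be offset by a larger z1.
print(f"Δs = 1.3 mm, z1 = 5.2 cm: {spatial_half_pitch(1300.0, 52_000.0, z2):.2f} µm")
print(f"z1 needed for Δs = 1.3 mm, q = 5.81 µm: "
      f"{min_source_distance(1300.0, z2, 5.81) / 1e4:.2f} cm")
```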

Influence of pixel size on imaging resolution
In actual experiments, the pixel size of the image sensor is a key factor directly limiting the achievable spatial resolution. Although increasing pixel counts and shrinking pixel sizes has become a major trend in consumer electronics, the minimum pixel size of commercially available imaging sensors is around 0.8 µm, which is still much larger than the coherent diffraction limit. To give an intuitive comparison of the influence of pixel size on imaging resolution, we use cameras with different pixel sizes (1.67, 2.2, 3.75, and 4.4 µm) to record the diffraction patterns. Figures 13(a1-d1) show the reconstructed areas, and the reconstructed results are illustrated in Figs. 13(a2-d2). The wavelength of the source is 620 nm, the source-to-sample distance $z_1$ is large enough (usually on the order of 20 cm) to exclude the influence of spatial coherence, and the sample-to-sensor distance $z_2$ is 465 µm. The line profiles of the smallest resolvable elements, shown in Figs. 13(a3-d3), indicate that the experimental results agree with the theoretical values limited by the pixel sizes.

Fig. 13. The effect of pixel size on the spatial resolution. The directly reconstructed results with different pixel sizes: 1.67 µm (a1-a3), 2.2 µm (b1-b3), 3.75 µm (c1-c3), and 4.4 µm (d1-d3).

Influence of the reconstructed region on imaging resolution
In this experiment, the center wavelength of the light source $\lambda$ is 620 nm, and the sample-to-sensor distance $z_2$ is 547 µm. According to Eq. 23, the size of the area selected for reconstruction affects the final imaging resolution. Figure 14(a) shows the whole captured image, and the pink rectangular area (side length 198 µm) is extracted for holographic reconstruction. The result is shown in Fig. 14(b), and the corresponding line profiles in Fig. 14(f1) indicate a resolution of at least 1.74 µm. When we select another nearby region of the same size, we obtain the reconstruction result shown in Fig. 14(c). If we reduce the reconstructed region to the yellow boxed area (side length 110 µm) in Figs. 14(b-c), the results in Figs. 14(d-e) show that the reconstructed resolution decreases significantly. The line profiles in Figs. 14(g1-g2) show that the resolution is reduced to only 3.10 µm (3rd element in group 7), which again agrees with the theoretical prediction: Eq. 23 gives about 1.74 µm and 3.10 µm for side lengths of 198 µm and 110 µm, respectively.
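The sub-FOV reconstruction used in these experiments, cropping a square region of the raw hologram and back-propagating it to the object plane with the angular spectrum method, can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes a square crop, uses the square root of the recorded intensity as the field amplitude with zero phase (so the twin image is not removed), and discards evanescent components. The function names, the random stand-in hologram, and the 1.67 µm pixel pitch in the example are hypothetical.

```python
import numpy as np


def angular_spectrum(field, wavelength, dx, z):
    """Propagate a sampled complex field by distance z (z < 0 back-propagates).

    All lengths share one unit (e.g. µm); dx is the pixel pitch.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    fxx, fyy = np.meshgrid(fx, fy)               # shape (ny, nx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0.0, np.exp(1j * kz * z), 0.0)  # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def reconstruct_sub_fov(hologram, row0, col0, side_px, wavelength, dx, z2):
    """Crop a square sub-FOV from an intensity hologram and back-propagate it by z2."""
    crop = hologram[row0:row0 + side_px, col0:col0 + side_px].astype(float)
    amplitude = np.sqrt(np.clip(crop, 0.0, None))  # no phase retrieval
    return angular_spectrum(amplitude, wavelength, dx, -z2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hologram = rng.uniform(0.5, 1.5, size=(512, 512))  # stand-in for a raw frame
    dx = 1.67                                            # hypothetical pixel pitch, µm
    side_px = int(round(198.0 / dx))                     # about a 198 µm sub-FOV
    obj = reconstruct_sub_fov(hologram, 100, 100, side_px, 0.62, dx, 547.0)
    print(obj.shape, float(np.abs(obj).mean()))
```

Because the crop is zero-padding-free, the FFT treats it as periodic; in practice one may pad or apodize the crop edges, which is a design choice not discussed in the text.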
In addition to the size of the reconstructed sub-FOV, the location of the measured object within the selected sub-FOV also affects the reconstructed resolution. As shown in Figs. 14(b-c), the 2nd element in group 8 is distinguishable in Fig. 14(b) but not in Fig. 14(c). Thus, to ensure the expected high reconstruction resolution, the reconstructed sub-FOV should not be too small, and the objects to be reconstructed should lie within the limited central region of each reconstructed sub-FOV. Meanwhile, according to Eq. 23, the sample-to-sensor distance $z_2$ should not be too large; otherwise, the reconstructed region must be expanded accordingly to maintain the reconstruction resolution, which may significantly prolong the processing time and complicate the practical implementation of the reconstruction algorithm.

Conclusions and Discussions
In this work, we have systematically studied the effect of five major factors on the imaging resolution of an LFOCDHM system, i.e., the sample-to-sensor distance, the spatial and temporal coherence of the illumination, the finite size of the equally spaced sensor pixels, and the finite extent of the image sub-FOV used for reconstruction. From the above analysis and experiments, we conclude that the factor most restricting the imaging resolution of LFOCDHM is the sensor pixel size, because the side effects of the other experimental factors are relatively easy to handle: for example, a laser can serve as a nearly ideal temporally coherent light source, and the source-to-sample distance can be increased to obtain a nearly ideal spatially coherent source. To reduce the effective pixel size of the imaging sensor, pixel SR algorithms should be used. Even so, an imaging sensor with a smaller pixel size still improves the quality of SR reconstructions. Specifically, suppose the expected reconstructed resolution is around 1 µm; the required up-sampling factor $w$ then differs for different pixel sizes. When the pixel size is closer to the desired resolution, $w$ is smaller, so less information is required for the reconstruction. When a higher up-sampling factor $w$ is required (for a large pixel size), more criss-crossed frequency gaps appear, which can never be recovered even with pixel SR reconstruction algorithms. Thus, for LFOCDHM techniques, a smaller pixel size is very helpful for achieving higher resolution and requires less information to reach the expected super-resolved resolution. On the other hand, using an LED as the light source makes the system more compact, portable, and low-cost, but the limited coherence of the LED also affects the reconstructed resolution. According to Eqs. (13) and (16), increasing $z_1$ and decreasing $z_2$ effectively improve the coherence of the illumination and thus the imaging resolution. Furthermore, for a given desired resolution, decreasing $z_2$ reduces the required reconstructed area according to Eq. 24.
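These design rules (increase $z_1$, decrease $z_2$, and choose the up-sampling factor $w$ according to the pixel size) can be turned into explicit parameter bounds for a target half-pitch $q$. The Python sketch below is a minimal illustration, assuming that the temporal- and spatial-coherence limits take the forms $\sqrt{z_2\Delta\lambda/8}$ and $\Delta s\,z_2/(2z_1)$ (first zeros of sinc-shaped envelopes for a rectangular spectrum and a one-dimensional uniform source), that the pixel-limited half-pitch is $\Delta p/w$, and using Eq. 24 for the sub-FOV. The function name and all numbers except the four camera pixel sizes are hypothetical.

```python
import math


def design_bounds(q, wavelength, d_lambda, d_s, z2, pixel):
    """Parameter bounds that keep each assumed limit at or below a target half-pitch q.

    All lengths in micrometres. Temporal and spatial limits use the assumed forms
    sqrt(z2*Δλ/8) and Δs*z2/(2*z1); the sub-FOV bound is Eq. 24.
    """
    if q <= wavelength / 2.0:
        raise ValueError("target q must exceed the coherent limit λ/2")
    return {
        "z2_max": 8.0 * q**2 / d_lambda,              # from sqrt(z2*Δλ/8) <= q
        "z1_min": d_s * z2 / (2.0 * q),               # from Δs*z2/(2*z1) <= q
        "sub_fov_min": 2.0 * wavelength * z2 / math.sqrt(4.0 * q**2 - wavelength**2),
        "w_min": max(1, math.ceil(pixel / q)),        # from Δp/w <= q
    }


# Hypothetical design: 1 µm target, 520 nm LED with a 10 nm filter,
# 100 µm effective source width, z2 = 500 µm.
for pixel in (1.67, 2.2, 3.75, 4.4):  # pixel sizes of the cameras used in Fig. 13
    b = design_bounds(q=1.0, wavelength=0.52, d_lambda=0.01, d_s=100.0,
                      z2=500.0, pixel=pixel)
    print(f"Δp = {pixel} µm: w >= {b['w_min']}, z1 >= {b['z1_min'] / 1e4:.1f} cm, "
          f"z2 <= {b['z2_max']:.0f} µm, ΔL >= {b['sub_fov_min']:.0f} µm")
```

For these hypothetical numbers, $z_2$ = 500 µm stays below the 800 µm temporal bound, the source must be at least 2.5 cm above the sample, a sub-FOV of roughly 270 µm is needed, and the required up-sampling factor grows from 2 to 5 as the pixel size increases from 1.67 µm to 4.4 µm.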
The transfer-function analysis of these parameters gives a quantitative resolution limit determined by the minimum first cut-off frequency among the transfer functions. Based on this quantitative relationship, a preliminary estimate of the ultimate resolution achievable with SR methods can also be obtained. Thus, the derived theoretical models provide useful guidance for choosing appropriate system parameters to obtain higher imaging resolution. To verify the validity of each theoretical model, we used a controlled-variable approach, changing only one or two parameters in each experiment, and used a resolution target to quantify the imaging resolution. The experimental results confirm the validity of our theoretical models.
Finally, although in this work we have demonstrated how our theoretical models can be used to improve the imaging resolution by optimizing the optical design of an LFOCDHM system, it should also be possible to counteract the effects of imperfect system parameters through computational approaches. Based on the derived transfer functions, the forward image formation model (from object to image) can be readily established for a given LFOCDHM system. An appropriate algorithm can then be adopted to recover the ideal object information from the actual measurement, i.e., to solve the corresponding inverse problem. In future work, we will address the resolution reduction associated with these factors and compensate for their adverse impact through post-processing algorithms.