Integral Imaging-Based Light Field Display System With Optimum Voxel Space

Abstract-Integral imaging (InIm) based light field display (LFD) still faces challenges in the complexity of light field data acquisition and generation and in its unsatisfactory display effects. This paper presents an InIm-based LFD system with an optimum voxel space. The study's main innovation lies in two aspects. First, it reveals the intrinsic voxel spatial distribution of the InIm-based LFD and analyzes how the voxel space affects display performance metrics such as spatial resolution, depth of field, and smoothness of parallax. Second, it proposes a method to generate an elemental image array (EIA) from a pair of RGB and depth (RGBD) images based on the optimally selected voxel space. In the experiments, we tested the display performance of the voxels on different depth planes and obtained results consistent with the theoretical analyses. We also experimented with a computer 3D model and a real-world scene on two InIm-based LFD prototypes working in different modes, one in real mode and the other in virtual mode, and obtained favorable 3D display effects. The proposed system has a simple and compact hardware structure and shows flexibility and scalability in light field data acquisition. It also applies to various 3D scenes, whether virtual models or real objects. We expect the proposed system to help bring InIm-based LFD technology into practical application.


Ze-Sheng Liu, Da-Hai Li, and Huan Deng

I. INTRODUCTION
THE real world around us can be described as the light field of a three-dimensional (3D) space. 3D display technologies based on light field theories reproduce vivid 3D images by reconstructing the light ray distributions of 3D scenes [1], [2], [3]. Among them, integral imaging (InIm) based light field display (LFD) attracts great attention and is considered one of the most promising true 3D display techniques due to its advantages of compact form factor and viewing comfort [4], [5], [6]. Generally, InIm-based LFD comprises two stages: light field acquisition and light field reconstruction. Research in both areas has made significant advances over the past few decades. With respect to light field acquisition, traditional optical methods primarily use a camera array [7], [8], [9] or a light field camera [10], [11] to capture the light field of 3D scenes directly.

The authors are with the College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China (e-mail: shine_lzs@outlook.com; lidahai@scu.edu.cn; huandeng@scu.edu.cn).
Digital Object Identifier 10.1109/JPHOT.2024.3355444

Camera array-based approaches have advantages in terms of high resolution and real-time pickup but are still limited by bulky structures and difficulties in synchronization and calibration. Light field camera-based acquisition systems become much more compact and portable by inserting a micro-lens array (MLA) between the image sensor and the primary lens of a conventional camera; however, the spatial resolution of the recorded sub-images decreases significantly as the number of micro-lenses grows. Computer-generated integral imaging (CGII) approaches [12], [13], [14], [15] build geometric models of the 3D objects and the viewpoints of the LFD device in a computer and render multi-view images using ray tracing. CGII can effectively suppress image crosstalk and avoid lens aberrations, making it a popular light field acquisition scheme. However, it has to generate dense perspective views for specific viewpoints, which entails heavy computation. Some researchers have introduced a GPU-based parallel computing scheme [16] and a lookup table (LUT) based scheme [17] to accelerate the generation of elemental images in CGII. Overall, CGII relies on virtual models built in the computer to collect the light field information and is thus unsuitable for handling real-world 3D scenes. Besides, neither CGII nor conventional optical acquisition methods can escape the dependence on the structures and specifications of LFD devices, which limits the possibility of collecting image content through diverse channels, as is done for ordinary 2D displays.
We have noticed that the LFD device has inherent voxel space properties, which indicates that the reconstruction space for 3D images is independent of the actual spatial size and position of the 3D scene. Therefore, 3D images can be reconstructed by restoring the texture and depth data of the 3D scene to the voxel space of the LFD device. In the 3D sensing area, many technologies [18], [19] and products [20], [21] exist for collecting the 3D data of real-world objects. Depth estimation methods based on deep learning algorithms [22], [23] have also been reported to estimate depth data from a monocular image. These technologies provide many options for light field acquisition. By applying the 3D data acquired through various techniques to the InIm-based LFD, the light field acquisition process can be decoupled from light field reconstruction, and either virtual or real-world 3D scenes can be easily reconstructed.
On the other hand, the display performance of InIm-based LFD still falls short of people's expectations of high-quality 3D displays. High spatial resolution, high visual presence, and smooth motion parallax are the main goals pursued. However, the finite pixel density of display panels restricts the amount of information that can be displayed and results in a trade-off between the spatial and angular resolutions of the reconstructed 3D images [24]. Even though sustained efforts have been devoted to improving spatial resolution and viewing angle [25], [26], because of the complex volumetric properties of 3D images [27], these methods raise the upper limit of the display performance to a certain extent but still cannot break through the constraint of the characteristic formula [24]. Thus, it is of practical importance to design the spatial range of the reconstructed 3D images reasonably, so as not to exceed or waste the total performance budget of the display device.
This paper presents an InIm-based LFD system with an optimum voxel space. From the perspective of voxel reconstruction, we reveal the intrinsic voxel spatial distribution of InIm-based LFDs and discuss the relationships between the voxel spatial distribution and the system's display performance, such as spatial resolution, depth of field, and parallax smoothness. We then propose a method to generate an EIA from a pair of RGB and depth (RGBD) images based on the optimally selected voxel space. The proposed system decouples the light field acquisition and reconstruction processes and supports light field acquisition, EIA generation, and 3D display for both virtual and real-world 3D scenes with a simple and compact hardware configuration. The EIA generation process fully considers the display performance of the InIm-based LFD and ensures that it operates at its best.

II. PRINCIPLE OF THE PROPOSED SYSTEM

A. Configuration of the Proposed InIm-Based LFD System
The schematic diagram of the system configuration is shown in Fig. 1. The proposed system consists of three units: a light field acquisition unit, an image processing unit, and a 3D display unit. The light field acquisition unit uses a commercial depth camera to capture a real-world 3D scene and outputs a pair of RGBD images; 3D modeling software such as 3ds Max can likewise render RGBD image pairs for a computer 3D model. The image processing unit is the core of the proposed system: it processes the RGBD image pair to match the optimally selected voxel space and generates an EIA that fulfills the requirements of the 3D display unit.

B. Voxel Space and Its Optimum Selection
For the InIm-based LFD system, a voxel is formed by the integration of light rays in the light field space, where the rays are emitted from a set of homologous pixels on the display panel and refracted by the micro-lenses of the MLA. For the sake of analysis, the pixels are treated as point light sources, and the micro-lenses are considered thin lenses. The spatial position of each voxel's geometric center is therefore determined by the intersection of several light rays called reconstruction light rays (RLRs). Each RLR is emitted from a pixel and passes through the optical center of a micro-lens. Though voxels in different spatial positions vary in size and shape, their geometric centers are distributed regularly on several parallel depth planes because of the rasterized and uniform arrangement of the pixels and micro-lenses. According to the geometric relationships, the complete voxel spatial distribution of an InIm-based LFD is uniquely determined by its hardware parameters, as shown in Fig. 2(a). A set of characteristic parameters of the voxel space can be deduced as follows.
Take the horizontal case as an example. Assuming each micro-lens covers R pixels, there will be R RLRs propagating in the light field space with a uniform angular spacing Δθ, as expressed in (1), where p_d is the pixel size of the 2D display panel and g is the gap between the 2D display panel and the principal plane of the MLA. The number N_dp of depth planes on which the voxels are distributed is given by (2). Let the kth depth plane be DP(k), where 1 ≤ k ≤ N_dp; the distance of DP(k) from the MLA, denoted z(k), can be calculated from (3). The gap between two adjacent depth planes follows as (4), and the spacing D_v(k) between adjacent voxels on DP(k) can be deduced as (5), where p represents the pitch of the micro-lens. Equations (1)-(5) describe the characteristic parameters of an InIm-based LFD. They apply to the thick-lens-array model as well, as shown in Fig. 2(b), where g is the distance from the 2D display panel to the first principal plane of the MLA. Note that Fig. 2 exhibits only the voxel space in the real image space (z(k) > 0); the extensions of the RLRs also intersect one another and form a similar voxel space behind the 2D display panel, which corresponds to the virtual image space with z(k) < 0. Fig. 3 shows the complete voxel space of the InIm-based LFD system, where the real and virtual image spaces are symmetric about the MLA.

The definition of 3D images and visual comfort are the intuitive visual metrics for a 3D display. For the InIm-based LFD, the definition of reconstructed 3D images can be measured by their depth of field and spatial resolution, while visual comfort typically refers to the smoothness with which a viewer perceives the parallax images and to the human eye's accommodation response to the 3D images; both are related to the number of RLRs that form a voxel. In addition, the image quality perceived by the human eye is also affected by texture features, especially the richness of high-frequency details: the 3D image reconstruction ability required for high-frequency content differs from that for low-frequency content [28]. According to the characteristic parameters described above, the features of the voxels vary across depth planes and result in significant differences in the 3D images reconstructed on them. Therefore, a suitable depth range should be selected from the entire voxel space to achieve high-performance reconstruction of the 3D images.
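As a numerical illustration of these characteristic parameters, the sketch below adopts one consistent reading of the thin-lens geometry above: each micro-lens is assumed to cover exactly R = p/p_d pixels, and DP(k) is taken as the plane where RLRs from lenses one pitch apart, whose pixel indices differ by R - k, intersect. The formulas coded here are our reading of the geometry, not the paper's printed equations (1)-(5), and the numeric values are illustrative.

```python
import math

def voxel_space_params(p_d, p, g):
    """Characteristic parameters of the voxel space for a thin-lens
    InIm model (a sketch of the geometry described in the text; the
    paper's exact equations (1)-(5) may differ in detail).

    p_d : pixel pitch of the display panel
    p   : micro-lens pitch
    g   : gap between the panel and the MLA principal plane
    """
    R = round(p / p_d)                 # pixels covered by one micro-lens
    d_theta = math.atan(p_d / g)       # angular spacing of the RLRs
    n_dp = R - 1                       # depth planes in the real image space
    # Depth of DP(k): rays from lenses one pitch apart whose pixel
    # indices differ by (R - k) intersect at z(k) = p*g / ((R - k)*p_d),
    # so z grows (and eventually blows up) as k approaches R.
    z = [p * g / ((R - k) * p_d) for k in range(1, n_dp + 1)]
    # Lateral voxel spacing on DP(k): a one-pixel ray fan magnified to z(k).
    d_v = [p_d * zk / g for zk in z]
    return R, d_theta, n_dp, z, d_v
```

For example, with p = 1.0 mm, p_d = 0.05 mm, and g = 3.0 mm (hypothetical values, not the prototype's), this gives R = 20 RLRs per lens, 19 real-space depth planes, and depths from about 3.2 mm up to 60 mm, reproducing the "dramatically expanded depth values" of the farthest planes noted in Section III.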
As seen in Fig. 2, some voxels are missing on the depth planes near the MLA, so these depth planes should be discarded. Moreover, the positions of some voxels exceed the lateral size of the display panel; light rays emitted from those voxels can hardly be perceived by viewers located within the effective viewing angle of the InIm-based LFD. Thus, the lateral range of each depth plane can be clipped to the width of the 2D display panel. In this case, the number of voxels on depth plane DP(k) can be expressed as (6), where w_d is the width of the display panel.
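A quick sketch of the clipping in (6): count how many voxel spacings D_v(k) fit within the panel width w_d (our simplification; the exact rounding in (6) may differ).

```python
def voxels_per_plane(w_d, d_v_k):
    """Number of voxels on a depth plane after clipping to the panel.

    w_d   : width of the 2D display panel
    d_v_k : voxel spacing D_v(k) on that depth plane
    """
    # Floor division: only whole voxel spacings inside the panel count.
    return int(w_d // d_v_k)
```

For instance, a 6.4 mm panel with 0.5 mm voxel spacing holds 12 voxels on that plane (illustrative numbers only).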
The spatial resolution of the voxels mainly relates to their quantity and size. From (5) and (6), it follows that, given a pixel size and a micro-lens pitch, the number of voxels on each depth plane is fixed. The voxel sizes depend on the position of the central depth plane (CDP), which can be deduced from Gauss's lens formula (7), where f denotes the focal length of the micro-lens. The voxel size on the CDP can be represented by the image spot of a pixel through the MLA. Ignoring the aberrations of the micro-lens, voxels on the CDP always have the smallest lateral sizes and do not overlap one another. Thus, one can adjust the position of the CDP by modifying the gap g to achieve the highest voxel spatial resolution around a specific depth plane. On depth planes far away from the CDP, the voxel size increases with the depth z from the MLA, and adjacent voxels begin to overlap. The voxel size S_v in such conditions can be deduced from the geometric relationships shown in Fig. 4(a), as given in (8). Following the idea of the Rayleigh criterion, the degree of voxel overlap can serve as the criterion for the expressible depth range [29]; the overlap degree β between two adjacent voxels can be expressed as (9), where S_v and D_v are the voxel size and voxel spacing shown in Fig. 4(b), respectively. Setting a suitable β, the depths of the two edge planes, denoted z_min and z_max, can be determined, and the total expressible depth range, also called the depth of field of the reconstructed 3D images, follows as (10). Note that the Rayleigh criterion is an objective and relatively strict condition. The tolerance of the human visual system (HVS) to image blur is much higher than that of an optical system [30], so in most practical applications the actual threshold of the overlap degree is larger. The contrast sensitivity function (CSF) model [31] of the HVS can be used to quantitatively evaluate and experimentally measure the threshold of β.
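The CDP position from Gauss's formula, and one plausible overlap measure consistent with Fig. 4(b), can be sketched as follows. The `overlap_degree` definition is our assumption (zero when voxels just touch, growing as they overlap); (9) may normalize differently.

```python
def cdp_position(f, g):
    """Central depth plane distance from the MLA via the thin-lens
    (Gauss) formula 1/g + 1/z = 1/f, i.e. z = f*g / (g - f).
    Real mode: g > f gives z > 0 (CDP in front of the MLA).
    Virtual mode: g < f gives z < 0 (CDP behind the panel)."""
    return f * g / (g - f)

def overlap_degree(s_v, d_v):
    """Overlap between adjacent voxels of size s_v and spacing d_v
    (an assumed definition): 0 when they just touch (s_v <= d_v),
    approaching 1 as the overlap dominates the voxel size."""
    return max(0.0, (s_v - d_v) / s_v)
```

With hypothetical values f = 3.0 mm and g = 3.5 mm, the CDP sits 21 mm in front of the MLA; with g = 2.5 mm it moves 15 mm behind the panel, matching the real/virtual mode behavior described in Section III.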
Smooth motion parallax plays an important role in relieving the vergence-accommodation conflict (VAC) [32], [33], [34]. Within a continuous viewing angle, better parallax smoothness comes from more RLRs forming each voxel. As shown in Fig. 2, the correspondence between a voxel and a set of homologous pixels is given by (11), where Vx and HomoPx stand for the set of voxels and the corresponding homologous pixels, respectively, and F is the mapping function between them. For an arbitrary element of Vx, the number of corresponding elements in HomoPx represents the number of RLRs. Therefore, dense viewing points and smooth motion parallax can be achieved by reconstructing voxels with more RLRs. Performance factors such as voxel missing, spatial resolution, depth of field, and parallax smoothness can be used to quantitatively evaluate and select an optimum voxel space. However, these factors are interrelated and mutually constrained; in practical applications, a trade-off should be made among them according to the display performance requirements. Fig. 5 depicts the optimum selection of the voxel space, where the depth planes filled in green are optimum for reconstructing the 3D images.
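The voxel-to-homologous-pixel correspondence can be illustrated by brute force: launch one RLR per pixel through its lens's optical center and count how many rays land at each lateral position on a given depth plane. This is a 1-D sketch with illustrative parameters, not the paper's implementation of the mapping F in (11).

```python
from collections import defaultdict

def rlr_counts(n_lens, R, p, p_d, g, z):
    """Count the RLRs converging at each lateral position on depth plane z.

    n_lens : number of micro-lenses (1-D)
    R      : pixels covered by each micro-lens
    p, p_d : lens pitch and pixel pitch
    g      : panel-to-MLA gap
    """
    hits = defaultdict(int)
    for n in range(n_lens):
        x_lens = n * p                                    # lens optical center
        for j in range(R):
            x_pix = x_lens - (R - 1) * p_d / 2 + j * p_d  # pixel center
            slope = (x_lens - x_pix) / g                  # RLR direction
            # Bucket the hit point on the depth plane; equal positions
            # correspond to one voxel, and the count is its RLR number.
            hits[round(x_lens + slope * z, 6)] += 1
    return hits
```

With 4 lenses of 4 pixels each (p = 1.0, p_d = 0.25, g = 4.0), the plane at z = 16 holds 7 voxel positions whose RLR counts taper from 4 at the center to 1 at the edges, mirroring the stacked-bar structure of Fig. 8(b).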

C. Generation of EIA With Optimum Voxel Space
The generation of an EIA with the optimum voxel space is a process of restoring the captured texture and depth data to the optimally selected voxel space through depth conversion and then mapping the color of each voxel to its corresponding homologous pixels. Fig. 6 shows a flow chart of the proposed EIA generation algorithm, which mainly includes three steps: (i) depth segmentation, (ii) texture mapping and resampling, and (iii) voxel-pixel mapping.
(i) The entire depth of the 3D scene is converted to the depth range of the selected voxel space and split into several depth segments corresponding to the depth planes of the voxel space. (ii) The RGB image is split into pieces attached to the depth segments, and the resolution of each piece is adjusted, through up-sampling or down-sampling, to match the number of voxels on its corresponding depth plane. (iii) The RGB slices are projected onto the EIA plane using the function F defined in (11). The EIA may contain holes because the RGBD image pair provides the perspective of only a single viewpoint, which degrades the reconstructed 3D images. Conventional hole-filling algorithms based on neighborhood interpolation [35] can alleviate this issue to some extent, but they may cause image blur, especially when dealing with large depth-of-field images. In our method, the holes are filled by extending the texture slices with specific pixels. Since the extension areas can be precisely calculated from the relationship between the depth plane and the parallax expressed in pixels, the holes in the EIA can be well filled.
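The three steps above can be sketched in 1-D as follows, assuming R pixels per lens and the voxel spacing D_v = p_d·z/g implied by the geometry of Section B. This is our simplified illustration, not the paper's 2-D pipeline; hole filling by slice extension is omitted, so unfilled EIA pixels remain NaN.

```python
import numpy as np

def generate_eia_1d(rgb, depth, z_planes, n_lens, R, p_d, g):
    """1-D sketch of the three-step EIA generation.

    rgb, depth : 1-D arrays of equal length (the RGBD pair)
    z_planes   : increasing depths of the selected voxel planes
    """
    p = R * p_d                          # lens pitch, assuming R pixels/lens
    n_pix = n_lens * R
    panel_left = -n_pix * p_d / 2
    eia = np.full(n_pix, np.nan)         # NaN marks unfilled (hole) pixels
    z_planes = np.asarray(z_planes, float)
    # (i) depth segmentation: rescale scene depth into the selected voxel
    #     space and quantize each sample to its nearest depth plane
    scaled = np.interp(depth, (depth.min(), depth.max()),
                       (z_planes[0], z_planes[-1]))
    plane_idx = np.abs(scaled[:, None] - z_planes[None, :]).argmin(axis=1)
    for k, z in enumerate(z_planes):
        mask = plane_idx == k
        if not mask.any():
            continue
        # (ii) resampling: one texture sample per voxel on this plane;
        #      voxel spacing grows with z, so voxel counts shrink with depth
        d_v = p_d * z / g
        n_vox = max(1, int(n_pix * p_d // d_v))
        tex = np.interp(np.linspace(0.0, 1.0, n_vox),
                        np.linspace(0.0, 1.0, mask.sum()), rgb[mask])
        vox_x = (np.arange(n_vox) - (n_vox - 1) / 2) * d_v
        # (iii) voxel-pixel mapping: back-project every voxel through each
        #       lens optical center onto the panel (the mapping F in (11))
        for n in range(n_lens):
            x_lens = (n - (n_lens - 1) / 2) * p
            x_pix = x_lens - (vox_x - x_lens) * g / z  # similar triangles
            idx = np.round((x_pix - panel_left) / p_d - 0.5).astype(int)
            under = (idx >= n * R) & (idx < (n + 1) * R)  # stay under lens n
            eia[idx[under]] = tex[under]
    return eia
```

Running this on a synthetic RGBD ramp with a few real-mode depth planes produces an EIA whose filled pixels carry only colors present in the input slice textures.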

III. EXPERIMENTS AND RESULTS
A prototype was developed to verify the proposed methods. The configuration of the prototype is shown in Fig. 7, where a commercial depth camera serves as the light field acquisition unit and a pair of artware objects were placed as the 3D scene. The LFD consists of a high-resolution cell phone screen and a micro-lens array and works in either real mode or virtual mode depending on the gap assigned between the 2D display panel and the MLA. In the experiment, a pair of RGBD images was captured by the depth camera, and an EIA with the optimum voxel space was generated by the proposed algorithm and loaded onto the cell phone. A digital camera was used to shoot the reconstructed 3D images from different perspectives. Detailed parameters of the prototype are listed in Table I.

TABLE I PARAMETERS AND SPECIFICATIONS OF THE INIM-BASED LFD PROTOTYPE
When the LFD device works in real mode, the depth value and voxel number of each depth plane are shown in Fig. 8(a). The first 14 of the 29 depth planes are discarded because some voxels are missing on them, and the last two are discarded because of their few voxels and dramatically expanded depth values. For simplicity, the remaining depth planes, from DP(15) to DP(27), are renumbered DP(1) to DP(13). The CDP, calculated from (7) to be 18.85 mm in front of the MLA, lies between DP(9) and DP(10).
The characteristics of an optimum voxel space, considering the degree of voxel overlap and the number of RLRs, are shown in Fig. 8(b), where the purple curve represents the overlap between adjacent voxels. The heights of the stacked bars denote the voxel numbers on each depth plane, and the colors of the segments represent the numbers of RLRs that form a voxel. Take one depth plane as an example: it holds a total of 1300 voxels, of which 1260 are formed with three RLRs, 20 with two RLRs, and the remaining 20 with just one RLR. As shown in Fig. 8(b), the proportion of voxels with more RLRs increases significantly with the depth plane number, which contributes to the parallax smoothness.
To verify the spatial resolution of the images reconstructed on different depth planes, a planar model of the 1951 USAF resolution test chart was reconstructed from DP(1) to DP(13) in turn. A digital camera was used to shoot the reconstructed images and identify the smallest discernible targets, parts of which are shown in Fig. 9. The line widths of those targets were treated as the measured voxel sizes. They agree well with the theoretical voxel sizes calculated from (8), as shown in Fig. 10, which demonstrates the validity of the proposed voxel space model.
The experiment was repeated with the 1951 USAF resolution test chart replaced by an image of a real bookshelf. The reconstructed images on DP(2), DP(7), DP(10), and DP(13) are shown in Fig. 11. More high-frequency details can be distinguished on the depth planes closer to the CDP, e.g., DP(7) and DP(10), while the reconstructed images on farther depth planes, e.g., DP(2) and DP(13), become blurry, and only low-frequency image content can be distinguished there. This experiment demonstrates that, for an LFD with specific structural parameters, the image reconstruction ability on each depth plane can be measured in either subjective or objective ways. In practical applications, we can flexibly select the most suitable voxel space, according to the richness of the high-frequency content of the 3D scene, to generate the EIA and obtain satisfactory image clarity.
Two 3D scenes, a computational 3D model created in 3ds Max and a pair of real-world toys, were used to verify the display effects of the reconstructed 3D images. For the LFD device working in real mode, an optimum voxel space consisting of DP(1) to DP(13) was selected to reconstruct the 3D images; the 3D display results are shown in Fig. 12. A ruler was attached to the MLA to demonstrate the motion parallax between different perspective views. As seen from the partially magnified details of the left, middle, and right views, when the viewpoint shifts from left to right, the 3D images, for instance the symbol "M" on Mario's hat and the eye of the green dinosaur, move toward the left. This shows that the reconstructed 3D images have a stereoscopic depth protruding from the screen and smooth motion parallax. Also, the image details within the different perspective views can be clearly distinguished. The experimental results demonstrate that the proposed system can reconstruct 3D images of high quality.
Similar display experiments were implemented on the LFD prototype working in virtual mode, where the CDP is 5.08 mm behind the screen and the depth planes DP(-1) to DP(-12) are selected as the optimum voxel space. Perspective images captured by the digital camera are shown in Fig. 13. The high-quality 3D images demonstrate that the proposed system works well in virtual mode as well.

IV. CONCLUSION
In summary, we proposed an InIm-based LFD system with an optimum voxel space. We revealed the intrinsic voxel spatial distribution of the InIm-based LFD and analyzed the relationships between the voxel space and the display performance. Then, we proposed a method to generate an EIA from a generally available RGBD image pair based on the optimally selected voxel space. Display experiments proved the validity of our methods and demonstrated that the proposed system applies to the capture and display of both virtual and real-world 3D scenes. Thanks to its simple and compact hardware configuration and its extensive potential in light field data acquisition, we expect the proposed system to help the practical application of InIm-based LFD technology.

Manuscript received 28 November 2023; revised 12 January 2024; accepted 15 January 2024. Date of publication 18 January 2024; date of current version 9 February 2024. This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB3606600, and in part by the National Natural Science Foundation of China under Grants U20A20215, 61875142, and 62275179. (Corresponding author: Da-Hai Li.)

Fig. 2. Voxel spatial distributions of an InIm-based LFD for (a) the thin lens array and (b) the thick lens array model.

Fig. 4. Schematic diagram of (a) the voxel size and (b) the depth of field.

Fig. 8. Optimum voxel space with consideration of (a) voxel numbers and depth values, and (b) voxel overlap degrees and voxels with various RLRs.

Fig. 12. 3D display effects of both the computational 3D model and the real-world 3D scene on the InIm-based LFD prototype working in real mode. (a) RGB image, (b) depth image, (c) left view, (d) middle view, and (e) right view.

Fig. 13. 3D display effects of both the computational 3D model and the real-world 3D scene on the InIm-based LFD prototype working in virtual mode. (a) Left view, (b) middle view, and (c) right view.