Shadowless Projection Mapping using Retrotransmissive Optics

This paper presents a shadowless projection mapping system for interactive applications in which a target surface is frequently occluded from a projector by a user's body. We propose a delay-free optical solution for this critical problem. Specifically, as the primary technical contribution, we apply a large format retrotransmissive plate to project images onto the target surface from wide viewing angles. We also tackle technical issues unique to the proposed shadowless principle. First, the retrotransmissive optics inevitably suffer from stray light, which leads to significant contrast degradation of the projected result. We propose to block the stray light by covering the retrotransmissive plate with a spatial mask. Because the mask reduces not only the stray light but also the achievable luminance of the projected result, we develop a computational algorithm that determines the shape of the mask to balance the image quality. Second, we propose a touch sensing technique that leverages the optically bidirectional property of the retrotransmissive plate to support interaction between the user and the projected contents on the target object. We implement a proof-of-concept prototype and validate the above-mentioned techniques through experiments.


INTRODUCTION
Projection mapping (PM), a spatial augmented reality (AR) approach, seamlessly merges physical and virtual worlds using projected imagery [8]. Unlike video see-through and optical see-through AR, PM allows multiple users to see augmentations on a physical surface without requiring them to wear or hold any display devices. When combined with user action sensing, PM allows users to interact with projected images [52]. Researchers have found a high applicability of such interactive PM for a wide range of fields, including but not limited to medicine [43], teleconferencing [19,50,54], museum guides [7,57], makeup [6,59], object searches [21,28,38,53], product design [9,36,39,64], urban planning [67], and artwork creation [4,12,56]. However, the immersive user experience is easily degraded by a cast shadow [63], which is a critical technical limitation unique to PM. For instance, when a user touches a projected image on a physical object, a shadow of the user's hand is cast on the object, in which the projected image is no longer visible.
In theory, the cast shadow does not occur when the apparent size of the aperture of a projector viewed from a projection surface is larger than that of an occluder. The primary conventional solution is a synthetic aperture technique, in which a multi-projection system is used to virtually realize a large aperture projector [20,22,23,27,41,45,62,65,66]. Multiple projectors are spatially distributed such that they project images from a wide range of viewing angles onto a target projection surface. When an area of a projection surface is occluded from a projector by a user's interaction, these methods select another projector that is visible from the occluded area and project a compensation image from the selected projector. The compensation image needs to be carefully computed to avoid geometric and photometric artifacts such as overlaps or seams between images from different projectors and color inconsistencies across them. However, because there is inevitable delay in the compensation process from the shadow occurrence to the compensating projection, the artifacts occur on the surface when the occluder moves by the user's interaction.
This paper aims to realize shadowless PM without relying on the conventional synthetic aperture approach. Instead, we adopt an optical approach that does not require any complex computations for the shadow removal. The prime technical contribution is to use a large format retrotransmissive plate to project images onto a surface from wide viewing angles. The retrotransmissive plate is an optical element that has been primarily used in three-dimensional aerial image displays [48]. It collects the light rays emitted from a point in space at the plane-symmetrical position with respect to the plate. In this paper, we build our shadowless PM system as follows (Figure 1). First, we prepare a white diffuse object whose shape is the plane-symmetric counterpart (mirror image) of a projection target. Hereinafter, we call this object a proxy object. We place the target and proxy objects in plane symmetry with respect to the retrotransmissive plate. We then project an image onto the proxy object from normal projectors. Projected light incident on a surface point is diffusely reflected, and the reflected light rays fall on the retrotransmissive plate. After traveling through the plate, the rays converge to a point on the target object that corresponds to the projected point on the proxy object. Consequently, the appearance of the proxy object is duplicated on the target object's surface. When we use a retrotransmissive plate whose size is large enough relative to an occluder, shadowless PM is achieved without requiring the shadow removal computations used in the conventional synthetic aperture approaches. Note that we do not use the word "shadowless" to mean shadow-free or non-shadow, where shadows are completely removed from the projected result. Instead, we design our system such that natural penumbra or soft shadows occur around the user's touch areas.
This paper also tackles technical issues unique to the proposed shadowless principle. First, the naïve setup described above suffers from stray light, which significantly decreases the contrast of the projected results. To alleviate the contrast degradation, we propose to block the stray light by covering the retrotransmissive plate with a spatial mask. Although the mask improves the contrast of the projected result, it simultaneously reduces the amount of projection rays incident on the target object, and thus, restricts the achievable luminance of the projected result. Therefore, we optimize the shape of the mask to balance the image quality based on our computational model of the retrotransmissive optics. Second, we propose a user sensing technique using the same optical framework to support interaction between the user and the projected contents on the target object. Specifically, we measure a user's touch position on the target surface by leveraging the optically bidirectional property of the retrotransmissive optics. We illuminate the target object using infrared (IR) ambient light. The reflected light rays converge to the corresponding point on the proxy object after traveling through the retrotransmissive plate. Therefore, a user's touch on the target object changes its appearance in the IR spectrum, which can be observed on the proxy object. We detect the touch position from the appearance of the proxy object captured by IR cameras. We implement a proof-of-concept prototype and validate the above-mentioned techniques through experiments.
Our primary contributions are that we:
• Introduce a shadowless PM system that optically removes shadows on a projection target using a large format retrotransmissive plate,
• Develop a computational algorithm optimizing the shape of a spatial mask for reducing the stray light in the retrotransmissive plate to maximize the image quality of the PM result,
• Develop a user touch sensing technique by leveraging the optical duality of the retrotransmissive plate, and
• Implement a prototype that demonstrates shadowless PM with improved image quality and user touch detection.
Overview of limitations: A projection target in our PM system is limited to a relatively small object to avoid an impractically large space behind the retrotransmissive optics for placing the proxy object.
In addition, we assume that the target is fixed in our system, and thus, it is limited to a static object. Despite the constraints of the target object, our system supports various potential application fields of PM such as product design [9,49], diorama [55], preoperative planning [37], education [12], and museum guides [2]. There is another limitation regarding displayed image quality. In principle, the current retrotransmissive devices cannot make a point light source converge to a single point at the plane-symmetrical positions, and thus, the projected result becomes blurred. In this work, we demonstrate that the blur can be alleviated by applying a defocus blur compensation technique [25] that computationally corrects projection images using a convolutional neural network (CNN).

RELATED WORK

2.1 Shadowless PM
The simplest shadowless PM framework is a rear-projection system where a projector is installed behind a semi-transparent projection surface, and thus, any occluders are not inserted between the projector and the surface. Thanks to the shadowless advantage, rear-projection mechanisms have been widely applied in interactive systems where users interact with projected graphics using touch interfaces [5,14,16]. However, the shape of a projection surface is limited. First, the surface needs to be an open shape so that projected rays can reach the surface from a projector. In addition, the shape needs to be rather simple so that the projector is visible from all the surface points. As another disadvantage, semi-transparent materials that can be used as projection surfaces are limited, which leads to limited tactile sensations that a user perceives in a touch interaction.
Front-projection systems do not suffer from these technical limitations. Shadowless PM of front-projection has been achieved by a synthetic aperture projection approach. Most of the previous works spatially distribute multiple projectors over the environment to ensure that users do not occlude a projection target simultaneously from all the projectors. Once either an occluder [3,20,27,45] or its shadow [22,23,60,62,65,66] is detected by cameras, the system compensates for the shadow by illuminating that area from an unoccluded projector. A multi-projection system was also implemented using a mirror array which reflects a projected image from a single projector to a projection target such that each mirror is regarded as a distributed projector [33,41]. However, as discussed in Sect. 1, the synthetic aperture approach suffers from the delay in the computational compensation process. That is, a shadow cannot be perfectly removed when an occluder moves.
In this paper, we propose a novel front-projection system for shadow removal, which supports a projection target of any surface shape. In addition, our system adopts an optical solution for shadow removal rather than delay-prone computational occlusion compensation.

Delay-free interactive systems by optical phenomena
Other than shadow removal, a successful example of delay-free interaction by utilizing an optical phenomenon is silhouette-based interaction, where a user interacts with a displayed virtual object using the silhouettes of their bodies. Conventional computation-based techniques extract the silhouette from a captured image of a user and update the displayed image for the next frame [32]. This process, as well as the data transfer of the captured and displayed images among system components, inevitably causes a perceivable delay of the displayed silhouette. Researchers solved this problem by utilizing the physical shadow of a user as the silhouette [10,18,40,58,70].
In this paper, we utilize another optical phenomenon to realize a novel type of delay-free interaction: shadowless PM. Specifically, we develop a large aperture projection system using retrotransmissive optics so that projected light rays are incident on a target object from wide solid angles.

Retrotransmissive optics
The retrotransmissive optical system forms a real image of a light source in the plane-symmetrical position with respect to its planar optical element. Several implementation methods achieving the retrotransmissive property have been developed so far such as micro-mirror array plate (MMAP) [48], dihedral corner reflector array (DCRA) [34], and the combination of a half mirror and a retroreflector (AIRR: Aerial Imaging by Retro-Reflection) [73]. The most popular industrial application of retrotransmissive optics is a three-dimensional aerial image display. A flat panel display placed behind a retrotransmissive optical device is used as a light source, and the device forms the real image of the light source such that it is floating over the device. Building upon such aerial image displays, researchers have developed various 3D user interaction techniques [26,35,71,72]. Other application capabilities of retrotransmissive optics have been also explored. For instance, they have been applied as a beam combiner of an optical see-through display [47]. Other researchers used them as large aperture lenses of their capturing systems [44,74]. They demonstrated occlusion-free imaging that could see a target object through the cluttered foreground without computations. Since retrotransmissive optics are based on reflection rather than refraction, they do not suffer from chromatic aberration and geometric distortion. Therefore, the image quality of a displayed or captured image is not chromatically and geometrically degraded even when a large aperture device is used.
On the other hand, retrotransmissive optics cause a ghost image in principle. When rays from a light source are reflected an appropriate number of times in an MMAP or DCRA device, the device forms a midair image of the light source. However, a part of the rays are inevitably reflected an inappropriate number of times in the device or even pass through it without any reflections, which we call stray light. The stray light causes a ghost image. Previous works solved the ghost image problem by applying another optical device such as a polarizer and a view-control film [11,17,29,71,75]. However, since these solutions are tailored for displaying mid-air images, they cannot be directly applied to our PM system.
We proposed the concept of applying retrotransmissive optics to PM and demonstrated the first prototype using a large format MMAP by which we showed a delay-free shadowless PM system without chromatic aberration and geometric distortion [15]. Since then, other researchers have followed this work and shown various extensions [30,31,46,68]. However, because these systems used the retrotransmissive optical element for a different purpose than the original purpose (i.e., aerial image displays) without any modifications, the image quality such as contrast and spatial resolution of the projected results was significantly low. Therefore, only high-contrast and simple textures consisting of large graphical elements such as a checker pattern were perceptually recognizable in the projected results. In this paper, we provide a solution for the image degradation problem. Specifically, we propose to apply a spatial mask to block the stray light that is the primary factor of the image quality degradation in the previous systems. We also realize user touch sensing with minimum modification of the interaction space by leveraging the optical duality of the retrotransmissive optics. These two techniques can be directly applied to all the previous systems.

SHADOWLESS PROJECTION METHOD

3.1 Projection mapping using an MMAP
We use a large format MMAP in our projection mapping system. As shown in Fig. 2 (a), an MMAP consists of two layers of micro-mirror arrays which are attached orthogonally to each other using a glass medium. Figure 2 (b) shows a typical path of an incident ray, which reflects at the top and the bottom layers of the MMAP such that the directions of incident and outgoing rays are plane symmetry with respect to the MMAP. As a result, light rays emitted from a point in a space converge at the plane-symmetrical point with respect to the MMAP. Therefore, the MMAP forms a real image (as an optical term) of an object of equal magnification at the position of its symmetry.
Our system consists of the MMAP, a projector, a projection target, and its proxy object (Fig. 2 (c)). The proxy object is a white diffuse object whose shape is the plane-symmetric counterpart (mirror image) of the target. We place the target and proxy objects in plane symmetry with respect to the MMAP. The projector directly projects images onto the proxy object. Reflected light rays from each surface point of the proxy object fall on the MMAP and converge to the corresponding point of the target object. Consequently, the projected appearance of the proxy object is duplicated on the target object's surface. We use an MMAP that is large enough relative to an expected occluder such as a user's hand, by which shadowless PM is achieved without requiring any shadow removal computations.
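The plane-symmetric placement described above amounts to mirroring every point of the target across the MMAP plane. The following is a minimal sketch of that mapping, assuming the MMAP can be treated as an ideal plane given by a point and a normal. The function name `mirror_across_plane` and the coordinate frame (MMAP in the plane z = 0, units in mm) are illustrative assumptions, not part of the paper.

```python
import math

def mirror_across_plane(point, plane_point, plane_normal):
    """Return the mirror image of `point` across the plane that passes
    through `plane_point` with normal `plane_normal` (any non-zero length)."""
    norm = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / norm for c in plane_normal]
    # Signed distance from the point to the plane along the unit normal.
    dist = sum((p - q) * c for p, q, c in zip(point, plane_point, n))
    # Move the point twice that distance back through the plane.
    return [p - 2.0 * dist * c for p, c in zip(point, n)]

# Hypothetical frame: the MMAP lies in the plane z = 0 (units: mm).
target_vertex = [10.0, 20.0, 300.0]
proxy_vertex = mirror_across_plane(target_vertex, [0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
print(proxy_vertex)
```

Applying the same function to every vertex of a target mesh would give the geometry of the proxy object. Because a reflection flips handedness, the proxy is a mirror image of the target rather than a rigidly moved copy.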

Stray light reduction using a spatial mask
A naïve system described in Sec. 3.1 suffers from stray light of the MMAP, which significantly decreases the contrast of a projected result. We propose a solution for this technical problem. Specifically, we attach a binary spatial mask to the MMAP that shields the areas where a large amount of incident rays become stray light. A large mask significantly reduces the stray light and consequently improves the contrast of the projected result. However, at the same time, it reduces the achievable luminance of the projected result and degrades the shadow removal performance. We propose a technique to determine the shape of the mask to balance the above trade-off.

Computational model of MMAP
Micro-mirrors of an MMAP's top layer are aligned such that they are rotated −45 degrees from the horizontal side of the plate, while those of its bottom layer are rotated 45 degrees (Fig. 3 (a)). Hereinafter, we consider the paths of light rays in each rhombus-shaped area of the MMAP surface, which is divided by the micro-mirrors, and call it a mirror hole. Suppose the surface of a proxy object exhibits a Lambertian reflectance, and thus, diffusely reflects projected light. The radiant flux of light rays, which are emitted from a surface element k and incident on the i-th mirror hole, can be computed as follows:

\[ \Phi(i, k) = L_k \, s_k \, \omega_{i,k} \cos\theta_{i,k}, \tag{1} \]

where L_k and s_k are the radiance and the area of the surface element k, respectively (Fig. 3 (b)). ω_{i,k} and θ_{i,k} are the solid angle of the i-th mirror hole from the surface element and the outgoing angle of the rays, respectively.
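The flux computation described above can be sketched in a few lines. This is a minimal illustration that assumes the standard Lambertian relation (flux = radiance × element area × solid angle × cosine of the outgoing angle) and a small-area approximation for the solid angle of a mirror hole. The function names and all numeric values in the example are hypothetical.

```python
import math

def hole_solid_angle(hole_area, distance, hole_tilt):
    """Small-area approximation of the solid angle (sr) subtended by a
    mirror hole: area * cos(tilt) / distance^2. `hole_tilt` is the angle
    between the hole normal and the line of sight, in radians."""
    return hole_area * math.cos(hole_tilt) / distance ** 2

def radiant_flux(radiance, element_area, solid_angle, outgoing_angle):
    """Flux (W) leaving a Lambertian surface element toward one mirror hole:
    radiance * area * solid angle * cos(outgoing angle)."""
    return radiance * element_area * solid_angle * math.cos(outgoing_angle)

# Hypothetical numbers: a 1 mm^2 surface element with radiance 100 W/(sr*m^2)
# facing a 0.5 mm x 0.5 mm mirror hole head-on from 0.3 m away.
omega = hole_solid_angle(hole_area=2.5e-7, distance=0.3, hole_tilt=0.0)
flux = radiant_flux(radiance=100.0, element_area=1e-6,
                    solid_angle=omega, outgoing_angle=0.0)
print(flux)
```

The small-area approximation is only reasonable when the hole is much smaller than its distance from the surface element, which holds for sub-millimeter holes at a distance of a few hundred millimeters.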
When light rays pass through each layer of the MMAP, their directions must be changed by the micro-mirrors to be converged at the plane-symmetrical point. If they do not, they become stray light. Specifically, the direction of a ray is changed when it reflects between two adjacent micro-mirrors for an odd number of times while passing through the layer (the dark red arrows in Fig. 3 (c)). On the other hand, the direction of a ray is not changed when it reflects for an even number of times or does not reflect at all (the pale red arrows in Fig. 3 (c)). Suppose t(i, k) ∈ [0, 1] represents the ratio of rays whose directions are changed by the top layer of the i-th mirror hole to all incoming rays from the k-th surface element, then the ratio can be computed as follows:

\[ t(i, k) = \begin{cases} \dfrac{h \tan\phi_{i,k}}{w} - 2n & \text{if } \left\lfloor \dfrac{h \tan\phi_{i,k}}{w} \right\rfloor = 2n, \\[2ex] 2n + 2 - \dfrac{h \tan\phi_{i,k}}{w} & \text{if } \left\lfloor \dfrac{h \tan\phi_{i,k}}{w} \right\rfloor = 2n + 1, \end{cases} \tag{2} \]

where ⌊·⌋ is the floor function and n ∈ N. w and h represent the distance between adjacent mirrors and the height of the mirrors, respectively. φ_{i,k} is the geometrically projected incident angle of the rays to a plane that is perpendicular to the mirror arrays of the top layer (Fig. 3 (c)).
The same ratio for the bottom layer, which we denote as b(i, k) ∈ [0, 1], can be computed in the same manner.
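The odd-reflection ratio for one layer can be illustrated with a short sketch. It rests on simplifying assumptions that are not taken from the paper's text: the mirrors are infinitely thin, rays enter uniformly across the gap between two adjacent mirrors, refraction in the glass is ignored, and the projected incident angle is measured from the plate normal. Under these assumptions a ray travels sideways by h·tan(φ) while crossing the layer, and the fraction of rays reflected an odd number of times is a triangle wave in h·tan(φ)/w with period 2. The function name `odd_reflection_ratio` is illustrative.

```python
import math

def odd_reflection_ratio(phi, w, h):
    """Fraction of rays reflected an odd number of times in one layer.

    phi: projected incident angle measured from the plate normal (radians)
    w:   spacing between adjacent mirrors
    h:   mirror height (same unit as w)
    """
    d = h * abs(math.tan(phi)) / w   # sideways travel in units of the spacing
    r = d % 2.0                      # the ratio repeats with period 2
    # Rising edge on [0, 1], falling edge on (1, 2).
    return r if r <= 1.0 else 2.0 - r

# Mirror geometry of the prototype reported in the experimental setup (mm).
w_mm, h_mm = 0.5, 1.5
t_0 = odd_reflection_ratio(0.0, w_mm, h_mm)                  # normal incidence
t_45 = odd_reflection_ratio(math.radians(45.0), w_mm, h_mm)  # 45-degree incidence
print(t_0, t_45)
```

With this geometry (h/w = 3), rays at normal incidence pass straight through without reflection, whereas at 45 degrees essentially all rays are reflected an odd number of times. The same function can be reused for the bottom layer with its own projected angle.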

Mask shape computation
The rays whose directions are not changed, or are changed only once at either the top layer or the bottom layer, become stray light. The ratio of the former rays is (1 − t(i, k))(1 − b(i, k)), and the ratios of the latter rays are t(i, k)(1 − b(i, k)) and (1 − t(i, k)) b(i, k). Then, the radiant flux of the stray light from the i-th mirror hole can be computed as:

\[ \Phi_s(i) = \sum_k \left\{ \alpha_1 (1 - t(i,k))(1 - b(i,k)) + \alpha_2 \, t(i,k)(1 - b(i,k)) + \alpha_3 (1 - t(i,k)) \, b(i,k) \right\} \Phi(i,k), \tag{3} \]

where α_1, α_2, α_3 ∈ {0, 1} take 0 if the rays do not hit any points on the target object and 1 otherwise. When we cover the hole using the spatial mask, the mask blocks not only the stray light but also light rays whose directions are changed twice at both the top and bottom layers, which then converge at the plane-symmetrical point on the target object. We call these light rays converging light. Suppose the radiant flux of the converging light at the i-th mirror hole is Φ_c(i), then:

\[ \Phi_c(i) = \sum_k t(i,k) \, b(i,k) \, \Phi(i,k). \tag{4} \]

The spatial mask should cover a mirror hole from which the stray light is emitted and should not cover one from which the converging light is emitted. Thus, we determine the density of the mask for the i-th mirror hole using the ratio of the stray light to the converging light, which is

\[ \rho(i) = \frac{\Phi_s(i)}{\Phi_c(i)}. \]

Once we compute the ratio ρ(i) for all the mirror holes, we normalize them such that \bar{ρ}(i) ∈ [0, 1]. Then, we compute the mask density for each mirror hole by a simple thresholding as:

\[ m_\lambda(i) = \begin{cases} 1 & \text{if } \bar{\rho}(i) > \lambda, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

The i-th mirror hole is covered by the mask when m_λ(i) = 1. Thus, the radiant flux of the stray light passing through each mirror hole (i.e., not being blocked by the mask) is (1 − m_λ(i)) Φ_s(i). By adjusting the threshold value λ ∈ [0, 1], we can balance the contrast and the brightness of the projected result. For example, a large amount of stray light can be removed and a high contrast image is displayed with a small λ, but at the same time, the displayed image becomes dark because the converging light is also blocked. We determine the optimal threshold value λ* to minimize the radiant flux values of the unblocked stray light and the blocked converging light.
To balance the ranges of the radiant flux values between the stray light and the converging light, we compute λ* by minimizing a weighted sum of them, which we denote as E(λ) (Eq. (6)). Because this optimization problem is with respect to a single bounded parameter, we apply a linear search to find λ* in our implementation. A pseudo-code of our mask generation algorithm is shown in the supplementary material.
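The thresholding and the one-dimensional search over λ can be sketched as follows. This is not the authors' pseudo-code (which is in their supplementary material) but a minimal illustration under stated assumptions: the ratio is normalized by dividing by its maximum, a hole is covered when its normalized ratio exceeds λ, and the two terms of the cost are weighted by the inverse of the total stray and total converging flux. These normalization and weighting choices, the function names, and the per-hole flux values are all hypothetical.

```python
def normalized_ratio(stray, conv, eps=1e-12):
    """Stray-to-converging flux ratio per mirror hole, scaled into [0, 1].
    Dividing by the maximum is an assumed normalization."""
    rho = [s / max(c, eps) for s, c in zip(stray, conv)]
    top = max(rho)
    return [r / top if top > 0 else 0.0 for r in rho]

def threshold_mask(rho_bar, lam):
    """1 = hole covered by the mask, 0 = hole left open."""
    return [1 if r > lam else 0 for r in rho_bar]

def cost(mask, stray, conv):
    """Unblocked stray light plus blocked converging light, each divided by
    its total so both terms lie in [0, 1] (assumed weights; totals must be > 0)."""
    leaked = sum((1 - m) * s for m, s in zip(mask, stray))
    lost = sum(m * c for m, c in zip(mask, conv))
    return leaked / sum(stray) + lost / sum(conv)

def search_threshold(stray, conv, steps=100):
    """Linear search over evenly spaced thresholds in [0, 1]."""
    rho_bar = normalized_ratio(stray, conv)
    candidates = [i / steps for i in range(steps + 1)]
    best = min(candidates,
               key=lambda lam: cost(threshold_mask(rho_bar, lam), stray, conv))
    return best, threshold_mask(rho_bar, best)

# Hypothetical per-hole flux values (arbitrary units) for four mirror holes.
stray_flux = [0.1, 0.5, 2.0, 4.0]
conv_flux = [4.0, 3.0, 1.0, 0.5]
best_lambda, best_mask = search_threshold(stray_flux, conv_flux)
print(best_lambda, best_mask)
```

In this toy example the search covers the two holes dominated by stray light and leaves open the two holes that mostly carry converging light. Because the cost is piecewise constant in λ, a whole interval of thresholds gives the same mask, and the search returns the first grid value in that interval.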
The optimal mask shape is dependent on the desired appearance L k even for the same target object. Because projected contents are dynamically updated in interactive PM, a single mask optimized for a certain desired appearance should be effective for other different appearances. Considering the dominance of the direct-current (DC) component of natural images in the Fourier domain, we propose to compute a universal mask by conducting the above optimization process for a uniform gray appearance.

Blur correction
The projected result in the proposed system inevitably becomes blurred even when we can perfectly block the stray light. There are several factors. First, the height of each mirror h needs to be non-zero in the MMAP, and consequently, light rays reflected at an upper part of the mirror and those reflected at a lower part do not converge at the same point on the target object (Fig. 4). The displacement of the two layers is another factor that hinders the convergence. Furthermore, because light rays illuminate the target object from a wide range of directions, a blur easily occurs with even a slight misalignment of the proxy object relative to the target object. Considering a blur is caused primarily by these two factors, we hypothesize that light rays are spread relatively locally around the converging point on the target object, and thus, form a defocus-like blur. Therefore, we propose to apply a defocus blur compensation technique to alleviate the blur in our system.
We decided to use a CNN-based defocus blur compensation technique [24,25] for the following two reasons. First, CNN-based techniques do not require explicit estimation of the point spread function (PSF), which is generally performed by projecting artificial dot patterns, and thus, significantly degrades the user experience. Second, CNN-based techniques balance the trade-off between the compensation performance and the computational time well. On the other hand, a camera must observe projected results from the same perspective as the projector in CNN-based techniques. In our system, this requirement can be fulfilled simply by placing a camera at the plane-symmetrical position of the projector with respect to the MMAP such that the camera directly observes the target object. We validate our hypothesis through an experiment using a prototype where we check how much the blur can be compensated by the CNN-based technique.

Touch sensing
Leveraging the optically bidirectional property of the MMAP, we measure a user's touch position on the target object for interactive applications in which users can interact with projected contents on the object. Inversely to the projection mapping principle as described in Sec. 3.1, light rays reflected at a point on the target object travel through the MMAP and converge on the plane-symmetrical point on the proxy object. Consequently, the same appearance of the target object can be observed on the proxy object. That is, the appearance of the proxy object is disturbed by a user's touch on the target object. Therefore, we detect user's touch actions on the target object by observing the appearance of the proxy object with a camera. This approach can be interpreted as an extension of existing touch detection methods using the light-blocking area [42,69]. As described in Sec. 3.1, the target and proxy objects are fixed in plane symmetry with respect to the MMAP. Therefore, this touch detection method does not rely on the pose sensing of these objects. To avoid the interference by projected results, we use IR spectra for the user touch detection. We illuminate the target object using IR ambient light and place an IR camera observing the proxy object (Fig. 5).
We initially formed a simple hypothesis: due to the shadowless property, when a user touches the target object, an area on the proxy object corresponding to the touched area on the target object becomes significantly darker than the other areas. However, through an informal preliminary experiment, we found that this hypothesis was not supported. Specifically, we observed that not only the above-mentioned area but also another area on the proxy object, which corresponds to a target object area under the user's hand, became dark to the same degree. We speculate that the larger dark area was caused by the shadow of the IR ambient light cast on the target object by the user's body. We also observed this phenomenon on the target object by touching the proxy object. As a result, a larger area on the proxy object than the touched area became dark to almost the same brightness level, and thus, a simple thresholding of a captured IR image would not be robust enough to locate the touched area.
On the other hand, we found in the same preliminary experiment that there was a sharp brightness change at an area on the proxy object corresponding to the upper border of the touched area on the target object. The sharp brightness change did not occur when a user's body hovered over the target object. Therefore, we propose to locate touched positions by detecting edges in the IR image. Specifically, after applying an edge extraction filter (i.e., Sobel filter) to the captured IR image, we estimate the touch position by computing the center position of each group of edge pixels.
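The edge-based localization described above can be sketched without any imaging library. The following is a minimal illustration, not the authors' implementation: it computes the Sobel gradient magnitude of a grayscale image stored as a list of rows, groups edge pixels by 8-connectivity, and returns the center of each group. The magnitude threshold, the function names, and the synthetic test image are hypothetical.

```python
def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (border pixels left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def edge_group_centers(img, threshold):
    """Return the (x, y) center of each 8-connected group of edge pixels."""
    mag = sobel_magnitude(img)
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    centers = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or mag[y0][x0] < threshold:
                continue
            # Flood fill to collect one group of connected edge pixels.
            stack, pixels = [(y0, x0)], []
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and mag[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centers.append((cx, cy))
    return centers

# Synthetic IR image: bright background with one dark 3x3 patch.
ir_image = [[200] * 9 for _ in range(9)]
for yy in range(3, 6):
    for xx in range(3, 6):
        ir_image[yy][xx] = 20
touch_centers = edge_group_centers(ir_image, threshold=100.0)
print(touch_centers)
```

A practical system would also need noise filtering and a way to keep only the sharp edge at the upper border of the touched area, which this sketch does not attempt.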

EXPERIMENT

4.1 Experimental setup
We built a prototype to validate the proposed shadowless principle ( Fig. 1 (a)). As a large format retrotransmissive plate, we applied an MMAP (Asukanet, ASUKA3D488, 488×488×5.3 mm). The distance between adjacent micro-mirrors was 0.5 mm, and the height of each micro-mirror was 1.5 mm. A target object and its proxy object were placed 300 mm away from the MMAP. A projector (Laser Beam Pro C200, 1366×768 pixels, ANSI 100 Lumens) illuminated the proxy object. To avoid undesirable environment light, the projector and the proxy object were enclosed by a black box, a side of which was open for inserting the proxy object, the projector, and the MMAP. We describe the details of aligning the equipment in the supplementary material. On top of the MMAP, we put a mask that was fabricated either by electrostatic printing a black toner on a transparent film using OKI MC843 (600 dpi) or by manually cutting a sheet of black paper.
For the validation of the proposed touch sensing technique, we installed 1D IR LED arrays (wavelength: 850 nm) and an IR camera consisting of a gray scale camera (Basler acA720-520um, 720×540 pixels) and a visible light cut filter (Hoya IR-80). The LED arrays were attached next to the MMAP to illuminate the target object from wide directions. The IR camera was installed such that it captured the appearance of the proxy object.
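To get a rough sense of the aperture this setup provides, the angle subtended by the plate can be compared with that of an occluder. The sketch below uses only the plate width (488 mm) and the target distance (300 mm) stated above. It makes a strong simplification that is not from the paper: a two-dimensional, on-axis geometry in which the plate, the occluder, and the surface point are all centered on one line, and the spatial mask is ignored. The occluder dimensions and function names are hypothetical.

```python
import math

def half_angle_deg(width, distance):
    """Half-angle (degrees) subtended by a centered strip of `width`
    seen from a point at `distance` along its axis."""
    return math.degrees(math.atan2(width / 2.0, distance))

def casts_umbra(occluder_width, occluder_height, aperture_width, aperture_distance):
    """True if a centered occluder hides the whole aperture from the surface
    point directly beneath it (2D, on-axis simplification)."""
    return (half_angle_deg(occluder_width, occluder_height)
            >= half_angle_deg(aperture_width, aperture_distance))

# Plate width and target distance from the experimental setup (mm).
mmap_half_angle = half_angle_deg(488.0, 300.0)
# Hypothetical occluder: a 20 mm wide finger hovering 50 mm above the surface.
finger_blocks_all = casts_umbra(20.0, 50.0, 488.0, 300.0)
print(round(mmap_half_angle, 1), finger_blocks_all)
```

Under these assumptions the plate covers a much wider angle than the hovering finger, so part of the plate stays visible from the surface point and only a penumbra forms. As the occluder approaches the surface its angular size grows, which is in line with the soft shadows expected around touch areas.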

Validation of the stray light reduction technique
This section shows the validation results of our stray light reduction technique described in Sec. 3.2.

Computational model of the MMAP
We qualitatively checked if our computational model of the MMAP effectively simulates the physical phenomena. We set a tilted square plane (width: 200 mm, tilt angle: 45 degrees) as the target object, and a dot image composed of a white circle (diameter: 10 mm) and a black background as the projection image ( Fig. 6 (a)). Note that the model took the dot image as the surface radiance map of the proxy object L k . We computed the radiant flux values of the stray light and the converging light on the MMAP using Eq. (3) and Eq. (4), respectively. Fig. 6 (b) shows the normalized ratio of these values (i.e.,ρ(i)). Then, we computed a mask using Eq. (5) with a threshold value of λ = 0.0 ( Fig. 6 (c)). We call this mask the first mask. We also computed another mask (the second mask) by repeating the same process described above except the target object was switched to a 50×50 mm square plane ( Fig. 6 (d)). Figure 6 (e) shows the simulated and actual projected results of the dot image under three mask conditions (without mask, with the first mask, and with the second mask). Although we set two target objects in computing the masks, we used the 200×200 mm square plane as the target object for all the PM results. In the figure, the spatial patterns of the stray light in the actual projected results look similar to those in the simulated results. For example, stray light in the 50×50 mm square region at the center of the target object was selectively removed in both the simulated and actual projected results under the second mask condition. There exist some differences between the simulated and actual projected results, which may come from the imperfection of our model (e.g., refraction and diffraction are not taken into account) and the misalignment of the objects relative to the MMAP in the prototype. 
Nevertheless, the physical masks significantly removed the stray light in the target areas (i.e., 200×200 mm square region by the first mask and 50×50 mm square region by the second mask). Therefore, we believe our computational model predicts the behavior of light rays accurately enough to generate a functional mask.

Universal mask
As described in Sec. 3.2.2, we propose a universal mask that is optimized for a uniform gray appearance. Using the tilted square plane (200×200 mm), we checked how effectively the universal mask reduces the stray light for other target appearances. We randomly selected nine natural images from a publicly available dataset (DIV2K dataset) [1] to test the universal mask. Figure 7 shows the selected images. We computed the ratio of the radiant flux of the stray light on the MMAP to that of the converging light (i.e., ρ(i)) for the uniform gray image and the selected natural images (Fig. 7). Because these results look similar to each other, we expect that the masks optimized for the uniform gray image and the other natural images also become similar to each other, and consequently, that the universal mask is effective for other projection images. Figure 8 shows the simulated results of the projected appearances of a natural image ("A" in Fig. 7) with masks computed with different threshold values λ by Eq. (6). We confirm that E(λ) (i.e., the weighted sum of the unblocked stray light and the blocked converging light) takes the minimum value at λ = 0.25 for this particular projection image.
Regarding the image quality, the projected appearance at λ = 0.25 best balances the overall contrast and the luminance. Note that the "Target" appearance in Fig. 8 is computed by removing all the stray light from the projected appearance, which cannot be physically achieved by a mask. Figure 9 shows the simulated results of the projected appearances of a natural image ("E" in Fig. 7) without a mask, with the mask optimal for the image, and with the universal mask, respectively. The results show that the image quality of the projected appearance with the universal mask is similar to that with the optimal mask. As mentioned above, the universal mask is the optimal mask for the uniform gray image. The bottom graph in Fig. 9 shows the E(λ) values of the three mask conditions for all the natural images. The average and the standard deviation of the E(λ) values with the optimal mask are 0.656 and 7.9 × 10⁻³, respectively. Those with the universal mask are 0.661 and 8.1 × 10⁻³, respectively. Therefore, we quantitatively confirmed that the universal mask is nearly as effective as the optimal mask in improving image quality for our shadowless PM system.
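The threshold selection discussed above can be illustrated with a minimal sketch. Eq. (5) and Eq. (6) are not reproduced in this section, so the following is only an assumed form: the mask is taken to block an MMAP element i when ρ(i) > λ, E(λ) is taken to be an equally weighted sum of the unblocked stray light and the blocked converging light (each normalized by its total), and ρ(i) is taken to be stray / (stray + converging). The function names, the weights, and the toy flux values are hypothetical and chosen only for illustration.

```python
def mask_cost(rho, stray, converging, lam, w_stray=0.5, w_conv=0.5):
    """Assumed form of E(lambda): weighted sum of the stray light that passes
    the mask and the converging light that the mask blocks."""
    total_stray = sum(stray)
    total_conv = sum(converging)
    # Assumption: element i is blocked when rho[i] > lam, open otherwise.
    unblocked_stray = sum(s for r, s in zip(rho, stray) if r <= lam)
    blocked_conv = sum(c for r, c in zip(rho, converging) if r > lam)
    return (w_stray * unblocked_stray / total_stray
            + w_conv * blocked_conv / total_conv)


def best_threshold(rho, stray, converging, candidates):
    """Return the candidate threshold with the smallest assumed E(lambda)."""
    return min(candidates,
               key=lambda lam: mask_cost(rho, stray, converging, lam))


# Toy per-element radiant flux values (arbitrary units, not measured data).
stray = [0.0, 0.1, 0.4, 0.9]
converging = [1.0, 0.8, 0.5, 0.1]
# Assumed normalization of the stray-to-converging ratio into [0, 1].
rho = [s / (s + c) for s, c in zip(stray, converging)]

lam_star = best_threshold(rho, stray, converging, [0.0, 0.2, 0.5, 0.8, 1.0])
```

A small λ blocks almost every element that carries any stray light and therefore also discards converging light, whereas a large λ leaves the stray light untouched; the sweep simply picks the candidate between these extremes with the lowest cost.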

Stray light reduction in actual PM
We validated our universal masks in an actual PM setup. We used a 3D-printed Stanford bunny (height: 110 mm) as well as the tilted square plane as the target objects. Figure 1 and Fig. 10 show the projected results in a normal projection system and in the proposed system with and without the universal masks, by which we confirmed the following three points. First, projected textures were not completely occluded on either the planar or the bunny object even when hands approached them closely in the proposed system, whereas there were clear shadows of the hands on the objects in the normal projection system. Second, the masks significantly improved the contrast of projected appearances. The masks also darkened the projected appearances, but not to the point of being perceptually too dark. Third, the shadowless performance of the system did not differ significantly between the conditions with and without the masks. These results validated the effectiveness of our universal mask in stray light reduction.
Fig. 9: Simulated results of projected appearances of the natural image "E" without a mask, with the optimal mask, and with the universal mask, respectively. Insets show the masks. "Target" is computed by removing all the stray light from the projected appearance. The bottom graph shows E(λ) values of the three mask conditions (pink: without mask, blue: with the optimal mask, green: with the universal mask) for each natural image.
By projecting a black-and-white 4×4 checker pattern onto the tilted square plane surface, we evaluated how much the peak luminance and the ANSI contrast were degraded in the proposed system compared to normal projection, and how much they were affected by the universal mask. We used a spectroradiometer (Topcon, SR-LEDW) for the measurement. Table 1 shows the results. Both the peak luminance and the ANSI contrast were about one order of magnitude lower in the proposed system than in normal projection. The universal mask improved the contrast by a factor of 3.0, while reducing the peak luminance by a factor of 3.1.
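For reference, the two metrics can be computed from per-patch luminance readings as sketched below. ANSI contrast is commonly defined as the mean luminance of the white patches of a 4×4 checkerboard divided by the mean luminance of the black patches. The sketch assumes one luminance reading per patch and that the top-left patch is white; the function names and the sample readings are hypothetical and are not the values in Table 1.

```python
def ansi_contrast(patches):
    """patches: 4x4 nested list of luminance readings (cd/m^2), one per patch.
    Assumption: patches with even (row + col) are white, the rest black."""
    white = [patches[r][c] for r in range(4) for c in range(4)
             if (r + c) % 2 == 0]
    black = [patches[r][c] for r in range(4) for c in range(4)
             if (r + c) % 2 == 1]
    return (sum(white) / len(white)) / (sum(black) / len(black))


def peak_luminance(patches):
    """Largest luminance reading over all patches."""
    return max(max(row) for row in patches)


# Hypothetical readings: white patches at 100 cd/m^2, black patches at 5 cd/m^2.
readings = [[100.0 if (r + c) % 2 == 0 else 5.0 for c in range(4)]
            for r in range(4)]
contrast = ansi_contrast(readings)
peak = peak_luminance(readings)
```

Because stray light raises the luminance of the black patches, it lowers this ratio directly, which is why blocking stray light with the mask can raise the contrast even though the white patches also become darker.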

Validation of the blur correction
As described in Sec. 3.3, we hypothesize that blur artifacts caused in our system are defocus-like blurs and that we can alleviate them using a defocus blur compensation technique. First, we measured the PSF of the system by projecting a dot pattern image onto the proxy object of the tilted square plane and capturing the appearance of the target object. Note that the universal mask was applied in the measurement. Figure 11 (a) shows the captured dots. We confirm that the light rays are spread locally on the target object and the PSFs are spatially varying, which is similar to defocus blurs in normal projection on a non-planar surface. Therefore, this result supports our hypothesis. Second, we validated the effectiveness of the CNN-based blur compensation technique [25] on the projected image quality in our system. We used the network and learned parameters of [25] without fine-tuning. We used both the tilted square plane and the 3D-printed Stanford bunny as the target objects. Figure 11 (b) shows the projected results without and with the compensation. We can see that the projected results with the blur compensation are significantly sharper than those without the compensation. Based on these visible improvements, we confirmed that the blur compensation technique effectively alleviated the blur artifacts in the projected results on both surfaces.

Validation of the touch sensing
We validated our touch sensing technique using the tilted square plane and a hemisphere surface (radius: 80 mm). Figure 12 shows the captured IR images in the following two conditions: (1) two fingers touched a target object, and (2) a hand hovered over it (10 mm above the surface). We confirmed in the images that not only the touched areas but also other areas became darker. This indicates that simple thresholding is not sufficient for robustly estimating the touched positions. Following our proposed method, we applied a Sobel filter to the captured images and then applied thresholding and morphology operations to them to detect blobs of edge pixels. From these results, we confirmed that each touched area was detected by the proposed method even when multiple fingers touched the objects simultaneously.
We conducted a user study to evaluate the spatial accuracy of the touch sensing technique. We asked each participant to touch nine predefined target points on the tilted square plane and eight points on the hemisphere surface. Each target point was indicated by a projected cross image as shown in Fig. 13 (a). We evaluated the error between the ground truth and the estimated touched position. Six participants (all male, 22-24 years old) volunteered for the study. Each participant touched each indicated point three times, and thus, we gathered 162 touch samples (6 participants × 9 points × 3 touches) in total for the tilted square plane and 144 touch samples (6 participants × 8 points × 3 touches) in total for the hemisphere surface. Figure 13 (b) and (c) show the results. For the tilted square plane, the average and the standard deviation of the error in the camera's coordinate system were 20.8 and 8.5 pixels, respectively. We computed the physical scale of the errors by converting the estimated touch positions using a homography transformation, and the average and the standard deviation of the errors were 7.6 and 2.9 mm, respectively.
For the hemisphere surface, the average and the standard deviation of the error in the camera's coordinate system were 27.8 and 13.8 pixels, respectively. To intuitively convey how much the estimation error affects user interaction, we visualized the estimated touched area by projecting a bright spot, as shown in Fig. 14.
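The conversion of pixel errors to a physical scale on the tilted square plane can be sketched as follows. The sketch assumes a 3×3 homography H that maps camera pixels to millimeter coordinates on the plane and ground-truth target positions given in the same millimeter coordinates; the matrix, the sample points, and the function names are hypothetical. The sample standard deviation is used here, which may differ from the estimator used for the reported values.

```python
import math
import statistics


def apply_homography(H, x, y):
    """Map a camera pixel (x, y) to plane coordinates using homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def touch_errors_mm(H, estimated_px, ground_truth_mm):
    """Mean and sample standard deviation of the Euclidean touch error in mm."""
    errors = []
    for (x, y), (gx, gy) in zip(estimated_px, ground_truth_mm):
        px, py = apply_homography(H, x, y)
        errors.append(math.hypot(px - gx, py - gy))
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical homography: a pure scale of 0.5 mm per pixel.
H = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
estimated_px = [(100.0, 100.0), (208.0, 206.0)]
ground_truth_mm = [(50.0, 47.0), (100.0, 100.0)]
mean_err, std_err = touch_errors_mm(H, estimated_px, ground_truth_mm)
```

A single homography is only valid for a planar target, which is consistent with the millimeter errors being reported for the tilted square plane and not for the hemisphere surface.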
It should be noted that the participants may not have been able to touch the indicated points accurately. Although the touching error was less than 5 mm in our observation, it may have contributed to the estimation error. Although the estimation accuracy is not as high as that of recent commercially available touch panels, the proposed method can still support a wide range of user interactions. For example, the surface of a target object can be divided into several small areas, and a user can select, by touching it, the area whose color or texture they want to change. We implemented such an interaction technique, and Fig. 1 (c) shows the results. We confirmed that the system allowed a user to touch a part of the object to change its appearance.
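The detection steps described earlier in this section (Sobel filtering, thresholding, a morphology operation, and blob detection) can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the prototype: the 3×3 dilation stands in for the unspecified morphology operations, and the edge threshold, the minimum blob area, the function names, and the synthetic IR frame are all assumptions.

```python
import math


def sobel_magnitude(img):
    """Gradient magnitude of a 2D list image using 3x3 Sobel kernels."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = math.hypot(gx, gy)
    return mag


def dilate(binary):
    """3x3 binary dilation (assumed stand-in for the morphology step)."""
    h, w = len(binary), len(binary[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = True
    return out


def blob_centroids(binary, min_area=4):
    """Centroids (x, y) of 8-connected blobs with at least min_area pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            stack, pixels = [(y, x)], []
            seen[y][x] = True
            while stack:  # iterative flood fill over one blob
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(pixels) >= min_area:
                n = len(pixels)
                centroids.append((sum(p[1] for p in pixels) / n,
                                  sum(p[0] for p in pixels) / n))
    return centroids


def detect_touches(ir_image, edge_threshold):
    """Sobel -> threshold -> dilation -> blob centroids."""
    mag = sobel_magnitude(ir_image)
    edges = [[m > edge_threshold for m in row] for row in mag]
    return blob_centroids(dilate(edges))


# Synthetic IR frame: bright background with two dark 3x3 "fingertips".
frame = [[200.0] * 30 for _ in range(20)]
for cx in (7, 22):
    for y in range(9, 12):
        for x in range(cx - 1, cx + 2):
            frame[y][x] = 50.0
touches = detect_touches(frame, edge_threshold=100.0)
```

Working on edge strength instead of raw intensity reflects the observation above that areas other than the touched ones also become darker, so an intensity threshold alone would not isolate the touched positions.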

DISCUSSION
Through simulation and actual PM experiments, we demonstrated our shadowless PM proposal. Specifically, we showed that a PM system applying a large format MMAP rarely exhibited shadows even when hands covered a target object. In addition, we demonstrated that our universal mask successfully reduced stray light, and a defocus blur compensation technique significantly alleviated the blurs of the projected result on the target object. Finally, we confirmed that touch positions on the target object could be estimated by utilizing the optically bidirectional property of the MMAP. On the other hand, our experiments also revealed technical limitations of the current prototype and the method, which we believe can trigger interesting research questions in the near future in the fields of optical design and computational algorithms. In the rest of this section, we discuss the technical limitations and indicate a set of future research directions.
Scalability: A projection target in our PM system is limited to a relatively small object. As shown in Fig. 1, our system requires a certain space behind the retrotransmissive plate for placing the proxy object. Furthermore, the size of the plate depends on the size of the target object because the projected rays need to be incident on the target from sufficiently wide solid angles to achieve shadowless PM. Therefore, to avoid an impractically large space for the proxy object and an unmanufacturable size of the retrotransmissive plate, the projection target needs to be reasonably small. Specifically, the target object in our current prototype should be smaller than a cube 200 mm on each side. However, we believe this limitation does not significantly degrade the applicability of the proposal, because such small objects are the majority in current interactive PM applications.
As another constraint, the target needs to be fixed in our system, and thus, is limited to a static object. Recently, researchers have tried to overcome this limitation by applying a robotic arm to move the proxy object according to the target object's movement or by replacing the proxy object with a volumetric display [30,31,46]. In addition, a liquid crystal panel may be applied as a dynamic mask [51]. Although there are still some technical burdens in these approaches, such as a delay in online geometric registration and a low peak luminance, tackling these issues is an interesting future direction for the shadowless PM system. Overall, however, the proposed optical layout does not scale to large or moving objects.
Image quality: The image quality (i.e., achievable luminance and contrast) of a displayed result in the proposed system is significantly degraded compared to normal projection, as shown in Sec. 4.2.3. The luminance reduction is caused by several factors. As a major one, only a part of the light rays reflected at each surface point of a proxy object reaches the MMAP. Furthermore, some amount of the light rays incident on the MMAP become stray light, as described in Sec. 3.2, and do not converge at the target object. The stray light also reduces the contrast of the projected result on the target object. Although we showed that our universal mask improved the image quality, there still exists a huge quality gap between the proposed projection and normal projection. In addition, the thickness of mirror arrays of the MMAP introduces defocus-like blurs in the projected image. Although we showed the artifacts could be improved by a defocus blur compensation technique, as shown in Sec. 4.3, the software-based method inevitably suffers from contrast reduction. Furthermore, no light is incident on an area of the object if it is not visible from the MMAP. This is a hard limitation of the proposed system.
We applied MMAP in this research because we could build a shadowless PM system in a simple structure and a large format device was commercially available. On the other hand, other retrotransmissive optics have the potential to present better image quality. For example, AIRR [73] does not use mirrors, and thus, it reduces the amount of stray light caused by an inappropriate number of reflections between mirrors in MMAP. DCRA [34] uses micro-mirror arrays, but the mirrors are enclosed in only one layer. Thus, the defocus-like blur caused by the displacement of two layers in MMAP can be avoided. We did not use AIRR and DCRA in our prototype because the system becomes more complex with the AIRR, and a sufficiently large DCRA device is not commercially available at the moment. However, it is an interesting future direction to apply different types of retrotransmissive optics to see if further improvement of the image quality is achievable.
Touch sensing: The advantage of our touch sensing method is that it works with the same optical setup as the projection system. Existing touch sensing technologies, such as depth camera-based and motion capture-based methods, can be applied to our system and may provide more accurate touch sensing results than our method. However, these methods require us to attach multiple sensors around the target object to avoid occlusions, potentially degrading the user experience, especially in our setup, where the MMAP placed right above the target object already limits the interaction space. On the other hand, our method shares a limitation with these existing methods. That is, our method cannot distinguish between a touch and a slight hovering over the surface. We can overcome this limitation by observing the fingernail color [13,61], which, however, requires additional cameras.
Shadowless performance: As we mentioned in Sec. 1, the current system does not achieve truly shadowless mapping since soft shadows occur around the user's touch areas. We can improve the shadowless performance by applying digital image processing to the image projected onto the proxy object to compensate for the soft shadows. This can be achieved by adaptively increasing the brightness of specific areas in the projected image, which correspond to the touch areas on the target object. How much we increase the brightness could be determined either by our computational model of the retrotransmissive optics or by a visual feedback system applying a camera. This is crucial future work for this project.
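As a rough illustration of this future direction, the brightness increase can be sketched as a per-pixel gain. The sketch assumes that a map of the fraction of light blocked at each pixel is available (for example, from the computational model or from camera feedback), that intensities are in linear light, and that the projector cannot exceed a maximum value. None of this has been implemented or evaluated in this paper; the function name and the sample values are hypothetical.

```python
def compensate_soft_shadow(image, occlusion, max_value=1.0):
    """Scale each pixel by 1 / (1 - occlusion) and clip to max_value.

    image:     2D list of linear intensities in [0, max_value].
    occlusion: 2D list with the assumed fraction of light blocked, in [0, 1).
    """
    compensated = []
    for img_row, occ_row in zip(image, occlusion):
        compensated.append([min(max_value, v / (1.0 - o))
                            for v, o in zip(img_row, occ_row)])
    return compensated


# Hypothetical 2x2 example: half of the light is blocked at two pixels.
image = [[0.4, 0.4],
         [0.8, 0.2]]
occlusion = [[0.0, 0.5],
             [0.5, 0.0]]
result = compensate_soft_shadow(image, occlusion)
```

The clipping step shows the inherent limit of such a compensation: a pixel that is already bright cannot be boosted enough to cancel a strong soft shadow, so the achievable correction depends on the luminance headroom of the projector.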

CONCLUSION
In this paper, we presented a delay-free shadowless PM system applying an MMAP plate to project images onto a target surface from wide viewing angles to achieve an occlusion-free interactive PM application. In addition to the shadowless principle, this paper makes the following three technical contributions. First, we developed a computational algorithm to design a spatial mask that reduces stray light from the MMAP, improving the projected image quality by balancing the contrast against the peak luminance. Second, based on the analysis of blurs caused by the MMAP, we proposed to apply a defocus-blur compensation technique to alleviate the blur artifacts. Finally, we introduced a touch sensing technique that estimates a user's touch positions on the target surface from the proxy object's appearance, which is in principle a copy of the target surface's appearance. The simulation and physical experiments demonstrated that all of the proposed techniques worked effectively. Therefore, we believe that our proposal has the potential to significantly improve the user experience in various interactive PM applications. We will keep improving the system by working on the future directions described in Sec. 5 to realize more practical and general shadowless PM systems.