HaptoMapping: Visuo-Haptic Augmented Reality by Embedding User-Imperceptible Tactile Display Control Signals in a Projected Image

This article proposes HaptoMapping, a projection-based visuo-haptic augmented reality (VHAR) system that can render visual and haptic content independently and present consistent visuo-haptic sensations on physical surfaces. HaptoMapping controls wearable haptic displays via user-imperceptible control signals embedded in projected images using a pixel-level visible light communication technique. The prototype system comprises a high-speed projector and three types of haptic devices: finger worn, stylus, and arm mounted. The finger-worn and stylus devices present vibrotactile sensations to a user's fingertips. The arm-mounted device presents stroking sensations on a user's forearm using arrayed actuators with a synchronized hand projection mapping. We identified that the developed system's maximum latency of haptic feedback from visual sensations was 96.3 ms. We conducted user studies on the latency perception of our VHAR system. The results revealed that the developed haptic devices can present haptic sensations without user-perceivable latencies, and that the visual-haptic latency tolerances of our VHAR system were 100, 159, and 500 ms for the finger-worn, stylus, and arm-mounted devices, respectively. Another user study with the arm-mounted device found that the visuo-haptic stroking system maintained both continuity and pleasantness even when the spacing between substrates was relatively sparse (20 mm), and that it significantly improved both continuity and pleasantness at 80 and 150 mm/s compared to a haptic-only stroking system. Lastly, we introduce four potential applications in daily scenes. Our system methodology allows for a wide range of VHAR application designs without concern for latency and misalignment effects.


INTRODUCTION
Tactile sensation is essential for capturing the material characteristics of physical surfaces and conveying emotions between humans. We can distinguish textures (roughness, hardness, etc.) by haptically exploring a surface. We can also exchange emotional cues, such as love, happiness, and sadness, with others through stroking, grasping, and tapping; this is called social touch [1]. Presenting these tactile sensations is essential for improving the realism and immersion of user experiences in Augmented Reality (AR) [2]. Many studies have attempted to present tactile information in AR, especially in combination with visual information, called Visuo-Haptic AR (VHAR) [3], [4], [5], [6], [7].
One limitation of these VHAR systems was that the user's view was enclosed by a display, which constrained the field of view (FOV). To address this limitation, haptic displays can be embedded into flat-panel visual displays [10], [11]. These systems ensure spatiotemporal consistency of visuo-haptic information without covering a user's FOV. However, visuo-haptic touch panels restrict the workspace to flat surfaces. Wearable haptic displays controlled by projected light, combined with projection-based visual displays, can present haptic information on nonplanar surfaces following the projected patterns [12], [13]. Additionally, these systems can achieve high spatiotemporal consistency of visuo-haptic information since they do not require an external tracking system. However, because the visual and haptic information are strongly coupled in their synchronization methods (e.g., a vibration occurs only in a bright area), application designers cannot design each source of information independently. Therefore, there has not been a VHAR system for texture presentation that enables both high spatiotemporal consistency and independent visual and haptic designs. As for haptic devices for social touch, conventional systems have focused only on presenting tactile sensations [14], [15], [16]. Therefore, there has not been a VHAR system for social touch.
This paper introduces a projection-based VHAR system, HaptoMapping, that can render visual and haptic information independently while achieving high spatiotemporal consistency between the two modalities. HaptoMapping controls wearable haptic displays via user-imperceptible control signals embedded in projected images using a pixel-level visible light communication (PVLC) technique [17]. PVLC enables both pixel-level data embedding and natural image projection through the high-speed projection of binary images. Haptic displays receive the embedded control signals using photosensors and present vibrotactile sensations. In addition, the haptic feedback is presented with imperceptibly short latency because the haptic devices receive the embedded control signals directly from the projector's light. We can employ surfaces of various shapes as targets as long as they are suitable for projection. Multiple users can simultaneously feel haptic feedback on the same visual display because this system does not require individual displays, as in the use of head-mounted displays (HMDs). It is also easy to cooperate with other modalities, such as audio, by attaching corresponding sensory presentation devices (e.g., speakers) to the haptic device and embedding additional control signals.
We have built a prototype system of HaptoMapping comprising a PVLC projection system and three types of haptic devices: finger worn, stylus, and arm mounted. The projection system can embed up to 32-bit control signals for a static image projection and up to 8-bit signals for a video projection. The finger-worn and stylus devices present a simple haptic cue or a more complex texture using a single actuator. The arm-mounted device creates stroking sensations as social haptics using multiple arrayed actuators with synchronized projection mapping of a hand image.
We evaluated the maximum latency of our VHAR system. We can also use the system as an experimental platform to investigate the human perception characteristics of visual-haptic sensations. We conducted user studies using the finger-worn, stylus, and arm-mounted devices to investigate whether the visual-haptic asynchrony (latency) of our system was within an acceptable range for user experience. We also conducted a user study to investigate the continuity and pleasantness of the stroking sensation when using the arm-mounted device. In particular, we investigated the effect of visual information and the optimal system configuration for creating continuous and pleasant stroking sensations. Finally, we introduce several application scenarios using each device in daily scenes.
The contributions of this paper are that we developed the stylus device and the arm-mounted device in addition to the initially developed finger-worn device, designed for presenting textures and mediated social touch, respectively; extended the PVLC projection system to enable video projection as well as static image projection; investigated the latency perception characteristics of haptic from visual sensations using the developed haptic devices and the continuity and pleasantness of stroking sensations using the arm-mounted device; and developed four VHAR applications: a texture design support system, a multimodal map enhanced with visual, haptic, and audio information, an interactive dictionary, and an online social touch system. The progress from our previously presented system [18], [19] is detailed in Section 2.4.

RELATED WORK
In this section, we discuss previous haptic devices designed for texture presentation and social touch. First, we give an overview of existing haptic devices in VHAR and their limitations, and explain the PVLC principle. We then review haptic devices designed for social touch and discuss the potential of combining them with visual information as a VHAR system. We also review guidelines for the latency of haptic from visual information. Finally, we highlight the progress from our previous works [18], [19].

Haptic Devices in VHAR
An early approach in haptic technology used grounded haptic displays such as the PHANToM™ [20]. These are typically combined with visual displays, such as a half-mirror display [3], [8] or a head-mounted display [4], [9], for VHAR applications. These systems enabled the presentation of haptic information on virtual images. Their major limitations were that the user's view was enclosed by a display, which constrained the field of view (FOV), and that the workspace was restricted to the small manipulatable area of the grounded haptic device.
One way to address these limitations is integrating a tactile display into a flat-panel visual display [10], [11]. Such a system ensures spatiotemporal consistency of the visuo-haptic display without covering a user's FOV. Users can also experience applications simultaneously on the same display. However, such visuo-haptic touch panels restrict the workspace to flat surfaces.
Another approach to overcoming the above limitations is to combine a projection-based visual display with a wearable haptic display controlled by projected light. HALUX [12] is a wearable haptic display that utilizes the projected illuminance to control vibration actuators: an actuator is turned on when the projector lights it. Although the display enables tactile presentation using projected light, it does not display visual content for human observers because the projection is used only to control the haptic display. SenseableRays [13] uses structured light to control the vibration of a piezoelectric actuator. It presents vibrotactile sensations by directly converting the received light, after amplification, into drive signals for the piezoelectric actuator. Because the system does not require an external tracking system, the haptic device could be miniaturized and the delay of the tactile presentation reduced. One drawback of this system was that the visual and haptic information could not be designed independently: the direct conversion between projected light and vibration patterns locks the two together. In our VHAR system, we employ PVLC for both haptic device control and image projection, which enables independent visual and haptic designs and a consistent visuo-haptic presentation simultaneously.

PVLC
PVLC is a wireless communication technology that embeds a digital signal (that is imperceptible to the user) into each projected pixel by modulating temporal blinking patterns using a high-speed projector [17]. In Fig. 1, we show the concept of PVLC.
The PVLC technique is achieved using a digital light processing (DLP) projector. A DLP projector generates spatial light patterns by controlling the inclination of discretely arrayed micromirrors (called a digital micromirror device, or DMD) that reflect incident light from a source to either a black absorber or the projector lens. Each micromirror corresponds to one pixel of a projected image. The projector can display binary images at a high frequency by controlling the state of the DMD. Since human observers perceive the luminance of fast blinking patterns in an integrated manner [30], the DLP projector sequentially projects binary images at high speed so that they are integrated into the desired luminance. PVLC performs visible light communication by exploiting the fact that the human perception of the integrated luminance does not depend on the order of the mirror flip pattern. It presents an image to a human observer while transmitting different binary codes to a distant device according to the light-receiving position. In particular, we first project the time-modulated binary data for the device as a series of images and then project other images that compensate for the undesired luminance disturbance caused by the previous ones, thus presenting the desired image to a human observer. A DLP projector can present the above-mentioned binary images at higher than 50 Hz. Therefore, a human observer perceives the desired image as the integration of the projected intensities, while the pixel-dependent binary data are transmitted to a distant device (e.g., swarm robots [31]) located under a projected pixel. In addition, because we can embed information into each pixel of the image using PVLC, in principle there is no misalignment between the image and the per-pixel information, and no geometric calibration is needed to align them.
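To make this projection/communication duality concrete, the following toy simulation (illustrative only; the slot counts and patterns are simplified and are not the prototype's actual values) models a single pixel over one PVLC frame: an observer integrates all binary slots into one luminance value, while a receiver that knows the segment layout reads the data slots as bits.

```python
# Illustrative single-pixel PVLC simulation (simplified; not the
# prototype's actual timing). A frame is a sequence of binary light
# slots: sync, data, and image-compensation segments.

def make_frame(data_bits, target_on_slots, n_sync=4, n_image=8):
    """Build one frame whose total number of ON slots equals
    target_on_slots, regardless of the embedded data bits."""
    sync = [1, 0] * (n_sync // 2)       # fixed synchronization pattern
    data = list(data_bits)              # one slot per embedded bit
    # The image segment compensates the luminance disturbed by the
    # sync and data segments.
    remaining = target_on_slots - sum(sync) - sum(data)
    assert 0 <= remaining <= n_image, "target luminance not reachable"
    image = [1] * remaining + [0] * (n_image - remaining)
    return sync + data + image

def observer_luminance(frame):
    # Human vision integrates fast blinking into an average luminance.
    return sum(frame) / len(frame)

def receiver_decode(frame, n_sync=4, n_data=4):
    # A photodiode that knows the segment layout reads the data slots.
    return frame[n_sync:n_sync + n_data]

f1 = make_frame([1, 0, 1, 1], target_on_slots=8)
f2 = make_frame([0, 0, 0, 1], target_on_slots=8)
print(observer_luminance(f1) == observer_luminance(f2))  # True: same appearance
print(receiver_decode(f1), receiver_decode(f2))          # different embedded data
```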

Mediated Social Touch Devices for Stroking Presentation
Mediated social touch technologies aim to reproduce affective touch using actuators in mediated communications. Various systems have been proposed for creating hugs, handshakes, tickles, and strokes [32], [33], [34], [35], [36]. Since stroking is a common and effective form of social touch [37], researchers have focused on creating stroking systems by direct skin stimulation using lateral motions [36], [38]. However, the actual length of the stroke is limited in these systems. Recently, stroking systems using multiple actuators discretely aligned on the forearm have been drawing attention for creating the sensation of a longer stroke. Culbertson et al. [14] proposed a social device that uses voice coils distributed on the forearm to reproduce a stroking sensation by progressively applying pressure to the skin of the forearm. Through a user study, they found that users' sensations of continuity and pleasantness for the stroke stimuli could be maximized by increasing the driving duration and shortening the delay of each voice coil. Nunez et al. [15] developed a device that uses rotating motors to present a sequential, discrete lateral skin-slip sensation on the forearm. Through a user study, they found that their device can maintain continuity and pleasantness even at a large contact distance. Israr et al. [16] proposed a device that uses vibrating voice coils linearly distributed on the forearm to reproduce a stroking sensation. The results of user studies showed that the pleasantness increased at relatively low frequencies and amplitudes of the vibrations.
The previous studies investigated the continuity and pleasantness of presented sensations using only haptic information. However, some researchers have revealed that visual information in collaboration with haptic information can enhance the perceived sensations of softness [39], shapes [40], resistive forces [41], and the presence of the presented object [42]. Therefore, we were interested in how visual information when presented synchronously with haptic information could affect the continuity and pleasantness of stroking sensations.

Human Perception Characteristics for Latency of Haptic From Visual Sensations
When designing a VHAR system, we must know the latency perception threshold so that the presented visuo-haptic stimuli are spatiotemporally consistent. As for delay perception, Miyasato et al. [43] investigated the tolerance of latency of haptic from visual sensations in a visuo-haptic teleconference system. Through a user study, they found that the threshold for time-delayed perception in their system was about 100 ms. Silva et al. [44] also investigated the tolerance of perceivable latency of haptic from visual sensations in a video game. The experimental results showed that the stimulus threshold of the latency in their system was about 100 ms. These previous studies investigated the visuo-haptic latency tolerance of visuo-haptic virtual reality systems, which means they did not spatially superimpose the presented visual and haptic stimuli. Thus, there are still no design criteria for the consistency between visual and tactile information in VHAR systems. We conducted experiments on the temporal latency of haptic from visual sensations by superimposing visual and haptic stimuli using the developed prototype system, and we report the results in this paper.

Fig. 1. Concept of PVLC. A high-speed DLP projector displays binary images comprised of three segments: sync, data, and image. A light transition at a pixel during the data segment represents a binary signal, which can be extracted using a photodiode. To a human viewer, the image appears natural, with no noticeable embedded signal. Since humans perceive the luminance of fast blinking patterns in an integrated manner, the perceived luminance is adjusted to the target value in the image segment.

Progress From Previous Iterations of the HaptoMapping System
We presented preliminary versions of this system at conferences [18], [19].

In [18], we introduced the original idea of HaptoMapping with the finger-worn device and the PVLC projection system. That projection system could only illuminate surfaces with static images and embed 26-bit control signals. We evaluated the system latency for a 26-bit data length and explored the latency perception characteristics of haptic from visual sensations when using the finger-worn device. In [19], we demonstrated three applications using the finger-worn device.
In this paper, we improved our VHAR system in terms of applicability, conducted additional subjective experiments, and developed an additional application. As for the system improvement, we developed two types of application-specific haptic devices: stylus and arm mounted. The stylus and arm-mounted devices are designed for texture presentation and mediated social touch, respectively. In addition, we developed a video projection mode that is compatible with the PVLC technique and made the data length variable. We then evaluated the maximum delay when the system configuration was changed. We conducted user studies to investigate the latency perception characteristics when using the newly developed stylus and arm-mounted devices. We also examined the continuity and pleasantness of the stroking sensation when using the arm-mounted device. Finally, we developed an additional application for online social touch using the arm-mounted device, which expands our system's applicability from a personal visuo-haptic system to a social communication system.

Principle of HaptoMapping
HaptoMapping is a projection-based VHAR system that can render visual and haptic content independently and present consistent visuo-haptic sensations on a nonplanar physical surface. Fig. 2 shows the concept of the proposed system. It controls wearable haptic displays using a blinking pattern (that is imperceptible to the user) embedded into each pixel of the projected images using PVLC [17].
This system can employ various visuo-haptic combinations because we can design visual and haptic information independently in PVLC. It enables us to keep temporal consistency between the visual and haptic sensations because the haptic devices can directly receive control signals from the projector's lights. In addition, it keeps spatial consistency between the visual and haptic sensations because the control signals are embedded into projected images at a pixel level and received by haptic devices using photodiodes.

Projection System
HaptoMapping comprises a projection system and wearable haptic displays. The projection system is composed of a high-speed DLP projector and projection surfaces. We embed user-imperceptible control signals for the haptic devices in each pixel of the projected image using PVLC.
Each frame of the PVLC consists of three segments of binary images. These are called the sync, data, and image segment (Fig. 1). The sync segment contains synchronization images, which are spatially uniform binary images (i.e., white or black flood images). The synchronization images indicate to the haptic devices when control signals are being projected. The binary images in the data segment are the control signals of the haptic devices, which represent a combination of vibration pattern ID, on/off information, x and y projector coordinates, and/or other control signals (such as delay values, described further in Sections 5 and 6). The image segment represents a full-color image for human observers. The binary images in this segment compensate for the disturbance of the projected appearance caused by the sync and data segments to display the full-color image target. Finally, users observe the full-color image target while the haptic devices receive the embedded control signals.
The projection time of the sync segment (T_sync) and the data segment (T_data) is the product of the projection period of each binary image (t_sync and t_data, respectively) and the number of binary images in the segment (N_sync and N_data, respectively); that is, T_sync = t_sync × N_sync and T_data = t_data × N_data. The projection time of the image segment (T_image) is determined by the desired refresh rate of the image/video (f_PVLC):

T_image = 1 / f_PVLC - T_sync - T_data.    (1)

Haptic Devices Controlled by PVLC
The wearable haptic display consists of a receiver circuit, a microcontroller, an audio module, a vibration actuator, and a battery. The audio module provides the storage and functions of a music player. Because vibration patterns can be handled as audio files (such as the WAVE format), we utilized the audio module as both the storage and the player of the vibration patterns. The wearable haptic display presents vibrotactile sensations according to the control signals embedded in the projected image. First, the receiver circuit uses a photodiode to react to the projected light and converts it to binary signals with an amplifier and a comparator. The received signals are sent to the microcontroller. After receiving the synchronization signals, the microcontroller acquires the embedded control signals. The microcontroller then sends vibration patterns to the actuator according to the received control signals. When we present a single-frequency vibration pattern, the microcontroller directly drives the actuator. If we drive a texture's vibration pattern, which is a mixture of multiple frequency components, the microcontroller sends the vibration pattern to the actuator through the audio module (Fig. 2). The vibration patterns of textures are preloaded in the audio module as WAVE-format audio files.
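This control flow amounts to a small state machine: wait for the sync pattern, read the data bits, then act on them. The sketch below simulates that loop in Python on the host side (the actual firmware runs on the microcontrollers; the hardware-facing functions read_photodiode_bit, drive_actuator, and play_wave are hypothetical stand-ins, not the prototype's API).

```python
# Host-side simulation of the haptic-device control loop (the real
# firmware runs on a microcontroller). Hardware-facing callbacks are
# hypothetical stand-ins for the photodiode, actuator, and audio module.

SYNC_PATTERN = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # sync segment (N_sync = 10; example pattern)
N_DATA = 8                                      # embedded bits per frame (video mode)

def device_loop(read_photodiode_bit, drive_actuator, play_wave):
    window = []
    while True:
        window.append(read_photodiode_bit())    # sampled binary light level
        window = window[-len(SYNC_PATTERN):]
        if window != SYNC_PATTERN:
            continue                            # keep waiting for the sync pattern
        bits = [read_photodiode_bit() for _ in range(N_DATA)]
        on_off, pattern_id = bits[0], int("".join(map(str, bits[1:])), 2)
        if not on_off:
            drive_actuator(None)                # turn the vibration off
        elif pattern_id == 0:
            drive_actuator(150)                 # single-frequency pattern (150 Hz)
        else:
            play_wave(pattern_id)               # texture pattern via the audio module
```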
The proposed system's latency (T_late) is defined as the duration from when the haptic device is placed on a projection area with embedded control signals to when the device presents a vibration. T_late can be calculated as follows [45]:

T_late = T_wait + T_recv + T_vib + t.    (2)

T_wait is the waiting time until the start of the sync segment and depends on the timing at which the device enters a pixel of a projected image; therefore, T_wait ranges from zero to the projection time of one frame (= 1/f_PVLC). T_recv is the time for the reception of the sync and data segments (T_sync + T_data). T_vib is the time from when the actuation signal is sent by the microcontroller until the start of the haptic presentation. When we drive an actuator without the audio module, T_vib is determined by the actuator's mechanical properties; if we utilize the audio module, T_vib is the sum of the audio module's processing time and the actuation time. t is the processing time in the microcontroller. In this paper, we do not consider the effect of t when calculating T_late because t is three orders of magnitude smaller than T_wait, T_recv, and T_vib.

Implementation of HaptoMapping
We have developed a prototype system of HaptoMapping, which comprises a projection system and haptic devices. We also describe a preprocess to prepare projection images and control signals for the haptic devices.

Projection System
We employed a high-speed DLP projector development kit (DLP LightCrafter 4500, Texas Instruments) to display images using PVLC. The projector displays binary images sent from a PC according to a specified period. We can control the projection period of each binary image using the software from the development kit. We set t_sync = t_data = 0.235 ms and f_PVLC = 50 Hz. In the prototype system, the sync segment consisted of ten binary images (N_sync = 10), which were a combination of black and white flood images. Therefore, T_sync = 2.35 ms. The data segment consisted of one or more binary images representing each bit of the control signals of a haptic device. The number of binary images (N_data) depends on each application's required data length, and therefore T_data = 0.235 × N_data ms. The image segment consisted of a 5-bit RGB image (5 × 3 binary images). T_image was determined according to Equation (1).
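These concrete values plug directly into Equation (1); a quick check in Python shows how the image-segment budget shrinks as the data length grows (the two N_data values are the video-mode and static-mode maximums described below).

```python
# Frame timing of the prototype (Equation (1)); all times in milliseconds.
t_sync = t_data = 0.235      # projection period of one binary image
N_sync = 10                  # binary images in the sync segment
f_pvlc = 50.0                # refresh rate (Hz), i.e., 20 ms per frame

def segment_times(N_data):
    T_sync = t_sync * N_sync
    T_data = t_data * N_data
    T_image = 1000.0 / f_pvlc - T_sync - T_data   # Equation (1)
    return T_sync, T_data, T_image

for N_data in (8, 32):       # video-mode vs. static-mode maximums
    print(N_data, segment_times(N_data))
# N_data = 8:  T_sync = 2.35, T_data ≈ 1.88, T_image ≈ 15.77
# N_data = 32: T_sync = 2.35, T_data ≈ 7.52, T_image ≈ 10.13
```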
As for the sync segment, because the projector can project a color-inverted image of a binary image, a black flood image was projected by inverting the color of a white flood image. Also, the projector can repeatedly utilize a binary image in its storage. Therefore, we could generate binary images of the sync segment by inputting one white flood image for each frame.
The projector can stream up to 24 binary images from a PC for each frame and can thus project a full-color video sequence, as with ordinary projectors. We call this the video mode. In this mode, at maximum, we can use eight binary images (= 24 - 1 - 15) for the data segment. This mode allows for a variety of visual expressions but limits the haptic representation. On the other hand, we can preload up to 48 binary images to the projector, although we cannot update these images online. The projector can then project a full-color static image using these binary images. We call this the static mode. In this mode, at maximum, 32 binary images (= 48 - 1 - 15) can be used for the data segment. Although this mode can only present a static image, it can handle rich tactile presentations, such as adaptive haptic feedback according to the speed of the hand movement. We used the two modes depending on the application. Table 1 summarizes the number of binary images projected in each frame in each projection mode.

Haptic Devices Controlled by PVLC
We developed three types of wearable haptic devices (finger worn, stylus, and arm mounted) controlled by PVLC. The finger-worn device presents tactile sensations on a user's fingernail. The stylus device presents haptic sensations on a user's finger pad. The arm-mounted device presents stroking sensations on a user's forearm. They share the same controlling system, but each has a different use. Fig. 3 shows the hardware configurations of the haptic devices. In the finger-worn device (Fig. 3a), a receiver circuit and an actuator are placed on the fingertip, and a microcontroller, audio module, and battery are placed in a box (120 cm³) on the wrist. A photodiode is attached to the front of the fingertip. The bottom side of the device is a clip to fix the device on the user's fingertip. This device presents a vibrotactile sensation on a user's fingertip by driving the actuator on the user's nail. It is known from previous studies that vibrations applied to a fingernail also vibrate the fingertip, creating the illusion of vibration at the finger pad [46]. This structure lets the user directly touch the object's surface with the finger pad.
In the stylus device (Fig. 3b), a receiver circuit and an actuator are housed inside a cylinder-shaped stylus (length: 140 mm, maximum diameter: 20 mm), and a microcontroller, an audio module, and a battery are stored in a box (120 cm³). A photodiode is attached to the nib. The actuator is placed inside the stylus under the user's grip position and presents the vibrotactile sensation to the user's fingers through the stylus's body. The stylus device is compatible with other data-driven haptic rendering methods for stylus-shaped devices [47], [48], [49], [50], [51]. These synthesize vibrotactile patterns of various textures using prerecorded vibration patterns.
The arm-mounted device (Fig. 3c) presents stroking sensations on a user's forearm by sequentially driving multiple actuators arranged in a row. The device consists of a projection screen (a white rubber plate), six substrates, and a battery box. Each substrate comprises a receiver circuit, a microcontroller, and an actuator. Each microcontroller receives control signals independently. A buckle is used to hold the device on the user's forearm. This device utilizes a haptic illusion, called apparent tactile motion [52], to present a continuous stroking sensation by discretely aligned actuators. Apparent tactile motion [52] is a tactile illusion in which a sequentially activated series of actuators makes a user believe that one actuator is moving along the skin. Although previous studies have applied this illusion to social haptic devices to reproduce a stroking sensation, they have displayed haptic information without synchronized visual information [14], [15], [16], [53], [54], [55]. Our arm-mounted device, however, is capable of the visuo-haptic presentation of stroking sensations by embedding activation signals in a hand image projected onto the user's forearm. We expect that the superimposing of visual information improves the continuity and pleasantness of the stroking sensation. In addition, we assume that this device can be used in remote communication systems in which an upper body image of a distant person is displayed on a vertical monitor and her/his forearms are projected onto a horizontal surface so that the forearms are extended from the vertical display [56], [57].
We used an S2506-02 (Hamamatsu Photonics) as the photodiode, a microcontroller (finger-worn and stylus devices: Nucleo STM32F303K8, STMicroelectronics; arm-mounted device: LPC1114FDH28/102, NXP Semiconductors), a vibration actuator, and a Li-Po battery. We used two linear resonant actuators, an LD14-002 (Nidec Copal) and a HAPTIC™ Reactor (ALPS ALPINE). The LD14-002 is a small and thin actuator. It has a single resonance frequency around 150 Hz, and its maximum power decreases as the driving frequency deviates from 150 Hz. Thus, this actuator is not suitable for presenting complex broadband vibration patterns, and we used it when presenting a single-frequency pattern by direct control from the microcontroller. The HAPTIC™ Reactor is a bigger and thicker actuator than the LD14-002. It has a uniform frequency-response magnitude over the range of 50 to 400 Hz, and therefore it can present complex vibration patterns, such as natural texture patterns, using an audio module (DFR0534, DFRobot). Although both actuators respond quickly, the total latency increases due to the processing time in the audio module when we present the vibration patterns of textures.

Preprocess for Projection Images and Vibration Patterns
As described above, we need to prepare a set of binary images for the PVLC projection and vibration patterns for the texture presentation. Fig. 2 (left) presents a diagram of the preprocess for the projection and haptic patterns. For the projection (Fig. 2a), we created a converter of binary images that takes two images as inputs. One represents the target projection appearance (the visual image in Fig. 2a). The other represents the control signals (for each pixel) to embed (the control signal map in Fig. 2a). The converter outputs the three segments: sync, data, and image. First, it generates the ten binary images of the sync segment, a combination of black and white flood images according to a predefined synchronization signal. Second, it produces the binary images of the data segment as follows. The number of images depends on the data's bit length (e.g., four binary images for 4-bit data). Each binary image represents the corresponding bit of the data; for example, each pixel value in the first image corresponds to the first bit of the data. The converter then determines each pixel value of these binary images using the control signal map. Third, it creates 15 binary images (three 5-bit RGB images) for the image segment. These images correct the undesired luminance disturbance caused by the sync and data segments so that the result is perceived as the target visual image. The converter computes the difference between the integrated luminance of the first two segments and the target luminance to be perceived by a user at each pixel, and determines the pixel values of the 15 binary images of the image segment. As a result, the converter generates 48 binary images for the static mode and 24 binary images for the video mode. In the static mode, we preloaded them to the high-speed projector with a setting file defining the projection period of each binary image. In this mode, the projector can display a static image at 50 Hz without a connection to the PC. In contrast, in the video mode, we preloaded only the setting file and streamed the generated binary images from a PC to the projector through HDMI. We utilized OpenGL to display each frame at 50 Hz.
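A minimal NumPy sketch of the converter's data-segment step: each bit plane of the per-pixel control-signal map becomes one binary image (the sync and image-compensation segments are omitted for brevity, and the MSB-first bit order is an assumption for illustration).

```python
import numpy as np

def data_segment_images(signal_map, n_bits):
    """Turn a per-pixel control-signal map (H x W integers) into one
    binary image per bit, as in the converter's data segment."""
    planes = []
    for bit in range(n_bits):
        # MSB-first ordering is assumed here; the actual bit order is a
        # design choice shared between converter and device firmware.
        plane = (signal_map >> (n_bits - 1 - bit)) & 1
        planes.append(plane.astype(np.uint8))
    return planes

# Example: a 2 x 3 pixel map with 4-bit control signals per pixel.
signal_map = np.array([[0b1010, 0b0001, 0b1111],
                       [0b0000, 0b0110, 0b1001]])
for img in data_segment_images(signal_map, n_bits=4):
    print(img)
```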
For the vibration patterns (Fig. 2b), when we present single-frequency patterns using the LD14-002, we simply generate these patterns from the microcontroller's output; no preparation of the vibration patterns is required. However, when we present texture patterns, which are a mixture of multiple frequency components, on the HAPTIC™ Reactor, we need to prepare them and store them in the audio module. We employed haptic texture databases that offer texture images and corresponding vibration models, such as the Haptic Texture Toolkit (HaTT) [49] and the LMT haptic texture database [58]. For example, HaTT offers one hundred texture images and a vibration model for each texture. The texture images were used as the visual images in Fig. 2a. The vibration models in HaTT are autoregressive moving-average (ARMA) models created from prerecorded vibration patterns for each texture under various hand speeds. New haptic patterns can be synthesized using the model that corresponds to each texture image. We first synthesized haptic patterns for various hand speeds. We then preloaded all vibration patterns to the audio module and created a lookup table in the microcontroller to select a pattern by its ID number.
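As an illustration of how a stored vibration pattern might be synthesized offline from such a model, the sketch below excites an autoregressive filter with white noise and writes the result as a WAVE file for the audio module. The coefficients, sample rate, and file name are placeholders for this example; real coefficients would come from the HaTT model for a given texture and hand speed.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

fs = 8000                   # playback sample rate (assumed for the example)
duration = 1.0              # seconds of texture vibration to pre-render

# Placeholder AR coefficients (stable filter); in practice these come
# from the HaTT ARMA model for a given texture and hand speed.
ar = np.array([1.0, -1.6, 0.68])
noise = np.random.randn(int(fs * duration))

vibration = lfilter([1.0], ar, noise)       # autoregressive synthesis
vibration /= np.max(np.abs(vibration))      # normalize to [-1, 1]
wavfile.write("texture_150mmps.wav", fs, (vibration * 32767).astype(np.int16))
```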

MAXIMUM LATENCY OF HAPTOMAPPING
We investigated the latency of HaptoMapping. It was important to know the latency to evaluate whether it was perceivable by users. Since the system's latency varies with the system configuration (actuators and data length), we focused on the maximum latency of our system in the latency evaluation.
We evaluated the worst case of T_wait, T_recv, and T_vib (Equation (2)). T_wait is maximal when the device enters a pixel just after the start of the sync segment, and it is then equal to 1/f_PVLC, which is 20 ms in our configuration. T_recv is maximal when we embed 32 bits in the data segment, and it is then 9.8 ms (T_sync + T_data). T_vib is maximal when we use the HAPTIC™ Reactor with the audio module for the tactile presentation. We estimated T_vib by measuring the time from the moment the microcontroller sends a control signal to the actuator to the moment the actuator starts vibrating.
We measured T_vib using a timer inside the microcontroller. The timer was started when the microcontroller sent a control signal to the actuator and stopped when the actuator started vibrating. We detected the start of the vibration using an acceleration sensor (KXR94-2050, Kionix) attached to the fingertip side of the device. We repeated the measurement 100 times. As a result, the mean and standard deviation of T_vib were 66.5 ms and 0.6 ms, respectively, when we used the HAPTIC™ Reactor and the audio module. Table 2 summarizes T_wait, T_recv, and T_vib. It also shows T_late_max, the maximum latency of the proposed system, which is the sum of those values.
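Summing the three worst-case terms of Equation (2) reproduces the figure in Table 2; a one-line check in Python using the values stated above:

```python
# Worst-case latency terms of Equation (2), in milliseconds (values as
# stated in the text; T_recv corresponds to a 32-bit data segment).
T_wait = 20.0                           # one full frame at f_PVLC = 50 Hz
T_recv = 9.8                            # T_sync + T_data
T_vib = 66.5                            # mean measured, HAPTIC Reactor via audio module

print(round(T_wait + T_recv + T_vib, 1))  # 96.3 ms = T_late_max
```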

USER STUDY OF FINGER-WORN AND STYLUS DEVICES ON LATENCY PERCEPTION
We conducted a user study to investigate whether users noticed the latency between the visual and haptic sensations while using the system. Since the latency changes depending on the system configuration, such as when embedding a long control signal for a rich tactile presentation, we also investigated the tolerance of latencies in the user studies. We performed the experiment using the stylus device in addition to the finger-worn device [18]. In this section, we first explain the experimental setups and methods, and we then compare the results of the finger-worn and stylus devices. The protocol was approved by the Osaka University Institutional Review Board (Registration number: R3-2), and all subjects gave informed consent. The projected image (Fig. 4) consisted of green and red areas aligned in stripes.

Setup for User Study
In this experiment, we investigated the visuo-haptic latency of the haptic from the visual sensations by introducing a delay time in the microcontroller. We used the LD14-002 for this user study because it enables a fast tactile presentation. The haptic device turned on a vibration after a specified delay time in the green area and turned off the vibration in the red area (Fig. 4). We embedded 26-bit control signals (21 bits for x and y coordinates, 1 bit for on/off switching, and 4 bits for delay values). Here, we utilized the x and y coordinates to measure the user's hand speed. We confirmed that the average T_late when using the LD14-002 and embedding a 26-bit control signal is 34.6 ms [18]. Since we could adjust the delay time by adding a sleep time in the microcontroller on top of T_late, the system correctly provided haptic feedback after the delay time specified by the control signal (≥ 34.6 ms). We set the width of the moving area to 182 pixels in the projected image, and the resolution of the projected image was 0.82 mm/pixel in this setup; therefore, the actual width of the moving area was approximately 150 mm.

Participants and Experimental Methods
Ten participants (nine males and one female; nine right-handed and one left-handed; aged from 22 to 25) volunteered for the present user study. Seven of the ten subjects also participated in the studies in Sections 6 and 7 and in [18]; the others were new participants. They were equipped with the stylus device, naturally holding the center of the device with their dominant hand. We prepared 12 variations of the delay time from 50 to 270 ms at intervals of 20 ms. The participants were instructed to move their hands from a red to a green area and answer whether they noticed a delay of the haptic sensation with respect to the moment when they visually identified the nib entering the green area. To control the moving speed of the user's hand, a reference movie in which an experimenter moved his hand at a speed of 150 mm/s was displayed on an additional vertical monitor. Each participant evaluated 12 conditions in a trial and performed 120 evaluations in total by repeating each trial 10 times. The order of conditions was randomized for every trial and for every participant. The same experimental procedure was applied to investigate the latency tolerance of the finger-worn device with the same number of participants [18]. Fig. 5 shows the percentages of positive answers: the blue data are the results when using the finger-worn device [18], and the green data are from the stylus device. The bars represent the standard errors of the means, and each curve is fitted using a sigmoid function defined as

f(x) = 1 / (1 + e^(-k(x - x_0))),    (3)

where k is the slope and x_0 is the delay at which the probability of noticing is 50%.

Results and Discussion
As a result of the fitting, we obtained the following parameter values: k = 0.04 and x_0 = 103 for the finger-worn device, and k = 0.05 and x_0 = 159 for the stylus device. We used these values in the calculations. We identified the threshold time of the visual-haptic latency tolerance, set as the time at which users perceived the delay with a 50% probability. Here, we denote the threshold for the finger-worn device as T_finger and for the stylus device as T_stylus. The results indicated T_finger ≈ 100 ms and T_stylus ≈ 160 ms, as shown by the fitted sigmoid curves. Since T_late_max (= 96.3 ms) was less than T_finger (= 100 ms) and T_stylus (= 160 ms), the proposed haptic displays presented haptic sensations without latencies that were perceivable by most of the users. Therefore, our system maintained temporal consistency between the visual and haptic modalities for both the finger-worn and stylus devices. These measured thresholds can also be used as design criteria for temporal consistency when exploring other configurations of our system, such as using other actuators or embedding richer control signals.
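A minimal sketch of how such a 50% threshold can be estimated, fitting the sigmoid of Equation (3) with scipy; the response fractions below are hypothetical stand-ins, not the experiment's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, k, x0):
    # Equation (3): probability of noticing a delay of x milliseconds;
    # x0 is by construction the 50% threshold.
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical pooled responses: 12 delay conditions (ms) and the
# fraction of "noticed" answers at each delay.
delays = np.arange(50, 271, 20)
noticed = np.array([0.02, 0.04, 0.10, 0.20, 0.35, 0.50,
                    0.65, 0.80, 0.90, 0.95, 0.97, 0.99])

(k, x0), _ = curve_fit(sigmoid, delays, noticed, p0=[0.05, 150.0])
print(f"k = {k:.3f}, threshold x0 = {x0:.0f} ms")
```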
In terms of the difference between the finger-worn and stylus devices, with the setup causing the maximum latency (96.3 ms in Fig. 5), 43% of users noticed the delay with the finger-worn device, whereas only 4.6% of users noticed the delay with the stylus device. Because the maximum latency may occur when we use the HAPTIC™ Reactor with the audio module to present texture patterns, the stylus device is suitable for presenting complex textures to users without perceivable delays.
Furthermore, T_stylus was about 60 ms greater than T_finger. This indicates that users were more tolerant of temporal errors in our system when using the stylus device. With the stylus device, users interacted with the image indirectly through the stylus; therefore, the sensitivity to latency was considered to be reduced compared to the case in which the user touched the screen directly.
Since seven of the ten subjects had experienced a similar experimental setup using the finger-worn device, we statistically analyzed the effect of experience on latency sensitivity. We conducted a t-test on latency perception between the three new subjects and the seven experienced subjects who participated in the user study using the stylus device. There was no significant difference between them (p > 0.05). We assume that all subjects were neutral in the latency evaluation because we left a couple of weeks between experiments, and all latencies were presented in random order without telling the subjects the correct answers.
To summarize this experiment, these results revealed important parameters in designing a VHAR system and showed that the stylus device could be tolerant of temporal errors in tactile presentation.

USER STUDY OF ARM-MOUNTED DEVICE ON LATENCY
We conducted a second user study to investigate whether users noticed the latency between visual and haptic sensations when presenting a stroking sensation through apparent tactile motion [52]. We also investigated the tolerance of latencies for cases in which the latency of our system increases (e.g., when embedding long control signals). The protocol was approved by the Osaka University Institutional Review Board (Registration number: R3-2), and all subjects gave informed consent. Fig. 6 shows the experimental setup of this user study. The user sat behind a desk and placed their right arm, wearing the arm-mounted device, on an armrest. Users used a tablet placed on their front left to answer the questionnaire. During the experiment, the multiple actuators caused slight sounds when actuated at the same time; therefore, the user wore noise-canceling headphones (WH-1000XM3, Sony) to avoid any auditory confound with the haptic stimulus. We fixed the PVLC projector on the ceiling. The projector displayed a hand image on the device's screen and moved it from the wrist to the elbow of a participant using the video mode so that it visually stroked the user's arm.

Setup for User Study
In this experiment, we investigated the visuo-haptic latency tolerance by changing the latency of the haptic from the visual sensation, introducing a delay time in the microcontroller. We embedded delay values as control signals in the hand area. When the projected hand reached a photodiode, the microcontroller activated the actuator (LD14-002) after a delay according to the received delay value. The actuator presented a single-frequency pattern at 150 Hz. The device elicits a perception of apparent tactile motion [52] because the discrete actuators are activated in order along with the projected hand movement. We adjusted the hand size to the average of adult men [59]: 90 mm wide and 190 mm long.
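For reference, if such a stroking pattern were driven by a central schedule, the activation times would follow directly from the actuator positions and the hand speed, as in the sketch below (the actuator pitch is illustrative). With PVLC, no such schedule is needed: each substrate simply fires when the projected hand, carrying its embedded signal, reaches its photodiode.

```python
# Activation schedule for apparent tactile motion along the forearm.
# The pitch is illustrative; the prototype's pitch depends on the
# substrate-spacing condition.
speed_mm_s = 150.0          # projected-hand speed
pitch_mm = 30.0             # center-to-center actuator distance (assumed)
n_actuators = 6

for i in range(n_actuators):
    onset_ms = 1000.0 * (i * pitch_mm) / speed_mm_s
    print(f"actuator {i}: vibrate at t = {onset_ms:.0f} ms")
```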

Participants and Experimental Methods
Ten participants (10 males; nine right-handed and one left-handed; aged from 21 to 25) volunteered to participate in the present user study. Seven of the ten subjects also participated in the studies in Sections 5 and 7 and in [18]; the others were new participants. We prepared ten variations of the delay time from 100 to 1000 ms at intervals of 100 ms. The participants were instructed to place their right arm, wearing the device, on the armrest and, using their left hand, answer whether they noticed a delay of the haptic sensation with respect to the movement of the projected hand. The projected hand speed was fixed at 150 mm/s during the experiment. Each participant experienced the ten randomized delays as one set and answered whether they felt a latency after each condition. The evaluation consisted of ten sets; thus, each participant repeated the procedure 100 times in total.

Results and Discussion

Fig. 7 shows the average percentage of positive answers obtained in this experiment. The bars represent the standard errors of the means, and the curve is fitted using the sigmoid function in Equation (3). As a result of the fitting, we obtained the parameter values k = 0.008 and x_0 = 502, which we used in the calculations. When using the arm-mounted device, we identified the threshold time of the visual-haptic latency tolerance, T_arm, set as the time at which the users perceived the delay with a 50% probability. The results indicated T_arm ≈ 500 ms, as shown by the fitted sigmoid curve. Because T_late_max (= 96.3 ms) was less than T_arm, the arm-mounted device presented stroking sensations without latencies that were perceivable by the users. Therefore, our system maintained temporal consistency between the visual and haptic modalities even when we used the arm-mounted device.

Fig. 7. Percentages of positive answers in the experiment on the threshold time of visual-haptic latency perception when using the arm-mounted device.
The results showed that the threshold time of the visual-haptic latency tolerance (T_arm) was much larger than the values obtained in the previous experiments (T_finger and T_stylus). We assume the difference comes from the difference in task design between this user study and the others. In the studies using the finger-worn and stylus devices, we clearly indicated the start of the vibration using lines and colors (green and red), while there was no precise spatial indication in this study. Although we cannot simply compare these values, they are helpful as latency criteria when designing applications, since the tasks are based on expected use cases (e.g., active surface explorations and passive social touches).
To summarize this experiment, we found that the arm-mounted device presented stroking sensations without perceivable latencies.

USER STUDY OF ARM-MOUNTED DEVICE ON CONTINUITY AND PLEASANTNESS OF STROKING SENSATION
We conducted another user study using the arm-mounted device to investigate the optimal parameters for maximizing the continuity and pleasantness of the presented stroking sensations. The parameters in this study were the stroking speed and the contact spacing, which affect the continuity and pleasantness of the stroking sensations [14], [15], [16]. We also investigated the effect of synchronized visual information by changing the visibility of the projected hand during the experiment. The protocol was approved by the Osaka University Institutional Review Board (Registration number: R3-2), and all subjects gave informed consent.

Setup for User Study
We used almost the same experimental setup as in the previous study (Fig. 6). In this user study, we used the video mode to project the hand moving back and forth along the haptic device's screen (200 mm) so that it created the appearance of stroking the user's arm. Because a vibration-based device should present a lower frequency and amplitude to create pleasant stroking sensations [16], we set the frequency of the actuators to 50 Hz. Although the resonant frequency of the LD14-002 is around 150 Hz, we confirmed that it can still present perceivable vibrations at 50 Hz.

Participants and Experimental Methods
Ten participants (10 males; nine right-handed and one left-handed; aged from 21 to 25) volunteered for this study. Seven of the ten subjects also participated in the studies in Sections 5 and 6 and in [18]; the others were new participants. We instructed each participant about the procedure before the experiment and confirmed their informed consent to participate. We prepared 24 experimental conditions as combinations of the following variables: four stroking speeds, three contact spacings, and two visibility conditions of the projected hand. Each participant completed the evaluation of continuity and pleasantness for the randomized 24 conditions twice, i.e., 48 evaluations each. Continuity was rated on a 7-point Likert scale (1 = Discrete, 7 = Continuous), and pleasantness was rated on a 15-point Likert scale (-7 = Very Unpleasant, 0 = Neutral, 7 = Very Pleasant). The participants were allowed to remove their headphones and take a 2-minute break after every eight evaluations.
For the stroking speeds, we prepared four variations: 10, 80, 150, and 210 mm/s. Previous studies reported that the optimal stroking speed for continuous and pleasant sensations is 135 mm/s when using light pressure on the skin [14], and 77 mm/s for continuity and 55 mm/s for pleasantness when using lateral skin slip [15]. However, there has been no investigation of the optimal stroking speed for continuity and pleasantness when using a vibration-based stroking device.
For the contact spacing, we prepared three versions of the arm-mounted device with spacing distances between the substrates of 0, 10, and 20 mm (Fig. 8). When the substrates are more densely placed, the continuity may increase because the greater density reduces the interval between the vibrations. However, a sparser arrangement of substrates enables a lower-cost device design because fewer substrates cover the same distance (as long as continuity does not suffer).

Fig. 8. Contact spacings. Left: the densest alignment (0 mm between substrates). Right: the sparsest alignment (20 mm between substrates).
Regarding the visibility of the projected hand, in the visible condition the user used the stroking presentation system normally (Fig. 9); in the non-visible condition, the user wore an eye mask to block any visual stimuli.

ANOVA Group Comparisons
We had two dependent variables (the ratings of continuity and pleasantness) and three independent variables (visibility of the projected hand, stroking speed, and contact spacing). Two three-way repeated measures ANOVAs were used to examine the main and interaction effects of the factors on the dependent variables. Because repeated measures were used, Mauchly's test of sphericity was used to examine whether adjustments were needed based on the sphericity of the within-subject factors. Bonferroni's tests were also performed when a significant result was found in the main or interaction effects. If the assumption of sphericity was violated (as indicated by Mauchly's test), we used Greenhouse-Geisser's epsilon for F and p in the ANOVA, indicated as F* and p*.
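For reference, the same style of analysis can be run in Python with statsmodels' AnovaRM; the sketch below uses synthetic stand-in ratings, and note that AnovaRM does not apply the Greenhouse-Geisser correction itself, so sphericity handling would be a separate step.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic long-format data standing in for the experiment's ratings:
# one row per subject x visibility x speed x spacing cell (averaged over
# the two repetitions so the design stays balanced).
rng = np.random.default_rng(0)
rows = [(s, v, sp, d, rng.normal(4.0, 1.0))
        for s in range(10)                    # 10 participants
        for v in ("visible", "non-visible")   # hand visibility
        for sp in (10, 80, 150, 210)          # stroking speed (mm/s)
        for d in (0, 10, 20)]                 # contact spacing (mm)
df = pd.DataFrame(rows, columns=["subject", "visibility", "speed",
                                 "spacing", "rating"])

# Three-way repeated measures ANOVA on the (synthetic) continuity ratings.
print(AnovaRM(df, depvar="rating", subject="subject",
              within=["visibility", "speed", "spacing"]).fit())
```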

Results of Continuity
Fig. 10 (left) shows the mean values of the continuity ratings as dots, the standard errors of the means as error bars, and significance as asterisks (*: p < 0.05) and daggers (†: 0.05 < p < 0.1). The optimal parameters for maximizing continuity were a hand speed of 80 mm/s and a contact spacing of 0 mm. We conducted one-sample t-tests on the continuity ratings against a neutral rating (continuity = 4) to investigate whether continuity was maintained even when the substrates were sparsely arranged. The contact spacing of 0 mm had continuity ratings statistically better than neutral whether the projected hand was seen or not (p < 0.05). In addition, only when the projected hand was seen was there a continuity rating significantly greater than neutral for the contact spacing of 10 mm (p < 0.05) and for the contact spacing of 20 mm (p < 0.1, albeit with less confidence).

A three-way repeated measures ANOVA of continuity was performed for the independent variables of stroking speed, contact spacing, and visibility of the projected hand. There was a significant main effect on continuity for the visibility of the projected hand [F(1, 19) = 4.94, p = 0.035, η² = .019] and for contact spacing [F*(1.58, 30.1) = 6.41, p* = 0.0078, η² = .031], while there was no significant main effect for stroking speed. There was a significant interaction between the visibility of the hand and stroking speed [F*(2.25, 42.71) = 4.80, p* = 0.011, η² = .023]. However, there were no significant interactions between the visibility of the projected hand and contact spacing or between stroking speed and contact spacing.
In the post hoc analysis, we performed Bonferroni's tests on the visibility of the projected hand and stroking speed for continuity. We found that there was a significant difference in continuity due to the visibility of the projected hand at 150 mm/s [F(1, 19) = 7.15, ...].

Results of Pleasantness
Fig. 10 (right) shows the mean values of the pleasantness ratings as dots, the standard errors of the means as error bars, and significance as asterisks (*: p < 0.05) and daggers (†: 0.05 < p < 0.1). The optimal parameters for maximizing pleasantness were a hand speed of 80 mm/s and a contact spacing of 0 mm. We conducted one-sample t-tests on the pleasantness ratings against a neutral rating (pleasantness = 0) to investigate whether pleasantness was maintained even when the substrates were sparsely arranged. Only when the projected hand was visible did the contact spacings of 0 mm and 10 mm each have pleasantness ratings statistically greater than neutral (p < 0.05). In addition, only when the projected hand was visible was the pleasantness rating significantly greater than neutral for the contact spacing of 20 mm (p < 0.1, albeit with less confidence).
A three-way repeated measures ANOVA of pleasantness was also performed for the independent variables of stroking speed, contact spacing, and visibility of the projected hand. There was a significant main effect on pleasantness for the visibility of the projected hand [F(1, 19) = 11.22, p = 0.003, η² = .023]. However, there was no significant main effect for stroking speed or contact spacing. There was also a significant interaction between the visibility of the projected hand and stroking speed. We performed Bonferroni's tests on the visibility of the projected hand and stroking speed for pleasantness. There was a significant difference in pleasantness due to the visibility of the projected hand at 80 mm/s [F(1, 19) = 14.60, p = 0.001, η² = .074] and at 150 mm/s [F(1, 19) = 11.47, p = 0.003, η² = .074]. However, there was no significant difference in pleasantness due to the visibility of the projected hand at 10 mm/s and 210 mm/s.

Discussion
We found that the optimal parameters for maximizing both continuity and pleasantness were a hand speed of 80 mm/s and a contact spacing of 0 mm, with the projected hand visible to the user. A hand speed of 80 mm/s is in the previously established speed range for pleasant stroking [60]. The results of the one-sample t-tests indicated that the haptic devices with 10 mm or 20 mm contact spacings and a synchronized projected hand still maintained both the continuity and pleasantness of the stroking sensations. This indicates that our visuo-haptic stroking presentation system enables a low-cost design while maintaining both the continuity and pleasantness of the stroking sensation.
These results also indicated that the hand projection superimposed on the tactile sensation statistically improved both the continuity and pleasantness of the stroking sensation when the stroking speed was 80 mm/s or 150 mm/s. Therefore, we conclude that our arm-mounted device improved the continuity and pleasantness of the stroking sensation by superimposing a hand projection at typical stroking speeds.

APPLICATIONS
We have developed four potential applications that use HaptoMapping in daily scenes. The first three applications utilize the finger-worn and stylus devices with the static projection mode. The last application employs the armmounted device with the video projection mode. Fig. 11 shows a texture design support system for physical surfaces. In this application, we embed IDs into texture images and preloaded the corresponding tactile patterns into the audio module. When a user touches and explores the surface, the haptic device selects a tactile pattern corresponding to the position and presents it to the user's finger. We did not conduct precise geometric registrations to 3D nonplanar surfaces, while we roughly adjusted the projection positions using 2D homography. Although the system can support both the finger-worn and stylus devices, the stylus has the potential to reproduce tactile sensations of textures, especially when using databases recorded by penlike devices [49], [58]. When we utilize data-driven texture modeling and rendering methods, such as HaTT [49], we also preload the synthesized tactile patterns corresponding to the hand speed of each.

Texture Design Support System
Fig. 11 shows a texture design support system for physical surfaces. In this application, we embed IDs into texture images and preload the corresponding tactile patterns into the audio module. When a user touches and explores the surface, the haptic device selects the tactile pattern corresponding to the position and presents it to the user's finger. We did not conduct precise geometric registrations to 3D nonplanar surfaces; instead, we roughly adjusted the projection positions using a 2D homography. Although the system supports both the finger-worn and stylus devices, the stylus has particular potential for reproducing tactile sensations of textures when using databases recorded by pen-like devices [49], [58]. When we utilize data-driven texture modeling and rendering methods, such as HaTT [49], we also preload the synthesized tactile patterns corresponding to each hand speed.
We expect that users would benefit from this system when seeking a desirable texture from a group of samples for a new product. Since the textures are simply projected onto the surface, users can easily try another texture by changing the projected image and vibration patterns. This system would assist users in testing textures on 3D surfaces and finding a suitable texture without scattering physical texture samples (if available) across their workspaces.
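The following is a minimal sketch of the ID-to-pattern lookup this application relies on, assuming the device exposes a callback when an embedded ID is decoded; the names, table contents, and playback stub are all hypothetical.

# Hypothetical sketch of the ID-to-pattern lookup: the device decodes a
# pixel-level ID at the touch point and triggers the tactile pattern that
# was preloaded for that region. Names and data are illustrative only.
from dataclasses import dataclass

@dataclass
class TactilePattern:
    name: str
    waveform_file: str  # vibrotactile waveform preloaded into the audio module

PATTERN_TABLE = {
    1: TactilePattern("denim", "denim.wav"),
    2: TactilePattern("wood", "wood.wav"),
    3: TactilePattern("leather", "leather.wav"),
}

def play(waveform_file: str) -> None:
    # Stand-in for triggering playback on the device's audio module.
    print(f"playing {waveform_file}")

def on_id_decoded(texture_id: int) -> None:
    """Called whenever the photosensor decodes an embedded ID."""
    pattern = PATTERN_TABLE.get(texture_id)
    if pattern is not None:
        play(pattern.waveform_file)

on_id_decoded(2)  # e.g., the finger enters the region tagged "wood"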

Multimodal Interactive Map
Fig. 12 shows a multimodal interactive map enhanced with haptic and audio information. Users can acquire the details of the map not only from visual information but also from haptic and audio information. The haptic information conveys the layout of the aisles of a location (a mall) through vibration, and the audio information conveys the name of each store through sound. This system may have the potential to help visually impaired individuals. Because it is easy to extend the modalities by adding another sensory presentation device, we added an audio speaker to the user's wrist. When the user scans the surface, the actuator vibrates over a shop area, and audio from the speaker describes what kind of shop it is. This system is reconfigurable in terms of its visual, audio, and haptic information.

Interactive Dictionary
Fig. 13 shows a visual and haptic interactive dictionary for educational use. We developed an animal dictionary as an example. Users can not only see graphics of animals but also touch them, with a variety of textures being simulated. We have built two versions: a planar book-style dictionary and a 3D-printed version. The former presents the animals' appearances and the additional tactile information separately and covers a variety of animals. The latter presents both an animal's appearance and its tactile information on a 3D-printed model of the animal; it may be more immersive than the former, but it requires a 3D-printed object for each animal.
Because each device independently receives the embedded control signals, multiple users can simultaneously experience this visuo-haptic AR application. Since users can not only see an animal's appearance but also touch it, the system offers additional sensory cues for learning about animals and their differences.

Stroking Creation System for Remote Communications With Visual Hand Effects
Fig. 14 shows a stroking creation system in which a hand projection is controlled by a remote user and the arm-mounted device is worn by a local user observing the projected hand. This application aims to present the tactile sensation of "being touched by the projected hand" to the local user. When the hand appears to stroke over the arm-mounted device on the local user's forearm, the controllers on the device activate the associated actuators accordingly. The arm-mounted device thus creates stroking sensations corresponding to the hand motion. This system could be used in everyday mediated communications. More specifically, it may allow people who have been unable to meet physically, for example due to a health quarantine, to touch each other again, even if only indirectly.
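The following is an illustrative sketch (not the authors' implementation) of the hand-to-actuator mapping: the actuator whose position along the forearm is closest to the projected hand is activated as the hand sweeps by. The actuator count, spacing, and update rate are assumptions.

# Illustrative sketch: activate the actuator nearest to the projected hand
# as it strokes along the forearm. All parameters below are assumptions.
NUM_ACTUATORS = 5      # assumed number of arrayed actuators
SPACING_MM = 20.0      # assumed contact spacing between actuators

def active_actuator(hand_pos_mm: float) -> int | None:
    """Index of the actuator under the projected hand, or None if outside."""
    idx = round(hand_pos_mm / SPACING_MM)
    return idx if 0 <= idx < NUM_ACTUATORS else None

# Simulate the projected hand sweeping along the forearm at 80 mm/s,
# one of the pleasant stroking speeds identified in the user study.
dt_s, speed_mm_s, pos_mm = 0.025, 80.0, 0.0
for step in range(40):
    idx = active_actuator(pos_mm)
    print(f"t = {step * dt_s:5.3f} s, hand at {pos_mm:6.1f} mm -> actuator {idx}")
    pos_mm += speed_mm_s * dt_s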

LIMITATIONS AND FUTURE WORK
It is important to describe the limitations of HaptoMapping that have not been discussed so far and to outline future work that could address them. We identified three limitations. First, as with general projection-based AR systems, our system requires objects suitable for projection. This sometimes limits applications when the desired object is not available or is not suitable for projection. One solution is to manufacture a suitable object using a 3D printer. In addition, we assume it may be possible to apply the PVLC data embedding method to aerial images using a dihedral corner reflector array (DCRA). This may allow the presentation of haptic feedback even on aerial images, without physical objects. Second, this system requires users to wear haptic displays to feel the presented tactile sensations. Although the developed haptic displays are lightweight, wearing them for a long time can be a burden. One possible solution is embedding haptic displays in objects instead of affixing them to users. The apparent challenges of this approach are how to embed actuators in objects and how to change vibration patterns depending on the touch position. Alternatively, employing non-contact haptic displays, such as those based on ultrasound [61] or infrared light [42], enables the presentation of tactile sensations without affixing any devices to users. The apparent challenge of this approach is how to increase the perceived intensity of the tactile stimuli, because the intensity of non-contact haptic displays is generally lower than that of actuators on the skin. Furthermore, for both approaches it is also important to track hand positions precisely and to reduce latencies (T_late) to an acceptable level (as described in detail in this paper).
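As a quick, concrete reading of this latency requirement, the following trivial check compares a measured visual-to-haptic latency against the per-device tolerances established in this paper; the check itself is only an illustration, not part of the system.

# Trivial check that a measured visual-to-haptic latency stays within each
# device's perceptual tolerance (the tolerances and the 93.4 ms maximum
# system latency are the values reported in this paper).
TOLERANCE_MS = {"finger-worn": 100.0, "stylus": 159.0, "arm-mounted": 500.0}
T_LATE_MS = 93.4  # maximum latency measured for the prototype system

for device, tolerance in TOLERANCE_MS.items():
    status = "within tolerance" if T_LATE_MS <= tolerance else "exceeds tolerance"
    print(f"{device}: {T_LATE_MS} ms vs {tolerance} ms -> {status}")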
Third, with a data-driven rendering method such as HaTT [49], our system requires the offline generation of haptic patterns, whereas previous studies were capable of online synthesis (streaming) [47], [48]. Adding online synthesis to our device, for example using a digital signal processor, is therefore a direction for future work, although there is a trade-off between device size and processing complexity. In addition, if the desired haptic patterns are not available in existing databases, designers currently need to prepare them manually for their applications. Recently, generative methods that create haptic patterns from a texture image using neural networks have been emerging [62], and they have the potential to reduce the burden of haptic pattern preparation.
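To illustrate the offline workflow mentioned above, the sketch below synthesizes speed-dependent vibration waveforms ahead of time and writes them as audio files that could be preloaded into the device's audio module. The toy texture model, sample rate, and speed-to-frequency mapping are all hypothetical; this is not HaTT's actual rendering algorithm.

# Hedged illustration of offline pattern generation: waveforms are
# synthesized on a PC (toy model, NOT HaTT) and saved for preloading.
import numpy as np
import wave

FS = 8000  # sample rate in Hz (assumed)

def synthesize_pattern(speed_mm_s: float, duration_s: float = 1.0) -> np.ndarray:
    """Toy texture model: a tone whose frequency scales with scan speed, plus noise."""
    t = np.arange(int(FS * duration_s)) / FS
    f_hz = speed_mm_s * 2.0  # hypothetical speed-to-frequency mapping
    carrier = np.sin(2 * np.pi * f_hz * t)
    noise = np.random.default_rng(0).normal(0.0, 0.3, t.size)
    return np.clip(carrier + noise, -1.0, 1.0)

for speed in (10, 80, 150, 210):  # the study's stroking speeds, reused as bins
    samples = (synthesize_pattern(speed) * 32767).astype(np.int16)
    with wave.open(f"pattern_{speed}mms.wav", "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(FS)
        w.writeframes(samples.tobytes())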

CONCLUSION
This paper proposed HaptoMapping, a VHAR system that can render visual and haptic content independently while maintaining temporal and spatial consistency. We introduced a prototype system of HaptoMapping and evaluated its maximum latency. We investigated visual-haptic latency perception when using the finger-worn, stylus, and arm-mounted devices. We discovered that the developed haptic devices can present haptic sensations without user-perceivable latencies, and the visual-haptic latency tolerance was established at 100 ms, 159 ms, and 500 ms for the finger-worn, stylus, and arm-mounted devices, respectively. We also investigated perceptions of continuity and pleasantness when using the arm-mounted device. The results indicated that the developed visuo-haptic stroking system maintained both continuity and pleasantness even when the contact spacing was 20 mm, and improved both continuity and pleasantness at 80 mm/s and 150 mm/s when compared to the haptic-only stroking system. The values obtained in these results will be useful for those designing future applications involving similar systems. Lastly, we introduced four application scenarios that show the broad applicability of HaptoMapping using the developed haptic devices.