Using the Visuo-Haptic Illusion to Perceive and Manipulate Different Virtual Objects in Augmented Reality

The prop-based 3D virtual object manipulation method is widely used for interaction in Augmented Reality (AR) due to its convenience and flexibility. However, when the represented virtual object is different in size and shape from the physical prop, the look and feel of the object are not well aligned. To address this problem, we present a dynamic finger remapping approach to creating a visuo-haptic illusion that dynamically adjusts the presented virtual hand posture to fit different sizes and shapes of virtual objects in AR. The finger movement toward a physical prop is synchronously remapped to the movement of the virtual fingers toward the corresponding virtual object. We developed a system that enables users to perceive consistent visual and tactile feedback while grasping and releasing various virtual objects represented by a physical prop. We conducted a user study to explore the effect of this visuo-haptic illusion on the perceived size of virtual objects. The sizes of the rendered virtual object and the physical prop were set as independent variables. We found that the perceived size of a virtual object varied with its rendered size in an almost linear fashion, while the physical prop size did not significantly affect the perception. We also conducted a second study to compare our system with a current prop-based method on virtual object manipulation. The results indicated that when using a physical prop to represent different virtual objects in AR, the remapped hands could effectively improve the realism and naturalness of the experience.


I. INTRODUCTION
Augmented Reality (AR) technology creates a seamless connection between the physical and digital worlds and supports natural 3D interaction with virtual objects [1]. In recent years, 3D virtual object manipulation has become an important topic in the Human-Computer Interaction (HCI) community. A popular method for manipulating a virtual model in AR is to use a physical prop of identical size and shape as a real proxy [2] [3]. In this way, users feel the expected tactile feedback from the prop when interacting with virtual models.
Compared with controller-based or gesture-based manipulation methods, prop-based interaction can be more intuitive and effective in practical AR tasks [4]. However, since it is impractical to provide a corresponding physical prop for each virtual object, the presented virtual model and the prop sometimes differ in shape and size, resulting in a visual inconsistency between the human hand and the virtual object. For instance, when tactile feedback is obtained from the physical prop, the user may see an AR scene in which their fingers are still away from the virtual model surface or have already penetrated the model. This misalignment between visual and tactile feedback significantly limits the use of this method [5]. Although the physical prop and the user's real hands can easily be hidden in a purely virtual environment (VE) [6] [7], the situation is quite different in AR, especially with optical see-through head-mounted displays (HMDs), where the user's hands remain visible. For example, if the represented virtual object is smaller than the prop, the non-overlapping part of the prop greatly disrupts visual coherence [8].
As shown in Figure 1, this research presents a general approach for manipulating different virtual objects with the same prop, based on the visuo-haptic illusion (VHI) effect. The VHI is a perceptual phenomenon that relies on the visual dominance effect [9] [10]: when vision conflicts with touch, vision dominates cognition. A typical application of the VHI is to redirect or remap the actual movement of the hands or body to create a consistent multimodal virtual presentation. Similar solutions have been explored for changing the perceived size of food [11] and modifying handheld objects [12]. However, these systems deformed the hand to fit the size and shape of a virtual model shown in a 2D image on a video see-through display, without dynamic remapping. In contrast, we perform real-time hand deformation for a 3D virtual model throughout the grasping and releasing procedures, shown on an optical see-through AR display.
This work creates a realistic object-grabbing experience with a dynamic finger remapping algorithm that predicts contact points on various virtual objects. While the real fingers move toward the prop, the corresponding virtual fingers synchronously move toward the represented virtual model, adapting dynamically to the size and shape of the targeted virtual model. As a result, when users touch the real prop with their fingers, they see the virtual fingers simultaneously contact the virtual model surface. A simple AR-based occlusion method [13] is applied to cover the appearance of the real hand and the prop in the work area, helping users focus on the virtual content.
Although there have been many studies on virtual object manipulation and on the VHI, none has combined both topics for 3D dynamic grasping and releasing of different virtual objects in AR applications. Compared with previous work, our research makes the following contributions:
• We introduced a finger remapping algorithm to create a VHI that dynamically adapts virtual fingers to fit different sizes and shapes of 3D virtual objects throughout the grasping and releasing procedures in a see-through AR scenario.
• We designed and implemented a prototype system that enables users to obtain consistent visual and tactile perception when grasping and releasing different virtual objects in AR.
• We conducted a user study to investigate the effect of this VHI on the perceived size of virtual objects by rendering objects of different sizes in the AR scenario and providing props of different sizes.
• We conducted a second, formal user study to explore how the dynamic remapping procedure affects user experience and perception when manipulating various virtual models in an AR task, compared with an approach without dynamic remapping.
In the rest of this paper, we first review related work to compare our approach with others, and then present the finger remapping algorithm and the prototype system implementation. We describe the two user studies and their findings, and finally discuss design implications for future research.

II. RELATED WORK
Our research builds on prior research on virtual object manipulation in AR and the visuo-haptic illusion. In this section, we review related literature and highlight the differences with our research.

A. VIRTUAL OBJECT MANIPULATION IN AR
Virtual object manipulation in AR can benefit from the natural grabbing and releasing of 3D virtual targets in physical space [14]. In current AR research, an important goal is improving operation efficiency and obtaining a more realistic and natural experience when manipulating virtual objects [4].
Vision-based manipulation is the most commonly used approach due to the intuitiveness of visual perception [15]. Kato et al. [16] manipulated virtual objects naturally and intuitively with a marker-based vision tracking technique in a tabletop AR interface. Hand tracking sensors are also used to detect hands and fingers in real time to grasp 3D virtual objects in mid-air [17] [18]. However, with such mid-air interaction, users cannot physically touch the virtual objects.
Physical props are widely used with vision-based methods to create handheld interfaces for interacting with virtual objects. Compared with mid-air gestures, prop-based manipulation provides tangible feedback for the fingers and makes users more confident in manipulation [19]. For example, Ha and Woo [20] attached a box with multiple markers to a physical handle to provide haptic feedback while holding a virtual model. Tanikawa et al. [21] integrated virtual objects into a handheld tablet and used the tablet to move and rotate virtual objects while viewing the AR environment on the screen. Bai et al. [22] calibrated a Vive controller with the coordinate system of a HoloLens AR display to support controller-based virtual object manipulation in AR. Cosco et al. [13] used a kinematic haptic device to collocate touch feedback and utilized image processing to hide the device's appearance from the physical scene.
Although these handheld interfaces help enhance operation efficiency, the mismatched appearance between the physical device and the represented virtual models greatly limits the immersive tangible experience and user engagement [23]. Directly manipulating a tangible prop with the same shape and size as the virtual object is preferred because it provides the best haptic affordance [24] [25]. Kwon et al. [8] found that virtual object manipulation efficiency was enhanced when the props and virtual objects were alike in size and shape. To avoid preparing a different prop for each virtual object, Hettiarachchi and Wigdor [26] placed different virtual objects on similar physical targets that served as the corresponding props.
However, these prop-based methods are not always practical in real virtual object manipulation scenarios. The virtual objects and props should be well matched to ensure a realistic haptic experience [23], but it can be impossible to provide matched props for all virtual objects. Therefore, we use a finger remapping algorithm to create a dynamic VHI that lets one prop represent different virtual objects with consistent visual and tactile perception. The virtual fingers are dynamically remapped to fit the size and shape of each virtual object, so a single physical prop can stand in for various virtual objects instead of multiple props.

B. VISUO-HAPTIC ILLUSION
The visuo-haptic illusion (VHI) is a perceptual illusion in which psychological cognition differs from the actual organic perception [9]. When the tactile and visual perceptions of a specific target conflict (for example, a physical cube is seen as a sphere), vision dominates the reasoning process based on cognitive habits. Human perception can thus be reshaped toward an intended interpretation through slight changes in visual presentation. Advances in visualization technology have made the VHI an ideal, easy-to-use approach for simulating haptic experiences in HCI.
Pseudo-haptic feedback is a primary method for creating the VHI and is usually combined with prop-based passive haptic feedback [27]. The visual dominance effect leads users to perceive the prop's haptic properties differently from its physical properties and to form a coherent new understanding of the environment [28]. Prior research has demonstrated the effect of the VHI in reshaping human haptic perception of the shape, size, texture, stiffness, and mass of a physical prop [10] [13] [29] [30].
Typical applications of these illusions include haptic redirection, haptic retargeting, and Control/Display (C/D) ratio scaling. Haptic redirection refers to dynamically changing the virtual representation while users keep touching the physical prop. For example, Matsumoto et al. [31] redirected users so that they perceived walking straight forward while actually walking along a curve and touching a circular wall. They also changed the perceived shape of virtual objects with rotational manipulation and body warping while users walked around the same table [32]. Ban et al. [11] [12] modified the represented sizes of static 2D virtual objects augmented on a physical cylinder and visually displaced the human hand to create the impression of touching objects different from the physical cylinder. Kohli [33] warped the space of different virtual objects onto one prop using the visual dominance effect. In our research, human fingers are dynamically redirected to fit the size and shape of virtual objects.
Haptic retargeting spatially and dynamically maps different virtual objects onto a physical prop. Azmandian et al. [34] repurposed the passive haptic sensation from one cube for different virtual objects using background shifting and human body warping. Cheng et al. [35] developed a sparse haptic proxy to represent the dense haptic sensation of virtual information with body warping. Abtahi and Follmer [36] applied haptic retargeting in a shape display to enhance the perceived resolution. Our work enables users to see and grab different virtual objects each time they approach the physical proxy. They can see the contact on virtual objects and perceive tactile feedback through their fingers.
The C/D ratio in HCI refers to the ratio between the movement speed of the real human hand (control) and that of its virtual replica (display). Lécuyer et al. [37] scaled the C/D ratio of a cursor as an input device passed over a simulated texture, helping users feel bumps and holes. Dominjon et al. [38] changed the perceived mass of virtual objects when manipulating a physical device with different C/D ratios. Jang and Lee [39] scaled the displacement of the virtual fingertip when the virtual object was contacted to change the perceived virtual stiffness. In our work, the displacements of the virtual finger joints differ from those of the real fingers, and the movement speed of each finger joint is scaled to keep them synchronized.
Overall, compared with related work, our research is novel in two ways. First, the illusion is designed with real-time dynamic hand deformation covering both the grasping and releasing procedures. Second, the proposed approach makes a single cylindrical prop compatible with the representation of various virtual objects, with consistent visual and haptic feedback. Such a VHI can therefore enhance the realism of the user experience in the widely used prop-based object manipulation method.

A. ARCHITECTURE AND WORKFLOW
The VHI in this work dynamically remaps the rendered fingers when the dominant hand grabs and releases different virtual objects. The architecture of the proposed approach includes three layers, all of which are aligned and synthesized in 3D space and visualized with an AR HMD:
1) Physical scenario layer: a large tracking marker in the workspace and a real prop with a small tracking marker on top;
2) Virtual background layer: a virtual background image augmented onto the real background marker to overlay the physical scenario;
3) Virtual content layer: the represented virtual object and the remapped virtual hand.
VOLUME 4, 2016
The overall workflow dynamically remaps the presented hand posture to adapt to the different shapes and sizes of virtual objects, as shown in Figure 1. The hand and the markers are tracked in real time and aligned in a shared coordinate system of the AR system. During initialization, a scaling ratio is computed after predicting the likely contact points of the fingers on both the prop and the virtual model. The virtual finger posture is then remapped with this scaling ratio in each frame and visualized in the AR scene. This creates the illusion of coherence between the visual feedback of grabbing a virtual object and the tactile feedback of touching the physical prop. The same approach also applies to the releasing process. Each step is explained in detail in the following subsections.
Many hand gestures can be used for object manipulation. To keep the scope focused, this work only explores one-handed pinch gestures, in which opposing forces are applied between the thumb and the other fingers, and the remapping algorithm is designed under this assumption.

B. HAND TRACKING AND CALIBRATION
To enable real-time and stable hand tracking, a Leap Motion Controller (LMC) was attached horizontally on top of a HoloLens, and its orientation was fine-tuned to align its field of view (FoV) with that of the HoloLens (see Figure 2a). The LMC SDK¹ was used to capture and extract the right-hand tracking result in real time. The tracking result was streamed to a laptop server via a wired USB connection and then transferred to the HoloLens via a wireless TCP/IP connection. This framework enabled real-time hand synchronization and visualization in the AR scenario.
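The text does not specify how hand frames were serialized between the laptop server and the HoloLens. The Python sketch below assumes a hypothetical binary message format; the names `pack_hand_frame` and `recv_hand_frame` and the field layout are illustrative, not the authors' protocol. Because TCP delivers a byte stream without message boundaries, each frame is prefixed with its payload length, and the receiver reads exactly that many bytes before decoding.

```python
import socket
import struct

# Hypothetical wire format (not the authors' protocol): a 4-byte little-endian
# payload length, then a uint32 frame id, a uint16 joint count, and one
# float32 (x, y, z) triple per tracked joint.
LENGTH = struct.Struct("<I")
HEADER = struct.Struct("<IH")
JOINT = struct.Struct("<3f")


def pack_hand_frame(frame_id, joints):
    """Serialize one hand frame as a length-prefixed binary message."""
    payload = HEADER.pack(frame_id, len(joints)) + b"".join(JOINT.pack(*j) for j in joints)
    return LENGTH.pack(len(payload)) + payload


def unpack_hand_frame(payload):
    """Decode a payload (without the length prefix) into (frame_id, joints)."""
    frame_id, count = HEADER.unpack_from(payload, 0)
    joints = [JOINT.unpack_from(payload, HEADER.size + i * JOINT.size) for i in range(count)]
    return frame_id, joints


def recv_exact(sock, n):
    """TCP is a byte stream, so keep reading until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed in the middle of a frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_hand_frame(sock):
    """Read one length-prefixed frame from a connected socket."""
    (length,) = LENGTH.unpack(recv_exact(sock, LENGTH.size))
    return unpack_hand_frame(recv_exact(sock, length))
```

On the HoloLens side, the same layout would have to be decoded in C# inside Unity; only the length-prefix framing idea carries over.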
Based on the single-point calibration method [40] and LMC-based hand interaction [41], we used a marker-based tracking technique and the Iterative Closest Point (ICP) algorithm [42] to align the LMC coordinate system with that of the HoloLens. As shown in Figure 2b, we used a common frame with a square fiducial marker and defined its central point as the reference point. The HoloLens detected the reference point using the Vuforia tracking library², while the LMC detected the fingertip physically collocated with the reference point. From these corresponding measurements, a transformation matrix was obtained to register the LMC and the HoloLens in a shared coordinate system.
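Because the fingertip is physically collocated with the marker's reference point, each calibration sample gives a known correspondence between an LMC point and a HoloLens point. With known correspondences, the alignment step that ICP iterates reduces to a closed-form least-squares rigid fit. The sketch below uses the standard SVD-based (Kabsch) solution as a stand-in; it is not necessarily the authors' exact implementation, and `rigid_transform` and `apply_transform` are hypothetical names. It assumes at least three non-collinear point pairs expressed in the same unit (the LMC reports millimetres, so samples would be converted first if the HoloLens scene uses metres).

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ R @ src + t (Kabsch/SVD).

    src, dst: (N, 3) arrays of corresponding points, N >= 3, not collinear.
    Returns a 4x4 homogeneous transform from the src frame to the dst frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T


def apply_transform(T, p):
    """Map a 3D point from the source (e.g. LMC) frame into the target (e.g. HoloLens) frame."""
    return T[:3, :3] @ np.asarray(p, dtype=float) + T[:3, 3]
```

In a setup like the one described, `src` would hold fingertip positions reported by the LMC and `dst` the matching reference-point positions detected by the HoloLens.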

C. HAND MODEL SIMPLIFICATION
As shown in Figure 3 and Figure 4, the hand was represented with an anatomical structure. The bone structure of the thumb consists of three phalanges, namely the interpha-
In single-handed manipulation, the thumb and index finger are commonly used to grab objects [43]. Grabbing and releasing are mainly accomplished by the opposing forces applied to the grasped object by the thumb and the other four fingers [44]. The thumb and the other fingers tend to converge toward the center of the virtual target, and the fingertips move in approximately parallel trajectory planes. Therefore, we focused on the 2D horizontal cross-section of the vertically positioned cylindrical prop, viewed from the top.

D. CONTACT POINT PREDICTION
The theoretical contact point of each fingertip on both the physical prop and the virtual target object must be predicted to define the remapping interval. We first predicted the contact positions of the thumb and index finger and then applied the same algorithm to the other fingers.

1) Definition of grasping center
From the top view, the geometric center of an object in the projected section plane varies with its shape and size. A physical cylinder was used as the prop because its rotational symmetry makes it compatible with many different shapes and allows users to grasp it from any direction. The marker attached to the cylinder provided real-time tracking relative to the HoloLens coordinate system and also defined a local coordinate system for representing virtual objects. Because users may grasp the virtual object from any direction, their fingers may miss the physical prop or slide across its surface. Therefore, we defined a grasping center (P_cen) on each virtual object in the section plane to indicate the finger movement direction. The central axis of the prop was collocated with P_cen to position the virtual object. For virtual objects with relatively simple convex cross-sections (e.g., triangle, rectangle, and circle), we uniformly define the center of the inscribed circle as P_cen, so users can grasp the object from several different directions. However, this definition might be unreasonable for some shapes (e.g., very long rectangles). For objects with concave or irregular cross-sections, P_cen can be defined manually (see Figure 5); in this case, users must approach the object from a specific direction to ensure that it can be touched. This definition applies not only to extruded 3D objects but possibly also to other types. For instance, P_cen can easily be defined as the inscribed circle center on a section plane for cones and pyramids. Whenever a P_cen and a proper grasping direction can be found on a section plane, the object can be grasped with this approach. However, for objects with more complex section contours, it can be hard to define a P_cen that yields a proper grasping direction.
With P_cen highlighted on top of a virtual model, it is reasonable to assume that the contact point of each finger with the virtual object or the prop lies on the line between P_cen and the current fingertip position (P_ft). This assumption applies to both virtual and real fingers and provides the foundation for finger pose estimation.
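For the simple convex cross-sections mentioned above, P_cen is the center of the inscribed circle. For a triangle this is the incenter, which has a closed form: the vertex average weighted by the length of the opposite side, with inradius r = area / semiperimeter. The helper below is an illustrative sketch (`triangle_incenter` is a hypothetical name); general convex polygons would instead need an optimization, such as a linear program for the largest inscribed circle, which is not shown here.

```python
import math


def triangle_incenter(a, b, c):
    """Return (center, radius) of the inscribed circle of triangle abc, given as 2D tuples."""
    la = math.dist(b, c)  # length of the side opposite vertex a
    lb = math.dist(c, a)  # length of the side opposite vertex b
    lc = math.dist(a, b)  # length of the side opposite vertex c
    perimeter = la + lb + lc
    # Incenter: vertices weighted by the lengths of their opposite sides
    center = ((la * a[0] + lb * b[0] + lc * c[0]) / perimeter,
              (la * a[1] + lb * b[1] + lc * c[1]) / perimeter)
    # Inradius = area / semiperimeter (area from the 2D cross product)
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2
    return center, area / (perimeter / 2)
```

For a right-triangle cross-section with vertices (0, 0), (40, 0), and (0, 30) in millimetres, the function places P_cen at (10.0, 10.0) with an inscribed radius of 10.0 mm.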

2) Contact point prediction
The contact points are assumed to occur where the user's fingers (as a skeletal representation) come within the average distal phalanx dimension of the object contour (both physical and virtual), because finger thickness is not captured by the skeletal hand tracking. As shown in Figure 3 and Figure 4, the revised green dashed curve is offset from the prop or virtual object contour by the average distal phalanx dimension [45]. The contact point is therefore predicted as the intersection of the revised contour and the segment from P_cen to P_ft:
$$P_{con} = P_{contour} + \Delta R \, \frac{P_{ft} - P_{cen}}{\lVert P_{ft} - P_{cen} \rVert} \qquad (1)$$
In this equation, P_con represents the predicted contact point of the fingertip on the physical prop (P_con,p) or on the virtual object (P_con,v), ∆R is the offset distance of the revised contour, and P_contour is the point where the line from P_cen to P_ft intersects the real contour. As the fingers move toward the prop, the prediction becomes iteratively more accurate.
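Equation 1 can be sketched in Python under the assumptions stated in the text: all points lie in the 2D section plane, P_cen lies inside the contour, and the contact lies on the ray from P_cen through the current fingertip P_ft. The function names and the polygon representation (a list of vertices in order) are hypothetical. For the cylindrical prop the contour intersection is analytic; for a polygonal virtual cross-section the nearest ray-edge crossing is used.

```python
import math


def _unit(p_from, p_to):
    """Unit vector pointing from p_from to p_to."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    n = math.hypot(dx, dy)
    return dx / n, dy / n


def circle_contour_point(p_cen, p_ft, radius):
    """Where the ray from P_cen through P_ft meets a circular (cylindrical prop) contour."""
    ux, uy = _unit(p_cen, p_ft)
    return p_cen[0] + radius * ux, p_cen[1] + radius * uy


def polygon_contour_point(p_cen, p_ft, polygon):
    """Nearest crossing of the ray P_cen -> P_ft with the edges of a polygon contour."""
    dx, dy = p_ft[0] - p_cen[0], p_ft[1] - p_cen[1]
    best_t = None
    for i in range(len(polygon)):
        (ax, ay), (bx, by) = polygon[i], polygon[(i + 1) % len(polygon)]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex          # cross(ray direction, edge direction)
        if abs(denom) < 1e-12:
            continue                        # ray is parallel to this edge
        wx, wy = ax - p_cen[0], ay - p_cen[1]
        t = (wx * ey - wy * ex) / denom     # position along the ray
        s = (wx * dy - wy * dx) / denom     # position along the edge (0..1 inside it)
        if t >= 0 and 0 <= s <= 1 and (best_t is None or t < best_t):
            best_t = t
    if best_t is None:
        raise ValueError("ray does not cross the contour; is P_cen inside it?")
    return p_cen[0] + best_t * dx, p_cen[1] + best_t * dy


def predict_contact(p_cen, p_ft, p_contour, delta_r):
    """Equation 1: push the contour point outward by delta_r along the P_cen -> P_ft direction."""
    ux, uy = _unit(p_cen, p_ft)
    return p_contour[0] + delta_r * ux, p_contour[1] + delta_r * uy
```

For example, with P_cen at the origin, the fingertip at (60, 0), a 40 mm square cross-section, and an arbitrary illustrative ∆R of 8 mm, the virtual contact point is (28, 0); for a 50 mm diameter prop the physical contact point is (33, 0).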

E. DYNAMIC FINGER REMAPPING
1) Fingertip remapping
When the physical fingers move toward the prop surface, the rendered virtual fingers should move synchronously toward the surface of the represented virtual object. The distance between the fingertips of the index finger and the thumb is used to determine the movement starting time (t_0) (see Figure 6a): the remapping process is triggered when this distance falls below a threshold. We defined this threshold from the distance between the thumb fingertip and the other fingertips when an adult's hand is fully opened in a pinching gesture. In our setup, the threshold was set to 95% of the minimum sampled distance to make the system compatible with most users. The initial predicted contact points on the prop (P_con,p,0) and on the virtual model (P_con,v,0) are then obtained from Equation 1. The fingertip movement from the starting point P_ft,0 to P_con,p,0 is synchronously remapped to the movement from P_ft,0 to P_con,v,0. The scaling ratio S_ratio is defined linearly as the ratio of the distance from P_ft,0 to P_con,v,0 to the distance from P_ft,0 to P_con,p,0. Therefore, as shown in Figure 6b, the real-time distances of the virtual and real fingertips to their predicted contact points keep the same ratio:
$$S_{ratio} = \frac{\lVert P_{ft,0} - P_{con,v,0} \rVert}{\lVert P_{ft,0} - P_{con,p,0} \rVert} = \frac{\lVert P_{ft,v} - P_{con,v} \rVert}{\lVert P_{ft,p} - P_{con,p} \rVert}$$
When the real fingertip touches the physical prop, the virtual fingertip simultaneously makes visual contact with the surface of the corresponding virtual object, as shown in Figure 6c.
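The S_ratio relation fixes the ratio at t_0 and then scales the remaining real fingertip-to-contact distance in every frame. The Python sketch below is one way to realize it in a finger's 2D trajectory plane, assuming (as in the grasping-center subsection) that the contact points and both fingertips lie on the ray from P_cen through the real fingertip. The 85 mm default threshold is the value reported in the first user study; all function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def remapping_started(thumb_tip, index_tip, threshold_mm=85.0):
    """t_0 is reached once the thumb-index fingertip distance drops below the threshold."""
    return dist(thumb_tip, index_tip) < threshold_mm


def scaling_ratio(p_ft0, p_con_v0, p_con_p0):
    """S_ratio = |P_ft,0 P_con,v,0| / |P_ft,0 P_con,p,0|, computed once at t_0."""
    return dist(p_ft0, p_con_v0) / dist(p_ft0, p_con_p0)


def remap_fingertip(p_ft_p, p_con_p, p_con_v, p_cen, s_ratio):
    """Place the virtual fingertip so that |P_ft,v P_con,v| = S_ratio * |P_ft,p P_con,p|.

    The virtual fingertip is put on the ray from P_cen through the real fingertip
    (an assumption of this sketch), outside the virtual contact point.
    """
    dx, dy = p_ft_p[0] - p_cen[0], p_ft_p[1] - p_cen[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    d_virtual = s_ratio * dist(p_ft_p, p_con_p)
    return p_con_v[0] + d_virtual * ux, p_con_v[1] + d_virtual * uy
```

With this construction, and provided the fingertip starts outside both contact points, the virtual fingertip coincides with the real one at t_0 and reaches P_con,v exactly when the real fingertip reaches P_con,p. A virtual object larger than the prop gives S_ratio < 1, so the virtual finger moves more slowly than the real one. For example, with P_cen at the origin, P_con,p at (25, 0), P_con,v at (40, 0), and P_ft,0 at (50, 0), S_ratio is 0.4, and a real fingertip at (37.5, 0) is rendered at (45, 0).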

2) Virtual hand finger pose estimation
After the virtual fingertip position is obtained, the positions of the remaining joints (IP and MCP on the thumb; DIP and PIP on the other fingers) must also be estimated to render the virtual hand configuration. As shown in Figure 6b and Figure 6c, we empirically assume that each finger rotates around a stationary base point (CMC on the thumb and MCP on the other fingers) while grabbing an object. The fingertip, joints, and base point of each finger thus form a quadrilateral. The quadrilateral formed by the corresponding virtual finger, represented by the green dashed quadrilateral in the figures, is considered geometrically similar to that of the real finger. Taking the index finger as an example, the remapped virtual finger joints are resolved by geometric similarity, where P_MCP, P_DIP, and P_PIP represent the joint positions of the real finger, and P_DIP,v and P_PIP,v are the corresponding joint positions of the virtual finger. The same process applies to the other fingers and to the releasing procedure. Each finger is remapped independently in its trajectory plane, and together the fingers form a 3D hand presentation in the AR scenario.
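The similarity relation can be written compactly by treating points in a finger's trajectory plane as complex numbers. Requiring the virtual quadrilateral to be similar to the real one, with the shared base point P_MCP fixed and the real fingertip mapped to the remapped fingertip P_ft,v, gives for each joint J: (P_J,v − P_MCP) / (P_J − P_MCP) = (P_ft,v − P_MCP) / (P_ft − P_MCP), i.e., one rotation and one uniform scale about the base point. This formalization and the hypothetical `remap_joints` helper are a sketch assuming an orientation-preserving similarity, not necessarily the paper's exact equation.

```python
def remap_joints(p_base, p_tip, p_tip_v, joints):
    """Apply the similarity that fixes p_base and sends p_tip to p_tip_v to each (x, y) joint.

    p_base: stationary base point (MCP, or CMC for the thumb) in the trajectory plane
    p_tip: real fingertip; p_tip_v: remapped virtual fingertip
    joints: real joint positions, e.g. [P_DIP, P_PIP]
    """
    c = complex(*p_base)
    # One complex factor encodes both the rotation and the uniform scale about the base point
    k = (complex(*p_tip_v) - c) / (complex(*p_tip) - c)
    remapped = []
    for x, y in joints:
        z = c + k * (complex(x, y) - c)
        remapped.append((z.real, z.imag))
    return remapped
```

For example, if the real fingertip is at (10, 0) relative to the base and the remapped fingertip is at (20, 0), every joint is scaled by 2 about the base, so a joint at (3, 1) is rendered at (6, 2).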
The remapping algorithm is specifically designed for the situation in which users grasp the object toward the prescribed central point. Although it still works when the fingers deviate from this point, the visualization becomes distorted, and the distortion increases with the offset from the prescribed center.

F. PROTOTYPE IMPLEMENTATION
The hardware of the prototype system consisted of a HoloLens, an LMC, a large image tracking marker, and a cylindrical prop with a small image marker, as shown in Figure 2. The large background marker, 0.6m × 0.6m in size, was attached to a table to define the workspace; a smaller marker was attached to the cylinder and captured by the HoloLens camera to track the prop. The prop size was determined through a follow-on user study (Section IV).
The software on the HoloLens was developed with the Unity 3D Engine³. After calibration, the hand pose was synchronized to the HoloLens in real time and aligned in its local coordinate system. When a virtual object was grabbed, the HoloLens CPU performed the remapping process. The system provided low-latency tracking and visualization, with a total delay of around 40 ms. A virtual image identical to the background tracking marker was augmented in the AR scenario, as shown in Figure 7, and collocated with the physical marker to cover the appearance of the prop and the real hand. As shown in Figure 8, five example objects were presented with the same prop, and the virtual fingers were remapped to fit the specific shape and size of each virtual object. Since we used an optical see-through AR display, users wearing the headset could still see a blurred physical background. Thus, the lighting conditions were controlled so that users could see the augmented virtual content more clearly against the background.

IV. USER STUDY 1: PERCEPTION OF DIFFERENT SIZES
With this system, we first conducted a user study to investigate the effect of the VHI on the perceived size of different virtual objects. Specifically, we augmented the prop with a collocated virtual cylinder of different diameters and compared the resulting size perception.

A. STUDY ENVIRONMENT AND SETUP
To measure the largest distance between the thumb and the other fingers at the beginning of a pinching operation, we conducted a preliminary study with 20 volunteers (10 males and 10 females) from the local university. The distance ranged from 89mm to 115mm (Mean = 100.35mm, SD = 6.7mm), which indicated that the diameters of the prop and the virtual cylinder must be smaller than 89mm. Therefore, the threshold for detecting the starting point was set to 85mm to ensure that the system was usable for all participants.
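As a worked check, applying the threshold rule from the fingertip remapping subsection (95% of the minimum sampled thumb-to-finger distance) to the minimum distance measured here gives a value consistent with the configured 85mm threshold:

$$0.95 \times 89\,\text{mm} = 84.55\,\text{mm} \approx 85\,\text{mm}$$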
We varied the prop diameter from 30mm to 70mm and the virtual cylinder diameter from 20mm to 80mm, both in steps of 10mm. For convenience, we denote the diameters of the prop, the rendered virtual cylinder, and the mentally perceived cylinder as D_physical, D_virtual, and D_perceived, respectively. In this study, the cylinder height was set to 100mm, which is larger than the hand width. An equal-height virtual cylinder was collocated with the physical cylinder at the center of the tracking marker; because the two cylinders were coaxial, P_cen was not shown in the visualization. D_virtual could be changed manually from the laptop PC, which communicated with the HMD.

1) Participants
We recruited 14 volunteers (8 males and 6 females) from the university campus. They were aged between 21 and 24 years (M = 22.33, SD = 1.01), and all were right-handed. Twelve of them (86%) had never experienced AR or VR before.

2) Experiment design
The experimental task was to report the perceived size of the represented virtual cylinder. In each trial, participants grabbed and released the virtual cylinder with a specific combination of D_physical and D_virtual. They were encouraged to handle the virtual cylinder by applying force with their distal phalanges. There was no time limit: participants could pick the cylinder up and turn their heads to observe it from different perspectives for as long as they wanted. Afterwards, they took off the headset and used a digital vernier caliper to reproduce D_perceived as precisely as possible. The caliper scale was covered so that participants reproduced the perceived size from memory. The two parameters were fully crossed (5 D_physical levels × 7 D_virtual levels), so each participant completed 35 trials.

3) Experiment procedure
Before the study started, participants were introduced to the anonymous study design and the overall setup. The experiment began with participants signing a consent form and answering demographic questions. They were then trained to become familiar with the task and the novel AR experience.
[Figure: panel "D_physical = 30mm"; x-axis: diameter of the presented virtual object (D_virtual), mm]
After the training session, each participant completed the 35 trials in random order. The result of each trial was recorded after the participant reported the perceived diameter with the vernier caliper. Meanwhile, an operator manually changed D_physical and D_virtual while the participant's attention was on the caliper. Each participant spent about 30 minutes completing the experiment. In total, 490 trials were collected (14 participants × 35 trials).

C. RESULTS AND DISCUSSION
As shown in Figure 9, D_perceived was almost the same as D_virtual, with a maximum offset of ±20mm. A Pearson correlation test showed that D_perceived was significantly correlated with D_virtual at the 0.01 level in all conditions. Furthermore, the relationship was strongly linear, with high r² values in all conditions (ranging from .865 to .924). Linear regression yielded fitted-line slopes close to 1 in all conditions (ranging from .89 to .982), indicating that D_perceived changed with D_virtual at a similar scale. The mean absolute residual error of D_perceived with respect to D_virtual ranged from 4.89mm to 6.05mm. These results suggest that dynamically remapping the fingers can effectively help participants perceive different sizes of virtual objects, regardless of the diameter of the physical prop or the virtual model.
This finding differs from a prior result [12], in which the difference between D_perceived and D_virtual grew as the difference between D_physical and D_virtual increased. In our work, the hand displacement and the virtual object were presented in 3D space with a stereoscopic see-through AR headset rather than as a static 2D image on a computer display. Participants could move their heads to observe the virtual object from different perspectives or pick up the model in space. The virtual fingers were updated synchronously with the movement of the real fingers and contacted the virtual model when tactile feedback was obtained from the physical prop. This multimodal consistency matched the psychological expectations of grabbing an object [46], enabling users to perceive and judge the model size more accurately. Meanwhile, the combined visual and touch senses provided a realistic interaction experience [47], and these dynamic interactions enhanced the user's spatial perception.
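The statistics reported above (Pearson r, the slope of a least-squares line of D_perceived on D_virtual, and a mean absolute error) can be computed with a few lines of plain Python. This sketch is illustrative only: the toy numbers in the usage note are invented and are not study data, and `mean_abs_error` assumes the reported residual is |D_perceived − D_virtual| averaged over trials, which the text does not state explicitly (a residual from the fitted line is the other plausible reading).

```python
import math


def _centered_sums(x, y):
    """Means and centred sums of products used by both r and the OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient; square it to get r²."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(d_virtual, d_perceived):
    """Mean of |D_perceived - D_virtual| over paired trials (one possible reading)."""
    return sum(abs(p - v) for v, p in zip(d_virtual, d_perceived)) / len(d_virtual)
```

For the toy pairs D_virtual = [20, 30, 40] and D_perceived = [22, 29, 41], `mean_abs_error` returns 4/3 (about 1.33 mm) and the fitted slope is 0.95.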
Interestingly, D_physical showed no significant effect on D_perceived. This may be because the selected D_physical values lay within a range in which users could not detect a significant tactile difference. The remapping algorithm and dynamic interactions largely eliminated the perception differences, and the visual dominance effect also led participants to trust the rendered object size.
Based on the measured distances between the thumb and the other fingers, we could have chosen 80mm as the upper limit of D_physical. However, in a pilot study, a prop with a diameter of 80mm was hard to grasp, because the hand had to be fully opened at the beginning, leaving only a small space for remapping. In addition, the user experience could benefit from a longer finger movement distance before touching the prop. The lower limit of 30mm was based on prior research [12], which found 30mm to be the minimal size of a planar prop for producing compelling hand modification.
Comparing the results across prop sizes, 50mm was the best prop diameter, with the highest correlation coefficient, the fitted-line slope closest to 1, and the smallest residual error. In other words, a prop of this diameter minimized the perceptual differences between D_perceived and D_virtual within the tested range of 20mm to 80mm.

V. USER STUDY 2: MANIPULATING DIFFERENT VIRTUAL OBJECTS
We designed a second user study to explore how dynamic remapping affects the user experience of manipulating various virtual models, compared with an approach without dynamic remapping. The study focused on collecting subjective feedback after users had fully experienced the operation procedure, rather than on evaluating system performance.

A. USER STUDY
We used the same setup as in subsection IV-A, except for the prop diameter and the virtual model shapes. The diameter of the physical cylinder was set to 50mm based on the results of the first study. Unlike the first study, which only represented cylindrical shapes, this study represented a variety of virtual models with the same prop. Therefore, the P_cen of each virtual model was defined and represented in the test scenario.
To compare with the non-remapping approach, we implemented two interfaces:
(1) Non-Remapping (NR) condition (baseline): the virtual model was overlaid on the prop, and users could see their hands grasping the physical prop.
(2) Dynamic Remapping (DR) condition: users could not see their hands or the prop, but they could see the remapped virtual hands grasping the virtual model.

2) Experiment task
We chose tangram puzzle assembly as the manipulation task. The tangram consists of five kinds of flat polygons with different shapes and sizes, and the geometric center of each polygon was defined as its P_cen. The polygons were extruded into prisms with a height of 100mm. The hypotenuse of the largest triangle was 160mm. The diameters of the inscribed circles were 33mm (smallest triangle), 47mm (middle triangle), 66mm (largest triangle), 57mm (square), and 40mm (parallelogram).
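The inscribed-circle diameters listed above follow from standard tangram proportions once the hypotenuse of the largest triangle is fixed at 160mm: the middle triangle's hypotenuse equals the large triangle's leg, the small triangle's hypotenuse is half the large one, and the square's side equals the small triangle's leg. The sketch below checks this and also computes a piece's area centroid, assuming that the "geometric center" used as P_cen is the area centroid (the text does not say whether the centroid or, for example, the incenter was used). Both function names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def tangram_inscribed_diameters(large_hypotenuse_mm: float) -> Dict[str, float]:
    """Illustrative helper: largest inscribed-circle diameter (mm) of each piece."""
    def right_isosceles_incircle_d(hyp: float) -> float:
        leg = hyp / math.sqrt(2)
        return 2 * leg - hyp  # d = 2r, where r = (a + b - c) / 2 and a = b = leg

    small_hyp = large_hypotenuse_mm / 2   # small-triangle hypotenuse
    small_leg = small_hyp / math.sqrt(2)  # equals the square's side
    return {
        "largest triangle": right_isosceles_incircle_d(large_hypotenuse_mm),
        "middle triangle": right_isosceles_incircle_d(large_hypotenuse_mm / math.sqrt(2)),
        "smallest triangle": right_isosceles_incircle_d(small_hyp),
        "square": small_leg,
        # The parallelogram (sides small_hyp and small_leg, 45-degree angle) has no
        # true incircle; the largest fitting circle spans the height onto the long side.
        "parallelogram": small_leg * math.sin(math.radians(45)),
    }


def polygon_centroid(vertices: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Area centroid of a simple polygon via the shoelace formula (assumed P_cen)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)
```

Rounding the output of `tangram_inscribed_diameters(160.0)` gives 66, 47, 33, 57, and 40mm, matching the values reported for the task.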
As shown in Figure 10, participants were asked to assemble the tangram within the marker region. Before each trial, they had already learned where to place each piece for the given shape. In each trial, one of the largest triangles was pre-placed in the assembly area as a reference, to help participants decide where to place the other pieces and to keep the tangram within the marker region. Initially, the prop was located at the bottom-left corner of the background marker. Participants grabbed the prop and moved the virtual model to its target position. When the distance between the adjacent edges of the current model and the assembled model fell below 2mm, the virtual model was detached from the prop and fixed in place. Participants then moved the prop back to its original position and released it on the desk. The presented virtual model was then replaced by the next one until the assembly task was finished. The order of the pieces was randomized following the assembly order.
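The 2mm detach rule can be sketched as a 2D segment-to-segment distance test on the pieces' footprints. How the system identified "adjacent" edges is not described, so this sketch assumes the candidate edge pair is already given; `should_snap` and the helper names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

SNAP_THRESHOLD_MM = 2.0  # threshold stated in the study


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closed segment seg."""
    (ax, ay), (bx, by) = seg
    px, py = p
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the supporting line and clamp the parameter to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segment_distance(s1: Segment, s2: Segment) -> float:
    """Minimum distance between two 2D segments (0 if they properly cross)."""
    a, b = s1
    c, d = s2
    if _cross(c, d, a) * _cross(c, d, b) < 0 and _cross(a, b, c) * _cross(a, b, d) < 0:
        return 0.0
    # Without a proper crossing, the minimum is attained at some segment endpoint.
    return min(point_segment_distance(a, s2), point_segment_distance(b, s2),
               point_segment_distance(c, s1), point_segment_distance(d, s1))


def should_snap(moving_edge: Segment, placed_edge: Segment) -> bool:
    """Hypothetical rule: fix the piece once its edge is within 2mm of the placed edge."""
    return segment_distance(moving_edge, placed_edge) < SNAP_THRESHOLD_MM
```

Testing edges rather than piece centers lets a piece snap as soon as it is flush with its neighbor, regardless of where along the shared edge it sits.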
In the DR condition, a virtual central axis was rendered on each virtual object to indicate the grasping center. In addition, a collocated virtual replica of the prop was rendered to avoid visual inconsistency when the presented virtual model was detached.
In each trial, the target assembly shape was randomly selected from a tangram library⁴ to avoid learning effects. We focused mainly on the user experience and did not analyze quantitative measures such as task completion time and assembly accuracy. The task had no time limit, but participants were asked to place the models as accurately as possible.

3) Experiment procedure
The same group of participants from the first study was recruited. They were first given a brief introduction to the new study and tasks, and were then trained for at least 5 minutes to become familiar with the task and the two conditions. A random tangram shape was selected for them to try the system, together with an explanation of the system features and the differences between the conditions.
The study used a within-subject design in which each participant experienced both conditions alternately, with the order following a Latin square to counteract order bias. After the trial for each condition, participants answered a Likert scale questionnaire with questions regarding realism, naturalism, and system usability, as shown in Table 1. These questions were adapted from existing presence and impressiveness questionnaires [48] [49]. Scores ranged from 1 (I entirely disagree) to 7 (I entirely agree). After the trials, a short interview was conducted to collect subjective feedback about the interface. The study took about 30 minutes per participant.
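With only two conditions, a Latin square reduces to alternating the NR-first and DR-first orders across participants. The sketch below builds a Williams-style balanced Latin square, which also generalizes to more conditions; the function names and the assignment rule (participant index modulo the number of orders) are assumptions, since the text does not specify how orders were assigned.

```python
from typing import List, Sequence


def williams_orders(conditions: Sequence[str]) -> List[List[str]]:
    """Hypothetical helper: balanced Latin square (Williams design) of condition orders."""
    n = len(conditions)
    # First row follows the index pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi, take_lo = 1, n - 1, True
    while len(first) < n:
        if take_lo:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_lo = not take_lo
    # Each later row shifts every index by the row number (mod n).
    rows = [[conditions[(j + i) % n] for j in first] for i in range(n)]
    if n % 2 == 1:
        # Odd n also needs the mirrored rows to balance first-order carryover.
        rows += [list(reversed(r)) for r in rows]
    return rows


def order_for_participant(index: int, conditions: Sequence[str]) -> List[str]:
    """Hypothetical assignment: cycle through the square's rows by participant index."""
    orders = williams_orders(conditions)
    return orders[index % len(orders)]
```

For the two conditions here, `williams_orders(["NR", "DR"])` yields the rows NR, DR and DR, NR, so the 14 participants would split evenly between the two orders.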

B. RESULTS AND DISCUSSION
This section reports the statistical analysis of the study results and discusses the differences between our method (DR condition) and the popular prop-based object manipulation method (NR condition).

1) Experiment measurement
The DR condition was rated higher than the NR condition on average on all questions. We conducted the Wilcoxon Signed-Rank test (α=.05) to compare the statistical differences, as shown in Figure 11.

VOLUME 4, 2016

Table 1. Likert scale questionnaire items (excerpt)
Naturalism
…
Q6 The visual information and touch feedback were consistent when grabbing/releasing a virtual model.
Q7 I felt that the virtual object I hold was consistent with the real object.
Q8 The interactive experience I obtained was compatible with the usual cognition/perception approach.
Usability
Q9 I enjoyed the experience of using the system in the task.
Q10 I was able to focus on the work actively.
Q11 I was confident that I could finish the work smoothly.
Q12 I obviously felt the different sizes of the virtual objects.
Q13 I obviously felt the different shapes of the virtual objects.
Q14 I could not feel the passing of time when I was doing the task.
Q15 I felt that I was manipulating real objects in the task.
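For readers reproducing the analysis, the Wilcoxon signed-rank test on paired Likert scores can be sketched as follows. This version is an assumption about the variant used: it applies the large-sample normal approximation with a tie correction and no continuity correction. With 14 participants and heavily tied Likert data, an exact or library implementation (for example, SciPy's `scipy.stats.wilcoxon`) may give somewhat different p-values, and the text does not state which variant was used. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: return (W, z, two-sided p) for paired samples x and y."""
    diffs: List[float] = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal absolute differences and give it the average rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = n * (n + 1) / 2 - w_plus
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, 2 * (1 - Phi(|z|))
    return w, z, p
```

Each questionnaire item would be tested separately, passing the 14 DR ratings as `x` and the matching NR ratings as `y`; a result is significant at the stated α when `p < 0.05`.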

a: Realism
Regarding realism, the test showed significant differences on Q1, Q2, and Q3. The results indicated that the remapped hand enabled users to perceive different objects much as they would in the real world. In the NR condition, participants found that their fingers either penetrated the virtual model surface or had not yet reached it when their hands felt tactile feedback from the prop. Such conflicting perceptions may not reduce operation efficiency, owing to the visual dominance effect [28]. However, the lack of a realistic operation experience became more apparent when compared with the DR condition. No significant difference was found on Q4, suggesting that the finger visualization did not affect judgment and decision-making about the scenario.

b: Naturalism
The DR condition was rated significantly higher than the NR condition on Q5, Q6, Q7, and Q8, which revealed that the sense of naturalism was significantly enhanced. The remapping algorithm dynamically adjusted the fingers to fit different virtual models throughout the grasping and releasing procedures, so the remapped virtual fingers visually touched the model surface at the same moment the corresponding real fingers touched the prop. The consistent visual and tactile feedback in the DR condition matched the usual impressions and expectations of grabbing objects, allowing participants to manipulate the virtual objects with the same knowledge, habits, and experience as in reality. Thus, the DR cue helped users operate different virtual objects naturally and obtain consistent visual and tactile experiences.

c: Usability
The statistical results showed significant differences on Q10, Q12, Q13, and Q15 regarding system usability. The results indicated that the DR cue enhanced the perception of different virtual objects represented with a single prop: it made the virtual models more realistic, and participants were more focused on the task. In the NR condition, participants could not see their fingers meet the contours of the virtual objects. This inconsistent spatial relationship made it difficult for participants to believe that they were touching the virtual targets. In contrast, with the DR cue, the illusion let users see the contact on the virtual model surface and feel the touch on the prop surface simultaneously. This design helped users perceive different virtual objects with a deeper understanding of their sizes and shapes, and it strengthened the multimodal connection between reality and virtuality. No significant difference was found on Q9, Q11, and Q14. The reason might be that the skeleton presentation of the fingers reduced users' enjoyment and confidence in the DR condition. However, there was no obvious evidence that the rendering method affected size and shape perception.

In the post-experiment interview, 10 of the 14 participants (71%) reported that the NR condition was intuitive and straightforward for operating the virtual objects and finishing the task. However, 8 of them (57% of all participants) complained about the hand rendering and the lack of realism compared with the DR condition. On the other hand, 9 participants (64%) confirmed that the system with VHI was more realistic for grasping virtual objects while still supporting task completion.
Individual feedback revealed further differences. For the NR condition, participants said that they "just focused on its movement" (user 11), "felt I was operating a cylinder" (user 11), "got less information" (user 4), and "could not see my hands" (users 5, 11, and 14). For the DR condition, users mentioned that it "enhanced my sense of presence" (user 2), that "the occlusions of the cylinder and hands made me feel I was operating different objects" (users 5 and 13), that "the occlusion made me feel more realistic" (user 12), and that it was "easier to grab the virtual object" (user 10). These comments suggest that the DR cue substantially enhanced the realism of operating different virtual objects.

3) General discussion
Beyond these results, we made some general observations about the visualization effect, system design, and object shapes.
We noticed that the virtual fingers were sometimes not accurately aligned with the corresponding virtual object: in some cases they either penetrated it or stopped a small distance away from its surface. A possible reason is that the scaling ratio was estimated at the start of the grasp, regardless of the virtual model shape. However, tangential movement was inevitable as the fingertips moved toward the grasping center, which changed the length of the segment from P_cen to the predicted contact point on the virtual model. This issue did not appear in the first study because the rotational symmetry of the cylinders' cross-section eliminated the bias.
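The misalignment mechanism described above can be illustrated with a simplified radial remapping model. This is a hypothetical sketch, not the paper's algorithm: it assumes the virtual fingertip moves along the same direction from P_cen as the real fingertip, and that its distance from P_cen is the real distance from the prop axis multiplied by a scale fixed at grasp onset, namely the ratio of the P_cen-to-surface distance along the initial approach direction to the prop radius. If the fingertip drifts tangentially before contact, the surface distance along the new direction generally differs, producing a gap or a penetration; for a circular cross-section that distance is constant, so the error vanishes. Function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ray_to_polygon(center: Point, angle: float, vertices: List[Point]) -> float:
    """Distance from `center` along direction `angle` (radians) to the first
    edge of a convex polygon that contains `center`."""
    cx, cy = center
    dx, dy = math.cos(angle), math.sin(angle)
    best = math.inf
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        ex, ey = x1 - x0, y1 - y0
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this edge
        # Solve center + t*d = p0 + s*e for ray parameter t and edge parameter s.
        wx, wy = x0 - cx, y0 - cy
        t = (wx * ey - wy * ex) / denom
        s = (wx * dy - wy * dx) / denom
        if t > 0 and -1e-9 <= s <= 1 + 1e-9:
            best = min(best, t)
    return best


def contact_error(prop_radius: float, p_cen: Point, vertices: List[Point],
                  angle_start: float, angle_contact: float) -> float:
    """Signed error of the remapped fingertip when the real fingertip touches
    the prop: > 0 means a visible gap, < 0 means penetration (hypothetical model)."""
    # Scale fixed at grasp onset from the initial approach direction.
    scale = ray_to_polygon(p_cen, angle_start, vertices) / prop_radius
    # At contact the real fingertip is prop_radius from the prop axis.
    virtual_tip = scale * prop_radius
    surface = ray_to_polygon(p_cen, angle_contact, vertices)
    return virtual_tip - surface
```

For a 100mm square cross-section centered at P_cen and a 25mm prop radius, a drift from 0 rad to `math.atan(0.5)` gives an error of about -5.9mm in this model, i.e., the virtual finger sinks into the object, whereas no drift gives zero error.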
Another source of inaccuracy in virtual finger placement was the size of the distal phalanx, which varies from person to person; the force applied to the prop also affected the final fingertip positions. We found that participants tended to increase their grip force on the physical cylinder when the virtual finger did not reach the virtual object, and vice versa. This may be because participants noticed that the virtual finger did not reach the virtual object and so grabbed harder to compensate.
The augmented virtual background marker was essential for the system to provide a more realistic experience. The virtual background matched the physical background image, so participants believed they were seeing the same workspace after putting on the HMD. It also covered the physical scene layer in 3D space and served as a backdrop for the added virtual content. This visualization method thus resolved potential occlusion problems between the hand and virtual objects.
In this study, the triangular prisms and the square prism were grasped by their faces rather than their edges. For spiky or polyhedral objects with multiple faces, the cylinder could not provide realistic tactile stimuli; in such cases, a polygonal prism would be a more practical prop. However, this study focused on the visual presentation of the fingers with the remapping algorithm rather than on simulating realistic tactile feedback.

VI. IMPLICATIONS AND LIMITATIONS
Based on the developed approach and the results of the user studies, we suggest the following implications for virtual object manipulation.
(1) Consistent multimodal perception (e.g., visual and haptic feedback) could significantly improve the perception and understanding of objects in prop-based virtual object manipulation.
(2) Dynamically remapping the virtual fingers to fit the size and shape of a virtual object can enhance the realism and naturalism for manipulating and perceiving different virtual objects in an AR environment.
(3) The designed interaction paradigm and multimodal experience in the AR interface should be consistent with everyday experience and cater to usual expectations.
However, this research has some limitations that could be investigated and addressed in the future. We set a 10mm interval for D_physical and D_virtual in the first study, which may make the estimated relationship among D_physical, D_virtual, and D_perceived imprecise. The largest diameters of the prop and virtual models were also constrained by hand size, limiting the generalizability of this approach. The first study did not include a comparison with a current prop-based approach, as its primary focus was the difference in size perception, with the size of the represented virtual object and the size of the cylinder as the independent variables. However, we acknowledge that a baseline from a traditional prop-based approach would make the results more reliable, and we plan to conduct such a study in the future.
A skeleton model of the fingers was used to represent the hand in the system, rather than a more realistic skin-colored full hand model. This unrealistic hand may have affected performance because users were unfamiliar with this representation. Although this paper focused on the remapping approach and the perception of different sizes and shapes, the virtual hand rendering style could be changed in subsequent work.
The remapping algorithm required users to grasp toward a prescribed center, but this definition of a grasping center was not always applicable, and the predicted contact points might not be valid for some complex objects. An alternative approach might be to explicitly offset the model from the prop and let the user grab the proxy as a handle to move the corresponding object.
We did not consider task performance differences in the second user study. A further user study evaluating the performance of the proposed technique against other methods will be conducted in the future.

VII. CONCLUSION AND FUTURE WORK
This paper explored a visuo-haptic illusion (VHI) approach based on dynamic virtual finger remapping to help users feel different sizes and shapes of virtual objects during AR manipulation. The virtual fingers were synchronously remapped and adjusted to fit the size and shape of the represented virtual model when grasping or releasing a physical prop.
The first user study explored how such a VHI influences the perceived size of virtual cylinders across different rendered object sizes and prop sizes. The results showed that the perceived size of a virtual object varied with its rendered size in an almost linear fashion across the given interval, whereas the prop size did not significantly influence the perceived size within the tested interval. The second user study explored the differences between dynamic finger remapping and the current prop-based approach to virtual object manipulation. The results indicated that dynamically remapped fingers could enhance the realism and naturalness of manipulating different virtual objects represented with a single prop, and that they enhanced the user's spatial perception and the impression of manipulating different virtual objects.
As discussed in Section VI, future work could improve the system, in particular through further evaluations and application development. In addition, the psychological effects of different hand rendering styles could be studied; for example, the virtual fingers could be rendered in skin color or as transparent. The system could also be further developed for practical manipulation tasks and many other AR and VR scenarios. Although this study was designed for AR applications, the remapping algorithm could also be used directly in VR. In the future, we would like to explore how to provide more realistic tactile feedback in prop-based object manipulation, including 1) replacing the cylinder with a reconfigurable structure, 2) providing physical support for the prop with a robot, and 3) attaching vibration motors to the prop to simulate collision feedback with other objects.