A Shared Control Framework for Enhanced Grasping Performance in Teleoperation

Remote teleoperation has shown significant advancements since the first teleoperation system was proposed by Goertz in the 1940s. In recent years, the research on shared control methodologies in which the robot assists the operators in accomplishing the desired tasks has gained extensive attention. One such important task in teleoperation is object grasping. In this paper, we propose a shared control framework to enhance the teleoperated grasping performance. The proposed framework is built upon a virtual reality device-based direct teleoperation system. In this framework, a template matching-based object point cloud compensation is introduced for multi-angle grasping pose generation. Then, the feasible grasping candidates are selected considering joint constraints-aware manipulability. Finally, the grasping assistance is achieved by trajectory blending with dynamic authority adjustment. To validate the performance of the proposed framework, we carried out experimental evaluations. The output results indicate improved grasping performance in terms of reduced task completion time, linear trajectory, and workload.


I. INTRODUCTION
Robotic teleoperation has been studied extensively throughout robotics history since the idea was first proposed by Goertz in the 1940s [1]. Generally, direct teleoperation systems are designed to constantly follow operators' control, and operators receive visual, haptic, or proprioceptive sensory feedback to achieve a remote presence. The quality of this sensory perception was later termed transparency. Transparency directly affects task performance and control intuitiveness in a direct control framework [2]. However, due to the degraded quality of long-range signal transmission and human-robot embodiment heterogeneity, perfect transparency only exists in ideal situations [3]. This motivated the development of new approaches that integrate direct teleoperation with some level of automation on the follower side for operational assistance. The approach is referred to as shared control [4], in which the human cognitive skills and the robustness of the follower robots are both leveraged [5].

The associate editor coordinating the review of this manuscript and approving it for publication was Tao Liu.
The shared control research has been widely conducted in application areas such as space exploration [6], surgical robotics [7], hazardous material handling [8], multiple robots teleoperation [9], and assistive robotics [10]. In most of the above-mentioned application cases, human operators have to teleoperate complex multi-degrees-of-freedom (DOF) robotic systems to accomplish object manipulations. In this context, object grasping takes on a particularly significant and fundamental role, which serves as a critical component in the success of these operations [11]. However, controlling all DOFs of complex robotic systems to precisely grasp a desired object presents a significant challenge. This difficulty is due to the degraded sensory perception of depth information [12], the requirement for simultaneous control of the end-effector's position and orientation, and the presence of robot constraints [13]. To overcome these challenges and successfully execute a grasp, operators have to persistently monitor the condition of the robotic system, which can be both physically and mentally demanding, leading to operators' fatigue and degraded task performance [14].
Hence, it is necessary to develop grasping assistance systems that reduce workload and enhance remote grasping performance. In [15], Abi Farraj et al. introduced a novel haptic shared-control approach, in which a point cloud-based autonomous grasping pose detection algorithm [16] was integrated with haptic guidance to assist a human operator in the sorting and segregation of different objects in a cluttered environment. Ghalamzan et al. [17] extended grasping assistance to the post-grasping phase and proposed a shared control approach to help operators select a stable grasping pose by considering post-grasping manipulability. Among non-haptic guidance approaches, grasping assistance has also been applied to the control of a multi-DOF robotic arm for a target grasping task through a simple non-invasive Brain-Computer Interface and computer vision guidance [18]. Xu et al. designed a system in which the operators only need to send two kinds of signal instructions to achieve the translational motion of the robot arm, and once the end effector reaches the predefined visual guidance area, the control is switched to full autonomy to complete the grasping. Unlike this approach of switching automation on and off during the grasping assistance process, in [19], Laghi et al. developed a grasping assistance algorithm for a bimanual teleoperation system in which the end effector trajectories are blended based on the user's willingness to grasp. The system utilizes flexible Virtual Reality (VR) controllers as input devices and assists the operation through the automatic coordination of bimanual motions to grasp a single object of different sizes. Moreover, rather than assisting in full DOF, Bowman et al. [20] adopted a DOF-wise control authority allocation between the human and the robot to achieve flexible grasping assistance.
The above-mentioned grasping assistance algorithms depend on grasping poses generated directly from the perceived point cloud. Suitable grasping poses are detected based on the input point cloud and the internal classification network [16]. However, this makes the quality of the generated grasping poses directly dependent on the input point cloud. In most cases, the depth camera mounted on the robot can only obtain the point cloud from a single fixed direction, either from the side or from the top. For example, in the works of [15], [18], and [19], the depth camera was mounted on top of the workspace to simplify the grasping pose generation and achieve grasping from above for higher manipulability. However, for cutting-edge multi-purpose robotic platforms such as Tiago (Pal Robotics) [21], Human Support Robot (TOYOTA, HSR) [22], and PR2 (Willow Garage) [23], the depth camera is mounted on the robot head and provides point clouds from the front side of the object. Due to object self-occlusion, the grasping poses are generated only on the part of the object that is visible to the depth camera, and they all face the same direction. This limits the diversity of generated grasping poses and makes it difficult to select suitable assistive grasping poses from multiple angles, which could otherwise be used to meet robot constraints such as joint limits or manipulability [24].
Moreover, a strategy to select an adequate assistive grasping pose from multiple candidates is still an open problem. In human-robot teleoperation systems, due to the embodiment difference between humans and robots, grasping commands given by human operators may violate the kinematic constraints of follower robots and drive them into singularities [21]. Although most of the previously mentioned grasping assistance systems assume that the robot approaches the target with high manipulability, in practice it is important to consider robot constraints and manipulability [24] to guarantee robot execution when generating feasible grasping poses. Thus, to address the challenges of generating multi-directional grasping poses and respecting robot constraints for smooth task execution, this paper develops a shared control framework that can generate grasping poses from multiple angles and considers robot manipulability constraints when selecting feasible assistive grasping poses.
On the other hand, compared with conventional fixed-base teleoperation interfaces, VR devices have the advantages of lower cost, flexibility, and extended workspace, which makes them well suited for use as a teleoperation interface [25]. In addition, unlike Liquid Crystal Display (LCD) screens, which are commonly used as the visual feedback interface to provide a monocular view [26], VR devices can be integrated with stereo cameras to present stereo vision to operators through a Head Mounted Display (HMD) for improved spatial perception [27]. In this study, a VR device-based teleoperation system is developed. The follower robot is controlled to follow human arm motion intuitively, and stereo vision is provided for visual feedback with enhanced spatial perception. The proposed shared control framework that assists grasping is built upon this VR device-based teleoperation system.
In this paper, we propose a grasping assistance shared control framework for enhanced grasping performance in a VR device-based teleoperation system. The contributions of this paper are summarised as follows:
1) A VR device (HTC, Vive Pro) based teleoperation system is proposed for intuitive direct control. Human arm and head motions are captured by the optical trackers on the Head Mounted Display (HMD) and the VR controller. Captured motions are then mapped to an anthropomorphous robotic manipulator (Pal Robotics, Tiago++) through inverse kinematics with null space resolution. A stereo video stream is displayed on the HMD for stereo visual feedback.
2) A shared control framework is proposed for grasping assistance. In this framework, a multi-angle grasping poses generation is achieved by template-matching based point cloud compensation. These multi-angle grasping poses are then used to select feasible grasping poses by referencing joint limit-aware manipulability measurement. A trajectory blending method with dynamic authority adjustment is introduced to achieve smooth grasping assistance.
3) The contribution of point cloud compensation on the improved manipulability distribution over the workspace is analyzed. Moreover, human subject experiments are conducted to evaluate the system's performance through the metrics of task completion time, linear trajectories, and NASA TLX workload ratings.

VOLUME 11, 2023

II. VR DEVICE-BASED TELEOPERATION SYSTEM
This section describes the VR device-based teleoperation system that is proposed in this paper. The entire software architecture is built upon Robot Operating System (ROS). To achieve intuitive teleoperation, coordinated control of the robot arm and end-effector according to the operator's arm and hand motions is important. Hence, as the leader device, a VR device (HTC, Vive Pro) is used to capture human hand and head motions. For the follower device, a redundant robot manipulator with a 7-DOF arm and a 2-DOF head (Pal Robotics, Tiago++) is used to execute the human command. A parallel gripper is attached to the wrist of the robot arm as an end-effector and can be controlled by the trigger of the VR controller. The stereo vision is provided by displaying images captured by the stereo camera (Stereo Labs, ZED mini) on the HMD. Fig. 1 shows the architecture of the proposed teleoperation system.

A. MOTION MAPPING
In the proposed system, only arm motion mapping is considered. Tiago's 7-DOF arm is a redundant manipulator that can be used to intrinsically track human arm motions in Cartesian space. Tiago's head has two motors that generate pitch and yaw motions to track human head movement. Arm motion tracking and head motion tracking are performed individually. The end-effector pose in the base frame of the follower robot can be defined as a homogeneous transformation matrix in SE(3) as follows:

$$T^b_e = \begin{bmatrix} R^b_e & p^b_e \\ 0_{1\times 3} & 1 \end{bmatrix} \tag{1}$$

where $R^b_e \in SO(3)$ is a rotation matrix in the robot base frame, and $p^b_e \in \mathbb{R}^3$ is a translational vector that contains the translation along each Cartesian coordinate axis. Similarly, the captured human hand pose is used as the desired end-effector pose $T^b_{e,d}$. Hand motion tracking is performed first. Since the HTC Vive base station and the robot base have different coordinate origins, direct mapping is not possible. Hence, for translational motion tracking, the relative position with respect to the initial tracker position is used.
where $k$ denotes the $k$-th sampling step, and 0 denotes the onset of the motion. Then, for the rotational motion mapping, the absolute orientation originating from the robot base frame is used as follows: where a rotation matrix $R^b_{htc}$ maps the rotation in the HTC coordinate system to the robot base coordinate system.
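The motion mapping described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that the tracker displacement since motion onset is rotated into the robot base frame by $R^b_{htc}$ and added to the end-effector position at onset, and that the desired orientation is the tracker orientation re-expressed in the base frame. The function name and all argument names are hypothetical.

```python
import numpy as np

def map_hand_to_end_effector(p_htc_k, p_htc_0, R_htc_k, p_e_0, R_b_htc):
    """Return the desired end-effector position and orientation in the base frame.

    p_htc_k, p_htc_0 : tracker position at step k and at motion onset (HTC frame)
    R_htc_k          : tracker orientation at step k (HTC frame)
    p_e_0            : end-effector position at motion onset (base frame)
    R_b_htc          : rotation from the HTC frame to the robot base frame
    """
    # Relative translation since onset, expressed in the base frame (assumption).
    delta_b = R_b_htc @ (p_htc_k - p_htc_0)
    p_des = p_e_0 + delta_b
    # Absolute orientation: tracker orientation re-expressed in the base frame.
    R_des = R_b_htc @ R_htc_k
    return p_des, R_des
```

Using the relative displacement for translation removes the dependence on where the base station origin lies, while the orientation stays absolute so that the gripper heading matches the hand heading.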

B. INVERSE KINEMATICS
Once the desired end-effector pose is obtained from the previous section, an inverse kinematics solver is used to solve for the desired joint angles, which is denoted as: Firstly, the end-effector position and orientation error can be defined as:

$$e_e = \begin{bmatrix} p_{e,d} - p_{e,k} \\ \log(R_e) \end{bmatrix} \tag{5}$$

where $\log(R_e)$ is the logarithm mapping of the orientation error $R_e = R_{e,d} R^T_{e,k}$. Note that in the practical implementation, unit quaternions are used to describe the orientation. Here, the task space error is $e_e \in \mathbb{R}^6$ and the joint space configuration is $\theta \in \mathbb{R}^7$, which makes the robot manipulator redundant.
Then, in order to track the desired end-effector pose $T^b_e$ with Tiago's redundant manipulator arm, an inverse kinematics solver with null-space resolution is implemented. The general form of the inverse kinematics that contains null space projection can be described as follows [28]: where $\dot{\theta} \in \mathbb{R}^7$ is the vector of desired joint velocities, and $J^{\#}$ is the Moore-Penrose pseudo-inverse of the task Jacobian. The Moore-Penrose pseudo-inverse is computed using the Singular Value Decomposition (SVD) method. The null space projection $(I - J^{\#}J)$ projects the subsequent vector onto the null space of the task Jacobian $J$, and $\varphi$ is a vector that contains the error of the task with secondary priority. The vector $\varphi$ can also be interpreted as the desired null space velocity that modifies joint space behavior without interfering with the execution of the prioritized task. For a redundant manipulator, the null space vector $\varphi$ can be used to optimize the null space behavior based on criteria such as manipulability indices, joint limits, or joint velocities [29]. In this paper, we employ the null space optimization criterion proposed in [30] and [31] with a preferred arm posture to avoid unreachable task poses. The cost function of the criterion is defined as follows: where $K_w$ is a diagonal weighting matrix and $\theta_{prefer}$ is the preferred arm posture described in joint space. This criterion is computationally simple and has been used to create human-like motions in anthropomorphic robot arms [30]. The null space velocity $\varphi$ is obtained by taking the gradient of the cost function (7) in the descent direction. Moreover, the joint limits are enforced by a software constraint. This inverse kinematics is implemented under the ROS framework with the Pinocchio library [32].
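To make the null-space resolution concrete, the following is a minimal NumPy sketch of one velocity-level step. It is not the authors' Pinocchio-based solver. Two things are assumptions for illustration: the primary task term is taken as the pseudo-inverse applied to a gain-scaled task error, and the posture cost is taken as the quadratic form $\frac{1}{2}(\theta - \theta_{prefer})^T K_w (\theta - \theta_{prefer})$. The function name and the `gain` parameter are hypothetical.

```python
import numpy as np

def ik_velocity_step(J, e_task, theta, theta_prefer, K_w, gain=1.0):
    """One velocity-level IK step with a null-space posture task (illustrative)."""
    # Moore-Penrose pseudo-inverse; NumPy computes it through an SVD.
    J_pinv = np.linalg.pinv(J)
    # Projector onto the null space of the task Jacobian.
    N = np.eye(J.shape[1]) - J_pinv @ J
    # Negative gradient of the assumed quadratic posture cost (descent direction).
    phi = -K_w @ (theta - theta_prefer)
    # Primary task plus secondary task filtered through the null space.
    return J_pinv @ (gain * e_task) + N @ phi
```

Because $J(I - J^{\#}J) = 0$, the posture term changes the joint motion without changing the end-effector velocity, which is the property the text relies on.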

III. SHARED CONTROL FRAMEWORK FOR GRASPING ASSISTANCE
In this section, the proposed shared control framework is introduced. The framework is composed of a point cloud compensation-based grasping pose generator, a manipulability-based grasping candidate selection, and a shared control framework with dynamic authority adjustment that complements human commands for grasping assistance. As stated earlier, most of the point cloud-based grasping assistance shared control generates grasping pose from only one direction [15], [18], [19]. Hence, we propose the multi-angle grasping pose generation method, in which the diverse grasping poses can be used as candidates to select feasible grasping poses according to manipulability measurement.

A. MULTI-DIRECTIONAL GRASPING POSE GENERATION
In this paper, a state-of-the-art grasping pose detection (GPD) algorithm [16] is adopted to generate 6-DOF grasping candidates. However, given a partially observed point cloud of an object, the algorithm is unable to calculate the curvature of the missing surface, and some grasp candidates from the side direction are not considered successful grasps. This constrains the generated grasping poses to only one direction. Hence, the most straightforward way to generate multi-directional grasping poses is to compensate for the object's missing point cloud and then apply the GPD algorithm.
To compensate for the objects' point cloud, an object point cloud library-based template matching is applied. The first step for point cloud compensation is to establish a template library. Since the performance of the template matching can be affected by point cloud distribution characteristics, the depth camera (ASUS, Xtion) is used to create the template library and to do the real-time point cloud compensation.
Given the point cloud of an object with a symmetric shape, the process of point cloud compensation is shown in Fig. 2 and consists of the following steps:
1) Central axis calculation: The point cloud of the object's top part (cap) is fully obtainable. By segmenting the top part and calculating its central axis, the central axis of the object can be obtained in vector form, as shown in Fig. 2 (a).
2) Obtain the copies of the point cloud at four different angles: The rotation around the central axis can be applied to the point cloud to approximate the process of obtaining the point cloud from four different angles (Fig. 2 (b)).
3) Create a template: These rotated point clouds are merged, and voxel grid filtering is applied to remove the overlap before the result is finally used as a template of the object (Fig. 2 (c)).
4) Point cloud compensation by template matching: A template matching algorithm (Algorithm 1) is performed to compensate for the missing point clouds.
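Steps 2 and 3 above can be illustrated with plain NumPy. The sketch below is not the PCL-based implementation used in the paper. It assumes the four views are spaced 90 degrees apart and uses a simple centroid-per-voxel filter in place of PCL's voxel grid filter. All function names and the default voxel size are hypothetical.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rotation matrix from Rodrigues' formula for a given axis and angle (rad)."""
    a = np.asarray(axis, float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def voxel_downsample(points, voxel):
    """Replace all points that fall in the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    out = np.zeros((counts.size, 3))
    for d in range(3):
        out[:, d] = np.bincount(inverse, weights=points[:, d]) / counts
    return out

def build_template(partial, axis_point, axis_dir, voxel=0.005):
    """Rotate a partial cloud about the central axis, merge, and filter overlap."""
    copies = []
    for angle in np.deg2rad([0.0, 90.0, 180.0, 270.0]):  # assumed spacing
        R = rotation_about_axis(axis_dir, angle)
        copies.append((partial - axis_point) @ R.T + axis_point)
    return voxel_downsample(np.vstack(copies), voxel)
```

The voxel filter matters because neighbouring rotated copies overlap near their edges, and duplicated points would otherwise bias the surface density seen by the grasp detector.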
The result of point cloud compensation is illustrated in Fig. 2 (d). Algorithm 1 describes the process of point cloud matching and compensation. The input is the real-time point cloud $P$ obtained by the depth camera, and the object point cloud template library $L$. Firstly, $P$ is segmented into subsets $P^*$ that contain the point cloud of each object in the scene. Then, template matching is performed on each point cloud inside the subsets $P^*$. Here, each segmented point cloud $p_i$ is matched against each point cloud template $l_j$ in the library. Based on the highest matching score $S_i$, the corresponding transformation matrix $T_i$ is applied to the matched template $M_i$, which is then added to the output point cloud set $O$. The algorithm is implemented under the framework of the Point Cloud Library (PCL) [33], with the RANSAC-based SAC-IA function [34] used to estimate the transformation matrix and matching scores. The output point cloud set $O$ is then fed separately into the GPD for grasping pose generation. The difference between the point clouds with and without compensation, and the corresponding grasping poses, are shown in Fig. 3. The point cloud compensation for one object runs within 1 s. Figure 4 shows the error of the object's central axis after point cloud compensation. Target 1, Target 2, and Target 3 represent three different target objects used in the experiments (Figure 9 (b)). Figure 4 (a) presents the angular error between the object's compensated central axis and an axis that is perpendicular to the table. Since the object has a symmetrical shape, the angular error can be given using the axis-angle representation, without considering the rotation around the Z-axis. The median of the angular error is 5.057 degrees, 4.597 degrees, and 4.900 degrees for Target 1, Target 2, and Target 3, respectively.
The positional error is shown in Figure 4 (b), which is the Euclidean distance from the object's centroid position in the compensated point cloud to its ground truth centroid position. The median of the positional error is 0.011 m, 0.009 m, and 0.009 m for Target 1, Target 2, and Target 3, respectively.
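The two error measures reported above can be computed as in the following sketch. It assumes the compensated axis is oriented upward (so that the sign of the axis is not ambiguous) and that the reference axis is the Z-axis of a table-aligned frame. Both function names are hypothetical.

```python
import math

def axis_angle_error_deg(axis, reference=(0.0, 0.0, 1.0)):
    """Angle in degrees between an estimated axis and a reference axis."""
    dot = sum(a * b for a, b in zip(axis, reference))
    norm_a = math.sqrt(sum(a * a for a in axis))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_r)))
    return math.degrees(math.acos(cos_angle))

def centroid_error(points, ground_truth_centroid):
    """Euclidean distance between the cloud centroid and a ground-truth centroid."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return math.dist(centroid, ground_truth_centroid)
```

Because only the angle between the two axes is measured, a rotation of the object about its own symmetry axis leaves the angular error unchanged, which matches the reasoning given for ignoring rotation around the Z-axis.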

B. MANIPULABILITY-BASED GRASPING POSE SELECTION
Among the generated multi-directional grasping poses, some may cause the Tiago arm to fall into a singularity, making both prior and subsequent arm motions unfeasible to execute. Therefore, in this section, a manipulability index-based grasping pose selection method is proposed to choose the most feasible grasping candidate.

1) MANIPULABILITY CALCULATION
The manipulability index [35], which describes the distance to a singular configuration, is a well-known criterion for determining the ability of robot manipulators to maneuver in the workspace. A larger manipulability value indicates that the robot arm can move smoothly around the corresponding joint configuration. By applying the manipulability index to grasping pose selection, the unfeasible grasping poses can be effectively filtered out. The manipulability measurement is defined as follows:

$$M(\theta) = \sqrt{\det\left(J(\theta) J(\theta)^T\right)} \tag{8}$$

where $M(\theta)$ is the manipulability value in the joint configuration $\theta$ corresponding to a grasping pose, and $J(\theta)$ is the Jacobian matrix under the joint space configuration $\theta$.
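As a worked example of this measure, the sketch below evaluates the standard manipulability index $\sqrt{\det(JJ^T)}$ for a planar two-link arm, where the closed-form value is $l_1 l_2 |\sin\theta_2|$. The two-link arm is an illustrative stand-in, not the Tiago arm, and the function names are hypothetical.

```python
import numpy as np

def manipulability(J):
    """Manipulability index sqrt(det(J J^T)) for a task Jacobian J."""
    # Clamp tiny negative determinants caused by rounding near singularities.
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))

def planar_two_link_jacobian(theta1, theta2, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm (illustrative example)."""
    s1, c1 = np.sin(theta1), np.cos(theta1)
    s12, c12 = np.sin(theta1 + theta2), np.cos(theta1 + theta2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])
```

With the elbow bent at 90 degrees the index reaches its maximum of $l_1 l_2$, and with the arm fully stretched it drops to zero, which is the singular configuration the selection step is meant to avoid.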

2) MANIPULABILITY WITH JOINT LIMIT PENALIZATION
Equation (8) calculates the manipulability without considering robot joint limits, which can also bring the robot arm to unfeasible configurations. To further ensure the feasibility of the selected grasping poses, this paper follows the suggestion in [36] to introduce the following penalization term that considers the influence of the lower ($l^-_j$) and the upper ($l^+_j$) joint limits: where $k$ is a scaling factor that can be used to adjust the behavior near joint limits, $n$ represents the number of joints in the manipulator ($n = 7$), $l^-_j$ and $l^+_j$ are the lower and upper limits of the $j$-th joint, and $\theta_j$ is the angle of the $j$-th joint. The penalization term is designed to decrease rapidly when the joint configuration given by a grasping pose approaches the joint limits.
By multiplying $M(\theta)$ with $P(\theta)$, a penalized manipulability is obtained as the product $P(\theta)\,M(\theta)$ (equation (10)). In this way, the configurations that are near joint limits are penalized, which makes the corresponding grasping poses less likely to be selected. The comparison of selected grasping candidates with and without joint limit penalization when the object is at the same position is illustrated in Fig. 5. In Fig. 5 (a), two joints of the arm ($\theta_4$ and $\theta_6$) reach their joint limits, which makes the arm motion to the target pose unfeasible. On the other hand, Fig. 5 (b) shows that with penalization, the arm joints are kept away from their limits, which makes the reaching motion feasible. The change in the manipulable workspace of the shared control system before and after point cloud compensation is compared by drawing a heat map of the manipulability (Fig. 6). The workspace is set as x ∈ [0.4m, 0.9m], y ∈ [−0.7m, 0.3m], and z ∈ [0.5m, 0.8m] with respect to the robot base frame. Then, an object is placed in the workspace and its position is varied in 1 cm steps, for a total of 150,000 positions (50 × 100 × 30). The GPD generates grasping candidates for each position. Using the obtained data, heat maps that contain the penalized manipulability greater than 0.05 (equation (10)) are generated. Comparing the heat maps in Fig. 6 (a) and Fig. 6 (b) shows that, after the point cloud compensation, the range of high manipulability grows by about 124%. This indicates an increased assistive area for the proposed shared control framework.
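The selection rule can be sketched as follows. The exact penalization term is not reproduced here, so the code assumes one commonly used form, $P(\theta) = 1 - \exp\left(-k \prod_j \frac{(\theta_j - l^-_j)(l^+_j - \theta_j)}{(l^+_j - l^-_j)^2}\right)$, which is zero at a joint limit and grows toward mid-range. The default value of `k`, the function names, and the candidate tuple layout are all hypothetical.

```python
import numpy as np

def joint_limit_penalty(theta, lower, upper, k=100.0):
    """Assumed penalty in [0, 1): 0 at or beyond a joint limit, larger near mid-range."""
    theta, lower, upper = (np.asarray(v, float) for v in (theta, lower, upper))
    terms = (theta - lower) * (upper - theta) / (upper - lower) ** 2
    if np.any(terms <= 0.0):
        return 0.0  # at least one joint is at or outside its limits
    return float(1.0 - np.exp(-k * np.prod(terms)))

def select_grasp(candidates, lower, upper, k=100.0):
    """Pick the candidate with the highest penalized manipulability.

    candidates: iterable of (label, manipulability, joint_angles) tuples.
    """
    def score(candidate):
        _, m, theta = candidate
        return m * joint_limit_penalty(theta, lower, upper, k)
    return max(candidates, key=score)[0]
```

For instance, with limits of plus and minus 1 rad on two joints, a candidate with manipulability 0.10 but one joint at 0.99 rad loses to a candidate with manipulability 0.08 at mid-range, because the penalty scales the former down by roughly a factor of eight.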

C. SHARED CONTROL FRAMEWORK WITH DYNAMIC AUTHORITY ADJUSTMENT
In shared control, the distribution of authority between the operator's input and robot execution is an important factor. This factor is generally represented as a function $\alpha$ that represents the level of human control [37]. In this study, $\alpha$ is designed to be a sigmoid function with respect to distance, as follows: where $\sigma$ is a scaling factor, and $a(t) = |P_h(t) - P_g|/d$, with $d = 0.3$ m, is a function that maps the distance between the end-effector and the target into the range [0.0, 1.0]. The design of the authority function $\alpha(t)$ ensures the smooth convergence of the operator's authority towards 0 when approaching the target object. As Fig. 7 shows, in the proposed shared control framework, the whole grasping process can be divided into four states based on the value of $\alpha$. $T_e$, $T_h$, $T_g$, and $D = |P_h - P_g|$ represent the pose of the end-effector, the pose of the operator's hand, the pose of the grasping target, and the distance to the grasping target, respectively. In the initial state (Fig. 7 (a)), the system is initialized and a grasping pose is generated for each object in the scene; the operator is in full control of the robot arm and selects the target object using the VR controller. In the assistance state (Fig. 7 (b)), where D ∈ [0.12m, 0.3m], the end-effector tracks the blended trajectory. In the grasping state (Fig. 7 (c)), where D ∈ [0.0m, 0.12m], the robot takes full control to complete the grasp by automatic interpolation. In the post-grasping state (Fig. 7 (d)), the operator takes back the control authority. The shared control process is illustrated in Figure 8 as motion curves.
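The authority function and the distance-based states can be sketched as below. The exact sigmoid is not reproduced here, so the code assumes a logistic curve centred at $a = 0.5$. The steepness value, the clipping of $a$ to 1, the rule that the initial state corresponds to distances beyond 0.3 m, and all names are assumptions for illustration. Only the 0.3 m and 0.12 m thresholds come from the text.

```python
import math

D_ASSIST = 0.30  # m, outer radius of the assistance region (from the text)
D_GRASP = 0.12   # m, radius at which automatic grasping starts (from the text)

def normalized_distance(p_hand, p_goal, d=D_ASSIST):
    """a(t): hand-to-target distance scaled by d and clipped to [0, 1]."""
    return min(math.dist(p_hand, p_goal) / d, 1.0)

def authority(a, sigma=12.0, midpoint=0.5):
    """Assumed sigmoid: near 1 (human control) for a near 1, near 0 for a near 0."""
    return 1.0 / (1.0 + math.exp(-sigma * (a - midpoint)))

def control_state(distance, grasp_done=False):
    """Map the distance to the target onto the four states of Fig. 7."""
    if grasp_done:
        return "post-grasping"
    if distance > D_ASSIST:
        return "initial"
    if distance > D_GRASP:
        return "assistance"
    return "grasping"
```

A smooth curve of this kind avoids the abrupt hand-over that an on-off switch would produce: the operator's influence fades gradually as the hand approaches the target instead of disappearing at a single threshold.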
In the assistance state (Fig. 7 (b)), the trajectory is blended following the equations listed below. For the translational trajectory:

$$p_e(t) = \alpha(t)\, p_h(t) + \left(1 - \alpha(t)\right) p_g \tag{12}$$

where $p_e(t)$, $p_g$, and $p_h(t)$ represent the desired end-effector position at time $t$, the grasping target position, and the position of the human hand at time $t$, respectively. For rotational trajectory blending, the spherical linear interpolation (SLERP) is used: where $R_e(t)$, $R_h(t)$, and $R_g$ are the quaternion representations of the desired end-effector orientation, the human hand orientation, and the orientation of the target grasping pose, respectively. The authority factor $\alpha$ in equations (12) and (13) is the value computed according to equation (11).
To prevent the gripper from touching the object during the adjustment process, the position $p_g$ is shifted by 12 cm from the target grasping pose. The system enters the automatic grasping state (Fig. 7 (c)) when either the operator presses the grasping trigger on the controller or the end-effector comes within 12 cm of the object.
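The blending in the assistance state can be sketched as follows, with the authority factor acting as the interpolation weight: a value of 1 follows the hand and a value of 0 reaches the target. The SLERP routine is a standard textbook implementation for unit quaternions in (w, x, y, z) order. The argument order of the blend, the direction of the 12 cm shift (taken here as backward along an approach direction), and all function names are assumptions.

```python
import numpy as np

def blend_position(p_hand, p_goal, alpha):
    """Linear blend: alpha = 1 follows the hand, alpha = 0 reaches the goal."""
    return alpha * np.asarray(p_hand, float) + (1.0 - alpha) * np.asarray(p_goal, float)

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0 = np.asarray(q0, float) / np.linalg.norm(q0)
    q1 = np.asarray(q1, float) / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:  # nearly parallel: fall back to normalized linear interpolation
        q = (1.0 - t) * q0 + t * q1
        return q / np.linalg.norm(q)
    omega = np.arccos(dot)
    return (np.sin((1.0 - t) * omega) * q0 + np.sin(t * omega) * q1) / np.sin(omega)

def blend_orientation(q_hand, q_goal, alpha):
    """Orientation blend with the same weighting convention as blend_position."""
    return slerp(q_goal, q_hand, alpha)

def pre_grasp_position(p_target, approach_dir, offset=0.12):
    """Shift the goal back along the assumed approach direction by `offset` metres."""
    d = np.asarray(approach_dir, float)
    return np.asarray(p_target, float) - offset * d / np.linalg.norm(d)
```

Interpolating orientations on the quaternion sphere rather than blending Euler angles keeps the angular velocity of the blend constant and avoids wrap-around artifacts near plus or minus 180 degrees.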

IV. EXPERIMENTS AND DISCUSSION
We conducted two experiments to evaluate the system's performance: a single-object grasping experiment (Fig. 9 (a)) and a multiple-object grasping experiment (Fig. 9 (b)). In the first experiment, we consider a single object to grasp, comparing the proposed shared control approach with the proposed VR device-based direct teleoperation and an LCD screen-based direct teleoperation. Then, in the second experiment, the more complex task of grasping multiple objects is considered, comparing the shared control approach with the direct teleoperation approach. Before starting each experiment, the experimenter explained the experimental procedures to each subject. Each subject was then given time to practice controlling the teleoperation system. The experiments were carried out with 10 right-handed subjects (average age 24.4). All study participants provided informed consent, and the study was approved by the Ethics Committee of Nagoya University (No. 22-3).

FIGURE 8. Motion curves that show the process of the shared control. The first row shows the positional changes of the human hand and the robot end-effector along the X-axis, Y-axis, and Z-axis, respectively. The second row shows the angular changes in the roll, pitch, and yaw components of both the human hand and the robot end-effector, respectively. The colors of the shaded areas indicate the different states during the shared control process (Figure 7). The light-yellow area shows the initial state where the human is in full control. The light-green area shows the assistance state where the human trajectory and robot trajectory are blended to reach the target. The light-blue area indicates the grasping state where the robot is in full control to complete the grasp. The light-pink area indicates the post-grasping state where the human is in full control to place the object inside the box. $R_g$ indicates the orientation of the target grasping pose. Note that, to prevent the gripper from touching the object during the assistance state, the calculated $p_g$ is shifted by 12cm from the target grasping pose $p'_g$.

A. EXPERIMENT #1: SINGLE OBJECT GRASPING

1) EXPERIMENTAL SETUP AND CONDITIONS
In this experiment, only one object is considered for grasping. The experiment setup is shown in Fig. 9 (a). The target object is a cylinder-shaped bottle (r=6cm, height=20.5cm) placed at (x=0.74m, y=−0.07m) with respect to the robot base frame. The experimental task is to teleoperate the robot end-effector to grasp the object 5 times and place the objects in the cardboard box. During the experiment, the robot head angle is fixed to reduce the influence of other factors. The operators are required to complete the grasping under three separately given conditions:

LCD: A direct teleoperation, where operators control the robot end-effector by VR controller, and receive visual feedback from an LCD screen.
HMD: A direct teleoperation with VR display. The operators receive visual feedback from the Head Mounted Display (HMD) which presents stereo vision for enhanced spatial perception.
SC: The proposed shared control framework. In addition to stereo vision enhanced visual feedback, the operators receive the grasping assistance for automatic alignment to feasible grasping poses and automatic grasping completion.
The comparison of the visual feedback is shown in Figure 10. The HMD is capable of displaying stereo vision to the operator, in which the left and right images are captured through the stereo camera (Stereo Labs, ZED mini). On the other hand, the LCD screen displays the monocular image to the operator.
The force applied to the object is controlled through the inner position controller provided by the parallel gripper mounted on the Tiago wrist. When the operator pulls the trigger of the VR controller to close the gripper, the position controller drives the two gripper fingers to close completely, leaving no gap. The maximum force applied to the object is set to 50N to guarantee the stability of the grasp. The grasping poses generated by the GPD ensure that force closure grasps are formed with respect to the target objects. Furthermore, the target objects considered in the current experiments are rigid cylinder-shaped objects, with which we have observed stable grasps.
In the experiment, the grasping procedure is as follows: (1) the operators are asked to first place their hand in the home position (Fig. 7 (a)), (2) then move the hand for grasping and placing (Fig. 7 (b), (c), (d)), and (3) finally move back to the home position (Fig. 7 (a)). The task completion time and linear trajectories are recorded to compare the operators' performance under each condition. The experiment starts the first time the robot end-effector appears on the display, and the task completion time records the duration of each grasp, from the moment the robot end-effector appears on the HMD until the object is placed in the box.
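The two recorded metrics can be computed from logged samples as in the sketch below. It assumes that the linear trajectory denotes the accumulated path length of the hand, i.e. the sum of straight-line distances between consecutive position samples, and that timestamps are in seconds. The function names and sample layout are hypothetical.

```python
import math

def path_length(positions):
    """Accumulated distance between consecutive (x, y, z) samples, in metres."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def task_completion_time(t_effector_visible, t_object_placed):
    """Duration from the end-effector appearing on the display to object placement."""
    return t_object_placed - t_effector_visible
```

For example, a hand that moves 0.3 m along X and then 0.4 m along Y has a path length of 0.7 m, even though its net displacement is only 0.5 m.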

2) RESULTS AND DISCUSSION
To compare the metrics across conditions, one-way ANOVA tests are performed on the data. Fig. 11 (a) shows the box plot of the task completion time. Note that "***", "**", and "*" in the figure represent p < 0.001, p < 0.01, and p < 0.05, respectively. The median value of task completion time is 9.39s, 10.94s, and 13.56s for the conditions SC, HMD, and LCD, respectively. The one-way ANOVA test shows statistically significant differences in the task completion time for SC vs. HMD (p < 0.05), SC vs. LCD (p < 0.001), and HMD vs. LCD (p < 0.01). The results (SC vs. HMD, SC vs. LCD) indicate that for single-object grasping, the proposed shared control framework could enhance the operator's performance by reducing task completion time. This also indicates that the grasping task is made easier by integrating robot automation. The result of HMD vs. LCD indicates that, with stereo vision for visual feedback, operators have better spatial perception, which can accelerate the process of grasping pose alignment. Fig. 11 (b) shows the box plot of the linear trajectory of the human hand. The one-way ANOVA test reveals statistically significant differences in the linear trajectory for SC vs. HMD (p < 0.001) and SC vs. LCD (p < 0.001). The median value of linear trajectory is 0.38m, 0.49m, and 0.48m for the conditions SC, HMD, and LCD, respectively. This result indicates that with shared control human hands travel shorter distances compared with direct teleoperation. The shorter moving distance could reduce the workload of the operators during the grasping process. On the other hand, statistical significance is not observed for HMD vs. LCD. Since both the HMD and LCD conditions are based on direct teleoperation, where the follower end-effector constantly tracks the operators' hands, the operators must reach the full distance towards the targeted object. This additional movement potentially increases the workload during teleoperation.
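For readers unfamiliar with the test, the following is a minimal sketch of the one-way ANOVA F-statistic in plain Python. The sample values are hypothetical and are not the experimental data; the p-value would additionally require the F-distribution (for instance via `scipy.stats.f_oneway`), which is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared mean offset
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of each sample around its group mean
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical samples for three conditions (not the paper's measurements)
cond_a = [1.0, 2.0, 3.0]
cond_b = [2.0, 3.0, 4.0]
cond_c = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([cond_a, cond_b, cond_c])
```

A large F relative to the F-distribution with (df_between, df_within) degrees of freedom corresponds to a small p-value, which is what the significance markers in the box plots summarise.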

B. EXPERIMENT #2: MULTIPLE OBJECTS GRASPING

1) EXPERIMENTAL SETUP AND CONDITIONS
In this experiment, multiple objects are considered for grasping. The experiment setup is shown in Fig. 9 (b). The target objects are cylinder-shaped bottles with different colors: #1 red (r=6.5cm, h=21.5cm), #2 green (r=6.5cm, h=22.5cm), and #3 blue (r=6cm, h=20.5cm), placed at (x=0.70m, y=0.07m), (x=0.74m, y=−0.07m), and (x=0.76m, y=−0.23m) in the robot base frame, respectively. The experimental task is to teleoperate the robot end-effector to grasp the 3 objects and place them in the cardboard box. Each operator is asked to perform the task 3 times. During the experiment, the robot head is fixed to a predefined angle to reduce the influence of other factors. The operators are required to complete the grasping task under two separately given conditions: (1) SC, and (2) HMD. The details of each condition are described in experiment #1.
The process to generate grasping poses for different target objects follows Algorithm 1. The algorithm first takes the environment point cloud, which contains the target objects, as an input. It then segments this point cloud to acquire an individual cluster for each object, each representing a partially observed point cloud of that object. By applying template matching to each cluster, the compensated object point cloud, which contains positional information, can be obtained. The grasping poses for each object are acquired by running the Grasping Pose Detection algorithm on the compensated point cloud. These poses are filtered by manipulability score to yield feasible grasping poses.
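The data flow described above can be sketched as follows. This is only an outline under stated assumptions: the four stages are passed in as callables, the toy stand-ins below are hypothetical placeholders rather than the actual segmentation, template matching, Grasping Pose Detection, or manipulability implementations, and filtering by a fixed score threshold is assumed.

```python
def generate_feasible_grasps(scene, segment, match_template,
                             detect_grasps, manipulability, min_score):
    """Map object id -> grasp poses whose manipulability is >= min_score."""
    feasible = {}
    for obj_id, cluster in enumerate(segment(scene), start=1):
        completed = match_template(cluster)   # compensate the partial cloud
        poses = detect_grasps(completed)      # candidate grasping poses
        feasible[obj_id] = [p for p in poses if manipulability(p) >= min_score]
    return feasible

# --- Hypothetical toy stand-ins, only to make the sketch executable ---
def toy_segment(scene):
    # Split points by the sign of y (a real system would cluster the cloud)
    return [[p for p in scene if p[1] > 0], [p for p in scene if p[1] <= 0]]

def toy_match_template(cluster):
    return cluster  # identity: no compensation is performed in this toy

def toy_detect_grasps(cloud):
    n = len(cloud)
    cx, cy, cz = (sum(p[i] for p in cloud) / n for i in range(3))
    # One candidate at the centroid, one deliberately far from the robot
    return [(cx, cy, cz), (cx + 0.5, cy, cz)]

def toy_manipulability(pose):
    return 1.0 - pose[0]  # made-up score that decreases with reach distance

scene = [(0.70, 0.07, 0.1), (0.70, 0.08, 0.1),
         (0.74, -0.07, 0.1), (0.74, -0.08, 0.1)]
grasps = generate_feasible_grasps(scene, toy_segment, toy_match_template,
                                  toy_detect_grasps, toy_manipulability, 0.2)
```

Passing the stages in as parameters keeps the ordering of the steps visible while leaving each stage replaceable by its real implementation.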
In the experiment, operators teleoperate the robot to grasp each object following the procedures in experiment #1. The shared control for grasping each target object is achieved following the four states described in Section III, subsection C and depicted in Figure 7. Operators are asked to grasp the target objects in the order of object #1, object #3, and object #2. After grasping each object, operators return to the home position and select the next target object by using the trackpad of the VR controller. The ID of the selected object is displayed on the HMD. The task completion time is measured starting from the first time the robot end-effector appears on the HMD until the last object is placed in the box. The linear trajectories track the whole grasping process. The screenshots in Figure 12 illustrate the process of multiple objects grasping with shared control.

2) RESULTS AND DISCUSSION
Fig. 13 (a) shows the box plot of the task completion time. The one-way ANOVA test shows a statistically significant difference in the task completion time between the SC and HMD conditions (p < 0.01). The median value of task completion time is 46.52s and 57.54s for SC and HMD, respectively. The statistical significance between SC and HMD reveals that even for the more complex task of grasping multiple objects, in addition to the enhanced visual perception, the proposed shared control can enhance the grasping performance by means of auto-alignment of the grasping pose and auto-completion. Fig. 13 (b) shows the box plot of the linear trajectory of the human hand. The ANOVA test reveals a statistically significant difference between SC and HMD (p < 0.001). The median values are 1.196m and 1.841m for SC and HMD, respectively. The result is similar to single-object grasping: by using the proposed shared control, human hands travel shorter distances, which decreases the workload. Note that none of the subjects had previous experience with the operation of the teleoperation system. In addition, the Interquartile Range (IQR) of the completion time is 8.92s and 19.64s for SC and HMD, respectively. The IQR of the linear trajectories is 0.420m and 0.513m for SC and HMD, respectively. The IQR of the box plots suggests that the data of the shared control condition has less dispersion. This could imply that despite the operators' lack of experience, the shared control approach can assist the operators to execute the task in a more stable way. Moreover, the NASA TLX subjective evaluation (Fig. 14) shows lower mental demand, physical demand, effort, and frustration with shared control during experiment #2. This result further supports the previous results, which indicate a reduced workload when using the proposed shared control. Accordingly, the subjective performance rating of the shared control is higher than that of direct teleoperation.
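As a reminder of how the dispersion figures are obtained, the sketch below computes the median and IQR of a sample with Python's standard library. The sample values are hypothetical, and the "inclusive" quartile method is an assumption; plotting tools differ in their quartile conventions, so values may not match a given box plot exactly.

```python
import statistics

def median_and_iqr(samples):
    """Return (median, IQR) using inclusive quartiles (assumed convention)."""
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return statistics.median(samples), q3 - q1

# Hypothetical completion times in seconds (not the experimental data)
times_sc = [40, 44, 46, 48, 52]
times_hmd = [45, 50, 58, 66, 71]

median_sc, iqr_sc = median_and_iqr(times_sc)
median_hmd, iqr_hmd = median_and_iqr(times_hmd)
```

A smaller IQR means the middle half of the trials lies in a narrower band, which is the sense in which one condition is described as less dispersed than another.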

V. CONCLUSION
This paper proposes a shared control framework for grasping assistance in teleoperation. The presented shared control framework is built upon an intuitive VR device-based direct teleoperation system, in which human hand and head motions are captured by HTC VR devices and then mapped to an anthropomorphous robotic manipulator (Pal Robotics, Tiago) through an inverse kinematics solver with null-space resolution.
In the shared control framework, a template matching-based point cloud compensation is performed for multi-angle grasping pose generation. Then a joint limit penalized manipulability analysis is applied to the generated grasping poses to acquire the most feasible pose candidates. By applying the object point cloud compensation, the assistive area with high manipulability is increased by about 124%. The shared control is then achieved by dynamic authority adjustment-based trajectory blending. Two grasping experiments are carried out for system evaluation. In both experiments, enhanced performance is observed in the form of shorter task completion times and reduced linear trajectories. The NASA TLX subjective evaluation also showed a reduced workload for the multiple-object grasping task.
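The idea of trajectory blending with dynamic authority adjustment can be illustrated with a minimal sketch. It assumes a linear blend between the human command and the robot's assistive target, with the robot authority growing linearly as the end-effector approaches the grasp pose; the thresholds `d_far` and `d_near` and the linear authority profile are hypothetical and are not taken from the framework itself.

```python
def authority(distance, d_far=0.30, d_near=0.05):
    """Robot authority in [0, 1] as a function of distance to the grasp (m).

    Assumption: 0 beyond d_far, 1 within d_near, linear in between.
    """
    if distance >= d_far:
        return 0.0
    if distance <= d_near:
        return 1.0
    return (d_far - distance) / (d_far - d_near)

def blend(human_pos, robot_pos, alpha):
    """Convex combination of the human command and the assistive target."""
    return tuple((1.0 - alpha) * h + alpha * r
                 for h, r in zip(human_pos, robot_pos))

# Hypothetical positions in the robot base frame (metres)
human_cmd = (0.0, 0.0, 0.0)
assist_target = (1.0, 1.0, 1.0)
alpha = authority(0.175)                       # halfway between thresholds
command = blend(human_cmd, assist_target, alpha)
```

With this kind of profile the operator keeps full control far from the object, and the assistance takes over gradually rather than abruptly near the grasp pose.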
The current point cloud compensation targets symmetrically shaped objects under controlled lighting conditions. In the future, to further improve the system capability, the point cloud compensation will be extended to asymmetrically shaped objects under more complex lighting conditions and backgrounds. This future approach will be guided by methodologies reported by computer vision researchers [38], [39], [40]. Additionally, tactile sensing technology can be explored to control the force applied to the objects and enable stable grasping of deformable or complex-shaped objects such as fruits. Operator comfort is also an important factor in designing an effective teleoperation system, and this aspect will be explored in the future. A collision avoidance method can be integrated into the current system for task execution in more complex and cluttered environments. In addition, assistance in the post-grasping phase, such as placing an object in a desired location or handing over an object to a human, can be explored.