Haptics Electromyography Perception and Learning Enhanced Intelligence for Teleoperated Robot

Abstract: Due to the lack of a transparent and friendly human–robot interaction (HRI) interface, as well as various uncertainties, it is usually a challenge to remotely manipulate a robot to accomplish a complicated task. To improve teleoperation performance, we propose a new perception mechanism that integrates a novel learning method to operate robots at a distance. To enhance the perception of the teleoperation system, we use the surface electromyogram signal to extract the human operator's muscle activation. In response to changes in the external environment, as sensed through haptic and visual feedback, a human operator naturally reacts with various muscle activations. By imitating human behavior in task execution, including not only the motion trajectory but also the arm stiffness adjusted by muscle activation, the robot is expected to carry out repetitive tasks autonomously or uncertain tasks with improved intelligence. To this end, we develop a robot learning algorithm based on probability statistics under an integrated framework of the hidden semi-Markov model (HSMM) and the Gaussian mixture method. This method is employed to obtain a generative task model based on the robot's trajectory. Then, Gaussian mixture regression based on HSMM is applied to correct the robot trajectory with the reproduced results from the learned task model. The execution procedure consists of a learning phase and a reproduction phase. To guarantee the stability, immersion, and maneuverability of the teleoperation system, a variable gain control method that involves electromyography (EMG) is introduced. Experimental results have demonstrated the effectiveness of the proposed method.

Note to Practitioners: This paper is inspired by the limitations of teleoperation in performing a task with an unfriendly HRI and a lack of intelligence. The human operators need to concentrate


I. INTRODUCTION
PROPELLED by sensor technologies, computer technologies, control technologies, and mechatronic design, intelligent robots have made breakthroughs over the past three decades [1], [2]. Nowadays, robots are widely used in industry because of their high versatility and adaptability [3], [4]. With the expansion of their application to various areas, the collaborative working environments of robots are becoming more complicated, and the complexity of tasks has also greatly increased [5], [6]. However, the development of robot technology has not kept pace with human expectations. Related studies demonstrate that, given the limitations of sensors, control, artificial intelligence, and mechanisms, an autonomous robot system cannot accomplish a task in an unknown or complicated environment in the foreseeable future. Therefore, the telerobot based on human-robot interaction (HRI) is a realistic option for handling a complex task, combining human intelligence with the robot's capabilities to enhance the robot intelligence [7]-[9]. As shown in Fig. 1, a possible human-in-the-loop teleoperation system consists of the following modules: human operator, information perception interface, task learning, and task reproduction. The human operator is the main factor for the telerobot [10], [11]. The information perception module is used to provide a perception platform for information delivery between the human operator and the robot [12]. The operator's physical or physiological information [e.g., motion and electromyography (EMG)] can be used to enhance the capability of HRI. When the human operator receives feedback from the external environment, he or she can actively adjust muscle activation, as reflected in the EMG signals, to cope with the change in the external environment. The task learning module is mainly used to learn a specific skill through HRI. The robot reproduction module recognizes the initial conditions of the current task and updates the generative task model to improve the manipulation performance.

Fig. 1. Information perception and robot learning for the teleoperation system.

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
Given precise information, including work environments and mission scenarios, telerobots can be programmed in code by an expert or an experienced operator in industrial fields for tasks such as automatic spraying, automatic welding, and automatic guided vehicles. This programming method is widely applied in many areas that require high precision, high speed, and high repeatability. Alternatively, robot learning is a method by which the robot learns to perform a certain task via HRI, reducing the burden on the operator and meeting the efficiency demands of the industrial field. This learning method is also called programming by demonstration or learning from demonstration. Generally speaking, robots learn a specific task through direct demonstration by a human teleoperating a robot, 3-D vision teaching, virtual demonstration, and so on. The human operators teach the robots a specific skill by using interaction information, such as interaction force, position, visual images, voice, and physiological signals of the human. There are a number of algorithms dedicated to robot learning. Robot learning methods can be divided into two groups: the perception system level and the learned task level. The perception system level involves vision and perceptible motion, while the learned task level comprises the task trajectories or hidden state information (position, velocity, force, and human intention) [13], [14]. A vision perception system is an effective way to capture the information of HRI [15]-[19]. Chalodhorn et al. [15] proposed a learned sensory-motor model for a humanoid robot to learn human gait via a motion capture system. Grollman and Jenkins [17] used a perception system to collect data for robot imitation. In addition, many researchers have employed motion sensors to capture human motion to teach robots to perform a specific task [18], [19].
In addition, extracting information from a demonstrated trajectory can also facilitate robot learning. Field et al. [20] presented a method that learns a joint space trajectory model for robot programming. In [21], a complex trajectory reproduction method is used to transfer the knowledge of a human to a robot by demonstration. A similar learning model was presented in [22]. In particular, Deniša et al. [23] developed a compliant movement primitives method to encode the position trajectory for robot learning. To improve the performance of robot programming, researchers have proposed a number of methods for the robot learning phase. Racca et al. [24] proposed a method that integrates the hidden semi-Markov model (HSMM) with Cartesian impedance control to perform complex tasks such as opening a door or manipulating a button. In [25], a parametric hidden Markov model (HMM) was used to encode the data from the demonstrations in the training phase. Tanwani and Calinon [26], [27] developed a task-parameterized HSMM to cope with changing environmental situations during manipulation tasks. From the above-mentioned works, robot learning can be regarded as a problem of extracting features from the demonstrated training data for a specific skill in the process of HRI. By using robot learning, the robot can obtain a task model that embeds human intention. Khokar et al. [28] used HMMs, trained offline by an expert, to recognize human motion intention. Stefanov et al. [29] extracted features from the haptic device and adopted an HMM algorithm to achieve human intention recognition through classification. In [30], an Intention-Driven Dynamics Model was presented to infer human intentions from observed motions. Maeda et al. [31] developed an interaction learning method to generate a collaborative trajectory from human movement observations. Ravichandar and Dani [32] proposed an adaptive-neural-intention estimator to recognize the operator's motion intention from observations via offline training and online intention estimation. For teleoperated robots, the human operator is regarded as a factor in performing tasks cooperatively [28], [33], [34]. The human performance largely determines whether the task is accomplished. Pervez et al. [35] presented a learning method based on dynamical movement primitives to perform the peg-in-hole task with a three-DOF master-slave robot. However, under the influence of an uncertain environment, it is difficult to exploit human cognition for task manipulation. In addition, the motion and commands of the operator involve his/her perception and intention in the process of task collaboration [36].
In this paper, we combine human intelligence with the robot's capability to ensure task performance and to enhance the intelligence of the teleoperated robot. In the information perception interface, the surface electromyogram (sEMG) signal is applied to detect the operator's muscle activation when the operator manipulates the haptic device to adapt to the external environment. For a specific skill, the muscle activation varies with the operator's movements/commands. In addition, representations of human-telerobot collaboration, such as trajectories of the task/robot's end-effector, motion of the human, and variation of muscle activation, indicate a specific skill and human intention. First, by introducing a combined scheme of HSMM and the Gaussian mixture model (HSMM-GMM), we obtain a generative model in the learning phase. Similarly, in the reproduction phase, a task reproduction model is executed based on HSMM and Gaussian mixture regression (HSMM-GMR). Second, in the learning and reproduction phases, based on our previous work [37], an sEMG signal is embedded into the control strategy to indicate the human's intended control gain and movement. The telerobot can thus learn to actively use a suitable control gain, inspired by the human muscle activation in response to the external environment, to perform a task. Finally, experimental studies are performed to show the effectiveness and superiority of the developed algorithms.
Section II presents preliminaries that introduce the teleoperated system and the sEMG signal processing. Section III presents the proposed task generative model in the learning and reproduction phases. Section IV describes the system dynamics and control strategy. The experimental setup and results are presented in Section V. Finally, Section VI concludes the paper.

II. PRELIMINARIES

A. Teleoperation System Description
As shown in Fig. 2, a novel teleoperation system is developed in this paper. It includes three main modules: a biological signal perception interface module, a telerobot module, and a robot learning module.

1) Biological Signal Perception Interface Module: The biological signal perception interface module consists of sEMG electrodes, a preprocessing unit, and a variable gain unit. This module is used to sense the human operator's muscle activation. The sampled sEMG information indicates the electrical activity of the hand muscle in the process of HRI. Through the signal processing module, an envelope of the sEMG signal can be obtained. In this module, the obtained variable gain is used to control the motion of the slave.

2) Telerobot Module: The telerobot module is the main part of the framework and employs a master-slave structure. It can be observed in Fig. 2 that the slave device follows the master's motion generated by the human operator.

3) Robot Learning Framework Module: The robot learning module is responsible for both task learning and reproduction. In the learning phase, the telerobot obtains an a priori task model to learn a specific skill from the human operator by using statistical learning theory. In the reproduction phase, the task model is updated according to the current situation, and the robot executes the updated task.

B. Muscle Activation Descriptor
Generally, the EMG signal is the result of the combined effect, in both time and space, of the motor unit action potentials of the muscle fibers beneath the surface [38], [39]. The sEMG signals can be used in three applications: as an indicator of the muscle activation, as a representation of the force generated by the human muscle, and as a descriptor of muscle fatigue [38]. In this paper, we use the sEMG signal as the muscle activation descriptor. The preprocessing of the sEMG signals is shown in Fig. 3.
In this paper, an MYO armband (Thalmic Labs Inc.) with N = 8 detection channels is used to sample the EMG signal over a Bluetooth link, where $u_{raw}(i)$ and $u$ are the raw sEMG signals and the sEMG signal, respectively. Generally, the sEMG signal (blue) $u$ involves noise. To extract the muscle activation from the sEMG signal with as little noise as possible, the root mean square (rms) of the sEMG signal is applied in this paper, where $L_{win}$ is the size of the moving sampling window and rms represents the rms of the sEMG signal, i.e., the muscle activation descriptor. The value of the rms represents the instantaneous electric power of the sEMG signal and reflects the effective value of the electrical discharge at the muscle surface. The size of the moving window is a parameter that is tuned empirically over repeated trials.
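The moving-window rms described above can be sketched in a few lines. This is a minimal illustration that assumes the standard definition $\mathrm{rms}(t) = \sqrt{\frac{1}{L_{win}} \sum_{i=t-L_{win}+1}^{t} u^2(i)}$ applied to a single, already combined sEMG signal; the function name `moving_rms` and the handling of the first few samples (shorter windows) are assumptions, not details taken from the paper.

```python
import numpy as np

def moving_rms(u, l_win):
    """Root mean square of u over a trailing window of at most l_win samples."""
    u = np.asarray(u, dtype=float)
    if l_win < 1:
        raise ValueError("l_win must be >= 1")
    # A cumulative sum of squares turns each window sum into one subtraction.
    csum = np.concatenate(([0.0], np.cumsum(u ** 2)))
    end = np.arange(1, len(u) + 1)
    start = np.maximum(end - l_win, 0)  # assumption: shorter windows at the start
    return np.sqrt((csum[end] - csum[start]) / (end - start))
```

A larger `l_win` gives a smoother envelope at the cost of more lag, which is the trade-off the window size has to balance.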

III. ROBOT LEARNING MODEL
In teleoperated systems, the performance of teleoperation is highly correlated with the human operator's skill. When the human operator performs a task through the teleoperated robot, the completion of the task may be affected by the interaction among the external environment, the telerobot, and the operator's subjective initiative. To enhance the capability of HRI, a novel method is used to obtain a task model for the teleoperation.
In this paper, the end-effector of a slave device performs a specific task that is trained with different initial conditions, e.g., different initial locations. We then obtain a task learning model by using the HSMM method.
The proposed task generative framework (footnote 1) is shown in Fig. 4. In the learning phase, based on the task data collected from demonstrations, HSMM-GMM is used to obtain the task model parameters. In the reproduction phase, the telerobot behavior is reproduced based on HSMM-GMR according to a given task parameter set.

A. Parameters of Task Generative Model
The input data consist of the positions $x$ and velocities $\dot{x}$ of the robot end-effector (footnote 2). Because the task is demonstrated by the robot manipulator through the human operator, we define the pose of the slave device $\xi \in \mathbb{R}^D$ as an observation sequence, where $\xi^I$ is the pose of the slave manipulator with the input components and $\xi^O$ denotes the output components. $x_s \in \mathbb{R}^3$ represents the position of the slave.

Footnote 1: As seen in Fig. 4, in the reproduction phase, T is equal to T in the equations.
Footnote 2: To guarantee the accuracy of these data for task learning, we use the pose of the end-effector of the slave device as the input signal rather than that of the master device.

The task generative model incorporates six elements, presented as follows:

1) State of Model: Suppose that there are N hidden states in the HSMM, i.e., $S = \{S_1, S_2, \ldots, S_N\}$.

2) Number of Observations: M indicates the number of observations. The set of M observations is $V = \{v_1, v_2, \ldots, v_M\}$.

3) Initial Probability Distribution Vector: $\pi$ represents the initial probability distribution of the model. It defines the probability of each hidden state at the beginning of the calculation, $\pi_i = P(q_1 = S_i) \ge 0$ $(1 \le i \le N)$, and satisfies $\sum_{i=1}^{N} \pi_i = 1$.

4) Transition Probability Distribution Matrix: $A = [a_{ij}]_{N \times N}$ is the transition probability matrix from hidden state i at time t to hidden state j at time t + 1, i.e., $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ $(1 \le i, j \le N)$.

5) Gaussian Mixture Parameters and Observation Probability Matrix: We use Gaussian joint probabilities to represent the output probability distribution of the observations, i.e., $\{\mu, \sigma\}$.

6) Probability Density Function for State Dwell Time: The state dwell time follows a Gaussian duration probability distribution, where $\mu^C$ and $\sigma^C$ are its mean values and variances, respectively. This distribution defines the state duration probability density function $p^C_i(t)$.

Therefore, the proposed model with N states is parametrized by $\lambda = \{\pi, A, \{\mu, \sigma\}, \{\mu^C, \sigma^C\}\}$.
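To make the parameter set concrete, the elements listed above could be grouped in a small container that also checks the stochastic constraints on $\pi$ and $A$. This is only an illustrative sketch: the class name `TaskModel`, the helper `uniform_model`, the array shapes, and the row-sum check on $A$ (the usual Markov-chain constraint) are assumptions rather than details given in the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TaskModel:
    """Container for lambda = {pi, A, {mu, sigma}, {mu_c, sigma_c}} (hypothetical layout)."""
    pi: np.ndarray       # (N,) initial state probabilities
    A: np.ndarray        # (N, N) transition probabilities, A[i, j] = a_ij
    mu: np.ndarray       # (N, D) observation means
    sigma: np.ndarray    # (N, D, D) observation covariances
    mu_c: np.ndarray     # (N,) mean state duration
    sigma_c: np.ndarray  # (N,) variance of the state duration

    def validate(self, tol=1e-8):
        """Raise ValueError if pi or A is not a valid probability distribution."""
        n = self.pi.shape[0]
        if self.A.shape != (n, n):
            raise ValueError("A must be N x N")
        if np.any(self.pi < 0) or abs(self.pi.sum() - 1.0) > tol:
            raise ValueError("pi must be non-negative and sum to 1")
        if np.any(self.A < 0) or not np.allclose(self.A.sum(axis=1), 1.0, atol=tol):
            raise ValueError("each row of A must be non-negative and sum to 1")
        if np.any(self.sigma_c <= 0):
            raise ValueError("duration variances must be positive")

def uniform_model(n_states, dim):
    """Model with equal-valued pi and A and placeholder Gaussian parameters."""
    return TaskModel(
        pi=np.full(n_states, 1.0 / n_states),
        A=np.full((n_states, n_states), 1.0 / n_states),
        mu=np.zeros((n_states, dim)),
        sigma=np.stack([np.eye(dim)] * n_states),
        mu_c=np.ones(n_states),
        sigma_c=np.ones(n_states),
    )
```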

B. Initialization of Task Generative Model
According to the above specification, we need to initialize the model λ, i.e., assign starting values to the parameters that are to be re-estimated. Generally, the form of a Markov chain is determined by the model parameters π and A. However, for a given Markov chain structure, the initial values of π and A have little impact on the final convergence of the model. The initial value of the state duration probability density function $p^C_i(t)$ does affect the final results, but its influence is very limited. Therefore, the parameters $\{\pi, A, p^C_i(t)\}$ can be initialized with random or equal values. In contrast, the parameters {μ, σ} have a large impact on convergence, so we set the initial values of μ and σ with the K-means method [27], [40].
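A K-means initialization of {μ, σ} along these lines might look as follows. This is a minimal sketch, not the authors' implementation: the function name `kmeans_init`, the random choice of initial centers, the iteration cap, and the small diagonal regularization added to each covariance are all assumptions.

```python
import numpy as np

def _assign(data, centers):
    """Index of the nearest center for every data point."""
    dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans_init(data, n_states, n_iter=50, seed=0, reg=1e-6):
    """Return initial means (N, D) and covariances (N, D, D) from K-means clusters."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=n_states, replace=False)]
    for _ in range(n_iter):
        labels = _assign(data, centers)
        # Keep the old center when a cluster receives no points.
        updated = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_states)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    labels = _assign(data, centers)
    dim = data.shape[1]
    sigma = np.empty((n_states, dim, dim))
    for k in range(n_states):
        pts = data[labels == k]
        cov = np.cov(pts, rowvar=False) if len(pts) > 1 else np.zeros((dim, dim))
        sigma[k] = np.atleast_2d(cov) + reg * np.eye(dim)  # keep covariances invertible
    return centers, sigma
```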

C. Training Method
The purpose of model training is to acquire the model parameters $\lambda = \{\pi, A, \{\mu, \sigma\}, \{\mu^C, \sigma^C\}\}$ from a given observation sequence through the expectation-maximization algorithm [41]. In this paper, we suppose that the observations obey independent Gaussian probability distributions. The training process is shown in Fig. 5 and alternates between two steps:
1) Given the observation sequence, evaluate the probability $\gamma_t(S_j)$ of state $S_j$ at time t.
2) Using $\gamma_t(S_j)$, re-estimate the parameters of the model λ.
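The two alternating steps can be illustrated on the Gaussian part of the model alone. The sketch below performs one expectation-maximization iteration for a plain Gaussian mixture: it evaluates the state responsibilities (playing the role of $\gamma_t(S_j)$) and then re-estimates the weights, means, and covariances. It deliberately omits the transition and duration terms of the HSMM, so it is a simplified stand-in rather than the full re-estimation procedure; the function names and the regularization term are assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mu
    sol = np.linalg.solve(sigma, diff.T).T
    expo = -0.5 * np.sum(diff * sol, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(expo) / norm

def em_step(x, weights, mu, sigma, reg=1e-6):
    """One EM iteration for a Gaussian mixture (no transitions or durations)."""
    n, d = x.shape
    k = len(weights)
    # E-step: responsibility of each state for each observation.
    lik = np.column_stack(
        [weights[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(k)]
    )
    gamma = lik / lik.sum(axis=1, keepdims=True)
    loglik = np.log(lik.sum(axis=1)).sum()  # log-likelihood under the input parameters
    # M-step: re-estimate the parameters from the responsibilities.
    nk = gamma.sum(axis=0)
    new_weights = nk / n
    new_mu = (gamma.T @ x) / nk[:, None]
    new_sigma = np.empty((k, d, d))
    for j in range(k):
        diff = x - new_mu[j]
        new_sigma[j] = (gamma[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return new_weights, new_mu, new_sigma, gamma, loglik
```

Calling `em_step` repeatedly until the returned log-likelihood stops improving gives the usual stopping rule.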

D. Observation Constituent
We define the pose of the slave device $\xi_t$ as an observation sequence, where $\xi_t$ includes the input variable $\xi^I_t$ and the output variable $\xi^O_t$. $\mu_i$ and $\sigma_i$ are the mean and variance related to the input and output.
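For reference, the input/output split described above is usually written as a block partition of the observation and of each state's Gaussian parameters. The form below is the conventional partition used in Gaussian mixture regression and is given here as an assumption; the superscripts $IO$ and $OI$ for the cross-covariance blocks are notation introduced for this illustration.

```latex
\xi_t = \begin{bmatrix} \xi_t^{I} \\ \xi_t^{O} \end{bmatrix}, \qquad
\mu_i = \begin{bmatrix} \mu_i^{I} \\ \mu_i^{O} \end{bmatrix}, \qquad
\sigma_i = \begin{bmatrix} \sigma_i^{I} & \sigma_i^{IO} \\ \sigma_i^{OI} & \sigma_i^{O} \end{bmatrix}
```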

E. Probability Calculation
Given the model $\lambda = \{\pi, A, \{\mu, \sigma\}, \{\mu^C, \sigma^C\}\}$ and an observation sequence, the probability of the observation sequence V under the model λ can be calculated by using the forward algorithm as in [26]. The recursion runs for $t = 1, 2, 3, \ldots, T - 1$ and $i = 1, 2, 3, \ldots, N$, where $\alpha_{i,t}$ is the forward probability and $a_{ji}$ is the transition probability from state j at time t to state i at time t + 1.
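As an illustration of the forward recursion, the sketch below implements the standard scaled forward pass of a plain HMM. It leaves out the explicit state-duration term that an HSMM adds, so it should be read as a simplified stand-in for the paper's forward variable; the function name `forward` and the precomputed likelihood table `b_obs` are assumptions.

```python
import numpy as np

def forward(pi, A, b_obs):
    """Scaled HMM forward pass (no duration model).

    pi:    (N,) initial state probabilities
    A:     (N, N) transitions, A[j, i] = a_ji (from state j to state i)
    b_obs: (T, N) likelihood of the observation at time t under state i
    Returns the per-step normalized forward variables (T, N) and log P(V | lambda).
    """
    n_steps, n_states = b_obs.shape
    alpha = np.zeros((n_steps, n_states))
    loglik = 0.0
    for t in range(n_steps):
        if t == 0:
            a = pi * b_obs[0]
        else:
            a = (alpha[t - 1] @ A) * b_obs[t]  # sum_j alpha_j a_ji, times b_i
        scale = a.sum()
        alpha[t] = a / scale                   # rescaling avoids numerical underflow
        loglik += np.log(scale)
    return alpha, loglik
```

Because each row of `alpha` is normalized, it can be used directly as the state weighting in the reproduction step.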

F. Reproduction
Based on the probability calculation of HSMM-GMR, a normalized parameter $h_i$ describing the influence of state i is defined as $h_{i,t} = \alpha_{i,t} / \sum_{k=1}^{N} \alpha_{k,t}$, where $\alpha_{k,t}$ is defined in (11). Combining (12) and (13), and inspired by the work in [24], [42], and [43], the current target variable $\dot{x}$ (velocity) is obtained through (15)-(18), from which we can compute the velocity of the end-effector.
It is assumed that the position is known at time t. Motivated by the work in [24], the position at time t + 1 can be computed by integration, where T is the length of a single iteration time step.
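The reproduction step can be sketched as a Gaussian mixture regression of velocity on position, followed by one Euler integration step. The conditional-mean formula used below is the standard GMR expression and is assumed here, as are the function names, the block arrays `s_in` and `s_oi` (input covariance and output-input cross-covariance per state), and the step size `dt`; the paper's own equations (15)-(18) may differ in detail.

```python
import numpy as np

def normalize_activation(alpha_t):
    """h_i = alpha_i / sum_k alpha_k for the forward variables at one time step."""
    alpha_t = np.asarray(alpha_t, dtype=float)
    return alpha_t / alpha_t.sum()

def gmr_velocity(x, h, mu_in, mu_out, s_in, s_oi):
    """Blend per-state conditional means: sum_i h_i [mu_O + S_OI S_I^-1 (x - mu_I)]."""
    v = np.zeros_like(mu_out[0], dtype=float)
    for i in range(len(h)):
        v += h[i] * (mu_out[i] + s_oi[i] @ np.linalg.solve(s_in[i], x - mu_in[i]))
    return v

def integrate_position(x, v, dt):
    """Euler step: position at the next time step from the current velocity."""
    return x + dt * v
```

With a single state whose cross-covariance is the negative identity, for instance, the regressed velocity points straight back toward the state mean.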

IV. SYSTEM DYNAMICS AND CONTROL STRATEGY

A. System Dynamics
In this paper, the dynamics of the teleoperation system are described as [44]

$$M_m(q_m)\ddot{x}_m + C_m(q_m, \dot{q}_m)\dot{x}_m + G_m(q_m) = F_m$$
$$M_s(q_s)\ddot{q}_s + C_s(q_s, \dot{q}_s)\dot{q}_s + G_s(q_s) = \tau_s \qquad (19)$$

where $\{M_m(q_m), C_m(q_m, \dot{q}_m), G_m(q_m)\}$ and $\{M_s(q_s) \in \mathbb{R}^{N_s \times N_s}, C_s(q_s, \dot{q}_s) \in \mathbb{R}^{N_s \times N_s}, G_s(q_s) \in \mathbb{R}^{N_s}\}$ are the inertia matrix, the Coriolis and centrifugal matrix, and the gravitational term for the master and the slave in the joint space, respectively. $q_m$ and $q_s$ are the joint angle vectors of the master and the slave. $F_m \in \mathbb{R}^{N_m}$ is the force applied to the master during human operation. $\tau_s \in \mathbb{R}^{N_s}$ is the control torque of the slave device. $N_m$ and $N_s$ are the DOFs of the master and the slave.

B. Control Strategy

1) Basic Control: Proportional-derivative (PD) control is applied to both the master and the slave, as described in (20), where $e_m = x_{md} - x_m$ and $e_s = x_{sd} - x_s$. $\{K_{pm}, K_{dm}\}$ and $\{K_{ps}, K_{ds}\}$ are the proportional and derivative gains of the controller for the master and the slave, respectively.

2) Task Space Control: Based on the Denavit-Hartenberg (D-H) parameters of the slave device, the closed-loop inverse kinematics method is introduced to avoid kinematic singularities and numerical drift for the Cartesian position task. The slave joint velocity $\dot{q}_s$ can be expressed as [44]

$$\dot{q}_s = K_{sp} J_s^T(q_s) e_s \qquad (21)$$

where $K_{sp}$ is a positive definite matrix, $J_s(q_s)$ is the Jacobian matrix of the slave, and $e_s$ is the position error (footnote 3).

3) Variable Gain Control: When the human operator manipulates a telerobot to perform a task, the muscle activation of the hand changes according to the feedback from the robot. Thus, a variable gain control method based on the muscle activation is used to enhance the telerobot control performance [37], [44], [45], as given in (22), where $a(i) = (e^{A_{emg} u(i)} - 1)/(e^{A_{emg}} - 1)$, $u(i)$ is the sEMG signal, and $A_{emg}$ is the parameter of muscle activation. $K_{\alpha\max}$ and $K_{\alpha\min}$ are the maximum and the minimum of $K_\alpha$, i.e., $K_{\alpha\min} \le K_\alpha \le K_{\alpha\max}$. $\alpha_{k\min}$ and $\alpha_{k\max}$ are the lower bound and the upper bound of the muscle activation, respectively.

Footnote 3: For simplicity, we just describe the control strategy in the learning phase. The variables $x_{sd}$ and $x_s$ indicate the status of the teleoperated robot in the learning phase.

4) Tracking Force: During task execution, the tracking force $F_{tr}$ varies with the position error of the slave. A PD controller is used to compute the tracking force as in (23), where $K_{ptr}$ and $K_{dtr}$ are the parameters of the PD controller, $x^*$ and $\dot{x}^*$ are the position and velocity of the end-effector of the slave, and $x_g$ is the given desired position.
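The pieces of the control strategy above can be sketched as small functions. Only the closed-loop inverse kinematics law (21) and the default value $A_{emg} = -0.6981$ are taken from the text; the exponential activation is written in its common normalized form with the sEMG input assumed to lie in [0, 1], while the linear activation-to-gain map and the PD form $F_{tr} = K_{ptr}(x_g - x^*) - K_{dtr}\dot{x}^*$ are hypothetical stand-ins for equations (22) and (23), which are not reproduced here. All function names are illustrative.

```python
import numpy as np

def muscle_activation(u, a_emg=-0.6981):
    """a = (exp(A_emg * u) - 1) / (exp(A_emg) - 1), assuming u is normalized to [0, 1]."""
    u = np.asarray(u, dtype=float)
    return (np.exp(a_emg * u) - 1.0) / (np.exp(a_emg) - 1.0)

def variable_gain(a, k_min, k_max, a_min=0.0, a_max=1.0):
    """Hypothetical linear map from activation to a gain bounded by [k_min, k_max]."""
    ratio = (np.clip(a, a_min, a_max) - a_min) / (a_max - a_min)
    return k_min + (k_max - k_min) * ratio

def clik_step(q, jacobian, e_s, k_sp, dt):
    """One Euler step of the law q_dot = K_sp J^T e_s from (21)."""
    q_dot = k_sp @ jacobian.T @ e_s
    return q + dt * q_dot

def pd_tracking_force(x_g, x, x_dot, k_p, k_d):
    """Assumed PD form: F_tr = K_p (x_g - x) - K_d x_dot."""
    return k_p * (x_g - x) - k_d * x_dot
```

In a control loop, the activation computed from the rms envelope would set the gain for the current step before the slave command is evaluated.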
V. RESULTS

A. sEMG Signal Preprocessing
In this paper, the length of the moving window $L_{win}$ is varied to evaluate the performance of the rms-based muscle activation descriptor.
As shown in Fig. 6(a)-(c), the length of the sampling window greatly affects the envelope of the sEMG signal. In Fig. 6(a), the envelope curve (red) contains high-frequency components that introduce disturbances in applications. When $L_{win} = 30$, the rms follows a smooth trend, as shown in Fig. 6(c); however, the computing time for the rms is relatively long and cannot meet the requirement for a real-time classifier. Compared with Fig. 6(a) and (c), the rms curve with $L_{win} = 20$ offers suitable smoothness and reasonable computing time.

B. Semi-Physical Experiment: Pulling the Cotter Pin Task
1) Semi-Physical Experiment Setting: In this paper, the proposed method is used to demonstrate the process of robot learning and the incorporation of human intention for a teleoperation system with a master-slave structure. As shown in Fig. 7, the experimental platform consists of a 6-DOF Touch X haptic device as the master and a virtual Baxter robot as the slave. The experimental devices communicate through a single computer running Windows 7, MATLAB, and Visual Studio 2013. The first part of the experiment is trajectory tracking, which verifies the motion performance of the heterogeneous master-slave robotic system. The slave device follows the motion of the master through a communication channel. The second part of the experiment is the pulling the cotter pin task. In the learning phase, the task workspace trajectories are recorded as demonstrated observations. Observations are obtained by remotely manipulating the six-DOF slave device. From this group of manipulation tasks, a generative model $\{\pi, A, \{\mu, \sigma\}, \{\mu^C, \sigma^C\}\}$ is obtained by using HSMM-GMM. In the reproduction phase, the robot reproduction trajectory can be corrected based on the learned model (HSMM-GMR). In the experiment, the designed controller parameters are presented in Table I. The sampling time for the teleoperated system is t = 0.01 s, and the parameter involving muscle activation is chosen as $A_{emg} = -0.6981$.

2) Learning Model Through Demonstrations: This experiment aims at learning a task model to enhance the robot intelligence for the teleoperation system. The trajectory observations of the slave are collected from several demonstrations (C = 4) by the same human operator using the haptic device. Then, the HSMM-GMM method is applied to encode the demonstrations. The number of Gaussians is chosen according to the task segmentation. The number of hidden states for the pulling the cotter pin task is set to N = 4.
As shown in Fig. 8(a)-(b), the recorded trajectories of the position and velocity of the robot end-effector are demonstrated by the human operator with four demonstrations in the task space. According to (14) and Fig. 8(c), the HSMM component activation curves are computed for the pulling the cotter pin task in the learning phase.

3) Reproduction Based on Learned Task Model: A learned task model is obtained in the learning phase. In the reproduction phase, the learned task model can be adapted according to the initial conditions of the task. From (15) to (18), the current position/velocity of the robot end-effector can be computed based on the learned task model. Fig. 8(d)-(e) shows the position and the velocity of the reproduction for the pulling the cotter pin task. It can be seen that the trajectories in the reproduction phase follow a relatively smooth trend in comparison with the human operator demonstrations. As shown in Fig. 8(f), the tracking force is computed by (23).
The root-mean-square errors (RMSEs) in Table II show the accuracy of the proposed method in the reproduction phase. The RMSE values indicate the advantage of the proposed reproduction method based on the learned task model.
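For completeness, an RMSE of the kind reported here can be computed as below. Which pair of trajectories is compared (for example, a reproduced trajectory against a demonstrated one sampled at the same time steps) is an assumption, as is the function name `rmse`.

```python
import numpy as np

def rmse(reference, estimate):
    """Root-mean-square error between two equally sampled trajectories."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if reference.shape != estimate.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))
```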

C. Experiment: Drawing Task
The purpose of this experiment is to evaluate the performance of the presented method in a simple task.
1) Experiment Setting: In this experiment, a typical drawing task is performed by the teleoperation system, as presented in Fig. 9. A green pen is attached to the endpoint of the slave's right arm as a drawing tool. A human operator manipulates the master device to teleoperate the end-effector of the slave to perform the drawing task. We collect C = 3 demonstrations and train N = 18 states of the HSMM in the learning phase. We then perform a drawing task in a 210 mm × 297 mm (A4) 2-D space.
2) Results and Analysis: The motion trajectories and stiffness profile of the drawing task are shown in Fig. 10. In steps II and III, the end-effector of the slave lifts off the paper to move to the next drawing operation. Similarly, drawing subtasks 2 and 3 are performed in steps III and IV and steps V and VI, respectively. During the learning phase, the human's stiffness varies with the drawing operation. As shown in Fig. 10(d), the human's stiffness remains at a high level in steps I and II, III and IV, and V and VI.
In Fig. 11, the telerobot performs the drawing task by using a reproduced stiffness.From Fig. 11(a)-(f), it can be concluded that the drawing tasks are successfully performed by employing the proposed method.

VI. CONCLUSION
The purpose of this paper is to explore a mapping that represents the task model linking the perception information and the robot learning method. This paper proposed a novel algorithm integrating the haptics EMG perception mechanism and robot learning based on HSMM and GMM/GMR. The human operator could adjust the muscle activation according to the HRI environment, and this muscle activation process could be observed and recorded.
By utilizing the recorded sEMG signal, the teleoperation system could interact naturally with the external environment; by utilizing the task learning framework, it could encode the demonstrations and reproduce the HRI task to enhance the robot intelligence. Experimental results have demonstrated the effectiveness of the proposed haptic feedback with the sEMG-based variable gain control mechanism and the robot learning method. In future work, we will introduce the force information of the robot's end-effector and vision information into the telerobot perception system to construct a multimodal information fusion platform. Moreover, we will explore more effective task learning models to enhance robot intelligence in teleoperation.
Therefore, the observation probability at time t for state $S_i$ is $p_i(t) = \mathcal{N}(t; \mu_i, \sigma_i)$. $B = [b_i(k)]_{N \times M}$ indicates the observation probability matrix in state $S_i$, and $b_i(k) = P(o_t = v_k \mid q_t = S_i)$ $(1 \le i \le N, 1 \le k \le M)$.
6) Probability Density Function for State Dwell Time: We train the model c times in a set $c \in \{1, 2, \ldots, C\}$, and we have μ

Fig. 7. Experimental setup for a pulling the cotter pin task.

Fig. 8. (a) Recorded position from four demonstrations. (b) Velocity by human operator demonstration. (c) HSMM component activation for the pulling the cotter pin task. Reproduction phase based on the learned task model. (d) Reproduced position in the reproduction phase. (e) Velocity in the reproduction phase. (f) Tracking force in the reproduction phase for the pulling the cotter pin task.

Fig. 10. (a) Position of the robot in x-coordinate during the drawing task. (b) Position of the robot in y-coordinate during the drawing task. (c) Position of the robot in z-coordinate during the drawing task. (d) Stiffness of the human during the drawing task.
In Fig. 10(a)-(c), the gray curves indicate the demonstration results, and the red curve represents the reproduction result. The reproduction phase can be divided into six steps (I-VI). In steps I and II, the robot begins to perform drawing subtask 1.

TABLE I DESIGNED
t = 0.01 s, the parameter involves muscle activation is chosen as A emg = −0.6981.

TABLE II. RMSEs FOR THE PULLING THE COTTER PIN TASK INVOLVING HUMAN OPERATOR DEMONSTRATIONS

Fig. 9. Experimental setup for a drawing task.