On EMG Based Dexterous Robotic Telemanipulation: Assessing Machine Learning Techniques, Feature Extraction Methods, and Shared Control Schemes

Electromyography (EMG) signals are commonly used for the development of Muscle-Machine Interfaces (MuMIs). EMG-based solutions provide intuitive and often hands-free control in a wide range of applications, from the decoding of human intention in classification tasks to the continuous decoding of human motion employing regression models. In this work, we compare various machine learning and feature extraction methods for the creation of EMG-based control frameworks for dexterous robotic telemanipulation. Such frameworks require models that can decode dexterous, in-hand manipulation motions and perform hand gesture classification in real time. Three different machine learning methods and eight different time-domain features were evaluated and compared. The performance of the models was evaluated in terms of accuracy and the time required to predict a data sample. The model that presented the best trade-off between performance and prediction time was used for executing a telemanipulation task in real time with the New Dexterity Autonomous Robotic Assistant (ARoA) platform (a humanoid robot). Various experiments were conducted to experimentally validate the efficiency of the proposed methods. The robotic system is shown to successfully complete a series of tasks autonomously, as well as to efficiently execute tasks in a shared control manner.


I. INTRODUCTION
These electrical signals are generated during muscle contraction and carry vital information intrinsic to the movement of the muscles that are triggered to accomplish the motion performed by the subject. MuMIs based on EMG signals usually provide a hands-free, unobtrusive solution, offering intuitive and natural operation of robotic devices in complex applications [1], such as the execution of teleoperation and telemanipulation tasks with robot arm-hand systems [2], [3], [4], as shown in Fig. 1.

Machine learning (ML) techniques have been employed to decode EMG signals to perform both classification (e.g., decoding discrete human gestures) and regression (e.g., decoding continuous human motions) [9]. The features extracted from EMG signals are typically grouped into time-domain (TD), frequency-domain (FD), and time-frequency-domain (TFD) feature methods [13]. Past studies show that TD features result in more consistent performance over time than FD features [14], [15]. In our previous study [16], we compared such methods for EMG-based hand gesture classification.

In this paper, we assess the ability of ML and deep learning (DL) models to decode dexterous, in-hand manipulation motions and to classify hand gestures. The models are evaluated in terms of accuracy and speed of calculation. In the second step, the model that presented the best performance for online applications (considering the trade-off between accuracy and processing time) is employed to develop a shared control framework for intuitive robotic telemanipulation.

In this section, we present the experiments conducted to evaluate the performance of the decoding models developed for the regression and classification tasks. Three different experiments were designed to evaluate the EMG-based decoding models in the regression and classification use cases: i) decoding of dexterous, in-hand manipulation motions, ii) hand gesture classification, and iii) robotic telemanipulation using the New Dexterity Autonomous Robotic Assistant (ARoA) humanoid platform [21].

In this subsection, we present the guidelines employed for training the RF-, CNN-, and TMC-T-based regression models to perform decoding of dexterous, in-hand manipulation motions using EMG signals as input.

The regression models were trained on the dataset collected by the New Dexterity research group [12]. For this dataset, before the start of data collection, each participant was asked about any disabilities that may affect the quality of the data. Myoelectric activations were acquired from 11 non-disabled subjects from 16 different muscle sites (8 on the hand and 8 on the forearm of the subject). For the hand, three electrodes were placed on the palm measuring the activity of the Lumbrical muscles, four electrodes were placed at the back of the palm measuring the activity of the Interossei, and one electrode was placed at the base of the thumb to measure the myoelectric activations of the Opponens Pollicis muscle. For the forearm, three electrodes were placed on the Extensor Digitorum site, three were placed on the Flexor Digitorum site, one was placed on the Abductor Pollicis Longus, and the final one was placed to measure the myoelectric activations of the Extensor Pollicis Brevis. The ground electrode was placed on the elbow, where muscular activity is minimal. For data acquisition, two double-differential EMG electrodes were employed to measure the myoelectric activations.
The EMG signals were acquired at a sampling rate of 1200 Hz by the bioamplifier, which band-pass filtered the data using a Butterworth filter (5-500 Hz). The electric line noise was filtered out using a 50 Hz notch filter.
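For illustration, this preprocessing chain could be implemented as in the minimal sketch below, assuming SciPy; the filter order and the notch quality factor are our assumptions, as the text does not specify them.

```python
# Sketch of the described EMG preprocessing chain (not the authors' code):
# a 5-500 Hz Butterworth band-pass followed by a 50 Hz notch filter,
# applied to signals sampled at 1200 Hz.
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

FS = 1200.0  # sampling rate in Hz

# Band-pass Butterworth filter (order 4 is an assumption; the paper
# does not state the filter order).
b_bp, a_bp = butter(4, [5.0, 500.0], btype="bandpass", fs=FS)

# 50 Hz notch filter for power-line noise (quality factor is an assumption).
b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=FS)

def preprocess_emg(raw: np.ndarray) -> np.ndarray:
    """Filter raw EMG of shape (n_samples, n_channels)."""
    x = filtfilt(b_bp, a_bp, raw, axis=0)      # zero-phase band-pass
    x = filtfilt(b_notch, a_notch, x, axis=0)  # remove 50 Hz line noise
    return x
```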

For each subject, ten trials were recorded for each motion. In each trial, the subjects performed a 3-dimensional equilibrium point manipulation task using a Rubik's cube, a chips can from the Yale-CMU-Berkeley (YCB) grasping object set [22], and a custom-made off-centered mass cube.

FIGURE 2. TMC-T model. The feature vector obtained using the feature extraction methods is provided to four convolutional layers, which extract features and learn embeddings from the input feature vector. Batch normalization layers follow the convolutional layers. Two max-pooling layers reduce the EMG channel dimension while keeping the most relevant information. After the convolutional blocks, the output is reshaped and provided to transformer blocks and fully-connected layers. The output from the TMC-T model is the roll, pitch, and yaw motions.
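The architecture described in the Fig. 2 caption can be sketched as follows in PyTorch. This is an illustrative reconstruction only: the layer widths, kernel sizes, number of heads, and the 16-channel/8-feature input shape are assumptions, not the authors' exact hyperparameters.

```python
# Minimal PyTorch sketch of a TMC-T-style model per the Fig. 2 caption:
# four conv layers with batch norm, two max-pooling layers over the EMG
# channel dimension, then transformer blocks and fully-connected layers
# regressing roll, pitch, and yaw. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TMCTSketch(nn.Module):
    def __init__(self, n_channels=16, n_features=8, d_model=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, 3, padding=1), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 32, 3, padding=1), nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(2),  # first reduction of the channel dimension
            nn.Conv1d(32, d_model, 3, padding=1), nn.BatchNorm1d(d_model), nn.ReLU(),
            nn.Conv1d(d_model, d_model, 3, padding=1), nn.BatchNorm1d(d_model), nn.ReLU(),
            nn.MaxPool1d(2),  # second reduction
        )
        enc = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(d_model * (n_channels // 4), 3))

    def forward(self, x):
        # x: (batch, n_features, n_channels), one feature vector per channel
        z = self.conv(x)            # (batch, d_model, n_channels // 4)
        z = z.permute(0, 2, 1)      # reshape: channels become tokens
        z = self.transformer(z)     # transformer blocks
        return self.head(z)         # (batch, 3) -> roll, pitch, yaw
```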
FIGURE 3. CNN model. Conv stands for convolutional and fc for fully-connected. First, the feature vector is obtained using the feature extraction methods discussed in Section III-A. Three convolutional layers extract features using convolutional kernels. The convolutional layers are followed by batch normalization and dropout layers. After the convolutional blocks, the output is flattened and provided to fully-connected layers. The output from the CNN model is the roll, pitch, and yaw motions.

In this subsection, we present the experiments that were conducted to evaluate the efficiency of the hand gesture classification models. For this set of experiments, we utilize the dataset collected by the New Dexterity research group in [16]. In this dataset, each subject was instructed to alternate between a rest state and a gesture state. In total, six gestures were recorded: i) a pinch grasp, ii) a tripod grasp, iii) a power grasp, iv) an open hand configuration with abducted fingers, v) co-contraction of all muscles, and vi) the rest state. The myoelectric activations were acquired from 8 different muscle groups of the human arm and hand. The details regarding the electrode placement can be found in [16].
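To make the gesture-data pipeline concrete, the sketch below shows one plausible way to segment such alternating rest/gesture recordings into labeled windows for classifier training. The window length, step size, and label encoding are assumptions, not details taken from the paper.

```python
# Hypothetical segmentation of a continuous EMG recording into labeled,
# overlapping windows for gesture classification. `labels` is assumed to
# hold one integer class id (0-5, e.g. 5 = rest) per raw sample.
import numpy as np

def make_windows(emg: np.ndarray, labels: np.ndarray,
                 win: int = 240, step: int = 60):
    """emg: (n_samples, n_channels); returns (windows, window_labels)."""
    X, y = [], []
    for start in range(0, len(emg) - win + 1, step):
        seg = labels[start:start + win]
        # keep only windows lying entirely within one gesture/rest phase
        if np.all(seg == seg[0]):
            X.append(emg[start:start + win])
            y.append(seg[0])
    return np.stack(X), np.array(y)
```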

For data acquisition, an appropriate package was implemented within a robot operating system (ROS) [24] based framework. The beginning and termination of each gesture were prompted to the subject with a visual cue on a computer screen.

To select the model used for online operation, an accuracy-execution time trade-off is defined (Equation 2). The first two terms of Equation 2 guarantee that the model achieves high decoding accuracy while remaining fast enough for real-time use.

In this subsection, we present the experiments performed to evaluate a shared control framework for the real-time operation of the ARoA humanoid platform in the execution of telemanipulation tasks. To do this, the EMG-based shared control framework presented in Fig. 4 was developed so as to allow the user to take control of the robot platform and perform complex tasks whenever autonomous execution is not feasible. The proposed framework is divided into four main modules, namely, the Autonomous Control Module, the User Control Module, the Robot Interface Module, and the Control Blending Module. Each module is represented in block layout, where each block conceptually corresponds to a node implemented within ROS.

To exit this state, the extension gesture should be executed. When arm control is enabled, the user can control the motion of the robot. To enable and disable the manual control of the robot arm, the co-contraction gesture is used. When manual control of the robot arm is disabled, control can be handed over to the robot using a power gesture that triggers autonomous operation. When manual control is disabled, the robot's current end-effector pose is maintained and used as the start reference frame. An overview of the control state machine is presented in Fig. 5.

It is similar to IEMG, which detects the onset of muscle activity. It also provides information regarding muscle contraction levels.

TABLE 2. Average motion decoding correlation (C) and accuracy (A) across all subjects for RF-based models developed for specific objects using 1, 25, 50, 75, and 100 trees for each TD feature investigated in this study. For 1 tree, random forests behave like a classic decision tree. The highest accuracy for each object and feature is highlighted in blue, while the lowest is highlighted in yellow.
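The gesture-to-mode mapping described above (co-contraction toggles manual arm control, the power gesture hands control to the autonomous controller, and the extension gesture exits the current state) can be sketched as a small state machine. This is an illustrative reconstruction, not the authors' implementation; the state names and transition details are assumptions, and the actual state machine is given in Fig. 5.

```python
# Illustrative gesture-driven control state machine, reconstructed from
# the behavior described in the text (state names are assumptions).
from enum import Enum, auto

class Mode(Enum):
    IDLE = auto()        # manual control disabled, end-effector pose held
    MANUAL = auto()      # user teleoperates the robot arm via EMG
    AUTONOMOUS = auto()  # robot executes the task autonomously

def next_mode(mode: Mode, gesture: str) -> Mode:
    """Map a classified gesture to the next control mode."""
    if gesture == "co_contraction":
        # co-contraction toggles manual arm control on and off
        return Mode.IDLE if mode is Mode.MANUAL else Mode.MANUAL
    if gesture == "power" and mode is Mode.IDLE:
        # power gesture hands control over to the autonomous controller
        return Mode.AUTONOMOUS
    if gesture == "extension":
        # extension gesture exits the current state
        return Mode.IDLE
    return mode  # all other gestures leave the mode unchanged
```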

8) Log Detector:
LogD also provides an estimate of the muscle contraction force.

A summary of all the different features that were calculated and examined in this study, along with details regarding the calculation of each feature, can be found in Table 1.
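As a concrete illustration, the sketch below computes several common TD features over one EMG window. IEMG and LogD are named in the text; the remaining features shown (MAV, VAR, RMS, WL, ZC, SSC) are standard TD features that stand in for the full set listed in Table 1, whose exact composition we do not reproduce here.

```python
# Illustrative computation of common time-domain (TD) EMG features for
# one window `w` of a single EMG channel. The selection of features is
# an assumption; Table 1 in the paper defines the exact set used.
import numpy as np

def td_features(w: np.ndarray, zc_thresh: float = 1e-4) -> dict:
    diff = np.diff(w)
    return {
        "IEMG": np.sum(np.abs(w)),          # integrated EMG
        "MAV":  np.mean(np.abs(w)),         # mean absolute value
        "VAR":  np.var(w),                  # variance
        "RMS":  np.sqrt(np.mean(w ** 2)),   # root mean square
        "WL":   np.sum(np.abs(diff)),       # waveform length
        # zero crossings above a small noise threshold
        "ZC":   int(np.sum((w[:-1] * w[1:] < 0) & (np.abs(diff) > zc_thresh))),
        # slope sign changes
        "SSC":  int(np.sum((diff[:-1] * diff[1:]) < 0)),
        # log detector: exp of the mean log magnitude
        "LogD": float(np.exp(np.mean(np.log(np.abs(w) + 1e-12)))),
    }
```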

RF is an ensemble regression method that is based on a combination of multiple decision trees. It is a classic ML technique in which the output is the most popular class among the decisions of the individual trees in the classification case, or the average of the estimations of the individual trees in the regression case [12].

Table 3, Table 4, and Table 5 present the correlation and accuracy of the decoded motion with respect to the actual motion for the two DL models and the RF-based models developed using 50 trees. In most cases, the DL models perform better in terms of correlation and accuracy than the RF models. The model that presented the best correlation and accuracy was the TMC-T.

TABLE 5. Pearson correlation (C) and accuracy (A) for tested models using 8 TD features as input for the off-center object.

Classification models were trained using each extracted EMG feature. The gesture classes were balanced to avoid any biases due to an imbalanced dataset; it was therefore ensured that the training and validation sets have the same number of data points for each gesture.

Table 8 presents the time taken (in ms) by the RF models and the DL models to predict one sample. Based on this comparison, the selected model was employed for the real-time EMG-based execution of dexterous manipulation tasks with the ARoA platform.

In order to test the autonomous task execution capabilities of the proposed framework and the employed platform, the robot was given the task of tidying up a table, where the objective was to grasp and move all the objects that were on the table into a bin.
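As context for the RF regressors described above and the per-sample prediction times of Table 8, the sketch below trains random forests with the tree counts evaluated in Table 2 and times single-sample prediction. scikit-learn is an assumption here, since the paper does not name its RF implementation, and the data are placeholders.

```python
# Hypothetical RF training/timing sketch (scikit-learn assumed). X holds
# one TD feature vector per window, Y the roll-pitch-yaw targets.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 16))   # placeholder features (16 channels)
Y = rng.standard_normal((2000, 3))    # placeholder roll, pitch, yaw

for n_trees in (1, 25, 50, 75, 100):  # tree counts evaluated in Table 2
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X, Y)
    start = time.perf_counter()
    model.predict(X[:1])              # single-sample prediction, as in Table 8
    dt_ms = (time.perf_counter() - start) * 1000.0
    print(f"{n_trees:3d} trees: {dt_ms:.2f} ms per sample")
```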

Finally, the complete shared control framework was tested in the fifth scenario. In this case, the human and the robot controller complete the task in a synergistic manner. To demonstrate the capabilities of the proposed framework, a special table cleaning task was considered in which the perception of the robot system fails due to a transparent or irregular object. In such a case, the robot system is unable to execute the task autonomously, since it cannot identify where the object is on the table or find an efficient grasping strategy to execute the task. Therefore, assistance from a human-in-the-loop is required to help the robot grasp the object and then pass control back to the robot for autonomous task execution. Fig. 7 shows instances of the experiment, demonstrating the real-time teleoperation performance of the proposed shared control framework.
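The Control Blending Module named earlier suggests an arbitration between the user's and the autonomous controller's commands. The linear blending law below is purely an assumed illustration of such a module; the paper fragment does not specify the actual blending rule.

```python
# Hypothetical command blending between user and autonomous controllers.
# The linear law and the alpha switching are assumptions; the paper only
# names a "Control Blending Module" without detailing it here.
import numpy as np

def blend_commands(u_user: np.ndarray, u_auto: np.ndarray,
                   alpha: float) -> np.ndarray:
    """Blend 6-DoF end-effector velocity commands.

    alpha = 1.0 -> full manual control (user has taken over),
    alpha = 0.0 -> full autonomous control.
    """
    assert 0.0 <= alpha <= 1.0
    return alpha * u_user + (1.0 - alpha) * u_auto

# Example: the user takes over completely when perception fails.
u_user = np.array([0.05, 0.0, -0.02, 0.0, 0.0, 0.1])  # from EMG decoding
u_auto = np.zeros(6)                                   # planner has no plan
cmd = blend_commands(u_user, u_auto, alpha=1.0)
```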