User Training With Error Augmentation for sEMG-Based Gesture Classification

We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wristband configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration; modified feedback, in which we applied a hidden augmentation of error to these probabilities; and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that relative to the baseline, the modified feedback condition led to significantly improved accuracy. Class separation also improved, though this trend was not significant. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.


I. INTRODUCTION
Surface electromyography (sEMG) provides a convenient sensor modality for human-computer interaction (HCI) applications [1]. In the past two decades, research efforts have sought to translate the electrical activity associated with muscle contraction into control commands for general use computing, prosthetic control, and motor rehabilitation [2], [3]. As the demand for more intuitive and responsive interfaces has grown, the focus on sEMG-based gesture recognition has intensified.
Traditional approaches to sEMG-based gesture recognition assumed stationarity of the mapping between muscle activation and gestures, and did not consider the user's ability to adapt their behavior based on feedback about gesture classification performance. The emergence of co-adaptive learning algorithms in the past decade represented a marked shift, acknowledging both human and machine learning as parts of an integrated system [4], [5], [6], [7], [8], [9]. One key finding from these approaches is that when the human receives continuous feedback about the mapping of muscle activation to gesture, they can increase classification performance through behavioral adaptations [10], [11]. These adaptations can result in increased class separability [12] and increased movement repeatability [13]. However, the relationship between feature space adaptations and classifier performance is complex. Increased real-time classifier performance has also been found even in the absence of EMG feature space changes in relative class distributions [14]. Despite the complex relationship between feature space class distributions and classifier performance, the influence of human learning on myoelectric gesture classification remains a compelling target of investigation.
Human learning about myoelectric gesture classification can be considered as a form of motor skill learning. In the literature on motor learning, the canonical view is that humans use a combination of intrinsic feedback (sensory information) and augmented feedback (information that is not readily accessible through intrinsic feedback) [15]. Augmented feedback can be further categorized as providing 'knowledge of performance' (information about specific movements and muscle activations), or 'knowledge of results' (information about outcomes) [16], [17]. In the present study, we focus on myoelectric control, where providing knowledge of results corresponds to providing output from a classifier, while knowledge of performance corresponds to descriptions of the features extracted from the sEMG. The ability to shape human behavior in traditional motor skill learning settings through the use of augmented feedback is well established. Strategies such as error augmentation [18], [19], [20] and reward manipulation [21], [22] have been shown to affect the rate and retention of learning as well as behavioral variability. Yet, to our knowledge, the use of error-augmented feedback has not been tested for co-adaptation approaches to sEMG-based gesture recognition.
In this study, we conducted an experiment to test whether modified feedback about class posterior probabilities affects performance in a myoelectric control task. We provided subjects with a form of error-augmented knowledge of results; by altering class probabilities, we diminished the differences between classes, making it harder for the target gesture class to exceed a predefined decision threshold. In particular, we softened probabilities towards a uniform distribution. This form of feedback manipulation is closely related to previous uses of error augmentation, also referred to as error amplification [23], [24], [25]. As mentioned, this form of feedback has been shown to hasten learning, improve the quality of self-evaluation [18], [26], and increase retention of learned skills [23], [27]. We therefore hypothesized that error amplification by softening probabilities would increase subsequent gesture classification performance by enhancing human skill learning. The knowledge gained from this investigation has broad potential applications for use in myoelectric prosthetics, assistive devices, and human-computer interfaces where users perform only a brief 4-minute calibration, and human learning may be critical to the success of model performance.

II. EXPERIMENTAL DESIGN
All protocols were approved by the Northeastern University Institutional Review Board (IRB number 15-10-22) in conformance with the Declaration of Helsinki.

A. Subjects
Forty-four right-handed subjects (21 male / 23 female, mean age ± 1 standard deviation: 20.9 ± 4.3 years) participated after providing IRB-approved written informed consent. Subjects were free of orthopedic or neurological diseases that could interfere with the task and had normal or corrected-to-normal vision.

B. Experimental Setup
Subjects viewed a computer display while seated at a table with their right arm positioned comfortably in an armrest trough. Surface electromyography (sEMG) (Trigno, Delsys Inc., sampling frequency: 1926 Hz) was collected from the muscles of the right forearm. Eight sEMG electrodes were placed at equidistant positions around the circumference of the forearm, at a four finger-width distance from the ulnar styloid (the subject's left hand was wrapped around the right forearm at the ulnar styloid to determine the sEMG placement). The first electrode was placed mid-line on the dorsal aspect of the forearm, and the other electrodes were then equally spaced (see Figure 1).

C. Data Acquisition
1) Subject Group Assignment: Subjects were randomly assigned to one of three groups and performed a series of tasks as described below. Subjects who were unable to complete all tasks were excluded from further analysis. Each subject group was assigned a different feedback condition: no feedback ("Control", N=14), veridical feedback ("Veridical", N=14), or modified feedback ("Modified", N=16) (see Section II-C.5 for details). Subject group assignments were randomized before enrollment. In order to control for the possible confounding effect of biological variation in baseline performance across groups, we adopted a within-subject normalization strategy (see Section IV-A).
2) Gesture Timing: Subjects performed a series of tasks composed of one or more gesture trials to move an avatar dice (see details of user interface below). Prior to the start of a trial, the subject's forearm and wrist rested in a pronated position on the trough with the wrist neutral. In each trial, subjects were required to rest or to produce one of eight active gestures. Figure 2 shows the timing of an example gesture trial. This trial timing structure was chosen empirically to give enough time for subjects to prepare for each upcoming trial while keeping the total experiment duration short. Gesture trial timing was kept consistent to ensure that subject reaction times were not a source of variation in performance.
Each experimental session was divided into four blocks. Blocks one, two, and four used the trial timing described above. By contrast, in block three (in which some subjects received model feedback) the gesture production epoch lasted 30 seconds for each gesture. During this time period, continuous feedback was provided by applying a classifier model on a sliding window of data, with a step size of 13.5 milliseconds (based on the frequency of data packets delivered by our sEMG sensors).
3) Block One: Calibration: Subjects from all groups were instructed to perform five consecutive repetitions of each active gesture and eight repetitions of a rest gesture in which they were asked to relax the hand. This consecutive structure was chosen to help keep the task simple while the participant initially learned the set of available gestures. A classification model was trained on this small dataset before continuing to the next experimental block.
4) Block Two: Instructed Games: Subjects from all groups engaged in four practice mini-games. In each mini-game, subjects were instructed to perform a sequence of six gestures to bring an avatar that was shown on the computer screen from a starting position to a desired goal state (e.g., see Figure 3). The trial timing epochs (prompting, gesture production, and rest) were as shown in Figure 2. In this block, the classifier model's predicted probabilities were displayed as post-hoc feedback to the user, but were not used to modify the avatar position or state; the avatar always moved one step closer to the goal after each trial, so that each game lasted exactly six moves. These games were structured so that the 24 total gestures (4 games with 6 moves each) were evenly distributed among the 8 active gestures. After this block, the classification model was retrained from scratch using the labeled data from blocks one and two. This training set comprised 8 examples for each of the 9 classes (8 active gestures and "Rest").
Fig. 3. Example mini-game. The blue player avatar must be moved to match the gray target avatar. The minimal path includes moving right, down twice, decreasing the die number (using a pinch gesture), and reducing size (using a fist gesture).
5) Block Three: Live Feedback: Only subjects in the veridical feedback and modified feedback groups participated in this block. Subjects performed only one extended trial for each gesture while viewing real-time feedback; in these trials, the gesture production epoch lasted 30 seconds. Subjects were asked to freely explore their hand posture in order to maximize the predicted probability of the current gesture class, shown on a real-time histogram of the trained model's output. For the veridical feedback group, predicted class probabilities were displayed without modification. For the modified feedback group, probabilities were softened towards a uniform distribution as described in Section III-C. As discussed previously, the motivation behind this softening procedure was to encourage participants to compensate by performing more precise gestures. Subjects in the modified feedback group were not informed about this softening procedure.
6) Block Four: Free Games: All subjects were instructed to perform a series of 12 mini-games. The mini-games had the same structure as in block two, with each game requiring a minimum of six moves to bring the avatar from its starting position to a desired goal state. However, unlike the practice mini-games of block two, subjects were tasked with bringing the avatar to its goal state by planning and performing a gesture sequence of their choice. Critically, the avatar only changed its state when the classifier assigned one class a predicted probability above a decision threshold of 0.5. The experimenter manually recorded each attempted gesture to serve as labels for subsequent analysis, and the participant's hand movements were also recorded on video to cross-check these labels.

A. Feature Extraction
As described in Section II-C.2, we extracted raw data for classification from the final 500 ms of the active gesture production period of each gesture trial. From each of the 8 sensor channels of raw sEMG, we computed the Root-Mean-Square (RMS) value and the median frequency of the Fourier spectrum, resulting in 16-dimensional features. Given a data vector x with N samples, RMS is defined as:
$$\mathrm{RMS}(x) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}. \tag{1}$$
The Median Power Frequency is defined as the frequency value $f_{\mathrm{MED}}$ that divides the Power Spectral Density (PSD) into two regions with equal power [28]:
$$\int_{0}^{f_{\mathrm{MED}}} \mathrm{PSD}(f)\, df = \int_{f_{\mathrm{MED}}}^{\infty} \mathrm{PSD}(f)\, df = \frac{1}{2} \int_{0}^{\infty} \mathrm{PSD}(f)\, df. \tag{2}$$
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
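The two features described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the feature ordering (all RMS values followed by all median frequencies), and the use of a naive discrete Fourier transform as the PSD estimate are assumptions made here to keep the example dependency-free. In practice an FFT routine from a numerical library would be used instead.

```python
import cmath
import math

FS_HZ = 1926.0  # sampling frequency reported for the sEMG sensors


def rms(x):
    """Root-mean-square value of one channel."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def power_spectrum(x, fs):
    """One-sided, un-normalized periodogram via a naive DFT (assumed PSD estimate)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def median_frequency(x, fs):
    """Lowest frequency bin at which cumulative power reaches half of the total."""
    freqs, power = power_spectrum(x, fs)
    half = 0.5 * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


def extract_features(window, fs=FS_HZ):
    """window: list of channels (each a list of samples).

    Returns [rms_1..rms_n, fmed_1..fmed_n]; this ordering is an assumption.
    """
    return [rms(ch) for ch in window] + [median_frequency(ch, fs) for ch in window]
```

With 8 channels this yields the 16-dimensional feature vector described in the text. At 1926 Hz, a 500 ms window corresponds to roughly 963 samples per channel.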

B. Classification Model
Given extracted features, we used a two-stage classification pipeline to predict among 9 possible gestures: Up, Thumb, Right, Pinch, Down, Fist, Left, Open, and Rest. The classification model consisted of an encoder formed from Support Vector Machine (SVM) models that produced a latent representation, and a logistic regression classifier that produced predicted class probabilities. In the encoder portion of the model, we trained a one-vs-one (OVO) SVM classifier [29] for each of the $\binom{9}{2} = 36$ pairs of gestures. Each of these OVO-SVM models produced a scalar output (representing the probability of assigning to the first of its two classes); these 36 scalars were stacked into a latent vector and passed to the logistic regression model.
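To make the structure of the encoder concrete, the following sketch enumerates the 36 gesture pairs and stacks one scalar per pair into a latent vector. The `encode` helper and the `pair_models` mapping are hypothetical names introduced for illustration; the per-pair models are stand-ins for trained OVO-SVMs, and the ordering of pairs is an assumption.

```python
from itertools import combinations

GESTURES = ["Up", "Thumb", "Right", "Pinch", "Down", "Fist", "Left", "Open", "Rest"]

# All unordered pairs of the 9 classes: C(9, 2) = 36
PAIRS = list(combinations(GESTURES, 2))


def encode(features, pair_models):
    """Stack one scalar per gesture pair into a 36-dimensional latent vector.

    pair_models maps (class_a, class_b) to a callable that returns a score
    for class_a given a feature vector (hypothetical interface).
    """
    return [pair_models[pair](features) for pair in PAIRS]
```

For example, passing a dictionary of placeholder models such as `{pair: (lambda f: 0.5) for pair in PAIRS}` returns a list of 36 values, which would then be the input to the logistic regression stage.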
Given a supervised training dataset, we first fit the one-vs-one SVM models using linear programming with the CVXPY Python library [30]. The linear programming objective we used was based on the semi-supervised SVM formulation of [31], to allow future semi-supervised extensions. Specifically, for feature vectors $x_i$ with binary labels $y_i \in \{-1, +1\}$, the SVM parameters were trained according to the following optimization problem:
$$\min_{w,\, b,\, \eta} \; \|w\|_1 + C \sum_{i} \eta_i \quad \text{subject to} \quad y_i (w^\top x_i - b) + \eta_i \ge 1, \quad \eta_i \ge 0 \;\; \forall i, \tag{3}$$
where w, b were the parameters to be optimized, $\eta_i$ were slack variables allowing misclassification of individual points, and C > 0 was a fixed penalty parameter controlling the margin's strictness.
We implemented the logistic regression classifier with the PyTorch Python library [32] using a single linear layer and a SoftMax function. After the SVM encoder portion of the model was trained, it was held fixed while the logistic regression classifier model was trained by stochastic gradient descent to minimize the cross-entropy loss. We trained the classifier model for 1000 epochs with a batch size of 20 and the AdamW [33] optimizer. See Algorithm 1 for a summary of our classifier training procedure.
Smoothing: As noted, participants in the veridical feedback and modified feedback groups were shown real-time output from the model. Due to the high sampling frequency of the sEMG sensors used, and the relatively computationally simple prediction model, the system was capable of making very fast adjustments to the predicted output, which can result in unwanted jitter due to slight fluctuations in raw signal or hand positioning. Therefore, we used an exponential moving average (EMA) to smooth the model's predictions in time. At time-step t, the model produces a raw probability vector $P^{(t)}$, which is then mixed with the previous smoothed probability vector using a momentum parameter λ to produce a smoothed vector $P^{(t)}_{\mathrm{SMOOTH}}$:
$$P^{(t)}_{\mathrm{SMOOTH}} = \lambda\, P^{(t-1)}_{\mathrm{SMOOTH}} + (1 - \lambda)\, P^{(t)}. \tag{4}$$
For values of λ close to 1, this causes the probability vector to update more slowly and smoothly. We used λ = 0.9, which alleviated the issue of jitter in the model output, while still allowing model outputs to change quickly between different gestures.
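The smoothing step can be illustrated with a short sketch. The function name `ema_smooth` is hypothetical, and initializing the smoothed vector with the first raw prediction is an assumption, since the text does not state how the average is initialized.

```python
def ema_smooth(raw_probs, lam=0.9):
    """Apply an exponential moving average to a sequence of probability vectors.

    raw_probs: iterable of probability vectors, one per time step.
    lam: momentum parameter (0.9 in the text).
    """
    smoothed = None
    out = []
    for p in raw_probs:
        if smoothed is None:
            # Assumption: start from the first raw prediction
            smoothed = list(p)
        else:
            smoothed = [lam * s + (1.0 - lam) * r for s, r in zip(smoothed, p)]
        out.append(smoothed)
    return out
```

Because each output is a convex combination of probability vectors, the smoothed vector still sums to 1. With λ = 0.9, an abrupt change in the raw output takes on the order of 1/(1 − λ) = 10 updates to mostly propagate, which at the 13.5 ms step size is roughly 135 ms.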

C. Modified Feedback
As mentioned above, subjects in the modified feedback group were shown modified real-time output from the trained classifier during block three of the experiment. Specifically, the vector of smoothed predicted probabilities from the model was modified according to the following formula:
$$P_{\mathrm{MODIFIED},\, i} = \frac{\left(P_{\mathrm{SMOOTH},\, i}\right)^{m}}{\sum_{j \in C} \left(P_{\mathrm{SMOOTH},\, j}\right)^{m}}, \quad i \in C, \tag{5}$$
where the modification exponent m was set to 0.75, and C represents the 9 classes used. The value of m was chosen subjectively to make a noticeable effect while not being too extreme, since subjects must still be able to exceed a decision threshold of 0.5 for a gesture to be correct. Note that this feedback can be viewed as a form of error augmentation. When asked to perform a certain target gesture, we can consider the error to be the distance (e.g., cross-entropy distance or L2 norm) between the model's predicted probability vector and an idealized probability vector in which all mass is concentrated on the target class. Subjects in both feedback groups were instructed to explore gestures and maximize the predicted probability of the target class; thus they were instructed to minimize this error. However, subjects in the modified feedback group viewed a flattened probability vector; this flattening causes the vector to appear to have greater error. See Figure 5 for an example.
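A small numerical sketch may help show how strongly an exponent of 0.75 flattens the displayed probabilities. The function name `soften` and the example probability vector are illustrative assumptions, not values from the experiment.

```python
def soften(probs, m=0.75):
    """Raise each probability to the power m and renormalize.

    For 0 < m < 1 this pulls the vector toward the uniform distribution
    while preserving the ranking of the classes.
    """
    powered = [p ** m for p in probs]
    total = sum(powered)
    return [v / total for v in powered]


# Hypothetical smoothed output over 9 classes: target class at 0.6,
# the remaining mass spread evenly over the other 8 classes.
veridical = [0.6] + [0.05] * 8
displayed = soften(veridical)
```

In this hypothetical case, a target-class probability of 0.6 is displayed as roughly 0.45, so a gesture that would clear the 0.5 threshold under veridical feedback appears to fall short under modified feedback. A uniform vector is left unchanged by the transformation.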

D. User Interface and Software Design
Figure 4 shows the user interface (UI) displayed to participants. All components of the UI were implemented using the PyQt Python package [34]. Data collection and real-time processing were performed using the LabGraph Python package [35]. On the top left, the UI displayed an instructed gesture via image and text during blocks one and two (see Section II-C.3 and II-C.4). On the bottom left, the UI showed post-hoc predicted probabilities for each gesture as a radial plot. The length of each line was scaled according to the value; the outer circle represented a value of 1, and the inner circle represented a value of 0.5 (i.e., the model's decision threshold). The opacity of gesture images around the radial plot was also scaled according to the value. The outer edge of the UI was colored yellow, green, or red to indicate gesture timing epoch as described in Section II-C.2. On the right of the UI was the task window in which the mini-games were played during blocks two and four (see Section II-C.4 and II-C.6). As described previously, participants used one of 8 active gestures to move their avatar (the blue die). The goal of each mini-game in blocks two and four was to use these gestures to match the blue die to the gray target die.
a) Error Augmentation in Live Feedback: During block three (see Section II-C.5), participants who received real-time feedback were presented with a different display, as shown in Figure 5. Here, the probability of each class was displayed using a bar plot that was updated in real-time. The participant's goal during this block of the experiment was to explore hand positions in order to maximize the predicted probability of the current gesture class. For participants in the modified feedback group, model outputs were flattened towards a uniform distribution using Equation 5.

E. Classifier Metrics
As mentioned in Section II-C.6, the experimenter recorded each intended gesture made by the participant, so that model accuracy could be evaluated after the fact. Accuracy was defined as the fraction of correctly classified items. In addition to the 8 active gestures and the "Rest" class, the decision threshold of 0.5 that was used resulted in another possible outcome for gesture trials when no gesture rose above the decision threshold, which we refer to as "NoClass." Gesture trials in which the subject was not prepared to make a gesture during the "gesture production" epoch were recorded as having a true label of "Rest."
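The thresholded decision rule and the accuracy metric can be sketched as follows. The helper names `decide` and `accuracy` are hypothetical, and treating a probability exactly equal to the threshold as "NoClass" is an assumption based on the wording "above a decision threshold."

```python
THRESHOLD = 0.5  # decision threshold used in the experiment


def decide(probs, labels, threshold=THRESHOLD):
    """Return the top class if its probability exceeds the threshold, else "NoClass"."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] > threshold else "NoClass"


def accuracy(true_labels, predicted_labels):
    """Fraction of trials whose predicted label matches the recorded label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)
```

Under this rule a "NoClass" outcome can never match a recorded label, so every sub-threshold trial counts as an error in the accuracy score.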

F. Feature-Space Class Structure
To evaluate how feedback affects human learning, we analyzed the feature-space distribution of trials from different gestures performed in block four of the experiment. This feature-space representation does not depend on the model, since these features are obtained using simple, deterministic transformations of the raw data (RMS and median frequency after Fourier transform). The differences in feature-space class structure across treatment groups can therefore give information about human learning.
Previous research has introduced a variety of feature space metrics for similar tasks, such as separability index and repeatability index [12], [14]. Such metrics are based on the Mahalanobis distance and require computing a class covariance matrix. Since our experiment is focused on short calibration times and we operated in a regime of limited data, we do not have enough samples to compute reasonable estimates of class covariance matrices, even with shrinkage techniques. We therefore used feature-space metrics based on pairwise comparisons between samples.
a) Kernel Similarities: We base our analysis of feature-space structure on a Radial Basis Function (RBF) kernel similarity measure. The RBF kernel computes a similarity measure that corresponds to an implicit infinite-dimensional vector space. For two feature vectors x, x′ belonging to a dataset X and a length scale parameter γ ∈ R, the RBF kernel similarity is computed as:
$$K_{\mathrm{RBF}}(x, x') = \exp\left(-\gamma\, \|x - x'\|^2\right). \tag{6}$$
The length scale γ is an important hyperparameter that determines the rate at which similarities decay as two points are moved farther apart. We follow the so-called "median heuristic" [36], in which γ is set based on the median length scale of a dataset X:
$$\gamma_{\mathrm{MED}} = \frac{1}{\mathrm{median}\left\{ \|x - x'\|^2 : x, x' \in X \right\}}. \tag{7}$$
We set $\gamma_{\mathrm{MED}}$ individually for each subject, based on all of their pooled gesture trials. Note that this approach is effectively a non-linear rescaling of pairwise Euclidean distances, and also handles the potential issue of outlier points having extremely large Euclidean distances.
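The kernel and the median heuristic can be sketched as below. The exact form of the heuristic used here (the inverse of the median squared pairwise distance, taken over distinct pairs) is an assumption, as are the function names; other variants of the heuristic differ by a constant factor.

```python
import math
from itertools import combinations
from statistics import median


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def median_gamma(points):
    """Median heuristic (assumed form): inverse of the median squared pairwise distance.

    Assumes the points are not all identical, so that the median is non-zero.
    """
    d2 = [sq_dist(a, b) for a, b in combinations(points, 2)]
    return 1.0 / median(d2)


def rbf(x, y, gamma):
    """RBF kernel similarity in (0, 1]; equals 1 when x == y."""
    return math.exp(-gamma * sq_dist(x, y))
```

With this choice, two points separated by the median pairwise distance have a similarity of exp(−1), so the similarity scale adapts to each subject's pooled data.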
b) Class Similarity Matrices: We use this notion of kernel similarity to construct a class similarity matrix for each subject. For classes $C_1, \ldots, C_C$, we build a square, symmetric matrix $D \in \mathbb{R}^{C \times C}$ such that the entry at position (i, j) describes the average RBF kernel similarity between items in classes $C_i$ and $C_j$:
$$D_{ij} = \frac{1}{|C_i|\, |C_j|} \sum_{x \in C_i} \sum_{x' \in C_j} K_{\mathrm{RBF}}(x, x'). \tag{8}$$
After computing the entries in a similarity matrix, we normalize the entries to the range [0, 1] so that these matrices may be easily compared across subjects and groups.
Classes that are closer together in feature space will have a higher average similarity and therefore a larger entry in this similarity matrix. A subject whose gestures are easily classifiable may tend to have precise gestures that are also well-separated from each other. This would result in having a high average similarity between trials in the same gesture class (diagonal entries of the class similarity matrix) and a low average similarity between trials of different classes (off-diagonal entries). See Section IV-D for class similarity matrices from each experimental group, and see Figure 6 for didactic examples of similarity matrix D.
c) Scalar Class Separation Measure: In order to look for trends in the feature-space distribution over time and to identify global trends across groups, we also summarize these normalized class similarity matrices using a scalar class separation measure, $d_{\mathrm{SEP}}$, which we define as the average within-class similarity divided by the average between-class similarity. Given a normalized similarity matrix D as described above,
$$d_{\mathrm{SEP}} = \frac{\frac{1}{C} \sum_{i=1}^{C} D_{ii}}{\frac{1}{C^2 - C} \sum_{i \ne j} D_{ij}}. \tag{9}$$
As indicated above, larger within-class similarities indicate that trials from the same gesture are precise and repeated with high fidelity, while smaller between-class similarities indicate that trials from different gestures are easily distinguished. Thus, a dataset with a larger value of $d_{\mathrm{SEP}}$ may contain gestures that will be more easily classified.
In Figure 6, we show examples of class similarity matrix D and scalar similarity measure $d_{\mathrm{SEP}}$. To produce an example that can be easily visualized, we select a subject from the "Modified" condition that showed a large improvement in feature-space separation. For this subject, we select three gestures ("Left", "Down", and "Right") and three features (RMS value from electrodes 1, 4, and 7). In the top row, we show metrics for this subject's data during the "Calibration" and "Instructed" blocks, and in the bottom row, we show metrics from the "Free" block; recall that the subject experiences live feedback training after the "Instructed" block. We observe that the features of each class become more distinct after the user performs live feedback training; this is captured as an increase in the similarities on the diagonal of D and a decrease in similarities off-diagonal. These changes in D are also summarized in $d_{\mathrm{SEP}}$, which increases from 2.8 to 3.55.
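The class similarity matrix and the scalar separation measure can be sketched together. This is an illustration under stated assumptions: the function names are hypothetical, the within-class average includes each trial's similarity with itself, and normalization to [0, 1] is done by dividing by the largest entry (the text does not specify the normalization, and a min-max rescaling would give different values of the ratio).

```python
import math


def rbf(x, y, gamma):
    """RBF kernel similarity between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def class_similarity(trials_by_class, gamma):
    """Average pairwise RBF similarity between every pair of classes.

    trials_by_class: list with one entry per class, each a list of feature vectors.
    Returns a matrix scaled so that its largest entry is 1 (assumed normalization).
    """
    n = len(trials_by_class)
    D = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(trials_by_class):
        for j, cj in enumerate(trials_by_class):
            sims = [rbf(x, y, gamma) for x in ci for y in cj]
            D[i][j] = sum(sims) / len(sims)
    peak = max(max(row) for row in D)
    return [[v / peak for v in row] for row in D]


def d_sep(D):
    """Average diagonal entry divided by average off-diagonal entry."""
    n = len(D)
    within = sum(D[i][i] for i in range(n)) / n
    between = sum(D[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return within / between
```

Two tight, well-separated clusters give a value far above 1, while classes that overlap completely give a value of 1, which matches the interpretation of larger values indicating more easily classified gestures.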

G. Within-Subject Normalization
The focus of this work is to measure the effect of the proposed veridical and modified feedback strategies on subject performance. We note that overall subject performance may be influenced by a relatively large number of factors of variation, such as factors affecting dexterity and motor precision, subject motor learning speed, and subject-intrinsic factors affecting raw sEMG signal-to-noise ratio. Thus, a prohibitively large sample size may be required to account for this variation without normalization. We therefore adopt a within-subject normalization strategy, obtaining baseline statistics for each subject using only data measured before our interventions.
For each subject, we measure baseline accuracy by training a model from scratch using that subject's block one data (calibration, Section II-C.3), and testing this model's classification accuracy on the subject's block two data (instructed games, Section II-C.4).
We obtain baselines for class similarity matrices in the same manner. Within each subject, we collect all gesture trials from the first two experimental blocks, and compute a normalized class similarity matrix. This is subtracted from the matrix computed using data from block four (free games, Section II-C.6) to visualize the difference in similarity for each class. Note that due to the short experimental design, we have relatively few samples per class with which to construct each matrix, and therefore this representation may be somewhat noisy.
We transform the normalized similarity matrix describing blocks one and two into the scalar class separation measure $d_{\mathrm{SEP}}$, and likewise transform the similarity matrix describing block four. Subtracting the former from the latter gives a baseline-subtracted class separation measure.
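Overall, we measure changes from baseline as follows:
$$\Delta \mathrm{Acc} = \mathrm{Acc}_{\mathrm{FREE}} - \mathrm{Acc}_{\mathrm{BASELINE}}, \tag{10}$$
$$\Delta D = D_{\mathrm{FREE}} - D_{\mathrm{BASELINE}}, \tag{11}$$
$$\Delta d_{\mathrm{SEP}} = d_{\mathrm{SEP,\, FREE}} - d_{\mathrm{SEP,\, BASELINE}}, \tag{12}$$
where the subscript FREE denotes quantities measured in block four (free games) and the subscript BASELINE denotes the baseline quantities described above.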

H. Statistical Analysis
We performed several statistical analyses to determine the effect of feedback on classification accuracy and feature-space class separation. Differences between feedback groups at baseline ($\mathrm{Acc}_{\mathrm{BASELINE}}$, $d_{\mathrm{SEP,\, BASELINE}}$) were analyzed using one-way ANOVAs. Likewise, the effect of the feedback group on change scores ($\Delta \mathrm{Acc}$, $\Delta d_{\mathrm{SEP}}$) was analyzed with one-way ANOVAs. The alpha level was set at 0.05. Significant findings were further analyzed using post-hoc paired comparisons with Bonferroni correction for multiple comparisons. One-sided one-sample t-tests with Bonferroni correction for multiple comparisons (α = 0.05/3 ≈ 0.0167) were used on change scores to test whether each feedback group significantly increased accuracy and class separation.

IV. RESULTS
All participants were able to successfully complete the experiment, with no reported adverse events.

A. Group Baselines
In order to check whether random group assignment was a potential confounding factor in our comparisons between groups, we analyzed baseline metrics for each experimental group. One-way ANOVA indicated no significant differences in baseline accuracy (F(2, 43) = 1.15, P = 0.326) or class separation (F(2, 43) = 0.99, P = 0.380) between experimental groups. Figure 7 shows a group-level summary of the baseline accuracy and class separation measure. Though no significant differences were found, mean baseline accuracy and class separation scores were greatest in the Control group and smallest in the Modified group.

B. Effects of Feedback
Individual one-sided one-sample t-tests were used to test for significant improvement in Free block performance from baseline (Bonferroni corrected for 3 comparisons, α = 0.0167). For accuracy, only the Modified group showed significant improvement (t(13) = 2.566, P = .012). No group showed a significant improvement in class separation. Figure 8 shows the average change from baseline performance in each experimental group, as measured in the accuracy of gesture classification (left panel) and feature-space class separation measure (right panel). These data demonstrate that, on average, the increase in performance over the course of the experiment was greatest for subjects in the modified feedback group. Note that the variation between subjects is relatively high, resulting in overlapping estimates of mean performance. We observe that both groups that received real-time feedback exhibited larger variation; in particular, the interquartile range for these two groups (0.18 and 0.19 units for Veridical and Modified, respectively) is nearly twice the range of the control group (0.10 units). This may indicate that some subjects are better at learning from this form of visual feedback than others, or that some subjects were adversely affected by feedback while others were positively affected.

C. Class Confusion
Figure 9 shows the group average confusion matrices of gesture trials during block four (free games) for each group. Rows represent the classification of the attempted gesture, normalized to 1. There are notable similarities across the groups, indicating several gestures that are intrinsically difficult and gesture pairs that are inherently close. In particular, the "thumb", "pinch", and "fist" gestures all have a large fraction (about 25%) of gestures that fall below the decision threshold. Similarly, there was an overall trend that these three gestures tended to be confused, resulting in non-zero entries for the off-diagonal entries (fist, thumb), (fist, pinch), (thumb, pinch), etc. The similarity between groups is an indication that feedback did not grossly disrupt subject behavior for certain gesture classes or cause substantially different effects for different classes.
Fig. 8. Boxplots show median and quartiles; dotted lines show mean. For each subject, we perform baseline subtraction as described in Section III-G. Change in accuracy for the modified group was significantly greater than zero; see Section IV-B for statistical analysis.

D. Class Feature Space Similarity
Figure 10 shows the average normalized class similarity matrix of each group. By examining the diagonal entries, we can understand the repeatability of gestures (i.e., the similarity between items of the same class); by examining the off-diagonal entries, we can understand the separability of gestures (i.e., the similarity across different classes). As described previously, a "desirable" pattern for easy downstream classification (in which the subject produced consistent and well-separated gestures) would consist of larger entries on the diagonal and smaller entries off-diagonal.
Each group demonstrated a consistent pattern in which the diagonal entries were brighter than the off-diagonal entries, indicating that the gestures were generally repeatable and well-separated. There was also a consistent pattern of bright off-diagonal cells, indicating high similarity between three specific gestures: "pinch", "fist", and "thumb". These patterns match well with the patterns visible in the class confusion matrices shown in Figure 9. This correspondence between our similarity metrics and confusion matrices may indicate that our chosen similarity metric is well-suited to this setting and aligns well with model performance.
We did not observe any gross changes in the structure of class similarity between groups; note that such a change could have occurred if feedback affected gestures differently, and this effect may not have been visible by only inspecting the scalar $d_{\mathrm{SEP}}$ metric.

V. DISCUSSION AND FUTURE WORK
This study tested the potential of modified continuous feedback of model performance in a gamified user interface for rapid user training on a sEMG-based gesture recognition system for controlling actions on a computer display.
We hypothesized that we could use manipulation of feedback about the gesture class probabilities in a short (4-minute) online learning session to shape user behavior in a manner that would increase the separation between muscle activation patterns of different gestures and increase the accuracy of model performance on future attempts. Overall, our results demonstrate that a short user training session using modified feedback has the potential to increase post-calibration performance (accuracy and class separation) when compared to veridical feedback and a no-feedback control.

A. User Calibration
Despite the emergence of research into methods for co-adaptive learning for sEMG-based gesture recognition, there have been few investigations specifically testing the effect of user training as a means of rapid calibration. Numerous studies have shown that extended user training on an sEMG-based controller results in significant gains in performance [12], [13], [37]. The majority of these studies have found that increased model performance was accompanied by changes in muscle activation patterns that are theoretically favorable to better classification (such as improvements in class separability, variability, or repeatability). However, feature space characteristics of class distributions are not necessarily predictive of classifier performance, and this relationship is likely strongly dependent on the classifier used and the relationship between training and test data. For example, a recent investigation showed that the relationship between performance and feature-space metrics can be complex; these authors found that the real-time performance of an LDA classifier was only weakly correlated with class separability and was not correlated with variability or repeatability [14]. Krasoulis et al. first demonstrated that short-term adaptation through biofeedback user training could positively impact prosthetic finger control using sEMG-based decoding [10]. Our results demonstrate that subjects who received modified live feedback experienced a significant increase in classification accuracy. We also found that both veridical and modified feedback provided a trend of improvement in our feature space metric d_SEP, though this effect was not statistically significant.

B. Influence of Feedback Manipulation on User Behavior
In our experiments, the Modified feedback group showed the largest change in classification accuracy and class separability. Flattening of the class probabilities, as was done here, can be considered a form of error augmentation, since subjects were led to believe that the separation between classes was smaller than it actually was. This approach is most closely related to techniques involving feedback with "error amplification," which has been studied extensively. Feedback of performance outcomes that are worse than actual performance (i.e., error amplification) has been found to expedite motor adaptations to novel task constraints compared to accurate feedback [38], [39]. Amplification of task errors has also shown promise as an approach to facilitate motor recovery in patients with neurological disorders [25], [40]. Faster or more complete learning with error amplification has been attributed to brain processes associated with greater attention to execution of the motor task [41], [42], [43] and reduction of sensorimotor noise [20]. We speculate that improvement in classification accuracy with Modified feedback in this study may be a product of similar mechanisms.

C. Selected Gestures
We selected gestures that mimicked the manipulation of commonplace items such as remote controls and cell phones.
No subject commented that the gestures were unfamiliar or difficult to perform. Directional gestures using wrist movements ("Up", "Down", "Left", "Right") were generally more separable and yielded higher classification accuracy compared to gestures using grasping movements ("Pinch", "Thumb", "Open", "Fist"). The extrinsic hand muscle groups used by each of these grasping gestures are similar, which may explain why subjects had a difficult time performing them accurately while also creating separation in muscle activation patterns. Thus, the feature-space similarity that we observed for these grasping gestures is somewhat expected.

D. Limitations
There were several limitations of the current work that may have affected the results and interpretations. Only a single classification model was used. Several machine learning methods, including artificial neural networks, linear discriminant analysis, support vector machines (SVM), and Gaussian mixture models, have been previously used for sEMG-based control. The choice to use a model based on SVM and logistic regression was due to its simplicity and the popularity of SVM for this application. It is possible that the choice of classifier model affects both calibration accuracy and the way that users explore the mapping of muscle activation to gestures. Nevertheless, the user training scheme employed here likely has general benefits for use and understanding of human co-adaptive behavior.
There are a number of possible changes in the signal processing pipeline that may yield improvements in overall model performance. The active window for feature extraction may be tuned, and additional features such as time-frequency domain or higher-dimensional feature vectors may be extracted. The selected features (RMS and median frequency) were chosen based on their common use for sEMG-based gesture classification and initial pilot testing. Future work should evaluate how sEMG feature selection affects user training.

E. Designing Improved Feedback
Only a single type of feedback manipulation was tested. We used a feedback manipulation that flattened probabilities across classes, making it more difficult to achieve a correct classification. This approach was selected as it was expected that participants would respond by increasing the separation between muscle activation patterns for different gestures. While we observed a non-significant trend of improvement in class separation, the manipulation was not directly optimized for this purpose. Future research should explore the optimization of feedback manipulation for shaping user behavior during co-adaptive sEMG-gesture recognition. Adaptive feedback manipulation based on user and model performance characteristics to target specific class confusions is an attractive future direction. Further improvement may come from iterating between rounds of visual feedback to induce human learning, and rounds of model re-training using the subject's most recent data. The approach we used was a form of modified knowledge of results; future work could explore using modified knowledge of performance by giving the user feedback about feature space characteristics such as distance between the current feature vector and a representative item from the target class, or aggregate feature metrics describing properties like separability and repeatability.
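To make the idea of flattening concrete, the following sketch shows one possible way to flatten a probability vector. It is an illustrative stand-in only: the transform actually used in the study is the one defined in Sec. III-C and is not reproduced here. The power-and-renormalize form, the function name, and the interpretation of the parameter `m` (smaller values flatten more, `m = 1` leaves the probabilities unchanged) are all assumptions.

```python
def flatten_probabilities(probs, m=0.75):
    """Hypothetical flattening: raise each probability to the power m, then renormalize.

    For 0 < m < 1 the distribution moves toward uniform, so the winning class
    appears less dominant than it really is. This is an assumed form, not the
    transform defined in Sec. III-C.
    """
    if not 0.0 < m <= 1.0:
        raise ValueError("m must be in (0, 1]")
    powered = [p ** m for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Toy probabilities (not experimental data): the top class is just above 0.5.
veridical = [0.52, 0.30, 0.18]
modified = flatten_probabilities(veridical, m=0.75)

# Under this assumed transform the top class drops below the 0.5 decision
# threshold, so the participant would need a cleaner gesture to register it.
top_veridical = max(veridical)
top_modified = max(modified)
```

Under these assumptions the class ranking is preserved while the displayed margin shrinks, which matches the qualitative description of a manipulation that makes a correct classification harder to achieve.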
(label and action provided in brackets): index-thumb pinch ["Pinch", decrease number on avatar dice], index-thumb key press ["Thumb", increase the number on avatar dice], closed fist ["Fist", decrease size of avatar dice], full finger extension ["Open", increase size of avatar dice], wrist extension ["Up", move up], wrist flexion ["Down", move down], wrist radial deviation ["Left", move left], wrist ulnar deviation ["Right", move right]. Each trial began with a 'prompting' epoch (3 sec) cued by a yellow bounding box on the participant's display and a picture of the instructed gesture (Calibration and Instructed blocks only, see below), a 'gesture production' epoch (2 sec) cued by a green bounding box, and a 'recovery' epoch (3 sec) cued by a red bounding box. The final 500 milliseconds of the gesture production epoch were used for feature extraction and classification.
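The windowing and feature extraction described above can be sketched as follows. The sketch takes the final 500 ms of the gesture production epoch on each of the eight electrodes and computes the RMS value and median frequency, giving 16 features per trial. The sampling rate (200 Hz here), the ordering of the feature vector, the mean removal before the spectrum, and all function names are assumptions made for illustration; a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def last_window(samples, fs_hz, window_s=0.5):
    """Return the final window_s seconds of one channel's gesture production epoch."""
    n = int(round(fs_hz * window_s))
    if n < 2 or len(samples) < n:
        raise ValueError("not enough samples for the requested window")
    return samples[-n:]


def rms(window):
    """Root-mean-square amplitude of the window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def median_frequency(window, fs_hz):
    """Frequency that splits the power spectrum into two halves of equal power."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]  # assumption: remove DC offset first
    freqs, power = [], []
    for k in range(1, n // 2 + 1):  # positive-frequency bins of a naive DFT
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        freqs.append(k * fs_hz / n)
        power.append(abs(coeff) ** 2)
    half = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


def trial_features(channels, fs_hz):
    """Feature vector for one trial: RMS per channel, then median frequency per channel."""
    windows = [last_window(ch, fs_hz) for ch in channels]
    return [rms(w) for w in windows] + [median_frequency(w, fs_hz) for w in windows]


# Synthetic check (not sEMG data): eight channels carrying a 20 Hz sine of
# increasing amplitude over a 2 s gesture production epoch at an assumed 200 Hz.
FS_HZ = 200
channels = [
    [(i + 1) * math.sin(2 * math.pi * 20 * t / FS_HZ) for t in range(2 * FS_HZ)]
    for i in range(8)
]
features = trial_features(channels, FS_HZ)
```

With eight channels and two features per channel, `features` has 16 entries; for the synthetic sine input the median-frequency entries recover the 20 Hz tone.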

Fig. 2. Gesture Trial Timing. In the yellow 'prompting' epoch, the subject sees an instruction. In the green 'gesture production' epoch, the subject performs the gesture. In the red 'recovery' epoch, the subject returns to the rest position. Features for classification are extracted from the last 500 ms of gesture production to help ensure that steady-state features are collected.

Fig. 4. The participant User Interface. Top left: instructed gesture. Bottom left: predicted gesture probabilities. Right: Task window including subject's avatar and target. Outer edge: gesture epoch indicator.

Fig. 5. Top: Real-time probability feedback window. The horizontal line at 0.5 shows the decision threshold. Bottom: Example of probability values without modification ("Veridical") and with modification ("Modified") as described in Sec. III-C for several hypothetical values of m; m = 0.75 was used for the real experiments. Arrows highlight an example case where modification causes the gesture to become sub-threshold; the participant may compensate by improving gesture quality.

Fig. 6. Didactic example for class similarity matrices D and scalar class separation measure d_SEP. For a chosen subject from the Modified condition, we analyze 3 of the original 16 features (RMS value from electrodes 1, 4, and 7) and a subset of gestures ("Left", "Down", and "Right"). Top row: features from calibration and instructed blocks. Bottom row: features from free games. Left: Scatter plot of 3-dimensional features, and scalar class separation value. Right: The corresponding class separation matrix.

Fig. 7. Baseline Performance. Left: Accuracy. Right: Scalar class separation measure d_SEP. Boxplots show the median and quartiles; dotted lines show the mean. Note the relative difference in subject baseline task performance, visible as a gap in baseline accuracy. This discrepancy (due to random group assignment and low subject number) indicates the need for within-subject normalization, as described in Section III-G. See Section IV-A for statistical analysis.

Fig. 8. Overall Changes from Baseline Performance. Left: Change in accuracy. Right: Change in scalar class separation measure d_SEP. Boxplots show median and quartiles; dotted lines show mean. For each subject, we perform baseline subtraction as described in Section III-G. Change in accuracy for the modified group was significantly greater than zero; see Section IV-B for statistical analysis.

Fig. 9. Confusion Matrices averaged across subjects and normalized within each row. No within-subject correction is applied. Class confusion structure is largely similar across groups. Left: Control group. Middle: Veridical feedback. Right: Modified feedback.

Fig. 10. Normalized Class Similarity Matrices. Top row: Raw similarities from block four (free games, see Section II-C.6). Class similarity matrix D is computed for each subject, normalized to [0, 1], and then averaged across subjects in a group. Large values on the diagonal indicate tight clusters for each class. Small values off-diagonal indicate well-separated clusters. Bottom row: Change in similarity matrix from baseline, ΔD, as described in Equation 10. Positive values indicate pairs that became closer in feature space, compared to baseline; subjects whose structure improved would show positive values on the diagonal and negative values off-diagonal. See Section III-F for further details. Left: Control group. Middle: Veridical feedback. Right: Modified feedback. Upper triangular parts are omitted due to symmetry.