Rejecting Unknown Gestures Based on Surface-Electromyography Using Variational Autoencoder

Conventional surface electromyography (sEMG)-based gesture recognition systems exhibit impressive performance in controlled laboratory settings. As most systems are trained in a closed-set setting, their performance may deteriorate significantly when novel gestures are presented as impostors. In addition, state-of-the-art generative and discriminative methods have achieved considerable performance on high-density sEMG signals. This can be seen as an unrealistic setting, as real-world muscle-computer interfaces mainly rely on sparse multichannel sEMG signals. In this work, we propose a novel variational autoencoder-based approach for open-set gesture recognition based on sparse multichannel sEMG signals. Using the predefined latent conditional distribution of each known gesture, the conditional Gaussian distribution of each known gesture is learned. Samples with low probability density are identified as unknown gestures. The sEMG signals of known gestures are classified using the Kullback-Leibler divergences between the predefined prior distributions and the input samples. The proposed approach is evaluated on three benchmark sparse multichannel sEMG databases. The experimental results demonstrate that our approach outperforms existing open-set sEMG-based gesture recognition approaches and achieves a better trade-off between classifying known gestures and rejecting unknown gestures.


I. INTRODUCTION
The Muscle-Computer Interface (MCI) interprets a user's body language by decoding the surface electromyography (sEMG) signal [1]. In prior work, sEMG-based gesture recognition has attracted the most investigation [2], [3], [4]. Normally, the closed-set setting, where the types of gestures presented in the training and test sets are identical, is adopted. In contrast, a realistic scenario assumes that a system developer cannot anticipate all the gestures that may occur during the test or deployment phase. That is to say, a user may perform an unknown gesture, which the MCI system will nevertheless identify as a known one. Thus, it is essential to develop a system based on the open-set recognition (OSR) [5] setting.
Qingfeng Dai is with the State Key Laboratory of CAD and CG, College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China (e-mail: qfdai@zju.edu.cn).
Xiangdong Li is with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China (e-mail: axli@zju.edu.cn).
Digital Object Identifier 10.1109/TNSRE.2024.3360035
The objective of open-set sEMG-based gesture recognition is to identify a segment of sEMG signals and classify it as a known gesture or an unknown gesture. Most of the existing OSR approaches are designed for image-related tasks such as image recognition and image segmentation [5], [6], [7], [8]. Only a handful of OSR approaches have been proposed for sEMG-based gesture recognition [9], [10], [11]. Briefly, Wu et al. [10] trained multiple autoencoders (AEs) to reject unknown gestures whose sEMG feature images have significantly different representations. Besides, a discriminative method based on a convolutional prototype network (CPN) constructs multiple prototypes of known motion classes and identifies samples of known and unknown motions using prototype matching [11]. Despite achieving promising performance on high-density sEMG (HD-sEMG) signals, these approaches have not been evaluated on sparse multichannel sEMG datasets, such as the NinaPro databases [2], [12], [13], [14]. Moreover, sparse multichannel sEMG is more manageable than HD-sEMG in terms of data processing and computational requirements. In this paper, we introduce a novel open-set gesture recognition approach based on a variational autoencoder (VAE) with conditional Gaussian distribution learning [15]. Given sparse sEMG signals as input, it classifies known gestures and rejects unknown gestures while allowing more efficient training.
A VAE can effectively reconstruct input data while simultaneously forcing the posterior distribution to approximate a prior distribution in the latent feature space. Based on the reconstruction loss of a test sample, a predefined threshold is set to identify whether the given gesture is known or novel. Due to electrode placement, user differences, and muscle fatigue, sEMG signals can exhibit significant variability. Previous methods might not effectively account for this variability, leading to inadequate rejection of unknown gestures. Gaussian distributions can model a wide range of sEMG representations. With conditional Gaussian distribution learning (CGDL) [15], the model can adapt its understanding of the variability and central tendencies. Therefore, we adopt the techniques used in conditional Gaussian distribution learning [15], where a distinct conditional distribution in the latent space is predefined for every known gesture. The Kullback-Leibler divergences between the predefined prior distributions of known gestures and the representations of input sEMG samples are then regarded as the confidences that the input sEMG signals belong to a specific known gesture. Furthermore, we conduct experiments on three sparse multichannel sEMG datasets, namely NinaPro DB1, NinaPro DB2, and NinaPro DB5 [2], [12], to validate the effectiveness of our approach.
Our contributions can be summarized as follows:
• We propose a novel variational autoencoder (VAE)-based approach for open-set gesture recognition based on sparse multichannel sEMG. We show how to classify known gestures using a VAE, which is commonly utilized for rejecting unknown gestures.
• We demonstrate the effectiveness of the VAE by utilizing different encoder and decoder architectures to train models for the open-set sEMG-based gesture recognition task. Our approach not only enables rejecting unknown gestures but also achieves competitive performance in classifying known gestures.
• We conduct comprehensive evaluations on three sparse multichannel sEMG datasets. The experimental results show that the proposed approach achieves better performance than the state-of-the-art open-set gesture recognition method. Specifically, our approach outperforms the prior art by 0.04, 0.02, and 0.01 on NinaPro DB1, NinaPro DB2, and NinaPro DB5, respectively, in terms of the area under the receiver operating characteristic curve (AUROC), which reflects the capacity to accurately reject unknown gestures. Moreover, the ablation studies examine various components of the proposed approach, including the architectures of the encoder and decoder, the known gestures utilized for training, and the effects of the VAE.
The remainder of this paper is organized as follows. Section II summarizes the related work. We formulate the problem of open-set gesture recognition based on sEMG and introduce the proposed VAE-based approach in Section III. Section IV describes the evaluation datasets, evaluation metrics, network architectures, and implementation details. The experimental results are presented in Section V. Finally, we conclude this paper and discuss future work in Section VI.

II. RELATED WORKS
In this section, we discuss the literature on surface electromyography (sEMG)-based gesture recognition under the closed-set and open-set scenarios.
A. Closed-Set Gesture Recognition Based on sEMG
This line of work aims to interpret hand gestures from sEMG signals using computer algorithms under the closed-set setting. Depending on how features are extracted, the approaches can be categorized into conventional machine learning approaches and deep learning approaches [16]. The former extract handcrafted features such as mean absolute values, waveform lengths, and discrete wavelet transform coefficients [17]. After the handcrafted features are extracted, conventional machine learning classifiers such as SVMs, kNNs, and Random Forests [2], [14], [18] are typically utilized for downstream classification. In contrast, deep learning-based approaches produce discriminative features using deep neural networks [3], [4], [19], [20], which are more effective and faster. For example, Geng et al. [3] converted the sEMG signals into grayscale images and leveraged a typical convolutional neural network to classify the converted sEMG images. Besides, a hybrid CNN-RNN [19] and XceptionTime [4] were proposed to model the spatial and temporal information of sEMG signals and achieve considerable gesture recognition performance. Recent works have also looked at multi-modal gesture recognition for higher recognition accuracy [17], [21], [22], [23].

B. Open-Set Gesture Recognition Based on sEMG
In the test phase of closed-set gesture recognition, all the available gestures are already known during training. Nevertheless, it is challenging to ensure that this condition holds in real-world scenarios. Therefore, open-set sEMG-based gesture recognition approaches are investigated to tackle this problem.
The open-set recognition (OSR) problem was first proposed for face recognition to detect unknown faces [24]. Specifically, the OSR problem aims to learn a classifier that not only accurately distinguishes known classes but also rejects unknown ones. Existing approaches can be divided into two categories: discriminative approaches and generative ones. The former utilize discriminative models such as the support vector machine [5] and the convolutional neural network [25] to output classification probabilities, and reject unknown classes through a comparison with a predefined threshold. In addition, metric learning [26], [27] and prototype learning [28], [29] are commonly used to distinguish between known and unknown samples via their distances in the feature space. With regard to the generative approaches, generative models are trained to learn the probability distribution of each known class. Specifically, if the likelihoods of the test sample estimated from all known class models are low, the sample is classified as belonging to an unknown class. Typically, generative adversarial networks [30], [31], [32] and autoencoders [15], [33] are the most popular generative approaches applied to the OSR problem.
Although there are several typical strategies for traditional OSR problems, open-set gesture recognition based on sEMG is much more challenging. The main challenges lie in the inevitable noise and the low amplitude of sEMG signals, which increase the difficulty of finding a proper function to identify unknown classes from sEMG signals. Similarly, discriminative approaches utilize models that perform well on sEMG-based gesture recognition to produce probabilities that input sEMG signals belong to unknown gestures. For example, Robertson et al. [34] used a support vector machine classifier to trade off error mitigation against unknown rejections in real-time closed-loop myoelectric control at a specific threshold. Linear discriminant analysis (LDA) [35] and artificial neural networks (ANN) [9] are also utilized to output an entropy function that discriminates between known command gestures and unknown gestures based on sEMG signals. Besides the aforementioned discriminative approaches, Ding et al. [36], [37] proposed hybrid models that combine a multi-class classifier of known classes with a one-class classifier that rejects unknown classes. Wu et al. [10] utilized a reconstruction-based autoencoder (AE) to reject samples with high reconstruction errors and achieved significant performance using high-density surface electromyography. To overcome the respective drawbacks of discriminative and generative approaches, Wu et al. [11] further proposed an HD-sEMG-based method that combines their advantages in an end-to-end manner using a convolutional prototype network (CPN) and prototype matching. Although open-set gesture recognition based on HD-sEMG has been investigated, few sparse multichannel sEMG datasets have been utilized to evaluate these approaches.

III. METHOD
This section delineates the proposed method based on the variational autoencoder (VAE) [38] for open-set sEMG-based gesture recognition (Fig. 1). As a probabilistic graphical model, a VAE is capable of accurately reconstructing input data while also enforcing the posterior distribution $q_\phi(z|x)$ in the latent space to closely approximate a prior distribution $p_\theta(z)$. Therefore, an appropriately trained VAE is expected to accurately discern known data and identify deviating samples as unknown via the reconstruction loss. Nevertheless, the VAE cannot classify known categories because all samples follow a single distribution. To enable classifying known categories, we predefine a latent conditional distribution $q_\phi(z|x,k)$ for each known class (denoted as the $k$th) in the latent space via conditional Gaussian distribution learning (CGDL) [15]. Subsequently, samples of the $k$th known gesture are required to approximate a multivariate Gaussian distribution $p_\theta^{(k)}(z) = \mathcal{N}(z; \mu_k, I)$, where $\mu_k$ is the mean of the $k$th distribution. Because samples of each gesture conform to its corresponding prior distribution, samples with low probability density are identified as unknown gestures. With regard to samples of known gestures, the Kullback-Leibler divergences between the predefined prior distributions and the input samples are calculated as the confidences. The gesture with the highest confidence is regarded as the prediction of the model. The complex and diverse patterns of sEMG signals demand a more discriminative VAE architecture. Besides, certain unknown gestures can closely resemble known gestures, such as the thumb up and the fist, which implies that their corresponding Gaussian distributions may be too similar. To tackle this issue, we utilize a parameter-tuning strategy based on Bayesian optimization to find the optimal probability threshold for a better trade-off between rejecting unknown gestures and classifying known ones.

A. Problem Formulation
This paper focuses on open-set sEMG-based gesture recognition. Given a training set $D_{Train}$ of sEMG signals with data-label pairs, the minimization of the open space risk $R_O$ and the empirical risk $R_\epsilon$ is solved to find the optimal function $f$ as follows:
$$\arg\min_{f} \left\{ R_O(f) + \lambda_r R_\epsilon\left(f(D_{Train})\right) \right\}$$
where $f(x) > 0$ means that the sEMG sample $x$ belongs to a known gesture. The empirical risk $R_\epsilon$ denotes the average classification loss of all samples in $D_{Train}$, and $\lambda_r$ is the weight of the classification loss.

B. Constructing Variational Auto-Encoder (VAE)
A VAE usually consists of an encoder and a decoder, whose loss function is represented as $L(\theta; \phi; x)$, where $\phi$ and $\theta$ are the parameters of the encoder and the decoder, respectively. A latent representation $z$ is obtained after the input sEMG signal $x$ is fed into the encoder. Conversely, a reconstructed sEMG sample $\tilde{x}$ is obtained by feeding the latent representation $z$ into the decoder. The loss function of the VAE is commonly defined as follows:
$$L(\theta; \phi; x) = D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z)\right) - \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] \tag{1}$$
where $q_\phi(z|x)$ denotes the approximate posterior distribution, $p_\theta(z)$ is the prior distribution of the latent representation $z$, and $p_\theta(x|z)$ represents the conditional likelihood of the input sample $x$ given a latent representation $z$. The first term of Eq. (1) is the KL divergence between the approximate posterior distribution and the prior distribution. This term enforces the similarity between $q_\phi(z|x)$ and $p_\theta(z)$. The second term of $L(\theta; \phi; x)$ is often represented using the reconstruction loss.
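The KL term in the loss above has a closed form when the approximate posterior is a diagonal Gaussian and the prior is a unit-variance Gaussian. The following is a minimal sketch of that closed form together with the usual reparameterized sampling of $z$. The function names and the `prior_mean` parameter are illustrative assumptions, not part of the original implementation: a zero mean corresponds to the standard VAE prior, while a class-specific mean corresponds to the conditional prior $\mathcal{N}(z; \mu_k, I)$ described in the method overview.

```python
import math
import random


def reparameterize(mu, sigma, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_unit_gaussian(mu, sigma, prior_mean=None):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(prior_mean, I) ).

    sigma holds standard deviations; prior_mean defaults to the zero vector.
    """
    if prior_mean is None:
        prior_mean = [0.0] * len(mu)
    return -0.5 * sum(
        1.0 + math.log(s * s) - (m - pm) ** 2 - s * s
        for m, s, pm in zip(mu, sigma, prior_mean)
    )


# Toy posterior parameters (hypothetical values, for illustration only).
mu, sigma = [0.5, -1.0], [1.0, 0.5]
kl_standard = kl_to_unit_gaussian(mu, sigma)                 # prior N(0, I)
kl_matched = kl_to_unit_gaussian(mu, sigma, prior_mean=mu)   # prior centred on mu
z = reparameterize(mu, sigma, random.Random(0))
```

Moving the prior mean onto the posterior mean removes the squared-distance term, so `kl_matched` is smaller than `kl_standard`; this is the mechanism that pulls the latent codes of one gesture towards its own prior.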
The prior distribution $p_\theta(z)$ is assumed to be an independent multivariate Gaussian distribution, denoted by $p_\theta(z) = \mathcal{N}(z; 0, I)$. Then the variational posterior distribution obtained from the trained VAE is formulated as follows:
$$q_\phi(z|x) = \mathcal{N}(z; \mu, \sigma^2 I)$$
where the mean $\mu$ and standard deviation $\sigma$ of the Gaussian distribution are output by the trained VAE. The latent representation is $z = \mu + \sigma \odot \epsilon$, where $\epsilon$ follows the standard normal distribution. The conditional information of the $k$th gesture is introduced into the VAE, and the prior distribution $p_\theta(z)$ is denoted as $\mathcal{N}(z; \mu_k, I)$, where the vector $\mu_k$ is obtained by passing the one-hot encoding of each gesture through a fully-connected layer. Thus, the KL divergence is calculated as follows (summed over the latent dimensions):
$$D_{KL}\left(q_\phi(z|x,k) \,\|\, p_\theta^{(k)}(z)\right) = -\frac{1}{2}\left(1 + \log \sigma^2 - (\mu - \mu_k)^2 - \sigma^2\right)$$
The architecture of the proposed model for open-set sEMG-based gesture recognition is derived from GengNet [3]. It consists of a number of modules, including an encoder F, a decoder G, a classifier C for known gestures, and a detector D for unknown gestures. We delineate the details of these modules in the following:
1) Encoder F: The architecture of F follows that of GengNet, and a module for calculating the mean and variance is integrated at each layer of F. Specifically, the output feature $x_l$ of the $l$th layer is transformed into a one-dimensional representation and then fed into a fully-connected layer to obtain the mean and variance values. To prevent information loss, we replace the original ReLU activation function with PReLU.
2) Decoder G: The architecture of G mirrors that of the encoder, except that the convolution layers are replaced by transpose convolution layers and Tanh is employed as the activation function of the last layer. This module aims to reconstruct sEMG signals using parameters sampled from the learned distribution. In this decoder, a probabilistic ladder architecture [39] is utilized to exchange bottom-up and top-down information in order to restore the information that is discarded by the encoder F. The information is also represented in terms of a Gaussian distribution, whose mean and variance are as follows:
3) Classifier C for Known Gestures: Firstly, the classifier C calculates the probability densities of each gesture using the latent representation $z$ and the Gaussian prior distributions. Then, the probability densities of each gesture are normalized through the Softmax function. After that, the gesture with the highest confidence is regarded as the final classification result. The probability densities of the $k$th gesture are calculated as follows:
4) Detector D for Unknown Gestures: Given the well-trained encoder F and decoder G, a specific multivariate Gaussian distribution $f_k(z) = \mathcal{N}(z; m_k, \sigma_k^2)$ of each gesture can be constructed. Then, we calculate the mean $m_k$ and variance $\sigma_k^2$ of the latent representations of all the training samples of the $k$th gesture in the feature space. Thus, the probability that a sample is drawn from the distribution $f_k(z)$ is defined as follows: where $n$ denotes the dimension of the feature space and $z$ is represented as
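The roles of the classifier C and the detector D described above can be illustrated with a minimal sketch. It assumes diagonal covariances and uses the log density under each gesture's fitted Gaussian as the detector score, in line with the statement that low-density samples are treated as unknown; the exact probability used by the detector is not reproduced here. All function names, the variance floor, and the toy values are hypothetical.

```python
import math


def log_gaussian(z, mean, var):
    """Log density of a diagonal Gaussian N(mean, diag(var)) evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def classify(z, prior_means):
    """Softmax over log N(z; mu_k, I); returns (predicted index, confidences)."""
    logs = [log_gaussian(z, mu_k, [1.0] * len(z)) for mu_k in prior_means]
    top = max(logs)                      # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


def fit_class_gaussian(latents, floor=1e-6):
    """Per-dimension mean m_k and variance sigma_k^2 of one gesture's latents."""
    count = len(latents)
    dims = list(zip(*latents))
    mean = [sum(d) / count for d in dims]
    var = [sum((x - m) ** 2 for x in d) / count + floor for d, m in zip(dims, mean)]
    return mean, var


def is_unknown(z, class_gaussians, log_threshold):
    """Flag z as unknown when its best log density falls below the threshold."""
    best = max(log_gaussian(z, m, v) for m, v in class_gaussians)
    return best < log_threshold


# Toy two-gesture example (hypothetical latent codes).
prior_means = [[0.0, 0.0], [4.0, 4.0]]
train_latents = [
    [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]],
    [[4.0, 4.1], [4.2, 3.9], [3.8, 4.0]],
]
class_gaussians = [fit_class_gaussian(latents) for latents in train_latents]
predicted, confidences = classify([3.5, 4.2], prior_means)
```

In this toy setting a latent code near one of the class means is accepted and classified, whereas a code far from both means, such as `[20.0, -20.0]`, is rejected for any reasonable `log_threshold`.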

C. Evaluation Procedure
We also exploit the information carried by the reconstruction loss, since the reconstruction loss for inputs from known gestures is typically smaller than that for unknown gestures [40]. In practice, the threshold $\tau_r$ for the reconstruction error is chosen so that a fixed ratio of the training sEMG data is classified as known. Overall, the evaluation procedure of our approach is shown in Algorithm 1.
    X is predicted as unknown
12: else
13:     X is predicted as a known gesture $y_{pred}$
14: end if
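The thresholding described above can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of Algorithm 1, whose intermediate steps are not reproduced here: the rule that a sample is rejected when either test fails, the `known_ratio` value, and all names are hypothetical.

```python
import math

UNKNOWN = -1  # label used in this sketch for the rejected category


def reconstruction_threshold(train_errors, known_ratio=0.95):
    """Smallest error such that `known_ratio` of the training samples lie at or below it."""
    ranked = sorted(train_errors)
    index = max(0, math.ceil(known_ratio * len(ranked)) - 1)
    return ranked[index]


def predict_open_set(recon_error, gaussian_probs, class_confidences, tau_r, tau_l):
    """Reject when the reconstruction error is too large or every
    per-gesture Gaussian probability is below tau_l; otherwise classify."""
    if recon_error > tau_r or max(gaussian_probs) < tau_l:
        return UNKNOWN
    return class_confidences.index(max(class_confidences))


# Toy reconstruction errors of training samples (hypothetical values).
train_errors = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tau_r = reconstruction_threshold(train_errors, known_ratio=0.9)

accepted = predict_open_set(4, [0.6, 0.1], [0.8, 0.2], tau_r, tau_l=0.5)
rejected_by_error = predict_open_set(12, [0.6, 0.1], [0.8, 0.2], tau_r, tau_l=0.5)
rejected_by_density = predict_open_set(4, [0.2, 0.1], [0.8, 0.2], tau_r, tau_l=0.5)
```

Deriving $\tau_r$ from a quantile of the training errors keeps the threshold independent of any unknown-gesture data, which is unavailable at training time by definition.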

D. Loss Function
The total loss function of our proposed approach is as follows: This loss function consists of three loss items: the reconstruction loss ($L_r$), the KL divergence ($L_{KL}$), and the classification loss ($L_c$). The reconstruction loss is computed as the $L_1$ distance, and the classification loss is computed using the cross-entropy loss function. The KL divergence is leveraged to make the posterior distribution approximate the prior distribution of each gesture by maximizing the similarity between the intermediate representations as follows: where
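As a rough illustration of how the three loss items can be combined, the sketch below forms a weighted sum of an $L_1$ reconstruction loss, a precomputed KL term, and a cross-entropy classification loss. The weighted-sum form and the function names are assumptions; the default weights follow the values reported in the implementation details (1.0 for the KL divergence and 100.0 for the classification loss).

```python
import math


def l1_loss(x, x_rec):
    """Mean absolute error between the input and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)


def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the true gesture."""
    return -math.log(class_probs[label])


def total_loss(x, x_rec, kl, class_probs, label, w_kl=1.0, w_cls=100.0):
    """Assumed weighted sum: L_r + w_kl * L_KL + w_cls * L_c."""
    return l1_loss(x, x_rec) + w_kl * kl + w_cls * cross_entropy(class_probs, label)


# Toy values (hypothetical): perfect reconstruction, zero KL, uncertain classifier.
loss = total_loss([1.0, 2.0], [1.0, 2.0], kl=0.0, class_probs=[0.5, 0.5], label=0)
```

With these weights the classification term dominates unless the classifier is already confident, which is consistent with the emphasis on keeping known-gesture accuracy competitive.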

IV. EXPERIMENT SETTINGS
In this section, we will briefly introduce the benchmark datasets used in our experiments, the training settings, evaluation metrics, and implementation details.

A. Datasets
Three sparse multichannel sEMG datasets, i.e., NinaPro DB1, NinaPro DB2 [2], and NinaPro DB5 [12], are utilized to train and evaluate the proposed VAE-based approach on open-set gesture recognition based on sEMG. The details of these three datasets are described in Table I. The training trials, validation trials, and testing trials columns denote the number of trials used for training, validation, and testing, respectively. The unknown gestures column lists the gestures selected as unknown ones in the experiments.

B. Data Preprocessing
Due to the high noise of sEMG signals, it is necessary to preprocess the raw signals in the experiments. Firstly, a Butterworth low-pass filter, whose cutoff frequency coefficient is set as 0.

C. Evaluation Metrics
To evaluate the performance of the proposed approach on open-set sEMG-based gesture recognition, the gestures of each dataset are divided into two parts: known gestures and unknown gestures. Additionally, the datasets are split into a training set, a validation set, and a testing set in terms of trials. The specific dividing strategy is given in Table I. To perform a comprehensive comparison, four kinds of evaluation metrics are employed:
Precision: Intra-session gesture recognition precision is adopted for evaluation. Testing is conducted on sEMG data that contains K + N gestures, with the output being K + 1 classification results. Therefore, Precision is computed as follows:
Recall: The weighted average recall of all gestures.
F1-Score: The harmonic mean of the Precision and Recall, which is a comprehensive metric for evaluating a model.
AUROC: The ROC curve is a graphical representation of the classifier's true positive rate versus its false positive rate, where each point on the curve represents a different threshold value for separating known and unknown gestures. The AUROC is the area under the ROC curve, which ranges from 0 to 1. The higher the AUROC, the better the performance of the evaluated model.

TABLE II. AUROC averaged over all subjects of four approaches on three sparse multichannel sEMG datasets including NinaPro DB1, NinaPro DB2 and NinaPro DB5. The results in bold entries indicate the best performances.
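The four metrics can be illustrated with a small self-contained sketch. It computes the AUROC through its rank interpretation (the probability that a randomly chosen unknown sample receives a higher "unknown" score than a randomly chosen known sample) and computes support-weighted precision and recall over the K + 1 output categories. Weighting the precision by class support is an assumption made here for symmetry with the weighted recall, and all names and data are hypothetical.

```python
def auroc(scores, is_unknown):
    """AUROC via pairwise ranking; ties between a positive and a negative count half."""
    pos = [s for s, u in zip(scores, is_unknown) if u]
    neg = [s for s, u in zip(scores, is_unknown) if not u]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_prf(y_true, y_pred):
    """Support-weighted precision and recall, and the harmonic mean of the two."""
    total = len(y_true)
    precision = recall = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        weight = support / total
        precision += weight * (tp / predicted if predicted else 0.0)
        recall += weight * (tp / support)
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two known gestures (0 and 1) and one unknown category (-1).
y_true = [0, 0, 1, 1, -1, -1]
y_pred = [0, 1, 1, 1, -1, 0]
precision, recall, f1 = weighted_prf(y_true, y_pred)
area = auroc([0.9, 0.2, 0.3, 0.1], [True, True, False, False])
```

In the toy example one unknown sample scores below one of the known samples, so three of the four unknown-versus-known pairs are ranked correctly and the AUROC is 0.75.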

D. Network Architecture
The architectures of F and G are derived from GengNet [3], which achieves strong performance on the NinaPro databases. Specifically, GengNet is composed of two convolutional layers, two locally connected layers, and three fully connected layers. Each convolutional layer consists of 64 3 × 3 filters with a stride of 1 and a zero padding of 1. Each locally connected layer consists of 64 1 × 1 filters. The three fully connected layers consist of 512, 512, and 128 hidden units, respectively. Batch normalization and the ReLU nonlinearity are applied to each layer, and dropout is adopted to prevent overfitting.

E. Implementation Details
Our VAE-based approach is implemented with PyTorch. For all three datasets, we use the RAdam optimizer with an initial learning rate, weight decay, and batch size of 0.001, 0.01, and 512, respectively, for improved convergence and generalization. Besides, the learning rate is divided by 10 after each epoch and the total number of training epochs is set to 28, following the settings in GengNet [3]. The weights of the classification loss and the KL divergence are set to 100.0 and 1.0, respectively. With regard to the window lengths of the input signals, 20, 40, and 40 are adopted for NinaPro DB1, NinaPro DB2, and NinaPro DB5, respectively, to account for time delay constraints. Apart from the window length, the same hyperparameters are used for all three NinaPro variants, which indicates that our model is not fine-tuned for each dataset.
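To make the role of the window length concrete, the sketch below segments a multichannel recording into fixed-length windows. The window lengths are those stated above and are interpreted here as a number of time steps, which is an assumption; the stride, the function name, and the toy recording are likewise hypothetical.

```python
def sliding_windows(signal, window_length, stride=1):
    """Split a (time x channels) recording into overlapping fixed-length windows."""
    last_start = len(signal) - window_length
    return [signal[start:start + window_length]
            for start in range(0, last_start + 1, stride)]


# Window lengths per dataset as reported above.
WINDOW_LENGTH = {"NinaPro DB1": 20, "NinaPro DB2": 40, "NinaPro DB5": 40}

# Toy recording: 100 time steps, 10 channels (illustrative values only).
recording = [[0.0] * 10 for _ in range(100)]
windows = sliding_windows(recording, WINDOW_LENGTH["NinaPro DB1"], stride=5)
```

Shorter windows reduce the delay before a prediction can be issued, at the cost of less temporal context per sample.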

A. Evaluation on Benchmark Dataset
To demonstrate the effectiveness of our proposed VAE-based approach, we conduct a comprehensive comparison with existing approaches to open-set gesture recognition based on sEMG in terms of AUROC, Precision, Recall, and F1-Score. For a fair comparison, several existing open-set sEMG-based gesture recognition approaches are implemented. The details of these approaches are as follows:
• OpenMax [25]: It is the most commonly used approach for open-set recognition, where an OpenMax layer is proposed to replace the Softmax layer that is often used in deep classification tasks. In this paper, we regard this approach as a baseline.
• Multiple-autoencoders [10]: It trains multiple autoencoders (AEs) to reject representations of any unseen gesture that appear significantly different from those of known gestures.
• Convolutional Prototype Network (CPN) [11]: This approach constructs a CNN feature extractor and multiple prototypes of known gestures. Then samples of known or unknown gestures are identified using prototype matching.
Firstly, we compare our approach with the aforementioned three approaches using the AUROC metric to validate its performance on unknown gesture rejection. Table II shows the average AUROC of the four approaches on the testing sEMG data from all subjects. Our approach outperforms CPN, Multiple-AEs, and OpenMax on all three datasets. Specifically, our approach improves the AUROC by +0.04, +0.08, and +0.10 on NinaPro DB1, by +0.02, +0.05, and +0.04 on NinaPro DB2, and by +0.01, +0.02, and +0.02 on NinaPro DB5 compared with CPN, Multiple-AEs, and OpenMax, respectively. Since the AUROC metric reflects the ability to detect unknown gestures, our approach performs best at detecting unknown gestures.
Secondly, our approach is compared with the other three methods in the scenario of classifying K + 1 gestures, comprising K known gestures and one unknown category. Specifically, the metrics Precision, Recall, and F1-Score defined in Section IV-C are adopted to evaluate the effectiveness of our approach. The performances of these four methods are shown in Table III. We can see that the three evaluation metrics are consistent in validating the performance of classifying known gestures and rejecting unknown ones. Specifically, using the Precision metric, our approach outperforms CPN, Multiple-AEs, and OpenMax by +0.01, +0.03, and +0.06, respectively, on NinaPro DB1. Thus, our approach improves the performance of rejecting unknown gestures alongside classifying known gestures on NinaPro DB1. However, CPN performs best on NinaPro DB2 and NinaPro DB5 compared with the other three methods. Because discriminative methods can directly learn classification boundaries, the discriminative method CPN outperforms the other three methods in most scenarios. As a generative method, our approach achieves competitive performance against CPN and even performs better on NinaPro DB1. Given that our approach surpasses the compared open-set sEMG-based gesture recognition approaches in rejecting unknown gestures, it achieves a better trade-off between classifying known gestures and rejecting unknown gestures.

B. Ablation Study
1) Effects of CGDL: To demonstrate the concrete factors that contribute to the effectiveness of our approach, we conduct a fair comparison between the proposed approach and the typical VAE without CGDL on NinaPro DB1, NinaPro DB2, and NinaPro DB5. In the experiments, GengNet [3] is selected as the architecture of F and G. As shown in Table V, the VAE with CGDL outperforms the typical VAE in terms of Precision. However, their performances in rejecting unknown gestures are comparable in terms of AUROC. These experimental results indicate that the introduction of CGDL could mitigate the degradation of classifying known gestures while sustaining the capability to reject unknown gestures.
2) Backbone Architecture of VAE: To validate whether the architectures of the encoder F and the decoder G affect the performance, we conduct a fair comparison using our approach with various architectures of F and G. Specifically, XceptionTime [4] and TCN [41] are applied as the backbone architectures of F and G for this validation. The former contains multiple XceptionTime modules, which are composed of several separable convolutional layers and max pooling layers. The latter is composed of a 1D fully-convolutional network and dilated convolutions, enabling an exponentially large receptive field. Table IV reports the performance of our approach with the three kinds of architectures of F and G. As shown, GengNet outperforms XceptionTime and TCN. In addition, the proposed approach consistently outperforms the baseline with each architecture, which indicates that our approach is model-agnostic and can be applied to common architectures. In this experiment, the evaluation metrics of AUROC and Precision are adopted for comparison.
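The remark about the exponentially large receptive field of dilated convolutions can be checked with a short calculation. The sketch assumes a stack in which the dilation doubles at every level (1, 2, 4, ...), which is the usual TCN convention; the kernel size, the number of levels, and the number of convolutions per level are hypothetical parameters and not the configuration used in the experiments.

```python
def receptive_field(kernel_size, num_levels, convs_per_level=1):
    """Receptive field (in time steps) of stacked dilated 1D convolutions
    whose dilation doubles at each level: 1, 2, 4, ..."""
    dilations = [2 ** level for level in range(num_levels)]
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)


small = receptive_field(kernel_size=3, num_levels=4)   # dilations 1, 2, 4, 8
large = receptive_field(kernel_size=3, num_levels=8)   # dilations 1 ... 128
```

Doubling the number of levels from 4 to 8 grows the receptive field from 31 to 511 time steps, while the number of layers only doubles.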
3) Variation on Known Gestures: Here, we examine the effects of the known gestures used for training on the performance of open-set sEMG-based gesture recognition. To this end, we divide the gestures into three groups and then conduct a 3-fold cross-validation. Specifically, two groups are selected as known gestures for training and validation, but the evaluation is performed on the test data of all gestures. In this experiment, we select NinaPro DB1 and mix the sEMG data of all subjects to remove the influence of individual differences between subjects. As shown in Table VI, the performances of rejecting unknown gestures using different sets of known gestures are comparable. However, the known gestures used for training in our approach could affect the performance in terms of F1-Score. Specifically, when the second and third groups are selected as known gestures, our approach achieves an F1-Score of 0.83 on NinaPro DB1, which is much higher than the results obtained in the other two settings.
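The 3-fold protocol over gesture groups can be sketched as follows. How the gestures are assigned to the three groups is not specified above, so the interleaved assignment, the function name, and the toy gesture identifiers used here are purely hypothetical.

```python
def known_unknown_folds(gestures, num_groups=3):
    """Build one fold per group: that group is held out as unknown,
    and the remaining groups serve as known gestures for training."""
    groups = [gestures[i::num_groups] for i in range(num_groups)]
    folds = []
    for held_out in range(num_groups):
        known = sorted(
            gesture
            for index, group in enumerate(groups)
            if index != held_out
            for gesture in group
        )
        folds.append({"known": known, "unknown": sorted(groups[held_out])})
    return folds


# Toy gesture identifiers (illustrative; not the actual dataset labels).
gestures = list(range(1, 13))
folds = known_unknown_folds(gestures)
```

In every fold the model is trained on the known groups only, while the test data contain all gestures, so the held-out group plays the role of the unknown category.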
4) Effects of VAE: In this part, we explore the effects of the VAE on the features extracted using F. To do this, we train GengNet on NinaPro DB1 under the closed-set and open-set scenarios, respectively. Then we utilize t-SNE to visualize the deep sEMG features, with the perplexity set to 30 and 5000 iterations. From each gesture, we randomly select 1000 samples for visualization. The visualization results of t-SNE are displayed in Fig. 2. The features of each gesture adhere to their respective prior Gaussian distributions by minimizing the KL divergence. In this way, the extracted features of the same gesture can aggregate together, mitigating the open space risk.

A. Discussion
In this part, we discuss why the introduction of the VAE can improve the open-set gesture recognition performance, as well as our future work. We first recall the framework of our approach, where conditional Gaussian distribution learning [15] is employed to define different prior distributions for all gestures, enabling the classification of known gestures through a comparison between distributions in terms of KL divergence. Although there are other competitive generative networks, such as GAN [42] and Glow [43], the reconstruction ability and latent space representation of the VAE mitigate the negative impact caused by the high noise in sEMG signals. In addition, the VAE has a probabilistic formulation in which the encoder learns to approximate the posterior distribution of the latent variables given the sEMG data from a specific gesture.

B. Conclusion
Most existing sEMG-based gesture recognition approaches are investigated under closed-set settings, which means that gestures during testing are already known when training.However, in a realistic scenario, the user may perform an unknown gesture which will be misclassified as a known gesture by a closed-set muscle-computer interface.Therefore, we propose a novel open-set sEMG-based gesture recognition approach based on the variational autoencoder (VAE) and we evaluated it on 3 public sparse multichannel benchmark databases.Different from the typical VAE, we utilize conditional Gaussian distribution learning [15] to enable classifying known gestures.Experimental results indicate that the proposed approach achieves better performance in rejecting unknown gestures compared with three existing approaches.Specifically, our approach achieves respective AUROC improvements of +0.04, +0.02, and +0.01 on NinaPro DB1, NinaPro DB2, and NinaPro DB5, compared with the state-of-the-art approach.On the other hand, our approach achieves a competitive performance in classifying known gestures against the discriminative open-set recognition methods.So the proposed approach could reach a better tradeoff between classifying known gestures and rejecting unknown ones.We also conducted an ablation study to demonstrate the effects of known gestures for training and VAE on rejecting unknown gestures.The experimental results show that our approach is model-agnostic to the architecture of VAE and the introduction of VAE could mitigate the open space risk.Besides, the effects of known gestures used for training were investigated and we found that the performance of rejecting unknown gestures was minimally impacted, which means our approach is robust to training data.
Our future work will focus on improving the rejection of unknown gestures while making the recognition accuracy of known gestures comparable to that under closed-set settings. One way is to use as the VAE encoder a more discriminative network that performs better on sEMG-based gesture recognition. Another way is to replace the VAE with more recent generative networks, such as the diffusion probabilistic model [44]. We will also investigate improving inter-subject and inter-session open-set gesture recognition performance based on our work, considering its more practical prospects.

Manuscript received 7 September 2023; revised 1 January 2024; accepted 20 January 2024. Date of publication 30 January 2024; date of current version 15 February 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 61972346 and Grant 92148205; and in part by the Science and Technology Planning Project of Zhejiang, China, under Grant 2022C03103. (Corresponding authors: Weidong Geng; Xiangdong Li.)

Fig. 1. Diagram of the proposed method based on VAE for open-set sEMG-based gesture recognition. Conv, LC, and FC respectively denote the convolution layer, locally-connected layer, and fully-connected layer. The number following the layer name denotes the number of filters, and the numbers after the symbol @ denote the convolution kernel size and stride size.

Algorithm 1 The Procedure of Evaluation for Open-Set sEMG-Based Gesture Recognition
Input: A test sample $X$
Input: The trained encoder $F$, decoder $D$, and classifier for known gestures $C$
Input: The threshold probability $\tau_l$ for the Gaussian distribution
Input: The threshold $\tau_r$ for the reconstruction error
Input: For each gesture $k$, the latent representation $z_{i,k}$ of each correctly classified training sample $x_{i,k}$
1: for $k = 1, \ldots, K$ do
2:
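The remaining steps of Algorithm 1 are not reproduced here, so the following is only a hedged sketch of how the listed inputs could be combined into a rejection rule, not the authors' exact procedure. It assumes that a diagonal Gaussian is fitted per known gesture to the latent representations of correctly classified training samples, and that a test sample is rejected when its best log-density falls below a threshold or its reconstruction error exceeds $\tau_r$. The use of a log-density threshold (`log_tau_l`), the diagonal covariance, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_class_gaussians(latents_by_class):
    """Fit a diagonal Gaussian (mean, variance) to each gesture's latents."""
    params = {}
    for k, z in latents_by_class.items():
        z = np.asarray(z, dtype=float)
        # Small constant keeps the variance strictly positive.
        params[k] = (z.mean(axis=0), z.var(axis=0) + 1e-6)
    return params

def log_density(z, mean, var):
    """Log-density of z under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var)

def is_unknown(z, recon_error, params, log_tau_l, tau_r):
    """Reject if z is unlikely under every known gesture, or if the
    reconstruction error is too large (assumed decision rule)."""
    best = max(log_density(z, m, v) for m, v in params.values())
    return bool(best < log_tau_l or recon_error > tau_r)

# Toy latents for two hypothetical known gestures in a 2-D latent space.
latents = {
    0: [[-1, -1], [1, 1], [-1, 1], [1, -1]],
    1: [[4, 4], [6, 6], [4, 6], [6, 4]],
}
params = fit_class_gaussians(latents)
near_known = is_unknown(np.array([0.1, -0.2]), 0.1, params, log_tau_l=-6.0, tau_r=1.0)
far_away = is_unknown(np.array([20.0, 20.0]), 0.1, params, log_tau_l=-6.0, tau_r=1.0)
bad_recon = is_unknown(np.array([0.1, -0.2]), 5.0, params, log_tau_l=-6.0, tau_r=1.0)
```

In this sketch either criterion alone is sufficient for rejection; the thresholds would in practice be chosen on held-out data.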

Fig. 2. The visual comparison of features extracted by GengNet under the closed-set and open-set scenarios using our approach. Features of eight gestures are displayed in this figure, and the 7th and 8th gestures belong to the unknown category.
where $x_i$, $x_j$ denote the sEMG signals and $y_i$, $y_j$ the gesture labels. As an open-set task, the classes $\{1, \cdots, C_{Train}\}$ are known and the classes $\{C_{Train}+1, \cdots, C_{Test}\}$ are unknown. Our target is to use $D_{Train}$ to train a model $f$ that accurately classifies the subset $D^{known}_{Test} = \{x_i \in X_{Test}, y_i \in \{1, \cdots, C_{Train}\}\}$ and detects unknown gestures in the subset $D^{unknown}_{Test}$.
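As a small illustration of this problem setting, the test set can be partitioned into its known and unknown subsets by label. The sketch below assumes integer labels starting at 1, with labels up to $C_{Train}$ treated as known; the function name and the toy data are hypothetical.

```python
import numpy as np

def split_open_set(x_test, y_test, num_known):
    """Split a test set into known (label <= num_known) and unknown parts."""
    x_test = np.asarray(x_test)
    y_test = np.asarray(y_test)
    known = y_test <= num_known  # labels 1..C_Train are the known gestures
    return (x_test[known], y_test[known]), (x_test[~known], y_test[~known])

# Toy data: six samples, one per gesture label 1..6, with C_Train = 4.
x = np.arange(12).reshape(6, 2)
y = np.array([1, 2, 3, 4, 5, 6])
(known_x, known_y), (unknown_x, unknown_y) = split_open_set(x, y, num_known=4)
```

Only the known subset carries labels the model was trained on; every sample in the unknown subset should be rejected rather than assigned to a gesture.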

TABLE III
PRECISION, RECALL, AND F1-SCORE AVERAGED OVER ALL SUBJECTS OF FOUR APPROACHES ON THREE SPARSE MULTICHANNEL SEMG DATASETS INCLUDING NINAPRO DB1, NINAPRO DB2, AND NINAPRO DB5. THE RESULTS IN BOLD ENTRIES INDICATE THE BEST PERFORMANCES

TABLE IV
COMPARISON OF OUR APPROACH USING THREE DIFFERENT NETWORK ARCHITECTURES FOR F AND G ON NINAPRO DB1, NINAPRO DB2, AND NINAPRO DB5. THE RESULTS IN BOLD ENTRIES INDICATE THE BEST PERFORMANCES

TABLE V
COMPARISON OF VAE WITH CGDL AND THE TYPICAL VAE ON NINAPRO DB1, NINAPRO DB2, AND NINAPRO DB5. THE RESULTS IN BOLD ENTRIES INDICATE THE BEST PERFORMANCES

From Fig. 2a, we can see that the features extracted by GengNet under the closed-set scenario are discretely distributed throughout the feature space and exhibit mutual overlap. In our approach (Fig. 2b),

TABLE VI
OPEN-SET SEMG-BASED GESTURE RECOGNITION PERFORMANCE WITH DIFFERENT KNOWN GESTURES FOR TRAINING ON NINAPRO DB1