LSTM-MSA: A Novel Deep Learning Model With Dual-Stage Attention Mechanisms Forearm EMG-Based Hand Gesture Recognition

This paper introduces the Long Short-Term Memory with Dual-Stage Attention (LSTM-MSA) model, an approach for analyzing electromyography (EMG) signals. EMG signals are crucial in applications like prosthetic control, rehabilitation, and human-computer interaction, but they come with inherent challenges such as non-stationarity and noise. The LSTM-MSA model addresses these challenges by combining LSTM layers with attention mechanisms to effectively capture relevant signal features and accurately predict intended actions. Notable features of this model include dual-stage attention, end-to-end feature extraction and classification integration, and personalized training. Extensive evaluations across diverse datasets consistently demonstrate the LSTM-MSA’s superiority in terms of F1 score, accuracy, recall, and precision. This research provides a model for real-world EMG signal applications, offering improved accuracy, robustness, and adaptability.


I. INTRODUCTION
ELECTROMYOGRAPHY (EMG) signals, with their non-stationarity, noise, and intricate relationship to intended actions, present challenges in analysis. Overcoming these obstacles is essential for optimizing applications such as prosthetic control and rehabilitation that rely on accurate interpretation of EMG data. While deep learning techniques have shown great potential in extracting relevant features and making accurate predictions about intended actions, there are still some limitations that need to be addressed, such as the need for large amounts of labeled data and the non-stationary and noisy nature of EMG signals.
To address these limitations, we propose the Long Short-Term Memory with Dual-Stage Attention (LSTM-MSA) model for processing EMG signals and predicting intended actions. The LSTM-MSA model combines LSTM layers with attention mechanisms to extract relevant features from input signals and make accurate predictions about intended actions. The novelty and contribution are summarized as follows:
1) Compared with existing myoelectric hand gesture recognition methods, the model uses attention mechanisms to weigh the important elements in the signal sequence and focus on the most relevant parts of the signal, leading to improved accuracy and robustness.
2) In particular, the model uses dual-stage attention, which involves two levels of attention: one on the input sequence and one on the output of the LSTM layer. Dual-stage attention allows the model to capture both local and global dependencies in the signal, resulting in better performance.
3) The model integrates feature extraction and classification into a single end-to-end trainable network, which simplifies the training process, reduces the risk of overfitting, and allows the model to learn more representative and discriminative features, leading to improved classification accuracy.
4) Innovative personalized training adapts EMG gesture recognition models to individuals, enhancing sim2real transfer, accuracy, and adaptability in real-world scenarios.
Moreover, we emphasize the novelty of our gesture recognition framework and explicitly connect it to performance improvements. We provide a detailed explanation of how our model differs from existing methods, showcasing its unique contributions to enhanced accuracy, robustness, and generalization, particularly in the context of challenging EMG signal variations.
The proposed LSTM-MSA model is a powerful architecture for processing EMG signals and making accurate predictions about intended actions. By combining LSTM layers with attention mechanisms, the model can capture complex temporal relationships in input signals while focusing on the most relevant features.
The organization of the remaining part of this work is as follows: In Section II, some related works are discussed. In Section III, the LSTM-MSA model, which combines LSTM layers with dual-stage attention mechanisms for EMG signal analysis and action prediction, is proposed. The architecture, attention mechanisms, and training process of the model are explained. In Section IV, the performance of the proposed model is evaluated, and a comparison with other state-of-the-art models is conducted. Various metrics are used to measure the accuracy and robustness of the model. In Section V, the paper is concluded, and future research directions are suggested.

II. RELATED WORKS
The field of EMG for hand gesture recognition has been investigated extensively using various techniques such as deep learning, feature extraction, and pattern recognition [34]. This section provides an overview of representative and important works in this area.

A. Convolutional Neural Network (CNN) Based EMG Classification Algorithms
CNN is often selected for gesture classification, and it normally views EMG signals as an image. Several studies have demonstrated the effectiveness of CNNs for hand gesture classification using EMG signals. Park et al. [1], Atzori et al. [2], Olsson et al. [3], Zhai et al. [4], and Chen et al. [5] have all explored different CNN-based approaches with promising results. However, the large number of parameters in deep CNN models can hinder real-time applications. To address this, Chen et al. [6] propose a compact CNN model called EMGNet with reduced parameters and improved accuracy. Additionally, studies have investigated the use of larger and "thinner" filters to exploit the narrow gap between the length and width of EMG images [7], [8]. These developments contribute to the advancement of real-time EMG-based gesture classification.

B. Recurrent Neural Network (RNN) Based EMG Classification Algorithm
The RNN, which is usually selected to process temporal information for tasks such as natural language processing, has also been applied to this problem [9], [10], [11], [12], [13]. Nasri et al. [9] propose a GRU-based scheme with 77.85% accuracy for processing EMG segments. Koch et al. [10] present an RNN scheme with improved performance. Simão et al. [11] use various deep learning methods for single-frame EMG processing. Samadani et al. [12] achieve 86.7% accuracy with bidirectional LSTM for gesture recognition. Alfaro-Ponce et al. [13] compare TDNN, DifNN, and CVNN for EMG and foot pressure signals, all achieving over 95% accuracy.

C. Auto-Encoder (AE) Based EMG Classification Algorithm
AE-based schemes for EMG classification can be categorized into two types: hand-crafted feature-based methods [14] and raw data-based methods [15]. Rehman et al. [14] apply stacked sparse auto-encoders (SSAE) to improve EMG classification using multiday recordings. SSAE outperforms linear discriminant analysis (LDA) with four time-domain features for intact and disabled subjects. Rehman et al. [15] compare the performance of CNN, SSAE, and LDA for hand gesture classification and find that SSAE performs better when using time-domain features as input.

D. Deep Belief Network (DBN) Based EMG Classification Algorithm
DBN approaches typically utilize hand-crafted features [16], [17]. Shim et al. [16] propose the Split and Merge DBN, enhancing its performance using a genetic algorithm and achieving a 12.06% improvement over classical DBN. Zhang et al. [17] use DBN with time-domain feature sets to recognize normal and aggressive EMG signals, achieving an accuracy of 90.66% ± 1.47%. Sun et al. [18] introduce a generative flow model (GFM), similar to DBN, for converting EMG data into factorized features and applying softmax classification for EMG classification purposes.

E. Mixed Network Structures
Ding et al. [7] used a parallel multiple-scale CNN for hand gesture classification, while Wei et al. [19] employed a CNN with multiple sub-streams. Gao et al. [20] proposed a dual-flow network with CNN and LSTM, Wu et al. [21] used CNN and LSTM with attention mechanism, Xie et al. [22] combined CNN and LSTM, and Tong et al. [23] used CNN and RNN for gesture classification. Tsinganos et al. [24] achieved improved performance with a temporal convolutional network (TCN) on Ninapro DB1, and Zanghieri et al. [25] developed TEMPONet, a TCN-based network on an embedded system, outperforming existing methods on Ninapro DB6.

F. Associate Deep Learning With Machine Learning Methods
In some studies, a CNN is combined with other techniques, such as stacking ensemble learning [26] and machine learning methods [27], to enhance decision performance in human intention recognition. Shen et al. [26] used a CNN with inputs of EMG data, Discrete Fourier Transform (DFT) of EMG data, and Discrete Wavelet Packet Transform (DWPT) of EMG data, optimized by a stacking ensemble learning algorithm. Chen et al. [27] replaced the output layer of a typical CNN with Support Vector Machines (SVM), LDA, and K-Nearest Neighbors (KNN), showing improved performance compared to traditional feature-based methods.

G. Applying Analogical Procedures for Movement Recognition
Shao et al. [28] propose a scheme for upper limb motion recognition using single-channel EMG, employing the singular value decomposition (SVD) method and wavelet deep belief networks (WDBN). Deep learning approaches, such as gait stage classification [29], wrist motion recognition [30], [31], and arm motion prediction [32], [33], can be applied to similar tasks with slight differences in label names compared to hand gesture classification.

H. Other Analysis Methods for EMG
Several papers [35], [36], [37], [38] introduce advanced techniques, including graph neural networks, genetic algorithms, semantic role labeling, and ensemble methods, primarily in the context of EMG analysis. However, these innovative methodologies have the potential to be adapted and applied to enhance EMG signal processing and analysis, bridging the gap between these domains.

III. METHODOLOGY
A. Data Collection Equipment
The Myo EMG armband is a wearable device equipped with eight surface EMG sensors (see Fig. 1). It captures and interprets the electrical signals generated by muscles, allowing users to control applications and devices through gestures and muscle movements. With its wireless Bluetooth connectivity and comprehensive software development kit (SDK), developers can create customized applications and integrations. The Myo armband has been widely utilized in various fields, including virtual reality, gaming, medical rehabilitation, and human-computer interaction research. It offers a non-invasive and intuitive way to harness muscle-generated signals for interactive experiences.

B. Datasets
Our dataset comprises sEMG signals recorded from the forearm muscles while individuals performed hand gestures. In our study, we utilized three distinct datasets.
The first dataset focused on two primary hand gestures: grasping and releasing. Each gesture was executed for 20 seconds, and the signals were recorded at a sampling frequency of 100 Hz. We collected data from 5 individuals, and each person performed both gestures 10 times. This resulted in a total of 100 samples, with 90% of them allocated for model training and 10% for testing. Visual representations of these gestures can be found in Fig. 2.
The second dataset, known as the number gesture dataset, featured hand gestures corresponding to numbers 0 to 9. These gestures were also performed for 20 seconds, recorded at a 100 Hz frequency, and collected from 6 different individuals. Each individual performed each gesture three times, resulting in a total of 180 samples. Similar to the first dataset, 90% of these samples were used for training, and 10% for testing. This dataset was specifically designed for transfer learning purposes and is detailed in Fig. 3.
Additionally, we collected data for 20 seconds of continuous motion for both grasping and releasing gestures, as well as for the number gestures. This additional data helped us assess whether our model could effectively recognize gestures in continuous motion scenarios.
Within our dataset, specific parameters were defined: a 100 Hz sampling frequency and a 300-millisecond window size.The sampling frequency represents the rate at which data points were recorded, while the window size was used to segment data for model training, enabling the analysis of local signal characteristics.
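As an illustration of the segmentation described above, the following minimal sketch splits one 20-second, 8-channel recording into 300 ms windows at 100 Hz, which gives 30 samples per window. The function name `segment_windows`, the non-overlapping step, and the random placeholder signal are assumptions made for this example; the paper does not state whether windows overlap.

```python
import numpy as np

def segment_windows(recording, fs_hz=100, window_ms=300, step_ms=300):
    """Split a (samples, channels) recording into fixed-length windows.

    step_ms equal to window_ms means non-overlapping windows (an assumption).
    """
    win = int(round(fs_hz * window_ms / 1000))   # 300 ms at 100 Hz -> 30 samples
    step = int(round(fs_hz * step_ms / 1000))
    starts = range(0, recording.shape[0] - win + 1, step)
    return np.stack([recording[s:s + win] for s in starts])

rng = np.random.default_rng(0)
recording = rng.standard_normal((2000, 8))  # placeholder: 20 s x 100 Hz, 8 channels
windows = segment_windows(recording)
print(windows.shape)  # (66, 30, 8)
```

With these settings, one 2000-sample recording yields 66 full windows; the trailing 20 samples that do not fill a window are discarded in this sketch.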
In addition to our proprietary datasets, we incorporated the NinaPro DB5 dataset to further validate the accuracy and robustness of our algorithm. The NinaPro DB5 dataset is a well-established benchmark in sEMG-based gesture recognition, encompassing a wide range of hand movements and gestures. Its diversity makes it an ideal choice for evaluating our algorithm's performance under various conditions.
The NinaPro dataset includes data from a larger participant pool, providing a comprehensive perspective on our approach's effectiveness. It encompasses 17 different hand gestures performed by multiple individuals, allowing us to assess our model's generalizability across different users and gestures.
The integration of the NinaPro dataset into our study demonstrates the versatility and effectiveness of our algorithm in real-world scenarios, reinforcing its reliability and applicability in sEMG-based gesture recognition.

C. Model Structure
The LSTM-MSA model proposed in this research is a deep learning architecture designed to process EMG signals. The model utilizes a combination of LSTM layers and attention mechanisms to extract relevant features from the input signals and make accurate predictions about the intended actions. The overall framework is shown in Fig. 4.
The proposed model is designed to classify hand gestures based on input EMG signals. The input consists of a batch of EMG signal sequences, each with a length of 2000 and 8 features representing 8 channels of EMG sensors. The output of the model is a two-dimensional vector representing the predicted class probabilities for each input sequence. The model has four main layers: an input attention layer, an LSTM layer, an output attention layer, and a fully connected layer. The input attention layer applies a self-attention mechanism to the input sequence to extract the most relevant features for classification. This mechanism uses a query matrix Q, a key matrix K, and a value matrix V to calculate an attention score matrix S. The score matrix is then normalized by a softmax function to obtain an attention weight matrix W, which is used to weight the value matrix V to produce an attention output matrix A. The attention output matrix is then fed into a fully connected layer that maps it to the hidden size of the LSTM layer.
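The attention computation described above (Q, K, V, score matrix S, weight matrix W, output A) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the projection matrices `Wq`, `Wk`, `Wv`, the scaling of S by the square root of the key dimension, and the chosen sizes are assumptions, since the text does not specify how S is computed.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Self-attention over one sequence X of shape (L, F)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # query, key, value matrices
    S = Q @ K.T / np.sqrt(K.shape[-1])      # score matrix (scaling is an assumption)
    W = softmax(S, axis=-1)                 # attention weights, each row sums to 1
    A = W @ V                               # attention output
    return A, W

rng = np.random.default_rng(0)
L, F, d = 30, 8, 16                         # illustrative sizes only
X = rng.standard_normal((L, F))             # placeholder EMG window
Wq, Wk, Wv = (rng.standard_normal((F, d)) * 0.1 for _ in range(3))
A, W = self_attention(X, Wq, Wk, Wv)
print(A.shape, W.shape)                     # (30, 16) (30, 30)
```

Each row of W describes how strongly one time step attends to every other time step, so the cost of this layer grows quadratically with the sequence length.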
The LSTM layer encodes the input sequence and generates a sequence of hidden states by using long short-term memory units. This allows the model to capture the temporal dependencies and dynamics of the input sequence. The LSTM layer consists of four gates: an input gate i, a forget gate f, an output gate o, and a cell gate c. The input gate decides how much new information to add to the cell state. The forget gate decides how much old information to forget from the cell state. The output gate decides how much information to output from the cell state. The cell gate updates the cell state based on the input and forget gates.
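The gate behavior described above corresponds to the standard LSTM update, which can be sketched for a single time step as below. The stacked weight layout (`W`, `U`, `b`), the name `g` for the candidate produced by the cell gate, and the random weights are assumptions for illustration; only the hidden size of 64 and the 8 input channels come from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, F), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to add
    f = sigmoid(z[H:2 * H])        # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values from the cell gate
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # updated hidden state
    return h, c

rng = np.random.default_rng(0)
F, H = 8, 64                       # 8 EMG channels, hidden size 64
W = rng.standard_normal((4 * H, F)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((30, F)):   # placeholder 30-sample window
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)                     # (64,)
```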
The output attention layer applies another self-attention mechanism to the LSTM output sequence to weight their importance for the classification task. This mechanism uses different linear transformations for Q, K, and V but follows the same procedure as the input attention layer. The output of this layer is a weighted sum of the LSTM hidden states.
The fully connected layer concatenates the weighted sum from the output attention layer with the weighted output from the input attention layer and feeds it into a linear transformation followed by a softmax function to produce the final output of the model: a two-dimensional vector representing the predicted class probabilities for each input sequence. The model also includes additional layers to improve its performance and generalization ability, including dropout layers to prevent overfitting, batch normalization layers to improve training stability, and activation functions such as ReLU to introduce non-linearity. During training, the model is optimized using the Adam optimizer with a learning rate of 0.001 and cross-entropy loss as the objective function. The Adam optimizer adjusts the learning rate for each parameter based on its gradient magnitude and momentum. The cross-entropy loss measures the difference between the predicted probabilities and the true labels.
Algorithm 1 Attention-Based LSTM for EMG Signal Classification
Require: Input sequences X ∈ R^{N×L×F}, where N is the batch size, L is the sequence length, and F is the number of features.
Ensure: Predicted class probabilities P ∈ R^{N×C}, where C is the number of classes.
1: Apply self-attention to the input sequence:
2: Apply LSTM to the input sequence:
6: Apply self-attention to the output sequence:
The model's use of two attention layers enhances its ability to process EMG signals. The input attention layer filters out noisy or irrelevant parts of the input signals, while the output attention layer highlights the most discriminative parts of the output sequences. Together, these layers capture both the global and local information of the EMG signals. The overall algorithm is shown in Algorithm 1.
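To make the data flow through the four layers concrete, the sketch below runs a single sequence through input attention, a linear projection, an LSTM, output attention, and a final softmax classifier. It is a rough illustration under several assumptions that the text does not confirm: scaled dot-product scores, mean pooling over time before concatenation, random untrained weights, a 30-sample input, and the omission of dropout, batch normalization, and ReLU. All parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention(X, Wq, Wk, Wv):
    """Self-attention; returns the attention output with one row per time step."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def lstm(X, W, U, b, H):
    """Run an LSTM over X of shape (L, D); returns hidden states of shape (L, H)."""
    h, c, out = np.zeros(H), np.zeros(H), []
    for x_t in X:
        z = W @ x_t + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        c = f * c + i * np.tanh(z[3 * H:])
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)

def forward(X, p, H):
    A_in = attention(X, p["q1"], p["k1"], p["v1"])     # input attention, (L, F)
    Z = A_in @ p["proj"]                               # map to LSTM hidden size, (L, H)
    Hs = lstm(Z, p["W"], p["U"], p["b"], H)            # LSTM hidden states, (L, H)
    A_out = attention(Hs, p["q2"], p["k2"], p["v2"])   # output attention, (L, H)
    # Pool over time (an assumption) and concatenate both attention outputs.
    feat = np.concatenate([A_out.mean(axis=0), A_in.mean(axis=0)])
    return softmax(feat @ p["fc"] + p["fc_b"])         # class probabilities, (C,)

rng = np.random.default_rng(0)
L, F, H, C = 30, 8, 64, 2

def init(*shape):
    return rng.standard_normal(shape) * 0.1

params = {
    "q1": init(F, F), "k1": init(F, F), "v1": init(F, F),
    "proj": init(F, H),
    "W": init(4 * H, H), "U": init(4 * H, H), "b": np.zeros(4 * H),
    "q2": init(H, H), "k2": init(H, H), "v2": init(H, H),
    "fc": init(H + F, C), "fc_b": np.zeros(C),
}
probs = forward(rng.standard_normal((L, F)), params, H)
print(probs.shape)  # (2,)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how tensor shapes evolve from an (L, F) input to a C-dimensional probability vector.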

D. Sim2Real Problem
To address the challenge of sim2real transfer, we have implemented a strategy of personalized training for users. Prior to using our system, individuals are required to undergo personalized training, which includes conducting ten consecutive training sessions. This process helps ensure that the model adapts more effectively to the specific physiological characteristics and EMG signal patterns of each individual, thereby enhancing its performance and adaptability in real-world applications.
In this approach, we keep the parameters of the pre-trained model fixed and append a two-layer MLP classifier at the end. This personalized training setup allows us to fine-tune the model specifically for each user. After the initial ten training sessions, we evaluate and validate the model's suitability and accuracy for the individual user.
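The idea of freezing the pre-trained model and training only an appended two-layer MLP can be sketched as follows. This is a simplified illustration rather than the authors' procedure: `frozen_features` is a hypothetical stand-in for the pre-trained network, the head is trained with plain full-batch gradient descent instead of Adam, and the data, layer sizes, and learning rate are placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def frozen_features(X, W_frozen):
    """Hypothetical stand-in for the pre-trained network; W_frozen is never updated."""
    return np.tanh(X @ W_frozen)

def train_head(feats, y, hidden=16, classes=2, lr=0.1, epochs=10, seed=0):
    """Fit a two-layer MLP head on fixed features with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((feats.shape[1], hidden)) * 0.1
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, classes)) * 0.1
    b2 = np.zeros(classes)
    Y = np.eye(classes)[y]                      # one-hot labels
    for _ in range(epochs):
        h = np.maximum(feats @ W1 + b1, 0.0)    # hidden layer with ReLU
        p = softmax(h @ W2 + b2)                # class probabilities
        g_logits = (p - Y) / len(y)             # gradient of mean cross-entropy
        g_h = (g_logits @ W2.T) * (h > 0)       # backpropagate through ReLU
        W2 -= lr * (h.T @ g_logits)
        b2 -= lr * g_logits.sum(axis=0)
        W1 -= lr * (feats.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, W2, b2

def predict_proba(feats, head):
    W1, b1, W2, b2 = head
    return softmax(np.maximum(feats @ W1 + b1, 0.0) @ W2 + b2)

rng = np.random.default_rng(0)
W_frozen = rng.standard_normal((8, 32)) * 0.1   # placeholder backbone weights
W_before = W_frozen.copy()
X_user = rng.standard_normal((40, 8))           # placeholder per-user samples
y_user = rng.integers(0, 2, size=40)            # placeholder labels
feats = frozen_features(X_user, W_frozen)
head = train_head(feats, y_user)
probs = predict_proba(feats, head)
print(probs.shape)                              # (40, 2)
```

Only the head parameters receive gradient updates, so the per-user adaptation touches a small number of weights and leaves the shared backbone unchanged.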
Through this personalized training process, our model is better equipped to handle the transition from simulated to real-world environments, ultimately improving its overall generalization and recognition accuracy in practical scenarios. For more details, please refer to Experiment 5.

IV. EXPERIMENT AND RESULTS
A. Different Experiments
1) Experiment 1: EMG Direct Classification: In this experiment, we aim to directly classify raw EMG signals into different categories without any additional processing or feature extraction. The main objective is to evaluate the performance of various classification algorithms on the raw EMG signals.

B. Parameter Settings
The parameter settings for the LSTM-MSA model are crucial for achieving high accuracy in EMG signal classification. In this section, we describe the key parameters used in our experiments and the rationale behind their selection.
1) Input Sequence Length: The input sequence is a 300-millisecond (ms) window of EMG signal data collected from 8 channels. This choice is based on the assumption that the most relevant information for action classification can be captured within this time frame. Furthermore, longer input sequences would increase computational costs and may not necessarily improve classification accuracy.
2) Hidden Size of LSTM Layer: The hidden size of the LSTM layer is set to 64. This parameter determines the number of LSTM units used in the layer and affects the model's capacity to capture temporal dependencies in the input sequence. A larger hidden size may result in improved accuracy but may also increase the risk of overfitting. After several experiments, we found that 64 is a reasonable size for balancing performance and computational cost.
3) Dropout Rate: Dropout is used to prevent overfitting by randomly dropping out nodes during training. We set the dropout rate to 0.5 for both the input and output of the LSTM layer. This means that during training, each node has a 50% probability of being dropped out. This parameter was chosen based on empirical results indicating that a dropout rate of 0.5 provides a good balance between preventing overfitting and preserving model capacity.
4) Learning Rate: The learning rate determines the step size of parameter updates during training. We use the Adam optimizer with a learning rate of 0.001. Adam is a popular optimization algorithm that adaptively adjusts the learning rate based on the gradient of the loss function. A smaller learning rate may result in slower convergence, while a larger learning rate may lead to unstable training and suboptimal solutions. After several experiments, we found that a learning rate of 0.001 is a reasonable choice for achieving good performance.
5) Batch Size: The batch size determines the number of samples processed in each training iteration. We set the batch size to 32 for our experiments. A larger batch size may result in faster convergence but may also increase memory usage and computational cost. After several experiments, we found that a batch size of 32 provides a good balance between convergence speed and memory usage. The parameter settings were selected through empirical experiments and careful consideration of their impact on the model's performance and computational cost. The chosen settings have been shown to produce good results in our experiments and provide a solid foundation for future research in EMG signal processing.
6) Experiment Times: Each experiment will be conducted 20 times to obtain reliable results, and the mean values will be calculated for analysis. This approach allows us to account for variations and fluctuations in the data, providing a more accurate assessment of the classification algorithms' performance. By calculating the mean values of metrics such as accuracy, F1 score, recall, and precision, we can evaluate the algorithms' overall performance. This rigorous approach enhances the reliability of our conclusions regarding the classification of raw EMG signals in different experiments.
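To illustrate the optimizer setting in item 4, the sketch below implements one Adam update with the stated learning rate of 0.001. The values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are an assumption here, since the text specifies only the learning rate; the quadratic toy objective is likewise only for demonstration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])                   # toy parameters
m, v = np.zeros_like(theta), np.zeros_like(theta)
history = []
for t in range(1, 4):
    grad = 2 * theta                            # gradient of sum(theta ** 2)
    theta, m, v = adam_step(theta, grad, m, v, t)
    history.append(theta.copy())
print(history[0])                               # approximately [ 0.999 -1.999]
```

On the first step the bias-corrected ratio reduces to the sign of the gradient, so every parameter moves by roughly the learning rate regardless of gradient scale, which is the per-parameter adaptation the text refers to.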

C. Evaluation Method
In this study, the performance of the proposed LSTM-MSA model was evaluated using three commonly used metrics (accuracy, recall, and F1 score), complemented by t-tests for statistical significance.
Accuracy, defined as the percentage of correctly classified samples out of the total number of samples, measures the overall correctness of the model's predictions. Recall, also known as sensitivity or true positive rate, calculates the percentage of true positive samples correctly identified by the model out of all positive samples in the test set, focusing on the model's ability to capture positive instances. F1 score, the harmonic mean of precision and recall, offers a balanced assessment of both precision and recall.
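The metric definitions above can be written out directly for the two-class case. The helper name `binary_metrics` and the toy label vectors are illustrative assumptions and are not data from the experiments; for the multi-class datasets an averaging scheme would also have to be chosen, which the text does not specify.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0   # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # all four values equal 0.75 for this toy example
```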
To comprehensively assess the LSTM-MSA model's performance, we employed a 5-fold cross-validation approach. The dataset was randomly divided into 5 folds, with each fold used as the test set once and the remaining folds used for training. Metrics were computed for each fold, and the final evaluation results were obtained by averaging the metrics across the 5 folds.
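A minimal sketch of the 5-fold procedure is given below. The helper names `kfold_indices` and `cross_validate`, the fixed random seed, and the dummy scoring function are assumptions for illustration; in practice `evaluate` would train a model on the training indices and return a metric on the test indices.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a random k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Average a per-fold score returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(tr, te) for tr, te in kfold_indices(n_samples, k, seed)]
    return float(np.mean(scores))

# Dummy evaluator: reports the test-fold fraction instead of a real metric.
splits = list(kfold_indices(100))
mean_score = cross_validate(100, lambda tr, te: len(te) / (len(tr) + len(te)))
print(len(splits), mean_score)  # 5 0.2
```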
In addition to these metrics, t-tests were conducted to determine the statistical significance of differences in performance metrics between the LSTM-MSA model and alternative models. These t-tests help validate whether the observed performance improvements are statistically significant.
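As a sketch of how such a comparison could be computed, the example below evaluates a two-sample t statistic on per-run scores of two models. The text does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the input numbers are arbitrary placeholders rather than results from the paper.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va = a.var(ddof=1) / a.size                 # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size                 # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df

# Placeholder inputs chosen for easy hand-checking, not experimental scores.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)  # -1.0 8.0
```

The p-value then follows from the t distribution with the computed degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.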
The use of this comprehensive evaluation methodology, including accuracy, recall, F1 score, and t-tests within a cross-validation framework, ensures a robust and statistically validated assessment of the LSTM-MSA model's effectiveness in classifying EMG signals. It provides reliable insights into its performance and the significance of its superiority over other models.

D. Comparison Models
Based on reference [34], in addition to the LSTM-MSA model, we also evaluate six other models for the analysis of EMG signals:
1) Support Vector Machine (SVM): SVMs are a type of machine learning model that can be used for classification tasks. Like random forests, SVMs can be trained on smaller datasets and still achieve good performance. However, they may not be as effective at capturing complex patterns in the data as neural networks, and they may be more sensitive to the choice of hyperparameters.
2) Random Forest (RF): RFs are an ensemble learning method that uses decision trees to classify data. Unlike neural networks, which require large amounts of data to learn effectively, random forests can be trained on smaller datasets and still achieve good performance. However, they may not be as effective at capturing complex patterns in the data as neural networks.
3) Convolutional Neural Network (CNN): CNNs are commonly used for image classification tasks, but they can also be used for time series data like EMG signals. By using 1D convolutional layers, CNNs can extract local features from the input signals, which can then be used for classification. Compared to the LSTM-MSA model, CNNs may be more computationally efficient and easier to implement, but they may not be as effective at capturing long-term dependencies in the data.
4) Linear Discriminant Analysis (LDA): LDA is a supervised learning algorithm that extracts discriminative features and reduces dimensionality for classification tasks by maximizing the between-class scatter and minimizing the within-class scatter. It is widely used in various domains for effective separation and classification, although it assumes linearity in the data.

TABLE II RESULTS OF GRASPING AND RELEASING GESTURES
TABLE III RESULTS OF NUMBER GESTURE DATASET
5) Gated Recurrent Unit (GRU): GRU is an efficient variant of recurrent neural networks that addresses the vanishing gradient problem and captures long-term dependencies. It utilizes gating mechanisms to selectively update and reset information within the hidden state, making it popular for processing sequential data in various applications.
6) Long Short-Term Memory (LSTM): LSTM is a type of recurrent neural network (RNN) architecture that addresses the vanishing gradient problem and captures long-term dependencies in sequential data. It utilizes memory cells with input, forget, and output gates to selectively remember and forget information over time, making it effective for modeling sequential data.
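As an illustration of the local feature extraction mentioned for the CNN baseline in item 3, the following sketch applies a single 1D convolutional layer along the time axis of a multi-channel window. The function name `conv1d_valid`, the kernel size, the number of filters, and the random inputs are assumptions for this example and do not describe the baseline configuration used in the experiments.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """1D convolution (cross-correlation) with ReLU, without padding.

    x: (L, F_in); kernels: (K, F_in, F_out); bias: (F_out,).
    Returns an array of shape (L - K + 1, F_out).
    """
    K, _, f_out = kernels.shape
    n_out = x.shape[0] - K + 1
    out = np.empty((n_out, f_out))
    for t in range(n_out):
        # Contract over the time-offset and input-channel axes of the local patch.
        out[t] = np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 8))                # placeholder 300 ms window, 8 channels
kernels = rng.standard_normal((5, 8, 4)) * 0.1  # 4 filters spanning 5 time steps
features = conv1d_valid(x, kernels, np.zeros(4))
print(features.shape)                           # (26, 4)
```

Each output row depends only on 5 consecutive samples, which is why such a layer captures local patterns and needs stacking or pooling to cover longer time spans.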
We have assessed the performance of these models in comparison to the LSTM-MSA model using the same set of evaluation metrics: test accuracy, test recall, and test F1 score. The hyperparameters for the comparative methods are detailed in Table I. All method parameters have been fine-tuned to guarantee the utilization of optimal configurations, eliminating the influence of parameter variations on the comparative analysis of the different methods.

1) EMG Direct Classification Experiment:
The results of Experiment 1 for our two proprietary datasets are presented in Table II and Table III. Based on the results, it is evident that the LSTM-MSA algorithm consistently outperforms the other models in terms of F1 score, accuracy, recall, and precision for both datasets. Specifically, the LSTM-MSA model achieved an F1 score of 0.8921 for the Grasping and Releasing Gestures dataset and 0.9392 for the Number Gesture Dataset, showcasing its ability to accurately classify EMG signals into distinct categories.
In comparison, while the RF, CNN, and SVM models also demonstrated good performance, they consistently yielded slightly lower scores across all metrics compared to the LSTM-MSA model. This suggests that the LSTM-MSA algorithm excels in capturing the temporal dependencies within raw EMG signals, ultimately leading to enhanced classification performance.
These results underscore the efficacy of the LSTM-MSA algorithm in directly classifying raw EMG signals into different categories, without the need for additional processing or feature extraction. Its superior performance compared to other algorithms highlights its potential for accurate and robust EMG signal classification.
To further assess our algorithm's performance, we conducted an experiment using the NinaPro DB5 dataset, a well-established benchmark in the field of sEMG-based gesture recognition. The results of this experiment are presented in Table IV.
In this experiment, we employed the NinaPro DB5 dataset, which encompasses a diverse array of hand movements and gestures performed by multiple individuals. The dataset presents a challenging scenario for sEMG-based gesture recognition due to its complexity and variability.
Our algorithm, LSTM-MSA, achieved an F1 score of 0.9215, underscoring its robust performance in accurately classifying hand gestures within the NinaPro DB5 dataset. The high accuracy, recall, and precision values further validate the effectiveness of LSTM-MSA in recognizing a wide variety of gestures across different users.
Comparing these results with those from our previous experiments (Tables II and III), it is evident that the LSTM-MSA model consistently outperforms other models, including Random Forest (RF), Convolutional Neural Network (CNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM).
These findings emphasize that our LSTM-MSA algorithm not only excels in classifying gestures in proprietary datasets but also maintains its strong performance when applied to the challenging NinaPro DB5 dataset. The LSTM-MSA's versatility and robustness make it a promising choice for sEMG-based gesture recognition, prioritizing accuracy and adaptability.
In order to rigorously assess the statistical significance of the observed performance differences in Experiment 1, t-tests were conducted to compare the LSTM-MSA model with each of the alternative models (RF, CNN, SVM, LDA, GRU, and LSTM) across three distinct datasets: the Grasping and Releasing Gestures dataset, the Number Gesture Dataset, and the NinaPro DB5 Dataset.
Based on the conducted t-tests across the Grasping and Releasing Gestures dataset, Number Gesture Dataset, and NinaPro DB5 Dataset, the results consistently demonstrate highly significant differences in performance between the LSTM-MSA model and alternative models, including RF, CNN, SVM, LDA, GRU, and LSTM. These findings strongly support the superior performance of the LSTM-MSA model in accurately classifying EMG signals across all three datasets.
2) EMG Transfer Learning in Direct Classification: The results of Experiment 2 are presented for two datasets in Table VII and Table VIII. The experimental results for the two-gesture and number-gesture datasets demonstrate the consistent and robust performance of the models across multiple experiments.
The precision values, ranging from 86.60% to 92.47% for the two-gesture dataset and 82.10% to 88.55% for the number-gesture dataset, indicate the models' ability to accurately classify positive samples while minimizing false positives. This demonstrates the models' reliability in correctly identifying the target gestures and number gestures.
Furthermore, the recall values range from 87.65% to 92.39% for the two-gesture dataset and 82.12% to 88.05% for the number-gesture dataset. These values highlight the models' effectiveness in capturing true positive samples, thereby minimizing false negatives. The consistently high recall values demonstrate the models' capability to accurately recognize and classify the desired gestures and digits, ensuring fewer missed positive instances.
The F1 scores, ranging from 0.8773 to 0.9212 for the two-gesture dataset and 0.8222 to 0.8831 for the number-gesture dataset, provide a comprehensive evaluation by considering both precision and recall. These scores reflect a balanced performance, indicating the models' ability to achieve accurate and reliable classification results while considering the trade-off between false positives and false negatives. The consistently high F1 scores demonstrate the models' capability to achieve a desirable balance between precision and recall, ensuring accurate identification of the gestures and number gestures in both datasets.
These findings suggest the models' effectiveness and potential transferability in recognizing and classifying gestures and digits across different individuals, showcasing their utility in various applications involving EMG signal classification.
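For reference, the three reported metrics can be derived from true and predicted labels as sketched below. Macro-averaging over classes is an assumption here, since the averaging scheme is not stated, and the function name `macro_prf` and the toy labels are hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over the classes in y_true."""
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only (not experimental data).
y_true = ["grasp", "grasp", "release", "release"]
y_pred = ["grasp", "release", "release", "release"]
precision, recall, f1 = macro_prf(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which is why it is read as a measure of balance between false positives and false negatives.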
3) EMG Continuous Motion Classification Experiment: The results of Experiment 3 are presented for two datasets in Table VIII and Table IX.
To statistically evaluate the performance differences observed in Experiment 3, t-tests were conducted between the LSTM-MSA model and each of the other models (RF, CNN, SVM, LDA, GRU, LSTM) for both the Grasping and Releasing Gestures dataset and the Number Gesture Dataset.
For the Grasping and Releasing Gestures Dataset, the LSTM-MSA model demonstrated statistically significant performance improvements compared to all other models across all metrics, including F1 score, accuracy, recall, and precision.
For the Number Gesture Dataset, the LSTM-MSA model also exhibited statistically significant performance improvements.
The superior performance of models such as the LSTM-MSA algorithm can be attributed to their ability to capture temporal dependencies, leverage attention mechanisms, and model complex patterns in EMG signals. These factors contribute to the models' enhanced performance in classifying continuous motion patterns compared to other algorithms.
5) Experiment 5: EMG Online Learning in Motion Classification: This experiment was primarily focused on evaluating the performance of online learning techniques in the context of direct EMG signal classification. The primary objective was to assess the model's adaptability to evolving EMG signal patterns in real time, particularly when exposed to continuous learning scenarios. This investigation aimed to address the challenges related to concept drift and model stability that can occur in dynamic signal environments.
In the conducted experiment, the LSTM-MSA model demonstrated a notable enhancement in performance after incorporating online learning tailored to specific individuals. A comparison of key performance metrics before training (Epoch 0) and after 10 epochs of personalized online learning is presented in Table XVI.
The results underscore the improvement achieved by the LSTM-MSA model following personalized online learning. Noteworthy observations from this experiment include:
1) The integration of concept drift detection mechanisms, which allow the model to identify and promptly respond to significant changes in EMG signal patterns. This adaptability keeps the model aligned with real-world variations.
2) The implementation of fine-tuning procedures for individual users after ten training sessions, leading to personalized adaptation and a substantial boost in overall accuracy.
These findings indicate that personalized online learning enhances the LSTM-MSA model's performance, making it better equipped to adapt to specific users' EMG signal patterns. The results showcase the practical benefits of online learning for EMG-based motion classification and help explain the model's performance improvement. This adaptability is especially valuable in applications that require timely and precise gesture recognition, where the model must remain both responsive and accurate.
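The concept drift detection mechanism is not specified in detail. One simple possibility, shown here purely as an assumed sketch, is to monitor accuracy over a sliding window of recent predictions and flag drift when it falls well below a reference accuracy, at which point per-user fine-tuning could be triggered. The class name `WindowDriftDetector`, the window size, and the tolerance are all hypothetical choices.

```python
from collections import deque


class WindowDriftDetector:
    """Flags drift when recent accuracy drops well below a reference accuracy."""

    def __init__(self, reference_accuracy, window=50, tolerance=0.10):
        self.reference = reference_accuracy
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # outcomes of the latest predictions

    def update(self, correct):
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(1 if correct else 0)
        if len(self.recent) < self.recent.maxlen:
            return False  # window not full yet, so no decision is made
        accuracy = sum(self.recent) / len(self.recent)
        return accuracy < self.reference - self.tolerance


# Illustration: stable predictions followed by a run of errors.
detector = WindowDriftDetector(reference_accuracy=0.9, window=10, tolerance=0.1)
stable_flags = [detector.update(True) for _ in range(10)]
drift_flags = [detector.update(False) for _ in range(5)]
```

A windowed monitor of this kind trades detection delay against false alarms: a larger window reacts more slowly but is less sensitive to isolated misclassifications.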

V. CONCLUSION
The proposed LSTM-MSA model provides a novel approach to EMG signal processing and analysis. It addresses the limitations associated with analyzing non-stationary and noisy EMG signals and the complex relationships between signal features and intended actions. By combining dual-stage attention with end-to-end training, the LSTM-MSA model achieves improved accuracy, robustness, and feature extraction capabilities. The effectiveness and advantages of the proposed method have been validated by extensive experiments on different datasets and comparisons against representative machine learning algorithms. These innovations have implications for enhancing human-machine interaction and healthcare applications. Further research could explore advanced deep learning techniques, such as reinforcement learning, to further enhance EMG signal classification performance.

2) Experiment 2: EMG Transfer Learning in Direct Classification: In this experiment, we investigate the transferability of a trained EMG classification model. The model is initially trained on one person's data and then tested on another person's data. The goal is to assess whether a model trained on one person can effectively classify EMG signals from a different person.

3) Experiment 3: EMG Continuous Motion Classification: In this experiment, our focus is on classifying EMG signals based on continuous motion patterns. The objective is to study the performance of different classification algorithms in accurately categorizing EMG signals generated during continuous motion. To achieve this, additional processing and feature extraction techniques are applied to extract relevant features from the EMG signals.

4) Experiment 4: EMG Transfer Learning in Motion Classification: In this experiment, we explore the transferability of a trained EMG motion classification model. The model is initially trained on one person's data and then tested on another person's data. The goal is to assess whether a model trained on one person can effectively classify EMG signals from a different person.
5) Experiment 5: EMG Online Learning in Motion Classification: This experiment focuses on evaluating the performance of online learning techniques in the context of direct EMG signal classification. It aims to assess how well the model adapts to evolving EMG patterns in real time and maintains accuracy during continuous learning, addressing the challenges of model stability.
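The cross-subject protocol used in the transfer learning experiments (train on one person, test on another) can be sketched as a loop over ordered subject pairs. This is a minimal illustration under stated assumptions: `evaluate_transfer`, the majority-label placeholder classifier, and the toy data are hypothetical stand-ins, not the models or recordings used in the experiments.

```python
from collections import Counter
from itertools import permutations


def evaluate_transfer(data, fit, score):
    """Train on one subject and test on another, for every ordered subject pair.

    data maps a subject id to (features, labels); fit and score are supplied
    by the caller, so any classifier can be plugged in.
    """
    results = {}
    for train_id, test_id in permutations(data, 2):
        model = fit(*data[train_id])
        results[(train_id, test_id)] = score(model, *data[test_id])
    return results


# Placeholder classifier: always predicts the most common training label.
def fit_majority(features, labels):
    return Counter(labels).most_common(1)[0][0]


def accuracy(model, features, labels):
    return sum(label == model for label in labels) / len(labels)


# Toy data for two hypothetical subjects; real inputs would be EMG windows.
toy = {
    "A": ([[0.1], [0.2], [0.9]], ["rest", "rest", "grasp"]),
    "B": ([[0.8], [0.7], [0.2]], ["grasp", "grasp", "rest"]),
}
scores = evaluate_transfer(toy, fit_majority, accuracy)
```

Keeping `fit` and `score` as parameters means the same loop can serve both the direct classification and the motion classification variants of the transfer experiment.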
11: Concatenate the output of the input attention layer and the output attention layer:
12: $H_c = \mathrm{concat}(H_i, H_o)$
13: Apply fully connected layer and softmax function:
14:
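Steps 11 to 13 of the algorithm fragment can be illustrated for a single sample as follows. This is a sketch under assumptions: concatenation is taken along the feature axis, the fully connected layer is a plain affine map, and the names `classify`, `weights`, and `bias` as well as all numeric values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_i, h_o, weights, bias):
    """Concatenate the two attention outputs, apply one dense layer, then softmax."""
    h_c = h_i + h_o  # list concatenation plays the role of concat(H_i, H_o)
    logits = [sum(w * x for w, x in zip(row, h_c)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Hypothetical sizes: two features per attention output, three gesture classes.
h_i, h_o = [0.5, -1.0], [2.0, 0.0]
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
bias = [0.0, 0.0, 0.0]
probs = classify(h_i, h_o, weights, bias)
```

The predicted gesture is the index of the largest entry in `probs`; subtracting the maximum logit before exponentiation avoids overflow without changing the result.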

TABLE VI
T-TEST RESULTS FOR NUMBER GESTURE DATASET

TABLE VII
T-TEST RESULTS FOR NINAPRO DB5 DATASET

For the results in Table VIII and Table IX, we collected additional data during dynamic movements to verify whether our model remains effective amid variations. Among the compared models, the LSTM-MSA model consistently outperformed the RF, CNN, and SVM models in various performance metrics for both datasets. It achieved the highest F1 score, accuracy, recall, and precision. These results indicate that the LSTM-MSA model, which incorporates two attention layers, excelled in classifying EMG signals based on continuous motion patterns. Its superior performance can be attributed to its ability to capture the temporal dependencies present in the EMG signals. By incorporating attention mechanisms, the model can focus on relevant information and extract meaningful features from the continuous motion patterns, leading to accurate classification. Although the RF and CNN models also demonstrated good performance with competitive F1 scores and accuracy, the LSTM-MSA model consistently outperformed them across multiple metrics. The SVM model, by contrast, exhibited relatively lower performance than the other models, indicating its limitations in classifying EMG signals with continuous motion patterns. These findings emphasize the effectiveness of the LSTM-MSA model for classifying EMG signals in the context of continuous motion patterns. Its superior accuracy, recall, precision, and F1 scores highlight its potential as a robust and reliable approach for EMG signal classification.

TABLE X
RESULTS OF GRASPING AND RELEASING GESTURES

TABLE XI
RESULTS OF NUMBER GESTURE DATASET

TABLE XII

The results of Experiment 4 are presented for two datasets in Table XIV and Table XV. For the Two-Gesture Dataset, the models achieved high accuracy, precision, recall, and F1 scores across various training and testing combinations. When training on one individual and testing on another, the models consistently demonstrated good performance. For example, when training on Person A and testing on Person B, the models achieved an average testing accuracy of 91.23%, precision of 90.84%, recall of 91.52%, and F1 score of 0.9121. Similar patterns were observed for other training and testing combinations. For the Number Gesture Dataset, the models also exhibited notable performance in classifying continuous motion patterns. Although the dataset involved more individuals, the models still achieved competitive accuracy, precision, recall, and F1 scores. For instance, when training on Person A and testing on Person B, the models achieved an average testing accuracy of 88.73%, precision of 88.31%, recall of 89.11%, and F1 score of 0.8874. These results demonstrate the models' ability to effectively classify EMG signals related to number gestures in a continuous motion scenario. The consistently high accuracy, precision, recall, and F1 scores across different training and testing combinations suggest the models' robustness in capturing the distinct patterns and characteristics of continuous motion in EMG signals. This performance indicates their potential in real-world applications that require accurate classification of EMG signals during continuous motion.

TABLE XVI
ENHANCED PERFORMANCE THROUGH ONLINE LEARNING