Enhancing EEG and sEMG Fusion Decoding Using a Multi-Scale Parallel Convolutional Network With Attention Mechanism

Electroencephalography (EEG) and surface electromyography (sEMG) have been widely used in the rehabilitation training of motor function. However, EEG signals have poor user adaptability and low classification accuracy in practical applications, and sEMG signals are susceptible to abnormalities such as muscle fatigue and weakness, resulting in reduced stability. To improve the accuracy and stability of interactive training recognition systems, we propose a novel approach called the Attention Mechanism-based Multi-Scale Parallel Convolutional Network (AM-PCNet) for recognizing and decoding fused EEG and sEMG signals. Firstly, we design an experimental scheme for the synchronous collection of EEG and sEMG signals and propose an ERP-WTC analysis method for channel screening of EEG signals. Then, the AM-PCNet network is designed to extract the time-domain, frequency-domain, and mixed-domain information of the EEG and sEMG fusion spectrogram images, and the attention mechanism is introduced to extract more fine-grained multi-scale feature information of the EEG and sEMG signals. Experiments on datasets obtained in the laboratory have shown that the average accuracy of EEG and sEMG fusion decoding is 96.62%. The accuracy is significantly improved compared with the classification performance of single-mode signals. When the muscle fatigue level reaches 50% and 90%, the accuracy is 92.84% and 85.29%, respectively. This study indicates that using this model to fuse EEG and sEMG signals can improve the accuracy and stability of hand rehabilitation training for patients.


I. INTRODUCTION
THE prevalence of cerebrovascular disease and the frequent occurrence of accidents have contributed to a rising population of individuals with paralysis [1]. For paralyzed patients, postoperative rehabilitation is an effective treatment that can help improve and restore motor function. Hand exercise rehabilitation training is essential when patients relearn daily movements [2]. The widely used single-mode electroencephalography (EEG) or surface electromyography (sEMG) signals cannot fully meet the requirements of effective patient control. Integrating sEMG and EEG for fusion recognition and decoding holds theoretical promise for improving on the classification accuracy of single-mode actions, thereby offering a novel approach to hand sports injury rehabilitation. This method aims to facilitate hand rehabilitation training and improve the overall effectiveness of rehabilitation interventions for patients.
The EEG signal is the reflection, on the surface of the cerebral cortex, of the activity of brain neurons, and it can be used to decode action information [3]. Given that EEG signals are not compromised in paralysis and stroke patients, brain-computer interface (BCI) technology can be used to control objects such as wheelchairs and prosthetic hands to restore patients' lost body function [4], [5]. Recent research has shown that BCI technology can help paralyzed patients, such as patients after stroke, with motor rehabilitation [6]. However, because the number of classifiable modes for a single EEG modality is limited, precise control of output devices is restricted. EEG signals are susceptible to external noise interference, and low classification accuracy and poor user adaptability often limit the application of EEG signals in clinical patient rehabilitation.
Multifunctional prosthetics are often used for patients with muscle function loss to restore their lost motor function. sEMG is a bioelectrical signal generated by neuromuscular activity and recorded from the surface of skeletal muscle through electrodes when the human body moves autonomously [7]. The sEMG signal contains rich neural information, so features that can control multiple action modes can be extracted from fewer channels. The sEMG signal plays a vital role in the control of modern mobile prosthetics [8] and rehabilitation robots [9]. It is important to acknowledge that certain limitations may arise due to variations among subjects and the specific application environment. These limitations include muscle fatigue arising from prolonged usage and the inability of subjects to generate consistent and sufficient sEMG power due to muscle weakness or disability.
Fusing multimodal signals is a feasible method to improve the accuracy and stability of classification. Combining EEG signals with sEMG signals [10] provides sufficient information for motion decoding. According to the level of information fusion, fusion methods can be categorized into three groups: data layer fusion, feature layer fusion, and decision layer fusion. The first group is data layer fusion, which directly fuses the EEG and sEMG signal data obtained from different acquisition sensors and then performs feature extraction and classification on the fused data. The obtained information has a certain degree of redundancy. Data layer fusion has the smallest information loss, but also the lowest fault tolerance and the worst anti-interference ability. The second group is feature layer fusion, where feature vectors are extracted from the EEG and sEMG signals and then fused. The fused features are finally used for the classification decision. Feature layer fusion extracts effective features from the various channels of EEG and sEMG data, preserving useful information while compressing it, which results in high accuracy. Yang et al. [11] proposed a method based on graph theory as a multimodal fusion strategy for EEG and sEMG, in which functional connectivity, often considered as the weight of edges, enhances the robustness and accuracy of hand motion recognition. The third group is decision layer fusion, which processes and classifies the EEG and sEMG signals separately before making a decision. It requires both EEG and sEMG signals to have independent decision-making capabilities and has the greatest information loss, since it ignores the synergistic complementarity between EEG and sEMG. Tryon et al. [12] proposed a decision layer fusion strategy based on EEG and sEMG sources, which achieved flexion-extension motion recognition and improved the accuracy and stability of the system.
At present, feature extraction plays a very important role in the research of EEG and sEMG signals. Ji et al. [13] proposed a feature extraction method based on the discrete wavelet transform (DWT) and empirical mode decomposition (EMD) to improve the effectiveness of EEG signals. Li et al. [14] proposed an effective feature fusion method, TS-SEFFNet, to enhance the temporal and spectral dependencies in MI-EEG. Zhu et al. [15] designed a network framework that combines CWT with AlexNet and collected sEMG signal data from various gesture actions to extract rich time-frequency domain features of sEMG signals.
With the continuous development of brain science technology, multiple researchers have proposed hybrid BCI systems that combine EEG signals and sEMG signals. This proposal compensates for the shortcomings of existing brain-computer interfaces. Li et al. [16] merged EEG and sEMG into parallel control inputs, extracted four time-domain features, and fed them into linear discriminant analysis (LDA). Combined with the sequential forward selection (SFS) algorithm to optimize performance, the highest recognition accuracy was 87.0%. Chowdhury et al. [17] used the correlation between band-limited power time-courses as the fusion feature of EEG and sEMG to classify hand movements. For the disabled patient group, the accuracy was 84.53 ± 4.58%. Shi et al. [18] proposed a multimodal enhanced fusion network based on a dense non-attention mechanism and introduced the joint attention structure, reaching an accuracy of 88.44%. However, existing methods all have some shortcomings. First, the coherence and functional coupling relationship between EEG signals and sEMG signals is ignored [19]. Second, feature extraction mostly relies on machine learning methods, which can result in the loss of some features [20].

II. METHODS
To combine the advantages of the single EEG and sEMG modalities in human-machine interaction and improve system performance, we study the fusion and recognition of EEG and sEMG signals based on an attention-mechanism multi-scale parallel convolutional neural network.

A. The Design of Synchronous Collection Scheme for EEG and sEMG Signals
We independently designed a synchronous acquisition scheme for EEG signals and sEMG signals, achieving

B. Screening of EEG Channels Based on ERP-WTC
Although BCI systems with a large number of electrodes can record more working features of brain regions, fusion motion recognition systems must consider the response of EEG electrodes to different types of stimuli and the coherence between EEG and sEMG signals [19]. We screen EEG channels to achieve the optimal combination of EEG and sEMG signal channels and to prepare for subsequent fusion at the feature layer.
1) ERP Analysis: ERP (event-related potential) refers to the potential changes in a brain area caused by applying a specific stimulus to the sensory system or a certain part of the brain, when the stimulus is given or withdrawn [21]. For the hand movements in the experiment, we use ERP topographic map analysis to select activated brain regions. The activation status at the spatial scale corresponding to the peak of the ERP waveform can be observed in the ERP 2D or 3D brain topographic map shown in Fig. 2. ERP 2D and 3D brain topographic maps can be used to select activated brain regions and detect faulty or noisy electrodes, so we preliminarily select the two activated regions of interest in the red box in Fig. 1(a). Based on the specific analysis of the ERP waveform curve in Fig. 2, six channels, FT9, F7, C3, FC6, F8, and F4, were preliminarily selected from the two activated regions.
2) WTC Analysis: In the EEG and sEMG signal fusion motion recognition system, the brain's motor cortex sends commands through the brain stem and the spinal cord along the motor nerve pathway to control the limbs and complete muscle movement [19]. Simultaneously, the limb signals are sent back to the cerebral cortex along the sensory nerve pathway for fusion analysis. This interaction between cerebral cortex activity and muscle movement can be evaluated through the consistency of EEG and sEMG. Thus, it is possible to screen the EEG channels most relevant to the sEMG signals of different muscle movements. Wavelet coherence analysis: spectral coherence [19] measures the degree of correlation of two signals in the frequency domain. The formula is as follows:

$$\mathrm{Coh}_{xy}(f) = \frac{|P_{xy}(f)|^{2}}{P_{xx}(f)\,P_{yy}(f)} \tag{1}$$

where $P_{xy}$ is the cross-spectral density of the EEG and sEMG signals, and $P_{xx}$ and $P_{yy}$ are the auto-spectral densities of x and y respectively, that is, their power spectral densities. The range of $\mathrm{Coh}_{xy}$ values is [0, 1], and the larger the value, the greater the degree of correlation between the EEG and sEMG signals at frequency f. The coherence analysis reflects the consistency of the EEG and sEMG signals. Informed by coherence analysis, the FT9, F7, and FC6 EEG channels are finally selected.
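As a rough illustration of the coherence measure described above, the following Python sketch estimates the magnitude-squared coherence between two signals with a Welch-style average over Hann-windowed segments. It is a Fourier-based stand-in, not the wavelet coherence used for Fig. 7, and the function name, segment length, sampling rate, and synthetic signals are all assumptions made for the example.

```python
import numpy as np

def magnitude_squared_coherence(x, y, fs, nperseg=256):
    """Welch-style estimate of |Pxy|^2 / (Pxx * Pyy) per frequency bin."""
    step = nperseg // 2                      # 50% overlap between segments
    window = np.hanning(nperseg)
    pxx = pyy = pxy = 0.0
    for s in range(0, len(x) - nperseg + 1, step):
        fx = np.fft.rfft(window * x[s:s + nperseg])
        fy = np.fft.rfft(window * y[s:s + nperseg])
        pxx = pxx + np.abs(fx) ** 2          # auto-spectrum of x
        pyy = pyy + np.abs(fy) ** 2          # auto-spectrum of y
        pxy = pxy + fx * np.conj(fy)         # cross-spectrum of x and y
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(pxy) ** 2 / (pxx * pyy + 1e-20)
    return freqs, coh

# Synthetic stand-ins for one EEG and one sEMG channel sharing a 20 Hz component.
rng = np.random.default_rng(0)
fs = 1000                                    # assumed sampling rate in Hz
t = np.arange(8 * fs) / fs
shared = np.sin(2 * np.pi * 20 * t)
eeg = shared + rng.normal(size=t.size)
semg = shared + rng.normal(size=t.size)
freqs, coh = magnitude_squared_coherence(eeg, semg, fs)
```

Channels could then be ranked by their coherence within a band of interest, which mirrors the idea of keeping the EEG channels most coherent with the sEMG channels.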

C. Extracting Synchronous Features of the EEG and sEMG Signals
We comprehensively consider the common feature information of EEG and sEMG signals and use time-frequency spectrogram images to show the oscillation behavior of the EEG and sEMG signals. By preserving as much of the multi-domain information of the EEG and sEMG signals as possible, the fusion of EEG and sEMG at the feature layer is achieved. We use the short-time Fourier transform [22], wavelet transform [15], and Stockwell transform [23] to perform time-frequency transformation on the EEG and sEMG signals in each channel.
1) Short Time Fourier Transform: Because sEMG and EEG are non-stationary signals, STFT is well suited to characterizing their internal characteristics. STFT divides the signal into smaller segments, computes the Fourier transform of each segment, and visualizes the frequency characteristics over time. The Fourier transform can be applied to the sliding window to obtain the following local spectra:

$$\mathrm{STFT}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j 2\pi f t}\, dt \tag{2}$$

where $x(t) \in L^{2}(\mathbb{R})$ is the original signal, $t, \tau \in \mathbb{R}$ refer to time, $f \in \mathbb{R}$ is the frequency, and $w(t) \in L^{2}(\mathbb{R})$ is the applied window function. The selected window in STFT is the Hanning window.
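The sliding-window computation described above can be sketched in a few lines of Python. This is a minimal discrete version with a Hann window; the window length, hop size, sampling rate, and test signal are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np

def stft_magnitude(x, fs, nperseg=128, hop=32):
    """Return (freqs, times, |STFT|) computed with a sliding Hann window."""
    window = np.hanning(nperseg)
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    frames = np.stack([x[s:s + nperseg] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (freq bins, frames)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (starts + nperseg / 2) / fs            # frame centres in seconds
    return freqs, times, spec

fs = 500                                           # assumed sampling rate in Hz
t = np.arange(1500) / fs
# Hypothetical test signal: 10 Hz for the first half, 40 Hz afterwards.
x = np.where(t < 1.5, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))
freqs, times, spec = stft_magnitude(x, fs)
```

A shorter window improves time resolution at the cost of frequency resolution, so the window length is the main parameter to tune for a given signal band.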
2) Continuous Wavelet Transform: The function ψ in the continuous wavelet transform is called the fundamental wavelet or mother wavelet. The continuous wavelet transform can be expressed as follows:

$$W_{x}(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{3}$$

In formula (3), a is the scale parameter, while b represents time or spatial position. The reciprocal of the scale, 1/a, corresponds to frequency ω.
We preserve the frequency domain information of EEG signals in the 8-30 Hz range and the effective frequency domain features of sEMG signals in the 10-200 Hz range. Because EEG and sEMG are collected synchronously, they remain synchronized in the time dimension after time-frequency transformation. We concatenate the EEG time-frequency image and the sEMG time-frequency image along the frequency dimension, with the three channels of sEMG on top of the three channels of EEG. The fused features of EEG and sEMG are saved as concatenated time-frequency images, as shown in Fig. 3.
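The band selection and frequency-axis concatenation described above can be sketched as follows. The helper names, the 1 Hz frequency grid, and the random spectrograms are hypothetical; only the band limits and the sEMG-above-EEG ordering come from the text.

```python
import numpy as np

def band_limit(spec, freqs, f_lo, f_hi):
    """Keep only the rows of a (freq, time) spectrogram inside [f_lo, f_hi] Hz."""
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[keep]

def fuse_spectrograms(eeg_specs, semg_specs, freqs):
    """Stack the sEMG channels above the EEG channels along the frequency axis."""
    semg_rows = [band_limit(s, freqs, 10, 200) for s in semg_specs]
    eeg_rows = [band_limit(s, freqs, 8, 30) for s in eeg_specs]
    return np.concatenate(semg_rows + eeg_rows, axis=0)

# Hypothetical input: 3 EEG + 3 sEMG channels, 1 Hz bins from 0 to 250 Hz, 40 frames.
freqs = np.arange(251.0)
rng = np.random.default_rng(0)
eeg_specs = [rng.random((251, 40)) for _ in range(3)]
semg_specs = [rng.random((251, 40)) for _ in range(3)]
fused = fuse_spectrograms(eeg_specs, semg_specs, freqs)
```

Because both modalities share the same time axis, only the row count changes; the number of frames stays the same in the fused image.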

D. Proposed AM-PCNet Structure
As shown in Fig. 3, we used the spectrogram images of the spliced EEG and sEMG signals as the input of AM-PCNet. We selected convolutional kernels of three different scales, namely 1×10, 10×1, and 5×5, to obtain the time-domain features, frequency-domain features, and time-frequency mixed-domain features of the input feature map and thus obtain richer network features [24].
1) The Design of the Time-Conv Block: The EEG and sEMG fusion spectrogram images contain a large amount of time-domain feature information, so a 1 × 10 convolutional kernel is first selected in this block to extract time-domain features. To further explore deeper time-frequency features, successive convolutional units on the time dimension are used. The Time-Conv block in Fig. 3 includes two successive Temporal Conv Units, and the structure of the units is shown in Fig. 3. Each unit starts from a 1 × 3 convolution kernel, with a convolution stride of (2,15). The Temporal Conv Unit convolves only along the time scale, which can extract the internal time-domain features of the EEG and sEMG fusion spectrogram images without damaging the EEG-sEMG signals in each channel. In addition, dropout and batch normalization operations are used in each unit to reduce overfitting. The Time-Conv block arranges all units in order and extracts deeper feature representations from basic shallow time-domain features. The number of units and the size of the kernel are verified by the related experiments that follow.
2) The Design of the Freq-Conv Block: We concatenate the spectrogram images of EEG and sEMG signals along the frequency dimension. We design the Freq-Conv block to extract frequency-domain features of EEG and sEMG signals while establishing long-range EEG channel dependencies. We select a 10 × 1 convolution kernel with a convolution stride of (2,15) and then perform dropout and batch normalization operations.
3) The Design of the TF-Conv Block: In the TF-Conv block of Fig. 3, a 5×5 convolutional kernel is first selected to extract the neighborhood features of each pixel in the spectrogram image. The time and frequency channels are mixed to extract the contextual features of the EEG and sEMG spectrogram images. In the TF-Conv block layout, there are two successive TF Conv Units, and the structure of each unit is shown in Fig. 3. Each unit starts from two successive convolution kernels of size 3 × 3, which perform convolution simultaneously in both the time and frequency domains. This extracts correlated time-frequency features between channels. The convolution stride is (2,15). The activation function is ReLU. In addition, dropout and batch normalization operations are used in each unit to reduce overfitting. The TF-Conv block arranges all units in order to further extract deep time-frequency feature information.
4) Attention Module: In the study of decoding sEMG-EEG signals, we aim to extract features with high discriminability and robustness. We introduce two attention models: SE (Squeeze Excitation) [25] and PSA (Pyramid Split Attention) [26].
We add the SE attention model to the Time-Conv block and Freq-Conv block, enabling the network to selectively amplify valuable time-domain and frequency-domain characteristic channels based on global information.
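The channel reweighting performed by an SE module can be sketched in plain NumPy as below. This follows the usual squeeze (global average pooling) and excitation (two fully connected layers with ReLU and sigmoid) steps; the weight matrices, channel count, and reduction ratio are random, illustrative assumptions rather than trained parameters of AM-PCNet.

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation over a (channels, height, width) feature map."""
    squeezed = x.mean(axis=(1, 2))                  # squeeze: global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)         # excitation: FC + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid, one weight per channel
    return x * weights[:, None, None], weights

rng = np.random.default_rng(0)
channels, reduction = 16, 4                         # illustrative sizes
x = rng.normal(size=(channels, 8, 20))
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))
y, weights = se_attention(x, w1, w2)
```

Each channel is scaled by a single scalar in (0, 1) computed from global statistics, which is why SE emphasizes whole feature channels rather than particular spatial locations.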
We add the PSA attention module to the TF-Conv block. First, each group is convolved with a different convolution kernel size through group convolution to obtain receptive fields of different scales and extract information at different scales. The SPC module structure is shown in Fig. 4.
The number of groups is set to 4, so the group index i takes the values 0, 1, 2, 3.
Next, through the SE attention module, we extract the weighted values of each group of channels and obtain spatial information of multi-scale input spectrogram images.
We multiply the feature map of the corresponding scale with the attention vector at the channel-wise level. We concatenate the weighted feature maps along the channel dimension, and the PSA module output is shown in Fig. 4. Here S is set to 4, and Y is the output of the PSA module corresponding to the convolutional weighting of the different scale groups.
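The reweighting and concatenation steps described above can be sketched as follows. The sketch takes the S multi-scale branch outputs as given (the group convolutions themselves are omitted) and assumes, following the common description of PSA in [26], that the SE weights are normalized with a softmax across the S scales; that normalization, the function names, and all sizes except S = 4 are assumptions.

```python
import numpy as np

def se_weight(f, w1, w2):
    """Per-channel SE weights for one (C, H, W) branch."""
    squeezed = f.mean(axis=(1, 2))
    hidden = np.maximum(w1 @ squeezed, 0.0)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))

def psa_reweight(branches, params):
    """branches: list of S feature maps (C, H, W), one per kernel size."""
    z = np.stack([se_weight(f, *p) for f, p in zip(branches, params)])  # (S, C)
    e = np.exp(z - z.max(axis=0, keepdims=True))
    att = e / e.sum(axis=0, keepdims=True)       # assumed softmax across the S scales
    weighted = [f * a[:, None, None] for f, a in zip(branches, att)]
    return np.concatenate(weighted, axis=0), att  # concatenate along channels

rng = np.random.default_rng(0)
S, C, H, W = 4, 8, 6, 10                         # S = 4 as in the text; the rest is illustrative
branches = [rng.normal(size=(C, H, W)) for _ in range(S)]
params = [(rng.normal(size=(2, C)), rng.normal(size=(C, 2))) for _ in range(S)]
out, att = psa_reweight(branches, params)
```

The softmax makes the scales compete for each channel, so the module can favor whichever kernel size is most informative for that channel.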
The Time-Conv block and Freq-Conv block each focus on part of the information in the spectrogram image, while the information obtained in the TF-Conv block is a mixture of time and frequency channel features. We use the similarity between time-domain and frequency-domain features to weight the attention of the time-frequency convolution module [27], so that the time-frequency convolution module focuses on time-frequency features, the interaction between the three parallel network branches is improved, and the accuracy and stability of the network are enhanced.
Here K is set to 4, so i = 0, 1, 2, 3; $X^{T}_{i}$ is the output of the Time-Conv block, $X^{F}_{i}$ is the output of the Freq-Conv block, and $X^{TF}_{i}$ is the output of the TF-Conv block.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
5) The Design of Classification Block: We flatten the feature maps of the three blocks into one-dimensional feature vectors and then concatenate them.
We send the concatenated feature vectors Out to two fully connected layers connected in series.Finally, the Softmax function converts the output into classification probability.
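A minimal NumPy sketch of this classification step is given below: the three branch outputs are flattened, concatenated, passed through two fully connected layers, and converted to class probabilities with Softmax. The feature-map sizes, hidden width, random weights, and the ReLU between the two layers are assumptions; the four output classes correspond to the four hand movements.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())                     # subtract the max for numerical stability
    return e / e.sum()

def classify(time_feat, freq_feat, tf_feat, w1, b1, w2, b2):
    """Flatten and concatenate the three branch outputs, then apply FC -> FC -> Softmax."""
    out = np.concatenate([time_feat.ravel(), freq_feat.ravel(), tf_feat.ravel()])
    hidden = np.maximum(w1 @ out + b1, 0.0)     # first FC layer; ReLU is an assumption
    return softmax(w2 @ hidden + b2)            # second FC layer + Softmax

rng = np.random.default_rng(0)
time_feat, freq_feat, tf_feat = (rng.normal(size=(4, 3, 5)) for _ in range(3))
n_in, n_hidden, n_classes = 3 * 60, 32, 4       # illustrative layer sizes
w1, b1 = rng.normal(size=(n_hidden, n_in)) * 0.1, np.zeros(n_hidden)
w2, b2 = rng.normal(size=(n_classes, n_hidden)) * 0.1, np.zeros(n_classes)
probs = classify(time_feat, freq_feat, tf_feat, w1, b1, w2, b2)
```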

A. Acquisition Experiments of EEG and sEMG Signals
Eight subjects were recruited to collect their EEG and sEMG signals. They were all healthy, right-handed students with an average age of 23.5 years. Before the experiment, all the subjects signed a written informed consent form. The experiment was performed in a quiet room. The subject sat on a chair and waited for the start of the experiment. The subjects performed the four common hand movements shown in Fig. 5. The specific process of the experimental paradigm in Fig. 6 is as follows:
• 0-2 s: When the computer's prompt is heard, a "+" picture appears on the computer screen, reminding the subject to prepare to perform the corresponding action.
• 2-5 s: The subject sees the corresponding action picture in the middle of the computer screen, and then the screen goes black. The subject performs the corresponding unilateral upper-limb hand movement together with EEG motor imagination. The entire action is completed within 3 s.
• 5-8 s: The subject relaxes and returns to the position held before the experiment, getting ready for the following trial.
The EEG and sEMG signal preprocessing steps are as follows:
• EEG: First, the EEG signal is band-pass filtered and the 50 Hz power-line interference is removed. Then, channels are selected from the originally collected EEG signal and baseline interference is removed. Next, the EEG signal is segmented to obtain the corresponding action response signal.
• sEMG: First, the original sEMG signal is filtered to retain the signal information in the 10-200 Hz frequency band. Then, the action segment signal is extracted, and finally, the sEMG signal is downsampled to facilitate synchronous feature extraction with the EEG signal.
In the experiments, the first four rounds of 400 trials were used as the training set, and the 96 trials of the last round were used as the test set. TensorFlow 1.15.0 is used to build the AM-PCNet network. The loss function is cross-entropy. The dropout probability is 0.25. The Adam method is used to optimize the network. The learning rate is 0.0001. The batch size is 32, and the network is trained for 300 epochs. We used classification accuracy (ACC) [28], kappa coefficient (K) [29], F1 score (F1) [30], and Recall [28] to evaluate the proposed AM-PCNet.
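For concreteness, the four evaluation metrics named above can be computed from a confusion matrix as in the following sketch. Macro-averaging of Recall and F1 across the four classes is an assumption, since the averaging scheme is not stated, and the label vectors are made-up toy data.

```python
import numpy as np

def evaluate(y_true, y_pred, n_classes):
    """Accuracy, Cohen's kappa, macro recall, and macro F1 from integer labels."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                              # rows: true class, columns: predicted
    total = cm.sum()
    acc = np.trace(cm) / total
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (acc - expected) / (1 - expected)      # agreement beyond chance
    tp = np.diag(cm)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {"acc": acc, "kappa": kappa, "recall": recall.mean(), "f1": f1.mean()}

# Toy labels for four hand-movement classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 3, 0]
scores = evaluate(y_true, y_pred, 4)
```

In this toy case one of eight trials is misclassified, giving an accuracy of 0.875 and a kappa of about 0.833, which shows how kappa discounts the agreement expected by chance.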

B. Comparison of EEG Channels Selection
Fig. 7 is a time-frequency map of wavelet coherence coefficients for different channel combinations of EEG and sEMG, reflecting the differences in coherence coefficients between different EEG and sEMG channels. Fig. 7(a) shows a combination with strong coherence, and Fig. 7(b) shows a combination with weak coherence. The comparison of strong and weak coherence provides theoretical support for improving the accuracy and stability of EEG and sEMG fusion. Eventually, we chose the FT9, FC6, and F7 channels. We fuse the EEG channels (FT9 / F7 / FC6) and the EEG channels (C3 / CP1 / F8) with sEMG signals at the feature layer, and the classification results are shown in Table II. The average classification accuracy of the strongly coherent combination is 96.62%, while the average classification accuracy of the weakly coherent combination is 94.4%. After screening, the accuracy increased by 2.22%, and the Kappa value increased by 0.038. After EEG channel screening, each subject showed improvements in accuracy, Recall, F1, and Kappa. The result shows that the stronger the coherence between EEG and sEMG channels, the better the fusion classification performance. The results in Table II also indicate that reasonable screening of EEG channels can effectively enhance the coupling correlation between EEG and sEMG signals and improve the accuracy and stability of hand motion decoding.

C. Comparison of Feature Extraction for Different EEG and sEMG Signals
Table I summarizes the accuracy obtained by eight subjects with different time-frequency methods and classification methods. The three network structures used are:
• AM-PCNet: Multi-scale parallel convolutional network based on attention mechanism.
According to the comparative results, the best combination, which employs STFT to extract time-frequency features and the AM-PCNet network as the classifier, achieves an average accuracy of 96.62%. Table I indicates that the performance of STFT is slightly better than that of Stockwell and CWT, boosting accuracy by 3% and 0.78%, respectively, in the AM-PCNet network. The result suggests that, through the choice of window width, STFT adapts to changes in signal frequency content and improves the quality of the extracted features. In Table I, subject 5 performs best with the combination of CWT and TFCNN-LSTM, with an accuracy of 94.79%. The other subjects obtain the best classification accuracy with the combination of STFT and AM-PCNet. The standard deviation across subjects of the STFT+AM-PCNet combination in Table I is 3.19. The results indicate that the STFT+AM-PCNet method we selected is the most robust.

D. Network Parameter and Structure Comparison Experiment
Section II-D mentions that the size of the convolutional kernels and the number of units connected in series significantly impact the performance of the network structure. We select these two parameters to explore the optimal network structure [31]. The Temporal Conv Unit convolution size is 1 × K. The TF Conv Unit convolution size is K × K. K takes three values, i.e., 3, 5, and 7. We set the number of units in series to M and select M from 2, 3, and 4. We have considered a total of 9 network structures. We randomly choose two subjects for testing.
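The nine candidate structures can be enumerated as in the short sketch below. The dictionary keys are hypothetical names chosen for the example; the "M-K" label format matches the horizontal-axis labels used in Fig. 8(a).

```python
from itertools import product

unit_counts = [2, 3, 4]      # M: number of units connected in series
kernel_sizes = [3, 5, 7]     # K: Temporal Conv Unit uses 1 x K, TF Conv Unit uses K x K

configs = [
    {"label": f"{m}-{k}", "M": m, "temporal_kernel": (1, k), "tf_kernel": (k, k)}
    for m, k in product(unit_counts, kernel_sizes)
]
```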
In Fig. 8(a), it can be observed that when M is 2 and K is 3 (with a horizontal coordinate of "2-3"), the accuracy is the highest, reaching 98.96% and 97.92%, respectively. The accuracy gradually decreases as the number of series units increases, and overfitting occurs. Therefore, we choose the "2-3" parameter combination in the subsequent experiments.
To evaluate the classification effectiveness of each network branch in the model, we remove the other two network branches from the proposed model and compare decoding accuracy. The input images are EEG and sEMG spectrogram images (time-frequency domain). One comparison method uses the Time-Conv branch network, where the convolutional kernel is convolved only along the time axis of the input images. The other uses the TF-Conv branch network to perform convolution simultaneously over both time and frequency. The comparative results are shown in Fig. 8(b). Our proposed AM-PCNet network model exhibits the best performance, with an average accuracy of 96.6%. When using the Time-Conv branch network for classification, the average accuracy is 92.2%, while the TF-Conv branch network has an average accuracy of 91%. Compared with the independent Time-Conv and TF-Conv networks, the AM-PCNet designed in this paper combines the advantages of each branch network and incorporates attention-based interaction between branch modules. The results verify the effectiveness of the AM-PCNet network and the importance of module interaction.

E. Comparison of Attention Mechanisms
To evaluate the effectiveness of introducing attention mechanisms in the model, we compare decoding accuracy by removing one or two attention mechanisms from the proposed model. We conducted the following ablation experiments with M-PCNet as the baseline model. The EEG and sEMG feature fusion time-frequency images are used as input for the network.
Experiment 1: M-PCNet (Multiscale Parallel Convolution).
Experiment 2: M-PCNet + Squeeze Excitation.
Experiment 3: M-PCNet + Pyramid Split Attention.
Experiment 4: M-PCNet + Squeeze Excitation + Pyramid Split Attention (in TF-Conv Block).
Experiment 5: M-PCNet + Pyramid Split Attention + Squeeze Excitation (in TF-Conv Block).
The average accuracy results are presented in Table III. Fig. 9 displays the classification accuracy results of the ablation experiments for each subject. In the results, the average gesture recognition accuracy of the 8 subjects using the baseline method M-PCNet was 92.06%, and adding SE to M-PCNet improves the average recognition accuracy. From Table III and Fig. 9, we can see that the classification accuracy decreased by 2.74% after swapping the positions of the SE module and the PSA module. This is mainly because the SE attention mechanism is not effective at extracting features in the two-dimensional time-frequency domain. By using the grouped convolution principle of PSA, the most suitable convolution kernel size can be obtained, effectively extracting finer-grained multi-scale two-dimensional spatial information of EEG and sEMG fusion. The above results indicate that the introduction of attention modules enables the network to effectively extract fine-grained multi-scale spatial information from EEG and sEMG fusion, amplify valuable EEG and sEMG fusion feature channels, and establish longer-distance EEG and sEMG channel dependencies. For these reasons, the accuracy and stability of EEG and sEMG fusion recognition and classification are improved.

IV. DISCUSSIONS

A. Comparison of Different Decoding Methods
To demonstrate the advantages of our model, we conducted a comparative analysis between the AM-PCNet network recognition model and other models investigated in recent years.
Here is a concise introduction to these methods:
• DeepNet [32]: A deep model that uses more than three convolutional layers along the time dimension.
• EEGNet [33]: A compact convolutional neural network for EEG analysis that uses depthwise convolution and separable convolution to construct the network model.
• ShallowNet [32]: A deep learning model with two simple convolutional layers and one mean pooling layer.
• TCNet-Fusion [35]: A network structure that utilizes temporal convolutional networks, separable convolutions, depthwise convolutions, and layer fusion to improve the interactivity of classification systems.
• ATCNet [36]: A network structure that uses a multi-head self-attention method and temporal convolution to extract advanced temporal features.

B. Comparison of Different EEG and sEMG Fusion Methods
To demonstrate the advantages of our EEG and sEMG signal fusion method, we conducted a comparative analysis with other fusion methods investigated in recent years. Here is a concise introduction to these methods:
• DCA Fusion [37]: A feature layer post-fusion method, which first extracts the time-domain features of EEG signals and sEMG signals separately and then fuses and classifies them in the later stage of the feature layer.
• Decision-Level Fusion [12]: A decision-level fusion method for EEG signals and sEMG signals, which decodes and classifies the EEG and sEMG signals separately and then fuses the classification results at the decision level.
• GFSEs [11]: A method based on graph theory as a multimodal fusion strategy for EEG and sEMG. Functional connectivity, often considered as the weight of edges, enhances the robustness and accuracy of hand motion recognition.
We compared the proposed EEG and sEMG signal fusion decoding method with the three aforementioned fusion methods, and the results are shown in Table V.
Table V shows that, on the dataset collected in the laboratory, our proposed fusion method achieves the highest classification result among the compared EEG and sEMG signal fusion methods. Compared with DCA Fusion, which fuses in the later stage of the feature layer, our accuracy is improved by 5.99%. Compared with the Decision-Level Fusion method, our feature layer spectrum concatenation method for EEG and sEMG signals improves the accuracy by 4.17%. Compared with GFSEs, which uses functional connectivity node features between EEG and sEMG signals, our method focuses more on the common frequency-domain features of EEG and sEMG signals, with an accuracy improvement of 2.48%. In summary, our proposed ERP-WTC EEG channel screening method and early feature layer fusion using the common time-domain and frequency-domain features of EEG and sEMG signals achieve maximum expression of the common features of the signals. The final results also indicate that our proposed method has stronger capabilities in feature fusion and extraction.

C. Single-Mode Signal Classification and Decoding
We compare the decoding performance of single-mode EEG signals with sEMG signals for action classification.

D. EEG and sEMG Fusion Decoding Under Muscle Fatigue
When sEMG signals are used alone, muscle fatigue issues may occur due to differences in subjects and application environments.Due to muscle fatigue, signal quality deteriorates, and classification accuracy decreases.We conduct multimodal fusion analysis using EEG and sEMG signals to alleviate the adverse effects of muscle fatigue.
In this study, we use simulation methods to represent the degree of muscle fatigue. We study fatigued sEMG signals of varying degrees from 0% to 90%. As the fatigue level increases, the sEMG amplitude decreases, the variance of the amplitude increases, and its root mean square increases [38]. We simulated fatigue signals by reducing the amplitude of the sEMG signals and adding Gaussian noise, where reducing the amplitude simulated the exhaustion of the subjects.
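One possible way to implement this kind of simulation is sketched below. The linear mapping from fatigue level to amplitude scaling, the noise model, and the `noise_scale` parameter are all assumptions made for illustration; the exact scaling and noise settings used in the experiments are not specified here.

```python
import numpy as np

def simulate_fatigue(semg, level, noise_scale=0.1, rng=None):
    """Attenuate an sEMG segment and add Gaussian noise for a fatigue level in [0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    attenuated = (1.0 - level) * semg                 # assumed linear amplitude reduction
    noise_std = noise_scale * level * np.std(semg)    # assumed noise growth with fatigue
    return attenuated + rng.normal(0.0, noise_std, size=semg.shape)

rng = np.random.default_rng(0)
clean = rng.normal(size=2000)                         # stand-in for a recorded sEMG segment
fatigued = {lvl: simulate_fatigue(clean, lvl, rng=rng) for lvl in (0.0, 0.5, 0.9)}
```

Under these assumptions, a level of 0 returns the original segment unchanged, and higher levels yield progressively weaker and noisier signals for the fusion network to cope with.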
As the degree of muscle fatigue increases, the changes in the various indicators after EEG and sEMG fusion are shown in Table VI. The results indicate that the approach remains robust even with weak sEMG signals, reducing the impact of partial loss of motor function and exercise fatigue in subjects. The fusion classification network we have constructed effectively exploits the synergistic complementarity between EEG and sEMG signals to improve system stability and recognition accuracy.

V. CONCLUSION
We propose a new attention mechanism-based multi-scale parallel convolutional network (AM-PCNet) for identifying and decoding EEG and sEMG fusion signals. We save the selected EEG and sEMG signals as concatenated time-frequency maps and use AM-PCNet to extract their time-domain, frequency-domain, and mixed-domain information. We introduce attention mechanisms to effectively extract finer-grained multi-scale EEG feature information, improve the expression ability of the network model, and thus improve the accuracy and stability of signal fusion recognition.
We conduct experiments on the dataset collected in the laboratory to evaluate the effectiveness and generalization of the proposed method. Our proposed AM-PCNet network outperforms other state-of-the-art methods in accuracy, Kappa value, Recall, and F1 value. The experiments and discussion also show that our proposed method effectively improves recognition accuracy compared with single-mode decoding approaches, maintains decoding stability during muscle fatigue, and improves the accuracy and stability of the interactive recognition system.

Fig. 1. Synchronous collection of EEG and sEMG signals. (a) Distribution map of EEG electrodes. (b) The EEG and sEMG collection system. (c) Muscle position distribution map.

Fig. 3. The EEG and sEMG fusion strategy based on AM-PCNet. (a) Overview of the EEG and sEMG fusion strategy. (b) The structure of AM-PCNet.

3) Stockwell Transform: We use the Stockwell transform to extract time-frequency features from the EEG and sEMG signals. The Stockwell transform of a continuous signal x(t) is:
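In its standard form, with τ denoting the center of the Gaussian window and f the frequency (the symbols τ and f are assumptions here, and the notation used elsewhere in the paper may differ), the transform is commonly written as:

```latex
S(\tau, f) = \int_{-\infty}^{\infty} x(t)\,
             \frac{|f|}{\sqrt{2\pi}}\,
             e^{-\frac{(\tau - t)^{2} f^{2}}{2}}\,
             e^{-i 2 \pi f t}\, dt
```

The Gaussian window has a width proportional to 1/|f|, so low frequencies are analyzed with long windows and high frequencies with short ones.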

Fig. 4. The specific composition structure of the PSA attention module.

Fig. 7. Analysis of EEG and sEMG wavelet coherence. (a) Strong coherence between EEG and sEMG. (b) Weak coherence between EEG and sEMG.
Fig. 8. (a) Accuracy comparison of different network structures in AM-PCNet. The number before "-" represents the number of units and the number after "-" indicates the convolution kernel size. (b) Accuracy comparison of AM-PCNet, Time-ConvNet, and TF-ConvNet.
In this work, we propose the Attention Mechanism-based Multi-Scale Parallel Convolutional Network (AM-PCNet) for recognizing and decoding fusion signals to improve the effective control of patient rehabilitation training.

TABLE I COMPARISON OF CLASSIFICATION ACCURACY (%) BETWEEN DIFFERENT TIME-FREQUENCY TRANSFORMS AND BASELINE NETWORKS

TABLE II COMPARISON OF CLASSIFICATION PERFORMANCE OF DIFFERENT COMBINATIONS OF THE EEG AND SEMG CHANNELS

TABLE III COMPARISON OF AVERAGE PERCENTAGE CLASSIFICATION ACCURACY WITH DIFFERENT ATTENTION MECHANISMS

1.17%; introducing PSA in M-PCNet improves accuracy by 1.6%; simultaneously introducing both SE and PSA improves accuracy by 4.56%. In Table

Table IV compares the AM-PCNet model with several state-of-the-art models on the laboratory dataset. Compared with the ShallowNet method, the AM-PCNet method improves classification accuracy by 6.38% on average. Table IV shows that our proposed method outperforms other models among

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Table VII shows the classification performance of single-mode sEMG signals for each subject. Table VIII shows the classification performance of single-mode EEG signals for each subject under different EEG channel combinations. Table VIII indicates that the classification accuracy of the strongly coherent EEG combination is 5.34% higher than that of the weakly coherent combination, which again verifies the effectiveness of the EEG channel screening method. In Table VII and Table VIII, the action classification methods that use sEMG signals are more accurate than those that rely on EEG signals. This result indicates that sEMG signals contain more information related to unilateral upper limb movements, while EEG signals classify fine hand movements poorly. In Table VII, Table VIII, and Table IV, the classification results of single-mode signals were compared with those of the

TABLE VI COMPARISON OF CLASSIFICATION PERFORMANCE OF THE EEG AND SEMG FUSION UNDER MUSCLE FATIGUE

TABLE VIII CLASSIFICATION PERFORMANCE OF DIFFERENT EEG CHANNEL COMBINATIONS UNDER EEG SIGNAL MODALITY

In Table VI, the classification accuracy decreases as the degree of muscle fatigue increases. With 0% muscle fatigue, the accuracy of EEG and sEMG fusion is 96.62%. When the degree of muscle fatigue reaches 20%, the classification accuracy after fusion is 94.14% and the Kappa value is 0.922, indicating good recognition performance. At a simulated fatigue level of 50%, the recognition accuracy is 92.84%. When the degree of muscle fatigue reaches 90%, however, the recognition rate of EEG and sEMG fusion decreases significantly, with an accuracy of 85.29% and a Kappa value of 0.798.
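Accuracy and the K-score (Kappa value) quoted above can both be derived from a confusion matrix. The sketch below assumes the reported value is Cohen's kappa, which the text does not state explicitly; the function name `accuracy_and_kappa` is hypothetical, and the example counts are made up for illustration rather than taken from Table VI.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Observed agreement: fraction of samples on the diagonal (accuracy).
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals per class.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Usage with illustrative two-class counts (not data from the paper).
acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(round(acc, 2), round(kappa, 2))  # 0.85 0.7
```

Because kappa discounts the agreement expected by chance, it falls faster than accuracy when errors increase, which is consistent with the drop from 0.922 to 0.798 as the fatigue level rises.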