An Upper-Limb Rehabilitation Exoskeleton System Controlled by MI Recognition Model With Deep Emphasized Informative Features in a VR Scene

The prevalence of stroke continues to increase with global aging. Based on the motor imagery (MI) brain–computer interface (BCI) paradigm and virtual reality (VR) technology, we designed and developed an upper-limb rehabilitation exoskeleton system (VR-ULE) in VR scenes for stroke patients. The VR-ULE system uses an MI electroencephalogram (EEG) recognition model with a convolutional neural network and squeeze-and-excitation (SE) blocks to obtain the patient’s motion intentions and control the exoskeleton during rehabilitation training movements. Because of individual differences in EEG, the frequency bands with optimal MI EEG features differ from patient to patient. Therefore, the weights of different feature channels are learned with SE blocks to emphasize the features of the informative frequency bands. The MI cues in the VR-based virtual scenes can improve the interhemispheric balance and the neuroplasticity of patients. They also make up for the disadvantages of current MI-BCIs, such as single usage scenarios, poor individual adaptability, and many interfering factors. We designed an offline training experiment to evaluate the feasibility of the EEG recognition strategy and an online control experiment to verify the effectiveness of the VR-ULE system. The results showed that the MI classification method with MI cues in the VR scenes improved the accuracy of MI classification (86.49% ± 3.02%); all subjects performed two types of rehabilitation training tasks under their own models trained in the offline training experiment, with the highest average completion rates of 86.82% ± 4.66% and 88.48% ± 5.84%. The VR-ULE system can efficiently help stroke patients with hemiplegia complete upper-limb rehabilitation training tasks and provides new methods and strategies for BCI-based rehabilitation devices.


I. INTRODUCTION
The prevalence of stroke continues to rise with global aging. Stroke patients with hemiplegia have neurological damage caused by the massive death of brain cells, resulting in varying degrees of upper-limb motion disorders [1]. Rehabilitation exoskeletons based on brain-computer interface (BCI) technology have become a more common rehabilitation treatment plan for stroke patients in different rehabilitation periods [2].
BCI technology realizes communication between the human brain and external electronic devices by decoding the features of the electroencephalogram (EEG) in the cerebral cortex. As a new means of expression and interaction for motor intention, BCI has been widely used in the rehabilitation training of stroke patients at different stages [3]. Barsotti et al. [4] designed a set of upper-limb exoskeletons based on MI-BCI to rehabilitate the grasping ability of poststroke patients. Soekadar et al. [5] designed a noninvasive brain/neural hand exoskeleton to assist stroke patients in daily motions such as eating and drinking. As a bridge for direct communication between the human brain and external devices, BCI has been widely used in stroke rehabilitation treatment [6].
As one of the main paradigms of BCI technology, MI has been widely used in rehabilitation therapy of cerebral motor function in stroke patients [7]. Through MI training, the motor nerve conduction pathways of stroke patients can be repaired or reconstructed. The MI of different motions is mapped to EEG changes in the corresponding regions of the cerebral cortex, and decoding different EEG features can distinguish different motions [8]. For example, in unilateral hand MI, the mu rhythms (8-13 Hz) and beta rhythms (13-30 Hz) of the sensorimotor area on the contralateral side of the brain decrease in power, while the mu and beta rhythms in the ipsilateral sensorimotor area increase in power. These phenomena are called event-related desynchronization (ERD) and event-related synchronization (ERS), respectively [7]. BCI technology uses various computer algorithms to classify these different ERD/ERS patterns and convert them into control signals for external devices. Tang et al. [8] proposed a BMI system based on ERD/ERS for upper-limb exoskeleton control, achieving high classification accuracy. Liu et al. [9] proposed an ERD/ERS-based BCI control system and verified its effectiveness by operating a two-arm multi-finger robot to complete tasks. Based on ERD/ERS, Li et al. [10] proposed a BCI hybrid control strategy that combines EEG and EMG signals to achieve flexible and stable control of a lower-limb dynamic exoskeleton.
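The power decrease (ERD) and increase (ERS) described above are commonly quantified as a relative band-power change between a reference window and an activity window. The following is a minimal sketch of that classical percentage measure, assuming the segments have already been band-pass filtered to the rhythm of interest; the function names `band_power` and `erd_ers_percent` and the toy data are illustrative assumptions, not taken from this paper.

```python
from statistics import fmean


def band_power(samples):
    """Mean squared amplitude of an already band-pass-filtered segment."""
    return fmean(x * x for x in samples)


def erd_ers_percent(reference, activity):
    """Relative band-power change (%) of the activity window vs. the reference.

    Negative values indicate ERD (power decrease); positive values indicate ERS.
    """
    r = band_power(reference)
    a = band_power(activity)
    return (a - r) / r * 100.0


# Toy data (hypothetical): the amplitude halves during imagery,
# so the band power drops to a quarter of the resting power.
rest = [2.0, -2.0, 2.0, -2.0]
imagery = [1.0, -1.0, 1.0, -1.0]
print(erd_ers_percent(rest, imagery))  # -75.0
```

A negative result on the contralateral channel together with a positive result on the ipsilateral channel would correspond to the ERD/ERS pattern described for unilateral hand MI.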
Convolutional neural networks (CNNs), as a representative algorithm of deep learning, have been widely used in computer vision, natural language processing, and other fields [11]. Conventional EEG data processing relies on the experience of researchers for complex data preprocessing and feature extraction. However, human-operated preprocessing and feature extraction reduce the accuracy and reliability of classification results [11], and the correlations between EEGs of different channels are easily ignored during the feature extraction process [12]. The CNN model can automatically extract features from the original input signal and obtain deeper and more distinguishable feature information through local receptive fields, weight sharing, and downsampling, which reduces the subjectivity and incompleteness of feature selection caused by human factors [13], [14], [15]. Amin et al. [16] proposed an attention-based CNN model to learn the importance of different features of MI data and obtained good results when they applied it to the BCI IV 2a dataset. Roy [17] proposed a multi-scale (MS) CNN that extracts the distinguishable features of several nonoverlapping canonical frequency bands of EEG signals at multiple scales for MI-BCI classification. Zhao et al. [18] proposed a multi-branch 3D-CNN classification strategy in which the 3D representation is generated by transforming EEG signals into a sequence of 2D arrays that preserves the spatial distribution of the sampling electrodes. Li et al. [19] proposed an end-to-end EEG decoding framework that takes raw multi-channel EEG as input and boosts decoding accuracy with a channel-projection mixed-scale convolutional neural network aided by amplitude-perturbation data augmentation. However, because of significant individual differences between subjects, such as the optimal time period and frequency band of ERD/ERS changes [20], [21], [22], conventional recognition methods that perform only shallow temporal or spectral feature learning on MI features are not good enough. Therefore, given the influence of individual differences among stroke patients, the refinement and weight assignment of deep features is another research direction that could improve the accuracy of deep learning models for MI-EEG decoding.
The Squeeze-and-Excitation Network (SENet), a channel-based attention mechanism, treats each feature channel as a whole and uses global information to automatically "learn" the importance of different feature channels during training, thereby suppressing relatively unimportant features and boosting the most discriminative and information-rich features to improve the accuracy of the model [23]. Sun et al. [24] proposed a CNN with sparse spectrotemporal decomposition (SSD) for MI-EEG classification, which adopted SE to adaptively recalibrate the channel direction. Zhang et al. [25] proposed an orthogonal CNN fused with SE blocks to perform feature recalibration across different EEG channels. Inspired by SE, we merged SE blocks into the CNN model, enabling the model to automatically obtain the weight of each feature channel (EEG features of different time periods and frequency bands), adaptively weight the feature maps generated by the original feature fusion layer, and increase the proportion of useful features in the current task. This approach addresses the problem that the optimal EEG features of different subjects are located in different frequency bands, and it trains an MI recognition and classification model with high recognition accuracy that is suited to a specific user.
Current rehabilitation strategies based on MI-BCI mainly improve the MI recognition accuracy of subjects by improving the feature extraction algorithms, neglecting the impact of MI signal strength on recognition accuracy [26]. Therefore, to maximize the activation of the subjects' motor nerves and improve their signal strength, virtual rehabilitation technology combining MI-BCI technology and virtual reality (VR) technology has been applied in the field of stroke rehabilitation [27]. VR technology addresses the poor immersion and the multiple external environmental interference factors (sound, light) of conventional rehabilitation training strategies, in which subjects observe cues on computer screens [28]. VR technology can provide an immersive training environment, improve the interhemispheric activation balance (IHAB), and enhance the cortical connectivity between the primary sensorimotor cortex (SM1), the primary motor cortex (M1), and the supplementary motor area (SMA) on both sides of the subject during motion induction. The VR scene can provide real-time feedback in each training task, achieve more comprehensive MI training, improve rehabilitation efficiency, shorten the rehabilitation period, and enhance the patient's initiative and adaptability in rehabilitation [28]. Jang et al. [29] demonstrated a shift in cortical organization of the affected limb from the ipsilateral hemisphere to the contralateral hemisphere after the VR intervention. Mekbib et al. [30] revealed that unilateral and bilateral limb mirroring exercises in an immersive virtual environment may stimulate mirror neurons (MNs) in the damaged brain areas and may facilitate functional recovery of the affected upper extremities post-stroke. However, current VR approaches use single-scenario rehabilitation, and their inter-individual adaptability is poor [27]. In addition, conventional rehabilitation training strategies lack visual feedback based on motor intention, although at the neural mechanism level such feedback can activate the mirror neuron system, promote brain plasticity changes and functional reorganization, and contribute to the recovery of motor function [31]. Therefore, we used virtual character arm motions to cue patients for simulated movements in VR, which activated the patient's mirror neurons. The patient then performed MI, and the computer decoded the EEG and converted it into control commands for the virtual character to achieve visual feedback of the motion intention. Patients continuously adjusted the MI process based on the feedback results.
In this paper, we developed a virtual reality upper-limb rehabilitation exoskeleton system (VR-ULE). VR-ULE uses an SE block-based CNN model and a VR scene to improve the MI recognition accuracy of stroke patients. VR-ULE includes a wearable exoskeleton hardware subsystem and an MI recognition software subsystem. The wearable exoskeleton subsystem is used to assist the patient's limb movement. The software subsystem for MI recognition is composed of a VR scene cue control module, a CNN+SE module, and an online hybrid control module. The VR scene cue module provides patients with visual cues and feedback with mirror operation intention. The CNN+SE module automatically analyzes the importance of EEG features in different time periods and frequency bands. The SE blocks emphasize important features and suppress unimportant features through adaptive weighting. The online hybrid control module receives the patient's MI signals and provides virtual feedback signals in the early and middle stages of the patient's motor neuron rehabilitation. In the later stages of rehabilitation, the online hybrid control module controls the upper-limb exoskeleton robotic arm to assist patients in muscle group strength rehabilitation training. We designed an offline training experiment to obtain the MI EEG data of different subjects and trained the CNN+SE classification model. We also designed an online control experiment to evaluate and validate our proposed rehabilitation strategy and training system. The contributions of this study include: (1) Based on the theory of neural plasticity, we independently designed an upper-limb rehabilitation exoskeleton system for stroke patients at different rehabilitation training stages to perform active movement and passive movement; (2) The SE module based on the channel attention mechanism in the CNN model was used to obtain the frequency band; (3) Based on the mirror neuron theory, we built three VR rehabilitation training scenes (lifting dumbbells, tasting fruits, and feeding pets) with virtual motion guidance to improve the immersion experience and avoid some environmental interference factors in conventional screen cues.

II. SE-BASED CLASSIFICATION STRATEGY OF MI

A. System Introduction
The VR-ULE consists of a wearable exoskeleton hardware subsystem and an MI recognition software subsystem. The system framework is shown in Fig 1.
The wearable exoskeleton subsystem consists of a self-designed and self-developed functional backpack and an upper-limb exoskeleton robotic arm. The functional backpack stores the hardware control modules, and the upper-limb exoskeleton robotic arm assists the patient's limb movement. The MI recognition software subsystem consists of a VR scene cue module, a CNN+SE module, and an online hybrid control module. The VR scene cue module provides patients with a strongly immersive virtual MI cue during the CNN+SE model training stage. The CNN+SE module amplifies the strong-response frequency band of the EEG during the patient's MI and accurately identifies and classifies the patient's motor intentions. The online hybrid control module receives the MI signal from patients during rehabilitation, provides feedback signals in different periods of the patient's rehabilitation, and controls the motion of the virtual character or the upper-limb exoskeleton robotic arm.
1) Wearable Exoskeleton Subsystem: We independently designed and produced a lightweight wearable exoskeleton hardware subsystem, as shown in Fig 2. The wearable exoskeleton hardware subsystem consists of a functional backpack, an upper-limb exoskeleton robotic arm skeleton, two power levers, a push lever drive board, a servo motor gear, a single-chip microcomputer, a power module pack, a lithium battery, a four-finger bionic hand, a disc damping shaft, and a universal joint. The functional backpack is made of 3D printing materials and is light, small, and comfortable. The inside of the backpack stores the push lever drive board, servo motor gear, single-chip microcomputer, power module pack, and lithium battery. Nylon shoulder straps and waist belts can be adjusted according to the patient's body shape. The upper-limb exoskeleton arm can simulate the motion of a healthy arm. The bionic arm skeleton fits the patient's arm, and the power lever provides motion thrust to assist the patient's limb movement, realizing a series of rehabilitation training motions with five degrees of freedom, including grasping, wrist flexion, elbow flexion, and shoulder abduction (Fig 3). The four joints of the exoskeleton arm are connected by detachable binding screws, which rotate smoothly once inserted into the holes. The four-finger bionic finger module is located at the end of the upper-extremity exoskeleton, and the knuckles use the short-range rope-driven motion mode of the servo motor to achieve a more natural finger grasping effect. The upper-limb exoskeleton arm is mounted on the backpack through the universal joint behind the right shoulder. The backpack is equipped with a lithium battery (12 V, 2400 mA), a lever driver board (L298N), and a microcontroller (ESP-WROOM-32, Shenzhen Yusong Chuangda Electronics Co. Ltd, China). The entire wearable exoskeleton hardware subsystem is sewn onto the inner nylon fabric and fixed to the patient's chest and waist by multiple elastic nylon straps, and all its drive levers are controlled by the lever driver board. The single-chip microcomputer has 4 MB of storage space and communicates with the computer through WiFi to receive the MI signal identified and classified by the CNN model with SE and convert it into feedback control signals that control the motion of the exoskeleton.

training, better stimulates the enthusiasm of patients for training, improves rehabilitation efficiency, and shortens the rehabilitation cycle. When a patient wears a VR head-mounted display (VIVE-P130, HTC, Inc.) for offline MI training, the VR scene only provides virtual motion cues, whereas during online MI training, the patient sees the virtual motion cues and then imagines the movement. After the MI signal is recognized and classified by the CNN model, the online hybrid control module outputs it back to the virtual scene to control the motion of the virtual characters. If the MI is wrong or fails, the virtual scene displays the recognition results as feedback to prompt the patient to continuously correct or strengthen the MI.
b) CNN+SE module: The EEG is a signal with spatiotemporal characteristics, and its feature extraction process needs to consider both temporal and spatial features [32]. Because there are significant individual differences in the frequency and spatial characteristics of EEG signals among different subjects during MI, incorporating the improved SE blocks into the CNN model helps train a CNN model tailored to a specific user, thereby improving the recognition accuracy of MI. To this end, we independently designed and built a CNN model with an embedded SE block for the correct identification and classification of the patient's MI signals, as shown in Fig 5.
The entire CNN+SE model consisted of 11 layers: the first layer was the input layer; the second and third layers were convolutional layers, which constituted the feature extraction part; the fourth layer was the feature fusion layer; the fifth layer was the SE blocks layer, which constituted the frequency-band channel-weight learning part; the sixth layer was the feature weighting layer, which was used to weight the output feature map of the fourth layer; the seventh layer was the average pooling layer, which was used for downsampling; the eighth and ninth layers were fully connected layers; the 10th layer constituted the classification part; and the 11th layer was the output layer, which we used to output the classification result.
Here, $C^2_m(j)$ is the output feature map of the C2 layer, where the superscript 2 denotes the layer index, the subscript $m$ denotes the $m$th feature map, and $j$ denotes the $j$th neuron; $k^2_m$ is the convolution kernel.

In the SE layer, the squeeze operation is first performed to compress the input feature map tensor in space; that is, global average pooling is performed on the input feature maps in turn, and the results are fully connected. The output is a [1 × 1 × 8] feature map $F^5$, which compresses and integrates the original eight feature maps and shields spatial distribution information. At the same time, by extracting the overall information of the eight feature channels, the underlying network can also obtain the global receptive field.

Then, the excitation operation is performed to learn the nonlinear interactions between the eight feature channels and limit the complexity of the model by using two fully connected layers with activation functions and no bias. These two fully connected layers reduce and then restore the dimension, respectively, forming a bottleneck structure. That is, $F^5$ is first multiplied by $R4M_1$ in a fully connected layer operation and then passed through a rectified linear unit (ReLU) layer, keeping the output dimension unchanged. The result is then multiplied by $R4M_2$ in a fully connected layer operation and passed through a ReLU, and so on up to $R4M_8$; finally, a [1 × 1 × 8] feature map $F^6$ is output through the sigmoid function.
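For reference, the squeeze and excitation steps, together with the channel-wise weighting that uses their output, can be written in the standard SE form of [23]. This is a sketch under assumed notation rather than the exact expressions of this model: $u_m$ (the $m$th input feature map of size $H \times W$), $z$, $s$, $W_1$, $W_2$, and the reduction ratio $r$ are symbols introduced here for illustration, with $z$ and $s$ playing the roles of $F^5$ and $F^6$.

```latex
% Squeeze: global average pooling over each of the eight feature maps
z_m = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_m(i, j), \qquad m = 1, \dots, 8

% Excitation: bottleneck of two bias-free fully connected layers
s = \sigma\bigl( W_2 \, \mathrm{ReLU}( W_1 z ) \bigr), \qquad
W_1 \in \mathbb{R}^{(8/r) \times 8}, \quad W_2 \in \mathbb{R}^{8 \times (8/r)}

% Weighting: channel-wise rescaling of the original feature maps
\tilde{u}_m = s_m \, u_m
```

Because $\sigma$ is the sigmoid function, each weight $s_m$ lies in $(0, 1)$, so the weighting can only attenuate a feature channel relative to the others, never amplify it beyond its original scale.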
Feature weighting layer (R7): This layer performs channel-wise multiplication using the weights obtained by the excitation operation to adaptively weight the original eight feature maps channel by channel. That is, it multiplies the eight feature maps initially input to the SE layer by the eight weights in $F^6$. Finally, eight feature maps of size [40 × 75] are obtained to achieve feature weighting.

Pooling layer (R8): This layer performs average pooling of the output of the R7 layer in 5 × 5 regions with a stride of 5, and the output is eight feature maps of size [8 × 15].

Fully connected layer (F9): This layer is a fully connected layer. The eight feature maps output by the R8 layer are fully connected to obtain eight feature maps of size [120 × 1]. Here, $k^9_m$ is the [1 × 1] convolution kernel and $b^9_m(j)$ is the bias.

Fully connected layer (F10): This layer fully connects the eight feature maps output from the F9 layer to form a classification part of size [960 × 1], containing 200 neurons. Here, $\omega^{10}_i(p)$ is the connection weight from the neurons in the F9 layer to the neurons in the F10 layer, and $b^{10}(j)$ is the bias.
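The shapes quoted for the weighting, pooling, and fully connected layers can be checked with a small pure-Python sketch. It assumes the standard SE structure (global average pooling, then a bias-free bottleneck with ReLU and sigmoid), random weights, and a hypothetical bottleneck width of 2; none of these values or function names come from the paper, and the code is a shape check rather than the authors' implementation.

```python
import math
import random


def squeeze(maps):
    """Global average pooling: one scalar per feature map."""
    return [sum(map(sum, m)) / (len(m) * len(m[0])) for m in maps]


def dense(weights, vec):
    """Bias-free fully connected layer; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z, w_down, w_up):
    """Bottleneck FC -> ReLU -> FC -> sigmoid: one weight per channel."""
    hidden = [max(0.0, h) for h in dense(w_down, z)]
    return [1.0 / (1.0 + math.exp(-o)) for o in dense(w_up, hidden)]


def scale(maps, s):
    """Channel-wise multiplication of each feature map by its weight."""
    return [[[s_m * x for x in row] for row in m] for m, s_m in zip(maps, s)]


def avg_pool(m, k=5):
    """Non-overlapping k x k average pooling (stride k)."""
    return [
        [
            sum(m[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(0, len(m[0]) - k + 1, k)
        ]
        for r in range(0, len(m) - k + 1, k)
    ]


rng = random.Random(0)
channels, height, width = 8, 40, 75  # sizes quoted in the text
reduced = 2                          # bottleneck width: an assumption
maps = [[[rng.gauss(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)]
w_down = [[rng.gauss(0, 1) for _ in range(channels)] for _ in range(reduced)]
w_up = [[rng.gauss(0, 1) for _ in range(reduced)] for _ in range(channels)]

s = excite(squeeze(maps), w_down, w_up)          # eight channel weights
pooled = [avg_pool(m) for m in scale(maps, s)]   # eight maps of 8 x 15
flat = [x for m in pooled for row in m for x in row]

print(len(s), len(pooled[0]), len(pooled[0][0]), len(flat))  # 8 8 15 960
```

The flattened length of 960 equals 8 maps × 120 values, which matches the [120 × 1] and [960 × 1] sizes given for the F9 and F10 layers.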
Output layer (O11): This layer is the output layer, containing two neurons that represent a binary classification problem. Here, $\omega^5(i)$ is the connection weight from the neurons in the F10 layer to the neurons in the O11 layer, and $b^5(j)$ is the bias.
c) Online hybrid control module: The online hybrid control module converts the classification signal identified by the MI recognition software subsystem into a VR-scene character motion control signal or an exoskeleton motion control signal. First, the VR scene control module in the MI recognition software subsystem randomly generates left- and right-hand MI motion cues, and the stroke patient then attempts MI for a certain time. The trained CNN+SE model acquires the subject's MI EEG data and performs identification and classification. The online hybrid control module interprets the classification results according to the training task and converts them into a continuous control signal output. The control-flow diagram of the VR-ULE system is shown in Fig 6.
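The routing performed by the online hybrid control module can be sketched as a small dispatch function. This is an illustrative sketch only: the `Stage` enum, the `to_command` function, and all command strings are hypothetical names, and the exoskeleton mapping (left-hand MI performs the task action, right-hand MI does not) is an assumption read from the description of the exoskeleton control test.

```python
from enum import Enum


class Stage(Enum):
    VR_FEEDBACK = "vr_feedback"   # early and middle rehabilitation stages
    EXOSKELETON = "exoskeleton"   # later rehabilitation stage


def to_command(predicted, cued, stage):
    """Map one classification result to a (target, action) pair.

    predicted and cued are "left" or "right"; action names are hypothetical.
    """
    if predicted not in ("left", "right") or cued not in ("left", "right"):
        raise ValueError("classes must be 'left' or 'right'")
    if stage is Stage.VR_FEEDBACK:
        # A correct MI moves the avatar's arm; otherwise the scene shows
        # the recognition result so the patient can correct the MI.
        if predicted == cued:
            return ("avatar", f"move_{predicted}_arm")
        return ("avatar", "show_recognition_result")
    # Exoskeleton stage: left-hand MI triggers the task action,
    # right-hand MI leaves the exoskeleton still (assumed mapping).
    return ("exoskeleton", "perform_task" if predicted == "left" else "hold")


print(to_command("left", "left", Stage.VR_FEEDBACK))    # ('avatar', 'move_left_arm')
print(to_command("right", "left", Stage.VR_FEEDBACK))   # ('avatar', 'show_recognition_result')
print(to_command("right", "right", Stage.EXOSKELETON))  # ('exoskeleton', 'hold')
```

Keeping the stage as an explicit argument mirrors the idea that the same classifier output drives either virtual feedback or the exoskeleton, depending on the rehabilitation period.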

III. EXPERIMENT
To test the effectiveness of the MI classification strategy based on the combination of VR and SE blocks in our proposed VR-ULE, we designed offline training experiments and online control experiments. In the offline training experiment, we trained two types of CNN+SE models for each subject, one cued by the conventional experimental scene and one cued by the VR scene, for comparative verification. In the online control experiment, we first selected, for each subject, the highest-accuracy classification model trained in the offline experiment. Then, analogous to the rehabilitation of brain motor neurons in the early rehabilitation stage of stroke patients, the subjects performed MI based on the VR scene to control the virtual characters to complete the corresponding virtual tasks. In addition, analogous to the upper-limb muscle group strength training of stroke patients in the later stage of rehabilitation, the subjects independently performed MI according to the task requirements to control the exoskeleton system and complete the corresponding tasks. Finally, the completion results of the two types of analogy experiments were evaluated. We also chose three methods, conventional CNN [11], MRA+LDA [33], and CSP+SVM [34], to train MI recognition models on the same training set, and these models were then tested on the same test set.

A. Subjects and Dataset Preparation
For the experiment, we recruited 20 healthy subjects (age: 22 ± 1.21 years), all right-handed (as assessed by the Edinburgh Handedness Questionnaire) [35].At the same time, we also recruited one mild stroke patient and one moderate stroke patient to participate in the experiment.All the subjects participated in the EEG experiment for the first time and were not told any experimental hypotheses.Each subject signed an informed consent form before the experiment.The experimental procedure was reviewed and approved by the Human Ethics Review Committee of Zhejiang University of Technology.
EEG signals were acquired with the ActiveTwo 64-channel EEG signal acquisition system (BioSemi, Netherlands). Twenty-three channels of EEG data (Fz, FC3, FC1, FCz, FC2, FC4, FC6, C5, C3, C1, Cz, C2, C4, C6, CP3, CP1, CPz, CP2, CP4, P1, Pz, P2, POZ) were collected. The reference electrode was placed at the mastoid of the left ear; the ground electrode was replaced by two independent electrodes, common-mode sense (CMS) and driven right leg (DRL). Before placing the electrodes, a conductive gel was used to reduce the impedance between the electrodes and the scalp. The sampling frequency was set to 250 Hz, the cutoff frequency of the high-pass filter was 0.05 Hz, the cutoff frequency of the low-pass filter was 200 Hz, and the power frequency notch was 50 Hz. After placing all the electrodes, the subjects sat in front of the computer screen and put their hands on the table naturally. The subjects were asked to avoid blinking and unnecessary head or body motions as much as possible. The collected data were divided into a training set, validation set, and test set in a 3:1:1 ratio. The screen was blank for the first 2 s, and then a "+" character appeared in the center of the screen to remind the subjects that the trial was about to start. From 3 s to 6 s, the "+" character on the screen changed to a randomly generated leftward or rightward arrow, and the subjects imagined left-hand or right-hand movement according to the direction of the arrow. There was a random interval of 2 s to 5 s between trials and a 3-min rest after every 20-trial set to avoid fatigue.

B. Offline Training Experiment
The sequence diagram of the VR test is shown in Fig 7B-b. Each trial lasted for 8 s. The screen was blank for the first 2 s, and then the word "ready!" appeared in the center of the VR scene to remind the subjects that the trial was about to start. From 3 s to 6 s, a left- or right-hand movement randomly appeared in the VR scene. The subjects imagined left-hand or right-hand movement according to the body movements in the VR scene. There was a random interval of 2 s to 5 s between trials and a 3-min rest after every 20-trial set to prevent fatigue. Each subject was tested with one set of data, including 200 non-VR trials and 200 VR trials, with seven subjects totaling seven sets of data and 14 parts.
The data of each subject's non-VR test and VR test were cropped. After removing the data corresponding to the subjects' rest periods, the EEG data in the frequency band of 7 Hz to 31 Hz were obtained by filtering, and the extracted EEG data were further separated into the frequency bands 7-10, 10-13, 13-16, 16-19, 19-22, 22-25, 25-28, and 28-31 Hz. The window length of the data segment was defined as 7.5 s. The CNN+SE preprocessing code took the 3 s after the MI cue as the model input, so each input sample was a matrix of 22 channels × 750 sampling time points (3 s time period × 250 Hz sampling rate), and each subject finally had 200 matrices of size [22 × 750] for each of the non-VR test and the VR test. Finally, the 200 matrices of size [22 × 750] for each subject's non-VR test and VR test were randomly divided into five parts: three were used to train each subject's MI model, one was used as a validation set during training, and one was used as a test set for evaluating the trained model. The EEG of the C3 and C4 channels in all trials of each subject was subjected to the superposition average calculation (ERD/ERS). The sequence diagram and brain topography of ERD/ERS were observed, and the times at which the ERD/ERS pattern appeared and ended in each trial were recorded.
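The bookkeeping in this preprocessing step (eight 3-Hz sub-bands, a 750-sample crop, and a 3:1:1 split of 200 trials) can be reproduced with a short sketch. The helper names, the fixed random seed, and the assumption that the cue occurs 3 s after trial onset are illustrative choices, not details confirmed by the paper; the filtering itself is omitted.

```python
import random

FS = 250                                 # sampling rate in Hz (from the text)
BAND_LO, BAND_HI, BAND_WIDTH = 7, 31, 3  # 7-31 Hz split into 3-Hz sub-bands


def sub_bands(lo=BAND_LO, hi=BAND_HI, width=BAND_WIDTH):
    """Return the (low, high) edges of each sub-band in Hz."""
    return [(f, f + width) for f in range(lo, hi, width)]


def crop_indices(cue_time_s, duration_s=3.0, fs=FS):
    """Sample indices [start, stop) of the window following the MI cue."""
    start = round(cue_time_s * fs)
    return start, start + round(duration_s * fs)


def split_3_1_1(n_trials, seed=0):
    """Randomly split trial indices into train/validation/test at 3:1:1."""
    idx = list(range(n_trials))
    random.Random(seed).shuffle(idx)
    fold = n_trials // 5
    return idx[: 3 * fold], idx[3 * fold: 4 * fold], idx[4 * fold:]


bands = sub_bands()
start, stop = crop_indices(cue_time_s=3.0)   # cue at 3 s is an assumption
train, val, test = split_3_1_1(200)

print(len(bands), bands[0], bands[-1])       # 8 (7, 10) (28, 31)
print(stop - start)                          # 750
print(len(train), len(val), len(test))       # 120 40 40
```

With 200 trials per condition, the 3:1:1 split yields 120 training, 40 validation, and 40 test matrices per subject.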
We compared this model with three other types of models, CNN, MRA+LDA, and CSP+SVM, which we trained on the same training set and evaluated on the same test set.

C. Online Control Experiment
The online control experiment had two parts: the VR scene online cue control test and the exoskeleton online control test. The high-accuracy MI model trained for each subject in the offline training experiment was used to generate the control signal input of the VR scene and the exoskeleton in the online test. The experimental scenario is shown in Fig 8. In the online control test of the VR scene, the subjects did not wear the exoskeleton equipment, only the VR head-mounted display and the EEG cap, and sat in front of the screen with their hands resting on the table. The screen mirrored the view of the head-mounted display in real time. In the VR scene, the MI instruction "Please move your left hand" or "Please move your right hand" appeared once every 10 s. At the same time, the virtual characters in the VR scene made the same actions to guide the subjects, and the subjects performed the corresponding MI according to the instructions. The PC saved the EEG data collected from 2 s before to 5.5 s after the MI instructions. The data processing code preprocessed the saved data and input them into the MI model trained in the offline training experiment for classification. The classification result was converted into a control signal and input to the VR scene control module to move the virtual character's arm in the VR scene and complete the corresponding task. Each subject performed 20 MI tasks with the left hand and 20 with the right hand for each type of scene, so a total of 120 MI tasks were performed.
In the exoskeleton control test, the subjects wore the exoskeleton on the right hand and sat in front of the table with the EEG cap on. The left-hand and right-hand MI EEG signals were mapped to control signals that made the exoskeleton perform the task action or not, respectively. The subjects made an MI trial every 8 s according to the task requirements, and there was an 8-s rest between trials. All EEG data were saved on the PC during the test. As in the VR test, the data processing code preprocessed the saved data and input them into the MI model trained in the offline training experiment. The classification results were converted into control signals and input to the exoskeleton control module to control the motion of the exoskeleton. A total of 30 MI trials were designed as a full test.

A. Public Dataset Results
To verify the effectiveness of the SE blocks in the SE-VR-based MI classification strategy, we applied the model to public dataset 1 of BCI Competition IV for training and verification and compared it with CNN, MRA+LDA, and CSP+SVM. For this dataset, each of seven subjects selected two types of movement from left hand, right hand, and foot and performed 100 MI trials. EEG data were recorded from 64 EEG channels for each subject at a sampling rate of 1000 Hz. More details on the experimental paradigm corresponding to this dataset can be obtained from the following website: https://www.bbci.de/competition/iv/desc_1.html.
The classification results of the four MI classification models for different subjects in the public dataset are shown  in Table I.The average classification accuracy of the four MI classification models was CNN+SE 87.53% ± 1.07%; CNN 83.32% ± 1.04%; MRA+LDA 84.55% ± 1.62%; and CSP+SVM 83.02% ± 2.07%.

B. Results of Offline Training Experiment
ERD/ERS Analysis: After the EEG data of the C3 and C4 channels of each subject in all tests were separated, the superimposed average of the ERD/ERS phenomenon was calculated [36]. The ERD/ERS time course from 0 to 5 s and the brain topography from 0 to 6 s for a single trial are shown in Fig 9.

Analysis of the classification results of the non-VR tests and VR tests: After the four classification models were trained on each subject's non-VR and VR test EEG data, the model with the best convergence was selected and the classification test was performed. The resulting classification accuracy is shown in Table II. The data in the table show that, with the same classification model, MI cues provided by the VR scene yielded higher classification accuracy than a single screen cue, and that CNN+SE achieved higher classification accuracy than CNN, MRA+LDA, and CSP+SVM (non-VR: 78.13% ± 2.60%; VR: 86.72% ± 2.99%). In addition, for the CNN+SE model with VR, the classification accuracies of the two patients were lower than the average classification accuracy of the 20 healthy subjects (86.72% ± 2.99%), and the recognition accuracy of Patient 1 (86.44%) was higher than that of Patient 2 (81.94%).
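For readers unfamiliar with the ERD/ERS measure used in the analysis above, the sketch below shows the commonly used relative band-power definition, ERD% = (A - R) / R × 100, where R is the power in a reference interval and A is the power during MI. Whether [36] uses exactly this form is an assumption, and the function names and toy signals are hypothetical.

```python
def band_power(samples):
    """Mean squared amplitude of an already band-pass-filtered signal."""
    return sum(x * x for x in samples) / len(samples)


def erd_percent(reference, activity):
    """Relative power change (%) of the activity interval vs. the reference.

    Negative values indicate ERD (power decrease); positive values indicate ERS.
    """
    r = band_power(reference)
    a = band_power(activity)
    return (a - r) / r * 100.0


# Toy example: the amplitude halves during MI, so the power drops to a quarter.
rest = [2.0, -2.0, 2.0, -2.0]
imagery = [1.0, -1.0, 1.0, -1.0]
erd = erd_percent(rest, imagery)
```

In practice the power would be averaged over trials for channels such as C3 and C4 before the ratio is taken, which corresponds to the superimposed average described in the text.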
To better evaluate the classification accuracy of the four classification models, three average indicators (precision, recall, and F score) are reported in Table III. Based on the data in Table II and Table III, we conducted an ANOVA on the two types of MI cues (VR or non-VR), the four types of classification models (CNN+SE, CNN, MRA+LDA, and CSP+SVM), and the two types of MI classes (left or right hand) to evaluate their interactions and their impact on classification accuracy. The results indicate that when the confidence level is set to 95%, there is no interaction between the MI class and the classification model, and there is no interaction between the MI cue and the classification model (all p > 0.01). The classification model has a significant effect on classification accuracy (F = 63.984, p < 0.01), and the MI cue has a significant effect on classification accuracy (F = 152.328, p < 0.01), but the MI class has no significant effect on classification accuracy (F = 0.02, p > 0.01).
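The three indicators named above can be computed from the confusion counts of one class, as in the following sketch. It assumes that "F score" means the usual F1 score (the harmonic mean of precision and recall); the counts in the example are hypothetical and are not taken from Table III.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one class treated as positive.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts with left-hand MI as the positive class.
p, r, f = precision_recall_f1(tp=45, fp=5, fn=15)
```

Averaging these values over the two MI classes gives one summary value per model and subject, which is one plausible reading of the "average indicators" in the text.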

C. Results of Online Control Experiment
In the online experiment, the task success rate of each subject is shown in Table IV. The task success rate was defined as the percentage of tasks that were completed correctly. The online experimental results show that the success rate of virtual scene task 1 (lifting dumbbells) was higher than those of virtual scene tasks 2 (tasting fruits) and 3 (feeding pets), and that the success rate of the exoskeleton control task (88.48% ± 5.84%) was higher than those of the virtual scene tasks. There is no significant difference between the success rates of Patient 1 on the four types of tasks (87.50%, 85.00%, 80.00%, 86.67%) and the average success rates of the 20 healthy subjects (87.00% ± 4.78%, 85.25% ± 4.74%, 80.13% ± 4.71%, 88.83% ± 5.99%). The success rates of Patient 2 on the four types of tasks (Table IV) are lower than the corresponding average success rates of the 20 healthy subjects.

Our offline experimental results also show that, when the subjects attempted MI, VR cues were more helpful than monotonous screen cues for training a network model and gave higher classification accuracy. All four classification models were verified, among which CNN+SE obtained a classification accuracy of 86.49% ± 3.02%. The reason is that the VR scene gives the subjects a more immersive experience, avoids interference from many external factors, and makes it easier for the subjects to concentrate. In addition, the arm movements of the characters in the virtual scene guide the subjects to quickly generate corresponding responses, improving their IHAB while activating connections between more areas of the cerebral cortex and cueing patients to produce more pronounced MI EEG features. In related research on stroke rehabilitation, Sip et al. [38] applied the Virtual Mirror Hand 1.0 procedure to the treatment of hand functional recovery after stroke and compared it with classic mirror therapy, finding that applying VR to the rehabilitation of stroke patients was feasible. Nath et al. [39] developed a VR task library for upper-limb rehabilitation of poststroke patients and concluded that VR therapy can improve the clinical symptoms of chronic stroke patients.
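The task success rate defined in the online results is a simple percentage. The sketch below assumes 40 MI tasks per VR scene and 30 exoskeleton trials, as described in the experimental design; the per-task correct counts are hypothetical values chosen because they reproduce the rates reported for Patient 1.

```python
def success_rate(n_correct, n_total):
    """Percentage of tasks completed correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# (correct, total) per task type: three VR scenes, then the exoskeleton task.
# The correct counts are assumed, not taken from the paper's raw data.
counts = [(35, 40), (34, 40), (32, 40), (26, 30)]
rates = [round(success_rate(c, n), 2) for c, n in counts]
```

Under these assumptions a single additional error changes a VR scene rate by 2.5 percentage points, which gives a sense of the resolution of the per-subject figures.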

V. DISCUSSION
Our online control experiments showed that the average success rate of the exoskeleton control task was 88.48% ± 5.84%, which was higher than that of the virtual character arm movement tasks in the VR scenes. The reason is that the MI command in the exoskeleton control task uses real task actions to improve patients' perception and motion mechanisms [40]. Patients can perform more concrete MI based on the perceptual experience obtained, which can improve the model recognition accuracy.
In the offline training experiment, the classification accuracies of the two patients were lower than the average classification accuracy of the 20 healthy subjects (86.72% ± 2.99%), and the recognition accuracy of Patient 1 (86.44%) was higher than that of Patient 2 (81.94%). The reason is that different degrees of stroke cause varying degrees of damage to the patient's motor neurons, thereby affecting the patient's ERD/ERS patterns during MI and reducing the classification performance [41], [42].
One limitation of this study is that the subjects in our experiment lack diversity. In future studies, we will conduct more experiments on stroke patients of different ages, genders, and rehabilitation stages to verify the effectiveness of our proposed rehabilitation strategy.
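The healthy-subject mean quoted above (86.72%) and the overall CNN+SE accuracy quoted elsewhere in the paper (86.49%) can be reconciled with a short arithmetic check. The sketch assumes that the overall figure pools all 22 subjects (20 healthy subjects plus 2 patients); this is an inference from the numbers and is not stated explicitly in the text.

```python
# Reported values for the CNN+SE model with VR cues.
healthy_mean = 86.72        # mean accuracy of the healthy subjects (%)
n_healthy = 20
patients = [86.44, 81.94]   # Patient 1 and Patient 2 (%)

# Pooled mean over all subjects, weighting the healthy mean by its group size.
pooled = (healthy_mean * n_healthy + sum(patients)) / (n_healthy + len(patients))
pooled_rounded = round(pooled, 2)
```

The pooled value rounds to 86.49, which is consistent with the overall accuracy reported for CNN+SE under this assumption.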

VI. CONCLUSION
In this paper, based on the MI-BCI paradigm and VR technology, we designed and developed a VR-ULE that can be used for the rehabilitation of stroke patients with hemiplegia. The system obtains the patient's motion intention through the MI EEG identification strategy based on a CNN and SE blocks, and it controls the execution of VR-ULE rehabilitation training actions. The SE module compensates for the differences between subjects in MI frequency-domain characteristics. The MI indication based on the VR scene strengthens the MI EEG of the subjects and makes up for the shortcomings of the current MI-BCI rehabilitation strategies, such as a single rehabilitation scene, poor individual adaptability, and many external environmental interfering factors. Our results show that, compared with the conventional classification strategies, the proposed MI EEG recognition method (CNN+SE) can improve the MI classification accuracy. The VR-ULE system can more efficiently help stroke patients complete upper-limb rehabilitation training tasks through a more reasonable MI identification strategy and an immersive experience of VR scenes, all of which improve the patients' autonomous rehabilitation.

2) MI Recognition Software Subsystem: The MI recognition software subsystem consists of a VR scene cue module, a CNN module with SE, and an exoskeleton control module. It is used for the identification, classification, and control signal output of the patient's MI signal.
a) VR scene cue module: We designed and built three types of VR training scenes for MI: lifting dumbbells, tasting fruits, and feeding pets. These are used to provide patients with virtual MI cues (Fig 4). While ensuring the sense of immersion, it increases the interest in rehabilitation
c) Description of each network layer of CNN+SE:
Input layer (L1): 200 input sample matrices of size 22 × 750 per channel, $L_{N,T}$, where N = 22 is the number of EEG channels and T = 750 is the number of sampling time points in each channel.

Fig. 5. Framework diagram of the MI recognition model. The input EEG signal samples are divided into multiple frequency bands in multiple channels, and the weights of the different feature channels are learned by combining SE blocks to emphasize the informative frequency band features.
in the feature map, $k_m^2$ is the convolution kernel of size [22 × 1], and $b_m^2(j)$ is the bias.
Convolutional layer (C3): This convolutional layer performs temporal convolution on the input EEG through five convolution kernels of size [1 × 10]. Eight frequency band channels are extracted to output 40 [1 × 75] feature maps.
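The feature map sizes quoted for the convolutional layers can be checked with the standard output-length formula for a convolution. The sketch assumes no padding and, for the temporal convolution, a stride equal to the kernel width (10); the stride is not stated in this section and is inferred from the reduction of 750 samples to 75.

```python
def conv_output_length(n_in, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution along one axis."""
    return (n_in + 2 * padding - kernel) // stride + 1


# Spatial convolution: a [22 x 1] kernel collapses the 22 EEG channels.
spatial_out = conv_output_length(22, 22)

# Temporal convolution: a [1 x 10] kernel over 750 samples, assumed stride 10.
temporal_out = conv_output_length(750, 10, stride=10)
```

With a stride of 1 the same kernel would give 741 time points, so a stride of 10 (or an equivalent pooling step) is needed to arrive at the [1 × 75] maps described in the text.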

[1 × 10] and $b_m^2(j)$ is the bias.
Feature fusion layer (R4): This layer splices the 40 feature maps of size [1 × 75] output by the C3 layer of each frequency band channel to form a [40 × 75] feature map, giving a total of eight feature maps $R4M_c$ of size [40 × 75], one for each of the eight frequency bands.
SE blocks layer (SE): This layer takes as input the feature maps $R4M_c$ with a total size of [40 × 75 × 8].
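The channel weighting performed by an SE block on the frequency band feature maps can be illustrated with a minimal pure-Python sketch. It follows the generic squeeze-and-excitation recipe (global average pooling, two fully connected layers with ReLU and sigmoid, then channel-wise scaling) and is not the authors' implementation: the function name, the omission of biases, the weight matrices, and the tiny 2 × 2 maps are all assumptions made for the example.

```python
import math


def se_block(feature_maps, w1, w2):
    """Apply squeeze-and-excitation to a list of 2-D feature maps.

    feature_maps : one 2-D map (list of rows) per channel, e.g. per frequency band
    w1           : reduction weights, shape [hidden][channels] (biases omitted)
    w2           : expansion weights, shape [channels][hidden] (biases omitted)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Scale: reweight every value of a channel by that channel's learned weight.
    scaled = [[[v * s for v in row] for row in fmap]
              for fmap, s in zip(feature_maps, scales)]
    return scaled, scales


# Toy input: two channels of size 2 x 2 (the paper uses eight [40 x 75] maps).
maps = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[1.0, 1.0]]       # hypothetical reduction to one hidden unit
w2 = [[0.0], [1.0]]     # hypothetical expansion back to two channels
scaled, scales = se_block(maps, w1, w2)
```

In this toy case the second channel receives a weight close to 1 and the first a weight of 0.5, which mirrors how the block emphasizes the more informative frequency band for a given subject.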

Fig. 7. The offline training experimental scenario is shown in A, where (a) is training without VR and (b) is training with VR; the timing diagrams of one trial for the two types of experiments are shown in B, where (a) is the timing diagram of one trial without VR and (b) is the timing diagram of one trial with VR.
Each subject needed to complete 200 trials in each of the non-VR tests and VR tests throughout the offline training experiment (Fig 7), including 100 imagined left-hand movements and 100 imagined right-hand movements. The timing diagram of the non-VR test is shown in Fig 7B-a. Each trial lasted for 8 s.

Fig. 8. The online control experimental scenario, where (a) is the exoskeleton control test and (b) is the VR control test.

Fig. 9. ERD/ERS time course from 0 s to 5 s and EEG topography from 0 s to 6 s for left- and right-hand MI of all subjects in all trials.

As seen from Fig 9, the ERD/ERS pattern appeared in each trial in the period from 3.5 s to 4.5 s.

Analysis of the training results of the non-VR tests and VR tests: The two types of EEG data of each subject were used to train the four classification models; models with good convergence were obtained, and the training loss curve of each subject was recorded. The best-converging model came from Subject 06. The training loss curves of the top three models are shown in Fig 10.

Fig. 10. Training loss curves of the top three classification models trained on the VR test data of Subject 06.

TABLE I
CLASSIFICATION ACCURACY OF FOUR CLASSIFICATION MODELS FOR EACH SUBJECT IN THE PUBLIC DATASET

TABLE II
THE TEST CLASSIFICATION ACCURACY OF EACH SUBJECT'S NON-VR TEST AND VR TEST DATA IN THE FOUR CLASSIFICATION MODELS

TABLE III
THE TEST CLASSIFICATION PRECISION, RECALL, AND F SCORE OF THE VR TESTS DATA FOR EACH SUBJECT IN THE FOUR CLASSIFICATION MODELS

TABLE IV
THE SUCCESS RATE OF EACH SUBJECT'S FOUR TYPES OF ONLINE EXPERIMENTAL TASKS

Our designed offline training experiments (MI recognition and classification model training) and online control experiments (VR-scene character and exoskeleton arm movement control tasks) verify the effectiveness of our proposed VR-ULE system. The offline training experiment results show that adding SE blocks to the CNN can improve the accuracy of MI EEG classification on the public dataset, and we obtained a classification accuracy of 77.94% ± 2.65% on our own measured dataset. The reason is that the CNN learns the temporal and spatial features of the subject's MI, and the SE blocks perform feature weighting operations on the EEG data of the subjects in different frequency bands to learn the strong features of the MI EEG frequency band of each subject. The advantage of this method is that, while avoiding individual differences, the final classification result of the model is output based on the weights of all frequency bands, so that transiently missing a signal in one band of the subject during the online test does not affect the classification results. Relevant studies have obtained results consistent with ours. For example, Sun et al. [24] used a deep learning framework called SSDSE-CNN integrating the SE blocks for MI-EEG classification, and the highest classification accuracy obtained was 79.3% ± 6.9%. Li et al. [37] proposed a novel temporal-spectral-based SE feature fusion network for MI-EEG decoding, and the highest classification accuracy was 84.49% on the public dataset.