Introduction
Approximately 80% of stroke survivors suffer from motor dysfunction that affects one or both upper limbs, especially the coordination and flexibility of the hands [1]. Hand impairment is one of the major causes of functional limitations in individuals with post-stroke hemiparesis. Because the hand provides 90% of the motor function of the upper limb, hand impairment reduces the patient’s autonomy, hinders the performance of daily activities, and lowers the quality of life [2]. An adequate assessment of hand motor function impairment is necessary for establishing a realistic prognosis, planning customized rehabilitation interventions, and evaluating the effectiveness of those interventions [3].
In the clinical setting, hand motor function is typically assessed with standardized clinical scales and tests [4]. The Brunnstrom assessment (BA) scale [5] and the Fugl-Meyer assessment (FMA) scale [6] are the two most commonly used hand function assessment scales for stroke patients. The Swedish physiotherapist Brunnstrom developed the BA in the 1970s to assess movement disorders following central nervous system injuries. In her theory, stroke patients are classified into six stages of recovery. The hand section of the FMA scale contains seven types of hand assessment movements. Each movement is assigned a qualitative rating of 0, 1, or 2 depending on how well it is performed. However, because both scales are graded according to the therapist’s own experience, they are subjective, and both suffer from the “ceiling effect”.
The “ceiling effect” is widespread in the assessment of post-stroke upper extremity dyskinesia [7]: the scale is insensitive to changes in patients at the top end of recovery (i.e., the scale resolution is low for patients with good motor recovery) [8]. In this study, the “ceiling effect” is considered to occur when a patient’s assessment score exceeds 80% of the maximum score. Under this definition, approximately 80% of the patients in the experiments of this study are subject to the “ceiling effect”. Because the “ceiling effect” degrades the assessment accuracy, it should be addressed in practice.
To address these shortcomings, several studies have used sensors to measure the hand kinematics of stroke patients accurately and artificial intelligence methods to evaluate their hand motor function. Fang et al. asked patients to perform seven FMA-specific movements above a Leap Motion (LM) sensor and estimated the patients’ FMA scores from the range of each finger’s angular changes detected by the LM [9]. Hamaguchi et al. used the LM to record the finger angle changes of 24 stroke patients’ hands during group flexion and extension within seven seconds, calculated the peak angle and normalized peak velocity, and then used a support vector machine to classify the patients into six categories [10]. Li et al. designed a set of combined movements of the wrist and fingers, used the LM to measure the angle ranges of the patient’s fingers during the movements, and evaluated the patients by ensemble learning [11]. Song et al. designed a mobile phone-based automated Fugl-Meyer assessment system for stroke patients. Patients were asked to complete specific tasks while holding a mobile phone. The motion information collected by the phone was then combined with a decision tree method to evaluate the patient’s upper limb function [12]. Adams et al. built a virtual environment assessment system that assessed the patient based on the task completion time, the average speed of hand movement, and task scores. Spearman rank correlations showed a high and significant correlation between virtual world-derived measures and gold-standard assessments [13], [14]. Although the above studies used artificial intelligence methods to assess the hand motor function of patients directly, they all relied on manually extracted features, which may not be optimal.
Deep learning methods can extract features automatically. A typical deep learning network is the convolutional neural network (CNN). With the development of graph neural networks, graph convolution networks (GCNs), which extend CNNs to graphs of arbitrary structure, have received increasing attention and have been successfully used in various applications, such as image classification, document classification, and skeleton-based movement recognition [15]–[17]. In particular, Yan et al. proposed spatial-temporal graph convolutional networks for skeleton-based movement recognition and achieved a good classification result [18]. Hand joints can also be regarded as a graph structure, yet GCN-based hand motion assessment has not been explored.
Furthermore, besides kinematic signals, surface electromyography (sEMG) signals also reflect the hand motor function of stroke patients to a certain extent. Zhang et al. estimated the muscle strength by collecting sEMG signals and using a third-order polynomial fitting technique [19]. The muscle strength can reflect the motor function. Some studies have recognized that analyzing both hand kinematics and sEMG data in clinical trials gives a better understanding of muscle coordination in functional recovery. However, they did not study how to implement a multi-modality fusion evaluation and only studied healthy people rather than stroke patients [20]–[23].
Given the above observations, the main contributions of this paper can be summarized as follows:
To automatically extract practical features and make full use of the spatial position information of the human hand joints, we propose a hand assessment graph convolution network (HAGCN). The network includes the graph convolution in the spatial domain and the temporal convolution in the time domain. To the best of the authors’ knowledge, this is the first study of applying the GCN to assess the hand motor function.
Both the motion signal and the sEMG signal are analyzed, and a weighted decision fusion method is used to assess the hand motor function of patients with post-stroke hemiplegia, which improves the accuracy of the assessment. The Spearman’s rank correlation coefficients (SCs) between the assessment results of this study and the traditional scales (Brunnstrom and Fugl-Meyer) are 0.908 and 0.967, respectively, indicating a significant correlation between the proposed assessment and the traditional scale scores.
The therapists have refined the assessment results of 25 stroke patients facing the “ceiling effect”. These stroke patients have been assessed by the proposed algorithm as well. The SC between the score of this study and the refined assessment is 0.997, indicating that the “ceiling effect” in some traditional scales can be avoided.
The remaining parts of this study are organized as follows: Section II introduces the proposed experimental setup and the acquisition of multi-modality data. Section III presents details on the assessment framework based on HAGCN, LSTM and multi-modality data fusion. Then, the assessment results and the related discussions are provided in Section IV and Section V, respectively. Finally, Section VI concludes the paper with final remarks.
Experimental Methods
A. Participants
The experiments were performed in collaboration with the China Rehabilitation Research Center (Beijing Bo’ai Hospital), from which we recruited 35 post-stroke hemiparetic patients (27 males, 8 females, mean age of 52.7 ± 12.7 years). No minimum level of motor function was required for inclusion, provided that the subject had no cognitive deficits. Before the experiment, each post-stroke participant was examined by three experienced therapists for the Brunnstrom stage classification and the hand section of the Fugl-Meyer assessment. The Brunnstrom score (BS) and FMA score (FS) of each patient were then determined according to the majority rule. The detailed clinical assessment results of the enrolled subjects are shown in Table I.
To eliminate the “ceiling effect”, three therapists selected 25 patients whose Brunnstrom stage was greater than III and ranked them further. The therapists examined these patients by touch, gauged each patient’s grip strength by shaking hands, and observed how each patient completed prop-based tasks (such as drawing strokes and inserting nails) in the occupational therapy room. This ranking also follows the majority rule. The ranking result is shown in Order 1 of Table V.
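The majority rule used to combine the three therapists' ratings can be illustrated with a minimal sketch. The function name and the example ratings are hypothetical, and the source does not say how a three-way disagreement is resolved, so the sketch simply returns `None` in that case.

```python
from collections import Counter


def majority_vote(ratings):
    """Return the rating chosen by more than half of the raters.

    Returns None when no rating reaches a strict majority; how such a
    case is handled in the study is not stated, so this is an assumption.
    """
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > len(ratings) / 2 else None


# Hypothetical Brunnstrom stages given by three therapists to one patient.
print(majority_vote([4, 4, 5]))
```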
This research was reviewed and approved by the Ethics Committee of the China Rehabilitation Research Center (approval number: 2021-108-1). Each subject signed a written informed consent form before enrollment.
B. Acquisition Setup
To explore the kinematic and muscular characteristics in normal and pathological movement patterns, we collect the kinematic and sEMG data of the subjects simultaneously.
1) Kinematics:
Kinematic data are acquired by WISEGLOVE19 (Xintian Vision, Beijing, China) at a sampling rate of 200 Hz; the device collects data from 19 joint angles of the fingers. The data glove adopts an optical fiber to measure the angle, with a dynamic accuracy of 0.2 degrees.
Because the glove uses optical fiber sensors, the maximum and minimum values of the wearer’s finger angles must be calibrated before the data glove is used to collect data. Given the patient’s hand dysfunction, a volunteer assists the patient in performing the calibration movements.
2) Surface Electromyography:
The muscular activity is gathered using a Thalmic Myo armband (Thalmic Labs, Ontario, Canada), a low-cost wireless armband containing eight single differential sEMG sensors. The sampling rate is also set to be 200 Hz, which is the same as the kinematic sampling rate. Eight electrodes are evenly wound around the forearm, keeping a constant distance from the radiohumeral joint just below the elbow. A snapshot of the experiment is shown in Fig. 1.
C. Experimental Paradigm
The hand movements in the experiment, shown in Fig. 2, were proposed by therapists based on their clinical experience to evaluate the hand motor function. These movements are divided into two parts: one part contains five fundamental movements of the wrist (no. 1 – no. 5 in Fig. 2), four isometric and isotonic hand configurations (no. 6 – no. 9 in Fig. 2), and five combination movements (no. 10 – no. 14 in Fig. 2); the other part contains the remaining fourteen grasping movements.
The repetitions of one action are performed in one block, and the order of the blocks is the same as the sequence of actions shown in Fig. 2. Figure 3 shows one block. Each block begins with a video prompt: the video of the hand movement is played twice, instructing the subject on what to do next. After resting for 3 seconds, the subject tries to execute the hand movement as well as possible within 5 seconds. This whole process is called a trial, and 6 trials form a block.
D. Data Acquisition and Data Preprocessing
After the experiment, we obtain two types of time-series signals. One is the kinematic signal, which is composed of 19 channels, and the other is the sEMG signal, which is composed of 8 channels. The kinematic data are filtered by a second-order two-way low-pass Butterworth filter with a cut-off frequency of 5 Hz [21]. The Thalmic Myo already applies a notch filter at 50 Hz, so the sEMG signal requires no extra filtering [24].
To expand the number of samples, we apply a sliding window, with the window size and the sliding distance consistent with those given in [25]. Each movement repetition is therefore segmented with a window of 200 milliseconds (20 sampling points) and an overlap of 100 milliseconds (10 sampling points). All data preprocessing is performed in MATLAB R2019a.
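The sliding-window segmentation can be sketched as follows. The study performed this step in MATLAB; the Python version below is only illustrative, the function name is hypothetical, and the window length and step are expressed directly in samples, using the 20-point window and 10-point overlap quoted above.

```python
def sliding_windows(signal, window=20, step=10):
    """Split a recording into overlapping windows.

    signal: list of samples (each sample may itself be a list of channels).
    window: window length in samples (20 sampling points in the text).
    step:   shift between window starts in samples; with step=10,
            consecutive windows overlap by window - step = 10 samples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]


# Hypothetical 1000-sample trial with 8 sEMG channels (all zeros here).
trial = [[0.0] * 8 for _ in range(1000)]
windows = sliding_windows(trial)
print(len(windows), len(windows[0]), len(windows[0][0]))  # prints: 99 20 8
```

Trailing samples that do not fill a complete window are dropped in this sketch; whether the original pipeline pads or drops them is not stated.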
Multi-Modality Fusion Framework for Functional Assessment
This section introduces the hand function assessment method based on the fusion of the two data modalities. The overall framework is shown in Fig. 4. The method mainly includes the movement analysis based on HAGCN, the sEMG analysis based on LSTM, and the multi-modality fusion scheme. All methods are implemented in Python (version 3.6) using the TensorFlow and PyTorch frameworks. We first introduce the classification task of this study.
A. Classification Task
It should be noted that the classification task is not designed to recognize the 28 hand movements but to distinguish patients of different levels through each hand movement. We expect to use the assessment framework to obtain more accurate assessment results and eliminate the “ceiling effect” of the scale, because the “ceiling effect” reduces the data interpretation accuracy and affects the effectiveness of the rehabilitation progress [26].
The therapists select and label 25 patients among 35 recruited patients to represent 25 categories at the top end of the recovery. The therapists suggest that 25 categories are sufficient to avoid the “ceiling effect”. To distinguish these 25 categories, we have also added one extra category (the healthy category). We first collected data from 35 healthy subjects. Then, we randomly selected six groups of data generated by the same 28 hand movements of the healthy subjects to represent the healthy category. Since the selection is random, the selected data can represent the movements of most healthy subjects. In this way, there are a total of 26 categories in the classification task.
B. Motion Analysis Based on HAGCN
HAGCN includes a spatial graph convolution network (SGCN) and a temporal graph convolution network (TGCN).
1) Skeleton Graph Construction:
In this work, we utilize a spatial graph to form a hierarchical representation of the hand skeleton sequence. The structure of SGCN is shown in Fig. 5.
The yellow dots on the left in Fig. 5 are 19 key joints measured by the data glove. Leveraging the natural connections between these joints, we propose a graph structure. The structure can explicitly exploit the spatial relationship between the joints, which is crucial for understanding human actions. In the spatial graph, the internal edges between human joints are defined according to the natural connections of the human body.
2) Spatial Graph Convolutional Neural Network:
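In this study, we use the graph convolution \begin{equation*} Y = \sigma ((D^{-1/2}AD^{-1/2})XW),\tag{1}\end{equation*} where A is the adjacency matrix of the hand skeleton graph, D is its degree matrix, X is the input feature matrix, W is the trainable weight matrix, \sigma is the activation function, and Y is the output feature matrix.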
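To make Eq. (1) concrete, the following dependency-free sketch evaluates one graph convolution on a toy graph. Several choices are assumptions not stated in the source: the activation is taken to be ReLU, the toy adjacency matrix includes self-loops, and all function names and numbers are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix of A."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def graph_conv(adj, x, w):
    """Eq. (1): Y = sigma(D^{-1/2} A D^{-1/2} X W), with sigma = ReLU (assumed)."""
    z = matmul(matmul(normalize_adjacency(adj), x), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy chain of three joints 0-1-2 with self-loops (assumption).
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
X = [[1.0], [2.0], [3.0]]   # one feature (e.g. a joint angle) per joint
W = [[1.0, -1.0]]           # maps 1 input feature to 2 output features
Y = graph_conv(A, X, W)     # 3 joints x 2 output features
```

In the actual network the same operation would be applied to the 19 glove joints with learned weights; the plain-list implementation here is chosen only to keep the sketch self-contained.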
3) Spatial Configuration Partitioning:
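When a person is performing hand movements, his/her finger joints play different roles in the hand motor assessment. Therefore, the adjacency matrix is partitioned into the three groups listed below, and the graph convolution in (1) becomes \begin{equation*} Y = \sigma \left(\sum_{n} \left(D_{n}^{-1/2}(K_{n}A_{n})D_{n}^{-1/2}\right)XW_{n}\right),\tag{2}\end{equation*} where n indexes the groups and A_{n}, D_{n}, and W_{n} are the adjacency matrix, the corresponding degree matrix, and the weight matrix of the n-th group.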
Adjacency matrix group: the adjacency matrix A with the centripetal group and the centrifugal group removed.
Centripetal group: the neighboring nodes far from the gravity center (node 11), such as (8, 7), (11, 8), (8, 9), (11, 12), (12, 13), (12, 15).
Centrifugal group: the neighboring nodes close to the gravity center (node 11), such as (7, 8), (8, 11), (9, 8), (12, 11), (13, 12), (15, 12).
4) Temporal Graph Convolution Network:
After constructing the SGCN, the task of modeling the TGCN within the skeleton sequence is performed. The process allows us to define a simple strategy for extending the SGCN to the spatial-temporal domain. The temporal graph is constructed by connecting the same joints in a continuous frame, as shown in Fig. 8. The three green points in Fig. 8 represent the three consecutive data points of the node in time, and the yellow and blue points are the adjacent nodes of the green joints. The nine points in the red grid are convolved in the spatial-temporal domain to obtain the red point. In this study, the temporal kernel size is set to be 3.
5) Network Architecture and Training:
One SGCN and one TGCN form the HAGCN. The whole network consists of five HAGCN layers. The output sizes of five HAGCN layers are 8, 16, 32, 64, and 32, respectively, and the input size is 19. The network is optimized by the residual connection. The whole model is trained in an end-to-end manner with backpropagation.
The dataset is divided by leave-one-out (LOO) cross-validation over trials, which here amounts to 6-fold cross-validation because each movement has six trials, as shown in Fig. 3. Five trials are taken as the training set, and the remaining trial is taken as the test set. In this way, six tests are performed, each with a different held-out trial. The average value of the six tests represents the classification accuracy of the movement. A 26-class classification task is completed for each of the 28 movements.
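The trial-wise split can be sketched as below. The function names and the per-fold accuracies are hypothetical, and training itself is omitted; the sketch only shows how each of the six trials is held out once and how the fold accuracies are averaged.

```python
def leave_one_trial_out(n_trials=6):
    """Yield (train_trials, test_trial) pairs, holding out each trial once."""
    for held_out in range(n_trials):
        train = [t for t in range(n_trials) if t != held_out]
        yield train, held_out


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies to get the accuracy of one movement."""
    return sum(fold_accuracies) / len(fold_accuracies)


for train, test in leave_one_trial_out():
    print("train on trials", train, "-> test on trial", test)

# Made-up fold accuracies for one movement, for illustration only.
print(mean_accuracy([0.90, 0.92, 0.88, 0.93, 0.91, 0.92]))
```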
C. sEMG Analysis Based on LSTM
In this study, a multi-layer LSTM network is introduced to extract the deep features of sEMG signals, which improves the generalization and robustness of the model compared with manual feature extraction [25]. The network includes six LSTM layers and one fully connected layer. The input size of the first LSTM layer is eight, and the data within each sEMG time window form the input of the LSTM. The output sizes of the six LSTM layers are 32, 64, 128, 64, 32, and 32, respectively. The output size of the fully connected layer is 26, and its output is activated by the softmax function. Sparse categorical cross-entropy is used as the loss function, and the Adam algorithm is used as the optimizer for network training. The LSTM network is also trained with the LOO method.
D. Multi-Modality Fusion Scheme
The proposed multi-modality fusion algorithm is given in Algorithm 1. To better understand this algorithm, we briefly explain its working principle and setting.
The softmax function is often used as the last activation function of a neural network to normalize the output of the network to a probability distribution [28], which can be used as a feature to achieve satisfactory classification [29]. In this study, the number of softmax outputs is 26, and the final output represents the probability that the neural network recognizes the input sample as coming from a healthy person. The patient’s hand motor function is assessed by calculating and comparing the average probability that each type of the patient’s hand movement is recognized as a healthy subject’s movement. The assessment consists of a kinematic-based assessment and an sEMG-based assessment. We use a decision fusion approach for the multi-modality assessment: the total score is the weighted sum of the kinematic modality score and the sEMG modality score.
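A minimal sketch of the single-modality score follows. It assumes that the healthy category is the last of the 26 outputs, which is how the phrase "the final output" is read here, and that the score of a movement is the mean healthy-class probability over its windows. The function names and the logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vector of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def healthy_score(window_logits, healthy_index=-1):
    """Mean softmax probability of the healthy class over all windows.

    healthy_index=-1 assumes the healthy category is the last output.
    """
    probs = [softmax(logits)[healthy_index] for logits in window_logits]
    return sum(probs) / len(probs)


# Hypothetical network outputs for two windows of one movement (26 classes).
window_logits = [[0.0] * 25 + [2.0], [0.0] * 25 + [1.0]]
score = healthy_score(window_logits)
```

A score close to 1 means the movement is almost always recognized as a healthy subject's movement; a score close to 0 means it rarely is.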
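The details of Algorithm 1 are as follows. After training the HAGCN and LSTM networks, 336 (2 × 28 × 6) models are obtained, one for each combination of the two modalities, 28 movements, and six cross-validation folds. The total score is computed as \begin{equation*} s = c_{1} \times s_{1}+(c-c_{1}) \times s_{2},\tag{3}\end{equation*} where s is the total score, s_{1} and s_{2} are the two single-modality scores, and c_{1} and c-c_{1} are their respective weights.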
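The weighted decision fusion in Eq. (3) reduces to a one-line computation, sketched below. The value of c is not given in the surviving text, so the default c = 1.0 is an assumption, as are the function name and the example scores.

```python
def fuse_scores(s1, s2, c1, c=1.0):
    """Eq. (3): s = c1 * s1 + (c - c1) * s2.

    s1, s2: single-modality scores.
    c1:     weight of the first modality.
    c:      total weight; 1.0 is an assumed default, not a value from the paper.
    """
    return c1 * s1 + (c - c1) * s2


# Made-up modality scores, fused with three different weights.
for c1 in (0.0, 0.5, 1.0):
    print(c1, fuse_scores(0.8, 0.6, c1))
```

With c = 1 the fused score is a convex combination of the two modality scores, so c1 = 1 keeps only the first modality and c1 = 0 keeps only the second.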
Results
A. Data Collection
Thirty-five stroke patients and thirty-five healthy subjects participated in the data collection. We designed a visual interface that displays the movement of the hand in real time to ensure that the patients follow the instructions. We also evaluated the effect of experimental factors on the range of the joint angles. Factors affecting the joint angles include the individual joint, the movement, the subject, and the movement repetition. The evaluation results show that the data collection quality is reliable.
B. Classification Results
After 6-fold cross-validation, the classification accuracy is shown in Fig. 9. The blue histogram represents the classification accuracy by the kinematics, with an average accuracy of 91.2%. The red histogram shows the classification accuracy by the sEMG signal, with an average accuracy of 79.1%. These accuracies show that patients with the same Brunnstrom grade or the same FMA score can be distinguished. Table II shows the order of the classification accuracy, and the following facts can be observed:
In the traditional FMA scale, the diameter of the cylinder is not considered as an affecting factor. In this experiment, three types of cylinder grasping actions with different cylinder diameters are designed (marked by the blue background in Table II). Among them, the classification accuracy of grasping a large-diameter cylinder is the highest. This finding shows that the more challenging the movement is, the easier it is to distinguish patients at different levels. This finding is also consistent with the physiological characteristics of stroke patients, who can bend their fingers easily but have difficulty extending them.
The three-finger sphere grasp has the highest classification accuracy among three spherical grip movements designed in this experiment (marked by the red background in Table II). Therefore, this movement can be used as a representative action of the spherical grasp, which is consistent with our previous kinematic analysis results [30].
Among the top 6 movements with the highest classification accuracy based on two kinds of signals, the same movements are: (1) the abduction of all fingers and (2) the lateral grasp. This finding suggests that the hand extension and thumb movement are more effective in identifying the patient’s hand movement level.
The classification accuracy by the kinematic information based on HAGCN and the classification accuracy by the sEMG based on LSTM.
C. Performance of Quantitative Assessment
To prove the validity of the proposed quantitative assessment, we need to select a performance metric. Generally, the Pearson correlation coefficient (PC) is the most commonly used metric to prove the correlation between two datasets. However, PC can only be applied if the sample is normally distributed. Therefore, the first step is to verify whether the sample follows a normal distribution. Considering that the number of samples is less than 50, the Shapiro-Wilk test is used to assess whether the samples followed a normal distribution. As a result, the p-values of the Shapiro-Wilk test results of the Brunnstrom score and FMA score are both less than 0.05, indicating that the sample does not exhibit the normal distribution. Therefore, Spearman’s rank correlation coefficient (SC) is selected as the metric. In general, an SC greater than 0.8 indicates a strong correlation.
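For reference, Spearman's rank correlation coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration with hypothetical function names and made-up scores; it is not the authors' implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: coarse scale grades (with ties) versus continuous scores.
scale_grades = [4, 4, 5, 5, 6, 6]
algorithm_scores = [0.31, 0.42, 0.55, 0.58, 0.77, 0.90]
print(spearman(scale_grades, algorithm_scores))
```

Because the traditional scales produce many tied grades, the tie handling in `average_ranks` matters for this kind of comparison.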
Table III presents the SC of the quantitative assessment based on each single modality. The quantitative evaluation results based on the kinematic signal are more consistent with the two traditional scales.
Figure 10 shows the trend of the SC under different values of the fusion weight in (3).
SC under different values of the fusion weight.
Scores of traditional scales and the score obtained by the proposed algorithm (BS, FS and ORDER 1 are Brunnstrom scores, FMA scores, and the order of patients assessed by the therapists, respectively; ORDER 2 is the order of patients assessed in this study. The x-axis represents the patient’s sequence number.).
Discussion
The main objective of this study is to develop a hand motor function assessment system for the quantitative analysis of motor impairment in patients with post-stroke hemiplegia. The system is constructed based on the kinematic data and the sEMG signals which are collected synchronously during 28 well-designed hand movements. Under the framework of multi-modality fusion, the quantitative evaluation results of different modalities are well weighted and integrated, which results in a comprehensive assessment of the hand motor function.
The proposed HAGCN achieves a classification accuracy of 91.2% on the kinematic modality, whereas the LSTM network achieves only 79.1% on the sEMG modality. By exploiting the complementarity between the motor characteristics of different modalities, the fused results show that the clinical relevance can be enhanced by fusing the multi-modal information. The classification accuracy based solely on the sEMG is relatively low. Because the sEMG signal reflects the neuromuscular activity only to a certain extent, the classification accuracy using the sEMG signal is naturally inferior to that using the motion information. However, it is reasonable to keep the sEMG modality in the assessment system. If the motion information is collected by a non-contact device such as the Leap Motion (whose price is much lower than that of the data glove used in this paper), the hand movement measurement is susceptible to illumination and occlusion, which can cause missing measurements, whereas the sEMG signal can be measured stably by the wearable armband. More importantly, muscle strength is also crucial for the rehabilitation assessment. The Brunnstrom and FMA scales also contain items reflecting muscle strength, and muscle strength can be well estimated from the sEMG [31]. Therefore, from the perspective of assessment scalability, it is necessary to keep the sEMG modality.
In this study, as shown in Table V, the proposed assessment algorithm gives each patient a specific score rather than a rough grade, so that even patients with the same scale score can be distinguished, which avoids the “ceiling effect” of the traditional scales.
Conclusion
In this study, we propose a multi-modality (kinematics and sEMG) fusion assessment framework based on HAGCN and LSTM, and apply this framework to a self-collected dataset to quantitatively evaluate the rehabilitation levels of 25 stroke patients. The SCs between the assessment results of this study and the traditional scales (Brunnstrom scale and Fugl-Meyer assessment scale) are 0.908 and 0.967, respectively, indicating a significant correlation between the proposed assessment and the traditional scale scores. In addition, the SC between the score of this study and the refined rehabilitation level of patients with the same grade is 0.997, suggesting that the quantitative assessment of the 25 stroke patients can avoid the “ceiling effect” of traditional scales to some extent.