A Novel Neurorehabilitation Prognosis Prediction Modeling on Separated Left-Right Hemiplegia Based on Brain-Computer Interfaces Assisted Rehabilitation

It is essential for neuroscience and clinic to estimate the influence of neuro-intervention after brain damage. Most related studies have used Mirrored Contralesional-Ipsilesional hemispheres (MCI) methods flipping the axial neuroimaging on the x-axis in prognosis prediction. But left-right hemispheric asymmetry in the brain has become a consensus. MCI confounds the intrinsic brain asymmetry with the asymmetry caused by unilateral damage, leading to questions about the reliability of the results and difficulties in physiological explanations. We proposed the Separated Left-Right hemiplegia (SLR) method to model left and right hemiplegia separately. Two pipelines have been designed in contradistinction to demonstrate the validity of the SLR method, including MCI and removing intrinsic asymmetry (RIA) pipelines. A patient dataset with 18 left-hemiplegic and 22 right-hemiplegic stroke patients and a healthy dataset with 40 subjects, age- and sex-matched with the patients, were selected in the experiment. Blood-Oxygen Level-Dependent MRI and Diffusion Tensor Imaging were used to build brain networks whose nodes were defined by the Automated Anatomical Labeling atlas. We applied the same statistical and machine learning framework for all pipelines, logistic regression, artificial neural network, and support vector machine for classifying the patients who are significant or non-significant responders to brain-computer interfaces assisted training and optimal subset regression, support vector regression for predicting post-intervention outcomes. The SLR pipeline showed 5-15% improvement in accuracy and at least 0.1 upgrades in $\text{R}^{{2}}$ , revealing common and unique recovery mechanisms after left and right strokes and helping clinicians make rehabilitation plans.


I. INTRODUCTION
S INCE the 20 th century, neurorehabilitation scientists have realized that injured brain tissue has the potential for regeneration and plasticity [1]. Neurorehabilitation intervention studies try to help patients with brain disorders recover from dysfunction. For instance, constraint-induced movement therapy, Transcranial magnetic stimulation, and Brain-computer interfaces (BCI) are used to rehabilitate motor recovery [2], [3], [4], [5] after stroke. Typically, a rehabilitation program lasts for weeks or months. Due to limited medical resources, the prognostic prediction of treatment performance is necessary for the clinic to make rehabilitation plans.
Functional connectivity based on resting-state functional magnetic resonance imaging (rs-fMRI) measures the bloodoxygen-level-dependent (BOLD) signal's temporal correlation across brain regions without imposed tasks [6]. Diffusion tensor imaging (DTI), one kind of Diffusion MRI, provides insights into the tissue structure [7]. MRI modalities can be applied to disease diagnosis and prognosis based on quantitative imaging biomarkers [8], [9], [10], [11]. However, biomarkers were mainly derived from a single brain region, and the connections were ignored across regions and modalities, remaining unsatisfactory in predicting post-interventional outcomes at the individual level [12], [13], [14], [15]. The graph theory-based approach analyzes the whole brain as a complex network, where brain regions are regarded as nodes, and edges indicate the relationships between nodes [16], [17]. The network demonstrates the relationship between brain structure and function and the characteristics of the whole brain in high-dimensional feature space.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Machine learning methods are up-and-coming for the study of high-dimensional neuroimaging data. Reference [18] summarized the typical classification/prediction pipeline for poststroke recovery. Due to the small sample size, researchers commonly mirrored the imaging along the mid-sagittal axis so that all tracts were in the same hemisphere [13], [15]. Therefore, the issue of left-right brain hemispheres was converted into that of contra-ipsilesional hemispheres. Nevertheless, hemispheric asymmetry has become a consensus for brain structure and function [19], [20]. Regarding stroke patients, researchers aim to distinguish their difference from healthy controls and then predict the corresponding recovery potential according to clinical and imaging metrics. The intrinsic left-right asymmetry confuses the difference caused by impairment and recovery.
We propose modeling the left and right hemiplegia groups separately, which is called SLR modeling Pipeline in this paper to sidestep the effect of left-right hemispheric asymmetry. Two alternative modeling pipelines -the mirrored contra-ipsilesional hemispheres (MCI) Pipeline and the removing intrinsic asymmetry (RIA) Pipeline are designed as a comparison to verify the effectiveness of SLR. The subjects and BCI rehabilitation are presented in Section II. In Section III, some evidence shows that the proposed SLR pipeline may improve the accuracy of the prognostic classification and prediction of the treatment performance, which is explained in detail in Sectionn IV.

A. Participants
The study protocol was approved by the Ethics Committee of Beijing Tsinghua Changgung Hospital, conducted in accordance with the tenets of the Declaration of Helsinki (No. 18172-0-02), and registered in the Chinese Clinical Trial Registry (No. ChiCTR1900022128). All patients provided written informed consent prior to study participation.
A patient dataset with 18 left-hemiplegic and 22 righthemiplegic stroke patients and a healthy dataset with 40 subjects, age-and sex-matched with the patients, were used in the study. Every five years was an age group, and subjects in the same group were considered age-matched. All stroke patients were selected from hospitalized in Beijing Tsinghua Changgung Hospital from March 2018 to October 2019. The diagnosis was in line with the standard consensus of Clinical research on acute stroke in China and confirmed by brain MRI or CT examination. Five criteria are considered for study inclusion: (i) age 18-75; (ii) no significant cognitive impairment (score of simple mental state examination ≥ 21); (iii) first-time stroke and time since onset > 1 month and ≤ 6 months; (iv) moderate to severe upper limb motor dysfunction (Brunnstrom stage ≤ Stage IV [21]); (v) modified Ashworth < III [22]. Detailed demographic and clinical information are listed in Table I.
The 40 healthy control subjects were selected from the HCP dataset (including HCP 1.0, 1200 participants, and HCP 2.0, 1300 participants) [23], with age-and gender-matched to the 40 patients, for performing the symmetrical feature selection procedure.  Table S2 and Supplementary Fig.S1 show the lesion location and size among the individuals.

B. Treatment
Each patient received a 20-day BCI-assisted rehabilitation training program [24], which is an exoskeleton hand controlled by BCI signal to assist the paretic hand exercises. The BCI recognizes the grasping or releasing motor imagery signal of the patient's Electroencephalography (EEG) when the imagination according to the guidance shown on screen and voice prompts (10 sets of 10 imagery tasks in each session, together with passive flexion and extension of fingers through the exoskeleton, 1 min rest between sets, total lasting 1 hour). There are five rehabilitation training sessions per week for four weeks. During the training, patients were instructed to avoid blinking, coughing, chewing, and head and body movements.

C. Measurement
Before and after the training, Fugl-Meyer's Assessment of the upper extremity [25] was used to evaluate the motor function of the upper limb, including shoulder, elbow, wrist, hand, and coordination/speed. By summing the five items, we got the total score, denoted as FMA-total (66 points) in this paper. The sum of shoulder and elbow scores was the FMA upper arm score, defined as FMA-SE (30 points). The sum of wrist and hand scores was the FMA wrist score, defined as FMA-WH (24 points). The higher the score, the stronger the function. Two trained rehabilitation therapists completed the assessment who were limited to know the treatment plans.

D. Labeling of the Patient Dataset
Patients were divided into a significant responder group (n = 23) and a non-significant responder group (n = 17) according to the definition of the minimum clinically important difference, based on an improvement of ≥ 2 points in the FMA-WH score after training [26].

E. Pipelines
As shown in Fig. 1, the proposed SLR modeling Pipeline consists of 5 steps: preprocessing of whole-brain neuroimaging data, brain atlas registration, whole-brain feature extraction, and left / right classification modeling or prediction modeling. Atlases are used to define brain regions. The features of the whole brain include intra-and inter-region and could be obtained by building brain networks with brain regions as nodes or other methods. In SLR Pipeline, we separate patients into two groups beforehand-left and right, referring to the hemiplegia side, and directly carry out the procedure of classifying and predicting based on extracted whole-brain features to achieve the two goals: to classify patients into two classes-significant responders versus non-significant responders to the treatment and to predict stroke patients' function after rehabilitation treatment. We can replace the 'Wholebrain Feature Extraction' module in the framework with feasible methods. Some specific implementations are given in Section II-F. Two compared pipelines are designed for comparison. To make all the patients' injury areas in the same hemisphere, the conventional contra-ipsilesional mirroring routine is adopted on the whole-brain features, labeled as MCI Pipeline. In RIA Pipeline, we add a feature selection procedure to remedy the asymmetry problem, as shown in the box of Fig 1. Specifically, the whole-brain features are extracted from healthy controls with the same feature extraction procedure of patients. Then paired tests are used to check whether the mean of left and right lateral observations in healthy controls was equal, paired T-test for characteristics that hold the assumption of normality (determined by Shapiro's test, P > 0.05), otherwise paired Wilcoxon's test. This procedure is performed for every extracted feature and thus generates a p-value for each paired test. All significant attributes (P < 0.05, FDR corrected) are asymmetrical, while those remaining are regarded as left-right symmetry indices. Accordingly, we apply the above indices to extracted features of patients to select symmetrical features, denoting the connectivity whose features are supposed to be left-right hemispheric balanced for healthy controls. Finally, features are sent to the classification and prediction models.

F. Preprocessing and Feature Extraction
The SPM-based package DPARSFA [27] was used to perform the standard resting-state BOLD (rsBOLD) signal preprocessing steps. Including the first 10 out of 240-time points were discarded, the spatial resolution was set at 2 mm 3 , realigning, normalizing on Montreal Neurological Institute space with 3 mm isotropic voxel resampling, smoothing with a Gaussian smoothing kernel with 4-mm full width at half maximum, removing the linear trend of time courses and nuisance covariates with global signal regression and finally temporally filtering with 0.01-0.08 Hz. The preprocessing of the DTI signal was performed by using FSL tools (http://www.fmrib.ox.ac.uk/fsl/index.html) according to the following steps: skull stripping by function "bet2", eddy current and motion correction by function "EddyCorrect" and tensor fitting by function 'DTIFIT'.
Functional and structural features based on rsBOLD and DTI, respectively, were extracted by network construction based on a brain atlas and were carried out using inhouse software. The functional features are some graph theoretic attributes extracted from rsBOLD. After preprocessing of rsBOLD, we constructed a fully weighted functional inter-connectivity network and intra-connectivity networks. The inter-connectivity network was obtained by using the Automated Anatomical Labeling (AAL) [28] atlas to define the nodes. Average all voxels in each node and then calculate the Pearson correlation coefficients between every pair of nodes as the connectivity. There are 116 brain regions in AAL, with odd numbers for the left hemisphere and even numbers for the right. For example, L1 and R2 are the same regions of the left and right hemispheres, respectively. Further, we constructed an intra-connectivity network for each brain region, in total 116 intra-connectivity networks. For each intra-connectivity network, the nodes were all the voxels corresponding to the brain region, and the Pearson correlation coefficient between each pair of voxels was the connectivity. The brain-connectivity toolbox [17] was utilized to extract a few attributes of the complex network according to graph theory, including local efficiency (LE), weighted degree (WD), betweenness centrality (BC), clustering coefficient (CC), global efficiency (GE), modularity (MO), characteristic path length (PL), small-worldness (SWN) [29]. For attributes of inter-connectivity and intra-connectivity, denoted here as .inter, .intra respectively. Overall, there are 10 attributes of BOLD: LE.inter, WD.inter, BC.inter, BC.intra, CC.inter, CC.intra, GE.intra, MO.intra, PL.intra, SWN.intra. The structural features of DTI are fractional anisotropy (FA) and mean diffusivity (MD). After preprocessing, we registered the AAL atlas to the native space of the reconstructed DTI. 'Dipy', a Python package was used to apply fiber tracking. Finally, FA and MD values of each brain region (nodes of the structural network) and fiber bundle counts between two regions (edges   Fig. 2 illustrates the detailed procedure of the "Classification and Prediction Modeling" module. The features obtained by the upstream of pipelines represent a way of brain imaging measurement and can be applied to explore underlying differences in physiological functions at the individual level. We then classify stroke patients into significant or non-significant responders to treatment and predict postinterventional outcomes by combining the machine learning framework and the above-calculated whole-brain features. It is worth noting that there is a mismatch between data size (dozens) and feature dimensionality (thousands). Feature filtering is carried out to reduce the dimensionality of the data to avoid over-fitting. Subsequently, machine learning models are performed to classify and predict prognostic results, and evaluate features' importance. Brain regions contributing to the classification and prediction are then subjected to decoding analysis for impairment and recovery. Popular classifiers and predictive models are available for this framework [13], [15], [30], [31], [32], such as linear regression and its variants, artificial neural network (ANN) [33], support vector machine (SVM) [34] and support vector regression (SVR) [35].

G. Classification and Prediction Modeling
Leave-one-out cross-validation (LOOCV) [36] is applied to estimate the performance of classifiers and predictive models. In the training process, we leave one sample as the test sample and utilize the remaining data for training. Specifically, Twosample tests (T-test or Wilcoxon's test, uncorrected) were conducted for feature filtering in training data, across significant responders and non-significant responders on each attribute to identify the features that make a difference. Some machine learning frameworks were implemented in the experiment, logistic regression (LR), ANN with one hidden layer, SVM for classification, optimal subset regression (OSR) [37] derived from linear regression, and SVR for prediction. Besides the neurophysiological variables, initial clinical scores are significant predictors of outcomes in stroke patients [15], [38]. Therefore, we include the pre-interventional clinical scores into the predictors regarding prediction. The trained models are then employed to classify/predict the left-out sample.

TABLE II LOOCV CLASSIFICATION ACCURACY OF ALL PIPELINES AND DATASETS
Repeat the operation until all patients are traversed to test the performances. We adapt accuracy, R 2 and mean absolute error (MAE) to evaluate models.

A. Asymmetrical and Significant Features
For all 40 healthy controls from HCP, BOLD and DTI features were extracted with the same pipeline of patients, and 258 of the total 1392 features were judged as asymmetrical, accounting for 18.8%. Fig. 3 shows the heatmap of left-right hemispheric asymmetry. For detailed information, see Supplementary Table S3.
We performed statistical tests which are the same as feature filtering in each pipeline but across all subjects and listed the significant features selected in Supplementary Table S4. The results show that 26 of 1160 BOLD features and 21 of 232 DTI features were significantly different for the left hemiplegic group, while the results were 41 BOLD features and 20 DTI features for the right hemiplegic group, more significant than the number of characteristics in MCI Pipeline and RIA Pipeline. Only a tiny proportion differed among thousands of features, contributing to the distinction between significant and non-significant responders. Separate modeling of left and right hemiplegic patients helps to detect more potentially significant features. Table II lists the LOOCV accuracy of each pipeline/dataset for different methods. Besides, an exact number of correctly classified samples and the total sample amount for SLR Pipeline left and right models are summed to form the overall accuracy. The results of ANN are unstable due to the training parameter settings. To ensure reproducible and comparative results, we set seeds and fixed the best parameter in MCI Pipeline as the hidden layer's size. Different seeds and the hidden layer size did not affect the conclusions presented.

B. Performance on Classification
The results show that SLR Pipeline gives a 5-15% improvement to all datasets and methods. The sensitivity and specificity of each implementation are listed in Supplementary  Table S5 and Supplementary Table S6 and the SLR Pipeline Heatmap of Asymmetry in healthy controls. The color bar indicates the number of times the brain region was judged to be asymmetrical in 12 attributes.
significantly improves the classification performance was verified therefore. LR and SVM performed well when modeling left and right hemiplegia separately, and the classification accuracy/sensitivity/specificity is close to 1. As a remedy to asymmetry, RIA Pipeline does not help in the case of classification. For all implementations in our experiment, except BOLD attributes as the dataset in SVM, removing all asymmetrical features led to the same or worse LOOCV accuracy, resulting in increased sensitivity and decreased specificity, making it easier to determine patients as significant responders to reduce the possibility of missed diagnoses.
Using BOLD attributes alone as the dataset for all methods and pipelines gave better results than using DTI attributes alone. Across all machine learning methods, SLR Pipeline maintained and improved the accuracy when combining the two types of attributes. In contrast, most others have led to worse results than using BOLD alone. It implied that the BOLD and DTI partially overlap or contradict each other, and SLR Pipeline helps us discover complementary characteristics.

C. Performance on Prediction
The results in Table III show the R 2 and MAE for each implementation. The OSR method had the largest R 2 because it optimizes for sparsity, which is more conducive to finding strong predictors of the dependent variable and avoiding the phenomenon of overfitting. The dataset with a combination of BOLD and DTI features improved the effectiveness of the  III  LOOCV R2 AND MAE OF ALL PIPELINES AND INPUTS (INCLUDING PRE FMA-TOTAL)  OSR method compared to using them separately as inputs. This is because the number of predictors of OSR was limited to 5, and new invalid features were not selected, ensuring that the results were no less effective than the situation where only some features were used. SLR Pipeline improved the R 2 from approximately 0.75 to at least 0.85. Furthermore, the MAE is consistent with R 2 . The larger the R 2 , the smaller the MAE. The proposed SLR pipeline brings about a 40% reduction in MAE, indicating the reliability of the results. Similar to the case of classification, RIA does not improve the performance of the OSR predictive model. See Supplementary Fig.S2-S4 for specific feature selection, and there were precisely 5 features after feature filtering for RIA Pipeline with DTI as input, so no more selection was performed.
For the SVR model, SLR Pipeline was slightly more effective than MCI Pipeline. SVR aims to find a hyperplane that minimizes the distance from all the data without imposing too many restrictions on the features. It is often thought that dimensionality is a blessing for finding a support vector, but the rule failed in this experiment. The model became better with the asymmetry features removed. That is because only dozens of subjects were included, and almost all of them were selected as support vectors in the high-dimensional case. RIA can be regarded as a manual feature selection process to reduce dimensionality and assist the SVR in selecting more appropriate support vectors, thus improving prediction results. The model performed best when using only the DTI dataset in SLR Pipeline. This suggests that DTI features in prediction can be used as prognostic biomarkers, agreeing with previous findings [10], [39].

D. Importance of Predictors
Linear regression and its variants (LR and OSR in this experiment) autotune the parameters and select some of the features as predictors. Which brain regions chosen by the predictors are the most important for neurorehabilitation prognosis prediction? As illustrated in Fig. 4, by permutating explanatory variables, the order of importance of the predictors for the BOLD + DTI dataset represents the corresponding brain regions and their features. Box plots were overlapped to the bars to provide the distribution of the importance of the measure across 50 permutations. The names of interesting brain regions corresponding to the label of AAL-atlas and their abbreviations are shown in Table IV. Only a few features in the dataset were strong predictors of post-intervention FMAtotal. Significant predictors in this experiment, including Pre FMA-total and a few specific functional brain regions, are consistent with previous studies.

IV. DISCUSSION
Many studies have discussed left/right brain asymmetry from various aspects. Friedrich et al. [40] has found that we can identify left and right hemispheres in their lowand high-dimensional representation. Ortiz et al. [41] studied motor imagery performance after asymmetrical transcranial direct current stimulation. Obermaier et al. [42] introduced an HMM-based classifier allowing asymmetrical structures to help design the EEG-based BCI training paradigm. Due to the known asymmetries during motor imagery of right-and lefthand movement, the classifier of imagery tasks demonstrates an improvement of 9% for the classification accuracy. Unique patterns and brain changes in stroke recovery have been discovered in left-and right-hemispheric strokes [43], [44], [45], indicating that prognosis is significantly correlated with injured lateral. The human brain asymmetry is also validated in this study, via the changes in results brought by compared pipeline RIA (sometimes enhancing and sometimes worsening). Further, these changes indicate that the intrinsic asymmetry would confound the evidence for making decisions of classification and prediction. And the classification and prediction results will be significantly improved when separating the hemiplegic side for modeling. These facts would be helpful for revealing possible hemispheric recovery mechanisms in response to brain damage and offering guidance for stimulation during treatment, to the end of which, more and further exploration is deserved.
We utilized BOLD and DTI features to perform the pipelines since they provide functional and structural information, respectively. FA and MD as DTI features, BC and CC as BOLD features have been validated to be significantly different before and after therapy in previous studies [39], [46]. These four features might be used as biomarkers to predict post-intervention outcomes for stroke patients, although their performance varies and the importance thereof is still unclear. In this study, we evaluated the role of each data set and found that our proposed SLR pipeline worked consistently across all datasets to enhance performance. The results show that BOLD provides more recovery information in classification, while DTI has more potential to predict post-intervention scores. However, DTI has a large amount of structural asymmetry (in healthy controls), and in RIA, only 4 features were left to be used as discriminators, with reduced reliability. In general, the combination of BOLD and DTI improves the prediction results, while only slight changes occur to the classification. For each pipeline, BOLD + DTI does not always perform better than BOLD or DTI alone, and there may exist a contradictory relationship between the two kinds of features, suggesting that the graphical properties of the BOLD network contain a large amount of structural information represented by DTI.
Noticeably, across all classification and prediction methods employed in this experiment, the traditional statistical method (LR and OSR) performed better than algorithmic modeling (ANN, SVM, and SVR), with higher accuracy and super interpretability, mainly due to the sample size [47]. For detailed functions of specific brain regions in classification, the chosen features remained the same after RIA, but their importance changed-the clustering coefficient of L51 (MOG.L) elevated, while others remained unchanged. The most important two regions, L51 and L73 (PUT.L), are associated with cognitive and motor function [48], [49]. Among the subjects of this experiment, L73/R74 was with the highest frequency of injury, and the LR classifiers of MCI and RIA indicated that the higher the local efficiency of L73, the more likely the patient was to be a responder of treatment. For SLR Pipeline, regions associated with visual (R4, R44, L25) and learning (R90) [50], [51] appeared essential. Generally, these regions are more often associated with learning cognitive functions. We conjecture that BCI training requires patients to engage in motor imagery tasks and that a corresponding neural circuit is to control the training process. For prediction, the preintervention FMA-total is always the most critical factor and other brain regions vary for pipelines.
There are some limitations to this study. Brain atlases provide an objective definition of nodes and thus are usually used to construct brain networks. Node definitions will change the graph's topology and influence the functional relationships between nodes and hence affecting our results. Differences in atlas-related metrics for resting-state networks [52] and structural networks [53] have been found. In the present study, we only used the AAL atlas to define nodes, and future studies will focus on selecting brain atlases and fusing them to construct multilevel networks.
Another drawback of this study was feature filtering. We adopted the LR method to perform feature selection automatically while no selection for ANN and the result that LR has higher accuracy than that of ANN reveals the importance of feature filtering to some extent. In the pipeline, we performed an uncorrected t-test/Wilcoxon as preliminary filtration to detect as many potentially significant features as possible, leading to the relatively high dimensionality of the feature space. But in other ways, it may introduce redundant features and cause pitfalls in the classifier training. In the future, the t-test/Wilcoxon-test filter can be upgraded to a more well-designed and robust method. Finally, the data size of corresponding patients is relatively small and as the left and right hemiplegia are distinguished beforehand, the proposed SLR modeling pipeline is suitable for unilateral stroke patients, and how it performs on patients with multiple brain lesions remains unknown.

V. CONCLUSION
For more accuracy of neuro-rehabilitation prognosis prediction applied in the clinic, a left and right hemiplegia separated modeling pipeline SLR is initiated in our study to eliminate the asymmetry, especially in brain structure, to improve the performance of classification and prediction. BOLD and DTI are both used for validating the classification and prediction accuracy of recovery levels after BCI-assisted training of stroke patients with upper limb dysfunction. The comparative experiment is designed to compare the classification and prediction accuracy provided by three modeling pipelines, i.e., SLR, MCI, and RIA, with the same three classification and two prediction methods. The LOOCV results show that SLR can efficiently distinguish the significant and non-significant responders and make more accurate predictions of post-intervention FMA-total. The overall effect of targeted modeling for left and right hemiplegic patients is better than MCI and RIA. This is the first time rs-fMRI and DTI are applied separately on the left and right hemiplegic patients to evaluate the rehabilitation outcomes. The final classification and prediction models obtained from SLR could be used to help clinicians make a more reliable rehabilitation program for stroke patients.