Wearable Sensors Improve Prediction of Post-Stroke Walking Function Following Inpatient Rehabilitation

Objective: A primary goal of acute stroke rehabilitation is to maximize functional recovery and help patients reintegrate safely in the home and community. However, not all patients have the same potential for recovery, making it difficult to set realistic therapy goals and to anticipate future needs for short- or long-term care. The objective of this study was to test the value of high-resolution data from wireless, wearable motion sensors to predict post-stroke ambulation function following inpatient stroke rehabilitation. Method: Supervised machine learning algorithms were trained to classify patients as either household or community ambulators at discharge based on information collected upon admission to the inpatient facility (N=33-35). Inertial measurement unit (IMU) sensor data recorded from the ankles and the pelvis during a brief walking bout at admission (10 meters, or 60 seconds walking) improved the prediction of discharge ambulation ability over a traditional prediction model based on patient demographics, clinical information, and performance on standardized clinical assessments. Results: Models incorporating IMU data were more sensitive to patients who changed ambulation category, improving the recall of community ambulators at discharge from 85% to 89-93%. Conclusions: This approach demonstrates significant potential for the early prediction of post-rehabilitation walking outcomes in patients with stroke using small amounts of data from three wearable motion sensors. Clinical Impact: Accurately predicting a patient’s functional recovery early in the rehabilitation process would transform our ability to design personalized care strategies in the clinic and beyond. This work contributes to the development of low-cost, clinically-implementable prognostic tools for data-driven stroke treatment.


I. INTRODUCTION
Inpatient stroke rehabilitation is an early-stage program integrating clinical care with targeted therapy to maximize a patient's functional recovery. A primary goal of inpatient rehabilitation is to retrain patients to maneuver safely and independently in the home and community after hospital discharge, such as by restoring walking ability. Walking at home and walking in the community are both meaningful tasks for individuals following a stroke; however, they are distinctly different skills. The functional capacity, coordination, endurance, strength, motor control, and cognition needed are significantly higher for community ambulation than for household ambulation [1], [2], [3]. With the average length of stay at an inpatient rehabilitation facility (IRF) in the United States declining over time [4], [5] (current estimates ranging from 8.9-22.2 days depending on impairment severity [6]), the patient's care team only have a [...] characteristics, and standardized functional assessment scores [7], [8]. For instance, Bland et al. identified that a Berg Balance Scale (BBS) score ≤20 points and a Functional Independence Measure (FIM) walk item score of 1 or 2 at IRF admission were predictive of low functional ambulation at discharge [9]. Harari et al. found that functional assessment scores recorded at IRF admission were the most important predictors of the same test scores at discharge, over age, stroke characteristics, or performance on other assessments of gait or balance [10]. While standardized clinical functional assessments are useful indicators to predict future outcomes, their administration can be time-intensive and cumbersome due to limited interaction time with patients and the need for specialized training.
Furthermore, some functional assessments are scored using subjective rating scales and suffer from floor/ceiling effects [11], high inter-/intra-rater variability [12], [13], and lack of suitability for patients with severe cognitive impairments [14]. Reliance on these assessments could result in imprecise and inequitable prognoses. Indeed, previous studies indicate that patient prognosis can be an important source of variation in healthcare and may lead to inconsistent access to rehabilitation services across the continuum of care [15], [16]. Identifying alternative, objective predictors of stroke recovery that can be obtained easily in a clinical setting may improve measurement resolution and diagnostic accuracy and lead to data-driven prognostic models.

Wearable sensors have started transforming our ability to objectively measure patient health and performance in clinical settings. Ongoing technological advances yield sensors that are smaller and more affordable, with options to wirelessly stream analytics to customized, user-friendly digital dashboards. Inertial measurement units (IMUs) are especially ubiquitous in research-grade and commercial devices, providing three-dimensional kinematic metrics from accelerometers and gyroscopes. These devices have demonstrated utility in various stroke rehabilitation applications in the inpatient setting [...]

Thus, the objective of this study was to quantify the value of inertial sensor data in predicting post-stroke recovery of walking function. We trained machine learning models to retrospectively classify functional walking ability at IRF discharge (household or community ambulation) using various types of data obtained at admission, including patient demographics and clinical information, functional assessment scores, and IMU sensor data.
We hypothesized that IMU data recorded during a brief walking bout at IRF admission would improve discharge predictions over traditional models trained using standard clinical assessment scores and patient characteristics alone.

We trained a machine learning algorithm to classify patients as household ambulators (discharge 10MWT score <0.4 m/s) or community ambulators (discharge 10MWT score ≥0.4 m/s) [1], [24] using input data recorded at admission.

A balanced random forest classifier was selected as the algorithm of choice following initial exploration (see V. Methods and Procedures, section F. Algorithm Selection), demonstrating the highest average weighted F1 score and [...]

The benchmark PI+FA model correctly predicted the discharge functional walking level for 23 out of 27 community ambulators (85%) and for 6 out of 6 household ambulators (100%) (Fig. 3a). Adding sensor data improved the recall of discharge community ambulators to 25 out of 27 (93%) for PI+IMU or 24 out of 27 (89%) for PI+FA+IMU without compromising the perfect recall of household ambulators. We next examined the models' ability to detect changes in walking function during IRF treatment. All models achieved perfect recall for patients who maintained the same level of walking function between admission and discharge, which applied to most study participants (community: N=23; household: N=6). However, only models that included IMU data could correctly classify patients who changed functional walking level between admission and discharge (progressed from household to community ambulators: N=4). The PI+IMU model correctly classified 2 of these 4 patients (50%), while the PI+FA+IMU model correctly classified 1 of 4 (25%). The benchmark PI+FA model was unable to correctly classify any of these patients, instead predicting that they would remain at the household functional level (Fig. 3b). All misclassifications were thus confined to the four patients who transitioned to a higher level of functional ambulation. These patients exhibited a moderate range of 10MWT scores at admission (Adm) and discharge (Dis) relative to the 0.4 m/s classification threshold (Fig. 4).
The two patients consistently misclassified across all models had Dis scores close to (0.54 m/s) and far from (1.27 m/s) the threshold, [...]

[...] performance, we also utilized IMU data recorded during different walking durations ranging from 10-360 s, obtained from a 6MWT. Data from two additional patients were available for this fixed duration paradigm, so all models were trained, optimized, and tested using the available data from a larger patient cohort (N=35). Pre-optimized model performance is shown in Fig. 5a for each model and walking duration. The 60-s walking duration was selected for downstream analysis since this duration exhibited the highest initial classification performance.

Example feature importance and selection are illustrated in Fig. 5b for the fixed duration model trained on patient information and IMU features computed from 60 s of walking (PI+IMU). Two features were selected for this model via backward elimination: the sample entropy of acceleration on the stroke-unaffected ankle and the sample entropy of rotational velocity on the stroke-affected ankle.

The fixed duration models did not outperform the fixed distance models (Fig. 1b).
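Sample entropy, which appears among the selected features above, can be illustrated with a short sketch. The code below is not the study's implementation (features were computed with custom MATLAB code); the embedding dimension `m`, the tolerance fraction `r_frac`, and the two toy signals are assumptions chosen only for illustration.

```python
import math
import random


def sample_entropy(signal, m=2, r_frac=0.2):
    """Sample entropy of a 1-D sequence; m and r_frac are assumed defaults."""
    n = len(signal)
    mean = sum(signal) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    r = r_frac * sd  # tolerance scaled to the spread of the signal

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is within r.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(signal[i + k] - signal[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # matches for templates of length m
    a = count_matches(m + 1)  # matches for templates of length m + 1
    if a == 0 or b == 0:
        return math.inf  # undefined for this record; reported as inf here
    return -math.log(a / b)


# Toy signals (hypothetical): a perfectly regular pattern and uniform noise.
periodic = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.random() for _ in range(100)]

print(sample_entropy(periodic), sample_entropy(noisy))
```

A perfectly repeating signal yields an entropy of zero, while an irregular one yields a larger value, which matches the reading of higher sample entropy as greater movement complexity.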

We found that inertial sensor data recorded from the bilateral ankles and pelvis during a brief walking bout at IRF admission improved the prediction of discharge walking ability. Specifically, models trained with sensor data (PI+FA+IMU, PI+IMU) were better able to predict household or community ambulation at discharge compared to a model relying on patient information and admission functional assessment scores alone (PI+FA). This trend was true whether IMU data were recorded over a fixed walking distance (10 m) or a fixed walking duration (60 s). Improved model performance with IMU data stemmed from superior identification of patients who improved functional walking level during inpatient rehabilitation (progressing from household ambulation at admission to community ambulation at discharge).

The best-performing model utilized patient information and IMU data recorded during a 10 m walk (fixed distance PI+IMU), indicating that functional assessment scores may not be necessary for accurate predictions relating to walking function. Sensor data recorded during a fixed duration of walking also improved prediction performance over the benchmark PI+FA model, though not beyond the fixed distance walking data. A model utilizing 60 s of walking was optimal for this approach, with longer walking bouts reducing performance below the benchmark. These results [...]

An estimated 70-80% of patients are able to walk at the chronic stage of stroke [27]; however, only 30-50% of patients recover community walking function [1], [28]. A cru[...] transitioning to higher ambulation categories [29]. Predicting early in the acute rehabilitation program whether a participant will achieve community-level ambulation or remain a household ambulator at IRF discharge would help clinicians develop targeted treatment and care planning strategies for patients and their families.

For models utilizing IMU data, sample entropy and amount of motion features were consistently ranked among the most important. Higher sample entropy (greater complexity in the movements) and greater overall motion were associated with the higher ambulation level. Additional data and feature exploration will be critical to establish the most important features across a larger sample size. To reduce the feature space and risk of overfitting, we computed features using the magnitude of the acceleration and gyroscope signals, rather than on the three sensor axes. Future work examining motion in different anatomical planes (i.e., anteroposterior, mediolateral, and vertical movements of the pelvis and ankles) will be of interest to illuminate additional predictors of recovery based on detailed gait patterns. As expected, for any models utilizing the functional assessment scores, we found that the strongest predictor of functional walking level at discharge (which was defined using the discharge 10MWT score) was the 10MWT score at admission. This aligns with our previous work, which demonstrated that a functional assessment score at admission was the strongest predictor of a patient's performance on that same assessment at discharge, over other functional assessments and patient information such as demographics, stroke presentation, and pre-morbid activity levels [10]. Patient information, such as age, height, or stroke characteristics, was not selected as an important feature for any model, suggesting that this information is less predictive than measures of gait function and behavior. Indeed, a model trained on PI alone demonstrated substantially lower precision and recall than models including FA and IMU data in the fixed distance [...] in predicting post-stroke recovery outcomes.
These models incorporate data from force plates [30], 3D motion tracking [31], [32], or brain stimulation technology [33]. However, the [...]

In the present study, we excluded data from patients who were unable to complete either the 10MWT or 6MWT, utilizing IMU data during these walking assessments to train and test the predictive models. As such, these models require patients to be ambulatory at IRF admission, which is not always the case. For example, in a study of 41 IRFs, approximately 6% of stroke survivors were unable to ambulate or required assistance at admission [34]. It remains to be seen whether wearable sensor data have predictive value for non-ambulatory patients; incorporating sensor data during alternative activities, such as sitting [35], [36], may facilitate the prediction of walking recovery for these patients.

Importantly, this model was trained using admission and discharge data from patients at a single rehabilitation hospital, which may limit its generalizability to broader post-stroke outcomes. Stinear et al. [8] note the importance of predicting outcomes at specific time-points after stroke rather than at discharge, since discharge itself is linked to functional achievements and subject to variations in care structure and resources. We have developed a model that intentionally leverages predictions based on standard-of-care treatment at a single rehabilitation hospital. While it remains to be seen whether such a model will generalize to other IRFs, the approach described here can serve as a roadmap for the development of site-specific models for accurate, validated predictions at other rehabilitation hospitals.

Future work will expand the existing dataset for additional training and testing of the predictive model, including external validation in a new subset of patients. We will also test the predictive value of additional sensor data such as EMG or ECG to account for neuromuscular or cardiovascular factors, and we will examine the feasibility of regression models over classification models to improve the precision of predictions.

Inpatient stroke rehabilitation is often a hectic and overwhelming experience for patients, families, and clinicians working to deliver optimal therapeutic care. Many times, due to time restrictions, patients' limited physical capabilities, or cognitive/communication impairments, full functional assessments and clinical measures are not recorded and/or uploaded to the EMR. Furthermore, the full sequence of assessments at admission might take as long as 2-3 hours to complete. This results in incomplete or inconsistent data, posing a significant challenge in the creation of traditional prediction models to estimate a patient's future functional scores. Our current study suggests that a viable alternative is to record data from three simple inertial sensors during a brief walking bout (maximum of 60 seconds), which can be completed during any part of therapy or non-therapy time without significant dedicated time. This represents a unique translational engineering approach to support clinical evaluation and treatment of stroke using widely available IMU technology and machine learning techniques.

[...] Table 1. Patient demographics and stroke information were obtained from the EMR and a study intake form.

During the clinical assessments at admission, all participants wore three flexible, wireless inertial motion sensors (BioStampRC; MC10, Inc., Cambridge, MA) at the pelvis (L4-L5 region) and bilateral ankles (Fig. 6a). The sensors were attached to the skin with an adhesive film (Tegaderm; 3M, St. Paul, MN). The BioStampRC collected triaxial acceleration (sensitivity ±4g) and triaxial angular velocity (sensitivity ±2000°/s) at a sampling rate of 31.25 Hz. A Samsung tablet running the proprietary BioStampRC application was used to collect the sensor data and annotate the beginning and end of each trial or item of the clinical tests. De-identified sensor data were uploaded to the MC10 Cloud and then downloaded and stored on a HIPAA-compliant (Health Insurance Portability and Accountability Act of 1996) secure server.

Three sets of features were defined and extracted from information obtained at admission: patient information (PI, such as demographics and clinical information about the stroke), functional assessment scores (FA), and sensor data (IMU). Table 2 summarizes the 71 total features utilized for model development. Custom MATLAB code (R2017b; MathWorks, Inc., Natick, MA) calculated features from the sensor data and concatenated them with the other feature sets.
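As a rough illustration of how features can be derived from raw IMU samples, the sketch below takes the vector magnitude of triaxial gyroscope samples and sums it over time into a cumulative angular displacement, the quantity this study refers to as amount of motion. It is written in Python rather than the study's MATLAB, and the rectangle-rule integration, the function names, and the toy signal are assumptions; the published definition of amount of motion may differ in detail.

```python
import math

FS_HZ = 31.25  # sampling rate reported for the sensors


def magnitude(samples):
    """Vector magnitude of triaxial samples given as (x, y, z) tuples."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def amount_of_motion(gyro_samples, fs_hz=FS_HZ):
    """Cumulative angular displacement (deg) from angular velocity (deg/s).

    Assumption: simple rectangle-rule integration of the gyroscope magnitude.
    """
    dt = 1.0 / fs_hz
    return sum(w * dt for w in magnitude(gyro_samples))


# Hypothetical input: about 2 s of constant 30 deg/s rotation about one axis.
gyro = [(30.0, 0.0, 0.0)] * int(2 * FS_HZ)  # 62 samples
aom = amount_of_motion(gyro)
print(round(aom, 2))
```

Working on the signal magnitude rather than on each axis keeps the number of features per sensor small, at the cost of discarding directional information.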

All sensor features were computed from the data recorded during the 10MWT (fixed walking distance) or a subset of the 6MWT (fixed walking duration). Sensor features included amount of motion (AoM) [37], defined as the cumulative angular displacement measured from gyroscope signals, and general statistical and mathematical features calculated from the gyroscope (Gyr) and accelerometer (Acc) signals of the [...]

VOLUME 10, 2022

Walking speed is an objective indicator of post-stroke walking ability, a reliable marker of deficit severity, and a strong predictor of functional community ambulation [26], [29]. We targeted the classification of patients as household or community ambulators based on discharge 10MWT scores. Target model predictions were "household" or "community" discharge walking speed based on stratified 10MWT scores, in alignment with previous classifications for household (<0.4 m/s) and community (≥0.4 m/s) ambulation [1], [24].
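The target definition above amounts to a single threshold on walking speed. A minimal sketch follows; the function names and the admission speed in the example are hypothetical, while the 0.4 m/s cut-off and the two labels come from the text.

```python
THRESHOLD_M_PER_S = 0.4  # household < 0.4 m/s, community >= 0.4 m/s


def ambulation_category(speed_m_per_s):
    """Label a 10MWT walking speed as 'household' or 'community'."""
    return "community" if speed_m_per_s >= THRESHOLD_M_PER_S else "household"


def changed_category(admission_speed, discharge_speed):
    """True if the patient crossed the threshold between the two time points."""
    return ambulation_category(admission_speed) != ambulation_category(discharge_speed)


# Hypothetical admission speed of 0.30 m/s with a discharge speed of 0.54 m/s.
print(ambulation_category(0.30), ambulation_category(0.54))
print(changed_category(0.30, 0.54))
```

Because the label is a hard threshold, a patient discharged just above 0.4 m/s and one discharged well above it receive the same class, which is one motivation for the regression models mentioned as future work.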

For the fixed distance dataset, 27 participants were labeled as community walkers at discharge, and 6 participants were labeled as household walkers at discharge. For the fixed duration dataset, 28 participants were labeled as community walkers at discharge, and 7 participants were labeled as household walkers at discharge. These imbalanced classes can pose a challenge for machine learning models, with a risk of biasing classifications toward the majority class. To minimize this risk, we selected candidate algorithms that can contend with imbalanced classes, namely Balanced Random Forest, Balanced Bagging, and RUSBoost, each of which randomly undersamples the majority class. All machine learning algorithms were implemented using the Scikit-Learn (0.23.2) and Imbalanced-Learn (0.23.2) libraries in Python (3.8.8).
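The idea of random undersampling that these algorithms share can be sketched in a few lines of plain Python. This is an illustration only: the Imbalanced-Learn implementations differ in detail (for example, resampling separately for each tree or boosting round), and the helper name and seed are assumptions. The class counts are those of the fixed duration dataset.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted indices with every class cut down to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Draw n_min samples without replacement from each class.
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


labels = ["community"] * 28 + ["household"] * 7
kept = random_undersample(labels)
balanced = Counter(labels[i] for i in kept)
print(balanced)
```

Each base learner then sees equal numbers of household and community ambulators, so the majority class cannot dominate the fit; repeating the draw across many learners recovers the information discarded in any single draw.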

We evaluated the performance of each model using leave-one-subject-out cross validation. The primary performance metric was the weighted F1 score, the average of the per-class F1 scores (each the harmonic mean of precision and recall) weighted by the proportion of samples in each class. The weighted F1 score ranges from 0 to 1, with 1 indicating perfect precision and recall. Since the explored models are all stochastic in nature, the performance can vary depending on [...]

Average and SD of weighted F1 score across 100 iterations for three algorithms to predict discharge ambulation outcomes. Pre-optimized performance is shown for different model types trained under the (a) fixed distance (10m walk), or (b-d) fixed duration (10-360s walk) paradigms, relative to the amount of IMU data used for analysis. Models without IMU data (PI and PI+FA) are unaffected by the amount of IMU walking data. PI models were not considered for the fixed duration analysis given their low performance, shown in (a). The Balanced Random Forest algorithm was selected to compare downstream models for its typically higher performance (e.g., maximum average performance for 10m walk and 60s walk) and lower fluctuation across conditions.

FIGURE 9. Feature elimination for fixed distance and fixed duration models. Average and SD of weighted F1 score across 100 iterations is shown as a function of the number of features, as determined by backward elimination, for (a) fixed distance (10m walk), and (b) fixed duration (60s walk) paradigms. The subset of features that maximized the weighted F1 score was selected to optimize model training and testing. Performance for the PI+FA model is identical between the fixed distance and fixed duration models since this model is unaffected by the amount of IMU walking data.

[...] to 1 and a cumulative order of importance for the feature set. Finally, we used backward elimination to remove the least important features based on their cumulative order of importance.
The mean and standard deviation of weighted F1 scores were calculated using another 100 iterations of the model over different random seeds to capture changes in performance across the number of features used during backward elimination (Fig. 9).

Backward elimination indicated that only a subset of features was needed to achieve a maximum average F1 score.
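The backward elimination step described above can be sketched as follows. This is a simplified outline under stated assumptions: `ranked_features` is already ordered from most to least important, `score_fn` stands in for the mean weighted F1 score over repeated model runs, and the feature names and scores in the example are hypothetical, not results from the study.

```python
def backward_elimination(ranked_features, score_fn):
    """Drop the least important feature one at a time; keep the best subset.

    ranked_features: feature names ordered from most to least important.
    score_fn: callable mapping a list of features to a score (higher is better).
    """
    remaining = list(ranked_features)
    best_subset, best_score = list(remaining), score_fn(remaining)
    while len(remaining) > 1:
        remaining = remaining[:-1]  # remove the current least important feature
        score = score_fn(remaining)
        if score >= best_score:  # ties favor the smaller subset
            best_subset, best_score = list(remaining), score
    return best_subset, best_score


# Hypothetical feature names and a hypothetical score for each subset size.
features = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]
toy_scores = {5: 0.80, 4: 0.84, 3: 0.88, 2: 0.91, 1: 0.85}
subset, score = backward_elimination(features, lambda f: toy_scores[len(f)])
print(subset, score)
```

In this toy run the score peaks at two features and drops when only one remains, mirroring the observation that a small subset of features can maximize the average F1 score.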

Using the selected features, we tuned the hyperparameters of each model based on a randomized cross-validation search (RandomizedSearchCV function from Scikit-Learn, using the default 5 folds). These parameters included the number of estimators, minimum sample split, minimum sample leaf, method for determining the maximum number of features (automatic, log2, or sqrt), and the maximum depth. We executed 100 iterations with different random states and identified the best-performing hyperparameters based on majority vote. The hyperparameters selected for each model using this randomized search approach are shown in Fig. 10.
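The majority-vote step can be illustrated with a small sketch. It assumes that each randomized search returns one dictionary of best hyperparameters and that the vote is taken separately for each hyperparameter; the text does not say whether votes were cast per parameter or per full combination, and the values below are hypothetical.

```python
from collections import Counter


def majority_vote(best_params_per_run):
    """Pick the most frequently selected value of each hyperparameter."""
    names = best_params_per_run[0].keys()
    return {
        name: Counter(run[name] for run in best_params_per_run).most_common(1)[0][0]
        for name in names
    }


# Hypothetical best parameters from three searches with different random states.
runs = [
    {"n_estimators": 100, "max_features": "sqrt", "max_depth": 5},
    {"n_estimators": 200, "max_features": "sqrt", "max_depth": 5},
    {"n_estimators": 100, "max_features": "log2", "max_depth": 10},
]
selected = majority_vote(runs)
print(selected)
```

Voting across many random states reduces the chance that the final configuration reflects one lucky draw of the randomized search, which matters with a cohort this small.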

Model performance metrics were averaged across test folds (left-out subjects in the cross-validation procedure). Performance was primarily evaluated using the weighted F1 score, which accounts for class imbalances by averaging the per-class F1 scores (the harmonic mean of precision and recall for each class), weighted by the number of samples in each class. Secondary model performance metrics included accuracy (proportion of correctly classified samples) and area under the receiver operating characteristic curve (AUROC). Possible values for these metrics range from 0 to 1, with higher values indicating better model performance.
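To make the weighted F1 score and accuracy concrete, the sketch below computes both in plain Python from pooled predictions. The label vectors reproduce the benchmark PI+FA classification counts reported in the results (23 of 27 community and 6 of 6 household ambulators correct); pooling all left-out subjects into one evaluation is an assumption made for illustration and may not match the fold-averaged values reported in the paper.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * (tp + fn) / total  # weight by number of true samples
    return score


def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Benchmark PI+FA counts: 4 community ambulators were predicted as household.
y_true = ["community"] * 27 + ["household"] * 6
y_pred = ["community"] * 23 + ["household"] * 4 + ["household"] * 6

wf1 = weighted_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(round(wf1, 3), round(acc, 3))
```

Here the household class has perfect recall but reduced precision, because the four misclassified community ambulators are counted as false household predictions; the weighted F1 score reflects both effects.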