A Survey of Wearable Sensors and Machine Learning Algorithms for Automated Stroke Rehabilitation

Stroke is one of the leading causes of disability among the elderly population and is a significant public health problem worldwide. The main impact of stroke is functional disabilities due to motor impairment after stroke. Advances in modern medicine and technology have significantly improved diagnosis and treatment; however, most post-stroke care is based on the effectiveness of rehabilitation. Stroke rehabilitation depends on two main components: (i) training (or therapy) to restore the patient to pre-stroke mobility and (ii) assessing motor functionality of affected patients performing activities to track motor recovery. This article highlights how combining wearable devices and machine learning (ML) produces new pathways for effective stroke rehabilitation. While wearable devices help capture patient movements at much finer time resolutions, ML allows us to build predictive models from wearable data to assist clinicians in diagnosis and treatments. Specifically, we expand on how wearable devices and ML can improve monitoring quality in training intervention, assessment, and remote monitoring. In addition, we provide our main findings from the literature, research challenges, and future directions in post-stroke therapies using wearable devices and ML.


I. INTRODUCTION
Stroke is the second leading cause of death, and 17 million people worldwide suffer from stroke each year [1]. Stroke can have devastating effects, including death or severe disability, which can cause social and family burdens. The majority of the victims are older adults, and the number of people 60 years of age or older is projected to increase from an estimated 488 million in 1990 to nearly 1,363 million in 2030 [2]. Since stroke is the leading cause of adult disability in the world and 70-85% of stroke patients have hemiplegia after the first stroke [3], motor recovery is one of the most crucial aspects for stroke victims. Wearable devices, when paired with machine learning (ML), enable us to continuously monitor subjects and ascertain the progressive course towards enhanced motor recovery. When no physician is available to check progression regularly, it is vital to monitor stroke patients' daily activities and exercise schedules. Wearable devices enable us to collect patient data continuously; without such data, opportunities for diagnosis and treatment would be missed. Combining ML with wearable devices will improve the remote monitoring of stroke patients. Figure 1 provides an overview of stroke monitoring using wearable devices, smartphones, cloud computing, and the Internet of Things.
The associate editor coordinating the review of this manuscript and approving it for publication was Chan Hwang See.
To improve and regain mobility following strokes, we require two primary components: (i) more practice and training (including physiotherapist instruction sessions and personal practice) and (ii) periodic evaluations (to gauge the impact of training). Therefore, monitoring the performance to assess the progress remotely is essential when the patient is at home or practicing exercises alone. It is imperative to conduct objective monitoring without any biases or prejudices. No single measure is capable of predicting all dimensions of recovery and disability [4], despite the availability of several stroke assessment scales to measure functional outcomes at each level of the World Health Organization International Classification of Functioning, Disability, and Health (WHO-ICF) [5]. For further details about stroke assessment scales, readers are referred to Table 1 (with 17 scales for impairment), Table 2 (with 16 scales for activity), and Table 3 (with eight scales for participation restriction class) of the included supplementary material.
FIGURE 1. Whenever a person suffers a stroke, they are rushed to the hospital, and a computed tomography (CT) scan is performed to determine the next course of action, including the type of stroke. The rehabilitation period commences 24 hours after the onset of stroke. Wearable sensor-based remote monitoring detects motor recovery when patients engage in exercises in a rehabilitation facility or at home without the assistance of a medical professional. Patients wear a wrist-based wearable device, and sensors onboard the device record their activities. The data is then transferred to a phone or tablet via Bluetooth. The app sends the data to a cloud server using Wi-Fi or internet-enabled devices. The cloud server allows clinicians to retrieve and analyze data to track patients remotely. An alert process can be incorporated if the clinician needs to provide feedback to patients. The primary advantage of wearable devices for remote monitoring is that they can capture data at finer time resolutions for near-real-time monitoring and data visualization. Integrating this fine-time-resolution data with ML enables clinicians to access real-time predictive analytics.
A comprehensive overview of the numerous techniques employed in stroke therapy for motor recovery is depicted in Figure 2. As shown in Figure 2, training and monitoring (i.e., motor evaluation) are crucial components of motor recovery. In addition to conventional (manual) physical therapy, six widely utilized rehabilitation training programs are presented in Figure 2. Either in-person or remote evaluations can be employed to evaluate motor recovery. The physical therapist conducts a manual evaluation; however, it is also feasible to perform it remotely using an inexpensive wearable sensor.
Mobile health (mHealth) technology [6] uses smartphone processing power and mobility to provide and support healthcare solutions. The utilization of current cell phone technology and applications enables patients to access their health status by interacting with healthcare providers at any time and location. The primary advantage of mHealth is its ability to provide remote healthcare services, such as medical advice and consultations, through smartphones when patients cannot visit a healthcare center. mHealth has become essential, especially in rural or regional areas with limited access to health facilities. With the rise of smartphone applications, it is now possible for people to download easy-to-use applications to record, monitor, and report different aspects of their health to relatives and health professionals. Smartphones and other health monitoring tools can help provide information on various health-related parameters, such as physical activity, body mass index, heart rate, and blood pressure. The apps can be used to promote physical activity and motivate individuals to engage in more physical activity [7], [8], [9], [10], [11], [12].
Furthermore, the data from these apps can be shared with clinicians to monitor the condition for consultation and treatment. Ambient Intelligence (AmI) is another related technology that uses sensors and artificial intelligence to create an environment that can respond to needs, for example, by adjusting the air conditioning according to the temperature [13], [14]. However, it can be challenging to monitor stroke patients with smartphones because of their health conditions. Hence, portable devices, such as those with the form factor of a wristwatch, are more appealing for observing stroke subjects.
Wearable devices coupled with ML will significantly aid in assessing patients' motor recovery status by automatically and objectively collecting continuous data and inferring progress conditions. Patients can wear wearable devices to gather data relating to their health, and the sensors on such devices will monitor various activities. While wearable devices can capture movements at finer time resolutions, signal processing helps us analyze wearable data, and ML helps us build models to help clinicians diagnose and treat diseases. Therefore, wearable devices and ML present a suitable means of remotely analyzing the wearable data of stroke patients and obtaining details on their motor recovery status. This approach enables the creation of a continuous record of movements and a diary for the resulting assessments.
FIGURE 2. Manual or remote assessments can be performed to assess motor recovery. A physical therapist performs a manual evaluation, and remote evaluation can be performed using a low-cost wearable sensor. This article will focus on the use of wearable devices for stroke rehabilitation.
In this article, we concentrate on various wearable devices and machine-learning techniques to monitor stroke patients' motor recovery during rehabilitation. Specifically, we examine the methods by which they can enhance the quality of monitoring in training intervention, assessment, and remote monitoring (as outlined in Section V). Figure 3 provides a summary of the main sections of this article. Our review differs from a recent review paper [15] in examining the advantages and limitations of various wearable sensors extensively utilized in stroke rehabilitation. Furthermore, we provide open research challenges and prospective directions in diverse aspects of stroke rehabilitation research, which are absent in [15].
Therefore, this review aims to help:
• determine the type of remote monitoring that is most relevant and objective
• determine which sensors are most relevant and best suited to specific problems
• identify deficiencies in the current techniques and approaches
• compare the results of various ML models to pick the most effective ML algorithms
• suggest deep learning models suitable for real-time outcomes by considering cost, power, time, and model complexity requirements.

II. WEARABLE SENSORS IN STROKE REHABILITATION
This section outlines the significance of wearable sensors in stroke rehabilitation, their methodologies, the monitoring of motor recovery progress, and the diverse types of sensors utilized in wearable devices.

A. NEED FOR WEARABLE SENSORS
During rehabilitation, different therapeutic exercises are recommended for patients, who are asked to try some daily activities (motor tasks). In hospital rehabilitation, healthcare professionals evaluate patients' motor abilities and provide scores based on assessment scales. This evaluation procedure is conducted regularly at crucial intervals. Nonetheless, these assessment scales possess limited predictive capability [4]. Since this periodic monitoring is not continuous, we do not receive an accurate estimate of motor activity progress, leading to missed opportunities for early intervention. This is where wearable sensors play a crucial role in facilitating the continuous monitoring of individuals in hospitals, rehabilitation facilities, and homes.

B. ADVANTAGES OF WEARABLE SENSORS
In stroke rehabilitation, the use of wearable sensors presents two significant advantages. First, utilizing a wearable device equipped with appropriate sensors and algorithms coupled with an internet connection enables the real-time monitoring of subjects. Second, the small form factor of these wearable devices makes them comfortable for patients to wear without causing discomfort or inconvenience, encouraging them to participate actively in rehabilitation exercises and activities.

C. MONITORING MOTOR RECOVERY
Two approaches are used to monitor motor recovery through wearable sensors: 1) obtaining clinically equivalent assessment scores by detecting tasks, and 2) measuring the amount of activity performed by the affected part.
1) Obtaining clinically equivalent assessment scores by detecting tasks. Clinical scores are calculated remotely (without a clinician) using data recorded on wearable devices [16], [17]. Sensors acquire movement data from the affected body parts while patients execute prescribed tasks at a rehabilitation facility or home. Using this sensor data, patterns can be detected for different activities. The data can later be converted into another index representing the patient's status, similar to the assessment score. The newly derived index is transmitted to the healthcare professional via either an Internet web application or a smartphone application, enabling the healthcare professional to guide patients residing in remote locations. For instance, in a hospital setting, equivalent National Institutes of Health Stroke Scale (NIHSS) scores were calculated using accelerometer sensors to assess motor recovery during the initial 24 hours following a stroke [18].
2) Measuring the amount of activity carried out by the affected part. The total amount of exercise the affected body part performs is determined from wearable sensor data. Different movement metrics are calculated, such as the total linear or angular movement, the distance covered, and the amount of daily use of the affected body part [19], [20]. For instance, a radial basis function (RBF) was employed to correlate differential magnetometer readings from a "manumeter" to wrist and finger joint angles [19].
These techniques are not only cost-effective and remotely accessible (i.e., time-efficient) but also provide objective methods to measure body movements and calculate scores.
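As a minimal sketch of the second approach, the snippet below summarizes how much a limb moved from wrist-worn accelerometer samples. The function name `activity_summary`, the 1 g resting baseline, and the 0.1 g activity threshold are illustrative assumptions and are not taken from the cited studies.

```python
import math

def activity_summary(samples, fs_hz, threshold_g=0.1):
    """Summarize movement from 3-axis accelerometer samples given in units of g.

    samples: list of (ax, ay, az) tuples; fs_hz: sampling rate in Hz.
    threshold_g: hypothetical cut-off separating movement from rest.
    """
    # Deviation of the acceleration magnitude from 1 g (gravity only, at rest).
    deviations = [abs(math.sqrt(ax * ax + ay * ay + az * az) - 1.0)
                  for ax, ay, az in samples]
    active = [d for d in deviations if d > threshold_g]
    return {
        "mean_deviation_g": sum(deviations) / len(deviations),
        "active_seconds": len(active) / fs_hz,
    }

# Synthetic example: 2 s at rest, then 1 s of movement, sampled at 10 Hz.
rest = [(0.0, 0.0, 1.0)] * 20
moving = [(0.5, 0.0, 1.0)] * 10
summary = activity_summary(rest + moving, fs_hz=10)
```

Comparing such summaries between the affected and unaffected limbs, or across days, is one possible way to track the amount of use; a threshold like this would need to be calibrated for each device.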

D. SENSORS
Table 1 lists wearable sensors commonly used in healthcare applications, along with their capabilities and limitations. It presents the eight sensors widely used with wearable devices, including an accelerometer, gyroscope, magnetometer, goniometer, tilt sensor, pressure sensor, pedometer, and electromyography (EMG) sensor. Each sensor measures a particular physical phenomenon (second column of Table 1), which means it also has its limitations (third column of Table 1). For instance, accelerometers can measure acceleration, but they cannot measure angular velocity, which can be measured by gyroscope sensors. Depending on the problem, many studies use single sensors, while others use a combination of sensors that complement each other to capture the desired signals. For further details on the use and applications of wearable devices in healthcare applications, we encourage readers to refer to [25]. The following subsections provide a brief overview of each sensor and its primary application in stroke rehabilitation.

1) INERTIAL MEASUREMENT UNITS (IMUS)
Micro-electromechanical systems (MEMS) facilitate the miniaturization of electrical and mechanical components. These components are fabricated into integrated circuits (ICs) with a form factor ranging from a few micrometers to millimeters [26]. MEMS technology allows for a smaller form factor while reducing costs and requiring low power. These characteristics make MEMS-based sensors particularly attractive for wearable devices. The three sensors commonly used in IMUs are:
• Accelerometer sensors are used in wearable devices to measure acceleration (the rate of velocity change).
• Gyroscope sensors are utilized to measure angular velocity, commonly expressed in degrees per second or revolutions per second [27]. Gyroscopes are advantageous instruments for determining orientations and evaluating rotations on three axes.
• Magnetometer sensors are used to measure magnetic fields and fluxes in the Earth's magnetic field [27].
Accelerometers are limited to linear accelerations, whereas gyroscopes can measure rotations and orientation. Therefore, it is possible to combine accelerometers and gyroscopes to obtain a complete picture of object movement in 3D space. This combination is called an IMU and has several applications, including stroke monitoring [18] and epilepsy monitoring [29]. Adding a magnetometer allows us to estimate the direction, which is helpful in many applications, such as navigation systems [30] and sleep monitoring [31].
TABLE 1. List of eight wearable sensors commonly used to detect motor activities in stroke monitoring and rehabilitation, along with their measures and limitations.
IMUs are of significant practical value in evaluating motor function in individuals who have had a stroke. Numerous researchers have utilized accelerometers in Upper Extremity (UE) studies [16], [32], [33], [34] due to their ability to provide reliable and valid objective measurements and their strong psychometric properties [35]. For instance, accelerometers were employed to assess arm usage [33], calculate Functional Ability Scale (FAS) scores [16], acquire gait patterns [36], and document Activities of Daily Living (ADL) [34]. Further, IMUs were used to determine the body's specific force and angular rotation rate [37]. A wristwatch device and a magnetic ring worn on the index finger ("manumeter") were utilized to calculate the total angular distance traveled by the wrist and finger joints [19].
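To illustrate how accelerometer and gyroscope readings can complement each other, the sketch below fuses them with a complementary filter. This is a common sensor-fusion technique chosen here for illustration; it is not the method of any study cited above. The function name, the blending weight `alpha`, and the synthetic data are assumptions.

```python
import math

def complementary_filter(accel_g, gyro_dps, dt, alpha=0.98):
    """Estimate a tilt angle (degrees) about one axis from paired sensor samples.

    accel_g: list of (ax, az) accelerations in g; gyro_dps: angular rates in deg/s.
    alpha: hypothetical blending weight; higher values trust the gyroscope more.
    """
    angle = 0.0
    for (ax, az), rate in zip(accel_g, gyro_dps):
        accel_angle = math.degrees(math.atan2(ax, az))  # drift-free but noisy
        gyro_angle = angle + rate * dt                  # smooth but drifts over time
        angle = alpha * gyro_angle + (1.0 - alpha) * accel_angle
    return angle

# Synthetic check: a motionless sensor tilted by 30 degrees, sampled at 100 Hz for 5 s.
tilt = math.radians(30.0)
accel = [(math.sin(tilt), math.cos(tilt))] * 500
angle = complementary_filter(accel, gyro_dps=[0.0] * 500, dt=0.01)
```

With a motionless sensor, the estimate settles on the accelerometer-derived tilt; during movement, the gyroscope term dominates over short time scales.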

2) GONIOMETER
A goniometer is an instrument used to measure a joint's angle (range of motion) and is commonly used in orthopedics [38]. In rehabilitation, wearable goniometer-equipped systems aid in reconstructing human posture, providing support both in clinics and at home [39]. A sensing glove with three knitted piezoresistive fabric (KPF) goniometers was used to track the flexion and extension movement of the metacarpophalangeal joint of the thumb, index, and middle fingers [40].

3) TILT SENSORS
Tilt sensors, also called inclinometers, measure the angle of an object with respect to gravity. Compared to goniometers, inclinometers possess a higher degree of reliability [41], and studies have demonstrated their superiority in detecting intricate movements [42]. Since inclinometers rely on gravity, they cannot measure angles in a horizontal plane.

4) PRESSURE SENSORS
Pressure sensors measure the applied pressure in the sensing area, and force-sensitive resistors (FSRs) are commonly used to detect physical tension, squeezing, and contact. Examples include detecting gait patterns with FSR sensors embedded in shoes [43], detecting gait abnormalities [44], and monitoring daily life among stroke survivors [45]. An FSR can monitor multiple major muscle groups that govern the movements of the hand and wrist [46]. It has been demonstrated that a combination of FSR and gyroscopes can accurately detect precise gait patterns [47]. Furthermore, a gait shoe consisting of an IMU, FSR, and electric field height sensors was used to quantitatively analyze gait parameters (heel strike and toe-off) between healthy people and patients with Parkinson's disease [48]. In stroke patients, shoe-based accelerometers and FSR sensors were used to detect temporal gait parameters to differentiate posture allocation and activities [49].
FIGURE 4. An overall ML pipeline comprises six steps, from data collection to building models to model deployment. In the first step, data is collected from sensors and equipment, and electronic health records (EHR) are captured. In the next step, the data is pre-processed to remove unwanted noise and missing values, then normalized. The third step is feature engineering, which comprises feature extraction and selection. The fourth step is building the appropriate ML model, training the model, and validating the model. In the fifth step, the model's performance is measured through various metrics using a test dataset. In the final step, the model is deployed to an edge server, cloud server, or wearable device for real-time predictions. The model's performance is constantly measured, and the feedback is used to improve the processes and the model's performance.

5) PEDOMETERS
A pedometer is used to measure the number of steps an individual takes and to record the vertical acceleration [50]. Wearable pedometers often use a 3-axis MEMS-based accelerometer to estimate 3D motion patterns and deduce the number of steps [51]. The pedometer has been reported to be a valid method for measuring ambulatory activity in stroke patients [52]. It proved advantageous in objectively assessing the number of steps taken during walking and describing the pattern and intensity of an activity [53].
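The idea of deducing steps from accelerometer data can be sketched as follows. The example counts upward crossings of a magnitude threshold; the function name `count_steps` and the 1.2 g threshold are hypothetical choices, and commercial pedometers typically use more elaborate filtering and peak detection.

```python
import math

def count_steps(samples, threshold_g=1.2):
    """Count steps as upward crossings of an acceleration-magnitude threshold."""
    steps = 0
    above = False
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold_g and not above:
            steps += 1  # rising edge: a new acceleration peak begins
        above = magnitude > threshold_g
    return steps

# Synthetic walk: vertical acceleration oscillates around 1 g at 2 Hz for 5 s (50 Hz sampling).
fs_hz, step_hz, seconds = 50, 2, 5
signal = [(0.0, 0.0, 1.0 + 0.5 * math.sin(2 * math.pi * step_hz * n / fs_hz))
          for n in range(fs_hz * seconds)]
steps = count_steps(signal)
```

A fixed threshold is fragile for slow or asymmetric gait, which is common after stroke, so in practice the threshold would likely need to be adapted per patient.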

6) ELECTROMYOGRAPHY (EMG)
EMG measures electrical activity in muscles and can detect activities such as moving fingers or clenching the fist. There are primarily two types: intramuscular EMG (iEMG) and surface EMG (sEMG) [54]. Intramuscular EMG is an invasive technique in which needles are inserted through the skin to measure electrical signals. On the other hand, sEMG is a non-invasive technique that measures electrical signals from the skin's surface. Because sEMG is non-invasive, it is commonly used in wearable systems [55]. Accelerometer and sEMG sensors were combined to monitor ADL [17] and provide feedback during ADL [32]. Additionally, they were used to assess the quality of motor performance in home rehabilitation so that appropriate feedback could be provided to promote high-quality exercises.
Until now, we have discussed commonly used sensors in wearable devices. The next section provides an overview of ML steps in stroke rehabilitation, including data types, ML types, and ML models.

III. MACHINE LEARNING IN DATA ANALYSIS
In this section, we discuss various data types used in ML, types of ML, pre-processing steps, feature engineering, ML models, training of the models, evaluation of the models, deployment of the models, and qualitative and quantitative effects of various steps on the outcomes. An overview of these steps is shown in Figure 4.

A. DATA TYPES
Different methods are used to collect patient information to provide the appropriate medical attention in medical establishments. Sensor data is primarily quantitative (numerical). In contrast, the data gathered through electronic health records (EHR) is predominantly qualitative (categorical) and typically comprises a digital patient history form. The EHR is longitudinal data collected over time. It contains information about age, gender, demographics, problem history, immunizations, physician's observations, reports, and laboratory results. It is also known as electronic medical records (EMR) and computer-based patient records (CPR) [56], [57]. Figure 5 depicts the data collected in the medical field, divided into four categories: discrete, continuous, nominal, and ordinal. Numerical (quantitative) data is categorized into discrete (such as the number of hospital visits) and continuous (such as the weight of patients). Categorical (qualitative) data, in contrast, can be classified into nominal (unordered data, such as male or female) and ordinal (ordered data, such as cancer stages [I-IV]). These data types are crucial in defining ML problems and creating appropriate models.

B. MACHINE LEARNING TYPES
ML is a field of artificial intelligence (AI) concerned with performing tasks by learning patterns from the dataset and generalizing without being explicitly programmed. In other words, ML algorithms try to automate tasks by improving the learning process without human participation. There are four types of ML: (i) supervised learning, (ii) unsupervised learning, (iii) semi-supervised learning, and (iv) reinforcement learning. Figure 6 shows the four types of ML. Below, we briefly describe the four types of ML paradigms [58]:
1) In supervised learning, the ML algorithm is presented with a dataset in which the input (feature vector) and the desired output (label) are explicitly provided. The algorithm learns to map the input to the output by learning the hidden patterns. Suppose we have N labeled examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an input vector and $y_i$ is a (scalar) output. The label $y_i$ can be a real number (regression problem) or an element of a finite set of classes $\{0, 1, 2, \ldots, C\}$ (classification problem). The supervised learning algorithm then uses the dataset to generate a model that takes $x$ as input and delivers an output label $y$. The desired output is also called the target variable. Examples of supervised classification algorithms include Support Vector Machine (SVM) [59], k-nearest neighbors (k-NN) [60], Naive Bayes [61], decision trees (DT) [61], random forest (RF) [61], and neural networks (NN) [62]. Examples of supervised regression algorithms [61] include linear, non-linear, Bayesian, and polynomial regression, ensemble methods, and NN.
Supervised learning is beneficial for learning from data and making predictions. For example, data collected from wearable devices during stroke rehabilitation can be used to build personalized ML models for individuals, which can subsequently be used to track and predict the quality of exercises during remote monitoring.
2) In unsupervised learning, we do not have specific labels, i.e., the dataset is a collection of unlabeled examples $\{x_i\}_{i=1}^{N}$. The goal here is to build a model that transforms the input data (or a feature vector) into another vector or value such that it helps to solve the problem at hand. For example, in clustering problems, unsupervised learning can be used to find clusters in the data that help in making decisions. Another application of unsupervised learning is to find associations between variables in large databases. Examples of unsupervised algorithms include K-means [63], fuzzy c-means [64], hierarchical clustering [65], Density-based spatial clustering of applications with noise (DBSCAN) [66], Gaussian Mixture Model (GMM) [67], Principal Component Analysis (PCA) [68], and NN.
Unsupervised learning is advantageous when we do not have the labels for the data and want to explore the data for hidden patterns. For example, unlabeled data from wearable devices can be clustered according to stroke severity level and exercises during rehabilitation, establishing connections between subjects and exercises.
3) In semi-supervised learning [69], the dataset contains both labeled and unlabeled examples. Ordinarily, the number of unlabeled examples is much higher than the number of labeled examples. The objective here is similar to that of supervised learning, i.e., to produce a model that learns not only from the labeled examples but also from the unlabeled examples. Although unlabeled examples may appear to add uncertainty, they provide additional information about the underlying data distribution, which helps to generate better models. Examples of semi-supervised learning algorithms include ...
Semi-supervised learning becomes extremely useful when we have limited labeled data. For example, time-series data from stroke rehabilitation exercise sessions may consist of a few minutes for each exercise for each subject. In such a scenario, semi-supervised learning can be used to train a supervised learning model from the unlabeled data by exploring temporal associations between exercises and segments of data.
4) In reinforcement learning [72], an agent (ML algorithm) interacts with an environment and is capable of perceiving the state of the environment. The agent can execute actions in every state. Different actions bring different rewards and can move the agent to a different state of the environment. A policy is a function that accepts a state's feature vector as input and outputs the best action to take in that state, much like the model in supervised learning. The goal of reinforcement learning (RL) is to learn a policy; a policy is optimal if the actions it selects maximize the expected average reward.
FIGURE 6. The figure illustrates the four types of ML: (i) supervised learning, (ii) unsupervised learning, (iii) semi-supervised learning, and (iv) reinforcement learning. Depending on the target variable, supervised learning can be further categorized into classification (categorical) and regression (numerical, real number). In the case of unsupervised learning, the target variable is unavailable. However, we can perform clustering and association-type problems. Semi-supervised learning primarily handles categorical target variables, and its problems can be grouped into classification and clustering. Reinforcement learning deals with categorical target variables to manage classification problems. When the target variable is unavailable, the problems usually fall into the category of control (e.g., self-driving cars).
Reinforcement learning is beneficial when the decision-making is sequential and the problem must be solved over the longer term. For example, robots can learn to interact with stroke victims to help improve motor recovery in the stroke rehabilitation process. Examples of RL algorithms include Trust Region Policy Optimization (TRPO) [73], Proximal Policy Optimization (PPO) [74], Q-learning [75], and Deep Q Neural Network (DQN) [76].
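As a minimal sketch of the state, action, reward, and policy concepts described above, the following tabular Q-learning example learns to walk along a four-state chain towards a rewarded goal state. The toy environment, the function names, and all hyperparameters are assumptions made for illustration; it is not a model of a rehabilitation task.

```python
import random

N_STATES, GOAL = 4, 3  # states 0..3 on a line; state 3 is terminal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic toy environment: move along the chain, reward 1 on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values Q[state][action] from random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            action = rng.choice((LEFT, RIGHT))  # uniform exploration (Q-learning is off-policy)
            next_state, reward = step(state, action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q

q_table = q_learning()
# Greedy policy: in each non-terminal state, pick the action with the highest value.
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

The learned greedy policy moves right in every non-terminal state, which is the action sequence that maximizes the discounted reward in this toy setting.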

C. ML SPECIFIC TO STROKE REHABILITATION
ML approaches offer distinct applications in stroke rehabilitation. Depending on the rehabilitation objective, specialized classifiers may be employed. For instance, supervised learning holds advantages in classification tasks, as it enables the precise classification of stroke patients based on established patterns, such as identifying ADL or distinct stroke types and severity levels.
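The supervised classification setting can be sketched with a small k-nearest neighbors (k-NN) classifier. The two features, the labels, and the data values below are hypothetical and serve only to show the mechanics of learning from labeled examples.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features per exercise repetition: (mean acceleration in g, duration in s).
train_x = [(0.9, 2.0), (1.0, 2.2), (1.1, 1.9), (0.3, 4.5), (0.4, 5.0), (0.2, 4.8)]
train_y = ["good", "good", "good", "poor", "poor", "poor"]
prediction = knn_predict(train_x, train_y, query=(0.95, 2.1))
```

In practice, features on different scales would be standardized before computing distances, and the value of k would be chosen by validation.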
In instances where specific target classes are not readily available, clustering can be employed to identify patient clusters with common characteristics or to identify groups of exercises suitable for specific severity groups. Semi-supervised classifiers are advantageous when the labeled data is restricted. By utilizing clustering algorithms, it becomes feasible to harness the patterns present in unlabeled data and extract features that can subsequently be utilized to classify labeled data.
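The clustering idea can be sketched with a plain k-means implementation. The two-dimensional points and the choice of initial centroids are illustrative assumptions; real wearable features would first be extracted and scaled.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    return centroids, labels

# Hypothetical normalized features for six unlabeled recording sessions.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

The resulting cluster labels could then serve as an additional feature when classifying the smaller labeled subset, in the spirit of the semi-supervised use described above.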
Reinforcement learning can be used in developing adaptive intervention strategies, where the patient learns to improve the outcome of the sequential task by maximizing the total rewards. In such an approach, the patient learns by trial and error while accumulating rewards. This enables the optimization of treatment plans in real time by utilizing individual patient responses and progress. This approach allows for personalized treatment, but it is essential to prioritize ethical considerations, implement safety measures, and integrate expert knowledge when applying reinforcement learning to stroke rehabilitation.

D. PRE-PROCESSING
This subsection provides a brief overview of the steps involved in the ML approach to data analysis. Once sensor data is collected, the next step is to separate activity types and provide a comparative index equivalent to a clinical score. As shown in Figure 4, pre-processing steps are performed after the data is collected, depending on the acquired data. The data may contain noise and artifacts. Typical pre-processing efforts include centering and scaling (zero mean with unit standard deviation), removing outliers, and handling missing values [77]. The condition of the data determines whether these steps are required.
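The three typical pre-processing efforts mentioned above can be sketched for a one-dimensional signal as follows. The forward-fill strategy for missing samples and the z-score limit of 2.0 for outliers are assumptions chosen for this small example, not recommendations from [77].

```python
import statistics

def preprocess(values, z_limit=2.0):
    """Fill missing samples, drop outliers, then center and scale a 1-D signal.

    Assumes the signal is not constant after outlier removal (non-zero deviation).
    """
    # 1) Handle missing values (None) by carrying the last valid sample forward.
    filled, last = [], None
    for v in values:
        v = last if v is None else v
        if v is not None:  # leading gaps with no earlier sample are skipped
            filled.append(v)
            last = v
    # 2) Remove outliers lying more than z_limit standard deviations from the mean.
    mean, std = statistics.fmean(filled), statistics.pstdev(filled)
    kept = [v for v in filled if std == 0 or abs(v - mean) / std <= z_limit]
    # 3) Center and scale to zero mean and unit standard deviation.
    mean, std = statistics.fmean(kept), statistics.pstdev(kept)
    return [(v - mean) / std for v in kept]

# Synthetic signal with one missing sample and one spike.
raw = [1.0, 1.2, None, 0.9, 1.1, 50.0, 1.0, 0.8, 1.3, 1.1, 0.9]
clean = preprocess(raw)
```

Whether each step is needed, and with which parameters, depends on the condition of the recorded data.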

E. FEATURE ENGINEERING
A feature (also known as a predictor) compactly represents the sensor data and reveals its distinguishing characteristics. The fundamental steps in feature engineering consist of feature extraction and feature selection.
Feature extraction transforms the input data (feature vector) into a different feature space for better model performance. Feature extraction requires a solid understanding of the problem, domain knowledge, and related aspects of the problem. Often, new features are created from existing ones to improve ML model performance. The features extracted from the sensor data can be primarily classified into five groups (or domains): (1) time domain, (2) frequency domain, (3) statistical features, (4) morphological features, and (5) data-specific features.
1) Time domain features encompass the characteristics extracted directly from the time series data (either raw or pre-processed), capturing the temporal patterns and dynamics without transforming them into alternative domains.
2) Frequency domain features are obtained by transforming time series data into the frequency domain, revealing information about the distribution of frequency components and their corresponding magnitudes.
3) Statistical features are quantitative measures that provide valuable insights into the data distribution, its central tendency, and its variability.
4) Morphological features capture the shape and structural information of data, focusing on the time series' patterns, trends, or structural characteristics.
5) Data-specific features are patterns extracted from specific datasets, tailored to the characteristics of that particular dataset, rather than being generalized for all datasets. For example, a velocity sequence can be extracted as a data-specific feature from accelerometer sensor data.
On the other hand, feature selection deals with selecting the best k features to improve the model's performance and help with the model's interpretability [61]. Feature selection (also known as subset selection) helps reduce the redundancy of features and the time complexity [61]. There are three main approaches to feature selection [78]: (1) filter methods, (2) wrapper methods, and (3) embedded methods.
1) Filter methods use various statistical tests to measure how strongly each feature is correlated with the classes and then rank the features accordingly. Some frequently used statistical tests and measures include the χ² test, Fisher's exact test, the Euclidean distance, Pearson's correlation, information gain, and others.
2) Wrapper methods use the performance of a learning algorithm (typically a classifier) to eliminate features that are unimportant or contribute little. As a result, wrapper methods depend on classification algorithms to select the best features. Some algorithms include sequential forward and backward selection and genetic algorithms.
3) Embedded methods select features by dynamically adjusting the weights given to each feature during training, and this mechanism is built into the classifier. Some techniques include tree-based classification (e.g., decision trees, random forests) and regularization models (ridge regression and the least absolute shrinkage and selection operator, LASSO).
Both feature extraction and selection are part of feature engineering. Furthermore, procedures such as dimension reduction may be applied for specific problems; these project the data into a lower-dimensional space (e.g., using PCA) and reduce the correlation among features.
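As an illustration of the two steps above, the following minimal Python sketch extracts a few time-domain, statistical, and frequency-domain features from one window of a signal and then ranks features with a simple filter method (absolute Pearson correlation with the class labels). The function names, the chosen features, and the naive discrete Fourier transform are assumptions made for illustration; they are not taken from any of the surveyed works.

```python
import math
import statistics


def extract_features(window, fs):
    """Compute a few illustrative features for one window sampled at fs Hz."""
    n = len(window)
    mean = statistics.fmean(window)          # statistical: central tendency
    std = statistics.pstdev(window)          # statistical: variability
    rms = math.sqrt(sum(v * v for v in window) / n)  # time domain: signal energy
    centered = [v - mean for v in window]
    # Time domain: sign changes of the mean-removed signal.
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    # Frequency domain: bin with the largest magnitude in a naive DFT.
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "zero_crossings": zero_crossings,
        "dominant_freq_hz": best_k * fs / n,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(feature_rows, labels, k):
    """Filter method: keep the k features most correlated with the labels."""
    names = list(feature_rows[0])
    scores = {
        name: abs(pearson([row[name] for row in feature_rows], labels))
        for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]
```

In practice, `extract_features` would be applied to every window of every sensor channel, and the resulting rows passed to `select_top_k` together with numeric class labels.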

F. CLASSIFICATION MODELS
The literature shows that most machine-learning models developed using wearable sensors in stroke rehabilitation tend to address classification problems. In this article, we have identified five application areas concerning stroke rehabilitation. They are (i) identifying ADL, (ii) estimating clinical scores, (iii) monitoring exercises, (iv) recognizing postures, and (v) recognizing gestures. These approaches are reviewed in Section IV. These application areas use various data types, but most target categorical outputs. Therefore, we focus on classification models in this article.
Classification models are mathematical models designed to make decisions on a new observation (e.g., sensor data) and categorize the outcome based on the learned features of the collected data. For example, classifiers can help clinicians make an informed decision on whether the patient is recovering or deteriorating after treatment by using wearable devices [18]. The selection of models depends mainly on the type of feature, the complexity of the task, and the sample size. After the model is trained using training data, it is evaluated, validated, and calibrated before deployment to predict new test data.
In this subsection, we provide a high-level description of the most commonly used traditional models (k-nearest neighbors, SVM, RF, and k-means algorithms) and the most recent deep learning models (convolutional neural networks, recurrent neural networks, transformers). The reader is encouraged to follow the relevant references for more detailed explanations.

a: TRADITIONAL MODELS
Most models in this category require a set of selected features to be used as input to build classification models. Subsection III-E presented a brief overview of feature extraction, where features are hand-crafted. Because the choice of hand-crafted features is subjective, they may or may not suit the problem, and the classification performance may suffer as a result.

K-nearest neighbors (k-NN) is a supervised algorithm used for classification. Suppose that we have a set of data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where x_i is the data value and y_i is the class label. The k-NN algorithm classifies a new data point x_j into one of the classes by using a distance metric to find the existing data points x_i that are closest to it. Distance metrics can include Euclidean, city block (Manhattan), and others [61]. The parameter k determines the number of neighbors the algorithm looks for while deciding the label for the new data point, and majority voting determines the final class label. For example, with k = 5, the algorithm looks for the five data points closest to x_j and assigns the label held by most of these five points. It is one of the simplest supervised learning algorithms. Because it has no explicit training step, it is fast to set up compared with other models; the computation is deferred to prediction time, when the distances to the stored data points are evaluated.
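The majority-voting procedure described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the Euclidean distance and small in-memory data; the function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["rest", "rest", "rest", "reach", "reach", "reach"]
label = knn_predict(train_x, train_y, (0.5, 0.5), k=3)  # -> "rest"
```

Note that all the work happens inside `knn_predict`: there is no separate fitting step, which is why the cost of k-NN grows with the number of stored training points.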
Support Vector Machine (SVM) is a supervised learning algorithm for binary and multi-class classification. The idea behind the SVM is to find the hyperplane that maximizes the distance to the closest points of the two classes (in the binary case). The soft-margin formulation is given by [79]
$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i$$
subject to
$$y_i\left(\omega^{T} x_i + b\right) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \ldots, n,$$
where $x_i$ is a feature vector, $y_i \in \{-1, +1\}$ is its class label, $n$ is the number of feature vectors, $b$ is the bias term, $\omega$ is the weight vector, $C$ is a positive regularization constant, and $\xi_i$ is the slack term. The points close to the decision boundary are called support vectors. In cases where the data is not linearly separable using the hyperplane, kernel techniques allow for class separation by projecting the data points into a high-dimensional space. SVM can be trained with kernels, including linear, polynomial, radial basis function (RBF), and sigmoid [80]. The RBF kernel is often used, as it allows the points to be separated into different classes, and is given by [79]
$$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$
where $K(\cdot, \cdot)$ is the RBF kernel, $x$ and $x'$ are feature samples in the input space, and $\sigma$ is a free parameter. A grid search [81] is commonly used to find suitable values of the regularization constant $C$ and the kernel parameter $\sigma$. SVM can handle high-dimensional data and is highly effective in cases where the number of predictors exceeds the number of observations. SVM is primarily used for classification problems, and binary classification can be extended to multi-class classification problems. However, a variant of SVM called Support Vector Regression (SVR) can be used for regression problems.
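To make the roles of the regularization constant C, the slack terms, and the kernel parameter σ more concrete, the sketch below evaluates the standard soft-margin objective for a given linear hyperplane and computes the RBF kernel between two samples. It assumes labels coded as -1 and +1 and the common 2σ² scaling of the RBF kernel; it only evaluates these quantities and does not solve the optimization problem.

```python
import math


def rbf_kernel(x, x_prime, sigma):
    """RBF kernel exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def soft_margin_objective(w, b, xs, ys, C):
    """Return (objective, slacks) for a linear SVM with labels in {-1, +1}.

    Each slack is the smallest xi_i >= 0 satisfying
    y_i * (w . x_i + b) >= 1 - xi_i.
    """
    slacks = [
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in zip(xs, ys)
    ]
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


xs = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
ys = [1, -1, 1]
objective, slacks = soft_margin_objective((1.0, 0.0), 0.0, xs, ys, C=1.0)
# Only the third point lies inside the margin, so only its slack is non-zero.
```

A larger C penalizes margin violations more heavily, while a smaller σ makes the kernel more local; a grid search simply repeats training over a table of (C, σ) pairs and keeps the pair with the best validation score.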
Random Forest (RF) is a supervised learning algorithm that combines a collection of tree-structured classifiers to classify a new data point [82]. Each tree is built from data points sampled randomly with replacement from the training dataset (also referred to as Bootstrap Aggregation or Bagging). For a new data point, the decisions of the individual trees are aggregated by majority vote to give the predicted class [61]. Bootstrap Aggregation helps to reduce variance because each tree is trained on a different random sample, which decorrelates the trees. There are several hyperparameters, such as the depth of the trees, the number of leaf nodes, and many others. RF can be used both for classification and regression problems. We urge the readers to refer to [61] and [83] for more details.
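The bagging and majority-vote steps can be illustrated with the following minimal sketch. To keep it short, each "tree" is a one-level decision stump on a single numeric feature with labels 0 and 1, and the random feature sub-sampling of a full random forest is omitted; these simplifications and all names are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-split tree on (x, label) pairs with labels 0/1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for polarity in (0, 1):  # label predicted when x >= threshold
            errors = sum(
                (polarity if x >= threshold else 1 - polarity) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def fit_bagged_stumps(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in range(len(data))]
        forest.append(fit_stump(bootstrap))
    return forest


def bagged_predict(forest, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


data = [(x, 0 if x < 5 else 1) for x in range(10)]
forest = fit_bagged_stumps(data)
```

Individual stumps trained on unlucky bootstrap samples can be wrong, but the vote across many stumps is far more stable, which is the variance-reduction effect described above.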
K-means [67] is a clustering algorithm that assigns one of k labels to each unlabeled data point in the dataset. Suppose we have a dataset consisting of N observations, $\{x_1, x_2, \ldots, x_N\}$. Then, our goal is to partition the dataset into k clusters so that each observation belongs to the cluster with the nearest centroid. K-means is primarily used in exploratory data analysis. We encourage the readers to refer to [67] for further details.
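A minimal sketch of the usual iterative procedure (often called Lloyd's algorithm) is shown below: points are assigned to their nearest centroid, and each centroid is then moved to the mean of its assigned points. Using the first k points as initial centroids and a fixed number of iterations are simplifying assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Partition `points` (tuples) into k clusters; returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # naive initialization
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
```

Because no class labels are involved, the cluster indices carry no meaning by themselves; they only group similar observations together.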

b: NEURAL NETWORKS AND DEEP LEARNING
In contrast to traditional classification models that employ manually designed features, modern deep learning models use artificial neural networks (ANN) to automatically learn effective representations (attributes) hidden deep in the data. In addition, as the dataset grows, it becomes increasingly challenging to hand-craft good features for complex datasets. Therefore, deep learning models tend to have a better generalization power (better predictions for unseen samples) than traditional classification models, leading to the widespread adoption of these algorithms in the recent past. Neural networks, the fundamental building block of deep learning models, began in the 1950s with the creation of the perceptron [84]. In the 1980s and 1990s, many techniques were invented, such as the backpropagation algorithm [85], which is the powerhouse of modern deep learning algorithms; however, neural networks became more prominent after the 2012 research work by Krizhevsky et al. [86], especially with the availability of large data sets and computing facilities.
ANNs use artificial neurons (modeled like biological neurons) to create an artificial brain-like structure with links between artificial neurons, representing synapses in the physical brain. Artificial neurons are often organized as layers with connections between them. ANNs are also called neural networks (NN) in short. We provide our data (to be modeled) to the first layer (or the input layer) and the expected output to the last layer (the output layer). We then use the backpropagation algorithm and various mathematical optimization techniques to learn the weights (of the links between artificial neurons) so that the difference between the expected output and the predicted output by the NN is minimized. The crucial part of these deep learning models is learning the proper weights, and the optimization algorithms adjust these weights through various mathematical procedures. ANNs can be used both for classification and regression problems. We recommend the readers refer to [62] and [87] for more information.
Convolutional Neural Network (CNN) is a class of ANN whose name derives from the mathematical convolution operation. CNNs implement convolution layers, each of which is a series of convolution filters. CNNs are widely used in image processing, natural language processing, and time series data [87]. CNNs are robust to noise, can extract deep features, and are translation-equivariant (i.e., in time-series data, a shift of the input in time produces a correspondingly shifted output). Although CNNs were primarily made for non-sequential data, they also perform well on sequential data in specific applications, which makes them very useful for time series classification. For example, CNNs have been used in activity recognition [88], detection of atrial fibrillation from electrocardiography (ECG) signals [89], stroke rehabilitation [90], [91], and many others.
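The equivariance property can be checked directly on a single convolution filter. The sketch below applies a hand-picked, hypothetical kernel to a short signal and to a copy of that signal delayed by one sample; in a trained CNN the kernel values would be learned rather than fixed.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D sliding dot product, as computed by a CNN filter."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


kernel = [1, 0, -1]                   # simple difference filter (assumed)
signal = [0, 0, 1, 2, 1, 0, 0, 0]     # a short movement "bump"
shifted = [0, 0, 0, 1, 2, 1, 0, 0]    # the same bump one sample later

out = conv1d(signal, kernel)          # [-1, -2, 0, 2, 1, 0]
out_shifted = conv1d(shifted, kernel) # [0, -1, -2, 0, 2, 1]
```

The response to the delayed input is the original response delayed by the same amount, so the filter detects the pattern wherever it occurs in the window.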
Recurrent Neural Network (RNN) is another class of ANN that is explicitly geared to handle sequential input (such as text, audio, and video). RNNs incorporate a cyclical process between neurons, allowing them to exhibit dynamic temporal behavior. This makes RNNs highly useful for time series data and applications such as speech recognition, language translation, handwriting recognition, and many other applications [62], [87].
Long Short-Term Memory (LSTM) is a type of RNN, introduced in 1997 [92], that maintains a ''short-term memory'' over long time spans and can therefore capture long-term dependencies. LSTMs use an efficient gradient-based learning technique to address the long training times (owing to slow changes in weights) of standard RNNs trained with recurrent backpropagation [92]. As a result, LSTMs mitigate the problem of vanishing gradients, commonly seen in deep NNs [93], and are used in applications such as robot control, music composition, time series prediction, human action recognition, and others. We encourage readers to refer to [94] for a comprehensive treatment of LSTMs and their extensions.
Transformer is a sequence-to-sequence model that employs the attention mechanism [95], allowing the model to capture dependencies regardless of their distance in the input or output sequences. Transformer models avoid the recurrent relations present in RNNs and LSTMs. Instead, they use attention mechanisms to calculate global dependencies between the elements of the sequence and use these to produce the predictions. We urge the readers to go through [95] for further details on the attention mechanisms and how they are used in sequence modeling applications. Furthermore, the reader can refer to [96] for in-depth information and usage of the transformers.
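The core of the attention mechanism in [95] is scaled dot-product attention: each query is compared with every key, the scores are normalized with a softmax, and the output is the resulting weighted average of the values. The following sketch implements only this single operation with plain lists; multi-head projections, masking, and positional encodings are omitted, and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 4.0]]
uniform = attention([[0.0, 0.0]], keys, values)    # equal weights -> mean of values
focused = attention([[100.0, 0.0]], keys, values)  # attends almost only to key 0
```

Every query is compared with every key in a single step, which is why the mechanism does not depend on how far apart two time steps are.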

G. MODEL TRAINING
In this step, we choose the appropriate ML classification model, a choice that can be based on domain expertise, the nature of the data, and the expected results. Once a classification model is chosen, the next step is to divide the data set into three sets: the train set, the validation set, and the test set. Often, the proportions of the data for the training, validation, and test sets are chosen to be 80:10:10 (likewise, 70:20:10 and 60:20:20 are also utilized in some cases), and the data set is partitioned randomly to avoid introducing bias.
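A random 80:10:10 partition can be sketched as follows. The function name, the fixed seed, and the rounding of split sizes are assumptions made for this example.

```python
import random


def train_val_test_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the samples and cut them into train, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # random partition to avoid ordering bias
    n_train = round(ratios[0] * len(shuffled))
    n_val = round(ratios[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains goes to the test set
    return train, val, test


train, val, test = train_val_test_split(range(100))
```

For wearable data, it is often preferable to shuffle and split by subject rather than by individual sample, so that windows from one person do not appear in both the training and test sets; this sketch does not handle that case.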
The train set is used as input to the chosen classification model. The model is trained using optimization methods to minimize the error between the given labels and the model's predicted output. In the case of classification problems, we try to reduce classification errors by learning the model parameters. Each classification model requires a type of optimization algorithm to minimize the error. For example, deep learning models use Stochastic Gradient Descent, Adagrad, Adadelta, RMSprop, Adam, AdamW, and others to learn their weights and biases [97].
The training process may also include dividing the train set into batches so that a large data set can be processed in smaller portions. An epoch refers to one complete pass of the model over all the training data, and an iteration refers to the processing of a single batch; the number of iterations in an epoch therefore equals the number of batches. The training process is completed when the error is minimized, or the training process has reached a predefined number of epochs. Once the model is trained, we use the validation set to assess the model's performance. The hyperparameters of the model are tuned [98] to obtain the best model for the given metric. The metric is chosen based on the outcome needs and is tied to the domain.
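The relationship between batches, iterations, and epochs can be seen in the following minimal mini-batch gradient descent loop, which fits a one-parameter linear model y = w * x under a squared-error loss. The model, the learning rate, and the absence of per-epoch shuffling are simplifying assumptions for illustration.

```python
def sgd_fit(xs, ys, batch_size, epochs, lr):
    """Fit y = w * x with mini-batch gradient descent; returns (w, iterations)."""
    w = 0.0
    iterations = 0
    for _ in range(epochs):                       # one epoch = one pass over the data
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            # Gradient of the mean squared error over this batch with respect to w.
            grad = sum(
                2 * (w * x - y) * x for x, y in zip(batch_x, batch_y)
            ) / len(batch_x)
            w -= lr * grad                        # one iteration = one batch update
            iterations += 1
    return w, iterations


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, iterations = sgd_fit(xs, ys, batch_size=2, epochs=50, lr=0.05)
```

With four samples and a batch size of two, each epoch contains two iterations, so 50 epochs correspond to 100 parameter updates.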
Once the fine-tuning of the model is completed, the model is evaluated on the test set. So far, we have partitioned the data set into training, validation, and testing sets, which is also called the holdout method. However, when the number of data samples is limited, a single split can lead to performance estimates with high variance. In such cases, we often train the model using k-fold cross-validation, where the data is split into k subsets, and we train the model on k − 1 subsets and validate the model using the remaining subset. The process is repeated until each of the k subsets has been used once for validation, and the performance is averaged. The k-fold cross-validation helps reduce the variance of the performance estimate used to learn the best parameters. Usually, the value of k is chosen to be 5 or 10. When the number of samples in the data set is highly limited, leave-one-out cross-validation is used; it is the special case of k-fold in which a single data point is used for validation and the rest for training in each round. We urge the readers to refer to [99] to explore related methods and their variations.
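The index bookkeeping behind k-fold cross-validation can be sketched as below. The generator name is hypothetical, and the sketch assumes the samples have already been shuffled; training and scoring of the model in each round are left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(10, k=5))
leave_one_out = list(k_fold_indices(4, k=4))  # k equal to the number of samples
```

The cross-validated score is the average of the k validation scores, and setting k equal to the number of samples yields leave-one-out cross-validation.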
Often, one has to train several classification models and use different validation strategies to select the best-performing classification model for the problem at hand.

H. MODEL EVALUATION
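Once the best-performing model has been selected, we have several metrics to evaluate its performance. The most commonly used metrics for evaluating predictions on new, unseen data are precision, recall, F1-score, sensitivity, specificity, accuracy, root-mean-square error (RMSE), and coefficient of determination (R²). They are given by:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$
where TP represents the number of true positive predictions, TN the number of true negative predictions, FP the number of false positive predictions, and FN the number of false negative predictions; $y$ represents the actual value, $\hat{y}$ represents the predicted value, $\bar{y}$ represents the mean of the actual values, and $n$ is the number of data points. Each metric measures certain aspects of the model's performance and allows us to interpret the results specific to the application problems. For example, specific medical applications may require both precision and recall to be high. In such cases, we have to choose the classification model that provides both high precision and high recall, rather than selecting a model simply by preference.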
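The metrics listed above follow directly from the confusion-matrix counts and the prediction residuals. The sketch below computes them using their standard definitions; the function names and the example counts are hypothetical, and the code assumes that no denominator is zero.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """RMSE and coefficient of determination for continuous scores."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)
    return {"rmse": math.sqrt(ss_res / n), "r2": 1 - ss_res / ss_tot}


clf = classification_metrics(tp=8, tn=9, fp=2, fn=1)
reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

The classification metrics apply to categorical outputs such as activity labels, whereas RMSE and R² apply to continuous outputs such as estimated clinical scores.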

I. MODEL DEPLOYMENT
The best-performing model against the set criteria is then used for real-world predictions (often called the production model). The model predicts the outcomes on newer data samples, and labels for these samples may or may not exist. The model's outcome is recorded, and the actual labels are registered (if they exist). The model performance against the desired metrics is continuously monitored, and feedback is generated to identify the cases where model performance needs improvement. This feedback may apply to any previous steps in the ML pipeline, as shown in Figure 4. Feedback is then analyzed, an appropriate corrective measure is taken, and the model is retrained and redeployed. The model's development, monitoring, feedback analysis, and fine-tuning are an ongoing process.

J. QUALITATIVE AND QUANTITATIVE EFFECT ON ML PIPELINE

1) DATA
The impact of data on shaping ML models is significant, both in qualitative and quantitative aspects. Qualitatively, noise, data imbalance, outliers, and other factors pose a potential risk of overfitting. Quantitatively, the dimension, diversity, correlations, and overlap of classes in the data directly impact a model's efficacy in extracting features from the data or learning from the data. Consequently, acquiring representative and balanced data is pivotal in determining whether ML models can qualitatively develop meaningful patterns and quantitatively utilize their knowledge with new data.

2) SENSORS
The first step is to choose a sensor, considering factors such as activity type, ease of use, and financial implications. There are several advantages and disadvantages to each sensor. For instance, a triaxial accelerometer solely records movement along three axes without providing physiological data, whereas sEMG sensors only detect the activity of muscles. Therefore, employing numerous sensors to collect diverse data types enhances efficiency and presents diverse viewpoints. Nonetheless, it is imperative to balance the quantity and placement of sensors while avoiding any unnecessary burden on patients. Furthermore, we must use only relevant sensors to acquire valuable information. The number of sensors and the quality of the data they generate will impact the qualitative aspects of the ML model's learning capabilities and quantitative outcomes.

3) PRE-PROCESSING
Pre-processing removes unwanted noise from sensor data and improves the data quality. This helps ML models find hidden patterns, which makes them very effective. For example, pre-processing accelerometer readings can have a quantitative effect on improving the detection of movements and prediction score accuracy on new data. Similarly, pre-processing steps can help improve the quantitative outcomes of ML models by incorporating data from additional sensors.

4) FEATURE ENGINEERING
To detect activities from time-series data, windowed data is preferred over long data streams, providing equal information in smaller segments while reducing complexity and dimensionality. The features extracted are deemed representative of identifying valuable patterns, thereby facilitating data interpretation. Therefore, feature extraction helps improve the model's quantitative performance. Feature selection reduces model complexity by eliminating redundant and irrelevant features, improving quality. The reduction of data dimension enables the visualization and modeling of features by removing feature correlations and enhancing the qualitative aspects of the modeling.
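Windowing a long sensor stream can be sketched as follows. The window size and step are hypothetical values chosen for illustration (here giving 50% overlap); trailing samples that do not fill a complete window are dropped.

```python
def sliding_windows(signal, window_size, step):
    """Split a sequence into fixed-size windows taken every `step` samples."""
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


stream = list(range(10))  # stand-in for ten consecutive sensor samples
windows = sliding_windows(stream, window_size=4, step=2)
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Features are then extracted per window, so each window becomes one fixed-length example for the classifier regardless of how long the original recording was.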

5) CLASSIFICATION MODELS
When selecting a classifier model, it is essential to consider the data types, complexity, and sample size. For example, linear models, such as linear discriminant analysis (LDA), are easier to use but lack the structure to handle complex data. They may underfit intricate data, resulting in suboptimal performance. However, LDA is less likely to overfit with small sample sizes. On the contrary, ANNs can handle complex data; however, they may exhibit overfitting without regularization or with fewer samples.
Furthermore, the outcomes of ANN models are difficult to interpret and rationalize. Therefore, the choice of classifier has a qualitative impact on the quality of the model. Quantitatively, classifiers impact performance metrics such as accuracy, F1-score, sensitivity, and specificity. The selection of the correct classification model is, therefore, crucial and will have a significant impact on prediction performance.

IV. EXISTING APPROACHES IN STROKE REHABILITATION
This section reviews the relevant research on stroke rehabilitation and provides an overview of each work from 2009 to 2023. Based on the primary objectives of this review, we have cataloged the literature into six subsections for easy understanding and comprehension. These include (i) identifying ADL, (ii) estimating clinical scores, (iii) monitoring exercises, (iv) recognizing postures, (v) recognizing gestures, and (vi) deep learning methods.
At the end of this section, Table 2 summarizes the research objectives, sensors, devices, number of subjects, research context, placement of sensors, ML models, results, and limitations from 2009 to 2023. In addition, Table 3 shows the high-level summary of features (extracted and processed) in stroke rehabilitation using wearable sensor devices.

A. IDENTIFYING ADLS
Identifying ADLs is primarily a classification problem. In this direction, Roy et al. [17] used an accelerometer and sEMG sensors to classify 11 ADLs from 10 subjects (five men and five women). Sensors were placed on the shoulders, thigh, wrist, and waist. They used the adaptive neuro-fuzzy ML model and ANN to identify ADL with a sensitivity of 95% and a specificity of 99.7%. One of the significant limitations is that they designed ML models for each subject individually, which limits the usage of models for a more generic population.
Arif and Kattan [100] used three triaxial accelerometers to monitor the physical activities of nine healthy subjects (eight men and one woman). Sensors were placed on the ankle, chest, and wrist. They used k-NN, RF, and NN to identify 12 types of physical activity. The Rotation Forest model achieved 98% classification accuracy. One of the significant drawbacks of this work is that it requires selecting optimal features, which may not always be generalizable and may need re-selection of features upon change in the data.
Sadarangani et al. [101] used FSR sensors attached to the forearm to classify three tasks (reach, grasp, and move an object). They collected data from 16 subjects (eight stroke and eight healthy). They used SVM with RBF kernel and LDA to classify the three tasks. SVM with the RBF kernel provided 92.2% accuracy in subjects with stroke and 96% precision in healthy subjects. However, they did not investigate the effect of the object's weight on grasp detection accuracy and did not change the importance of the entities. Furthermore, they did not study the impact of re-donning the FSR sensors or of their tightness levels.
O'Brien et al. [102] used the Samsung Galaxy S4 smartphone to collect data from 45 subjects (30 stroke and 15 healthy). The smartphone was placed in a belt pouch and tied to the waist. The data included accelerometer, gyroscope, and barometer sensor readings. Subjects self-labeled the activities, and the data were sent to a server via Wi-Fi and 4G communications. They created 270 features (131 from the accelerometer, 131 from the gyroscope, and eight from the barometer), and RF was used to classify the six activities. The findings suggested that the model performed better when trained on stroke data (75% recall) than on healthy activity data, and models trained on data collected in the laboratory performed poorly (56% recall) when the same models were tested at home. Some limitations include (i) a lack of study of environmental effects on the model performance and (ii) a limited number of subjects and data samples outside the laboratory environment.
Tran et al. [103] used a single wrist-worn accelerometer to classify functional and nonfunctional arm movements. The study included 20 subjects (10 healthy and 10 stroke) and extended the previous research [104]. The study strongly suggests that RF performs better overall when compared with SVM-RBF, linear SVM, and k-means. However, the performance of RBF-SVM was better than RF in inter-subject for the standard group, and k-NN was slightly better than RF in inter-normal and intra-stroke group [103]. One of the significant limitations of this work is that it is limited to only four ADLs and requires additional sensors to measure angular velocity, which is crucial sensing information to consider in measuring ADLs.
Lee et al. [32] used a wrist-worn triaxial accelerometer and gyroscope (Shimmer, Ireland) to (i) detect goal-directed (GD) upper limb movements during ADL and (ii) evaluate motor performance quality during home rehabilitation of 30 subjects (20 stroke and 10 healthy). Logistic regression was used to classify GD and non-GD movements, and the model detected GD movements with 87% Area Under the Curve (AUC). RF detected whether an exercise requires feedback for home rehabilitation with an F1 score of 84.3%. One limitation of this work is that it did not consider movements that belong to both the GD and non-GD categories. In addition, the approach has not been tested in home settings. Furthermore, the process requires two algorithms, one to detect tasks and one to assess quality.
Chen et al. [105] used five IMUs (one on each wrist, one on each of the upper arms, and one on the hip) to detect 10 ADLs using four ML models. They collected data from 11 subjects after the stroke, and Apple Watches (Series 3) were used to manage the data. Decision Tree, RF, SVM, and Extreme Gradient Boosting (XGBoost) were used in the analysis, and they found that XGBoost achieved 82% accuracy on 10 ADLs and 90% accuracy on seven tasks. One of the main limitations of this work is the small sample size (n = 11), which limits the diversity (data variations) of the data set. Furthermore, the data was collected in a structured, script-based simulated environment, which does not reflect natural conditions such as those in a home.

B. ESTIMATING CLINICAL SCORES
Estimating clinical scores can be both a classification and regression problem. This depends on the desired clinical scores. If the scores are whole numbers (1, 2, 3, and so on), they fall under the classification category. On the other hand, if the clinical scores are a range of numerical values, they fall under the category of regression.
Patel et al. [16] used accelerometer sensors on the hand, forearm, upper arm, and trunk to estimate FAS scores [106]. The study included 24 hemiparesis subjects and used RF to estimate FAS scores on 15 motor tasks. The RF model and further evaluation of motor tasks revealed that they could calculate scores with 0.04 points and a standard deviation of 2.43 points (with a coefficient of determination R² = 0.96). We note that the tasks in the study were segmented and separated, which helps the RF model select the features and identify the tasks correctly. Furthermore, further work is required to evaluate the model's performance when the tasks are not isolated (as in the real world) and when the subjects put on the sensors themselves (self-calibration and position variation).
Yu et al. [107] used two accelerometers and seven flex sensors to estimate the Fugl-Meyer Assessment (FMA) scores of 24 stroke subjects (consisting of 16 men and eight women). They used ensemble regression using an Extreme Learning Machine (ELM) of seven weak models and could predict the FMA scores of the seven exercises with a coefficient of determination of R² = 0.917. The FMA estimation framework was designed to work in both laboratories and homes. However, the system was not tested in an open, free-living environment, nor was the model deployed on a wearable device.
Oubre et al. [108] used two inertial sensors (attached to the wrist and sternum) to estimate the levels of upper extremity impairment using FMA scores. They collected data from 23 subjects (seven men and 16 women) who had experienced a stroke. As a first step, they used DBSCAN [66] to group segmented time series data and extract features, and later used support vector regression (SVR) to estimate FMA scores. Their regression approach calculated the FMA scores with a normalized root-mean-square error of 18.2% (R² = 0.70) and only required one minute of time series data to estimate. One of the limitations of this method is that the model is dependent on large continuous movements. This dependency, in turn, may cause exhaustion among subjects. The second limitation is that they do not evaluate the correctness of activities, which is necessary to determine the quality of movements.
Park et al. [109] developed an automatic grading system to measure subtle weaknesses in the upper and lower extremities. They collected 60 instances of kinematic characteristics of motor disorders in stroke patients with the NIHSS score (0 or 1) or Medical Research Council (MRC) grades (7, 8, or 9). This included 15 stroke subjects (10 men and five women). They also used a synthetic minority oversampling technique [110] to compensate for the class imbalance. They achieved an AUC of 0.912 and an accuracy of 83.3% for NIHSS scores using an ensemble of SVM kernels and boosting algorithms, while they achieved an AUC of 0.86 and an accuracy of 80% for MRC scores with SVM. One of the drawbacks of this approach is that it generates synthetic data to handle class imbalance, which is not ideal, as the generated data could have different statistical characteristics. Therefore, the models may not generalize well for out-of-distribution (OOD) samples.
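The idea behind synthetic minority oversampling [110] can be illustrated with the simplified sketch below: a new sample is created by interpolating between a minority-class sample and one of its nearest minority-class neighbors. This is not the implementation used by Park et al.; the function name, the neighbor count, and the toy data are assumptions, and the sketch requires at least two minority samples.

```python
import math
import random


def smote_like_oversample(minority, n_synthetic, k=3, seed=0):
    """Create synthetic samples on line segments between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest other minority samples, by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, x))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to the neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(x, neighbor))
        )
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_like_oversample(minority, n_synthetic=10)
```

Because every synthetic point lies between existing minority samples, the technique cannot introduce variation that was absent from the recorded data, which is consistent with the generalization concern raised above.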
Adans-Dester et al. [111] used wearable sensors (consisting of six accelerometers) strapped to the arm, chest, fingers, and wrist to estimate the severity and quality of the impairment. They collected data from 37 subjects (16 stroke, 21 traumatic brain injury) and developed a modified RF model. They estimated FMA scores (severity of impairment) with a coefficient of determination of R² = 0.86. On the other hand, they estimated FAS (quality of movement) with R² = 0.79. The approach assumes that the activities in the data are precisely segmented, which is what allows the models to calculate the scores. However, more studies are needed to assess performance in home and community settings, where tasks may not be clearly understood.

C. MONITORING EXERCISES
Pan et al. [112] developed a home-based self-rehabilitation system to monitor and detect exercises of the shoulder joint using a smartphone and two wireless accelerometer sensors. The system was tested on 10 healthy subjects (three men and seven women) and 14 stroke subjects (five men and seven women). In the study, the authors included five exercises: ear touch, hand raise, climbing wall, pendulum movement, and assisted active stretching. The SVM classifier classified these exercises with 96% accuracy. Some limitations of this system include: (i) the data was easily segmented into different exercises with the use of the Android app; (ii) the authors did not specify whether the SVM model was deployed on the smartphone to classify the exercises or whether the data was collected in a centralized location and then analyzed offline.
Yurtman and Barshan [113] developed an autonomous system to detect and evaluate physical therapy exercises using wearable sensors. They used five IMUs strapped to the arms and legs of five healthy subjects (three men and two women). They proposed a multi-template, multi-match dynamic time warping algorithm (MTMM-DTW) to detect multiple occurrences of more than one exercise type. The algorithm could detect and classify eight types of exercise with 93.46% accuracy. Furthermore, it counted the exercises and evaluated them as correct or incorrect with 88.65% accuracy. However, this method has some limitations. First, the template of each exercise for each subject must be rerecorded. Second, it has not been used in clinical settings, which limits its applicability in healthcare institutions. Third, sensors must be attached to the exact locations (as in template sequence recordings) to correctly detect the exercises, which can be practically difficult at home and in other community settings.
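For readers unfamiliar with dynamic time warping, the sketch below implements the classic single-template DTW distance between two one-dimensional sequences. It is not the MTMM-DTW algorithm of [113], which additionally handles multiple templates and multiple matches within a long recording; the example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches the previous element of b
                cost[i][j - 1],      # b[j-1] also matches the previous element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


template = [0, 1, 2, 1, 0]         # recorded exercise template
slower = [0, 1, 1, 2, 1, 0]        # same movement performed more slowly
flat = [0, 0, 0, 0, 0]             # no movement

same_exercise = dtw_distance(template, slower)  # 0.0
different = dtw_distance(template, flat)        # 4.0
```

Because the alignment may stretch or compress time, a repetition performed at a different speed still matches its template closely, whereas a different movement accumulates a large cost.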
Zhang et al. [37] developed a system to identify everyday rehabilitation exercise movements during a routine exercise session. The study included 14 stroke patients (10 men and four women), and IMUs were attached to the patient's wrist to collect data while performing six joint UE exercises (Bobath handshake, straight arm palm press, horizontal flexion and extension of the shoulder, reaching the forehead with the elbow, shoulder touching, and wrist turn). They compared eight different ML models to classify the exercises, and the fuzzy kernel classifier (FKC) obtained an error of 0.56% (with a standard deviation of 0.64). Some limitations include: (i) the system has not been tested in an open free-living environment, and (ii) it needs expert fine-tuning of model parameters for new data points.
Yu et al. [114] proposed a wearable sensor network system to monitor and assess upper extremity motor function quantitatively. Unlike traditional approaches, they used compressed sensing technology to reduce data transfer to the computer (using a sparse representation). The study included 23 stroke subjects (13 men and 10 women), and accelerometer sensors were strapped to the forearm and shoulder. The exercises included the Bobath handshake and shoulder touch, and the results indicated that accelerometer signals could be compressed by one-third of the raw signal length. They used the ELM algorithm to detect and compare the exercises, resulting in an accuracy of 89.5% (compared to an accuracy of 92.5% for the raw signal model). One of the main limitations is that the study did not consider additional sensing modalities and how they affect the performance of the compressed sensing model.

D. RECOGNIZING POSTURES
Sazonov et al. [115] proposed a shoe-based wearable sensor device to monitor postures and activities. They observed that heel acceleration and plantar pressure uniquely characterized postures and typical movements. The study included nine subjects (three men and six women). Each shoe had five FSRs and an accelerometer sensor embedded at the critical contact points. An SVM was used to classify six postures and activities (sitting, standing, walking/jogging, ascending stairs, descending stairs, cycling) with an accuracy of 95% using the complete sensor set and greater than 98% using the optimized sensor set.
Fulk and Sazonov [116] also proposed a similar shoe-based wearable sensor to classify three postures (sitting, standing, and walking) with SVM. They included data from eight stroke subjects (six men and two women). Using SVM, they achieved a classification accuracy of 99% (recall and precision from 0.99 to 1) for individual models and an accuracy of over 76.9% (recall and precision from 0.82 to 0.99) for group models.
In another work, Fulk et al. [117] developed a smart shoe to identify three postures (sitting, standing, and walking) using ANN instead of SVM. The study included 12 stroke subjects (six men and six women). They achieved an accuracy of more than 97.2% (95-99% recall and 95.4-98.7% precision). However, the three studies were limited to laboratory settings. Another drawback of these studies is that they do not monitor activities in the upper extremities. Furthermore, subjects without ambulatory abilities cannot use these systems. Moreover, they have not evaluated the ability of the sensors to capture the transition from one activity to another.
Cheng et al. [118] used four sEMGs and two accelerometer sensors to identify seven body postures (including 50 categories of dynamic activities). The seven body postures include standing, sitting, squatting, lying on the right, lying on the left, lying face up, and lying face down. The sEMG and accelerometer sensors were strapped to the chest and right thigh. The hidden Markov model (HMM) [119], [120] was used to classify the postures by combining the sEMG and accelerometer data. The average accuracy of posture recognition was 98.3%, although the model sometimes confused the sitting and squatting postures. However, the classification model is complex, requires considerable time to process data to detect activities, and is unsuitable for practical use on wearable devices.
Xiao and Menon [46] developed a prototype to monitor the FMG signals from the upper extremities using FSR straps attached to the forearms. The study included only six healthy subjects (all men). They used non-kernel ELM to classify six postures related to drinking tasks from the forearm FMG in real time, with an accuracy of 92.33% and a standard deviation of 3.19%. However, the study is limited to the discrete classification of six postures of the upper extremities. Moreover, the system cannot automatically identify the signatures of different types of multi-joint movements.
Masse et al. [121] characterized sit-to-stand and stand-to-sit (STS) transitions using inertial sensors and barometric pressure (BP) sensors. Each subject was equipped with an inertial sensor (3D accelerometer and 3D gyroscope) sampling at 200 Hz and a BP sensor sampling at 25 Hz. The study included 12 stroke subjects (seven men and five women), 345 STS transitions were recorded, and subjects were monitored under a semi-structured protocol. They used logistic regression to classify STS transitions, achieving 75.4% accuracy with the inertial sensor alone and 90.6% accuracy when the inertial and BP sensors were combined. However, the classification algorithm is not easily generalizable and depends on the collected dataset. Furthermore, the BP sensor is vulnerable to external perturbations, so further validation in different settings is critical.

E. RECOGNIZING GESTURES
Yang et al. [122] introduced an IoT-enabled stroke rehabilitation system based on a smart wearable armband (SWA), ML algorithms, and a 3D printed dexterous robotic hand to recognize nine gestures (agree, close hand, open hand, pointer, thumb and middle finger, thumb and little finger, flex hand, extend hand, and relax). The study included three subjects (two men and one woman). The authors compared three ML models to identify the nine gestures: LDA, multi-layer perceptron (MLP), and SVM. The MLP model identified the nine gestures with 96.02% classification accuracy. The system and the approach come with the following limitations: (i) the training and test data came from the same subjects, and the model was not tested on data from unseen subjects; (ii) the dataset is limited to three subjects and lacks diversity; (iii) the system is limited to only nine gestures; (iv) the accuracy of the system decreases when other gestures are introduced; and (v) the position of the armband changes the results, which makes the system less attractive for use in home settings.
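Limitation (i) is commonly addressed with subject-wise evaluation, in which all samples from one subject are held out of training in each fold. The following is a minimal sketch of a leave-one-subject-out split; the function name and subject identifiers are hypothetical, and this protocol is not the one used in [122].

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding each subject out once."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# One subject label per recorded sample (hypothetical identifiers).
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no subject contributes samples to both the training and the test set of a fold, the resulting accuracy reflects performance on unseen subjects rather than on unseen repetitions by known subjects.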

F. DEEP LEARNING METHODS
In this subsection, we highlight the deep learning approaches. The last four entries in Table 2 summarize these four approaches.
Panwar et al. [90] proposed a "Rehab-Net" framework to classify three upper limb movements from wrist-based accelerometer data. Rehab-Net uses a CNN-based deep learning model. The framework was tested in two situations: (i) a semi-naturalistic environment (making tea) with four stroke subjects and (ii) a natural environment (any desired arm movement for 120 min) with ten stroke subjects. They achieved an accuracy of 97.89% on semi-naturalistic data and 88.87% on naturalistic data. The CNN-based model outperformed LDA, SVM, and k-means. The model has also been optimized at the algorithmic level for real-time hardware implementation. One of the drawbacks of this study is that the deep learning (DL) models were trained specifically for individuals and, therefore, lack generalizability.
Kaku et al. [123] developed an LSTM-based approach to identify five functional primitives using nine inertial measurement units (IMUs). Data were collected from 48 stroke subjects (22 men and 26 women). The nine IMUs were attached to the cervical vertebra C7, the thoracic vertebra T12, the pelvis, arms, forearms, and hands. They achieved an accuracy of 78%. The study included only right-hand-dominant subjects. Hand dominance may have a differential influence on the preferential roles of the UEs and, therefore, needs further investigation. Furthermore, classification performance deteriorates significantly for severely impaired patients, indicating a lack of generalizability.
Chae et al. [124] developed a home rehabilitation system using a smartwatch and a smartphone. They attached a smartwatch (equipped with an accelerometer and a gyroscope sensor) to a wrist and collected data during exercises. Data were collected from 10 control subjects and 22 subjects with stroke. They used a CNN model and achieved 99.80% accuracy when accelerometer and gyroscope data were combined, versus 98.13% with accelerometer data only and 96.07% with gyroscope data only. The study selected four exercises to suit chronic stroke patients, particularly those based on bilateral movement therapy. However, these exercises may not fit all stroke subjects. Therefore, the same approach cannot be applied to all subjects with stroke.
Recently, Nair and Sakthivel [91] proposed a system to identify the completion status of rehabilitation exercises. They used triaxial accelerometers attached to the hands and forearms to collect data. The study included approximately five subjects and wrist, forearm, and shoulder exercises. They used a CNN model to classify 12 activities (six wrist, four forearm, two shoulder) with 98.61% accuracy. The model outperformed the Decision Tree, SVM, Linear Discriminant (LD), and Naive Bayes classifiers. One limitation of this approach is that it is not fully automated: the time-series data must be converted into images before being fed to the CNN, which may not be practical, whereas models such as LSTMs can process time-series data directly. It would therefore be inconvenient to use such an approach for real-world processing.

V. OPEN RESEARCH CHALLENGES AND FUTURE DIRECTIONS
Wearable sensors and ML will play an important role in stroke rehabilitation. Wearable sensors are the most practical option across the rehabilitation pipeline, from different training interventions to automated objective assessment and the creation of predictive models using ML. Figure 7 shows the combined view of wearable devices and ML algorithms to improve stroke rehabilitation outcomes. In this section, we discuss potential open research challenges and future research directions to solve problems in stroke rehabilitation.

A. TRAINING INTERVENTIONS
The two most prevalent issues encountered in training interventions relate to the insufficient knowledge of new physiotherapists and the execution of the optimal dose of therapy required for a particular patient. Except for studies that use biofeedback, the mandatory use of wearable sensors during the application of all other interventions has not yet been implemented. Wearable sensors can establish a connection between training and its effects in real time, which helps improve training and maximize potential results. This enhances patient performance and provides a valuable opportunity for a novice physiotherapist to gain insight into the patterns (such as movement patterns and muscle activity) of patients' motor ability and progression in real time. Furthermore, by using wearable sensor data from healthy subjects, physiotherapists may better understand the patient's status. Thus, the appropriate therapy and individualized treatment dosage can be implemented in real time.

B. REMOTE MONITORING
Physicians currently recommend that, in addition to routine care, patients perform prescribed exercises with wearable devices and without a physiotherapist once they regain voluntary movement. Nonetheless, whether the patients effectively complete the tasks as per the instructions remains uncertain.
In [32], the researchers examined data from the unaffected UE. However, assessing the health of the affected body parts, specifically the upper or lower extremities, is advisable. This is critical for making improvements. Furthermore, the optimal amount of self-directed exercise and the selection of more efficient activities for individual patients to help regain pre-stroke movement ability remain to be explored.
Although wearable devices hold promise for remotely capturing movement data, movement data alone is not always sufficient to reveal the fundamental physiological conditions of patients. It is therefore important to include sensors that record physiological signals such as nerve conduction velocity, EMG, ECG, photoplethysmogram (PPG), and others to provide additional information about muscle and nerve activity. Furthermore, combining data on heart rate, respiration rate, body temperature, and blood pressure with movement data may prove advantageous. The correlation, however, has not yet been established.
Another crucial aspect of remote monitoring relates to the ability of ML models to adapt to individual requirements. Current systems assume that ML models trained offline can be used directly. This is not the case, however, as the ML models have only been exposed to labeled data and cannot adapt to novel, unobserved data. Researchers therefore need to consider adaptable ML models and novel approaches that cater to personalized requirements.

C. CONTINUOUS ASSESSMENT
Visual assessment or the sole use of clinical scales can result in subjective errors, leading to inappropriate physical therapy. Wearable sensor-based continuous monitoring of subjects during clinical evaluation, regular hospital routines, and conventional physiotherapy can instead be used to compile a log of activities. This approach will assist in comparing the scores recorded by the physiotherapist with the scores objectively recorded by the wearable devices and inferred by the ML models.
Nonetheless, it is imperative to select the appropriate assessment scale. Assessment scales are subject to limitations, including but not limited to the ceiling effect, which makes it impossible to detect improvements in motor activities precisely [107]. The use of wearable sensors that apply patient-specific movement information (such as angle of movement, time taken to complete a task, and total amount of activity) for evaluation could help remove this error. Furthermore, non-linear analysis of wearable sensor data, such as multiscale entropy [126] or the maximum Lyapunov exponent [127], [128], can address this issue.
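As an illustration of the non-linear analysis mentioned above, the sketch below computes multiscale entropy in its commonly used form: the signal is coarse-grained at several scales, and sample entropy is computed at each scale with a tolerance fixed from the standard deviation of the original series. The parameter values (m = 2, tolerance factor 0.2, scales 1 to 3) and the synthetic test signal are assumptions chosen for illustration; they are not taken from [126].

```python
import math


def coarse_grain(x, scale):
    """Average consecutive non-overlapping blocks of length `scale`."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with absolute tolerance r and Chebyshev distance."""
    n = len(x)

    def count_matches(length):
        # Count template pairs (i < j) whose first `length` points all lie within r.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this series length and tolerance
    return -math.log(a / b)


def multiscale_entropy(x, scales=(1, 2, 3), m=2, r_factor=0.2):
    """Sample entropy of the coarse-grained series at each scale."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    r = r_factor * sd  # tolerance is fixed from the original series
    return [sample_entropy(coarse_grain(x, s), m, r) for s in scales]


# Synthetic irregular signal from the logistic map (stand-in for sensor data).
signal = [0.4]
for _ in range(299):
    signal.append(3.9 * signal[-1] * (1.0 - signal[-1]))
mse = multiscale_entropy(signal)
print(mse)
```

A perfectly regular signal yields an entropy of zero at every scale, while irregular movement produces higher values; tracking such values across sessions is one way to obtain a measure that does not saturate in the way a bounded clinical scale can.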

D. ASSESSMENT SCORES COMPARED TO TRADITIONAL SCALES
The accuracy, reliability, and clinical validation of clinical-equivalent assessment (CEA) scores (i.e., scores derived from wearable sensor data) compared to traditional scales are essential in determining their practical utility and limitations. Studies that use CEA scores to match traditional scales report them to be accurate, valid, and practical. For instance, the CEA scores produced to correspond to scales such as NIHSS [18], FAS [16], and FMA [107], [108] have demonstrated promising results.
Clinical validation of these scores requires testing and verifying them in diverse stroke populations and settings. The evidence backing these assessment scores needs to be critically assessed by researchers and healthcare professionals, considering their practical utility, drawbacks, and the individual requirements of stroke victims.

E. SELECTION AND PLACEMENT OF SENSORS
The placement of sensors on the body is essential for obtaining the desired signal correctly. As each sensor plays a crucial role in capturing precise movement patterns, incorrect placement or displacement of sensors can introduce artifacts or noise and result in significant issues. Although doctors can prescribe several training interventions and exercises, it is always possible to misplace body sensors. We require methods to monitor sensor placement automatically. ML algorithms are needed to detect misplacement, identify the artifacts caused by incorrect positions, and compensate for the resulting incorrect information. To our knowledge, no such ML model or methodology is utilized in rehabilitation.
Due to wearability issues and sensor limitations, the sensors cannot capture the patterns of all activities. Therefore, further studies are required to determine the optimal location of the wearable sensors.

F. PATIENT COMFORT, PSYCHOLOGICAL IMPACTS, AND ACCEPTANCE OF TECHNOLOGIES
The practical implementation and acceptance of these technologies hinge on crucial factors such as patient comfort, adherence to wearable device usage, and the psychological consequences of continuous monitoring. For the practical application and acceptance of wearable devices and ML models in clinical settings, patient-centered considerations are essential.
This involves addressing issues of comfort, adherence, and psychological impact. Wearable devices should be designed to be ergonomic, lightweight, and non-restrictive. Additionally, the designs should use hypoallergenic materials for individuals with skin sensitivities. Long-term, continuous monitoring can cause psychological impacts such as anxiety and stress.
To mitigate these risks, patients should receive education about the purpose and benefits of monitoring and be informed about privacy and security measures. It is important to provide regular communication and a means for expressing concerns, and to incorporate patient feedback to improve device usability and the patient experience.

G. SIGNAL QUALITY AND NOISE
The signal quality depends significantly on the sensitivity of the particular sensors and the environment. Sensors that capture external body movements exhibit greater sensitivity and are more vulnerable to noise. A minor error during data collection from healthy individuals can result in noisy data and a noisy model of healthy individuals. As a result, a stroke patient can be misclassified as healthy, or vice versa; that is, an inaccurate movement from a healthy subject can be considered the movement of a stroke patient. Moreover, when the sensor is positioned on a body part with a significant degree of freedom, the signal may exhibit increased noise if the number and placement of sensors are not precisely calibrated.
Sometimes, simple movement sensors (for example, accelerometers) cannot distinguish one movement from another because the patterns are too similar. Therefore, combining more than one sensor (multi-sensor fusion), including sensors that capture the body's internal activity during a task, is necessary to obtain more accurate information [129], [130].
Monitoring tasks would be more accurate if data were recorded for extended periods. In the conventional approach, regular monitoring poses a challenge; however, long-term monitoring by wearable devices offers a solution. During specific exercise practice, patients are required to wear the device, and information on the improvement or deterioration of motor function is gained by tracking changes in patterns over time. A single exercise is generally of short duration, and efficient features can be extracted from these short signals.

H. TIME-SERIES DATA MODELING AND ANALYSIS
The analysis of sensor data signals is carried out primarily based on the characteristics of the time domain. Time-frequency analysis may prove advantageous due to the non-stationary nature of sensor data. Furthermore, the feature extraction process can be further enhanced by incorporating characteristics of the frequency domain, the distribution of sensor data, and the fractal nature of the sensor data. Recent advances in deep learning, such as LSTM models, have successfully extracted time-series features from wearable sensors [123], [131]. However, LSTM models are more complex and require much more data to learn than traditional models. Hence, implementing LSTM in stroke motor recovery necessitates further investigation, as data accessibility is frequently constrained.
Most studies use and analyze individual databases, making it difficult to compare different studies. Furthermore, collecting enough data from stroke patients to determine the degree of variability between subjects is difficult, which makes the analysis process inefficient. Therefore, future studies should include large cohorts and, where possible, make these databases available to researchers outside their organizations with due ethical considerations.
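The contrast between time-domain and frequency-domain characteristics can be made concrete with a short sketch that splits a signal into overlapping windows and computes a few features per window. The window length, overlap, 50 Hz sampling rate, and synthetic 5 Hz test signal are assumptions made for illustration, and the naive discrete Fourier transform is used only to keep the example free of external dependencies.

```python
import cmath
import math


def sliding_windows(signal, size, step):
    """Split a signal into fixed-size windows that advance by `step` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def time_features(window):
    """Basic time-domain features: mean, RMS, and mean-crossing count."""
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(v * v for v in window) / n)
    centered = [v - mean for v in window]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {"mean": mean, "rms": rms, "zero_crossings": crossings}


def dominant_frequency(window, fs):
    """Frequency (Hz) of the largest non-DC DFT bin, computed naively in O(n^2)."""
    n = len(window)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(window))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


fs = 50.0  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 5.0 * t / fs) for t in range(200)]  # synthetic 5 Hz sine
windows = sliding_windows(signal, size=100, step=50)
features = [dict(time_features(w), dominant_hz=dominant_frequency(w, fs)) for w in windows]
print(features[0])
```

Computing the spectrum per window rather than over the whole recording is the simplest form of time-frequency analysis: if the dominant frequency changes between windows, the signal is non-stationary in the sense discussed above.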

I. ACCELERATING ML WORKLOADS
Using IoT and deep learning to process wearable sensor data collected from stroke patients in the cloud requires significant power consumption. Graphics Processing Units (GPUs) are commonly utilized as the standard equipment for training deep learning networks, including but not limited to CNN, RNN, LSTM, and other large models. Images, speech, and sensor data can be processed in milliseconds. However, the power consumption for training a deep learning model is not less than 100 W when using a high-performance GPU system [132]. According to [133], the power budget for devices such as smartphones, tablets, or watches is approximately 1.5 W, significantly lower than the power requirements for the cloud-based deep learning model. Therefore, it is recommended to use cloud-based services to train deep learning models and perform inference using wearable sensors.
Google has addressed this issue via the Tensor Processing Unit (TPU) [134]. TPUs were explicitly designed for deep learning networks; they require fewer resources and are much faster than GPUs. Google also integrated an Edge TPU into a custom system-on-chip (SoC) named Google Tensor, which was released in 2021 with the Pixel 6 line of smartphones. Its advantage is that it can achieve greater performance with lower power consumption. If a TPU were integrated into smartwatches or wearable sensors, the desired scores of stroke patients could be obtained immediately, potentially opening up a new avenue for the future of monitoring stroke rehabilitation systems.

J. DATA AND ML MODELS
Most current rehabilitation methodologies are limited to controlled environments (laboratory settings). Furthermore, the rehabilitation tasks and exercises are predetermined and must be adhered to by the subjects. These restrictions limit user mobility and hinder the free movement of individuals. Therefore, newer approaches are needed, considering uncontrolled environments (laboratory and home settings) and protocols that are not scripted. Using such techniques to identify, classify, and categorize tasks would benefit stroke survivors.
Current approaches develop ML/DL models, considering limited data (refer to Table 2). Increasing the number of samples and diversifying the dataset with varying tasks and scenarios is necessary to make the models robust to different scenarios. Researchers need to share datasets and models to collaborate and develop strong models.
Although ML models are trained using diverse variables, they often cannot be transferred from one environment to another. For example, we cannot guarantee that the performance of models developed specifically for laboratory settings will be the same when used on data collected at home. Models created in laboratories are typically not tailored to the needs of individuals at home (local or personalized models). We currently do not possess models that can be tailored to individual requirements.
The literature indicates that most researchers use supervised or unsupervised learning to address their issues. Semi-supervised learning is also a promising direction to explore when there are many unlabeled samples. Furthermore, the study of sequential decision-making has not been conducted in stroke rehabilitation research. To this end, integrating deep learning techniques with reinforcement learning may provide promising solutions.
TABLE 3. A high-level summary of the features used in stroke rehabilitation using wearable sensor devices. The first column of the table categorizes the literature into five categories: (1) time domain, (2) frequency domain, (3) statistical, (4) morphological, and (5) data specific. The second column identifies the types of features. The last two columns provide an overview of the advantages and disadvantages of the five categories of feature groups.

K. NEUROMORPHIC EDGE COMPUTING
Wearable computing units frequently require additional energy for novel applications that rely on numerous sensors and possess high learning capacities. However, with current portable battery technology, the power supply to wearable devices is insufficient to provide long-term monitoring capabilities. This limitation must be addressed to improve the longevity of wearable devices for continuous monitoring in stroke rehabilitation [135].
The intricacy of using a wireless module, its substantial power consumption, the considerable data volume, the spatial limitations arising from wireless transmission range, the privacy concerns arising from signal broadcasting, and the non-negligible latency of communication channels render this solution less than optimal [136]. These limitations in technology severely limit the applicability of wearable sensors.
In contrast to traditional methods that rely on a binary digital system, brain-inspired neuromorphic hardware is a promising solution. Still, it needs to be improved regarding data removal, storage, and transmission across various units [136]. From this angle, the front-end processor may be a neuromorphic semiconductor with an intelligent algorithm integrated right next to the sensor.
Spike-Timing Dependent Plasticity (STDP), a biologically inspired learning rule, has been the focus of many neuromorphic realizations of on-chip learning. Since synaptic weight changes only occur if presynaptic spikes reach the synapse, this model is highly suitable for event-based algorithms [137], [138]. These developments are encouraging researchers to adapt neuromorphic chips for stroke rehabilitation.
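A common textbook form of this rule is pair-based STDP, in which the weight change depends exponentially on the time difference between a presynaptic and a postsynaptic spike. The sketch below follows that form; the amplitudes, time constants, and weight bounds are illustrative assumptions and are not taken from [137] or [138].

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:
        # Presynaptic spike precedes postsynaptic spike: potentiation.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Postsynaptic spike precedes presynaptic spike: depression.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Accumulate updates over all spike pairs and clip the weight to its bounds."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_delta_w(t_pre, t_post)
    return min(w_max, max(w_min, weight))


w = apply_stdp(0.5, pre_spikes=[10.0, 40.0], post_spikes=[15.0, 35.0])
print(w)
```

With no presynaptic spikes the loop body never runs and the weight is unchanged, which is the event-driven property noted above: computation is only triggered when a spike arrives at the synapse.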

L. GENERATIVE MODELING
The current wave of AI is driven by success in data availability, generative modeling, and process-intensive computing infrastructure. Recent developments have propelled several natural language applications that enable human-like conversations, such as OpenAI's ChatGPT. These models are built using large Generative Pretrained Transformer (GPT) models in combination with supervised and reinforcement learning.
Large language models (LLMs) have recently been used in medicine [139]. LLMs use the transformer architecture internally to build robust models. They can process human language (i.e., our written or verbal communication) to drive AI algorithms to develop specific models and augment medical competencies in patient care. ChatGPT is an excellent example of an LLM. Since ChatGPT is a chatbot, it can process user queries and return results in the desired style, format, and language. ChatGPT can also write code, process images, and perform machine-learning tasks. It is an emerging area of research with great potential that needs further exploration.
Furthermore, generative modeling provides many options for generating new data from existing data distributions. Health professionals and researchers often encounter imbalanced datasets and limited samples. Generative modeling can supply new data samples to address imbalanced datasets and increase the number of samples. Nonetheless, additional research is needed to guarantee that the newly generated data samples are consistent with medical conditions and do not infringe upon patient privacy.

M. DATA SECURITY AND PRIVACY
Health data is highly sensitive and requires the utmost protection against security threats. Lately, the number of data breaches has increased significantly. Data from wearable sensors must be exchanged with third-party applications, including smartphone apps and web servers, to provide information to patients, caregivers, and health professionals.
Many deep learning algorithms run on the cloud, and some lightweight algorithms run on wearable devices. These signal processing, ML, and deep learning algorithms and visualization tools require data transfer from wearable devices (edge devices) to a centralized cloud server. This presents a significant risk of leakage of sensitive health data and jeopardizes data privacy. For smart healthcare, it is possible to achieve secure communication using wearable devices by employing identity-based systems and biometrics that authenticate user identity, together with blockchain-based immutable data security [140], [141].
The differential privacy technique [142] involves the transfer of patterns in wearable device data to cloud services. A global algorithm then updates the appropriate ML model without utilizing individuals' raw data. Nevertheless, numerous financial, societal, and technological obstacles remain when assessing the proper degree of privacy for wearable devices, notably in medical settings.
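As one illustration of how an aggregate can be shared without exposing individual raw values, the sketch below applies the Laplace mechanism, a standard differential-privacy building block, to a clipped mean. It is not the specific scheme described in [142]; the clipping bounds, the privacy parameter epsilon, and the activity values are hypothetical, and the number of contributors is treated as public.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially-private mean of values clipped to [lower, upper]."""
    clipped = [min(upper, max(lower, v)) for v in values]
    # Changing one person's value moves the clipped mean by at most this amount.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
active_minutes = [32.0, 45.0, 12.0, 60.0, 27.0]  # hypothetical per-patient values
released = private_mean(active_minutes, lower=0.0, upper=120.0, epsilon=1.0, rng=rng)
print(released)
```

Smaller values of epsilon add more noise and give stronger privacy at the cost of accuracy, which is one concrete form of the trade-off involved in choosing the proper degree of privacy for wearable devices.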

VI. KEY FINDINGS
Figure 8 summarizes the objectives, sensors, types of feature extraction, classification models, and various performance metrics used in the literature. The main findings of this review are:
• Even though there have been several training interventions for stroke rehabilitation, patients still need practice to help them return to their pre-stroke mobility. In addition to regular care at home or in a rehabilitation center, patients must perform prescribed exercises and other possible daily activities without a physiotherapist. Remote monitoring of stroke patients' movements is necessary to ensure patients perform the recommended tasks correctly.
• The extent of remote assessment utilizing wearable sensors is contingent upon the intervention provided. Specific training interventions, such as functional electrical stimulation (FES), are not feasible for patients to execute at home or in rehabilitation centers without the aid of professionals. Therefore, recovery from these interventions may not be remotely monitored. However, virtual reality (VR) can be employed in home-based monitoring systems without manual assistance.
• The use of motor imagery (MI) is suitable for those who lack residual movement in their affected limbs.Still, it is incompatible with general motion-based wearable sensors, as no measurable movement occurs.The brain-computer interface (BCI), where electrodes are attached to the scalp, is not user-friendly, limiting its usability for patients alone, as they require additional assistance.Moreover, it is imperative to ascertain whether the brain rhythms observed during the MI process are attributed to MI.
• Accelerometers were the most frequently used sensors for evaluating body movement compared to other sensors.Nonetheless, combining data from diverse sensors enhances the monitoring quality capabilities and the prediction outcomes.Gyroscopes are better at measuring the quality of movement than accelerometers are.Additional sensors are often combined with accelerometers to obtain activity information from different modalities, enhancing the detection outcomes.
• Collecting and curating data from subjects and annotating data appropriately for use with ML models is a massive task.It is imperative to obtain training and establish guidelines to gather and manage personal health data while recognizing the privacy concerns of patients and the advancements in technology.
• Most methods generate window-based time-domain features and a few employ feature selection techniques.Nonetheless, the acquired knowledge derived from incorporating data from diverse domains will likely enhance ML models' learning and prediction outcomes.Therefore, it is essential to consider different aspects of the problem, context, and understanding before selecting valuable features.Identifying the optimal features and selecting the most suitable ones is imperative, mainly when dealing with hand-crafted features.
• Deep learning methods are becoming increasingly popular, reducing the need for expertise in hand-crafted feature extraction. However, understanding the intricacies of deep learning models and securing the computational resources to build new models require human resources, time, and financial support. Building such models also requires qualified people with theoretical and empirical domain knowledge.
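
To make the notion of window-based time-domain features concrete, the following is a minimal sketch and is not drawn from any of the surveyed methods. It assumes a single-axis accelerometer stream stored as a plain list of samples. The function names, the window length, the step size, and the particular set of features (mean, standard deviation, root mean square, range, and mean-crossing count) are illustrative choices only.

```python
import math
from statistics import mean, pstdev


def sliding_windows(signal, window_size, step):
    """Yield successive fixed-length windows over a 1-D sequence of samples."""
    for start in range(0, len(signal) - window_size + 1, step):
        yield signal[start:start + window_size]


def time_domain_features(window):
    """Compute a few common time-domain descriptors for one window."""
    mu = mean(window)
    return {
        "mean": mu,
        "std": pstdev(window),  # population standard deviation of the window
        "rms": math.sqrt(sum(x * x for x in window) / len(window)),
        "range": max(window) - min(window),
        # Number of strict sign changes of the signal around the window mean.
        "mean_crossings": sum(
            1 for a, b in zip(window, window[1:]) if (a - mu) * (b - mu) < 0
        ),
    }


def extract_features(signal, window_size=4, step=2):
    """Return one feature dictionary per (possibly overlapping) window."""
    return [
        time_domain_features(w)
        for w in sliding_windows(signal, window_size, step)
    ]


# Hypothetical single-axis accelerometer samples (arbitrary units).
accel_x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = extract_features(accel_x, window_size=4, step=2)
```

Each resulting dictionary corresponds to one row of a feature matrix that could then be passed to a feature selection step or a classifier. In practice, the window length and overlap would be chosen according to the sampling rate and the duration of the movements of interest.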

VII. CONCLUSION
Recovery from stroke is a complex process that probably occurs through spontaneous and learning-dependent processes [143]. The stroke recovery process involves a series of distinct stages, and the duration of recovery and degree of distress vary according to the individual. Although traditional systems and methods can capture data at high resolutions, they do not permit continuous examination, as they are heavy and not portable.
Wearable devices with embedded sensors are highly appealing as they can capture high-resolution patient data while being portable, mobile, and wireless. Furthermore, wearable devices are practical for remote monitoring of stroke patients' daily activities and exercise schedules without a physician. Moreover, the ability of wearable devices to continuously collect patient data reduces missed opportunities for diagnosis and treatment.
By utilizing the predictive capability of ML models constructed using high-resolution wearable sensor data, we can develop and build systems for the automatic, remote, and precise monitoring of rehabilitation schedules, which was previously unfeasible. Specifically, combining high-resolution data from wearable devices and well-trained ML models enhances the monitoring quality in training intervention, assessment, and remote supervision.
This review has summarized the relevant research on stroke rehabilitation from 2009 to 2023 and cataloged the literature into six categories: (i) identifying ADL, (ii) estimating clinical scores, (iii) monitoring exercises, (iv) recognizing postures, (v) recognizing gestures, and (vi) deep learning methods. Furthermore, the review has highlighted the advantages and limitations of existing techniques and provided future research directions for adopting wearable devices in stroke rehabilitation, especially for remote monitoring.
We expect this review to help interested researchers discover the critical challenges of using wearable sensors and machine-learning techniques for post-stroke movement analysis. This review did not examine in depth the individual features found in the literature, due to the vast nature of the topic. We expect a future survey by the research community to complement this article and give readers additional insights. Lastly, we expect this review to provide directions and encourage the research community to explore new avenues.

FIGURE 1. A conceptual diagram of wearable sensor-based remote monitoring of stroke patients' activity during rehabilitation. Whenever a person suffers a stroke, they are rushed to the hospital, and a computed tomography (CT) scan is performed to determine the type of stroke and the next course of action. The rehabilitation period commences 24 hours after the onset of stroke. Wearable sensor-based remote monitoring detects motor recovery when patients engage in exercises in a rehabilitation facility or at home without the assistance of a medical professional. Patients wear a wrist-based wearable device, and sensors onboard the device record their activities. The data is then transferred to a phone or tablet via Bluetooth. The app sends the data to a cloud server using Wi-Fi or internet-enabled devices. The cloud server allows clinicians to retrieve and analyze data to track patients remotely. An alert process can be incorporated if the clinician needs to provide feedback to patients. The primary advantage of wearable devices for remote monitoring is that they can capture data at finer time resolutions for near-real-time monitoring and data visualization. Integrating this fine-time-resolution data with ML enables clinicians to access real-time predictive analytics.

FIGURE 2. Overview of the different methods used in stroke rehabilitation for motor recovery. The two main components of stroke rehabilitation for motor recovery are training and motor assessment. For training, six common rehabilitation training schemes are shown in addition to conventional physical therapy. Motor recovery can be assessed manually or remotely: a physical therapist performs the manual evaluation, whereas remote evaluation can be performed using a low-cost wearable sensor. This article focuses on the use of wearable devices for stroke rehabilitation.

FIGURE 3. Outline of the sections of this article.

FIGURE 5. The figure illustrates how data collected in the medical domain can be classified into four types: discrete, continuous, nominal, and ordinal. Numerical (quantitative) data is categorized into discrete (such as the number of hospital visits) and continuous (such as the weight of patients). Categorical (qualitative) data, on the other hand, is partitioned into nominal (unordered data, such as male or female) and ordinal (ordered data, such as the cancer stages [I-IV]).

FIGURE 7. The future of wearable devices in stroke rehabilitation. The figure depicts the vision for wearable devices and ML algorithms for rehabilitation after a stroke. ML algorithms can run on wearable devices to make predictions (local model), while a central database (global model) is used to predict outcomes and advise patients, doctors, and health workers. The figure also shows privacy-enabled communication between data collection devices and third-party servers, and highlights neuromorphic edge computing capabilities for providing efficient (near-real-time) ML inference on portable devices such as a smartwatch.

FIGURE 8. A summary of the articles presented in Table 2, including the research objectives, sensors, types of feature extraction, ML models (mostly classification models), and performance metrics used in the literature.

TABLE 2. A comprehensive review of existing stroke rehabilitation approaches using wearable devices.
