AI-Driven Stroke Rehabilitation Systems and Assessment: A Systematic Review

Post-stroke therapy restores lost skills. Traditionally, patients are supported by skilled therapists who monitor their progress and evaluate the program's effectiveness. Because qualified therapists are scarce, rehabilitation facilities are both expensive and insufficient. Furthermore, evaluations may be subjective and prone to error. These limitations motivate researchers to devise automated systems that require minimal human intervention, provide therapist-like assessment, and reach more patients. This article reviews seminal works from 2013 onwards, selected with an adapted PRISMA approach and analyzed both qualitatively and quantitatively, to examine the potential of robot-assisted and virtual reality-based rehabilitation and of automated assessment through data-driven learning. Extensive experimentation on the KIMORE and UI-PRMD datasets reveals high agreement between automated methods and therapists. Our investigation shows that deep learning with spatio-temporal skeleton data and dynamic attention outperforms other approaches, with an RMSE as low as 0.55. Fully automated rehabilitation is still in development, but as an active research topic it could hasten objective assessment and improve outreach.


I. INTRODUCTION
STROKE is a common medical condition in which the blood supply to part of the brain is cut off, resulting in cell death. It is a leading cause of disability worldwide [1]. A stroke may have short- or long-term consequences; around 35% of patients experience deterioration in their cognitive and physical abilities [2]. However, according to [3], many stroke patients can repair and re-learn motor functions during the therapeutic period, demonstrating the efficacy of rehabilitation therapy. Conventionally, stroke rehabilitation begins when a physician determines the severity of the stroke. Rehabilitation can be of two types: inpatient rehabilitation, which takes place in a hospital, and outpatient rehabilitation, which takes place in the patient's home. Inpatient rehabilitation often provides superior outcomes [4] because of the therapists' continual supervision. In contrast, outpatients seldom have therapists present to guide them through the recovery process. Their exercises are therefore neither monitored nor assessed except during follow-up assessments, so patients may perform the exercises incorrectly, leading to subpar outcomes. Despite the benefits of inpatient rehabilitation, treating every patient in hospital is not always practical. One of the primary reasons is that it requires a large number of therapists. There are around 300 trained therapists per million people in affluent nations such as the United States or Australia [5]. As a result, around 65% of people in the United States obtain rehabilitation treatment [4]. The problem is worse in developing nations, where about 70% of all stroke cases occur [6]: despite the high demand for rehabilitation there, these nations have fewer than ten qualified therapists per million people [7].
In the United States, the average lifetime cost of inpatient rehabilitation and follow-up care is around $140,048 per person [8]. Another issue, common to both inpatient and outpatient settings, is that exercise evaluations, when they occur, are conducted by humans. This can result in subjective and biased judgments that ultimately hinder progress. While the expense of rehabilitation cannot be eliminated immediately, the shortage of qualified therapists, the subjective nature of evaluation, and the shortcomings of at-home therapy may all be addressed by automating the stroke recovery process.
Automated stroke therapy systems can be classified into two categories: robot-assisted and virtual reality (VR)-based. In a robot-assisted system, a therapy robot or an exoskeleton helps the patient perform the exercises. In VR-assisted systems, by contrast, patients are immersed in a virtual environment, often a game, that guides them through the exercises. In all cases, the overall process can be broken down into three smaller tasks: assisting with the exercises while monitoring them to gather data, evaluating the data, and drawing correct conclusions about the quality of the exercise. Each of these three tasks can be automated. However, our review of the literature reveals that automated control of the peripheral devices used in stroke rehabilitation systems is almost never used, possibly because of the hazards such devices pose to vulnerable patients, such as malfunctions and abrupt behaviour. Data collection and analysis, however, are automated in many works, provided that the peripherals are correctly installed. Additionally, therapists still have a role in automated systems regarding initial diagnosis, stroke severity analysis, and planning the rehabilitation program. Fig. 1 compares traditional and automated post-stroke rehabilitation systems. The upper diagram depicts the traditional process, which needs expert assistance at each stage. The lower diagram depicts the automated procedure, which still involves therapists in rehabilitation planning but requires minimal human intervention in the subsequent steps. The second step (data collection) entails gathering information from a variety of sensors (Kinect v2 [9], gyroscopes [10], IR cameras [11], Vicon [12]).
In step three, the input data are fed into an assessment model, which produces a performance evaluation and recommendations in step four. These are forwarded as feedback to both the doctor and the patient, so that the rehabilitation plan can be revisited (step five) and patients can understand their performance and make the necessary adjustments.

A. Related Surveys
This section reviews surveys on automated rehabilitation systems, broadly classifying them into two subcategories.
1) Robot-Assisted Stroke Rehabilitation Systems: The most common application of robotics in automated rehabilitation systems is actively assisting the patient using an exoskeleton or a robotic manipulator. Frolov et al. [13] investigated the neurophysiological aspects of such devices in rehabilitation, whilst Jarasse et al. [14] reviewed their potential for functional recovery. Additionally, the study in [14] sought to identify flaws in the mechanical designs and different control algorithms for such systems. The authors in [15] explored the role of therapeutic robots such as ARMin [16], the HapticWalker [17], and others in motor function rehabilitation, rehabilitation gyms, and robot-assisted telerehabilitation. However, these studies focused only on robot-assisted systems, leaving the automated evaluation of exercises untouched.
2) VR-Assisted Stroke Rehabilitation Systems: Virtual reality-enabled systems emulate real-world experiences in a virtual environment, often through a game. The patient performs exercise-like movements while playing, and their performance is assessed either by the game score or by machine learning models. Tamayo-Serrano et al. [18] identified 20 distinct quality features of VR-based systems, including cost, difficulty, and rehabilitation type, but did not incorporate any quantitative analysis. The authors of [19] discussed the technical design aspects of VR-based neuromuscular rehabilitation systems and provided a comparative perspective. Webster and Celik [20] conducted a comprehensive review of the literature on the use of the Kinect in elderly care and exercise monitoring (e.g., fall detection and risk reduction), as well as in exercise games. However, these surveys share the drawback of the surveys on robot-assisted systems: they do not address the automated evaluation of exercises. Furthermore, none of them includes a quantitative analysis of available solutions. Apart from the works mentioned above, Langan et al. [21] reviewed stroke rehabilitation technologies via questionnaires, revealing that traditional methods continue to outperform more recent stroke rehabilitation systems that incorporate games, virtual reality, and similar technologies. In addition, the authors of [22] covered the wearable devices used to gather patient exercise data and found that wearable gloves give the most accurate measurements of all the devices. Existing surveys of stroke rehabilitation systems are thus restricted in scope, concentrating on either robot-assisted or VR-assisted systems, and the techniques for exercise assessment remain unexplored. This reveals another gap in the field: the lack of a quantitative analysis of automated exercise assessment.
Our research attempts to provide a comprehensive picture of rehabilitation systems by examining works on robot-assisted, virtual reality-based rehabilitation and automated assessments through data-driven learning. Table I provides a summary of the existing survey papers discussed above.

B. Contributions
To the best of our knowledge, this is the only review article that provides a complete and systematic discussion of automated stroke rehabilitation systems. Rather than focusing on a single aspect or branch of rehabilitation systems, we examine them from a broader perspective to understand how state-of-the-art automated rehabilitation technology functions. We also shed light on open research challenges and directions for future research. In addition to the qualitative discussion, we provide a quantitative analysis by evaluating a variety of methods on publicly available datasets. In particular, throughout this article, we aim to answer the following research questions:

C. Survey Methodology
The included works were chosen using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach [23]. As shown in Fig. 2, the selection progressed through four phases: identification of relevant papers; screening of those papers based on title and abstract; assessment of their eligibility against a specific set of criteria; and, lastly, inclusion of the chosen works. Works were first identified by searching for "automated stroke rehabilitation systems," "robot-assisted stroke rehabilitation systems," "virtual reality-based rehabilitation systems," and "machine learning-based rehabilitation exercise assessment." This search returned 61,900 publications on stroke rehabilitation systems and evaluation. The screening phase applied the following criteria: the article must have been published in a peer-reviewed journal or conference and be deemed relevant to this review based on its title and abstract. Screening reduced the count to 700 papers. The eligibility phase then considered the overall quality of each article, including its qualitative and/or quantitative analysis. This resulted in the final inclusion of 48 publications on automated stroke rehabilitation systems and assessment: 24 articles on robot-assisted or VR-based systems, 7 dataset papers, and 17 articles on machine learning-based exercise evaluation.
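The reported counts can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: it restates the totals given above (the text gives no separate count for the eligibility phase, so eligibility and inclusion are merged into one step) and computes the fraction retained at each stage, along with a consistency check on the category breakdown.

```python
# Reported PRISMA counts for the literature search. Eligibility and inclusion
# are merged because the text gives a single count after eligibility.
stages = [
    ("identification", 61_900),
    ("screening (title/abstract)", 700),
    ("eligibility + inclusion", 48),
]
# Breakdown of the included publications, as reported in the text.
included = {"robot-assisted or VR systems": 24, "datasets": 7, "ML-based assessment": 17}


def funnel(stages):
    """Return (stage, count, fraction retained from the previous stage) tuples."""
    rows, previous = [], None
    for name, count in stages:
        rows.append((name, count, None if previous is None else count / previous))
        previous = count
    return rows


if __name__ == "__main__":
    for name, count, kept in funnel(stages):
        note = "" if kept is None else f"  ({kept:.2%} of previous stage)"
        print(f"{name:<28}{count:>8,}{note}")
    # The category breakdown should add up to the final inclusion count.
    print("breakdown consistent:", sum(included.values()) == stages[-1][1])
```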
The remainder of the paper is structured as follows. In Section II, we review notable works on automated stroke rehabilitation systems. Section III discusses automated exercise assessment methods, including the available datasets. Section IV offers a quantitative comparison of several state-of-the-art assessment methods and a side-by-side comparison of deep learning and feature engineering-based learning approaches. Section V sheds light on some of the existing challenges in the field. Section VI presents a comprehensive discussion of the findings and answers the research questions posed earlier. Finally, Section VII concludes the paper and briefly addresses future prospects.

II. RESEARCH IN AUTOMATED POST STROKE REHABILITATION SYSTEMS
This section reviews notable works in automated stroke rehabilitation, broadly divided into two branches: robot-assisted and virtual reality-assisted stroke rehabilitation systems. Since the exercise evaluation techniques are the same for both branches, they are discussed separately in Section III.

A. Robot-Assisted Systems
Robot-assisted stroke rehabilitation solutions include exoskeletons and assistive or demonstrative robots. Although similar systems, e.g., the hybrid assistive limb (HAL) [24], have existed for some time, cost has always been a concern. However, recent advancements have reduced costs and achieved performance on par with, if not better than, traditional treatment [25]. Aprile et al. [26] conducted a study in which three robots assisted 51 patients with shoulder, elbow, and finger flexions while a sensor-based device recorded their movements. To evaluate the patients, cognitive tests (e.g., digit span [27], the Oxford cognitive test [28]), motor scales (the Fugl-Meyer Assessment (FMA) and the Motricity Index [29]), and disability scales (the modified Barthel index [30]) were used. These measures improved significantly, demonstrating the effectiveness of robot-assisted treatment. Ou et al. [31] developed a wearable exoskeleton to increase thumb joint mobility in stroke patients; however, no clinical evaluation of the exoskeleton was conducted.
Pilla et al. [32] proposed a randomized trial based on a robotic exoskeleton, the NEUROExos Elbow Module (NEEM), to determine the effectiveness of a rehabilitation system. The authors of [33] reported a clinical study using four distinct robotic setups for upper limb rehabilitation of 190 stroke patients. The mean FMA score increased from baseline by 8.50 points in the robot-assisted group and by 8.57 points in the conventional therapy group. Robotic rehabilitation therefore offered no meaningful advantage over conventional therapy, albeit at a much higher cost. This is consistent with the findings of [34].
A number of studies have examined the use of electromyography (EMG) signals to control robotic systems. For instance, Qian et al. [35] developed an EMG-driven neuromuscular electrical stimulation (NMES) robotic arm for upper limb rehabilitation. They examined the device's influence on the FMA score, the Modified Ashworth Scale (MAS) [36], the Action Research Arm Test (ARAT) score [37], and the Functional Independence Measure (FIM) in 24 subacute stroke patients. The results indicated remarkable improvements in all measures after NMES-robot-assisted training. Similarly, the authors of [38] used electroencephalography (EEG) and EMG signals to recognize the patient's movement intention, developing a brain-machine interface (BMI) that enhances the functionality of an exoskeleton.
Baca et al. [39] proposed a wearable, modular robotic system, whereas Lu et al. [40] proposed an EMG-driven hand exoskeleton controlled by the patient's intentions. Apart from [39], which has not yet been clinically validated, all of the works above report significant improvement in all metrics.
Some studies have sought low-cost alternatives to robot-assisted rehabilitation systems. The authors of [41] deployed their system in a treatment gym and found it to be nearly 1.5 times less expensive than traditional one-to-one therapy while still providing better results. An alternative strategy for increasing cost-effectiveness is to reduce reliance on pricey hardware: using smaller end-effector robotic devices instead of full-scale robots or exoskeletons can lead to less expensive yet effective rehabilitation solutions. Examples include MIT-Manus [42] and REAplan [43], which, combined with appropriate software, can form a reliable system. An initial study [44] explored the neuro-rehabilitation capabilities of an MIT-Manus-based system and found no negative consequences. Building on this line of work, Heins et al. [45] presented ROBiGAME, a game playable with a REAplan controller, for improving stroke patients' motor and cognitive function. A feasibility study with two stroke patients found that the patients enjoyed playing the game. However, its impact on stroke patients' rehabilitation has yet to be determined.

B. Virtual Reality Aided Post Stroke Rehabilitation Systems
Virtual reality (VR) technology has experienced tremendous success in recent years, gaining widespread appeal over the past decade. Rather than attending physical rehabilitation sessions, patients can follow a simulated rehabilitation program, and patients appreciate these novel systems [46]. Notably, however, a few studies found no statistically significant advantage of VR over traditional therapy in terms of overall performance. Adie et al. [47] examined 240 patients with arm weakness who were randomly assigned to daily exercise with either a VR device or simple arm exercises performed at home. There were no significant differences in mean ARAT scores or other measures between the two groups, while the VR-based rehabilitation cost patients an additional £336. Rosiak et al. [48] found similar results, with little difference measured on the Vertigo Symptom Scale questionnaire [49]. Schuster-Amft et al. [50] did not find significant improvements in their chosen parameters (the Box and Block Test [51], the Chedoke-McMaster Arm and Hand Activity Inventory [52], and the Stroke Impact Scale (SIS) [53]) either.
In contrast, numerous studies have shown that VR-based rehabilitation systems outperform conventional methods. For example, Ho et al. [54] found that VR-based treatment significantly improved National Institutes of Health Stroke Scale (NIHSS) and modified Rankin Scale (mRS) scores in an investigation with 100 stroke patients. Similarly, Joo et al. [55] observed substantial improvements in Jebsen-Taylor hand function test (JTT) [56] and Michigan Hand Outcomes Questionnaire (MHQ) [57] scores.
Most of the works mentioned above take a gamified approach; that is, they build games that can be played in virtual reality and aid the rehabilitation process. This approach predominates in nearly all state-of-the-art work on VR-based rehabilitation. For instance, Warland et al. [58] presented an upper limb rehabilitation system that included both exercises and an apple-catching game. Overall, all evaluated parameters improved, with five subjects reporting minor side effects. Elor et al. [59] also presented an immersive VR game for stroke therapy based on Constraint-Induced Movement Therapy (CIMT) [60]. According to their findings, players expressed interest in using their stroke-affected arms to increase their game rewards. Compliance with affected-arm use was as high as 78% at the easy and medium difficulty levels, a substantial increase over the 32% compliance rate of traditional CIMT [61].
In contrast to gamified approaches, in which patient rehabilitation is directly linked to in-game scores, a few studies have used external sensors to monitor and assess exercises completed with the assistance of VR systems. For example, Luca et al. [62] performed neurocognitive rehabilitation of patients with the help of the BTs Nirvana system along with an infrared video camera for monitoring and assessment. The results indicated that patients in the VR group progressed much more than those in the control group in cognitive rehabilitation. A similar observation is made in [63], where VR-based balance training improved patients' foot placement performance.
Additionally, a few studies have used physiological EEG and EMG signals to aid and assess rehabilitation progress. Vourvopoulos et al. [64] presented a head-mounted brain-computer interface (BCI) for post-stroke rehabilitation based on the REINVENT system [65]. The authors of [66] compared EMG and EEG signals as biofeedback in VR-based rehabilitation and found that participants performed much better with EMG feedback than with EEG feedback. Summaries of the robot-assisted and VR-based works are presented in Table II and Table III, respectively.

Discussion on RQ1 & RQ2
The studies above show that patients cope well with such systems, with initial feasibility assessments indicating high levels of agreement. Such systems are comparable to traditional therapy and often yield better results, which answers RQ1. In response to RQ2, Tables II and III show that most of the works performed a clinical study. These works conducted clinical trials and validations with a variety of patients and therapists, confirming that automated procedures are consistent with the clinical standard.

III. AUTOMATED EXERCISE ASSESSMENT
Automated stroke rehabilitation systems must incorporate some form of assessment to monitor the system's efficacy and ensure optimal recovery. This section summarizes research on automated exercise evaluation. Since most works use some form of supervised learning, we begin with a discussion of the datasets available for stroke rehabilitation.

A. Existing Datasets
In this section, we discuss some existing, publicly available datasets that are directly applicable for post stroke rehabilitation exercise assessment.
1) Selection of Available Datasets: The PRISMA approach [23] was also adopted for selecting datasets. Identification used the keywords "Stroke rehabilitation exercise dataset" and "Exercise dataset," which returned 12,600 papers. The initial screening considered the following criteria: whether the dataset is publicly available, whether it is suitable for post-stroke rehabilitation exercise evaluation, and, for datasets designed for action recognition, whether they can still be used in this context. The initial screening reduced the number of articles to 200. Verifying the datasets' eligibility reduced this number further to 25, and applying the primary criterion, that the dataset be properly annotated and contain high-quality data, left only seven datasets.
2) Overview of Selected Datasets: Here, we briefly describe the datasets deemed to be related to the field of automated stroke rehabilitation systems and assessment. Table IV presents a summarized description of the publicly available datasets.
a) IntelliRehabDS (IRDS) dataset [67]: The dataset contains 3D data of nine gestures performed by 29 subjects, 15 of whom are patients and the rest healthy. Two separate professionals annotated the gesture type, the position of the subject, and the correctness of each movement, with an agreement level of 88%. The patients could choose to perform the movements either sitting or standing, whereas the healthy subjects performed both. Although the dataset contains both correct and incorrect movements with appropriate labels, the two classes are highly imbalanced.
b) Quality of movement assessment for rehabilitation dataset (QMAR) [68]: Notably, this dataset comprises movement data from 38 healthy participants who simulated Parkinson's disease, stroke, or a limp. A physiotherapist trained the participants to perform two movements, walking and sitting down and standing up, and their performance was graded on three different ranges. The dataset contains RGB, depth, and skeleton data of the participants' motions, with six views available for the RGB data and two views available for the depth and skeleton data.
c) The multi-modal exercise (MEx) dataset [69]: The dataset contains 6262 instances of seven exercises performed by thirty people. 47% of the participants were between 18 and 24 years old, while the remainder were between 24 and 54. For each exercise, volunteers were instructed to perform it for up to 60 seconds while their movements were recorded using two accelerometers, a pressure mat, and a depth camera.
d) KIMORE dataset [70]: The subjects were divided into two groups: the control group (CG) and the pain and postural disorders group (GPP), whose members have chronic motor disabilities. The CG is further divided into two subgroups: expert physiotherapists and non-experts. Clinicians chose the following five exercises: upper limb movement stretching the trunk muscles, trunk movement in each of the three planes, and lower limb movement. Experts evaluated each exercise by watching the videos and responding to a ten-item Likert questionnaire [71]. Three scores were calculated from the responses: the clinical total score, the clinical primary outcome score, and the clinical control factors score.
e) UI-PRMD dataset [76]: The dataset contains joint angle and position measurements from ten healthy people performing ten exercises. A Vicon optical tracker and a Kinect camera were used to capture the exercise motions from 22 different joints. Ten repetitions of each of the ten exercises were also performed suboptimally by the individuals and are included as instances of incorrect movements for the test set. The UI-PRMD dataset has only correct/incorrect (binary) annotations because it was originally designed for classification tasks. An annotation scheme based on a Gaussian mixture model was later proposed by Liao et al. [77] to produce assessment scores. This model measures the deviation from ideal movement patterns, which are obtained from healthy subjects executing the exercises to their fullest potential.
A monotonic scoring function then maps this deviation into the 0-1 range. According to experimental results from [77] and [87], this scoring scheme separates movement quality reasonably well, with the reference movements receiving higher scores and the incorrect movements receiving relatively low scores.
f) AHA-3D [83]: The dataset contains 79 skeleton videos of exercises performed by 21 subjects. The skeletal data were collected using Kinect v2 3D cameras in conjunction with RGB cameras, and the raw data were labeled using a custom-built GUI. The subjects were instructed to perform four exercises: a 30-second chair stand to assess lower-body strength, an eight-foot up-and-go to assess fall risk, a two-minute step test to determine functional fitness, and a unipedal stance to check static balance.
g) Toronto rehab stroke pose (TRSP) dataset [84]: This dataset includes motion data from subjects performing stroke rehabilitation exercises. The stroke survivors had experienced a subacute or chronic stroke with upper limb disability. A 2-DOF haptic robot was used to assist in shoulder and elbow movement rehabilitation. In addition to the scripted motions used with stroke patients, healthy participants were instructed to perform motions imitating common post-stroke compensatory movements. Two specialists classified and labeled the motions as follows: no compensation, leaning forward, shoulder elevation, and trunk rotation.
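The GMM-based scoring idea that Liao et al. [77] proposed for UI-PRMD can be illustrated with a small sketch. This is not their exact formulation: it assumes a scikit-learn GaussianMixture fitted to frames pooled from correct repetitions, takes the mean negative log-likelihood per frame as the deviation, and uses an exponential decay with a hypothetical scale of 1.0 as the monotonic scoring function. The two-angle trajectories and the amplitude values are synthetic stand-ins for real joint data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)


def simulate_repetition(amplitude, n_frames=100, noise=0.05):
    """Hypothetical repetition: two joint angles tracing a periodic path.
    A smaller amplitude mimics an incomplete range of motion."""
    phase = np.linspace(0.0, 2.0 * np.pi, n_frames)
    path = amplitude * np.column_stack([np.sin(phase), np.cos(phase)])
    return path + rng.normal(scale=noise, size=path.shape)


def deviation(gmm, repetition):
    """Mean negative log-likelihood per frame under the reference model."""
    return -gmm.score(repetition)


def quality_score(dev, dev_ref, scale=1.0):
    """Monotonically non-increasing map from deviation to (0, 1]:
    deviations at or below the reference level score 1 (scale is hypothetical)."""
    return float(np.exp(-max(dev - dev_ref, 0.0) / scale))


# Fit the reference model on frames pooled from "ideal" repetitions.
reference = [simulate_repetition(1.0) for _ in range(20)]
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.vstack(reference))
dev_ref = float(np.mean([deviation(gmm, r) for r in reference]))

for label, amp in [("correct", 1.0), ("partial", 0.85), ("incorrect", 0.5)]:
    rep = simulate_repetition(amp)
    print(label, round(quality_score(deviation(gmm, rep), dev_ref), 3))
```

Because the scoring function is monotonic, the ranking of repetitions by deviation is preserved in the scores; the exponential form and its scale only control how quickly scores fall toward zero.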

B. Works in Automated Exercise Assessment
Automated exercise assessment can be regarded as a classification task, categorizing a movement into correct or incorrect, or a regression task, predicting the score of a movement. We continue our discussion by grouping the works in automated exercise assessment based on the task type and its significance.
1) Exercise Correctness Classification: Exercises can be classified according to how correctly they are performed. In the literature, feature engineering-based algorithms have been the most frequently used. For instance, in [88], the authors used k-nearest neighbor (k-NN) and SVM classifiers to identify compensatory motions from pressure distribution data, achieving F1 scores as high as 0.993. Jung et al. [89], on the other hand, used model trees [90] and obtained modest results, with an F-measure of 79.29% and an area under the ROC curve of 0.91.
Lee [91] used a hybrid approach that combines a rule-based knowledge model with a predictive model to classify the quality of motion as 0, 1, or 2. The findings indicated good agreement with the therapists' assessments. Building on this, Lee et al. [92] investigated several such hybrid models with a variety of classifiers, including neural networks (NNs) and SVMs, and found that NNs performed effectively. This was further demonstrated in [93], which combined reinforcement learning with a variety of classifiers. In [94], the authors developed an ensemble learning model composed of 18 classifiers, each trained on a random subspace. Using six categories, they achieved 92% accuracy for the Brunnstrom and 82% for the FMA scoring systems.
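The random-subspace ensemble described for [94] can be sketched with scikit-learn's BaggingClassifier, which implements the random subspace method when bootstrapping of samples is disabled and only a fraction of the features is drawn for each member. The base classifiers of [94] are not specified here, so this sketch simply keeps the library's default decision tree; the features are synthetic, and the six class labels merely mirror the six categories mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for hand-crafted movement features (hypothetical data)
# with six class labels.
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Random subspace method: each of the 18 members is trained on all training
# samples (no bootstrapping) but sees only a random half of the features.
ensemble = BaggingClassifier(
    n_estimators=18,
    max_samples=1.0,
    bootstrap=False,
    max_features=0.5,
    bootstrap_features=False,
    random_state=0,
)
ensemble.fit(X_train, y_train)
print(f"test accuracy: {ensemble.score(X_test, y_test):.2f}")
```

Restricting each member to a feature subset decorrelates the members, which is what makes majority voting over them useful.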
Another branch of the literature investigated the potential of deep learning for exercise quality classification. Zhi et al. [85] investigated the classification of compensatory motions in rehabilitation using both SVM and RNN classifiers, and the RNN did not perform as expected. Kaku et al. [95] also achieved unsatisfactory results, with an average accuracy of 70%, using their CNN architecture paired with embedding modules. However, the method of [96] obtained strong results, even in semi-controlled and uncontrolled environments. Zhu et al. [97] also reported success with their proposed multipath CNN, composed of a dynamic convolutional network (D-CNN) and a state transition probability CNN (S-CNN), claiming a test accuracy above 90%.
2) Exercise Quality Score Prediction: Rather than predicting discrete class labels as in classification approaches, the works in this section assign a continuous assessment score that can be compared with the score given by a professional therapist. Typically, prior research in this area employs a distance function to compare the performed exercise with the prescribed one [99], [100]. These approaches require multiple pre-processing phases, which prevent end-to-end processing. Hand-crafted feature-based works are nevertheless still prominent in the literature. The works of [101] and [89] obtained strong results with their selected features. Lee et al. [102] calculated the range of motion, movement smoothness, and the occurrence of correct and erroneous movements. These features also worked well for score prediction, achieving a high level of agreement with the therapists. In another work, Liao et al. [77] presented a spatio-temporal network capable of evaluating an exercise. They improved performance by combining temporal pyramids, multi-branch convolution, and recurrent layers. However, applying convolutions to tensors of joint data disrupts the fine spatial organization of the human body's natural graph structure. Their approach yielded an average absolute deviation of only 0.02527.
3) Data Input Length: The input length of the sensor data plays a vital role in assessing exercises. In many cases, patients perform exercises at different speeds, and the assessment score should not vary if the exercises are performed correctly. However, most approaches fix the length of the exercise video while training models, causing useful information to be left out. Some works, such as [96] and [95], attempt to address this by choosing an input length long enough to capture the movement data of all patients. But if a test sample contains useful information beyond the predefined length, the model might fail to provide consistent results. One technique for dealing with this is dynamic time warping (DTW) [104], which can align temporal sequences of different lengths. It has been widely used in past assessment work [100], [103], [105], [106] and even in recent works [107], [108]. A better solution was proposed by Zhu et al. [97], who trained a separate CNN, D-CNN, to deal with dynamic input lengths. Recently, Deb et al. [87] proposed a graph convolution-based architecture for assessing rehabilitation exercises that handles variable-length input and preserves the local connectivity between joints.
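The distance-based assessment and the dynamic time warping mentioned above can be illustrated with the classic DTW recurrence. The Python sketch below is a generic textbook version, not the specific variant used in any of the cited works; the single-angle template and the two test repetitions are hypothetical. It shows the property that matters here: a repetition performed at a different speed is not penalized, while a reduced range of motion is.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of frames.

    Each frame is a tuple of joint angles (same dimension in both sequences);
    the local cost is the Euclidean distance between frames, and the
    accumulated cost allows either sequence to 'wait' while the other advances.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical single-angle template of a prescribed movement (normalized units).
template = [(0.0,), (0.5,), (1.0,), (0.5,), (0.0,)]
# The same movement performed at half speed: every frame appears twice.
slow = [frame for frame in template for _ in range(2)]
# A faster repetition that only reaches half of the range of motion.
shallow = [(0.0,), (0.25,), (0.5,), (0.0,)]

print(dtw_distance(template, slow))     # 0.0: speed alone is not penalized
print(dtw_distance(template, shallow))  # > 0: reduced range of motion is
```

The quadratic time and memory cost of this recurrence is one reason learned models that accept variable-length input directly are attractive for long recordings.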
Table V presents a summarized description of the existing literature. One key observation is that only [97] and [87] can deal with variable-length inputs. Another issue is that most works focus on upper-body rehabilitation exercises; incorporating lower-body exercise data would make the algorithms more robust. The literature suggests that preserving the topological structure of the human skeleton has become the trend in action recognition owing to its strong performance [109], [110], [111]. Following this line of work, Deb et al. [87] recently introduced graph convolution with a dynamic attention module to assess rehabilitation exercises. Their results significantly outperform those of other existing methods, demonstrating the efficacy of structure-aware learning.
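To make "preserving the topological structure" concrete, the following sketch applies one graph-convolution step over a skeleton graph, using the common symmetric normalization of the adjacency matrix with self-loops. The four-joint arm chain and the random weights are hypothetical; ST-GCN [109] additionally partitions neighbours and applies temporal convolutions, and the dynamic attention of Deb et al. [87] is not shown.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def graph_conv(X, A_hat, W):
    """One spatial graph-convolution step followed by ReLU.

    X: (T, V, C_in) features of V joints over T frames; W: (C_in, C_out).
    A_hat @ X mixes each joint only with its skeletal neighbours, so the
    body's topology is kept instead of being flattened into a grid."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Hypothetical 4-joint arm chain: 0 shoulder - 1 elbow - 2 wrist - 3 hand.
edges = [(0, 1), (1, 2), (2, 3)]
A_hat = normalized_adjacency(edges, num_joints=4)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4, 3))   # 50 frames of (x, y, z) coordinates
W = rng.normal(size=(3, 8))       # random weights standing in for learned ones
H = graph_conv(X, A_hat, W)
print(H.shape)  # (50, 4, 8)
```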

Discussion on RQ3 & RQ4
Our analysis here is tailored towards research questions 3 and 4. Regarding the third research question, on therapist involvement, all of the works report the involvement of therapists in one way or another. Thus, these systems do not eliminate the need for therapists, but rather assist them and reduce the dependency on them. The fourth research question, on the efficacy of automated assessment, is directly addressed by the authors of [91], [92], [102], who report high agreement levels between therapist and automated assessments.

IV. QUANTITATIVE ANALYSIS
A qualitative analysis of the existing literature is presented in Section III, where the datasets, numbers of subjects, and evaluation protocols differ across studies. This non-uniformity prevents us from comparing the results of different methods on the same scale. Therefore, we conduct a quantitative study in which automated assessment methods are evaluated on two datasets, UI-PRMD [76] and KIMORE [70], using the following performance metrics: Mean Absolute Deviation (MAD), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE) [77], [87]. Note that the results reported in this section were generated in our own experimental setup rather than taken directly from the original papers.
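The three metrics can be computed as in the sketch below. It assumes that MAD denotes the mean absolute difference between predicted and therapist-assigned scores (i.e., equivalent to the mean absolute error), that MAPE is expressed as a percentage, and that ground-truth scores are non-zero; the exact definitions used in [77] and [87] may differ in detail. The scores in the usage lines are hypothetical.

```python
import numpy as np

def mad(y_true, y_pred):
    """Mean absolute deviation between therapist scores and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily than MAD."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in %; assumes y_true has no zeros."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Hypothetical therapist scores and model predictions for three repetitions.
y_true = np.array([40.0, 30.0, 20.0])
y_pred = np.array([38.0, 33.0, 20.0])
print(mad(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
# MAD ≈ 1.667, RMSE ≈ 2.082, MAPE ≈ 5.0
```

Because RMSE squares each error, a model with occasional large misses can rank worse on RMSE than on MAD, which is why reporting both is informative.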
To draw the comparative analysis, we chose several deep learning methods, including [87] and [77], which were originally proposed for rehabilitation exercise assessment. Since there is a limited corpus of work on exercise assessment in the extant literature, the remaining models are drawn from human action recognition. Existing work on action recognition can be broadly classified into CNN-, LSTM-, and graph convolution-based approaches. Therefore, we choose [115] as a CNN, [114] and [117] as LSTM, and [87], [109], [112], and [113] as graph-based regressors. For feature engineering-based learning, we apply four regressors, K-Nearest Neighbour (KNN), Random Forest (RF), Support Vector Machine (SVM), and Neural Network (NN), to features generated by [116] and [102]. The experimental results on the KIMORE dataset are presented in Table VI. We observe that Deb et al. [87] (avg. MAD 0.576) outperforms the other methods in all the studied metrics. This can be attributed to their extension of the Spatio-temporal Graph Convolution Network (STGCN) with a dynamic self-attention mechanism that extracts discriminative features from the structural information of the human skeleton. Moreover, because each exercise focuses on the movement of a specific set of body joints, this self-attention mechanism makes it easier for the network to account for the role of each joint in different exercises. In the squatting exercise (Ex 5), for example, the ankle, knee, spine, and shoulder movements are critical, whereas in the lifting arms exercise (Ex 1), the elbow, spine, thumb, and wrist joints are more important than the rest. In addition, they employ an LSTM instead of global average pooling to capture subtle sequential dependencies in consecutive frames and to extract discriminative temporal features from variable-length exercise data. The approach by Liao et al. [77] (avg. MAD 0.960), which adopts a 2D convolution-based method, came second best.
However, 2D convolution treats skeleton sequences as a grid and is therefore unable to utilize the subtle information contained in spatial characteristics. Yan et al. [109] (avg. MAD 1.124) used an STGCN that overlooks the sequential aspect of the spatio-temporal features because of successive global average pooling. Similarly, Song et al. [112] (avg. MAD 1.120) proposed an STGCN that uses multi-stream information with a joint attention mechanism. Li et al. [115] (avg. MAD 1.384) proposed a hierarchical method that aggregates features from the point level to global co-occurrence features. This hierarchical convolution-based model uses multiple box filters to extract skeleton features; however, it misses the spatial relationship between adjacent joints, making it unable to capture topological information. Finally, Du et al. [114] (avg. MAD 1.464) employed a hierarchical RNN-based approach. However, RNN-based approaches consider only the temporal information and overlook the spatial information. Among the deep learning models, the lowest agreement with the therapists' assessments is obtained by the approach in [113] (avg. MAD 1.920), as its model does not capture significant sequential connections between subsequent frames due to its many spatial and temporal max-pooling layers.
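The comparison above turns on two mechanisms: weighting joints by their relevance to an exercise, and whether temporal pooling keeps frame order. The sketch below illustrates both with plain NumPy. The joint attention is a generic softmax weighting with a hypothetical scoring vector `w` (learned in practice), not the dynamic attention module of Deb et al. [87], and `recurrent_summary` is a toy recurrence standing in for an LSTM.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention_pool(H, w):
    """Weight joints by relevance within each frame.

    H: (T, V, C) joint features; w: (C,) scoring vector.
    Returns per-frame features (T, C) and attention weights (T, V)."""
    alpha = softmax(H @ w, axis=1)           # weights over joints sum to 1
    return np.einsum("tv,tvc->tc", alpha, H), alpha

def global_average_pool(F):
    """Average over frames; the result ignores frame order entirely."""
    return F.mean(axis=0)

def recurrent_summary(F, decay=0.5):
    """Toy recurrence: the final state depends on the order of frames."""
    h = np.zeros(F.shape[1])
    for x in F:
        h = np.tanh(decay * h + x)
    return h

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4, 3))               # 6 frames, 4 joints, 3 channels
F, alpha = joint_attention_pool(H, w=np.array([1.0, 0.0, -1.0]))
F_rev = F[::-1]                               # same frames played backwards
print(np.allclose(global_average_pool(F), global_average_pool(F_rev)))  # True
print(recurrent_summary(F), recurrent_summary(F_rev))  # order-sensitive summaries
```

Global average pooling returns the same vector for a sequence and its reversal, so it cannot tell a correct movement order from a wrong one; a recurrent summary can.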
Capecci et al. [116] and Lee et al. [91] adopted feature engineering-based learning on the KIMORE dataset to predict the movement correctness score. However, the use of several preprocessing stages restricts the model's ability to capture meaningful information from the movement data.
Table VII presents our experiments on the UI-PRMD dataset. Here, we found that graph convolution-based approaches, i.e., Deb et al. [87], Song et al. [112], and Zhang et al. [113], consistently outperform the rest of the approaches. Interestingly, however, Lee et al. [91] and Capecci et al. [116], despite being feature engineering-based approaches, perform better than the remaining deep learning methods. The reason is that the UI-PRMD dataset contains a homogeneous distribution of subjects (i.e., ten healthy subjects performing 10 exercises both correctly and incorrectly) with a small number of training samples, so the movement patterns are less varied. As a consequence, unlike graph convolution-based models, CNN-based models are unable to discern the subtle differences associated with each movement pattern because they disregard structural information.

A. Results Using RGB Camera
With the advancement of technology, depth and motion-capture sensors such as Vicon, Kinect v1 [9], and Kinect v2 [9] have become more affordable for use in healthcare facilities. However, rather than relying solely on such RGBD sensors, in this section we analyze the potential of RGB cameras, a more economical and practical solution for the assessment task. Since RGB cameras do not provide skeleton information directly, we select two 3D pose estimation algorithms, VideoPose3D [118] and BlazePose [119], to detect joint information automatically from RGB frames. Finally, we compare their performance with that of an RGBD sensor, i.e., Kinect v2 [9].
In Table VIII, we analyze the performance of the studied approaches on the RGB videos of patients performing exercises provided by the KIMORE dataset. To extract 3D poses from the videos, we used two pose estimation algorithms: BlazePose [119] (trained on MS COCO [120]) and VideoPose3D [118] (trained on Human3.6M [121]). We also compare the results against the pose data extracted by Microsoft Kinect v2 [9]. The results show that the Kinect v2 sensor surpasses the two RGB camera-based pose estimation methods because it uses RGBD information to detect human poses. In addition, GCN-based algorithms [109] perform well on BlazePose because they can exploit spatial information better than the CNN-based approaches [77], [114], [115]. In the case of Deb et al. [87], BlazePose outperforms VideoPose3D since it models the human skeleton with more joints (33 joints) than the alternative (17 joints). The remaining approaches, on the other hand, perform better on VideoPose3D: the extra joints in BlazePose add unnecessary information that hinders the performance of [77], [109], [114], [115], whereas Deb et al. effectively suppress them via the dynamic attention module.

B. Implementation Details
For models originally developed for assessing rehabilitation exercises (i.e., [87] and [77]), we closely followed the architectures described by the authors in the respective papers. For models developed for human action recognition ([109], [112], [113], [114], [115], and [117]), we replaced the last softmax layer with a fully-connected layer with linear activation to adapt them to the regression setting. We use the skeleton data provided by KIMORE and UI-PRMD to construct the skeleton graph for the graph-based approaches ([87], [109], [112], [113]). Similarly, for the CNN- and LSTM-based models ([114], [115], [117]), we also use skeleton data as input and transform these sequences as proposed by the respective authors. Regarding hyper-parameters, the batch size, the learning rate, and the dropout rate are selected from {16, 32, 64}, {2, 3, 4, 5, 6, 7}, and {10^-2, 10^-3, 10^-4} via grid search [122]. All the models are trained with the Adam optimizer for a maximum of 100 epochs, and training is terminated if the validation loss does not decrease for 30 consecutive epochs. All the models are initialized randomly, and the results are reported by averaging over 10 runs.
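The training protocol above (grid search over hyper-parameter value sets, and stopping once the validation loss has not decreased for 30 consecutive epochs within a 100-epoch budget) can be sketched independently of any deep learning framework. The helper names `grid` and `epochs_run` are hypothetical, and the learning-rate values in the usage lines are placeholders rather than the settings used in our experiments.

```python
import itertools

def grid(**axes):
    """Yield every combination of the given hyper-parameter value lists."""
    keys = list(axes)
    for values in itertools.product(*(axes[k] for k in keys)):
        yield dict(zip(keys, values))

def epochs_run(val_losses, patience=30, max_epochs=100):
    """Number of epochs trained before early stopping.

    Training halts once the validation loss has failed to decrease for
    `patience` consecutive epochs, or when `max_epochs` is reached."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# Illustrative grid: batch sizes from the text; learning rates are placeholders.
configs = list(grid(batch_size=[16, 32, 64], learning_rate=[1e-2, 1e-3, 1e-4]))
print(len(configs))  # 9
# Loss improves for 10 epochs, then plateaus: stops 30 epochs after the best.
losses = [1.0 - 0.01 * i for i in range(10)] + [0.95] * 90
print(epochs_run(losses))  # 40
```

With three value sets of sizes 3, 6, and 3, as listed in the text, a full grid search trains 54 configurations, each repeated over 10 runs.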
For feature engineering-based methods, the features are selected based on [116] and [102]. We used a Neural Network (NN) with two hidden layers of 12 and 24 neurons, respectively, and the ReLU activation function, trained with the Adam optimizer for 500 epochs at a learning rate of 0.0001. We also used Random Forest (RF) with two maximum depths, K-Nearest Neighbours (KNN) with 5 neighbours, and Support Vector Machine (SVM) with a radial basis function (RBF) kernel. We implemented the machine learning algorithms (RF, KNN, and SVM) in scikit-learn and the NN models in TensorFlow.
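A sketch of this regressor configuration in scikit-learn follows. The synthetic feature matrix and scores are placeholders, not the features of [116] or [102]; `MLPRegressor` stands in for the TensorFlow network; and reading "two maximum depths" as `max_depth=2` is our assumption. The SVM is used in its regression form, `SVR`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def build_regressors(seed=0):
    """The four regressors with the settings described in the text.

    NN: MLPRegressor as a stand-in for the TensorFlow model (hidden layers
    of 12 and 24 ReLU units, Adam, learning rate 1e-4, 500 epochs).
    RF: max_depth=2 is an assumed reading of "two maximum depths"."""
    return {
        "RF": RandomForestRegressor(max_depth=2, random_state=seed),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "SVM": SVR(kernel="rbf"),
        "NN": MLPRegressor(hidden_layer_sizes=(12, 24), activation="relu",
                           solver="adam", learning_rate_init=1e-4,
                           max_iter=500, random_state=seed),
    }

# Synthetic stand-in for a hand-crafted feature matrix and clinical scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 35.0 + X @ rng.normal(size=6) + rng.normal(scale=0.5, size=60)
X_train, X_test, y_train, y_test = X[:48], X[48:], y[:48], y[48:]

results = {}
for name, model in build_regressors().items():
    model.fit(X_train, y_train)
    results[name] = float(np.mean(np.abs(model.predict(X_test) - y_test)))
print(results)  # MAD per regressor on the held-out samples
```

With such a small learning rate the MLP may stop before converging and emit a `ConvergenceWarning`; this does not affect the other regressors.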

Discussion on RQ5
This section provides a quantitative analysis of several state-of-the-art works and addresses the fifth research question, on hand-crafted features versus deep learning models. The results largely favor deep learning. On KIMORE, deep learning models perform considerably better than feature engineering-based methods in every metric; on UI-PRMD, however, the feature engineering-based methods outperform several of the deep learning models. These results suggest that deep learning models, particularly structure-aware ones, should be the primary modality in future research.

V. CHALLENGES IN AUTOMATIC STROKE REHABILITATION
Devising an automatic stroke rehabilitation system is hindered by the unavailability of standard datasets, diversity in assessment methods, high implementation cost, and subjects' safety concerns. In this section, we discuss these challenges in detail.

A. Inadequacy of Standard Data
Most stroke rehabilitation systems rely on data collected from local IoT devices such as Kinect v2, BTS Nirvana, and the Vicon optical tracker, which are not publicly available. The publicly available alternatives are mostly action recognition datasets, whose motion profiles differ considerably from those of rehabilitation exercises. Moreover, the majority of datasets contain a limited number of samples from stroke patients, while others contain only healthy participants imitating various ailments [76]. Additionally, some datasets, such as UI-PRMD, are not scored by medical specialists.

B. Non-Uniformity In Assessment
The assessment mainly examines the severity of a stroke and the patient's functional ability, which a therapist performs in a conventional system. This assessment is highly subjective and frequently varies from therapist to therapist. Even among therapists, there is no universal agreement on whether to use quantitative measures such as the FMA, ARAT, or other custom scoring systems for assessment purposes. Automated systems face the same challenges, and therefore a universal method of assessment is needed.

C. Increasing Cost of Rehabilitation
Our discussion so far indicates that automated systems are currently much more expensive than conventional systems, as a skilled therapist is still needed in both for purposes such as rehabilitation program design and final assessment. Automated systems provide superior performance and acceptability while reducing reliance on expert therapists, but at a higher cost. The increase in cost varies by system. For example, the authors of [47] reported that VR-based systems cost on average £376 more than conventional therapy. Robotics-based systems can cost significantly more: according to [34], such a system could cost a patient between £666 and £1602 more than usual care, depending on the intensity of the program. Thus, a trade-off exists between the cost of deployment and the efficacy of rehabilitation methods. However, as technology improves and costs decrease, automatic stroke rehabilitation systems may eventually supplant traditional rehabilitation.

D. Safety Concerns
Safety is a major concern in such automated systems, since they operate alongside patients who require special care. For instance, one critical feature is automatically detecting when the patient or user of the system is experiencing discomfort [123]. The ergonomics of the devices used in the system is another area of concern, and many works, such as [45], [46], [59], and [66], report on users' affinity with the system and its ease of use. The system's stability is also a significant concern, given that its peripherals operate in very close proximity to the users. Although stability is mainly an issue for robot-aided systems, the other safety concerns mentioned above apply to immersive VR-based systems as well. Maintaining all of these safety features is challenging, since they add complexity to the system and make mass deployment difficult.
Nevertheless, overcoming these hurdles and developing a safe-to-use rehabilitation system is crucial, especially since these systems promote at-home rehabilitation in which the patient is minimally supervised. Traditional inpatient rehabilitation therapy, despite its drawbacks, poses very few risks. Since patients are constantly monitored, any device malfunctions are immediately resolved by experts, and safety issues arise mainly from human negligence or extenuating circumstances. Also, since the patient is actively observed and guided, negligence on the patient's part or misuse of assistive devices is minimal. In contrast, at-home rehabilitation systems pose significant risks if not deployed correctly. In addition to the safety concerns mentioned above, negligence and device misuse by unsupervised patients can create an unsafe environment and lead to sub-optimal results.

E. Biological Implications
Although this paper focuses primarily on the technological aspects of these automated systems, their biological implications should also be a main concern. How such systems interact with a patient's physiology must be studied thoroughly before they are implemented. Exoskeleton-based systems often include such studies in their design process [31], [39]. However, VR-based immersive systems rarely mention such a design consideration, perhaps because these implications are less pertinent in their case. Nevertheless, studying these implications could provide insights into a more effective system overall.
Research shows that exoskeleton-based systems can reduce muscle volume [124] and muscle force [125]. According to these studies, exoskeletal assistance helps to activate muscles better and achieve more efficient movements. To obtain optimal results, though, an early and intensive program should be considered [126]. The severity of the stroke also plays a part in rehabilitation [127]. One concern is that over-reliance on such assistive exoskeletons could impair natural muscle activation in the long run. In contrast to robot-assisted systems, VR-based systems affect users on a psychological level, as they engage users to a great extent. This has important implications, as it can support behavioral and habitual change [128] and increase patients' drive to complete the exercises successfully. All of the works reporting on VR-based systems demonstrate high user engagement and, in many cases, improved performance in daily tasks.

F. Adoption of Automated Rehabilitation Systems
As reported in [45], [46], [58], and [59], users show a strong affinity for automated systems, despite some concerns about higher costs. Some works report on the needs and perspectives of the therapists involved in deploying these systems. For instance, the authors of [129] discussed the key aspects of both improving patient movements and assessing their progress. In addition, several systems have also gained favor among physiotherapists [59], [130]. Research is still ongoing, though, to determine which modality (robotics or VR) performs better or is preferred.
Nevertheless, these automated rehabilitation systems also have limitations. One work reported poor generalization of results across such systems [131], so reported results may not reflect the actual efficacy of a system. Skepticism about the rehabilitation systems themselves is also reported in the literature [132]. Confusion and a lack of information on how to use the new technology effectively also remain [133]. Commercialization has also been cited as a hindrance [132], as it can lead to biased results. Finally, automated rehabilitation systems are currently quite costly. Recent research, however, predicts that the shift toward automating rehabilitation will reduce costs in the future [127], [134] and allow better outreach. Thus, despite these limitations, the field is moving in favor of such systems, with the goal of providing affordable and accessible rehabilitation to all.

G. Discussion on RQ6
Our final research question asked whether the hurdles associated with automated systems can be overcome. We have already discussed some of the works aimed at addressing these challenges. Given the steady progress and effort in the field, it is reasonable to expect that many of these obstacles can be overcome in the near future.

VI. TAKEAWAYS
In this section, we present our findings according to the research questions posed in Section I. RQ1: Our first research question concerned the scope and efficacy of automated rehabilitation systems. The literature review in Section II shows that automated systems often achieve better results than conventional therapy. However, the scope of such systems remains uncertain because of the costs involved. An automated system shared by a community could be one answer to this problem.
RQ2: Our second research question concerned clinical validation. The standard approach is to take a population and divide it into two groups, control and experimental. Both groups receive therapy, and the results are evaluated at different stages using different metrics. Tables II and III show that most of the developed systems have either already completed clinical trials or are awaiting one.
RQ3: Regarding human involvement in automated systems, we have noticed that most of the works involve therapists in one way or another. Even in automated systems, the exercise plan is always laid out by skilled professionals. Human supervisors are used in many of the works to assist with and direct the exercises, and also to annotate datasets.
RQ4: Regarding the efficacy of such assessments, many of the works report high agreement levels with the therapists' assessments [91], [92], [93], as well as high accuracy [94], [96], [97] and F1 scores [88], [91] in exercise correctness classification. This provides strong support for the efficacy of such systems.
RQ5: Our fifth research question posed a long-standing question: is hand-crafted feature-based learning better than deep learning? Our side-by-side comparison on two standard datasets, KIMORE (Table VI) and UI-PRMD (Table VII), provides quantitative evidence that deep learning algorithms are generally superior to traditional hand-crafted approaches. On KIMORE in particular, deep models outperform the hand-crafted methods in every exercise. On the UI-PRMD dataset, however, feature engineering-based learning performs better than all the deep learning methods except the work of Deb et al. [87]; the reasons are discussed in detail in Section IV.
RQ6: Our final question was whether the existing challenges can be overcome. We have already discussed some approaches to the associated problems. The inadequacy of data can be addressed through a collective effort, which depends largely on the enthusiasm of the researchers involved. A similar statement can be made for the non-uniformity issue, and the rising cost may also be addressed in time. However, dealing with the safety concerns will be a daunting task.

VII. CONCLUSION
Our study presents the scope and promise of automated stroke rehabilitation systems with minimal human intervention. We have discussed cutting-edge methods for post-stroke rehabilitation, as well as data-driven exercise assessments. Our review of the literature suggests that automated rehabilitation systems hold considerable promise and can outperform traditional therapy, and a number of studies have shown high levels of agreement with therapists' ratings. On two standard datasets, KIMORE and UI-PRMD, we offered a quantitative study of several state-of-the-art assessment methods and a side-by-side comparison of deep learning and feature engineering-based learning approaches. The findings show that deep models incorporating spatio-temporal skeletal data and a dynamic attention module are highly effective for rehabilitation assessment. We believe these automated technologies have the potential to make rehabilitation treatments more widely available. Future studies in this area should focus on finding cost-effective solutions. Deep learning models should be investigated further, and an integrated strategy for gathering real-world exercise data should be actively pursued. Few-shot [135] or zero-shot [136] learning methods could be explored in the absence of substantial datasets. These developments could eventually lead to a society where everyone has access to efficient rehabilitation services.