Modeling, Control, and Clinical Validation of an Upper-Limb Medical Education Task Trainer for Elbow Spasticity and Rigidity Assessment

The goal of this study was to validate a series elastic actuator (SEA)-based robotic arm that can mimic three abnormal muscle behaviors, namely lead-pipe rigidity, cogwheel rigidity, and spasticity for medical education training purposes. Key characteristics of each muscle behavior were first modeled mathematically based on clinically-observed data across severity levels. A controller that incorporated feedback, feedforward, and disturbance observer schemes was implemented to deliver haptic target muscle resistive torques to the trainee during passive stretch assessments of the robotic arm. A series of benchtop tests across all behaviors and severity levels were conducted to validate the torque estimation accuracy of the custom SEA (RMSE: ~ 0.16 Nm) and the torque tracking performance of the controller (torque error percentage: < 2.8 %). A clinical validation study was performed with seven experienced clinicians to collect feedback on the task trainer’s simulation realism via a Classification Test and a Disclosed Test. In the Classification Test, subjects were able to classify different muscle behaviors with a mean accuracy > 87 % and could further distinguish severity level within each behavior satisfactorily. In the Disclosed Test, subjects generally agreed with the simulation realism and provided suggestions on haptic behaviors for future iterations. Overall, subjects scored 4.9 out of 5 for the potential usefulness of this device as a medical education tool for students to learn spasticity and rigidity assessment.


I. INTRODUCTION
A. Overview of Rigidity and Spasticity
Spasticity and rigidity are common abnormal muscle behaviors and are characterized by distinct resistive muscle tone characteristics when the affected muscles are passively stretched (Fig. 1) [1]. Rigidity is observed in patients with Parkinson's disease, manifested as an increased muscle tone which is independent of the stretch speed [2]. There are two types of rigidity: a) lead-pipe rigidity (LR), which exhibits a uniformly elevated muscle resistance across the full range of motion, and b) cogwheel rigidity (CR), which has an intermittent pattern of resistance with a frequency of 6-9 Hz [3]. Unlike rigidity, spasticity (SP) manifests as an increased muscle tone but with stretch speed dependency and is commonly observed in patients with neurologic conditions affecting upper motor neurons (e.g., stroke, cerebral palsy, spinal cord injury). A typical spasticity resistance response is marked by an abrupt increase in the resistance called "catch" at a relatively consistent angle within the range of motion (ROM), followed by a quick drop of resistance called "release" [4].

B. Current Medical Teaching Methods and Challenges
Clinical assessment for these behaviors is done by passively moving the joint at various speeds to stretch the affected muscle. Based on the resistance felt, the clinician will diagnose the type and severity level of the behavior. To classify different severity levels of spasticity or rigidity, the examiner relies on qualitative assessment tools such as clinical scales, e.g., the Modified Ashworth Scale (MAS) or the motor portion of the Unified Parkinson's Disease Rating Scale (UPDRS), respectively. A score of 0 indicates the absence of spasticity/rigidity (i.e., healthy), whereas higher scores indicate increasing severity of the spasticity/rigidity condition. Due to the qualitative and ambiguous nature of these scales, clinical diagnosis is frequently reported to be subject to interpretation and to suffer from interrater reliability issues across different clinicians [5], [6], [7].
Given the subtlety and variation observed within and between different abnormal muscle behaviors, accurate diagnosis is built upon a good understanding of these behaviors and repetitive hands-on practice. However, for current clinical/medical learners, training opportunities and consistency are often limited by the availability and small number of practice patients [8]. One promising approach to address these training challenges is to deploy robotic task trainers to provide realistic and easily accessible practice opportunities for medical trainees [7], [9], [10].
To render the desired haptic feeling to the user, previous designs have adopted various actuation and control strategies. Common actuation choices were direct drive [17], quasi-direct drive [11], [12], magnetorheological fluid (MRF) brake/clutch [22], and gearmotor [13], [18]. Direct and quasi-direct drives suffer from high operating current and heat dissipation, which compromise user safety in human-robot interaction, and their low gear ratios result in bulky and nonergonomic joint designs. An MRF brake/clutch was a promising option to generate fast and smooth haptic feeling, but it had to be used in parallel with active actuators to mimic active symptoms (e.g., clonus, tremor) [8]. In addition, off-the-shelf MRF products are not easily available, and their sizes are often too bulky for medical applications. Given these limitations in form factor, complexity, maintenance, or cost, to the best of the authors' knowledge, none of the previous research prototype trainers were adopted by medical training institutions beyond the authors' home institutions.
In terms of control strategies, some studies have simply relied on the motor driver to perform open-loop torque control, which is often vulnerable to unmodeled dynamics in the drivetrain [15], [18]. To achieve better torque tracking, others used closed-loop torque control via the torque feedback from a six-axis force/torque sensor at the end effector [11], [17]. However, the downside was the high cost and mechanical frailty of the sensor. Similar to other robots with physical interaction with humans or the environment (such as prosthetics [23] and exoskeletons [24]), the accurate delivery of interaction torque under disturbance (e.g., user's motion, environment contact) is a core control challenge for robotic trainers. Surprisingly, however, there is a lack of discussion and reporting of the control scheme as well as the torque tracking performance in previous task trainer studies.

D. Study Overview
To overcome the design and control limitations of previous task trainers, our research group previously developed a robotic arm task trainer based on a series elastic actuator (SEA) [21] (Fig. 2). The SEA introduces an elastic element such as a mechanical spring in series with a high-impedance actuator, which is used to estimate the interaction torque from the trainee based on Hooke's law for sensing and closed-loop torque control. The use of SEA eliminates the need for an expensive force/torque sensor, and lowers the hardware cost, improves the impact tolerance, and enables high-fidelity control performance [25]. The goal of this study was to validate this task trainer in terms of mimicking three target abnormal muscle behaviors (Fig. 1). In this work, we proposed the mathematical modeling of each muscle behavior and a control scheme that utilized cascaded feedback, model-based feedforward, and disturbance observer, to enable tracking of complex muscle torque profiles under strong motion disturbance as the trainee performs stretch tests of the robotic arm. A series of benchtop and clinical validation tests were conducted to verify the trainer's control performance as well as clinical realism.

A. Arm Trainer Mechatronic Design Overview
The robotic arm trainer used in this study is a 1-DOF kinesthetic haptic torque display device that resembles a human arm (Fig. 2). The dimensions of the limb were matched with the anthropometric data of a 50th percentile European American male [26]. The mass and inertia of the forearm and hand (the moving segments) are lower than those of a biological counterpart (Table I). The ROM was from 45° to […]. A drivetrain consisting of a 2:1 bevel gear set (A1M3MYZ2030A, SDP-SI, USA) and a 2.5:1 timing belt drive (MR5, Misumi, Japan) was used to transmit the motor torque to the elbow joint. A crank slider mechanism at the elbow converted the motor rotation into a linear motion to compress the die springs (9588K32, McMaster, USA) inside the spring cage at the forearm. The deflection of the springs (stiffness of 114.9 N/mm) created the actuation torque around the elbow joint to move the forearm and provide resistive torque. More design details about this SEA mechanism can be found in [21].
Fig. 3. Muscle kinematics and tone (torque) profiles for healthy, lead-pipe rigidity, cogwheel rigidity, and spasticity across severity levels given the mathematical model and control parameters. These profiles were generated when a passive stretch test was performed on the task trainer arm. Only biceps spasticity results are shown. Triceps spasticity has similar profiles and is omitted for clarity.

B. Mathematical Modeling of Lead-Pipe Rigidity, Cogwheel Rigidity, and Spasticity
In this study, three target behaviors for the arm task trainer were considered: lead-pipe rigidity, cogwheel rigidity, and spasticity. The modeling of LR was inherited from [21], and the modeling of CR and SP is proposed in this work. In general, LR and CR are relatively simple to model, whereas SP is more complex. This section describes how these behaviors were mathematically modeled at different severity levels and how their resulting resistive muscle tones ($\tau_{muscle}$) were calculated based on the user input kinematics (Fig. 3). For the healthy condition, $\tau_{muscle}$ was set to zero. The highest possible scores on the clinical scales (MAS 4 and UPDRS 4) were not simulated because the patient's joint is immovable in these conditions, which makes these severity levels easy to distinguish and unnecessary in the trainer.
1) Lead-Pipe Rigidity (LR): In lead-pipe rigidity, once the clinician starts to move the patient's arm, a uniformly elevated muscle resistance appears throughout the ROM, and the resistance level tends to increase with the UPDRS score. To command this step response-like constant resistance, a smooth transition of the muscle tone from zero torque to an elevated torque level (at the UPDRS score being simulated) was implemented using a hyperbolic tangent function, where $\dot{\theta}_E$ is the elbow angular velocity, $\tau_{avg}$ is the clinically derived average muscle tone, and $\omega_{thresh}$ is a threshold velocity constant. $\omega_{thresh}$ determines the velocity at which $\tau_{muscle}$ approaches the desired value of $\tau_{avg}$ [21]. To extract $\tau_{avg}$ for each UPDRS score, we initially referred to the clinical data from [27] in the design phase, and the magnitudes of $\tau_{avg}$ were further iterated during a clinical validation study with a group of 11 experienced clinicians [28]. Since gravity assists the stretch motion in extension but resists it in flexion, the values were adjusted to be higher for extension to partially offset the effect of gravity. The $\tau_{avg}$ values for both flexion (F) and extension (E) are reported in Table II.
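As an illustration of the lead-pipe model described above, the following minimal Python sketch assumes the common form $\tau_{muscle} = \tau_{avg}\tanh(\dot{\theta}_E/\omega_{thresh})$. The exact expression, its sign convention, and the numeric values used here are assumptions for illustration and are not taken from Table II.

```python
import math

def lead_pipe_torque(elbow_velocity, tau_avg, omega_thresh):
    """Assumed LR model: tone rises smoothly from 0 and saturates at tau_avg.

    elbow_velocity and omega_thresh share the same unit (e.g. deg/s).
    The sign of the result follows the sign of the velocity; the controller
    would apply it so that it opposes the stretch.
    """
    return tau_avg * math.tanh(elbow_velocity / omega_thresh)

# Hypothetical parameters, for illustration only.
TAU_AVG = 2.0        # N·m
OMEGA_THRESH = 5.0   # deg/s

for velocity in (0.0, 5.0, 50.0):
    print(velocity, round(lead_pipe_torque(velocity, TAU_AVG, OMEGA_THRESH), 3))
```

With this form, a smaller threshold velocity makes the resistance reach its plateau earlier in the stretch, which is the role the text assigns to $\omega_{thresh}$.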
2) Cogwheel Rigidity (CR): To model cogwheel rigidity, the simulated muscle tone generated by the proposed LR model was turned on and off intermittently by a rectified sinusoidal function with a tremor frequency of $\omega$, where $t$ is time. The tremor frequency $\omega$ for cogwheel rigidity has been reported to vary between 6-9 Hz in the literature [3], and we used $\omega$ = 6 Hz to model this behavior in our arm trainer. For practical implementation, an exponential moving average filter was used to smooth the commanded signal.
Table III (caption, partial). … [27], [29]. $Q$, $\theta_{ROM}$, and $k_{post}$ were tuned by clinicians. $D$ was adopted from [11]. For all MAS scores: …
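The cogwheel gating and smoothing described above can be sketched as follows. This is only an illustration: the half-wave rectified form of the sinusoid, the filter coefficient `alpha`, the sample time, and the torque level are all assumptions, since the text specifies only a rectified sinusoid at 6 Hz followed by an exponential moving average.

```python
import math

def cogwheel_gate(t, freq_hz=6.0):
    """Assumed half-wave rectified sinusoid in [0, 1], switching on once per period."""
    return max(0.0, math.sin(2.0 * math.pi * freq_hz * t))

def exponential_moving_average(samples, alpha):
    """y[k] = alpha * x[k] + (1 - alpha) * y[k-1], initialised with the first sample."""
    smoothed = []
    y = samples[0]
    for x in samples:
        y = alpha * x + (1.0 - alpha) * y
        smoothed.append(y)
    return smoothed

# Hypothetical values, for illustration only.
DT = 0.001       # s, sample time
TAU_LR = 2.0     # N·m, torque level coming from the LR model
raw = [TAU_LR * cogwheel_gate(k * DT) for k in range(1000)]
command = exponential_moving_average(raw, alpha=0.2)
print(round(max(raw), 3), round(max(command), 3))
```

A smaller `alpha` smooths the on/off transitions more strongly at the cost of attenuating and delaying the intermittent pattern.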

3) Spasticity (SP):
We started with the spasticity model proposed in Park et al.'s work [11] as the baseline model since it is one of the few published works that modeled SP mathematically. Park et al.'s piecewise model divided the SP resistance response into three phases: a) pre-catch, b) catch, and c) post-catch, where a separate governing equation was used to model each phase (Fig. 4). The control parameters were re-tuned based on our clinical data (Table III). For controlling the trainer, $\tau_{muscle}$ was set to the following torque terms based on the phase.
a) Pre-catch phase: The pre-catch muscle tone ($\tau_{pre}$) was modeled as a mildly damped feeling added to the arm dynamics, where $b_{pre}$ is the pre-catch damping coefficient (0.1 Nm/°/s) and $\dot{\theta}_E$ is the elbow joint angular velocity. This model implies that minimal abnormal muscle tone appears during the pre-catch phase.
The pre-catch phase transitions to the catch phase when the arm reaches a certain joint angle called the catch angle ($\theta_{catch}$). $\theta_{catch}$ is calculated in real time based on the average joint angular velocity during the pre-catch phase. Note that SP is a stretch-velocity dependent behavior, so if the arm is moved very slowly (below a certain threshold speed, $v_L$), $\theta_{catch}$ is set to an unreachable angle and no catch occurs. Therefore, $\theta_{catch}$ was expressed as a piecewise function of $\dot{\theta}_{pre\_avg}$, the average arm stretch speed during the pre-catch phase. In this study, a constant catch angle for each MAS level was assumed for simplicity.
b) Catch phase: The torque during the catch phase was expressed in terms of the following quantities: $\tau_{pre\_end}$ is the torque at the end of the pre-catch phase, $\dot{\theta}_{catch\_init}$ is the elbow stretch speed at the beginning of the catch phase, and $H$ and $Q$ are parameters that vary across different MAS levels and determine the catch amplitude ($H\dot{\theta}_{catch\_init}$) and release amplitude ($H\dot{\theta}_{catch\_init}Q$), respectively (Fig. 4). $t_{catch\_init}$ is the time when the catch phase initiates, and $T_{catch}$ represents the catch duration, which is set by $D$, a heuristically determined constant.
c) Post-catch phase: The post-catch torque was modeled as an impedance with a virtual spring and a damper, where $k_{post}$ is the post-catch stiffness, $b_{post}$ is the post-catch damping coefficient, $\theta_{E,post\_init}$ is the elbow angle at the beginning of the post-catch phase, $\tau_{catch\_end}$ is the torque at the end of the catch phase, and $\theta_{ROM}$ is the elbow angle at the end of ROM for each MAS level. $\tau_{catch\_end}$ was included as a torque continuity term between the catch and post-catch phases. Furthermore, if the user moved the elbow such that the angle exceeded the prescribed ROM, a software bumper was implemented in the controller as a very stiff impedance control to limit the ROM for each MAS score.
Among the three abnormal muscle behaviors, SP had the most complex model and the greatest number of control parameters. For each MAS level, Park et al. identified the spasticity parameters based on clinical data collected from four children with spasticity. However, due to the relatively small sample size and the subjects' age, we estimated the spasticity parameters from two clinical study datasets of adults collected by our research group [27], [29]. In parallel, two experienced clinicians (both with 20+ years of experience) were invited to perform extension trials at their preferred speed to provide expert tuning of the SP parameters for each MAS score. They evaluated the simulated SP behaviors and adjusted the parameters on the fly. We started all test sessions using the same values from [11] as the baseline, and incrementally adjusted the values in the Simulink interface (MATLAB 2022a) until the clinician felt that the adjusted values delivered the right haptic feeling based on their prior clinical experience. Eventually, all three sources of information were considered to finalize the control parameters for spasticity simulation (Table III).
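The phase logic of the piecewise spasticity model described above can be sketched as follows. This is a simplified, stateless illustration rather than the trainer's controller: the function names, the convention that the stretch angle grows as the arm moves further into the ROM, and all numeric values are assumptions, and the per-phase torque equations are deliberately left out.

```python
import math

def catch_angle(pre_avg_speed, theta_catch_nominal, v_low):
    """Constant catch angle per MAS level; unreachable for slow stretches."""
    if abs(pre_avg_speed) < v_low:
        return math.inf  # no catch occurs below the threshold speed
    return theta_catch_nominal

def select_phase(stretch_angle, pre_avg_speed, time_since_catch,
                 theta_catch_nominal, v_low, t_catch):
    """Return which of the three phases governs the commanded muscle tone."""
    if stretch_angle < catch_angle(pre_avg_speed, theta_catch_nominal, v_low):
        return "pre-catch"
    if time_since_catch < t_catch:
        return "catch"
    return "post-catch"

# Hypothetical parameters, for illustration only.
PARAMS = dict(theta_catch_nominal=40.0, v_low=20.0, t_catch=0.3)
print(select_phase(10.0, 100.0, 0.0, **PARAMS))  # pre-catch
print(select_phase(45.0, 100.0, 0.1, **PARAMS))  # catch
print(select_phase(45.0, 100.0, 0.5, **PARAMS))  # post-catch
print(select_phase(45.0, 5.0, 0.5, **PARAMS))    # pre-catch (too slow)
```

A real-time implementation would latch the phase and record $t_{catch\_init}$ when the catch begins; the stateless form here only shows how the velocity threshold and catch angle decide which governing equation applies.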

C. System Dynamic Modeling and Control Design
The proposed control system consisted of high-level and low-level control schemes (Fig. 5). To replicate the target abnormal muscle behavior and patient's arm dynamics, the high-level controller calculated the desired reference interaction torque felt by the user ($\tau^d_{user}$):
$$\tau^d_{user} = \tau_{muscle} + \tau_{dyn} \quad (7)$$
where $\tau_{muscle}$ is the simulated muscle tone for a selected behavior, and $\tau_{dyn}$ is the simulated torque due to the patient's arm dynamics calculated based on the 50th percentile human forearm inertia and gravity when driven by the user's input motion. This desired torque command was then input to the low-level control, which consisted of three controllers: a cascaded PI feedback controller ($C_{FB}$), a model-based feedforward controller ($C_{FF}$), and a disturbance observer ($C_{DOB}$). This low-level control was motivated by the need to compensate for the mass and inertia mismatch between the robotic and real patient's forearms, as well as to reject the external disturbance from user interaction. The actual interaction torque between the arm trainer and the user ($\tau_{user}$) was estimated based on the torque measured by the series springs in the SEA, with corrections using the knowledge of the robotic forearm mass and inertia properties ($\hat{\tau}_{user}$). Throughout this section, the superscript $d$ refers to a desired/reference signal.

1) Feedback Controller:
The initial control design started with a cascaded PI feedback controller (innermost to outermost loop: current, velocity, and torque control) (Fig. 5, $C_{FB}$), inherited from our previously developed ankle-foot robotic task trainer [20].
Given the cascaded control scheme, the gain parameters were tuned sequentially from the inner to the outer loop. The choice of gain parameters followed the guidance proposed in [30], e.g., for each loop, $k_i$ was chosen to be less than $0.5k_p$ to ensure stability while still achieving the desired response. For the velocity controller, the gains were set to $k_p = 0.25$ and $k_i = 0.1$ to achieve a desired closed-loop bandwidth of ∼40 Hz. Next, the outer torque controller was tuned with the forearm grounded. Initially, $k_p = 70$ was selected to have a closed-loop bandwidth of ∼4 Hz and later increased to 120 (closed-loop bandwidth of ∼7.5 Hz) with the addition of the DOB controller.
There were two control design considerations. First, tight velocity feedback around the motor and gearbox helps eliminate most of the friction, damping, and backlash because it is much easier to control position/velocity than force/torque through the drivetrain [25]. Essentially, this middle velocity loop takes care of these parasitic effects at the motor and gearbox and presents the plant as a nearly ideal velocity source to the outer torque loop, which greatly simplifies the torque loop design (for example, no explicit friction compensation is needed). This implementation was similar to [30] and [31] and is often referred to as velocity-sourced SEA control. Second, the reference motor velocity ($\dot{\theta}^d_m$) was calculated by summing the measured motor velocity in the previous time step ($\dot{\theta}_{m\_prev}$) with a desired change of velocity ($\Delta\dot{\theta}^d_m$) obtained in the outer torque control loop based on the interaction torque error ($\tau_e$). Essentially, the torque control loop only specified the change of motor velocity on top of the current velocity, rather than commanding a completely new velocity setpoint. This technique helped smooth out the reference velocity trajectory and also effectively reduced the effect of external motion disturbance from the user, in the same spirit as the "load motion compensation" suggested in [32]. However, given the nonnegligible mass and inertia difference between the robotic forearm and the human forearm, this controller alone was less effective compared to its original implementation on the ankle-foot trainer [20], so other controllers were introduced.
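The second design consideration, commanding only a change of motor velocity on top of the previously measured velocity, can be illustrated with a minimal Python sketch. The class name, the discrete PI form, the integral gain, and the sample time are assumptions made for illustration; only the torque-loop proportional gain of 120 comes from the text.

```python
class IncrementalTorqueLoop:
    """Outer torque loop that outputs a velocity increment (assumed PI form)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def reference_velocity(self, tau_desired, tau_estimated, motor_velocity_prev):
        """Return the motor velocity reference for the inner velocity loop."""
        tau_error = tau_desired - tau_estimated
        self.integral += tau_error * self.dt
        delta_velocity = self.kp * tau_error + self.ki * self.integral
        # The increment rides on the last measured velocity, so user-driven
        # motion of the arm is carried into the reference automatically.
        return motor_velocity_prev + delta_velocity

# kp = 120 as reported for the final torque loop; ki and dt are hypothetical.
loop = IncrementalTorqueLoop(kp=120.0, ki=40.0, dt=0.001)
print(loop.reference_velocity(1.0, 1.0, 3.0))  # 3.0 (zero error keeps the velocity)
```

When the torque error is zero, the reference simply equals the previous motor velocity, which is how this structure lets the motor follow the user's motion without the torque loop having to rebuild the full velocity setpoint.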
2) Feedforward Controller: To account for the mass and inertia discrepancy between the real patient's arm and the arm trainer, a 2-DOF dynamic model of the arm trainer was established to guide the design of the components in a feedforward control effort (Fig. 5, $C_{FF}$, and Fig. 6). For clarity and without loss of generality, the drivetrain gear ratio was ignored in the model (but was implemented in the actual controller). Friction and damping at the motor gearbox and at the elbow joint were not modeled and were assumed to be mostly removed by the feedback control. The equations of motion for the motor output torque ($\tau_m$) and the user's applied torque, i.e., also the user felt torque ($\tau_{user}$), were derived as (8) and (9), where $\theta_F$ and $\ddot{\theta}_F$ are the forearm segment angle and acceleration, $\theta_m$ and $\ddot{\theta}_m$ are the motor shaft angle and acceleration, $k_s$ is the series spring stiffness, $I_m$ is the reflected motor rotor inertia, $m_T$ and $I_T$ are the mass and moment of inertia (around the elbow) of the task trainer's forearm, and $l$ is the distance between the elbow and forearm center of mass (Table I). By combining (8) and (9), an expression for the user felt torque was obtained. The torque due to the simulated patient's arm dynamics was defined as
$$\tau_{dyn} = I_H\ddot{\theta}_F + m_H g l \cos\theta_F \quad (11)$$
which consists of the torque due to the 50th percentile human forearm inertia ($I_H\ddot{\theta}_F$) and gravity ($m_H g l \cos\theta_F$) (Table I).
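The simulated arm dynamics torque and the high-level command in (7) can be computed as in the sketch below. The inertial and gravitational terms follow the description above; the function names and the numeric values for $I_H$, $m_H$, and $l$ are hypothetical placeholders, not the Table I values.

```python
import math

G = 9.81  # m/s^2

def arm_dynamics_torque(theta_f, theta_f_ddot, inertia_h, mass_h, com_distance):
    """tau_dyn = I_H * theta_F_ddot + m_H * g * l * cos(theta_F), angles in rad."""
    inertial = inertia_h * theta_f_ddot
    gravitational = mass_h * G * com_distance * math.cos(theta_f)
    return inertial + gravitational

def desired_user_torque(tau_muscle, tau_dyn):
    """High-level command of (7): tau_user^d = tau_muscle + tau_dyn."""
    return tau_muscle + tau_dyn

# Hypothetical human forearm properties, for illustration only.
I_H, M_H, L_COM = 0.06, 1.5, 0.15  # kg·m^2, kg, m

tau_dyn = arm_dynamics_torque(0.0, 0.0, I_H, M_H, L_COM)
print(round(tau_dyn, 5))                            # 2.20725
print(round(desired_user_torque(1.0, tau_dyn), 5))  # 3.20725
```

With the forearm horizontal and at rest ($\theta_F = 0$, $\ddot{\theta}_F = 0$) only the gravitational term remains, which is why the first printed value equals $m_H g l$.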
Considering that the mass and inertia of the task trainer's forearm were less than those of an actual human forearm, two positive constant terms were defined as $\Delta I = I_H - I_T$ and $\Delta m = m_H - m_T$. Therefore, since the goal was to minimize the error between the user felt torque ($\tau_{user}$) and the desired torque ($\tau^d_{user}$) to achieve good torque tracking performance, by setting $\tau_{user} = \tau^d_{user}$, the feedforward torque command to the motor ($\tau_{m\_ff}$) was strategically chosen as in (12). Equation (12) motivated the structure of the feedforward controller. The following signals were fed forward: a) torques used to render the gravity ($-\Delta m\,g l \cos\theta_F$) and inertia ($-\Delta I\,\ddot{\theta}_F$) difference between the task trainer and the human forearms, b) motor inertia compensation ($I_m\ddot{\theta}_m$), and c) reference muscle tone profile ($\tau_{muscle}$). To implement this feedforward law practically, the motor reference acceleration ($\ddot{\theta}^d_m$) was used (instead of the actual motor acceleration $\ddot{\theta}_m$) for motor inertia compensation. In addition, $\theta_F$ was approximated by the absolute encoder reading on the actuator side and $\ddot{\theta}_F$ was obtained via double differentiation of $\theta_F$. In addition to these feedforward terms in (12), the spring torque in the previous time step ($\tau_{s\_prev}$) was also fed forward to maintain the current interaction torque, similar to the use of $\dot{\theta}_{m\_prev}$ described above. This approach also minimized the feedback control effort to compress the spring [31]. Given the inevitable unmodeled dynamics and model mismatch, a residual torque error between $\tau_{user}$ and $\tau^d_{user}$ would always exist and was dealt with by the feedback control. $\tau_{m\_ff}$ was then converted to a feedforward current command ($i_{ff}$) through the motor torque constant ($K_t$) (Fig. 5, $C_{FF}$).
3) Disturbance Observer: A disturbance observer (DOB) is a simple and effective robust control scheme that has been widely used in industrial motion control [33]. Since the SEA converts the force control problem into a position control problem by using the motor torque to modulate the spring deflection, DOB has become a popular technique for SEA control especially for the need to reject internal and external disturbances [24], [34], [35], [36].
The implementation of a DOB involved specifying a nominal plant ($P_n$) and a low-pass filter ($Q_{LP}$) (Fig. 5, $C_{DOB}$). Intuitively, a DOB compares the reference motor torque and the estimated motor torque, calculated using the series spring torque and the inverse nominal plant, and then compensates the difference due to various sources of disturbance. The low-pass filter determines up to which frequency the disturbance would be rejected and also makes $P_n^{-1}Q_{LP}$ realizable [33]. To obtain the nominal plant transfer function, a system identification process was conducted. The forearm was fixed in a 90° joint angle configuration and the motor was operated in a current-control mode given a chirp current signal with an amplitude of 1 A and a frequency changing from 0.1 to 10 Hz. The torque estimated by the series spring was also recorded. The open-loop plant ($P_{ol}$) was fitted (System Identification Toolbox v9.13, MATLAB 2022a) with a transfer function from the geared motor torque ($N\tau_m$, where $N = 5$) to spring torque ($\tau_s$) with mechanical efficiency ($\eta < 100\%$). The low-pass filter was designed to be a second-order Butterworth filter with a cut-off frequency of 15 Hz, which was the highest cut-off frequency allowed by the trainer hardware.
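As a quick numerical check of the low-pass filter design described above, the sketch below computes the coefficients of a continuous-time second-order Butterworth filter, $Q(s) = \omega_c^2 / (s^2 + \sqrt{2}\,\omega_c s + \omega_c^2)$, for a 15 Hz cut-off. The function names are illustrative; the standard Butterworth form is the only assumption.

```python
import math

def butterworth2_lowpass_coeffs(cutoff_hz):
    """Return (b0, a1, a0) for Q(s) = b0 / (s^2 + a1*s + a0)."""
    wc = 2.0 * math.pi * cutoff_hz   # cut-off in rad/s
    return wc ** 2, math.sqrt(2.0) * wc, wc ** 2

def magnitude(cutoff_hz, freq_hz):
    """|Q(j*2*pi*f)| of the filter above."""
    b0, a1, a0 = butterworth2_lowpass_coeffs(cutoff_hz)
    s = complex(0.0, 2.0 * math.pi * freq_hz)
    return abs(b0 / (s * s + a1 * s + a0))

b0, a1, a0 = butterworth2_lowpass_coeffs(15.0)
print(round(b0), round(a1, 1), round(a0))  # 8883 133.3 8883
print(round(magnitude(15.0, 15.0), 4))     # 0.7071 (-3 dB at the cut-off)
```

The rounded coefficients agree with the numerator and denominator values reported for the filter in (15).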
$$Q_{LP}(s) = \frac{8883}{s^2 + 133.3s + 8883} \quad (15)$$
4) Interaction Torque Estimation: One advantage of the SEA is the ability to use series spring deflection to estimate the interaction force or torque, instead of using an expensive external force/torque sensor. Rearranging (9) shows that the torque estimated by the series spring ($\tau_s$) did not directly measure the torque felt by the user, because of the gravitational and inertial torques of the trainer's forearm (16).
As a result, instead of directly feeding the series spring torque back as the measured interaction torque, the estimated forearm's gravitational and inertial torques were first compensated based on the series spring torque to calculate the estimated user felt torque ($\hat{\tau}_{user}$) as in (17), and then the error between $\hat{\tau}_{user}$ and $\tau^d_{user}$ was input to the feedback controller.
D. Evaluation Protocol
1) Benchtop Evaluations: The proposed control system was tested to evaluate its performance in delivering the desired interaction torque to the user ($\tau^d_{user}$). To understand the effectiveness of each controller, an ablation study was conducted to examine the tracking performance of four controller settings: a) feedback control only ($C_{FB}$), b) both feedback and feedforward control ($C_{FB} + C_{FF}$), c) feedback control, feedforward control, and disturbance observer ($C_{FB} + C_{FF} + C_{DOB}$), and d) same as case c but with a higher feedback gain ($C_{FB} + C_{FF} + C_{DOB}$ (HG)) (Table VI). In case d, note that with the addition of $C_{FF}$ and $C_{DOB}$, the torque loop P gain in $C_{FB}$ could be further increased. For each setting, the investigator performed the passive stretch test by mimicking the standard clinical technique (moving the arm through the ROM within 1 s) on the arm trainer to assess the simulated behavior for three trials. This procedure was repeated for each behavior across severities (3 UPDRS scores for LR, 3 UPDRS scores for CR, and 4 MAS scores for biceps and triceps SP, shown in Fig. 2). For each trial, to verify the torque tracking accuracy, the RMSE between $\tau^d_{user}$ and $\tau_{user}$ was calculated throughout the ROM (extension only for SP; both extension and flexion for LR and CR) and then averaged across three trials with standard error (SE) reported. Additionally, the percentage errors were calculated as $\frac{\text{averaged RMSE}}{|\text{maximum torque}|} \times 100\%$, where the maximum torque was extracted from the red curve in Fig. 3.
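The tracking metrics described above can be computed as in the following sketch. The function names and the sample numbers are illustrative assumptions; the standard error uses the sample standard deviation, which the text does not specify.

```python
import math

def rmse(desired, measured):
    """Root-mean-square error between two equally sampled torque traces."""
    squared = [(d - m) ** 2 for d, m in zip(desired, measured)]
    return math.sqrt(sum(squared) / len(squared))

def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)

def percent_error(trial_rmses, max_torque):
    """Averaged RMSE divided by |maximum torque|, in percent."""
    mean_rmse = sum(trial_rmses) / len(trial_rmses)
    return mean_rmse / abs(max_torque) * 100.0

# Hypothetical RMSE values (N·m) from three trials of one condition.
trials = [0.10, 0.20, 0.30]
print(round(percent_error(trials, -5.0), 2))  # 4.0
print(round(standard_error(trials), 4))       # 0.0577
```

Normalizing by the peak commanded torque makes errors comparable across conditions whose torque magnitudes differ widely, such as mild lead-pipe rigidity and severe spasticity.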

2) Clinical Evaluations:
a) Test protocol: To validate the realism of the task trainer in mimicking the three behaviors, a validation evaluation was conducted to get feedback from clinical experts in spasticity and rigidity assessment. The study was approved by the Institutional Review Board at University of Illinois at Urbana-Champaign (#21703, approved on 4/19/2021) and informed consent was obtained from all subjects. The study was conducted in the Jump Simulation and Education Center in Peoria, IL with a total of seven subjects (Table IV). To be eligible for this study, the participant needed to: 1) hold a medical, nursing, physical/occupational therapy degree, 2) have at least two years of experience with performing passive stretch tests on patients with increased muscle tone, and 3) be able to interpret the patient's muscle tone based on UPDRS and/or MAS.
The clinical evaluation consisted of a Classification Test and a Disclosed Test. Before starting the study, written descriptions of UPDRS and MAS scores were provided to the subject. During the Classification Test, the trainer was configured to replicate all 15 different conditions (i.e., healthy, LR UPDRS 1-3, CR UPDRS 1-3, biceps and triceps SP MAS 1, 1+, 2, 3), one trial per condition and in total 15 trials in a randomized sequence for each subject. Without knowing the condition being simulated, the subject was asked to assess each trial to classify the behavior and evaluate its severity based on their prior clinical experience. They were instructed to always start the passive stretch test from the fully flexed joint position and to check both biceps and triceps conditions. During the Disclosed Test, the investigator guided the subject through all 15 simulated conditions (disclosed to the subject). The subject provided qualitative feedback on simulation realism of each replicated behavior, and the potential of this device as a medical education task trainer by answering multiple five-point Likert questions (Table V).
b) Data analysis: For the Classification Test, the judgements from subjects were plotted against the trainer's setup using a confusion matrix to determine if subjects could distinguish the behaviors (LR, CR, and SP). Entries on or near the diagonal of the matrix indicate agreement between the clinician's judgement and the trainer's setup. The classification accuracy percentage was calculated as $\frac{\#\text{ of correctly classified trials}}{\#\text{ of total trials}} \times 100\%$ for each behavior. For the Disclosed Test, the mean and standard error were calculated for each simulation aspect.
Table V. Disclosed Test feedback questions. The five-point Likert scale for 1) simulation realism: 1-too little, 3-about right, 5-too much; 2) general usefulness: 1-strongly disagree, 3-neutral, 5-strongly agree.
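The per-behavior accuracy calculation from a confusion matrix can be sketched as below. The nested-dictionary layout and the counts are hypothetical and are not the study's data.

```python
def classification_accuracy(confusion):
    """Per-behavior accuracy (%) from confusion[simulated][judged] counts."""
    accuracy = {}
    for simulated, judged_counts in confusion.items():
        total = sum(judged_counts.values())
        correct = judged_counts.get(simulated, 0)
        accuracy[simulated] = 100.0 * correct / total
    return accuracy

# Hypothetical counts, for illustration only.
confusion = {
    "LR": {"LR": 18, "CR": 3, "SP": 0},
    "CR": {"LR": 2, "CR": 19, "SP": 0},
    "SP": {"LR": 1, "CR": 0, "SP": 55},
}
for behavior, value in classification_accuracy(confusion).items():
    print(behavior, round(value, 1))
```

Diagonal entries (simulated and judged behavior identical) count as correct; every off-diagonal entry in a row lowers that behavior's accuracy.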

A. Benchtop Validation
Four different controller settings, i.e., $C_{FB}$ only, $C_{FB} + C_{FF}$, $C_{FB} + C_{FF} + C_{DOB}$, and $C_{FB} + C_{FF} + C_{DOB}$ (HG), were tested. As more controller blocks were involved, the tracking performance was significantly improved for all behaviors (Table VI and Fig. 7). The incorporation of feedforward control significantly improved the tracking performance by 17-49% from the baseline feedback controller (i.e., $C_{FB}$ only). Furthermore, the addition of DOB further reduced the tracking error for SP and CR trials by 23-43% compared to $C_{FB} + C_{FF}$. Eventually, with $C_{FB} + C_{FF} + C_{DOB}$ (HG), the tracking performance was again improved by 16-36% compared to $C_{FB} + C_{FF} + C_{DOB}$. These results led us to use the $C_{FB} + C_{FF} + C_{DOB}$ (HG) controller.

B. Clinical Expert Validation
During the Classification Test, it was noticed that Subjects 4 and 7 had different assessment patterns compared to others. Specifically, Subject 4 tended to make very quick assessments (i.e., often only performed a single passive stretch per trial and then made a judgment), while other subjects usually performed the stretch multiple times and took time to consider the simulated muscle tone behavior and severity. Subject 7 used a nonstandard technique, i.e., using one hand to casually move the trainer's arm without stabilizing the elbow/upper arm with the other hand. Based on these observations, these two subjects were marked as potential outliers in the data analysis, and we calculated the classification percentages in the Classification Test with and without Subjects 4 and 7. On the other hand, during the Disclosed Test, when these two subjects carefully assessed the trainer using the same technique as other subjects, their feedback was consistent with the rest of the group, so it was included in the Disclosed Test data analysis.
Based on Classification Test results, on average, subjects were able to distinguish different behaviors with a mean accuracy of 87% (or 92% if excluding Subjects 4 and 7) (Fig. 8). These results suggest that the simulated SP, LR, and CR behaviors were distinctive. Note that 10 out of 16 misclassified trials (i.e., the pink off-diagonal entries) were found due to Subject 4 (6 trials) and Subject 7 (4 trials), and the remaining six were scattered across the other five subjects. For rigidity, severity agreement was in general satisfactory. For spasticity, mild (MAS 1 and 1+) and severe (MAS 2 and 3) SP trials were mostly separated, and some trials were mixed within the severity group.
Disclosed Test results suggested that, in general, subjects agreed with the trainer's simulation (i.e., most aspects scored close to 3) (Fig. 9). Additionally, in the final Disclosed Test question, all subjects strongly agreed that the device was useful as an educational tool for healthcare trainees to learn spasticity and rigidity assessment (average 4.9 ± 0.1). The responses that scored away from 3 are summarized here and are used in Section IV-B to explain misclassifications in the Classification Test. For LR, subjects reported that the resistance magnitude should be lower for UPDRS 1 and higher for UPDRS 3. For CR, subjects indicated that the cogwheel frequency was "about right" across levels, whereas the cogwheel magnitude was higher than expected for UPDRS 1 and 2. For SP, subjects reported that the catch should occur earlier in the ROM for MAS 1 and 3. They also indicated that the catch tone should be lower for MAS 2. Furthermore, there should be more release for MAS 1 and MAS 2, and the post-catch tone should be lower for MAS 1, 2, and 3 (Fig. 9).

IV. DISCUSSION
A. Modeling, Control, and Benchtop Validation
In summary, three abnormal muscle tone behaviors at different severity levels as well as the healthy behavior (for a total of 15 conditions) were mathematically modeled, and their corresponding torque profiles were tracked by a proposed control system involving feedback, feedforward, and disturbance observer control on the SEA-based task trainer.
We started with the feedback controller inherited from [20], but since the mass and inertia of the trainer's forearm were lower than those of an actual human forearm, we incorporated a model-based feedforward controller to compensate for the mismatch. In the context of our SEA, the force sensor is the series spring located between the elbow and forearm, so the mass and inertia of the forearm mechanism and protective shrouds were considered as post-sensor mass and inertia. It is known that the apparent post-sensor mass and inertia are difficult to modulate with feedback force control algorithms, and such systems usually need to rely on feedforward control [37]. In addition to rendering the trainer's forearm with a higher mass and inertia, the feedforward control also took over several tasks from the feedback control (such as compressing the spring and accelerating the motor), leaving the feedback control to address only the remaining torque error due to unmodeled dynamics.
This robotic task trainer is a typical application that requires accurate force control under external disturbances. The user's input motion (i.e., moving the forearm to assess the muscle tone) represents a motion disturbance that constantly perturbs the end of the series spring connected to the forearm, and this motion disturbance should be rejected from the perspective of the force control. In addition, as the user holds onto the trainer's forearm, the mass and inertia of the user's own arm are coupled with the robotic trainer's dynamics, causing model variation that degrades the performance of model-based control schemes. Therefore, DOB control, a simple and effective robust control scheme widely used in industrial motion control [33], was introduced in our control system. We implemented the DOB with a nominal plant that considered only the robotic trainer model (ignoring the user's interaction dynamics) to reduce the effect of internal and external disturbances from the user's interaction in the innermost current loop [35], and to facilitate the design of the cascaded feedback control (i.e., enable higher gains in the outer loop controllers).
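The disturbance-rejection idea behind a DOB can be illustrated with a deliberately simplified discrete-time sketch. It assumes a first-order plant, a nominal model that matches the plant exactly, a constant input disturbance, and a first-order low-pass Q filter. All names and numerical values are hypothetical, and the sketch does not represent the cascaded current-loop implementation used on the trainer.

```python
def simulate(use_dob, steps=400, a=0.9, b=0.1, alpha=0.8,
             disturbance=0.5, command=1.0):
    """Simulate x[k+1] = a*x[k] + b*(u[k] + d) and return the final state.

    With the DOB enabled, the disturbance estimate is obtained by inverting
    the nominal model (same a and b as the plant, by assumption), low-pass
    filtering the result (the Q filter), and subtracting it from the command.
    """
    x = 0.0
    d_hat = 0.0  # filtered disturbance estimate
    for _ in range(steps):
        u = command - d_hat if use_dob else command
        x_next = a * x + b * (u + disturbance)  # true plant with disturbance
        if use_dob:
            # Inverse nominal model: input that would explain the observed
            # state change, minus the input that was actually applied.
            d_raw = (x_next - a * x) / b - u
            # First-order low-pass Q filter on the raw estimate.
            d_hat = alpha * d_hat + (1.0 - alpha) * d_raw
        x = x_next
    return x


# Disturbance-free steady state for this plant is b * command / (1 - a).
print("without DOB:", simulate(use_dob=False))
print("with DOB:   ", simulate(use_dob=True))
```

In this idealized setting the DOB drives the output back to the disturbance-free steady state, whereas the uncompensated loop settles at an offset. With model mismatch, such as a user's arm coupled to the forearm, the estimate also absorbs the mismatch, which is why the choice of Q filter bandwidth matters for stability.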
Overall, the proposed control system ($C_{FB} + C_{FF} + C_{DOB}$ (HG)) was found to be effective in tracking all three behaviors (Table VI and Fig. 7). Among the three behaviors, as expected, the LR profile was the simplest to track, resulting in the lowest RMSE (< 0.16 Nm even with $C_{FB}$ only). In contrast, the SP and CR profiles were more complex and challenging due to an abruptly changing piecewise torque trajectory and high-frequency oscillations, respectively, so their tracking errors were higher. For all behaviors, although the RMSE increased with MAS and UPDRS scores due to the higher torque command amplitudes associated with higher severities, the error percentage remained small and about the same across severities (< 2.5 % for LR, < 2.8 % for CR, and < 2.3 % for SP).
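The observation that RMSE grows with severity while the error percentage stays roughly constant can be illustrated with a short sketch. It assumes the error percentage is the RMSE normalized by the peak commanded torque; this normalization is an assumption made for illustration, and the torque samples are invented.

```python
import math


def rmse(reference, measured):
    """Root-mean-square error between commanded and measured torque samples."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - m) ** 2 for r, m in zip(reference, measured)]
    return math.sqrt(sum(squared) / len(squared))


def error_percentage(reference, measured):
    """RMSE as a percentage of the peak commanded torque (assumed definition)."""
    peak = max(abs(r) for r in reference)
    return 100.0 * rmse(reference, measured) / peak


# Invented torque samples in Nm for a milder profile.
mild_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
mild_meas = [1.1, 1.9, 3.1, 3.9, 5.1]

# A more severe profile modeled as the same shape at twice the amplitude.
severe_ref = [2.0 * r for r in mild_ref]
severe_meas = [2.0 * m for m in mild_meas]

for name, ref, meas in [("mild", mild_ref, mild_meas),
                        ("severe", severe_ref, severe_meas)]:
    print(f"{name}: RMSE = {rmse(ref, meas):.3f} Nm, "
          f"error = {error_percentage(ref, meas):.2f} %")
```

When both the command and the error scale together, the absolute RMSE doubles while the normalized percentage is unchanged, which matches the pattern described above.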

B. Clinical Expert Validation
Classification Test results suggested that subjects were able to distinguish the different behaviors with a good accuracy of 87% (92% excluding Subjects 4 and 7) (Fig. 8). Occasionally, subjects identified some SP trials as LR or CR. For MAS 1 (both biceps and triceps), the catch occurred later than expected in the ROM (scores of 3.7 and 3.3) (Fig. 9). Since the release behavior was barely felt because it occurred too late (scores of 2.5 and 2.7), subjects might have perceived LR occurring at the end of the ROM, thus rating the trial as mild LR (UPDRS 1 or 2). Similarly, for severe SP trials (i.e., MAS 2 and 3), the catch and post-catch resistance magnitudes were considerably higher than expected (scores > 3). It was difficult to push the arm through the entire ROM, and subjects might have felt a constant high resistance for a large portion of the ROM, therefore judging the trial as a severe LR case (i.e., UPDRS 3). Additionally, Subjects 3 and 4 confused two of the SP trials with CR. Subject 4 commented that the vibrations coming from the drivetrain could be the confusing factor (i.e., they felt similar to the tremor of CR), which might explain this misclassification.
Some discrepancies in the Classification Test were observed between the subjects' judgments and the actual simulated severity, and the Disclosed Test results might provide some explanations (Fig. 9). For LR, the resistance magnitude was scored 3.5 for UPDRS 1 and 2.3 for UPDRS 3, which could explain why subjects misinterpreted UPDRS 1 as higher levels, or UPDRS 3 as lower levels. For CR, subjects mentioned that the cogwheel magnitude should be lower for UPDRS 1 (a score of 3.7), which may explain why this severity was confused with UPDRS 2. For SP, subjects suggested that more release and a lower post-catch tone amplitude should be implemented for MAS 1, which may explain why they misidentified MAS 1 as MAS 1+ in some trials. They also indicated that the catch and post-catch tone amplitudes were higher than expected for MAS 2, which may explain why MAS 2 was confused with MAS 3.
Overall, subjects strongly agreed that the device could be a useful medical education training tool for healthcare learners to practice both rigidity and spasticity assessment techniques (score of 4.9). Three subjects specifically commented that, in a clinical setting, the ability to classify symptoms and distinguish general severity groups such as mild (MAS 1 and 1+) or severe (MAS 2 and 3) to determine the treatment plan is more useful than the exact identification of a severity level. In this sense, our task trainer was quite effective based on the classification results. Furthermore, during a normal assessment, clinical signs from other parts of the body (e.g., posture, hand positioning) usually also provide insights into the patient's neurologic condition, in addition to the muscle tone. Therefore, the Classification Test in this study was more difficult than the clinicians' regular practice, in the sense that it required rating the exact severity level based solely on muscle tone information. Even in this strict and challenging assessment scenario, it is promising that subjects were generally able to distinguish across behaviors and identify the severity group for each behavior. This observation suggests that the simulation provided by our task trainer captured the key characteristics of each behavior and that the design of each severity level mostly aligned with the subjects' previous experience.

C. Limitations and Future Work
In retrospect, some control complexity could have been avoided if the control requirements had been comprehensively accounted for in the mechanical design. The task trainer was originally designed to mimic only LR with a slowly varying torque profile, and the system natural frequency (determined by spring stiffness, gear ratio, etc.) was ∼ 3 Hz, which made it difficult to track the fast-varying profiles of CR and SP. This limitation motivated us to also use DOB control with a faster nominal plant to suppress the open-loop system resonance.
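To show how spring stiffness and gear ratio enter the natural frequency, a rough sketch is given below. It assumes a simplified single-degree-of-freedom model in which the motor rotor inertia, reflected through the gearbox, oscillates against the series spring with the output held fixed. This model and all parameter values are assumptions made for illustration and are not the trainer's actual design parameters.

```python
import math


def sea_natural_frequency_hz(spring_stiffness, motor_inertia, gear_ratio):
    """Undamped natural frequency (Hz) of reflected motor inertia on a spring.

    spring_stiffness: joint-side torsional stiffness in Nm/rad
    motor_inertia:    rotor inertia in kg*m^2 (motor side)
    gear_ratio:       reduction ratio N; reflected inertia scales with N**2
    """
    reflected_inertia = motor_inertia * gear_ratio ** 2
    return math.sqrt(spring_stiffness / reflected_inertia) / (2.0 * math.pi)


# Hypothetical parameters, chosen only to give a frequency of a few hertz.
k_spring = 100.0   # Nm/rad
j_motor = 1.0e-4   # kg*m^2
n_gear = 50.0

f_base = sea_natural_frequency_hz(k_spring, j_motor, n_gear)
f_stiffer = sea_natural_frequency_hz(2.0 * k_spring, j_motor, n_gear)
f_lower_ratio = sea_natural_frequency_hz(k_spring, j_motor, n_gear / 2.0)

print(f"baseline: {f_base:.2f} Hz")
print(f"stiffer spring (2x): {f_stiffer:.2f} Hz")
print(f"half gear ratio: {f_lower_ratio:.2f} Hz")
```

Under this simplified model, the frequency scales with the square root of the stiffness and inversely with the gear ratio, which suggests why a design targeting cogwheel oscillations of several hertz would favor a stiffer spring or a lower reduction.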
In this human-robot interaction scenario, the trainee's dynamics (or more generally, the environment dynamics) are coupled to the task trainer's dynamics. It is known that force control performance varies with different environment dynamics [38]. Therefore, in this work, the environment dynamics were considered as a source of disturbance, and we attempted to reject it with a fixed-gain feedback controller and a DOB scheme. In the context of medical training, trainees with different body sizes (i.e., load mass), joint stiffness (i.e., load impedance), and techniques (i.e., load motion disturbance) represent different possible environment dynamics interacting with the task trainer. Therefore, a variable-gain controller might be more suitable for this application, and potential controllers such as adaptive control, gain-scheduling control, or optimization-based control could be explored. Furthermore, the DOB controller (i.e., the $Q_{LP}$ cut-off frequency and nominal plant) was designed empirically in this work. The controller stability criterion for the choice of $Q_{LP}$ involves estimating the multiplicative model uncertainty across the operating frequency range [35], [36]. In this context, the magnitude of the model uncertainty depends mainly on the trainee interaction and the coupled dynamic characteristics mentioned above; therefore, a detailed analysis is needed in the future to ensure continued controller stability and performance.
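One commonly cited sufficient condition in the DOB literature is a small-gain test: with the true plant written as the nominal plant times (1 + Δ), the inner loop remains stable if |Q(jω)Δ(jω)| < 1 at all frequencies. Whether this exact form is the criterion used in [35], [36] is an assumption here. The sketch below checks the condition on a frequency grid for a first-order low-pass Q filter and a hypothetical uncertainty bound that grows with frequency.

```python
import math


def q_lowpass(omega, cutoff_hz):
    """First-order low-pass Q filter evaluated at s = j*omega."""
    omega_c = 2.0 * math.pi * cutoff_hz
    return 1.0 / complex(1.0, omega / omega_c)


def uncertainty_bound(omega):
    """Hypothetical bound on |Delta(j*omega)|; not identified from the trainer."""
    return 0.2 + 0.05 * omega


def satisfies_small_gain(cutoff_hz, freqs_hz):
    """True if |Q(jw)| * |Delta(jw)| < 1 at every frequency on the grid."""
    for f in freqs_hz:
        omega = 2.0 * math.pi * f
        if abs(q_lowpass(omega, cutoff_hz)) * uncertainty_bound(omega) >= 1.0:
            return False
    return True


# Log-spaced grid from 0.1 Hz to 100 Hz (hypothetical operating range).
freq_grid = [10 ** (-1 + 3 * i / 200) for i in range(201)]

for cutoff in (1.0, 10.0):
    ok = satisfies_small_gain(cutoff, freq_grid)
    print(f"Q cut-off {cutoff:4.1f} Hz -> condition satisfied: {ok}")
```

The example illustrates the trade-off discussed in the text: raising the Q filter cut-off improves disturbance rejection bandwidth but eventually violates the condition when the uncertainty is large at high frequencies, for instance when a trainee's arm changes the coupled dynamics.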
Finally, valuable feedback was received from the clinical validation study regarding the simulation fine-tuning. To further enhance the fidelity, more simulation features could be added in the future, such as nonlinear elbow joint stiffness. As next steps, the device should be experimentally deployed and incorporated into the curriculum for healthcare students to further collect user feedback.

V. CONCLUSION
This study presented the modeling, control, and clinical validation pipeline of a robotic arm task trainer that mimics three abnormal muscle tone behaviors (lead-pipe rigidity, cogwheel rigidity, and spasticity) at varied severities. The SEA-based trainer, together with the presented control system, was validated to deliver accurate torque control during user interaction. Based on the clinical results, this task trainer can be a clinically useful and cost-effective medical education tool that provides realistic and consistent practice opportunities for clinical learners to become proficient in rigidity and spasticity assessment techniques while reducing the need for human practice patients.