The Impact of Hill-Type Actuator Components on the Performance of Reinforcement Learning Controllers to Reverse Upper-Limb Paralysis

Functional electrical stimulation (FES) may allow people who are paralyzed due to spinal cord injuries (SCIs) to regain the ability to move. Deep neural networks (DNNs) trained with reinforcement learning (RL) have been recently explored as a promising methodology to control FES systems to restore upper-limb movements. However, previous studies suggested that large asymmetries in antagonistic upper-limb muscle strengths could impair RL controller performance. In this work, we investigated the underlying causes of asymmetry-associated decreases in controller performance by comparing different Hill-type models of muscle atrophy, and by characterizing RL controller sensitivity to passive mechanical properties of the arm. Simulations indicated that RL controller performance is relatively insensitive to moderate (up to 50%) changes in tendon stiffness and in flexor muscle stiffness. However, the viable workspace for RL control was substantially affected by flexor muscle weakness and by extensor muscle stiffness. Furthermore, we uncovered that RL controller performance issues previously attributed to asymmetrical antagonistic muscle strength resulted from flexor muscle active forces that were insufficient to counteract extensor muscle passive resistance. The simulations supported the adoption of rehabilitation protocols for reaching tasks that prioritize decreasing muscle passive resistance, and counteracting passive resistance with increased antagonistic muscle strength.

due to SCIs [1], and approximately half of these individuals have tetraplegia [2]. While there are several alternatives to restore some degree of motor function to these individuals, people with tetraplegia indicate that they would prioritize regaining the ability to control their own limbs, instead of using external assistive devices (e.g. exoskeletons) [3]. Also, the most relevant tasks for people with tetraplegia require upper-limb motor function and object manipulation [4], [5]. By eliciting contractions in paralyzed muscles, functional electrical stimulation (FES) can satisfy the priorities of people with tetraplegia. Over the past decades, many studies have successfully implemented FES systems that restored some arm and hand function to people with SCI [6], [7], [8], [9], [10], [11]. However, these systems have not yet been widely translated into clinical practice. Current upper-limb FES systems provide low dimensional control that poorly approximates upper-limb function, and they require careful and continuous intervention from skilled engineers and clinicians. If upper-limb FES systems are to be more widely used, they will require controllers that can be easily trained, that maintain good performance with minimal intervention from experts, and that can coordinate many actuators simultaneously.
Recently, reinforcement learning (RL) has been used to train controllers that can meet many of the aforementioned design goals. By emulating natural learning, RL may prevent labor-intensive manual adjustments of controller parameters. RL controllers have been effective for controlling an upper-limb FES system [12], a robotic arm performing high-dimensional object manipulation [13], and upper-limb multi-actuator musculoskeletal models [14], [15]. When Hindsight Experience Replay (HER) [16] was incorporated into RL controller training, controllers for upper-limb musculoskeletal models could be trained quickly, achieving success rates for reaching tasks that were close to 100% [17]. RL controllers retained good performance with minimal retraining after large changes in biomechanical properties [18]. More recently, RL provided effective control even for highly fatigable models representing chronically paralyzed arms, as long as arms were given sufficient rest between tasks [19]. However, previous works suggested that high levels of strength asymmetry between antagonistic muscles could impair RL controller performance [19]. Since people with SCI are likely to have some muscle denervation, as well as naturally occurring differences in antagonistic muscle strengths [20], RL-controlled neuroprostheses would benefit from increased understanding of muscle force asymmetry-associated decreases in controller performance.
In this work, we investigated the RL control of upper extremity movements using different models of muscle atrophy, and we characterized the sensitivity of RL controller performance to the passive mechanical properties of Hill muscle models. We uncovered that RL controller performance issues that were previously attributed to asymmetrical antagonistic muscle strengths happened, in fact, because flexor muscle active forces were insufficient to counteract passive resistance from extensor muscles. Furthermore, the relationship between flexor muscle strengths and passive extensor muscle forces determines the workspace of the controller. Our simulations support the adoption of rehabilitation protocols that prioritize decreasing muscle passive resistance, and counteracting passive resistance with increased antagonistic muscle strength.

A. Biomechanical Arm Model
We used an existing musculoskeletal model of the human arm [15], [18], [21] to study the impact of Hill muscle components on RL controller performance. Figure 1A shows a diagram of the arm model and of the motor task workspace. The arm model included 6 actuators modeled as Hill-type muscles [22]. The model included two rigid segments representing the forearm and the upper arm, and the segments were connected by two pin joints representing the elbow and the shoulder joints. The range of motion was [−20°, 130°] and [5°, 70°] for the shoulder and the elbow joints, respectively. The actuators approximated the functions of the anterior deltoid (a), the posterior deltoid (b), the brachialis (c), the short head of the triceps (d), the biceps (e) and the long head of the triceps (f). Limb segment dimensions were extracted from [23] for an average male subject (height of 177 cm and weight of 80 kg), and Hill muscle parameters were extracted from [24], [25], as in similar RL studies that demonstrated suitable motor behavior [15], [17] and robustness to variations in biomechanical properties [18].
During motor tasks, the weight of the arm was supported, and the movement of the arm was constrained to a horizontal plane. The movement of the arm could be described as reaching across a frictionless tabletop or moving within a horizontal planar region using a mobile arm support. Forward dynamics simulations were performed using Euler approximation with model states updated every 20 ms, which provided accurate control in similar studies [14], [15], [17], [18], [19], [21]. The musculoskeletal model and the forward simulations were implemented in C [21].
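As an illustration of the fixed-step scheme described above, the following minimal Python sketch applies forward Euler updates every 20 ms to a single pin joint. The inertia, damping and torque values are hypothetical placeholders and are not parameters of the arm model in [21], which was implemented in C.

```python
DT = 0.020  # integration step in seconds (20 ms, as stated in the text)


def euler_step(angle, velocity, torque, inertia=0.05, damping=0.1, dt=DT):
    """Advance one pin joint by a single forward-Euler step.

    inertia (kg m^2) and damping (N m s/rad) are hypothetical values.
    """
    acceleration = (torque - damping * velocity) / inertia
    new_angle = angle + dt * velocity          # position uses the old velocity
    new_velocity = velocity + dt * acceleration
    return new_angle, new_velocity


def simulate(torque, duration=1.0, dt=DT):
    """Integrate from rest under a constant torque and return the state history."""
    angle, velocity = 0.0, 0.0
    history = [(angle, velocity)]
    for _ in range(round(duration / dt)):
        angle, velocity = euler_step(angle, velocity, torque, dt=dt)
        history.append((angle, velocity))
    return history
```

A 1-second simulation at this step size corresponds to 50 state updates, which matches the time limit used for each reach in the motor task.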

B. Motor Task
The controller was tasked with outputting muscle activations to move the arm model towards a target region. The arm started at the end state of the previous reach, as if continuously moving across the workspace. The task workspace and the target region are shown in Figure 1A. The controller was given 1 second to complete the task, and the endpoint of the arm had to remain in the target region for 100 ms in order for the task to be considered successful. Most reaches took less than 0.5 seconds for all conditions simulated in this work, and increasing the maximum time to reach the target beyond 1 second did not affect movement duration. The targets were circles with radii of 7.5 cm, as in other similar studies [15], [17], [19]. Target locations were randomly sampled from a continuous uniform distribution across the task workspace. The task workspace was estimated using data from previous RL implementations using the same musculoskeletal arm model [17], [18]. We included regions where previous RL controllers could acquire targets with greater than 95% probability, confirming task feasibility and allowing for the impact of chosen biomechanical parameters to be assessed. By selecting this workspace, controller success rates could be represented as fractions of the maximum controller performance.

Figure 1. [...] [19]. (B): Components included in the Hill muscle model used in this work. We evaluated the sensitivity of RL controller performance to the components by scaling the output of each component separately. SE is the series elastic element (tendon); CE is the contractile element (active muscle force); PE is the parallel element (passive muscle resistance). $K_{SE}$ and $K_{PE}$ were scale factors simulating different levels of tendon and muscle stiffness, respectively. We simulated muscle atrophy by either scaling the output muscle forces (atrophy$_1$) or the active contractile forces (atrophy$_2$).

Figure 2. RL Controller Training. The controller was a deep neural network that received kinematic variables and the target posture (observation). The controller output muscle activations that were applied to the environment (arm model). The controller was trained using an actor-critic RL architecture. Controller neural network coefficients were continuously updated to maximize the rewards output by the environment during training. The observations and the rewards were used by a parameter update function ($F_{update}$) using state-action value estimates output by the critic network.

Figure 2 shows a diagram of the controller training paradigm. The controller (policy) was a deep neural network (DNN) trained with reinforcement learning [26]. At each step, the controller received the kinematic state of the arm (joint angular positions, joint angular velocities) and the target arm posture. The controller output relative muscle activations in the range of [0, 1] for each of the 6 actuators included in the arm model. The reinforcement learning algorithm was given a reward at each step according to Equation 1. $I_{AT}$ was a boolean that was 1 if the endpoint of the arm was inside the target region ($T$, Figure 1A) and 0 otherwise. $a$ was a 6-dimensional vector containing the muscle activations of the arm. The first term of Equation 1 rewarded the controller for the ability to maintain the arm in the target region, and the second term penalized higher muscle activations to promote convergence for an otherwise underdefined problem [17], [18].
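The success criterion and the two-term reward described above can be sketched as follows. The timing and target radius come from the text; the exact form of the activation penalty (sum of squared activations) and its weight `effort_weight` are assumptions made for illustration, since Equation 1 is not reproduced here.

```python
import math

DT = 0.020                      # control step in seconds (from the text)
TARGET_RADIUS = 0.075           # target radius in meters (7.5 cm, from the text)
HOLD_STEPS = round(0.100 / DT)  # 100 ms inside the target -> 5 consecutive steps
MAX_STEPS = round(1.0 / DT)     # 1 s time limit -> 50 steps


def in_target(endpoint, target, radius=TARGET_RADIUS):
    """Indicator I_AT: True when the arm endpoint lies inside the target circle."""
    return math.hypot(endpoint[0] - target[0], endpoint[1] - target[1]) <= radius


def step_reward(endpoint, target, activations, effort_weight=0.01):
    """Assumed reward: target indicator minus a penalty on muscle activations.

    The squared-activation penalty and effort_weight are hypothetical choices.
    """
    effort = sum(a * a for a in activations)
    return float(in_target(endpoint, target)) - effort_weight * effort


def reach_succeeded(endpoints, target):
    """True if the endpoint stays in the target for HOLD_STEPS consecutive
    steps within the first MAX_STEPS samples of the trajectory."""
    consecutive = 0
    for endpoint in endpoints[:MAX_STEPS]:
        consecutive = consecutive + 1 if in_target(endpoint, target) else 0
        if consecutive >= HOLD_STEPS:
            return True
    return False
```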

C. RL Controller Algorithm
The arm controller DNN was trained using an actor-critic RL algorithm. The RL algorithm used Twin-Delayed Deep Deterministic Policy Gradients (TD3) [27] and Hindsight Experience Replay (HER) [16], as described previously [17], [18], [19]. The actor and the critic were feed-forward DNNs containing 2 layers and 64 nodes per layer. The critic received the rewards at each step and provided value estimates that were used to update the actor to maximize rewards. We implemented the RL controller in Python 3.7 using stable-baselines3. RL hyper-parameters were as described previously [17], [19]. The simulations were performed on a high performance computer cluster. Each controller trained in this study was provided with 2 CPU cores (Intel Xeon Gold, 2.10GHz) and 8GB of RAM.
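To illustrate the idea behind HER mentioned above, the following simplified sketch relabels stored transitions with goals that were actually reached later in the same episode, so that unsuccessful reaches still yield rewarded examples. It is not the stable-baselines3 implementation used in this work; the function name, the dictionary keys and the `n_sampled_goal` parameter are hypothetical names chosen for this example.

```python
import random


def relabel_with_future_goals(episode, reward_fn, n_sampled_goal=4, rng=None):
    """Create extra transitions whose goal is an endpoint reached later on.

    episode: list of dicts with keys "obs", "action", "achieved" (the endpoint
    reached after the action) and "goal" (the original target).
    reward_fn(achieved, goal) returns the step reward for a given goal.
    """
    if rng is None:
        rng = random.Random(0)
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(n_sampled_goal):
            # Pick the current or a later transition from the same episode.
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future["achieved"]
            relabeled.append({
                **transition,
                "goal": new_goal,
                "reward": reward_fn(transition["achieved"], new_goal),
            })
    return relabeled
```

The relabeled transitions would be added to the replay buffer alongside the original ones; the original episode is left unchanged.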

D. Experiments
1) RL Controller Performance for Different Implementations of Muscle Atrophy: In a previous study implementing RL control of a model representing a person with SCI, we observed that controller performance was impaired by large asymmetries in antagonistic muscle strength. More specifically, RL success rate decreased when flexor muscles were weaker than extensor muscles [19]. To further investigate this phenomenon, we implemented two different models of muscle atrophy:
• Scaling output forces: a scale factor was applied to the output muscle forces ($f_m$) of the Hill muscle model, as shown in Figure 1B (atrophy$_1$). Previous works have also scaled $f_m$ to model muscle strength decreases due to muscle atrophy [17], [18] and muscle fatigue [18], [21]. This was the model used in [19], where asymmetry-associated decreases in RL performance were observed. By scaling $f_m$, the forces output by all three elements (SE, PE, CE) of the Hill muscle model were decreased by the same proportion, as shown in Equation 2. $k_{atr1}$ is the fraction decrease in muscle element force.
• Scaling only contractile element forces: a scale factor was applied only to the active force ($f_{CE}$) produced by the muscle, as shown in Figure 1B (atrophy$_2$). Equation 3 summarizes the model, where $k_{atr2}$ is the fraction decrease in contractile element strength. When $f_{CE}$ is scaled, passive forces ($f_{PE}$, $f_{SE}$) are higher, compared to when $f_m$ is scaled.
By performing simulations using a scale applied to $f_{CE}$ as compared to a scale applied to $f_m$, we explored the effect of passive forces on RL controller performance. We uncovered that a decreasing ratio between flexor strength and extensor passive forces explained RL controller issues previously attributed to muscle strength asymmetry [19].
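The two atrophy models referenced above as Equations 2 and 3 can be sketched as follows. This sketch assumes that the contractile and parallel elements act in parallel, so that the unscaled muscle force is $f_m = f_{CE} + f_{PE}$, and that $k_{atr1}$ and $k_{atr2}$ act as multiplicative scale factors. The superscripts (1) and (2) are labels introduced here for illustration only.

```latex
\begin{align}
f_m^{(1)} &= k_{atr1}\, f_m = k_{atr1}\left(f_{CE} + f_{PE}\right) \\
f_m^{(2)} &= k_{atr2}\, f_{CE} + f_{PE}
\end{align}
```

Under these assumptions, the passive contribution is reduced to $k_{atr1}\, f_{PE}$ in the first model but remains at $f_{PE}$ in the second, which is why passive forces are higher when only $f_{CE}$ is scaled.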
To measure controller success rates (number of successful reaches/total reaches) across the workspace, the workspace was discretized in pixels that measured 0.3 × 0.3 cm. Success rates were displayed for each pixel as a color gradient where yellow represented success rates close to 100%, and blue represented success rates close to 0%. We modeled symmetrical and asymmetrical muscle disuse atrophy as described previously [19]:
• Symmetrical muscle forces: flexor and extensor actuator elements were scaled by the same factor ($k_{atr1}$ or $k_{atr2}$).
• Flexor force decrease: only flexor actuator elements were scaled ($k_{atr1}$ or $k_{atr2}$), and extensor actuator elements remained unchanged.
• Extensor force decrease: only extensor actuator elements were scaled ($k_{atr1}$ or $k_{atr2}$), and flexor actuator elements remained unchanged.
2) Sensitivity to Musculoskeletal Passive Mechanical Properties: We noticed that controller success rates were worse when symmetrical atrophy was modeled as a scale applied to the contractile element ($k_{atr2} \times f_{CE}$), as compared to a scale applied to the output forces ($k_{atr1} \times f_m$). Differences between these 2 conditions (scaling $f_m$ vs. scaling $f_{CE}$) could be due to at least two factors: (1) flexor muscle strength was insufficient to counteract extensor passive forces, or (2) the controller could no longer stabilize the arm at the left lateral side of the workspace after contractile element forces were decreased.
To investigate these hypotheses, we measured the impact of each non-active element (PE, SE) on controller performance as described below:
• Tendon stiffness: a scale was applied separately to the tendon forces ($f_{SE}$) of the flexors and the extensors.
• Muscle stiffness: a scale was applied separately to the passive forces ($f_{PE}$) of the flexor muscles and of the extensor muscles. This condition used a biomechanical model that represented an average male with no atrophy.
• Muscle stiffness for a model representing 50% flexor atrophy: we noticed that controller performance decreased with extensor muscle stiffness. To support the hypothesis that flexor muscle strength was insufficient to counteract extensor passive forces after flexor muscle atrophy, we performed a sensitivity analysis on extensor passive forces for a model representing 50% flexor atrophy. If the hypothesis was correct, decreasing extensor passive forces should improve controller performance for a model representing flexor atrophy.

3) Workspace Boundaries for Different Levels of Muscle Strength: In this experiment, a forward simulation was performed to delimit the boundaries of the workspace of the arm for different levels of muscle strength. The goal was to investigate the impact of the relationship between active and passive muscle forces on the workspace area. Because this analysis did not require a controller, it eliminated the potential impact of the RL controller as a confounding variable in previous experiments. If the results of this forward simulation resembled the workspace boundaries shown in Figure 3, we could conclude that changes in arm biomechanics decreased the workspace of the controller only because they decreased the workspace of the arm. For each condition, the CE active forces ($f_{CE}$) and the output muscle forces ($f_m$) were separately scaled to represent different models of muscle atrophy. The forward simulation was performed in an open-loop manner, following the phases described below:
1) The arm was moved to its initial equilibrium position by slowly activating all flexors. For all muscle activation changes described in this experiment, activation was decreased or increased by 1% at each step, and the model was given 5 seconds to reach the new equilibrium position.
2) Starting from the initial equilibrium position, each flexor muscle activation was slowly decreased from 100% to 0% muscle activation.
3) Once flexor muscle activations reached 0%, the antagonistic extensor muscle was slowly activated from 0% to 100%.
4) Steps 1-3 were repeated for each flexor muscle individually, and then for each combination of flexor muscles.
The boundaries of the workspaces were then manually outlined using Bézier splines. This provides a computationally efficient estimate of the workspace boundaries by combining maximal flexor activation with minimal extensor activation, and vice-versa, for all combinations of muscles.
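The open-loop protocol above can be sketched as an activation schedule. The 1% increments, the 5-second settling time and the 20 ms simulation step come from the text; the assignment of the anterior deltoid, brachialis and biceps to the flexor group is an assumption made for this example, and the function names are hypothetical.

```python
from itertools import combinations

STEP = 1            # activation change per step, in percent (from the text)
SETTLE_TIME = 5.0   # seconds allowed to reach equilibrium after each change
DT = 0.020          # simulation step in seconds
SETTLE_STEPS = round(SETTLE_TIME / DT)  # simulation steps per activation level

# Assumed flexor group; the text does not list which actuators are flexors.
FLEXORS = ["anterior deltoid", "brachialis", "biceps"]


def ramp_schedule():
    """Return (flexor %, extensor %) pairs for phases 2 and 3 of the protocol."""
    flexor_down = [(level, 0) for level in range(100, -1, -STEP)]
    extensor_up = [(0, level) for level in range(STEP, 101, STEP)]
    return flexor_down + extensor_up


def flexor_groups(flexors=FLEXORS):
    """Each flexor individually, then every combination of two or more."""
    groups = []
    for size in range(1, len(flexors) + 1):
        groups.extend(combinations(flexors, size))
    return groups
```

Each pair in the schedule would be held for `SETTLE_STEPS` simulation steps before the endpoint position is recorded, and the schedule would be replayed once per entry of `flexor_groups()`.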

E. Evaluating Controller Performance
Controller performance was measured as the median success rate (number of successful reaches/total number of reaches) for all controllers trained in each condition. The controller training process described in Figure 2 was repeated 32 times for each condition in this work, as was shown to provide an accurate estimate of controller performance in similar studies using the same motor task [17], [18], [19]. Controller performance was evaluated over 100 reaches after every 5-minute interval of

III. RESULTS
A. Controller Performance for Different Implementations of Muscle Atrophy
Figure 3 shows RL controller performance across the workspace for different implementations of muscle atrophy after 60 minutes of training time. Figure 3A shows success rates when muscle atrophy was modeled as a scale applied to the contractile element ($f_{CE}$) of the Hill muscle model. Figure 3B shows success rates when muscle atrophy was modeled as a scale applied to the output muscle forces ($f_m$).
For both implementations of muscle atrophy, extensor-only atrophy had only very minor effects on performance. In Figure 3B, as expected, controller performance is lowest for 25% muscle strength. Controller performance increases across the workspace at 50% muscle strength, and it decreases slightly for higher levels of muscle strength. Notice that when muscle atrophy is modeled as a scale applied to the output muscle forces ($f_m$, Figure 3B), the passive stiffness of the extensor muscles increases with increasing modeled muscle strength. Therefore, the ratio between flexor muscle strength and extensor passive resistance decreases with larger extensor $f_m$, resulting in worse controller performance.
Flexor-only and symmetrical atrophy shrank the workspace starting from the left lateral side. There was no visible difference in controller performance between the two implementations of flexor-only atrophy. However, when all actuators were weakened by the same percentage (symmetrical atrophy), there was a notable difference in controller performance across the workspace when scaling only the contractile element, as compared to scaling the output forces. Average success rates were considerably lower when symmetrical atrophy was modeled as a scale on $f_{CE}$. The workspace performance plots from the symmetrical $f_{CE}$ condition were similar to the plots showing the flexor-only $f_{CE}$ and flexor-only $f_m$ conditions.

Tables 1 and 2 show the training time to achieve maximal success rates for each condition. Neither extensor tendon stiffness (Figure 4A) nor flexor tendon stiffness (Figure 4B) affected controller performance over the range explored in this work. Also, flexor muscle stiffness (Figure 5B) did not impact controller success rates. However, extensor muscle stiffness (Figure 5A) visibly impaired controller performance. Increasing extensor muscle stiffness from the baseline (scale factor=1) progressively decreased controller success rates; decreasing extensor stiffness from the baseline did not impact performance, and success rates remained close to 100%.

Figure 6 shows controller success rates as a function of training time for a model where flexor forces ($f_m$) were reduced by 50%. Supplementary Figure 3 provides box plot representations of the results, and Supplementary Table 3 provides the training time to achieve maximal success rates for each condition. Reducing extensor muscle stiffness increased controller success rates for a model representing flexor-only atrophy. When extensor muscle stiffness was decreased by 50%, success rates were close to 100%. However, increasing extensor muscle stiffness impaired controller performance; an increase of 50% in extensor muscle stiffness caused controller success rates to drop to approximately 20%. Controller performance was not visibly affected by changes in flexor muscle stiffness (Figure 6B); success rates remained close to the non-scaled baseline (scale factor=1) when flexor muscle stiffness was increased or decreased by 50%. Supplementary Table 4 shows the training time to achieve maximal success rates for each condition in Supplementary Figure 4.

Figure 7 shows the boundaries of the workspace obtained in a forward simulation for different levels of muscle strength. In Figure 7A, the area of the workspace decreased with symmetrically-decreasing levels of contractile element force ($f_{CE}$). The workspace shrank from the contralateral border, while the ipsilateral border remained unchanged with decreased contractile element forces. Figure 7B shows that the workspace boundaries were virtually unaffected by decreases in the output muscle forces $f_m$.

IV. DISCUSSION
Chronic paralysis leads to high levels of muscle disuse atrophy [28]. Previously, we noticed that biomechanical models representing asymmetrical muscle atrophy posed an unanticipated challenge for RL controllers [19]. More specifically, for reaching tasks where the arm was supported against gravity, RL controllers performed considerably worse in models where flexor muscles were substantially weaker than extensor muscles. In this study, we compared two different models of muscle atrophy to further investigate this phenomenon. Also, we performed a sensitivity analysis to determine the impact of Hill muscle model passive parameters on RL controller performance. We found that the relationship between flexor active forces and extensor passive forces is an important variable that affects not only the controller workspace, but the overall workspace of the arm. These simulations suggest that predictions of controller performance may be unrealistically high if incorrect muscle stiffness parameters are used. Also, the simulations indicate that if flexor muscle strength is low, and if extensor passive forces are high, the controllable workspace will be very small with RL, and probably with other controller architectures as well.

Figure 7. [...] Figure 3A. (B): Output muscle forces ($F_m$) were scaled to represent different levels of muscle strength, and workspace boundaries were outlined for each level. Note that the workspace boundaries were not affected by decreases in output muscle forces; compare with the simulations representing symmetrical atrophy in Figure 3B.

A. Modeling Muscle Force Decreases Using Hill-Type Muscle Models
Several studies have modeled muscle strength decreases (atrophy and/or fatigue) by scaling the output forces ($f_m$) produced by Hill-type actuators [15], [17], [21]. It is convenient to scale output muscle forces ($f_m$) since estimates of muscle parameters are unavailable for many clinically-relevant conditions (e.g. disuse atrophy). However, simple scaling of the output muscle forces causes the active elements and passive elements to be scaled identically (see Equation 2). While chronic disuse atrophy and fatigue affect the passive mechanical properties of muscle-tendon units, it is unclear if the forces of the passive elements and the active elements would decrease by the same percentage. On the other hand, simply scaling the active contractile muscle forces ($f_{CE}$) to represent the effect of chronic paralysis is unlikely to provide realistic results. A scale applied solely to $f_{CE}$ models a situation where muscle strength is changed without any changes to the passive mechanical properties of the muscles and tendons (e.g. acute muscle denervation). Several studies indicate that the mechanical properties of muscles and tendons are affected in different ways by chronic paralysis. Chronic paralysis due to SCI causes the cross-sectional area, the stiffness, and the Young's modulus of tendons to decrease substantially [29]. Chronic paralysis also reduces the cross-sectional area of muscles [30], increases the proportion of fibrous connective tissues [31], and decreases the muscle length [32]. People with chronic paralysis often have muscles that feel stiff and non-compliant upon manual palpation, compared to those of non-paralyzed individuals [33]. Taken in combination, these observations indicate that the passive mechanical properties of muscle-tendon units are substantially affected by chronic paralysis. Therefore, each of the components of a Hill muscle model should be adjusted separately to simulate the effects of disuse atrophy due to chronic paralysis.
Several methods can be used in clinical practice to estimate active muscle forces [28], [34]. Measuring muscle and tendon passive properties is less straightforward. Non-invasive techniques, such as ultrasound shear wave elastography [35] and magnetic resonance elastography [36] have been used successfully to estimate the elasticity of certain muscles. However, the in-vivo estimation of muscle elasticity is an area of active research, and guidelines to determine muscle model adjustments based on these measurements are not readily available. Therefore, we advise caution when using muscle models that were not specifically developed to reproduce the biomechanics of people with chronic paralysis to predict outcomes for this patient population.

B. RL Controller Performance With Different Implementations of Muscle Atrophy
The results in Figure 3 showed that RL controller performance was impaired equally when flexor-only atrophy was modeled as a scale applied to $f_{CE}$ (Figure 3A) or as a scale applied to $f_m$ (Figure 3B). Controller performance was lowest at the contralateral border of the workspace, which is the region where extensor muscle-tendon units are most elongated, causing extensor passive forces to be maximized. Models representing symmetrical atrophy as a scale applied to $f_{CE}$ showed similar workspace performance, with success rates being lowest at the contralateral border. However, symmetrical atrophy represented as a scale applied to $f_m$ resulted in high success rates across the entire workspace. Between the two implementations of muscle atrophy ($f_{CE}$ vs. $f_m$), passive forces are higher when scaling only $f_{CE}$ (see Equations 2 and 3). These results suggest that the decrease in workspace observed with flexor-only atrophy and with symmetrical atrophy (scaling $f_{CE}$) could be due to the fact that (1) flexor active forces were not sufficient to counteract extensor passive forces or (2) the relationship between flexor and extensor strength affects the controller's ability to stabilize the arm at the left lateral border of the workspace.
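The first explanation above can be made concrete with a small numerical sketch comparing the ratio of flexor active force to extensor passive force under symmetrical atrophy for the two scaling approaches. The function name and the force values used with it are hypothetical and are not taken from the arm model.

```python
def flexor_to_extensor_passive_ratio(flexor_active, extensor_passive, scale, model):
    """Ratio of flexor active force to extensor passive force under
    symmetrical atrophy.

    model "f_m":  active and passive forces are both multiplied by `scale`.
    model "f_CE": only the active force is multiplied by `scale`.
    """
    if model == "f_m":
        return (scale * flexor_active) / (scale * extensor_passive)
    if model == "f_CE":
        return (scale * flexor_active) / extensor_passive
    raise ValueError("model must be 'f_m' or 'f_CE'")
```

With hypothetical forces of 100 N (flexor active) and 40 N (extensor passive), scaling the output forces leaves the ratio at 2.5 for any scale, whereas scaling only the contractile element to 25% lowers it to 0.625, so the flexors could no longer overcome the extensor passive resistance in that posture.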

C. The Impact of Hill-Type Actuator Passive Parameters on RL Controller Performance
Previous studies suggest that tendon stiffness may play an important role in position and force control, with lower levels of tendon stiffness facilitating endpoint force control, while higher levels of tendon stiffness facilitate position control [37], [38]. The results in Figure 4 show no visible impact on RL controller performance when tendon stiffness was decreased or increased by up to 50%. However, the controller was tested on a position-control task rather than a force-control task. It is possible that RL controller performance would be more greatly impacted by changes in tendon stiffness if the goal was to control the endpoint force of the arm, instead of only controlling arm posture.
Relatively large (50%) changes in flexor muscle stiffness also did not impact RL controller success rates (see Figure 5), or the controller workspace. This suggests that flexor muscle-tendon lengths are not substantially extended from their relaxed muscle-tendon lengths at the ipsilateral side of the workspace studied in this work. However, relatively small changes in extensor muscle passive stiffness visibly impacted RL controller performance (see Figure 5), and they caused the workspace to shrink starting from the contralateral side. Since the contralateral side is where the extensor muscle-tendon units are most elongated, these results support the hypothesis that controller performance was impacted because flexor strength was insufficient to counteract extensor passive forces. The results in Figure 6 provided further support to this hypothesis. If performance was impacted because flexor strength could not counteract extensor passive resistance, performance should be rescued by decreasing extensor passive stiffness. Indeed, Figure 6 shows that by decreasing extensor muscle stiffness by 50%, RL performance is rescued for a model representing 50% flexor-only muscle atrophy.

D. The Viable Workspace Depends on the Relationship Between Flexor Muscle Strength and Extensor Passive Forces
The results in Figure 7 further confirm that controller performance is affected not by RL convergence issues, but by the relationship between flexor strength and extensor passive forces. The workspace boundaries obtained in Figure 7 were estimated by performing forward simulations during which muscle activations were slowly changed, and arm endpoint positions were recorded. These simulations did not include the RL controller, and therefore, they excluded the influence of RL control as a potential confounding variable. Comparing  Figures 7 and 3, the workspace boundaries obtained in the forward simulations are remarkably similar to the controller workspaces for models representing symmetrical muscle atrophy. Taken in combination, these observations indicate that the controller workspace decreases observed in some models of atrophy are not related to the RL control method.

E. Implications for Future Works Studying Arm Movement Control
The results from the present work indicate that the relationship between active muscle strength and passive muscle resistance impacts the workspace for reaching tasks. While people with chronic paralysis typically undergo FES-induced exercise to reverse disuse atrophy prior to neuroprosthesis deployment [10], [11], rehabilitation protocols are generally focused on increasing overall muscle strength and joint mobility. Our results suggest that users of upper-limb neuroprostheses could benefit from rehabilitation protocols including routines to decrease passive muscle forces. Since passive muscle forces can be modeled as springs, they can be decreased by (1) decreasing muscle-tendon elongation relative to the resting muscle-tendon lengths, or by (2) decreasing muscle stiffness. To decrease muscle-tendon elongation, the users may benefit from interventions aimed at increasing resting muscle-tendon lengths, such as intermittent stretching. Previous studies indicate that several interventions can be used to decrease muscle stiffness. Long-lasting static stretching exercises have been shown to decrease the stiffness of muscles and tendons [39]. Also, paraffin therapy, which is a form of heat therapy, decreases the stiffness of muscles and tendons acutely [40].
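The two options for decreasing passive forces follow from a spring description of the passive element. As a simplified illustration, assuming a linear spring (Hill-type parallel elements are typically nonlinear), with stiffness $k$, muscle-tendon length $l$ and resting length $l_0$, all of which are symbols introduced for this sketch:

```latex
f_{PE} = k \, \max\left(0,\; l - l_0\right)
```

Reducing the elongation $l - l_0$, for example by increasing $l_0$, corresponds to option (1), and reducing $k$ corresponds to option (2).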
For the reaching task investigated in the present study, RL controller performance was most sensitive to flexor muscle active strength and to extensor muscle passive stiffness. Therefore, we recommend that future studies implementing RL controllers tune these parameters to represent the biomechanics of the targeted user population. While several methods are available to estimate muscle strength, measuring muscle stiffness is more challenging. Non-invasive techniques such as shear wave elastography and MR elastography could assist investigators in creating biomechanical models that better approximate the passive mechanical properties of the muscles of targeted users.
Deep RL is a numerical optimization technique for which convergence is not guaranteed. In this study, we demonstrated that RL controllers converge for a wide range of muscle strengths and muscle-tendon passive mechanical properties. However, the feasible workspace for the reaching task implemented in this study is substantially affected by the relationship between flexor muscle strength and extensor muscle passive stiffness. Relatively high extensor passive stiffness, combined with high flexor atrophy, is associated with smaller task workspaces.
People with SCI often experience denervation in some muscles, which renders FES-induced stimulation ineffective for those muscles. The simulations in the present work indicated that the extensor muscle forces required to complete reaching tasks with the arm supported against gravity were low compared to flexor muscle forces. Therefore, individuals with SCI with some degree of denervation in upper-limb extensors could still benefit from neuroprostheses aimed at restoring reach, as long as the arm is supported against gravity by an assistive device, such as a mobile arm support [11]. This could inform criteria for clinical studies developing upper-limb neuroprostheses, although we recommend that these guidelines be experimentally validated before use in clinical practice.

F. Limitations
The simulations in the present study relied on a relatively simple musculoskeletal model of the arm. The model included two degrees of freedom allowing only for horizontal planar motion, as if the arm moved on a frictionless tabletop. Also, the model did not include joint stiffness and muscle spasticity, which are common in people with SCI. These were deliberate choices, as in previous studies investigating upper-limb movement controllers [14], [15], [17], [18], [41], [42], [43], to limit the impact of confounding variables. Also, the model represented an arm supported against gravity, and the controller was simply tasked with moving the arm to target locations. We anticipate that controller performance would be negatively impacted in more challenging tasks requiring object manipulation, although our results should still provide reasonable predictions for reaching tasks using a mobile arm support.
Previous studies indicate that muscle stiffness may change as a function of posture, gender, and muscle activation [35], [44]. In our simulations, muscle stiffness was scaled by activation, but we did not include variations in muscle stiffness with posture or gender. Also, all parallel elements (PE in Figure 1B) had the same mechanical properties in the present work, although previous studies indicate that stiffness varies across different muscles [44]. Future studies incorporating dynamic muscle-specific models of passive stiffness are likely to yield more accurate predictions of controller performance, although we anticipate that the main conclusions of the present study should remain unchanged.

V. CONCLUSION
In this work, we investigated the impact of different implementations of muscle atrophy (using a musculoskeletal model of the human arm employing Hill muscle models) on the performance of reinforcement learning controllers aimed at restoring upper-limb motor function to people with chronic paralysis. We also performed sensitivity studies to characterize the impact of the passive mechanical properties of muscles on RL controller performance. We found that decreases in RL controller effective workspace that were previously attributed to asymmetrical antagonistic muscle strength resulted from flexor muscle active forces that were insufficient to counteract extensor muscle passive resistance. The results of the present study indicate that the relationship between active flexor forces and extensor muscle passive resistance plays a key role in determining the workspace of the arm and, therefore, the workspace of the controller. Thus, we recommend that future studies include reasonable approximations of the passive mechanical properties of targeted muscles, in addition to user-specific approximations of maximum muscle forces. Our simulations also suggest that moderate changes in tendon stiffness and flexor muscle stiffness are unlikely to affect RL controller performance for reaching tasks where the arm is supported against gravity. Overall, RL controllers provided excellent performance over the range of passive mechanical properties studied in this work, although we recommend that the targeted workspace be adjusted to reflect the user-specific workspace of the arm. Our results support the adoption of rehabilitation protocols including interventions that aim to increase muscle-tendon length and decrease muscle passive stiffness to increase the workspace of the arm for reaching tasks performed by people with chronic paralysis.