CPI: Conservativeness, Permissiveness and Intervention Metrics for Shared Control Evaluation

Abstract: This paper presents an approach to measure the ability of a shared control system to track user input while simultaneously ensuring safety. The proposed CPI metrics, based on reachability theory, measure the Conservativeness (C), Permissiveness (P) and the amount of Intervention (I) applied to a user nominal control. The metrics apply to arbitrary dynamic systems, state and control constraints, and unlike other existing metrics, they apply to non-differentiable shared controllers including controllers implemented in procedural code. Moreover, we propose a parallel algorithm based on Rapidly-exploring Random Trees (RRTs) for conducting the reachability analysis necessary for computing conservativeness and permissiveness metrics efficiently. We demonstrate how CPI metrics may be used to evaluate a Linear-Quadratic Regulator (LQR) and two different Model Predictive Controller (MPC) based safe shared controllers applied to the cartpole system, for different control parameters.


I. INTRODUCTION
Shared control blends human and autonomous control, wherein the controller filters a human user input to generate the system control to provide additional safety or task assistance. It is an important component in robot teleoperation [1], assistive robotics (e.g. robotic wheelchairs [2]), and driver assistance in automobiles [3].
Intervention is the amount by which the system controls differ from the nominal user controls. A controller that intervenes affects the system's global reachability. There is an inherent trade-off between safety and performance on one side, and intervention and reachability on the other. On the one hand, a shared controller can be overly cautious and restrictive, preventing the user from reaching any unsafe state but also limiting the operator's control over the system. For example, a collision avoidance braking system that limits a vehicle to 10 kph is quite safe, but prevents the vehicle from reaching its operational limits. On the other hand, a controller may be too permissive and allow a careless, negligent or adversarial user to reach unsafe states. For example, a collision avoidance system that only partially slows the vehicle down before a collision relies on the human to provide safety. However, existing shared control evaluation methods have focused primarily on task performance, e.g. task success rate, completion time or efficiency in terms of control effort applied, or user preference [4], [5], [6], [7]. These metrics are system-specific, may require subjective surveys, and thus cannot serve as a common language between systems and controllers. The goal of this work is therefore to provide quantitative and system-independent metrics to evaluate the safety-performance trade-offs in designing shared controllers.
Fig. 1. Qualitative illustration of the CPI metrics ($M_C$, $M_P$, $M_I$). The user velocity command and trajectory are shown as red arrows and curve, respectively. The outputs of two hypothetical shared controllers are drawn in blue and yellow. At the start (top left), the user issues a safe but aggressive command. Controller 1 is more conservative and does not follow the command closely, leading to higher $M_C$. Then, the user mistakenly drives the robot toward a wall obstacle (right). Controller 2 gives the operator the freedom to let the robot collide, resulting in higher $M_P$, while Controller 1 avoids the collision. Based on the overall trajectories, Controller 1 stays slightly closer to the user command, leading to lower $M_I$. [Best viewed in color.]
This work is partially supported by NSF NRI Grant #2024775.
We propose the metrics of conservativeness and permissiveness using concepts from reachability theory. These are dimensionless quantities in the range [0,1] and are independent of the user behavior. Qualitatively, a shared controller can be described as conservative if it leverages only a small portion of the system's reachable viable set and as permissive if it allows driving the system into states that would not be able to reach the safe set. The intervention metric is inversely proportional to how much control authority the user has over the system, and low values are preferable if the system designer trusts the operator's expertise and attentiveness, and wants to minimize surprise to the operator. These metrics generalize to any system and serve as a basis upon which fair comparison and understanding between shared controllers can be made. They may then be used to design and select the shared controller most appropriate to given requirements (Fig. 1).
The C and P metrics require calculating reachable and viable sets under the given shared controller, which are called the controller-dependent reachable and viable sets. Although past work has addressed reachable and viable set computation using Hamilton-Jacobi (HJ) methods, applying these methods in our setting requires the controller to be differentiable. To calculate reachable and viable sets of more complex controllers, we introduce a reachability estimation approach based on a parallel RRT-based algorithm.
We evaluate the CPI metrics on LQR and MPC-based safe shared controllers applied to the cartpole system for different control parameters. In particular, our approach can be applied to a minimum intervention MPC, which performs a finite-horizon trajectory optimization to minimize intervention while maintaining safety, and to which prior HJ methods do not apply.
II. RELATED WORK
1) Safe Shared Control: A safe shared control system can be decomposed into two parts: a potentially unsafe human input, and a shared controller which tracks the user input in a minimum intervention manner and only modifies it when it is considered unsafe. Safety verification takes two forms: stability certification and constraint set certification [8].
Stability verification concerns closed-loop stability. Methods such as Sums-Of-Squares (SOS) to search for Lyapunov functions can be used to verify safety by constructing funnels [9] with Lyapunov properties. However, finding a Lyapunov function for a system is nontrivial. Constraint set certification tries to find a policy that keeps the system inside a control invariant safe set (CIS), the set of initial states for which there exists a controller such that the system constraints are never violated [10], when the human input is considered unsafe. This can be achieved through a safety filter style controller [11]. One approach under this framework is an Active Set Invariance Filter (ASIF) based on control barrier functions (CBFs) [12], which puts the nominal control input through a quadratic program to ensure it obeys certain constraints that define the safe set of the system. However, it is only pointwise optimal, and finding valid CBFs for a general dynamical system is generally challenging. Another predictive safety filter approach is given in [13], which guarantees the safety of a learned controller by using a predictive controller to find the closest control that is safe.
2) Shared Control Evaluation: Various metrics have been applied to evaluate shared control. Carlson et al. evaluate a robotic wheelchair in terms of performance, attention and workload with emphasis on the human factor [4]. Tee et al. introduce metrics for teleoperation task performance on a curved object surface [5] such as task duration, normalized error, jerk, and user experience using the NASA TLX questionnaire. Broad et al. use the average observed deviation between the user input and the closest safe signal as well as the average percentage of sampled rollouts that are safe at each timestep as safety metrics to evaluate a shared controller [7]. Oh et al. propose four quantitative metrics for obstacle avoidance tasks: task duration, travelled distance, minimum proximity to the obstacle and the cosine distance between controls [6]. Although some existing metrics capture user behavior and objective measures of safety, our work provides a quantitative, control-theoretic and system-independent framework upon which to evaluate shared control.
3) Safety verification and viability: The control literature has studied safety verification and viability checking extensively. The standard safety verification problem focuses on proving whether there exist trajectories entering a set of forbidden or unsafe states through forward reachability analysis [14]. Viability checking, on the other hand, is a backward reachability problem that involves finding all the states from which a safe set can be reached [15].
The conservativeness and permissiveness metrics require computing forward and backward reachable sets. There are multiple ways to do so. Hamilton-Jacobi (HJ) methods solve a partial differential equation to give an over-approximation of the reachable sets [16]. This technique however requires differentiable dynamics and suffers from a time complexity which increases exponentially with the state dimension. Recent work decomposes the computation of a reachable set into several smaller dimensions [17], but suffers from an over-approximation that worsens in higher dimensions and requires knowledge of how to decompose the system appropriately. Set propagation over-approximates the reachable set using polygonal approximations, and benefits from existing software toolboxes [18]. However, accurate set propagation for nonlinear systems is still a challenging problem and an active area of research [19]. Finally, reachable sets can also be computed with sampling-based methods, which require intelligent sampling strategies to obtain better coverage. For example, Lew et al. use an adversarial strategy to sample states that can generate a larger convex set [20]. This comes at the expense of an over-approximation of reachable sets that becomes non-negligible when those are highly non-convex.
We opt for a sampling-based method based on the RRTs framework [21], [22]. Prior work has used RRT as a falsification method focusing on generating a set of test scenarios that cause the system to fail [21]. A similar algorithm, R3T [22], uses reachable set approximations to improve the RRT distance metric and achieve faster convergence speeds. We build on these approaches for shared controller-dependent reachable and viable set computation in this work.

III. SAFE SHARED CONTROL PROBLEM FORMULATION
Let S be the system upon which we design a shared controller. The dynamics of S are modeled by $\dot{x} = f(x, u)$ with states $x \in \mathbb{R}^n$ and controls $u \in \mathbb{R}^m$. Let $X$ be the set of feasible states, and $U$ the set of admissible controls, i.e. the set of states the system is allowed to be in and the set of controls it is allowed to execute, respectively.
A shared controller takes the system's current state $x$ and a human input $\pi^h$ to generate a safe control policy $\pi^s$:
$$\pi^s : X \times U^h \to U,$$
where $U^h$ is the space of human inputs. The human input is interpreted as being generated by a controller $\pi^h \equiv \pi^h(x, t)$ which is usually unknown to the shared controller $\pi^s$.
1) Safety: Let $X_0 \subseteq X$ be the initial set, i.e. the set of feasible states the system may start in, and $X_{safe} \subseteq X$ the safe set, which refers to a set of feasible states in which a known auxiliary controller (e.g. LQR) is guaranteed to maintain the system. $X_{safe}$ is chosen to be a system-specific CIS, and may be conservatively set as a small region of feasible states near the equilibrium, e.g. the set of feasible states with zero velocity.
2) Human input: The space of user controls $U^h$ is typically a user interface design choice. For instance, if the user can drive the system's controls directly, where $\pi^h := u^h$, then $U^h = U$. In addition to direct control, other more intuitive and practical interfaces may designate position, velocity, higher-order targets or a combination thereof as the user command: $\pi^h := x^h$. For example, if the system state $x = (q, \dot{q})$ consists of a configuration $q$ and its derivative $\dot{q}$, then a position control scheme sets $U^h$ to an $(n/2)$-dimensional subspace of $X$.
3) Objectives: The goal of a safe shared controller is to design a policy which follows the human operator's command $\pi^h$ as closely as possible and guarantees safety. To that end, we define an intervention objective $\ell_{int}(\cdot)$. Assuming direct control $u^h$, $\ell_{int}(\cdot)$ can for example be defined as a weighted squared control deviation with cost matrix $R$:
$$\ell_{int}(x, u, u^h) = \|u - u^h\|_R^2.$$
If the user control is interpreted as defining a target state $x^h$, then $\ell_{int}$ can be defined as a weighted squared state deviation with cost matrix $Q$:
$$\ell_{int}(x, u, x^h) = \|x - x^h\|_Q^2.$$
The minimum intervention shared control (MISC) [7] problem can therefore be formulated as the following optimization problem, assuming discrete time:
$$\min_{u_0, u_1, \ldots} \; \sum_{k} \ell_{int}\big(x_k, u_k, \pi^h(x_k, k)\big) \quad \text{s.t.} \quad x_{k+1} = f_d(x_k, u_k), \;\; x_k \in X, \;\; u_k \in U, \;\; \text{additional constraints}, \tag{2}$$
where $f_d$ denotes the discrete-time dynamics and the sum runs over the time steps of the horizon.
Safety is encoded through state and control constraints with the sets $X$ and $U$. Additional constraints may include other artificial state and control constraints, but they are not required in this definition. MISC is an idealized goal; Eq. 2 is not actually solvable in practice since we do not have access to the user's control policy $\pi^h$. Instead, the system can only approximate the MISC using the current user command $u^h_0 = \pi^h(x_0, 0)$ and past observations. MISC is often defined using a 1-step loss [23] (i.e., minimizing only $\ell_{int}(x_0, u_0, u^h_0)$), which works fairly well if the human provides direct control, but works poorly for target tracking, particularly in underactuated systems. For tracking, past approaches include discounting future target deviation cost to the currently commanded target [23], and using intention prediction to obtain future human command trajectories [24]. In our experiments in Sec. VI-A.2, we introduce two MPC-based controllers that vary in their approach to approximating MISC: MPC Safety Filter (MPC-SF) [13] and MPC Target Tracking (MPC-TT). MPC-SF obtains the system control from an unsafe controller, and adopts a 1-step loss. MPC-TT is an undiscounted formulation of (2).
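As an illustration of the two kinds of intervention objective discussed above, the following minimal sketch evaluates a weighted squared deviation for direct control and for target tracking. The quadratic form, the function names and the numerical values are assumptions made for this example, not a prescribed implementation.

```python
def weighted_sq_norm(v, W):
    """Return the quadratic form v^T W v for a vector v and a square matrix W."""
    n = len(v)
    return sum(v[i] * W[i][j] * v[j] for i in range(n) for j in range(n))


def l_int_direct(x, u, u_h, R):
    """Intervention under direct control: weighted squared control deviation."""
    diff = [a - b for a, b in zip(u, u_h)]
    return weighted_sq_norm(diff, R)


def l_int_target(x, u, x_h, Q):
    """Intervention under target tracking: weighted squared state deviation."""
    diff = [a - b for a, b in zip(x, x_h)]
    return weighted_sq_norm(diff, Q)


# Hypothetical numbers: one control channel, two-dimensional state.
cost_direct = l_int_direct(x=[0.0, 0.0], u=[1.0], u_h=[3.0], R=[[1.0]])
cost_target = l_int_target(
    x=[1.0, 0.0], u=[0.0], x_h=[0.0, 0.0], Q=[[2.0, 0.0], [0.0, 1.0]]
)
```

Both functions keep the full argument list (state, control, user command) so that either one can be passed wherever an intervention objective is expected.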

IV. REACHABILITY ANALYSIS
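Here we lay the groundwork, based on reachable sets, needed to define the CPI metrics in Section V. We define reachable and viable sets, distinguishing between controller-dependent and controller-independent quantities. Let $\mathbb{U} = \{\mathbb{R}_+ \to U\}$ and $\mathbb{U}^h = \{\mathbb{R}_+ \to U^h\}$ denote the spaces of control signals and of human input signals, respectively.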

A. Controller-independent sets
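We define the controller-independent forward reachable set as the set of states that can be reached from $X_0$ via a feasible state trajectory obeying some feasible control:
$$R(X_0) = \{\, y \in X \mid \exists\, u(\cdot) \in \mathbb{U},\ \exists\, t \ge 0 :\ x(0) \in X_0,\ \dot{x} = f(x, u),\ x(\tau) \in X \ \forall \tau \in [0, t],\ x(t) = y \,\}.$$
Similarly, we define the controller-independent viable set as the set of states that can reach $X_{safe}$ via a feasible state trajectory obeying some feasible control:
$$V(X_{safe}) = \{\, y \in X \mid \exists\, u(\cdot) \in \mathbb{U},\ \exists\, t \ge 0 :\ x(0) = y,\ \dot{x} = f(x, u),\ x(\tau) \in X \ \forall \tau \in [0, t],\ x(t) \in X_{safe} \,\}.$$
Note that $R(X_0)$ and $V(X_{safe})$ depend only on the system controls and dynamics, not the shared controller.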

B. Controller-dependent sets
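The controller-dependent forward reachable set is defined as the set of states reachable under the shared controller $\pi^s$ for some user control input $u^h \in \mathbb{U}^h$:
$$R(\pi^s, X_0) = \{\, y \in X \mid \exists\, u^h(\cdot) \in \mathbb{U}^h,\ \exists\, t \ge 0 :\ x(0) \in X_0,\ \dot{x} = f(x, \pi^s(x, u^h)),\ x(t) = y \,\}.$$
Similarly, we define the controller-dependent viable set:
$$V(\pi^s, X_{safe}) = \{\, y \in X \mid \exists\, u^h(\cdot) \in \mathbb{U}^h,\ \exists\, t \ge 0 :\ x(0) = y,\ \dot{x} = f(x, \pi^s(x, u^h)),\ x(t) \in X_{safe} \,\},$$
which is the set of states that can reach $X_{safe}$ via a feasible state trajectory obeying the shared controller policy $\pi^s$ as a response to user input from $\mathbb{U}^h$. These sets are illustrated in Fig. 2. Note that $R(\pi^s, X_0) \subseteq R(X_0)$ and $V(\pi^s, X_{safe}) \subseteq V(X_{safe})$. We confine $x(t) \in X$ at all times.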

C. Connections with other sets and functions
We point out here connections between our set definitions and other key sets defined in control theory in relation to stability guarantees and safety constraints. The differences typically lie in whether the dynamics are propagated forward or backward, which target set is considered, whether a controller is applied, whether disturbances are considered, which time horizon is used when computing the set, and whether the representation is discrete or continuous. Given a target set $X_N$, the maximum controllable set [10] is the same as our controller-independent viable set $V(X_N)$, and both are valid CIS. Our controller-dependent viable set $V(\pi^s, X_N)$ is a valid positive invariant set [10]. The region of attraction [25] is a subset of $V(\pi^s, X_N)$. Finally, a control Lyapunov function (CLF) [26] gives a continuous-time representation of a subset of $V(X_N)$, and a CBF [26] can be seen as a continuous-time version of a subset of our reachable set $R(X_0)$. If the system is subject to a disturbance $w(k) \in W$, the qualifier "robust" is usually added in front of the set names.

V. CPI METRICS
The CPI metrics map a safety controller $\pi^s$ and a user behavior $\pi^h$ to a three-tuple ($M_C$, $M_P$, $M_I$), measuring conservativeness in terms of how much the controller artificially limits the capabilities of the system through intervening controls, permissiveness in terms of the portion of the forward reachable set that is outside of the viable set, and intervention in terms of the actual amount of intervention applied to a user control, respectively. We note that $M_C$ and $M_P$ have the favorable properties of being dimensionless and taking values in the range [0, 1], with 0 being better. $M_I$ takes values in $\mathbb{R}_+$, with 0 corresponding to no intervention from the shared controller.

A. Conservativeness Metric
$M_C$ is defined as one minus the fraction of the intersection of the controller-independent reachable and viable sets that is reachable under the safety controller, i.e.
$$M_C = 1 - \frac{\mathrm{vol}\big(R(\pi^s, X_0) \cap V(X_{safe})\big)}{\mathrm{vol}\big(R(X_0) \cap V(X_{safe})\big)},$$
where vol stands for a volume measure.
$M_C$ captures whether a controller is conservative, with more conservative controllers having higher values of $M_C$, and less conservative controllers having $M_C$ closer to 0. As an extreme, the pass-through controller $\pi^s(x, \pi^h) = \pi^h$ that simply replicates the user control will exhibit $M_C = 0$.
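When the sets are stored as collections of occupied grid cell indices, the volume ratio reduces to counting cells. The sketch below shows one possible way to do this; the function name and the toy cell sets are hypothetical and only serve to illustrate the definition.

```python
def conservativeness(reach_ctrl, reach_free, viable_free):
    """M_C from sets of grid cell indices (tuples of ints).

    reach_ctrl:  cells reachable under the shared controller
    reach_free:  controller-independent reachable cells
    viable_free: controller-independent viable cells
    """
    reference = reach_free & viable_free
    if not reference:
        raise ValueError("reference set is empty, M_C is undefined")
    covered = reach_ctrl & reference
    return 1.0 - len(covered) / len(reference)


# Toy 2-D example with hypothetical cell sets.
reach_free = {(0, 0), (0, 1), (1, 0), (1, 1)}
viable_free = {(0, 0), (0, 1), (1, 0)}
m_c_cautious = conservativeness({(0, 0)}, reach_free, viable_free)
m_c_passthrough = conservativeness(reach_free, reach_free, viable_free)
```

In this toy case the cautious controller covers one of the three reference cells, while the pass-through controller covers all of them and therefore scores 0.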

B. Permissiveness Metric
$M_P$ is defined as the fraction of the boundary of the controller-dependent reachable set that is not viable:
$$M_P = \frac{\mathrm{vol}\big(\partial R(\pi^s, X_0) \setminus V(\pi^s, X_{safe})\big)}{\mathrm{vol}\big(\partial R(\pi^s, X_0)\big)},$$
where $\partial$ denotes the boundary, and the volume measure here operates on sets of dimension $n - 1$. The numerator is shown as the dashed red boundary in the bottom right of Fig. 2, and the denominator is the bold yellow boundary. This ratio estimates the likelihood that the user would reach an unsafe boundary of the viable set, with more permissive controllers having higher values of $M_P$ and safer controllers having $M_P$ closer to 0. As an extreme, a controller that enforces staying at a safe state at all times (e.g. for the cartpole system, an LQR which does not take any user input and instead simply tracks the upright position) will exhibit $M_P = 0$. Less safe controllers will have $0 < M_P \le 1$, with larger values indicating more "dangerous" controllers.
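On a grid, one simple way to approximate the boundary is to take the occupied cells that have at least one unoccupied face neighbor. The sketch below computes the permissiveness ratio under that assumption; the boundary rule, the function names and the toy sets are illustrative choices, and the viable set is passed in as an argument.

```python
def boundary_cells(cells):
    """Cells of the set that have at least one face neighbor outside the set."""
    boundary = set()
    for c in cells:
        for axis in range(len(c)):
            for delta in (-1, 1):
                neighbor = c[:axis] + (c[axis] + delta,) + c[axis + 1:]
                if neighbor not in cells:
                    boundary.add(c)
    return boundary


def permissiveness(reach_ctrl, viable):
    """M_P: fraction of the reachable-set boundary cells that are not viable."""
    boundary = boundary_cells(reach_ctrl)
    if not boundary:
        raise ValueError("reachable set is empty, M_P is undefined")
    return len(boundary - viable) / len(boundary)


# Toy example: a 3 x 3 block of reachable cells, viable only where i <= 1.
reach = {(i, j) for i in range(3) for j in range(3)}
viable = {(i, j) for (i, j) in reach if i <= 1}
m_p = permissiveness(reach, viable)
```

Here eight of the nine cells lie on the boundary and three of those are not viable, so the ratio is 3/8.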

C. Intervention Metric
$M_I$ is the expected value of the amount of intervention applied to the user nominal control over a distribution of user behaviors $\pi^h$. Define a trajectory $\tau$ as a sequence of states and controls $\tau = (x_0, u_0, x_1, u_1, \ldots)$. We collect a set of trajectories $D = \{\tau_i\}_{i=1,\ldots,K}$, where each trajectory is obtained by letting the user control the system while being assisted by the shared controller. We then define the $M_I$ metric as the sample estimate:
$$M_I = \frac{1}{K N} \sum_{i=1}^{K} \sum_{t=0}^{N-1} \hat{\ell}_{int}\big(x^i_t, u^i_t, \pi^{h,i}_t\big),$$
where $N > 0$ is the length of each trajectory for which we estimate $M_I$, $x^i_t$, $u^i_t$ and $\pi^{h,i}_t$ are the state, control and user command at step $t$ of trajectory $\tau_i$, and $\hat{\ell}_{int}(x, u, \pi^h)$ is an intervention objective as defined in Sec. III-3 but with a fixed cost matrix so that different controllers can be compared on a common scale. As stated, this metric assumes that all user commands are meaningful and should be followed if possible. If there exists a way to measure how "meaningful" a command is at a given time step, it should be used to weight the intervention score. $M_I$ reaches its minimum of 0 for a controller that always replicates the user's desired control.
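The sample estimate can be sketched as an average of the per-step intervention cost over all logged trajectories. Averaging over the time steps of each trajectory, the data layout (a list of (state, control, user command) tuples) and the scalar quadratic cost are assumptions made for this example.

```python
def intervention_metric(trajectories, l_int_hat):
    """Sample estimate of M_I.

    trajectories: list of trajectories, each a list of (x, u, u_h) tuples
    l_int_hat:    per-step intervention cost with a fixed cost matrix
    """
    if not trajectories:
        raise ValueError("at least one trajectory is required")
    total = 0.0
    for traj in trajectories:
        step_costs = [l_int_hat(x, u, u_h) for (x, u, u_h) in traj]
        total += sum(step_costs) / len(step_costs)  # mean over the time steps
    return total / len(trajectories)  # mean over the K trajectories


def sq_dev(x, u, u_h):
    """Hypothetical scalar cost: squared control deviation with unit weight."""
    return (u - u_h) ** 2


# Two hypothetical trajectories of length N = 2 (the state is unused here).
data = [
    [(0.0, 1.0, 1.0), (0.0, 0.0, 2.0)],  # second step deviates by 2
    [(0.0, 0.5, 0.5), (0.0, 1.0, 1.0)],  # user command is replicated
]
m_i = intervention_metric(data, sq_dev)
```

A command-dependent weight, if available, could be folded into `l_int_hat` without changing the averaging code.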

D. Reachable and viable set computation
To calculate volumes, the state space is discretized into grid cells with resolution r, which dictates the accuracy of the sets: a finer resolution yields a closer and smoother approximation of the true sets, at the expense of longer computation time. We are concerned with the volumetric ratio, which is the number of occupied grid cells over the total number of cells in the grid.

Algorithm 1 (steps 8 to 14)
    8: x_rand ← SampleState(X)
    9: x_nearest ← FindNearestTreeNode(T, x_rand)
    10: Ξ ← SampleTrajectories(x_nearest, n)
    11: R ← R ∪ Ξ
    12: x_new ← FindNearestState(Ξ, x_rand)
    13: Add x_new to T as a child of x_nearest
    14: return R

We adapted the RRT algorithm to compute reachable sets as outlined in Alg. 1. First, we sample initial states from X_init, and then start building a tree in RRT fashion by sampling a random state x_rand ∈ X and finding the nearest node x_nearest in the tree. A weighted Euclidean distance metric is used, and the nodes in the tree are stored in a k-d tree data structure to accelerate nearest-neighbor computation. From x_nearest, we generate n trajectories with different controls. In the case of shared controllers, controls are sampled from the space of user inputs. The generated trajectories are added to the reachable set, and the terminal state closest to x_rand is denoted x_new. x_new is added to the tree as a child of x_nearest. We then keep sampling until convergence of the reachable set, which is declared when the rate of change of the volumetric ratio stays below a threshold for a certain number of iterations.
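To make the tree-growing procedure concrete, here is a small self-contained sketch on a toy double integrator. It departs from the description above in several labeled ways: the system, bounds, step size and grid resolution are hypothetical, the nearest node is found by a linear scan rather than a k-d tree, each sampled trajectory uses a single constant control, and a fixed iteration budget replaces the convergence test.

```python
import math
import random

DT = 0.1      # integration step (hypothetical)
R_CELL = 0.1  # grid resolution r (hypothetical)


def step(x, u):
    """One explicit Euler step of a double integrator (position, velocity)."""
    p, v = x
    return (p + DT * v, v + DT * u)


def feasible(x):
    """Feasible set X: position and velocity within [-1, 1]."""
    return abs(x[0]) <= 1.0 and abs(x[1]) <= 1.0


def cell(x):
    """Index of the grid cell containing state x."""
    return tuple(math.floor(xi / R_CELL) for xi in x)


def dist2(a, b):
    """Squared Euclidean distance (unit weights in this sketch)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sample_trajectories(x, n, length, rng):
    """Roll out n feasible trajectories from x, each with one random control."""
    trajs = []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        cur, traj = x, []
        for _ in range(length):
            cur = step(cur, u)
            if not feasible(cur):
                break  # keep only the feasible prefix
            traj.append(cur)
        if traj:
            trajs.append(traj)
    return trajs


def rrt_reachable_cells(x_init, iters, n=5, length=10, seed=0):
    """Grow a tree from x_init and return the set of visited grid cells."""
    rng = random.Random(seed)
    tree = list(x_init)
    reach = {cell(x) for x in x_init}
    for _ in range(iters):
        x_rand = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        x_near = min(tree, key=lambda q: dist2(q, x_rand))
        trajs = sample_trajectories(x_near, n, length, rng)
        if not trajs:
            continue
        for traj in trajs:
            reach.update(cell(x) for x in traj)
        # The terminal state closest to the random sample becomes a new node.
        x_new = min((t[-1] for t in trajs), key=lambda q: dist2(q, x_rand))
        tree.append(x_new)
    return reach


cells = rrt_reachable_cells([(0.0, 0.0)], iters=200)
```

Only the indices of visited cells are stored, so memory grows with the size of the reachable set rather than with the size of the full grid.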
In order to speed up the computation and convergence rate, instead of building one single tree of N nodes, we build k trees of N/k nodes in parallel and take the aggregation of the reachable sets resulting from each tree as the final result. The aggregation is done by taking the union of the occupied grid cells in each reachable set. In this way, the algorithm takes $O\big(\frac{N}{k} \log \frac{N}{k}\big)$ time, i.e. it is more than k times faster than with one single tree, and in practice we have found that it achieves the same or better coverage, likely due to the history-dependent nature of RRT construction. We can also choose a coarser grid resolution and interpolate along trajectories to speed up the convergence of the reachable set.
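The "more than k times faster" claim can be checked with a short worked calculation. It assumes that building a tree of M nodes costs on the order of M log M (one logarithmic nearest-neighbor query per node) and that the k trees run fully in parallel; the numbers N = 10^5 and k = 10 are used purely for illustration.

```latex
\frac{T_{\text{single}}}{T_{\text{parallel}}}
  = \frac{N \log N}{\frac{N}{k} \log \frac{N}{k}}
  = k \cdot \frac{\log N}{\log N - \log k} \;>\; k
  \quad (1 < k < N),
\qquad
N = 10^{5},\ k = 10:\quad 10 \cdot \frac{5}{4} = 12.5 .
```

The gain beyond k comes from the shorter nearest-neighbor searches in the smaller trees.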
The method to compute viable sets is similar to the way forward reachable sets are computed. The only difference is that we need to reverse the direction of time in the dynamical system, i.e. we apply backward instead of forward differencing.
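One way to read the time reversal is to integrate the negated dynamics with the same step size, so that a backward rollout starting in the safe set visits states from which the safe set is reachable. The sketch below illustrates this on a toy double integrator; the system, the explicit Euler scheme and the function names are assumptions for illustration.

```python
def f(x, u):
    """Double integrator dynamics: d/dt (position, velocity) = (velocity, u)."""
    p, v = x
    return (v, u)


def euler_forward(x, u, dt):
    """Explicit Euler step forward in time."""
    return tuple(xi + dt * di for xi, di in zip(x, f(x, u)))


def euler_backward_in_time(x, u, dt):
    """Explicit Euler step on the time-reversed dynamics dx/dt = -f(x, u)."""
    return tuple(xi - dt * di for xi, di in zip(x, f(x, u)))


x = (0.5, 0.2)
x_prev = euler_backward_in_time(x, u=1.0, dt=0.1)  # a predecessor of x
x_again = euler_forward(x_prev, u=1.0, dt=0.1)     # approximately returns to x
```

The round trip is exact in velocity and off by dt^2 * u in position, which is the usual first-order discretization error.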
Note that our method does not require any derivatives of the system, unlike HJ reachability which requires a differentiable expression of the shared controller. This makes our approach suitable for non-differentiable shared controllers, e.g. those that result from solving an optimization problem. Moreover, we only record the indices of the reachable or viable grid cells during computation instead of storing the whole grid space, making the method amenable to some extent to high-dimensional systems.

VI. EXPERIMENTAL RESULTS
To evaluate our work, we compute the CPI metrics for LQR and MPC-based safe controllers applied to the cartpole via numerical simulation, and illustrate how they correspond to qualitative system behavior. Note that the metrics are computed offline given access to a shared controller and system simulator as stated in Section III and a sample of user trajectories for the $M_I$ metric as stated in Section V-C.
2) Shared controllers: The first controller we consider is an infinite-horizon LQR with weight matrices $Q_{lqr}$ and $R$, with the dynamics linearized at the equilibrium point. $Q_{lqr}$ is varied in the experiments as shown in Table I while $R$ is fixed to 1. User input is saturated if the generated policy is beyond the range of admissible controls.
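We also consider the two safe shared controllers MPC-SF and MPC-TT. The MPC-SF [13] is formulated as a one-step penalization of the control deviation:
$$\min_{u_0, \ldots, u_{N-1}} \|u_0 - u^h_0\|_R^2 \quad \text{s.t.} \quad x_{k+1} = f_d(x_k, u_k), \;\; x_k \in X, \;\; u_k \in U, \;\; k = 0, \ldots, N-1, \;\; x_N \in X_N, \tag{10}$$
where $f_d$ denotes the discrete-time dynamics, $N$ is the control horizon and $X_N$ the terminal constraint set. We formulate MPC-TT as the aggregate penalization of the target deviation over a given horizon $N$:
$$\min_{u_0, \ldots, u_{N-1}} \sum_{k=0}^{N-1} \|x_k - x^h\|_Q^2, \tag{11}$$
subject to the same constraints as in Eq. 10. A terminal cost term $V(x_N) = \|x_N - x^h\|_{Q_N}^2$ can be added in Eq. 11, but is not required. This formulation permits us to replace $x^h$ with a predicted trajectory coming from our testing dataset.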
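The safety-filter idea can be illustrated by exhaustive search on a toy problem: among all control sequences that keep the state feasible and end in the terminal set, pick the one whose first control is closest to the user command. This is only a sketch; the double integrator, the discrete control set, the terminal set and the brute-force search are all assumptions of the example, whereas a practical MPC would use a numerical optimizer.

```python
from itertools import product

DT = 0.1  # time step (hypothetical)


def step(x, u):
    """One Euler step of a double integrator (position, velocity)."""
    p, v = x
    return (p + DT * v, v + DT * u)


def in_X(x):
    """State constraint set: position and velocity within [-1, 1]."""
    return abs(x[0]) <= 1.0 and abs(x[1]) <= 1.0


def in_X_N(x):
    """Terminal set: feasible states with near-zero velocity."""
    return in_X(x) and abs(x[1]) <= 0.05


def mpc_safety_filter(x0, u_h, controls, N):
    """Return the first control of the safe plan closest to u_h, or None."""
    best_cost, best_u0 = None, None
    for seq in product(controls, repeat=N):
        x, ok = x0, True
        for u in seq:
            x = step(x, u)
            if not in_X(x):
                ok = False
                break
        if not ok or not in_X_N(x):
            continue  # plan violates a state or terminal constraint
        cost = (seq[0] - u_h) ** 2  # one-step penalty on the first control
        if best_cost is None or cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0


controls = (-1.0, 0.0, 1.0)
u_short = mpc_safety_filter((0.0, 0.0), u_h=1.0, controls=controls, N=1)
u_long = mpc_safety_filter((0.0, 0.0), u_h=1.0, controls=controls, N=2)
```

With a horizon of one step the filter must override the command to end at rest, whereas with two steps it can pass the command through and brake afterwards, which mirrors the intuition that longer horizons are less conservative.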
The cost matrices used are $R = [1]$ and $Q = I_4$. The terminal constraint set $X_N$ is set to $X_{safe}$. The optimization time step is h = 0.1 and the problem is solved via direct
Fig. 3. Controller-independent reachable set $R(X_0)$ (left) and viable set $V(X_{safe})$ (right) of a cartpole system using RRT-based reachability analysis. The volumetric ratio is 46.92% and 47.23% for each.

B. Reachability analysis
We remove the position dimension of the state from our state space grid representation since it is a dynamically invariant dimension of the system. The controller-independent reachable and viable sets, R(X 0 ) and V(X safe ), as computed per Alg. 1 are shown in Fig. 3. The volumetric ratios are 46.92% and 47.23%, respectively. We use the following parameters: tree count k = 10, number of sampled nodes N = 100000, number of trajectories sampled for each node n = 15, trajectory length 10, and weight matrix for the distance metric w = diag(0, 4, 1, 1) (θ is more safety-critical than the other dimensions). The total time for computing the reachable set is around 30 min. The parameters listed here could be optimized with further investigation.
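A weighted Euclidean distance with a diagonal weight matrix can be served by any standard Euclidean nearest-neighbor structure after rescaling each coordinate by the square root of its weight. The sketch below shows this equivalence for w = diag(0, 4, 1, 1); the function names and the ordering of the last two state components are assumptions.

```python
import math

W_DIAG = (0.0, 4.0, 1.0, 1.0)  # diagonal of the weight matrix w


def weighted_distance(a, b, w=W_DIAG):
    """Weighted Euclidean distance sqrt(sum_i w_i * (a_i - b_i)^2)."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))


def scale_for_kdtree(x, w=W_DIAG):
    """Rescale coordinates so plain Euclidean distance equals the weighted one."""
    return tuple(math.sqrt(wi) * xi for wi, xi in zip(w, x))


a = (1.0, 0.5, -0.2, 0.3)
b = (-2.0, 0.1, 0.4, 0.0)
d_weighted = weighted_distance(a, b)
d_scaled = math.dist(scale_for_kdtree(a), scale_for_kdtree(b))
```

The zero weight on the first component means that two states differing only in position are treated as identical by the nearest-node search.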
In order to evaluate whether the sets are reasonable in terms of space coverage, we use HJ reachability (via the level set toolbox [28]) as a baseline and compare the volumetric ratios of the two methods. The HJ volumetric ratios for $R(X_0)$ and $V(X_{safe})$ are 47.47% and 47.36%, respectively, with a computing time of around 2 hours. The coverage of the sets from our sampling-based method is thus within 2% of the baseline HJ reachability method while requiring approximately one fourth of the computing time.
The LQR results are reported in Table I. We observe that larger weights result in smaller values of $M_C$, because the controller commands more aggressive movements, thereby resulting in a larger coverage of reachable states that are viable. This also lets the user drive the system into more non-viable states, thereby resulting in larger values of $M_P$.

C. CPI metrics evaluation
CPI metrics for the cartpole system with MPC-based safe shared controllers are shown in Table II. We observe that in both formulations $M_C$ decreases with the horizon, which is also aligned with the results in Fig. 5. This is due to larger horizons allowing the system to reach more states, which can further recover the maximum reachable set when $N \to \infty$. In the extreme case, an infinite horizon MPC with an unbounded $U$ would make conservativeness tend to zero, i.e. $M_C = 0$. $M_P$, on the other hand, is always 0.
[Figure caption fragment: ... (green) and N=20 (light green). The set labeled "R" is $R(X_0)$ and the set labeled "V" is $V(X_{safe})$. Curves in the two sets are trajectory samples generated when computing the sets.]
Finally, for the $M_I$ metric, both LQR and MPC controllers are evaluated over a user input dataset $D$ that mixes user input trajectories generated through sinusoidal, step, and linear functions. The dataset is designed to expose the controllers to user inputs that pose different challenges. Sinusoidal functions take the form $A \sin(2\pi f t)$ where $A$ and $f$ are drawn uniformly at random from $[1, 10]$ and $(0, \frac{1}{2\pi}]$, respectively. From Table I we observe that larger $Q_{lqr}$ values lead to smaller $M_I$. This is due to larger cost matrices helping the tracking converge faster, resulting in a smaller intervention. Results in Table II show that the intervention generally decreases at N = 10 for both MPC-SF and MPC-TT, because a larger horizon gives the controller more time to plan and track. However, increasing the horizon further does not help because the constant user input assumption becomes less valid as the prediction horizon increases. This can be alleviated by including a human intention prediction module. The result of MPC-TT with human intention prediction (MPC-TT-Pred) is also shown in Table II, where we feed in the clairvoyant human trajectory as the trajectory to be tracked. The results show that accurate intent prediction has a strong impact on reducing the intervention metric.
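The sinusoidal part of the synthetic user input dataset described above can be sketched as follows. The sampling period, the number of steps and the function name are hypothetical; only the functional form and the sampling ranges of the amplitude and frequency come from the text.

```python
import math
import random


def sample_sinusoid(rng, n_steps, dt):
    """Draw one sinusoidal user input A * sin(2 * pi * f * t)."""
    amplitude = rng.uniform(1.0, 10.0)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    freq = (1.0 - rng.random()) / (2.0 * math.pi)
    signal = [
        amplitude * math.sin(2.0 * math.pi * freq * k * dt)
        for k in range(n_steps)
    ]
    return amplitude, freq, signal


rng = random.Random(0)
amplitude, freq, signal = sample_sinusoid(rng, n_steps=50, dt=0.1)
```

Mapping the half-open range of `rng.random()` through `1 - r` keeps the frequency strictly positive, matching the half-open interval for f.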
Although the prior tests used synthetic user inputs to estimate $M_I$, we show that the metric is indeed predictive of real-world performance. We collected data from a human operator through a human-machine interface for full-body control of a wheeled robot and used it as a velocity command [29]. We scaled the user input signal by 40 in order to trigger LQR failure cases and then fed that LQR control output to MPC-SF (horizon N = 10). We also sent the scaled user input to be tracked by MPC-TT and the clairvoyant human velocity trajectory to MPC-TT-Pred. The output is shown in Fig. 6. Each MPC variant produces a safe control, but MPC-TT-Pred fits the user velocity command the best. This result is consistent with the $M_I$ estimations of Table II.

D. Discussion
1) Testing on high-dimensional system: We applied our reachability analysis approach to a 7 degrees-of-freedom double-wheeled inverted pendulum (DWIP) system [30] to extract the sets required for metric evaluation for LQR controllers. The cost matrices used in LQR are $R = I_2$ and $Q$ is set to $I_6$, $10 \cdot I_6$, and $100 \cdot I_6$. Note that we reduce the state from 7D to 6D by substituting x, y with d when computing the LQR gain to remove the correlated dimension. We used 18 trees each with 50k nodes for reachability analysis and the volumetric ratios for $R(X_0)$ and $V(X_{safe})$ are 18.25% and 18.21%, respectively. The resulting $M_C$ are 0.373, 0.098, 0.024 for each $Q$ and $M_P$ are 0.515, 0.537, 0.539. Therefore, we obtain the same conclusion for both metrics: larger weights result in less conservativeness and more permissiveness.
The results are sensitive to the number of iterations and nodes sampled. Since our approach is an approximation of the true reachable set, when HJ reachability computation is tractable, we can use it as a reference to guide parameter tuning. However, we lose this sanity check for higher dimensional systems as HJ reachability scales poorly. Although we could alleviate the computational burden of high dimensional reachability analysis via coarser resolutions and looser convergence criteria, a more principled approach would be to devise a more efficient representation of reachable sets that has lower memory consumption and approximates the sets better in higher dimensions. This is a direction for future investigation.
2) User behavior model: $M_I$ depends on the user behavior distribution model. In particular, humans can learn to predict the behavior of the shared controller, and early partial intervention before violating a safety constraint can teach the user about the limits of the system. At the same time, $\pi^s$ can achieve better tracking performance in terms of intervention when it can predict the human policy $\pi^h$ closely. We leave modeling these effects to future work.

VII. CONCLUSION
We introduced CPI metrics to evaluate shared controllers in a principled manner by quantifying their conservativeness, permissiveness and the amount of intervention, given some user behavior. We proposed an RRT-based framework for efficiently computing the reachable sets required for said metrics. A case study on the cartpole comparing LQR with different MPC-based safe shared controllers has shown that these metrics are useful to evaluate a shared controller and to choose the most appropriate one for given requirements. We envision several promising directions for future work: considering different user behavior models when generating the CPI metrics, varying user input and the shared control task with more complex environments, and empirically confirming the metrics on a real robot.