Safe Learning MPC With Limited Model Knowledge and Data

This article presents an end-to-end framework for safe learning-based control (LbC) using nonlinear stochastic MPC and distributionally robust optimization (DRO). This work is motivated by several open challenges in LbC literature. Many control-theoretic LbC methods require subject matter expertise (SME), often manifested as preexisting data of safe trajectories or structural model knowledge, to translate their own safety guarantees. In this article, we focus on LbC where the controller is applied directly to a system of which it has no or extremely limited direct experience, toward safety during tabula-rasa or “blank slate” model-based learning and control as a challenging case for validation. This explores the boundary of the status quo in control theory relating to requirements for SME. We show that, under basic and limited assumptions on the underlying problem, we can translate probabilistic guarantees on the feasibility of nonlinear systems using results in stochastic MPC and DRO literature whose relevance we formally extend in mathematical analysis. We also present a coupled and intuitive formulation for the persistence of excitation (PoE) and illustrate the connection between PoE and the applicability of the proposed method. Our case studies of vehicle obstacle avoidance and safe extremely fast charging of lithium-ion batteries reveal powerful empirical results supporting the underlying theory.


I. INTRODUCTION
This paper presents a novel application of Wasserstein ambiguity sets to robustify model-based reinforcement learning (MBRL) and learning-based control (LbC) in safety-critical applications. Here, we define safety as the ability of the control policy to satisfy constraints. Translating safety to online reinforcement learning (RL) algorithms is a notoriously difficult open challenge in relevant literature. This paper is motivated by unsolved shortcomings of many existing means to address this challenge, particularly a strong and often optimistic dependence on subject matter expertise. Two overarching examples include (i) assumed knowledge of underlying dynamics, and (ii) preexisting data of safe trajectories.
The LbC problem space borrows many concepts from historical research on stochastic optimal control, a field which dates back decades to the original linear-quadratic Gaussian problem [1]. The key underlying concept relates to uncertainty, and how we can accommodate limited or imperfect knowledge of the underlying dynamics. The rise in popularity of MPC has created a new application for these robust and stochastic control principles. For instance, foundational work by Kothare et al. addresses uncertainty in MPC optimization with linear matrix inequalities by allowing the state transition matrices to vary in time within a convex polytope [2].
Within the past few years, stochastic optimal control has become connected to ongoing research in the burgeoning field of LbC. Here, researchers seek guarantees on safety and performance when simultaneously learning and controlling a dynamical system. For current state-of-the-art methods in learning-based control which utilize MPC, we direct the reader to a thorough review by Hewing et al. [3]. This type of problem presents a nuanced and complex challenge for a host of reasons. Safety and feasibility pose significant barriers for proper implementation of such algorithms. Moreover, balancing the exploration-exploitation tradeoff inherent to simultaneous control and model identification has presented researchers with a host of unique problems which form a primary focus of research in active learning. Work by Dean et al., for instance, explores safety and persistence of excitation for a learned constrained linear-quadratic regulator [4].
MPC is a highly popular use case for learning-based control problems, and provides an intuitive bridge between longstanding adaptive control theory and new developments and explorations. For instance, recent work has investigated recursive feasibility for adaptive MPC controllers based on recursive least-squares [5] and set-membership parameter identification [6], although similar papers frequently possess limitations including a dependence on linear dynamical models. Rosolia and Borrelli derive recursive feasibility and performance guarantees for a learned episodic MPC controller [7]. Koller et al. also address the safety of a learned MPC controller when imperfect model knowledge and safe control exist [8].
We note that approaches based on control Lyapunov functions and control barrier functions [9]-[11] have further strengthened the connection between classical adaptive control and more modern approaches akin to popular model-based reinforcement learning (RL) problems. Recent work by Westenbroek et al. has even explored coupling such nonlinear control methods with a policy optimization scheme [12].
In the space of RL, safe LbC has become a burgeoning area of study. For broad discussion and categorization of classical methods, Garcia et al. provide a comprehensive review [13]. More recently, some control-theoretic principles have migrated towards the space of safe RL. For example, Chow and Nachum leverage Lyapunov stability principles to obtain improved empirical results [14]. Other methods focus on safety as a challenge relevant to transfer learning, where safe behavior can be extrapolated and expanded from simpler tasks [15]. Methods in the space of RL provide idealistic safety guarantees that translate into improved empirical safety properties. However, any guarantees (probabilistic or robust) or safety certificates in this space are elusive and remain an open challenge.
Guarantees in RL literature are difficult to obtain since that literature eschews subject matter expertise (SME), or direct intuition into a specific application. Some RL research obtains guarantees by leveraging strong SME in the form of known safe backup controllers [16], [17]. Generally, when RL neglects considerations to SME it becomes applicable to a much wider body of relevant decision and control problems [18] that lack permeability to our intuition and expertise. Conversely, controls literature is ubiquitous in revealing how such expertise can be leveraged to yield strong and specific performance and safety even in adaptive and learning contexts. As previously discussed, SME in controls LbC methods often takes the form of model knowledge [5], [6], [9]-[11] and preexisting data of safe trajectories [7], [19].
The problem with these SME assumptions is that they can very easily become optimistic. Given the overarching assumption of preexisting data of safe trajectories, we have to ask "How trustworthy is our data?" This should always be called into question, especially when safety is of the utmost importance. Many LbC methods do consider noise-corrupted data [19], but what if deeper, malicious pathology infiltrates the data generation process? The process generating the data could be flawed in many ways; the relevance of each flaw to existing methods varies but is persistent. An example could be sampling data locally where relevant dynamics can be effectively linearized, when the system experiences highly nonlinear behavior outside of that region. Without exploiting and trusting our SME, we cannot guarantee things like this will not happen, especially in safety-critical settings. When a resultant controller is applied to the underlying system, it can encounter out-of-distribution (OOD) experience and adversarial attacks that a majority of existing LbC methods simply cannot accommodate. Those few LbC algorithms that do consider OOD experience do so using hyperparameters that are not trivial to select and validate [19], and often assume structure of the underlying dynamics [20]. These same fundamental quandaries also apply when assuming model knowledge.
In this paper, we address these key open questions about SME in control-theoretic LbC. Critically, we ask "What is the least amount of SME we may need to obtain safe control results?" Such questions remain relatively unexplored in controls literature, despite their relevance. Our methods for addressing these questions are actually quite simple, and rely on a combination of concepts in stochastic MPC and distributionally robust optimization. We make this technical augmentation along with several basic assumptions about the problem formulation that allow us to translate probabilistic safety guarantees in the absence of conventionally strong dependence on SME.

A. Background on DRO and LbC
This paper primarily leverages concepts from distributionally robust optimization (DRO) to obtain safety certificates. In recent practice, DRO has been gaining traction as a set of methods that provide significant value to the study and solution of the LbC problem. DRO is a field of inquiry which seeks to guarantee robust solutions to optimization programs when the distributions of relevant random variables are estimated via sampling. This uncertainty can involve the objective or the constraints of the optimization program. Uncertainty in both cases can pose significant challenges if unaccounted for, leading to suboptimal and potentially unsafe performance [21]. Given that past work in the LbC space frequently considers chance constraints [5], [19], [22], incorporating a true DRO approach possesses the potential to improve our capabilities of guaranteeing safety during learning. These methods have been recently explored to address challenges of safety and performance imposed by uncertainty. For instance, Van Parys et al. address distributional uncertainty of a random exogenous disturbance process with a moment-based framework [23]. Paulson et al. also apply polynomial chaos expansions to characterize distributional parametric uncertainty in a nonlinear model-predictive control application [24].
Within the toolbox provided by DRO, Wasserstein ambiguity sets are a foremost asset. The Wasserstein metric (or "earth mover's distance") is a symmetric distance measure in the space of probability distributions. Wasserstein ambiguity sets account for distributional uncertainty in a random variable, frequently one approximated in a data-driven application. They accomplish this feat with out-of-sample performance guarantees by replacing the data-driven distribution of the random variable with the worst-case realization within a Wasserstein ball centered about the empirical distribution [25], [26]. Expressions exist which map the quality of the empirical distribution to Wasserstein ball radii such that desired robustness characteristics are achieved without significant sacrifices to the performance of the solution [27]. Within the control context, however, the Wasserstein distance metric has only recently begun emerging as a valuable and widespread tool. Work by Yang et al. explores the application of Wasserstein ambiguity sets for distributionally robust control subject to disturbance processes [28]. Similar methods have made their way to research on model-based and model-free reinforcement learning as well [20], [29], [30]. DRO has also been applied to Markov decision processes (MDPs) in a general sense, with good results [31]-[34]. Scalability is still an open challenge in that space. Overall, while Wasserstein ambiguity sets are seeing increased application in controls research, many of their true capabilities have yet to be fully exploited.

B. Statement of Contributions
This paper seeks to address key shortcomings in these areas of literature. Among those previously discussed, foremost is the lack of general methods that possess robustness when conducting tabula-rasa learning-based control, or those requiring significant assumptions on availability of prior data of safe control trajectories.
We present a novel and simple model-based LbC scheme based on MPC which provides strong probabilistic out-of-sample guarantees on safety. We validate our method using experiments that emulate tabula-rasa as closely as possible given our assumptions, but our algorithm is widely applicable to adaptive control scenarios, especially when underlying dynamics may be poorly structured or difficult to characterize. By developing Wasserstein ambiguity sets relating to empirical distributions of modeling error, we can conduct MPC with an imperfect learned snapshot model while maintaining confidence in our ability to satisfy nominal constraints. The Wasserstein ambiguity sets allow us to optimize with respect to constraint boundaries that are shifted into the safe region. As our empirical distributions improve with more online data generation, the offset variables tighten towards the nominal boundary in a provably safe way. Critically, in this paper, we present this LbC scheme along with (1) an explicit and fundamental persistence of excitation (PoE) scheme, and (2) highly limited SME assumptions. While many LbC methods are amenable to PoE schemes [4], the question of PoE is in some cases neglected despite its relevance. We actually show our explicit PoE scheme is fundamental to illustrating the applicability of our method. Our contributions combine to allow us to translate safety guarantees with no strong model knowledge or prior data of existing safe trajectories.
Our approach yields probabilistic safety guarantees. The overarching objective of this paper is not to present the most high-performing LbC architecture, but rather to explore what kind of performance we can obtain when limiting our SME assumptions more so than existing work in controls literature. Many control-theoretic methods provide stronger robust (i.e., safety w.p. 1) guarantees under much more restrictive assumptions. In our case, we label our method as "trustworthy" insofar as it relies on highly limited SME. Given the elusiveness of safety guarantees in RL literature, a probabilistic result within our context is powerful.
We validate our approach by learning to safely fast charge a lithium-ion battery using a nonlinear equivalent circuit model. Battery fast charging presents a strong challenge for learning-based control methods, given that the optimal policy is a boundary solution which rides constraints until the terminal conditions are met. We also conduct a case study on safe autonomous driving using a nonlinear bicycle model of vehicle dynamics. We demonstrate that our algorithm provides a provably safe method for the vehicle to avoid obstacles while learning its dynamics from scratch.
We provide an open-source GitHub repository [35] for our case studies.

II. DISTRIBUTIONALLY ROBUST OPTIMIZATION
The core of our proposed algorithmic architecture relies heavily on distributionally robust optimization (DRO) techniques. In the following section, we outline fundamental ideas which establish the foundation of our algorithm.

A. Chance Constrained Programming
A chance constraint is a constraint within an optimization program which is only required to hold with some probability. This is typically a necessary concession when the constraint is affected by a random variable R:

$$\mathbb{P}\left[h(x_k, u_k, R) \le 0\right] \ge 1 - \eta. \tag{1}$$

Here, the constraint function h(x_k, u_k, R) outputs an m-dimensional vector. In this case, the distribution P relates to random variable R with support ξ. Here, 0 ≤ η < 1 is the specified risk level, i.e., our allowed probability of violating the constraint. If η = 0, we say we have a robust optimization program which must not yield any probability of constraint violation. In practice, especially when approximating P from sampling, we admit some small probability of constraint violation, leading to a value of η > 0. This is frequently necessary because it allows our probabilistically robust solution to balance conservatism with performance.
Upon utilizing an empirical approximation of P derived from sampling (usually denoted P̂), we admit some distributional uncertainty which arises from only having access to a finite group of samples. By the law of large numbers, P̂ → P* as the number of samples ℓ → ∞, where P* is the true distribution. With only finitely many samples, the discrepancy between P̂ and P* creates distributional uncertainty, which can affect the quality of the solution if our approximation P̂ is inaccurate [21]. Throughout the remainder of this section, we discuss the application of distributionally robust optimization techniques to address this distributional uncertainty.
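As a rough illustration of how a chance constraint is checked against an empirical distribution, the sketch below estimates the satisfaction probability from samples. It assumes the sign convention that the constraint is satisfied when every component of h is non-positive; the function names and the scalar example constraint are hypothetical and not taken from the paper.

```python
def empirical_satisfaction(h, x, u, samples):
    """Fraction of sampled R for which every component of h(x, u, R) is <= 0.

    Assumption: the constraint is 'satisfied' when h <= 0 componentwise.
    """
    satisfied = sum(1 for r in samples if all(v <= 0.0 for v in h(x, u, r)))
    return satisfied / len(samples)


def satisfies_chance_constraint(h, x, u, samples, eta):
    """True if the empirical satisfaction probability is at least 1 - eta."""
    return empirical_satisfaction(h, x, u, samples) >= 1.0 - eta


# Hypothetical scalar constraint: h(x, u, R) = x + u + R - 1 <= 0.
def h_example(x, u, r):
    return [x + u + r - 1.0]


# Five samples standing in for the empirical distribution of R.
samples = [0.0, 0.1, 0.2, 0.3, 0.9]
prob = empirical_satisfaction(h_example, 0.2, 0.3, samples)
```

With only five samples the estimate is coarse, which is exactly the distributional uncertainty that the remainder of this section sets out to address.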

B. Wasserstein Ambiguity Sets
The Wasserstein metric is defined as follows:

Definition 2.1: Given two marginal probability distributions P_1 and P_2 lying within the set of feasible probability distributions P(ξ), the Wasserstein distance between them is defined by

$$d_W(\mathbb{P}_1, \mathbb{P}_2) = \inf_{\Pi} \left\{ \int_{\xi^2} \lVert R_1 - R_2 \rVert_a \, \Pi(dR_1, dR_2) \right\}, \tag{2}$$

where Π is a joint distribution of the random variables R_1 and R_2 with marginals P_1 and P_2, and a denotes any norm in R^n.

The Wasserstein metric is colloquially referred to as the "earth mover's distance." This name is rooted in the interpretation of the Wasserstein metric as the minimum cost of redistributing mass from one distribution to another via non-uniform perturbation [28]. To show why the Wasserstein distance is a valuable tool we can leverage to robustify a data-driven optimization program, we first reference the chance constraint equation (1), which in practice depends on an empirical distribution P̂. Rather than solving the optimization program with respect to an imperfect snapshot of P* defined by P̂, we can optimize over any probability distribution within some ambiguity set centered around our estimate P̂. The Wasserstein distance provides a formal method to define such an ambiguity set. Namely, we can optimize against the worst-case realization of R sourced from a set of probability distributions within a specified Wasserstein radius of our empirical estimate. We define "worst-case" as the realization which yields the lowest probability of satisfying the chance constraint. This formulation can be described mathematically with the following relation:

$$\inf_{\mathbb{P} \in \mathbb{B}_\epsilon} \mathbb{P}\left[h(x_k, u_k, R) \le 0\right] \ge 1 - \eta, \tag{3}$$

where

$$\mathbb{B}_\epsilon := \left\{ \mathbb{P} \in \mathcal{P}(\xi) \;\middle|\; d_W(\mathbb{P}, \hat{\mathbb{P}}) \le \epsilon \right\} \tag{4}$$

is the ambiguity set defined for a Wasserstein ball radius ϵ.
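To make the "earth mover's" interpretation concrete, the following minimal sketch computes the Wasserstein distance in the simplest possible setting: two scalar empirical distributions with the same number of equally weighted samples, using the absolute value as the norm. In that special case the optimal transport plan matches sorted samples, so no optimization is needed. The function names are hypothetical, and the general multivariate case requires solving a transport problem that is not shown here.

```python
def wasserstein_1d(a, b):
    """Type-1 Wasserstein distance between two scalar empirical distributions.

    Assumes equal sample counts and uniform weights, where the optimal
    coupling pairs the i-th smallest sample of a with the i-th smallest of b.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal sample counts")
    a_sorted, b_sorted = sorted(a), sorted(b)
    return sum(abs(p - q) for p, q in zip(a_sorted, b_sorted)) / len(a)


def in_wasserstein_ball(candidate, empirical, epsilon):
    """True if the candidate sample set lies within radius epsilon of the empirical one."""
    return wasserstein_1d(candidate, empirical) <= epsilon


# Shifting every sample by 1 moves one unit of mass a distance of 1.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The membership check mirrors the role of the ambiguity set: any distribution within the radius is treated as a plausible candidate for the true distribution.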
Of note is the fact that (3) guarantees probabilistic feasibility for any probability distribution within the ambiguity set when reformulated correctly. No assumptions must be leveled on the true distribution P* for these guarantees to translate under a proper reformulation.
Reformulation is necessary because the exact constraint shown in (3) poses an infinite-dimensional nonconvex problem. Ongoing research has pursued tractable reformulations of this constraint which facilitate its real-time solution.
This paper adopts a reformulation of (3) detailed in [36]. This reformulation accommodates vector constraint functions, requires that the function g(x_k, u_k, R) is linear in R, and entails a scalar convex optimization program to derive. Our algorithm is designed to exploit the linear dependence on R such that this assumption has no effect on the applicability of our approach. Importantly, the result is a conservative convexity-preserving approximation of (3). For an m-dimensional constraint function, the exact form of the ambiguity set is V = conv({r^(1), ..., r^(2^m)}), where the vector r is sourced from the optimization component of the overall procedure. The set of constraints we find to replace the infinite-dimensional DRO chance constraint are: For complete and elegant discussion of this reformulation, we highly recommend the reader reference work in [36], specifically pages 5-7 of their paper.

This reformulation requires some additional information, including a tractable representation of an appropriate Wasserstein ball radius. Several expressions exist for the Wasserstein ball radius ϵ which are probabilistically guaranteed to contain the true distribution with allowed probability β. We adopt the following formulation of ϵ from [27] where ℓ is the number of data points, β is the probability the Wasserstein ball contains the true distribution, and C relates to the diameter of the support of the distribution and is obtained by solving the following scalar optimization program: where the right side bounds the value of C, R_k is a sample of the random variable which comprises our empirical distribution, and μ is the sample mean of the distribution.
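The sketch below illustrates how a data-driven Wasserstein radius of this kind can be computed for scalar samples. The two formulas coded here are assumptions, since the expressions themselves are not reproduced in the text above: the radius is taken as ϵ = C·sqrt((2/ℓ)·ln(1/(1−β))), and C is approximated by 2·inf over α > 0 of sqrt((1/(2α))·(1 + ln(mean(exp(α·|R_k − μ|²))))). The infimum is approximated by a coarse grid search, and all function names are hypothetical.

```python
import math


def support_constant(samples, alphas=None):
    """Approximate C by a grid search over alpha > 0 (assumed formula, see lead-in)."""
    mu = sum(samples) / len(samples)
    sq_dev = [(r - mu) ** 2 for r in samples]
    if alphas is None:
        # Log-spaced grid from 1e-3 to 1e3; a stand-in for a scalar solver.
        alphas = [10 ** (k / 10) for k in range(-30, 31)]
    best = math.inf
    for alpha in alphas:
        # Numerically stable log of the sample mean of exp(alpha * deviation^2).
        peak = max(alpha * s for s in sq_dev)
        log_mean = peak + math.log(
            sum(math.exp(alpha * s - peak) for s in sq_dev) / len(sq_dev)
        )
        best = min(best, 2.0 * math.sqrt((1.0 + log_mean) / (2.0 * alpha)))
    return best


def wasserstein_radius(samples, beta):
    """Radius epsilon for confidence level beta (assumed formula, see lead-in)."""
    ell = len(samples)
    c = support_constant(samples)
    return c * math.sqrt((2.0 / ell) * math.log(1.0 / (1.0 - beta)))


# The radius shrinks as more samples with the same spread are collected.
radius_small = wasserstein_radius([-1.0, 1.0] * 5, beta=0.95)
radius_large = wasserstein_radius([-1.0, 1.0] * 50, beta=0.95)
```

Under these assumed formulas the radius decays like 1/sqrt(ℓ), which matches the qualitative behavior described in the paper: the ambiguity set tightens as online data accumulates.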

III. EQUIVALENT CHANCE-CONSTRAINT REFORMULATION
This paper builds upon the equivalent reformulation of (3) from [36]. This reformulation leverages findings from recent work by [25]. The statement of the specific reformulation in [36] indicates a requirement that the constraint function g(x, R) is linear in x and R, respectively.
Notably, we identify a simple extension of the reformulation in [36] that allows its application to our nonlinear MPC formulation by relaxing the requirement that the constraint function be linear in the decision variable x.
A. Restatement of the Reformulation from [36]
The reformulation from [36] is stated to require the constraint function g(x, R) to be linear in x and R, respectively. In the next subsection, we extend the reformulation to include some broader cases of constraint functions: where the functions g_x and g_R can be nonlinear in their respective arguments. In this subsection, we restate the work from [36] as a reference for our extension included in Subsection III-B. Data samples {R^(1), R^(2), ..., R^(ℓ)} corresponding to random variable R ∈ R^m are drawn from the true distribution P*. These finite samples comprise our empirical distribution P̂. The finiteness of our empirical distribution indicates it will not perfectly match the behavior of the true distribution P*. This is especially true in cases with limited samples, which are relevant to the challenging case studies this paper explores.
Normalizing the data lends simplicity to the derivation:

$$\vartheta^{(k)} = \Sigma^{-1/2}\left(R^{(k)} - \mu\right),$$

where Σ is the sample variance of the data and µ is the sample mean. This standardization transforms the data samples such that the new mean is 0 and the new variance is I_{m×m}. The support of this normalized distribution is since we have centered the normalized variable ϑ. Note that 1_m is a column vector of ones. Let Q* and Q̂ represent the true and empirical distributions of the normalized data ϑ. We construct the ambiguity set 𝒬 using the "Wasserstein ball" given by (4), allowing us to transform the distributionally robust chance constraint (DRCC) in (3) to

$$\sup_{\mathbb{Q} \in \mathcal{Q}} \mathbb{Q}\left[\vartheta \notin \mathcal{V}\right] \le \eta,$$

which says the worst-case probability that the normalized random variable ϑ is outside the set V is at most η, where the supremum is taken over all distributions Q in the ambiguity set 𝒬. We wish to obtain the least conservative (i.e., tightest) set V ⊆ R^m in order to define the desired Wasserstein uncertainty set We restrict the overall shape of the set V to be a hypercube, which enables computational tractability: Now, to compute this ambiguity set without introducing unnecessary conservatism, we need to find the minimum value of the hypercube side length σ ∈ R. The following optimization program details this problem:

$$\min_{0 \le \sigma \le \sigma_{\max}} \; \sigma \quad \text{subject to:} \quad \sup_{\mathbb{Q} \in \mathcal{Q}} \mathbb{Q}\left[\vartheta \notin \mathcal{V}(\sigma)\right] \le \eta.$$

Here, we select σ_max using a priori information about the specific problem context. The derivation in [36] provides a worst-case probability formulation, summarized by the following Lemma:

Lemma 3.1 (Lemma 2 of [36]): where (x)^+ = max(x, 0).

We defer to [36] for the proof of this finding. Their result entails that (16) can be reformulated as where The result of this optimization program is the value of σ, which is used to reformulate the chance constraints via convex approximation. For a convex approximation of the constraint function in (3), the hypercube V(σ) becomes the convex hull of its vertices. If for example m = 1 (i.e., the random variable is 1-dimensional), then V(σ) = (−σ, σ), an open interval. The offset r^(j) is calculated from: In the two-dimensional case, this yields the ambiguity set For an m-dimensional constraint function, the exact form of the reformulated ambiguity set is V = conv({r^(1), ..., r^(2^m)}). In each case, the ambiguity set is a hypercube, and the change of signs is the method by which we enumerate across that hypercube's vertices. The set of constraints are: Algorithm 1 details the method used to compute the offset σ.
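The procedure described above can be sketched as a bisection over the hypercube half-width σ followed by an enumeration of the hypercube's vertices. Two things in this sketch are assumptions rather than statements from the text: the worst-case violation probability is taken to have the form inf over λ ≥ 0 of λϵ + (1/ℓ)·Σ_k (1 − λ(σ − ‖ϑ^(k)‖_∞)^+)^+, and the covariance is treated as diagonal so that Σ^{1/2} reduces to per-component standard deviations. Function names are hypothetical, and this is not a reproduction of Algorithm 1.

```python
import itertools


def worst_case_violation(norms, sigma, epsilon):
    """Assumed worst-case probability that the normalized variable leaves V(sigma).

    norms holds the infinity norms of the normalized samples. The objective is
    convex and piecewise linear in lam, so its minimum over lam >= 0 is attained
    at lam = 0 or at a breakpoint lam = 1 / gap.
    """
    gaps = [max(sigma - n, 0.0) for n in norms]
    candidates = [0.0] + [1.0 / g for g in gaps if g > 0.0]

    def objective(lam):
        return lam * epsilon + sum(max(1.0 - lam * g, 0.0) for g in gaps) / len(gaps)

    return min(objective(lam) for lam in candidates)


def minimal_sigma(norms, epsilon, eta, sigma_max, tol=1e-8):
    """Smallest sigma in [0, sigma_max] whose worst-case violation is at most eta."""
    if worst_case_violation(norms, sigma_max, epsilon) > eta:
        raise ValueError("sigma_max is too small for the requested risk level")
    lo, hi = 0.0, sigma_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The worst-case violation is non-increasing in sigma, so bisection applies.
        if worst_case_violation(norms, mid, epsilon) <= eta:
            hi = mid
        else:
            lo = mid
    return hi


def hypercube_vertices(sigma, mean, std):
    """Vertices mean + s * sigma * std over all sign patterns s (diagonal covariance assumed)."""
    m = len(mean)
    return [
        [mean[i] + signs[i] * sigma * std[i] for i in range(m)]
        for signs in itertools.product((1.0, -1.0), repeat=m)
    ]


sigma = minimal_sigma(norms=[0.0, 0.0, 0.0, 0.0], epsilon=0.1, eta=0.05, sigma_max=10.0)
vertices = hypercube_vertices(sigma=2.0, mean=[0.0, 0.0], std=[1.0, 1.0])
```

The vertex enumeration shows why the number of reformulated constraints grows as 2^m in general: each sign pattern corresponds to one vertex of the hypercube.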

B. Extending the Reformulation
Duan et al. utilize the findings of [25] in presenting their convex reformulation. Critically, we identify that the fundamental theory presented by [25] allows applying the identical reformulation to cases where the constraint function takes the form wherein g_x and g_R may be nonlinear functions. Importantly, there must not be any interdependence between x and R. This paper presents a modified lemma for the applicability of the previously stated reformulation first presented by [36].

Algorithm 1: Computation of σ
Lemma 3.2: If the function g satisfies then constraints of the following form: can be reformulated into the convex approximation using the relations in (16) and (17), where r = Σ^{1/2} 1_m σ + µ.

Proof 3.1: We start by defining auxiliary variables in the constraint function. Consider that, without loss of generality, nonlinear functions of R can themselves be considered the random variable in question: where R̃ is the new model of the stochasticity. This gives Now, we create a dummy auxiliary decision variable x̃ in the same manner: forming a function g̃ which is trivially linear in x̃ and R̃, where This equality constraint (29) now shows up in the overall optimization program. However, the DRCC reformulation only poses conditions on the constraint function in question (namely g̃(x̃, R̃)). We have transformed the distributionally robust chance constraint into which is now linear in x̃ and R̃. Following the procedure from [25], we suppress dependence on x (or x̃) for simplicity, leading to ℓ(R̃) = g̃(x̃, R̃) [25], [36]: The remainder of the proof is identical to the Appendix in [36], leading to the convex approximation:

Beyond exploiting the linear presence of x̃ in the constraint function, suppressing dependence on decision variables is possible and helpful for the following reasons. The overall process of solving an optimization program with a DRCC is characterized by a two-stage stochastic optimization problem. Here, (31) is the first-stage problem that we solve using the equivalent reformulation. Esfahani and Kuhn show in Section 5.3 of their paper that, without loss of generality, the solution in the second stage (i.e., the overall optimization program) is unaffected by suppressing dependence of ℓ on decision variables in the first stage. Additionally, the decision-independent loss function ℓ(R̃) can trivially be expressed as a pointwise maximum of elementary measurable functions, as required by Section 4 of [25].
This means that, in practice, the dummy decision variable x̃ will not come into play during any stage of solution. After solving the first-stage problem, we can reverse the substitution in the remaining optimization to avoid an equality constraint with poor computational tractability.
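The substitution used in the proof can be written compactly as follows. This is a reading of the argument under the assumption that "no interdependence between x and R" means the constraint function is additively separable, and that the constraint is satisfied when it is non-positive; the displayed forms are a sketch and are not copied from [36] or [25].

```latex
% Assumed separable structure of the constraint function
g(x, R) = g_x(x) + g_R(R)

% Auxiliary (tilde) variables absorb the nonlinearities
\tilde{R} := g_R(R), \qquad \tilde{x} := g_x(x)

% The transformed constraint function is trivially linear in both arguments
\tilde{g}(\tilde{x}, \tilde{R}) = \tilde{x} + \tilde{R}

% Distributionally robust chance constraint in the new variables
\inf_{\mathbb{P} \in \mathbb{B}_\epsilon} \mathbb{P}\left[\tilde{x} + \tilde{R} \le 0\right] \ge 1 - \eta
```

Under this reading, the empirical distribution is built directly from samples of R̃ = g_R(R), so the nonlinearity in R never has to be handled explicitly by the reformulation.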
We have shown a simple extension of the DRO reformulation from [36] that allows us to apply the method to nonlinear optimization programs. In the next section of this paper, we describe our nonlinear MPC formulation and the context within which the guarantee from the DRCC is translated to LbC.

A. Model Predictive Control Formulation
We apply Wasserstein ambiguity sets to robustify a learning model predictive controller, based on the following optimization program formulation. Given true plant dynamics: where t is the current timestep, W_t is state noise, V_t is output measurement noise, x_t is the state variable, and y_t is the output variable. We assume access to full state and output measurements, subject to the noises W_t and V_t. The capital letters represent random variables. Before considering modifications for distributional robustness to uncertainty (which also accommodate exogenous inputs), we seek to solve the following predictive control problem: minimize where x_t is the known (measured) initial state at the current timestep t. The "hat" symbol indicates a predicted variable, and the learned models themselves are given by: At a high level, these can be thought of as two separate models. However, when learning a black-box representation of the system, a single model can be trained to predict both sets of values x̂_{t+1} and ŷ_t. The parameters θ_f and θ_g are learned from historical data through model identification.

B. Model Identification
The models are used to predict state transition dynamics and constraint function outputs. We assume the true model parameters θ*_f and θ*_g are inaccessible to the controller. Several methods can be selected to learn the parameters online, and the choice can depend on what type of learning model architecture is selected. In this paper, we utilize nonlinear least-squares with neural network models for both the state transition dynamics and constraint functions: where x_{k+1} and y_k are assumed to be measurable from the real system at the current timestep. When conducting MPC, the initial x_k is obtained by assuming full state observability throughout the LbC problem. From this point forward, we denote θ_{g;t} as the parameterization of the learned model of g at timestep t in the overall learning process.

C. Modeling Error Characterization
We characterize modeling error through comprehensive modeling residuals across varying prediction depths.
For example, consider a scalar system x ∈ R, y ∈ R within three steps of model predictive control (N = 2) with a quadratic, time-invariant objective function (state penalty q = 1, effort penalty r = 1, terminal state penalty p = 1), minimized over u_t, u_{t+1}, u_{t+2}. Suppose we find a sequence u*_t, u*_{t+1}, u*_{t+2} from solving 3 sequential model predictive control problems with the true plant in the loop. Since we are using learned models to solve these predictive control problems, these inputs are likely not actually optimal for the system, and with added PoE they include exploratory aspects. In each case we apply the first control input to the system to obtain x*_{t+1}, x*_{t+2}, x*_{t+3}.

We can quantify prediction error of the learned constraint function in the following manner: These are 1-step residuals, as denoted by the subscript in R_1, since x̂_{t+1} = f̂(x_t, u*_t) and x̂_{t+2} = f̂(x*_{t+1}, u*_{t+1}). In these equations, the function g represents our observations from the real system (simple data), and the function ĝ represents the predictions of our learned constraint model. We take the absolute value since these residuals will be introduced as variables that add conservatism relative to the existing constraint boundary.

Since we conduct predictive control, we also want to quantify modeling errors after 2, 3, or more steps of prediction into the future using learned models, as errors can accumulate and become worse with successive prediction steps. This happens in the following way: As is shown here, modeling error accumulates from the learned representations of both the constraint function ĝ and the dynamics function f̂.

Fig. 1. Diagram of safe Wasserstein-constrained MPC. In the most restrictive case, after initializing the controller, it immediately begins interacting with its environment. At every timestep, it observes an MDP state transition tuple, calculates model residuals, uses the residuals to calculate the DRO offset r^(j)(k), and then solves a new MPC program at the next state. This application case serves as a purposefully extreme challenge of the robustness and behavior of our algorithm at what would otherwise be unreasonable levels of uncertainty and risk. Later in our paper, we demonstrate that even under such extreme conditions, we manage to safely learn control policies for a host of nonlinear stochastic control problems. We do note, however, that our algorithm is much more widely applicable when prior data and SME are available.
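The residual bookkeeping described above can be sketched as follows. The indexing convention is an assumption, since the residual equations are not reproduced in the text: the i-step residual at time t rolls the learned dynamics forward i steps from the measured state x_t using the applied inputs, then compares the learned constraint model at the predicted state with the output observed at time t+i. The linear plant and model in the example are hypothetical stand-ins for the true system and the neural network models.

```python
def multi_step_residuals(f_hat, g_hat, states, inputs, outputs, max_depth):
    """Absolute i-step constraint residuals for i = 1..max_depth.

    states[k], inputs[k], outputs[k] are the measured x_k, applied u_k, and
    observed y_k. Returns a dict mapping depth i to a list of residuals.
    """
    residuals = {i: [] for i in range(1, max_depth + 1)}
    horizon = len(states)
    for t in range(horizon):
        x_pred = states[t]  # start each rollout from a measured state
        for i in range(1, max_depth + 1):
            if t + i >= horizon:
                break  # no observation available to compare against
            # Propagate the learned dynamics one more step with the applied input.
            x_pred = f_hat(x_pred, inputs[t + i - 1])
            y_pred = g_hat(x_pred, inputs[t + i])
            residuals[i].append(abs(outputs[t + i] - y_pred))
    return residuals


# Hypothetical example: the plant decays at 0.9, the learned model at 0.8.
def f_true(x, u):
    return 0.9 * x + u


def f_model(x, u):
    return 0.8 * x + u


def g_model(x, u):
    return x


inputs = [0.0, 0.0, 0.0, 0.0]
states = [1.0]
for k in range(3):
    states.append(f_true(states[-1], inputs[k]))
outputs = list(states)  # the observed output equals the state in this example

residuals = multi_step_residuals(f_model, g_model, states, inputs, outputs, max_depth=2)
```

In this example the 2-step residuals are larger than the 1-step residuals, which illustrates how dynamics error compounds with prediction depth.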
Remark 1: We choose to take the absolute value of residuals. This decision is not necessary, but makes intuitive sense given the application. Since we are intending to modify the nominal constraint boundary, signals of modeling errors that show underestimation could lead to an offset that potentially moves the constraint into the unsafe region. We seek to avoid this, and only create offsets that reduce the size of the feasible region.
The model identification process utilizes the 1-step residuals to minimize the mean-square error (MSE) of the predicted state transition compared to past observations. The multi-step residuals are utilized by the DRO framework to adjust conservatism deeper into the future based on cumulative modeling error.
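The residual bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the public codebase: the names `multi_step_residuals`, `f_hat`, `g_hat`, and `y_true` are hypothetical, the state is scalar, and the learned models are passed in as plain callables.

```python
import numpy as np

def multi_step_residuals(x_true, y_true, u_seq, f_hat, g_hat, horizon):
    """Absolute i-step residuals of a learned constraint model, i = 1..horizon.

    x_true : measured states x_t, ..., x_{t+T}
    y_true : measured constraint values g(x_t), ..., g(x_{t+T})
    u_seq  : inputs actually applied, u_t, ..., u_{t+T-1}
    f_hat, g_hat : learned dynamics and constraint models (hypothetical callables)
    """
    T = len(u_seq)
    residuals = {i: [] for i in range(1, horizon + 1)}
    for k in range(T):
        x_hat = x_true[k]                      # every rollout starts at a measured state
        for i in range(1, horizon + 1):
            if k + i > T:                      # no measurement left to compare against
                break
            x_hat = f_hat(x_hat, u_seq[k + i - 1])   # roll the learned model forward
            residuals[i].append(abs(y_true[k + i] - g_hat(x_hat)))
    return {i: np.asarray(v) for i, v in residuals.items()}

# Toy example: the true plant is x+ = 0.9 x + u, the learned model uses 0.8.
x_true = np.array([1.0, 0.9, 0.81, 0.729])
y_true = x_true - 1.0                          # g(x) = x - 1 <= 0
u_seq = np.zeros(3)
res = multi_step_residuals(
    x_true, y_true, u_seq,
    f_hat=lambda x, u: 0.8 * x + u,
    g_hat=lambda x: x - 1.0,
    horizon=2,
)
```

In this toy case the 2-step residuals are larger than the 1-step residuals, which mirrors the accumulation of error with prediction depth discussed above.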
By representing modeling error this way, we lump all relevant sources of modeling error into an additive term. As previously discussed, the absolute value is taken as a precautionary measure. Omitting that transformation provides the following simple expression: (43)

By treating the residuals as random variables drawn from a true distribution $\mathbb{P}$, the constraints will by definition be additive in the random variable (the modeling error).

D. Safety and Robustness using Wasserstein Ambiguity Sets
Now that we have outlined the distributionally robust chance constrained approach using the Wasserstein ambiguity set, we can describe how it fits within our robust control framework.
The residuals defined in the previous subsection entail a representation of the modeling error. This is only true because the constraint functions are evaluated using predicted states from the learned dynamical model, whose true representation is unknown. By considering process error/residuals as an additive noise term, we can maximize the utility of the DRO reformulation in [36], which requires this linear structure in the constraint:

As previously discussed and shown in equation (43), by design, this linear structure will always occur. These residuals are random variables characterized by empirical distributions based on our observations. Here we have bolded the variable $\mathbf{R}_1$ to indicate that it is a random variable, whereas the previous value $R_1$ was a realization of this random variable at time $t$.
To accommodate distributional uncertainty in our estimate of $\mathbb{P}$, we transform the constraint (44) for each of the $1 \to N + 1$ step residuals into a joint distributionally robust chance constraint via a Wasserstein ambiguity set as follows:

The reformulation we adopt from [36] presents a simple method to accommodate the constraint without inverting the CDF. If we operate under the assumption that the residuals for $i = 1, ..., N$ steps are uncorrelated, then we can decompose this joint chance constraint into a set of individual chance constraints. This decomposition could be useful if the optimization algorithm we select to solve the MPC problem scales unfavorably with the dimension of the constraints. Algorithm 1 provides an overview of the real-time implementation of our approach. As previously stated, the process for computing $r$ entails a simple scalar convex optimization program.

Remark 2: The reformulation from [36] adds a number of constraints that scales with order $2^m$. However, our formulation of modeling error as an additive residual allows the number of constraints to remain constant. We detail this property in the Appendix of this paper. The simple answer is that, by taking the absolute values of the residuals, the random variable that represents modeling error is strictly non-negative. This means a negative realization is impossible to encounter, and need not be accommodated. By keeping the cardinality of constraints constant, the computational scalability of our approach is preserved for higher dimensional control problems.
At each time step, we compute model residuals with our most recent estimate $\theta_{g;t}$ using predicted state transitions from our entire cumulative experience, compile a unique empirical distribution $\hat{\mathbb{P}}$ corresponding to each individual chance constraint, and compute the value of $r$ in (5) to reformulate the distributionally robust chance constraints. We can begin the overall process with a small control horizon $N$, and gradually increase $N$ as we accumulate more and more data from experience. The residuals we compute are for horizon lengths of 1 to $N$ steps, meaning the elements of $R$ correspond to each of the $i = 1, ..., N$ step residuals. Then, we assemble a joint chance constraint where the elements of the column vector of the random variable are the $1 \to N$ step residuals. In [36], the authors pursue a DRO reformulation that utilizes a polytopic representation of the uncertainty set. Our formulation preserves scalability by isolating dependence on the random variable in the constraint. Our Appendix shows the logic that allows the cardinality of constraints to remain constant.
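To make the idea of a scalar convex program for the offset concrete, the sketch below computes a conservative offset from an empirical residual distribution. It is explicitly not the reformulation of [36] or equation (5): it swaps in a simpler, well-known surrogate, namely the empirical conditional value-at-risk (CVaR) inflated by $\epsilon/\eta$, which upper-bounds the worst-case CVaR over a type-1 Wasserstein ball of radius $\epsilon$. The names `empirical_cvar`, `dro_offset`, and the radius `epsilon` are assumptions made for illustration.

```python
import numpy as np

def empirical_cvar(samples, eta):
    """Empirical CVaR at level 1 - eta via the Rockafellar-Uryasev program
        min_tau  tau + E[(R - tau)^+] / eta,
    a scalar convex problem whose minimum is attained at a sample point."""
    r = np.asarray(samples, dtype=float)
    objective = [tau + np.mean(np.maximum(r - tau, 0.0)) / eta for tau in r]
    return float(min(objective))

def dro_offset(samples, eta, epsilon):
    """Conservative constraint offset (illustrative surrogate, not eq. (5)).

    epsilon is a hypothetical Wasserstein radius; epsilon / eta bounds how much
    the CVaR can grow when the distribution moves within the ambiguity ball."""
    return empirical_cvar(samples, eta) + epsilon / eta

# Absolute residuals collected so far (toy numbers).
residual_samples = [0.0, 1.0, 2.0, 3.0]
r_offset = dro_offset(residual_samples, eta=0.5, epsilon=0.1)
```

The tightened constraint would then read $\hat{g}(x, u) + r \le 0$, so a larger spread in the residuals or a larger radius shrinks the feasible region.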
Finally, when we conduct MPC, we replace the nominal constraints with their distributionally robust counterparts: . . .
The MPC program specified in (47a-47i) details the slight modifications made to (46a-46d) to accommodate the coupled PoE component of our LbC framework. We discuss this in more detail in Part F of this section.
One important note concerns a specific scenario of model adaptation where the true underlying system slowly changes. Our application of receding horizon control necessitates the use of a snapshot model in the prediction phase. This requires that we assume the rate of change of the dynamics of the true plant is relatively small. In such conditions, however, the historical residuals we collect through measurements will slowly lose relevance. This issue can be easily reconciled with the use of either a moving window of residuals or a proper forgetting scheme. In this paper, we propose a simple method to accommodate such cases. Since the focus of this paper is on tabula-rasa learning-based control, we relegate the discussion of this additional framework to this paper's appendix.
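A moving window of residuals can be kept with a fixed-length buffer, as in the following minimal sketch. The class name `ResidualWindow` and the window length are hypothetical choices for illustration; the appropriate length depends on how quickly the plant drifts.

```python
from collections import deque

class ResidualWindow:
    """Keeps only the most recent absolute residuals (illustrative sketch)."""

    def __init__(self, max_len):
        # deque with maxlen silently discards the oldest entry when full
        self._buf = deque(maxlen=max_len)

    def add(self, residual):
        self._buf.append(abs(residual))

    def samples(self):
        return list(self._buf)

window = ResidualWindow(max_len=3)
for r in [5, -1, 2, 3]:
    window.add(r)
recent = window.samples()   # the oldest residual (5) has been forgotten
```

The empirical distribution used for the DRO offset would then be built from `window.samples()` rather than from the entire history.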

E. Horizon Increment Rule
MPC with well-defined dynamical structure can leverage judicious selection of the prediction horizon as a component of proving recursive feasibility. When considering a general class of systems, as is the case with MBRL, the prediction horizon becomes a hyperparameter that manages the tradeoff between prediction depth and computational expense. In this paper, we elect to define a simple horizon increment rule for our experiments. Typically in learning-based control, the prediction horizon is a hyperparameter whose selection can be done empirically with more nuanced methods [37], [38]. In our case studies, which we design to emulate tabula-rasa learning-based control as closely as is consistent with the assumptions of our algorithm, we utilize this horizon increment rule as a heuristic to simply allow the problem to be rapidly solved. By solving severely restrictive case studies, we validate the performance of our method under the most challenging context for which it is technically designed. For real-world applications, the horizon can often be selected using a combination of available subject matter expertise (which should not be ignored if it is available) and automatic tuning methods like those of [37], [38]. The increment rule is not meant as a serious method for real-world embedded control systems, which often possess highly limited computational resources.
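The text does not spell out the increment rule, so the following is only one plausible form, stated as an assumption: the horizon grows by one step each time a fixed number of new transitions has been observed, until a target horizon is reached. The function name `horizon_for` and the parameter `samples_per_increment` are hypothetical.

```python
def horizon_for(n_samples, n_start=1, n_target=8, samples_per_increment=25):
    """Prediction horizon as a function of collected transitions.

    Hypothetical rule: start at n_start and add one step of lookahead for
    every `samples_per_increment` observed transitions, capped at n_target.
    """
    return min(n_target, n_start + n_samples // samples_per_increment)

# Horizon over the first few hundred timesteps of an experiment.
schedule = [horizon_for(k) for k in (0, 24, 25, 100, 10_000)]
```

A rule of this shape keeps early MPC problems cheap and short-sighted while the model is poor, and only lengthens the lookahead once multi-step residuals for the longer horizon can actually be computed.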

F. Persistence of Excitation, and Problem Assumptions
This subsection defines the set of least restrictive assumptions we identify towards achieving safe learning-based control. In this paper, we consider systems with non-hybrid dynamics for simplicity. Our method leverages proved safety properties from [36], which apply to static optimization programs. We identify that these methods can apply to LbC problems under a series of assumptions made in this section. These assumptions almost entirely relate directly to situations when the dynamical, DRO, and PoE components, which are normally not considerations for static optimization programs, could create opportunities for empty feasible sets. This subsection defines a PoE scheme directly amenable to translating guarantees from [36] to our formulation. Notably, our assumptions are significantly less restrictive than those of existing LbC methods. The majority of these assumptions relate to clear necessary conditions, which we detail here:

Assumption 4.1: A feasible state and control trajectory exists for each prediction horizon $N$ in the optimal control problem. This is the most fundamental requirement to apply safe control.
Assumption 4.2: We assume we know a safe control input which we can apply at the first timestep. Starting with limited model knowledge, if we do not know a temporarily safe control input we can apply at the first timestep, we cannot translate any meaningful safety certificates. This contrasts with other work which requires knowledge of safe control trajectories throughout the time horizon, or a known safe backup policy.
Assumption 4.3: Starting with an optimal control problem of the form (35a-35f), suppose we have a constraint function $g(x_k, u_k, \theta_{g;t}) : \mathcal{X} \times \mathcal{U} \times \Theta \to \mathcal{S}$. The sublevel set $G_{r_{DRO}} = \{(x, u) \in \mathcal{X} \times \mathcal{U} : g(x, u) + r_{DRO} \le 0\}$ defines the adjusted feasible region, where feasibility is satisfied at the current timestep. This set must not be empty $\forall r_{DRO} \in \mathcal{R}$, where the set $\mathcal{R} = \{r_{DRO} \in \mathbb{R} : 0 \le r_{DRO} \le r_{DRO;\max}\}$ describes the set of all potential values of the DRO offset. Since our method relies on creating an offset from the nominal constraint boundary, any potential value of the offset must lie in the image of the constraint function.
This assumption can be thought of as a generalization of a common LbC assumption that relates to "bounded modeling error," an example of which is given by Assumption 2 in [39]. In our case, using general function approximation, our method to quantify model error is empirically based on residuals. If the residuals of the learned model are too large, indicating our learned model is inaccurate, the resulting computed $r_{DRO}$ (which is a conservative approximation of the residual, based on its distribution) will enforce a large offset from the nominal boundary. This assumption says that if the learned model is sufficiently inaccurate, the offset will be so large that the adjusted feasible region is empty, which is incompatible with the setup of [36]. The value $r_{DRO;\max}$ represents any maximum residual value we can potentially infer from the problem, and can be used as a default, as an empirical approach, if this case is reached in a real problem, although safety properties may not be reliable in such cases. Our experiments show such scenarios can be unlikely to occur, although the possibility of their occurrence should be considered.
The next assumption relates to a slightly stronger condition regarding persistence of excitation (PoE). The agent must be capable of exploring during LbC. In order to ensure the guarantees from [36] translate under those diverse circumstances, the same statements of Assumptions 4.1-4.3 must be satisfied with respect to an additional exploration process $\mathcal{N}$ that ensures PoE.
For clarity, we define the following modified MPC program that considers an additive exploration signal from $\mathcal{N}$:

where $\mathcal{N}$ is the distribution of a random exploration process which can be added to the nominal control input, and the superscripts in $x^n$ and $u^n$ denote trajectories perturbed by the exploration signal. The solution $u^n(t)^\star$ is then applied to the plant at time step $t$.

Remark 3: Equations (47a-47i) guarantee feasibility from $k = t$ to $k = t + N$ for a system with parameters $\theta_{g;t}$ with a specified risk metric/probabilistic guarantee. This is formulated to guarantee feasibility over the control horizon. To assess recursive feasibility, one could utilize the methods from [19], [20], which require more significant restrictions in the form of model knowledge, mathematical structure on the feedback policy, and prior existing safe data.
The additive noise perturbation for exploration takes inspiration from common methods with actor-critic or policy gradient learning, where noise via an Ornstein-Uhlenbeck process is added to the control input [40]. Relative to those existing methods, we make the following modifications for implementation:

Remark 4: We must constrain both nominal and perturbed trajectories to ensure safety even with exploration. If we only add the perturbation after solving the MPC program, safety is not guaranteed.
Remark 5: A scalarized tradeoff between $J_k(x_k, u_k)$ and $J_k(x^n_k, u^n_k)$ can be formulated to balance exploration and exploitation during planning.

Now, we define the next assumption relevant to translating safety to LbC systems under strong limitations on SME:

Assumption 4.4: Given the noise process $\mathcal{N}$ defined to satisfy PoE for the model identification problem, the constraints $g(x_k, u_k, \theta_{g;t})$ and $g(x^n_k, u^n_k, \theta_{g;t})$ of the snapshot model must be satisfied for every realization from $\mathcal{N}$ throughout the overall finite-time optimal control problem.
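The requirement that both the nominal and the perturbed trajectories satisfy the tightened constraints can be sketched as a feasibility check on a candidate input sequence. This is an illustrative sketch only: `ou_noise`, `rollout_is_safe`, and `jointly_safe` are hypothetical names, the Ornstein-Uhlenbeck parameters are arbitrary, and a single scalar offset `r_dro` is used for every prediction step, whereas the paper computes step-dependent offsets.

```python
import numpy as np

def ou_noise(n_steps, theta=0.15, sigma=0.2, dt=1.0, seed=0):
    """Zero-mean Ornstein-Uhlenbeck exploration signal (Euler-Maruyama steps)."""
    rng = np.random.default_rng(seed)
    noise = np.zeros(n_steps)
    for k in range(1, n_steps):
        noise[k] = (noise[k - 1] - theta * noise[k - 1] * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
    return noise

def rollout_is_safe(x0, u_seq, f_hat, g_hat, r_dro):
    """Check g_hat(x_k, u_k) + r_dro <= 0 along a snapshot-model rollout."""
    x = x0
    for u in u_seq:
        if g_hat(x, u) + r_dro > 0.0:
            return False
        x = f_hat(x, u)
    return True

def jointly_safe(x0, u_nominal, noise, f_hat, g_hat, r_dro):
    """A candidate is accepted only if nominal AND perturbed rollouts are safe."""
    u_nominal = np.asarray(u_nominal, dtype=float)
    return (rollout_is_safe(x0, u_nominal, f_hat, g_hat, r_dro)
            and rollout_is_safe(x0, u_nominal + noise, f_hat, g_hat, r_dro))
```

Inside a sampling-based solver, `jointly_safe` would be evaluated for every candidate sequence before its cost is compared, so that exploration noise is accounted for at planning time rather than added afterwards.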
Given these conditions, we state the following remark detailing the properties of our method:

Remark 6: Based on the safety guarantee provided by the DRO framework adopted from [36], (46a-46d) admits a feasible solution that satisfies the nominal constraints w.p. $1 - \eta$ as long as the feasible set is not empty, which follows from Assumptions 4.1-4.4.
We also state two remarks that help with implementation of our approach.
Remark 7: These assumptions must also hold for the prediction horizons chosen at each instant in time.
Remark 8: If the DRO offset is so large that it creates an empty feasible set, an artificial value $r_{DRO;\max}$ can be used as a default to facilitate implementation, although safety guarantees in such situations may be difficult to translate. If a random search is used to solve the MPC program in such cases, the evaluated trajectory that creates the least predicted constraint violation given the unmodified DRO offset can be selected.

V. CASE STUDY IN SAFE ONLINE LITHIUM-ION BATTERY FAST CHARGING
In this section, we validate our approach using a nonlinear lithium-ion battery fast charging problem. This problem closely emulates the performance-safety tradeoffs of common safe RL validation studies, including ant-circle [41]. Specifically, the objective is to charge the battery cell as fast as possible, but the charging is limited by nonlinear voltage dynamics which must stay below critical thresholds. Violation of the voltage constraint can lead to rapid aging and potential catastrophic failure. However, higher input currents (which increase voltage) also directly charge the battery more rapidly. Thus, the optimal solution is a boundary solution where the terminal voltage rides the constraint boundary. This presents a problem with significant challenges and tradeoffs relating to safety and performance. Exploring how such algorithms accommodate these challenges can reveal insights into their overall efficacy and shortcomings.

A. Equivalent Circuit Model of a Lithium-Ion Battery
Lithium-ion batteries can be modeled with varying degrees of complexity. Some of the more detailed dynamical models are based on electrochemistry. For example, the Doyle-Fuller-Newman (DFN) electrochemical battery model is a high-fidelity, first-principles, physics-based model of the dynamics within a lithium-ion battery [42]. Varying degrees of model-order reduction can be applied, yielding versions including the single particle model and the equivalent circuit model (ECM). For simplicity, this paper's case study utilizes an ECM. The relevant state variables in this model are the state of charge $SOC$ and the capacitor voltages $V_{RC}$ in each of two RC pairs. The relevant constraint is on the terminal voltage $V$. This constraint prevents the battery from overheating or aging rapidly during charging and discharging. The state evolution laws are given by:

where $I(t)$ is the current input (which is the control variable for this problem), and $V_{OCV}$ is the open-circuit voltage function, which is conventionally measured through experiments. The full experimental OCV curve is used to represent the true plant in the loop, and is obtained from a lithium iron phosphate (LFP) battery cell [43]. In this paper, we learn the dynamics of the states and output using a simple feed-forward neural network model.
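For orientation, a generic two-RC-pair ECM can be simulated as in the sketch below. This uses the textbook form of the model with forward Euler integration and is an assumption on our part rather than a transcription of equations (48-51). All parameter values, the linear `ocv` curve, and the function names are hypothetical placeholders; the paper uses a measured LFP open-circuit voltage curve instead.

```python
import numpy as np

# Hypothetical cell parameters (illustrative only).
Q_AS = 2.3 * 3600.0      # capacity in ampere-seconds
R0 = 0.01                # series resistance [ohm]
R1, C1 = 0.01, 2000.0    # first RC pair
R2, C2 = 0.02, 20000.0   # second RC pair

def ocv(soc):
    """Placeholder open-circuit voltage curve [V]; not the measured LFP curve."""
    return 3.0 + 0.4 * soc

def ecm_step(state, current, dt=1.0):
    """One forward-Euler step of a generic two-RC ECM.

    state = [SOC, V_RC1, V_RC2]; positive current charges the cell."""
    soc, v1, v2 = state
    soc_next = soc + dt * current / Q_AS
    v1_next = v1 + dt * (-v1 / (R1 * C1) + current / C1)
    v2_next = v2 + dt * (-v2 / (R2 * C2) + current / C2)
    return np.array([soc_next, v1_next, v2_next])

def terminal_voltage(state, current):
    """Terminal voltage: OCV plus RC-pair voltages plus the ohmic drop."""
    soc, v1, v2 = state
    return ocv(soc) + v1 + v2 + R0 * current

state0 = np.array([0.5, 0.0, 0.0])
state1 = ecm_step(state0, current=25.0)
v_term = terminal_voltage(state1, current=25.0)
```

The structure makes the performance-safety tradeoff visible: a larger charging current raises `soc_next` faster but also raises the terminal voltage through the ohmic and RC terms.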

B. Model-Predictive Control Formulation
We utilize the following formulation of fast charging: minimize subject to:

Remark 9: In our case, we assume the controller does not have access to the form of the underlying dynamics given by (48-51). Instead, we apply our end-to-end LbC method to learn the dynamics "from scratch," as is consistent with tabula-rasa learning methods. We utilize neural network black-box models to accomplish this. The rules used to update the neural network parameters affect the convergence of the data-driven model to accurate behavior, which also affects empirical safety. We keep the neural network training consistent between our DRO algorithm and its non-robust baseline. The exact training procedure can be referenced in the public codebase [35]. Updating more slowly at first tends to encourage safer behavior. In these case studies, we apply perturbations to the inputs that further excite the system, towards ensuring PoE. These perturbations are drawn as uniform vectors whose elements lie in the range $-2.5 \le x_p \le 2.5$ Amps. These perturbations are applied to both the distributionally robust controller and the non-robust baseline controller. In both cases, we seek to ensure mutual constraint satisfaction for the trajectories predicted using both the nominal and perturbed inputs.
We only allow a maximum total of 500 seconds for the battery to be charged. The timestep is $\Delta t = 1$ second, $\eta = 0.025$, $\beta = 0.99$, and $N_{targ} = 8$ steps. Our neural network dynamical model has 1 hidden layer with 3 neurons and a sigmoid activation function, with a linear output layer. To solve the MPC problem, we apply a $(1 + \lambda)$ evolutionary strategy (ES) based on a normally distributed mutation vector. In our appendix, we describe how this strategy works, why we select it, and other reasonable alternatives. The solver works with a single iteration and 250,000 mutants. The initial point of the ES is taken as the optimal point from the previous timestep. Addressing Assumption 4.2, we assume that at the first timestep, control inputs of $I_k \le 25$ Amps are known to be temporarily safe. Since we constrain voltage, which is a scalar, the constraint function dimension is $m = 1$.
Our baseline is a learning MPC controller with no DRO framework. We adopt the same problem formulation as if we were going to add the constant $r_{DRO}$ to the constraints, but we omit the DRO constant in the end to evaluate the impact it has on the robustness of the final control law.

C. Results
In total, we conducted a series of 10 experiments with identical designs but different initial random seeds. We run our algorithm and a non-robust baseline for these 10 independent runs on the same battery fast charging problem detailed in the previous subsections. Table I shows the performance, computation, and safety statistics for each of these runs. For a closer look, we go to Figure 2, which shows one run of both the DRO algorithm and its non-robust counterpart. In the case of Figure 2 (run 1), the DRO-based controller does not violate the constraint at any point. In Figure 3 we see the highest incidence of constraint violation for the DRO controller (from run 4). Conversely, the non-robust versions both experience a combination of initial, significant voltage spikes as well as minor violations which persist throughout the experiments. In total, if we focus on Figure 3 (run 4), the non-robust version violated constraints in 13.6% of timesteps (68 timesteps out of 500 total). The charging time was 6.85 minutes, which was 16.29% faster than the DRO version, whose charging time was 8.1833 minutes. This makes intuitive sense, as the added DRO framework introduces additional conservatism which affects the performance of the overall control policy.
Overall, across all 10 runs, our DRO version violates constraints in 0.26% of total timesteps, which is well within the chosen value of $\eta = 0.025 = 2.5\%$ over just a single optimization iteration. The non-robust version, however, violates constraints in 9.76% of total timesteps on average. Similarly, there is a stark difference in the maximum voltages seen by the robust and non-robust versions, with the DRO framework reducing the mean peak voltage by 122.9 millivolts. The DRO calculations increase the overall computation time by an average of 43.7 milliseconds per timestep, and allow the algorithm in this case to run in real time. No optimizations were made to the Matlab code to expedite the runtime of either algorithm, and the only difference in code between the two algorithms is the auxiliary and separate DRO framework. Finally, across the 10 total runs the overall charging time with the DRO framework averages 7.8150 minutes, approximately 14.1% longer than that of the non-DRO version. Given the safety-critical nature of this control problem, the safety guarantees of our algorithm are likely well worth the marginal degradation to the charging performance resulting from added conservatism.

VI. CASE STUDY IN SAFE AUTONOMOUS DRIVING AND OBSTACLE AVOIDANCE
In the following section, we implement our algorithmic architecture to safely learn to drive a vehicle while avoiding obstacles. This learning occurs within the same design as our battery case study: namely, we begin with zero model knowledge and only a single known safe control input. We fit a data-driven model to the dynamics and conduct receding-horizon control.
This study is designed with specific decisions in mind to more effectively reveal the efficacy of our algorithm. Some of these decisions make our study somewhat unrealistic insofar as they expose the agent to greater danger than necessary. The following subsections discuss these decisions in more detail.

A. Dynamical Simulator
In this case study, we utilize a bicycle model for the vehicle dynamics. This environment is encoded in the following equations, discretized via forward Euler approximation:

$$x_{1;t+1} = x_{1;t} + \Delta t \, (x_{4;t} \cos(x_{3;t})) \tag{55}$$
$$x_{2;t+1} = x_{2;t} + \Delta t \, (x_{4;t} \sin(x_{3;t})) \tag{56}$$
$$x_{3;t+1} = x_{3;t} + \Delta t \, \frac{x_{4;t} \tan(u_{2;t})}{L} \tag{57}$$
$$x_{4;t+1} = x_{4;t} + \Delta t \, u_{1;t} \tag{58}$$

where $t$ is the current timestep, $x_1$ and $x_2$ are the x-y position of the vehicle, $x_3$ is the vehicle heading angle, $x_4$ is the vehicle velocity, $u_1$ is the acceleration input (in m/s$^2$), and $u_2$ is the steering angle input in radians. These equations represent the true plant, which is unknown to our learning-based controller.
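A simulator step for this bicycle model can be sketched as follows. The position and heading updates follow (55)-(57); the velocity update `speed + dt * accel` is our assumption based on $u_1$ being the acceleration input, and the values of `dt` and `wheelbase` (the length $L$) are hypothetical.

```python
import numpy as np

def bicycle_step(x, u, dt=0.1, wheelbase=2.5):
    """One forward-Euler step of the kinematic bicycle model.

    x = [x-position, y-position, heading, speed], u = [acceleration, steering].
    dt and wheelbase are illustrative values, not taken from the paper.
    """
    px, py, heading, speed = x
    accel, steer = u
    return np.array([
        px + dt * speed * np.cos(heading),
        py + dt * speed * np.sin(heading),
        heading + dt * speed * np.tan(steer) / wheelbase,
        speed + dt * accel,                 # assumed velocity update
    ])

# Drive straight for one step, then apply a left steering input.
x_a = bicycle_step(np.array([0.0, 0.0, 0.0, 1.0]), np.array([0.0, 0.0]))
x_b = bicycle_step(x_a, np.array([1.0, 0.3]))
```

In the case study this function plays the role of the true plant: the controller only sees the transitions it produces and never the equations themselves.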

B. Model Predictive Control Formulation
We utilize the following formulation of simple autonomous driving with obstacle avoidance: minimize subject to:

Here, $Z(x_k)$ is the obstacle barrier function, which we limit to be smaller than a specified value (corresponding to the definition of the edge of the obstacle). Residuals in the DRO algorithm are with respect to this barrier function using predicted values of the dynamical state, as opposed to the value of the obstacle function obtained with the true state. We create the driving environment defined by $Z(x_k)$ by generating and summing random Gaussians in 2 dimensions. Then, we define the obstacle boundaries by setting a threshold within the static map, below which lies the safe region and above which lie the obstacles. This map is used with interpolation during the final experiment. If this constraint is violated, the agent will take actions which minimize constraint violation until feasibility is restored. We set $u_{\min} = [-1, -0.75]$ and $u_{\max} = -u_{\min}$. The experiment terminates once the vehicle leaves the $100 \times 100$ meter space.
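The map construction described above (summing random 2-D Gaussians, thresholding, and interpolating) can be sketched as below. The number of Gaussians, their width, the threshold, and the function names `make_obstacle_map` and `barrier` are all hypothetical; bilinear interpolation is one reasonable choice, since the text does not state which interpolation is used.

```python
import numpy as np

def make_obstacle_map(size=100, n_gaussians=15, width=8.0, seed=0):
    """Static map Z on a size x size grid, built by summing random Gaussians."""
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    z = np.zeros((size, size))
    for cx, cy in rng.uniform(0, size, size=(n_gaussians, 2)):
        z += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * width ** 2))
    return z

def barrier(z_map, pos):
    """Bilinear interpolation of the (square) map at a continuous position."""
    n = z_map.shape[0] - 1
    x, y = np.clip(pos, 0.0, n)
    i0, j0 = int(np.floor(x)), int(np.floor(y))
    i1, j1 = min(i0 + 1, n), min(j0 + 1, n)
    a, b = x - i0, y - j0
    return ((1 - a) * (1 - b) * z_map[i0, j0] + a * (1 - b) * z_map[i1, j0]
            + (1 - a) * b * z_map[i0, j1] + a * b * z_map[i1, j1])

z_map = make_obstacle_map()
threshold = 0.5                       # hypothetical obstacle boundary level
obstacle_mask = z_map > threshold     # True where an obstacle is present
```

With this convention, the nominal constraint at a predicted position is `barrier(z_map, pos) - threshold <= 0`, and the DRO offset tightens it further.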
With the learned neural network dynamics models, the MPC formulation in (59-61) becomes: minimize subject to:

Table III includes relevant parameters of our case study design.
In this case study, we simply use 1-step residuals by relying on a basic assumption that the modeling error is uncorrelated with the depth of prediction. Based on our experiments, this assumption is reasonable. We make a deliberate choice for this objective function for a host of reasons. While it necessarily encodes our intended behavior, it also is simple and at odds with the preeminent objective of avoiding obstacles. Normally, we might want to encode additional considerations as constraints. However, by allowing our simple objective function to drive the vehicle directly towards the obstacles, our control algorithm must be capable of managing the vehicle while simultaneously maintaining safety throughout most of the experiment. Thus, this case study is designed to specifically focus on the added safety contributions from the DRO framework.
For our learned model, we initialize a feed-forward neural network based on a single hidden layer with 10 neurons. The hidden layer uses sigmoid activation functions, and the output layer uses linear activation. At the first timestep, we assume control inputs of a zero vector are known to be safe. To solve the MPC problem, we use the same $(1 + \lambda)$ evolutionary strategy used in our battery case study. In this case, we modify the optimization algorithm such that we utilize 750,000 mutants. We also increase the maximum prediction horizon to $N_{\max} = 12$ to improve the consistency of our results.
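The network architecture described here (one sigmoid hidden layer with 10 neurons and a linear output layer) corresponds to the following forward pass. This is a sketch under assumptions: the input is taken to be the concatenated state and control (4 + 2 values) and the output the predicted next state, the initialization scale is arbitrary, and training is omitted. The names `init_network` and `predict` are hypothetical.

```python
import numpy as np

def init_network(n_in=6, n_hidden=10, n_out=4, seed=0):
    """Random initial weights for a single-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": 0.1 * rng.standard_normal((n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": 0.1 * rng.standard_normal((n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def predict(params, state, control):
    """Forward pass: sigmoid hidden layer followed by a linear output layer."""
    z = np.concatenate([state, control])
    hidden = 1.0 / (1.0 + np.exp(-(params["W1"] @ z + params["b1"])))
    return params["W2"] @ hidden + params["b2"]

params = init_network()
x_next_hat = predict(params, np.array([0.0, 0.0, 0.0, 1.0]), np.array([0.0, 0.0]))
```

A model of this size is cheap to evaluate, which matters when the solver rolls it forward for hundreds of thousands of candidate input sequences per timestep.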

C. Results
Much like our battery case study, we conduct 10 individual runs with both our algorithm and a non-robust version. Figures 4 and 5 show runs 1 and 3, respectively. Table IV shows the safety statistics from the total set of experiments.
We observe marked improvements to safety with use of our DRO algorithm. With the DRO controller, only 1 of the 10 total runs violates constraints at all, and only during a single timestep. The overall violation with the DRO controller is 0.0623% of timesteps. Moreover, the magnitude of the violation with the DRO controller is equivalent to the vehicle skimming the edge of the boundary by less than 0.0386 meters. Conversely, the non-robust controller shows significant constraint violation in nearly all 10 runs. The constraint violation of the non robust

To verify the model is operating in nonlinear portions of the state space, Figure 6 shows the range of the variable $x_3$ throughout experiment 1.
Fig. 6. Heading angle trajectory for run 1 (same as that shown in Figure 4). The total range of heading angles is nearly $\pi$, showing exploration of highly nonlinear portions of the state space. The feasible range of steering angle input also covers a range of nonlinear behavior in the dynamics.

VII. DISCUSSION
In these case studies, we have not only explored the behavior of our algorithm at the boundary of available knowledge and data, but have validated the theoretical safety properties of our approach under the most challenging arena of its applicability. Importantly, our approach is widely relevant in many LbC contexts. For real-world applications, we are unlikely to conduct this restrictive type of tabula-rasa LbC. However, the same safety guarantees we have rigorously validated in these case studies are similarly applicable when more data and knowledge are available (e.g., conventional adaptive control, but with the modeling capacity of nonlinear machine-learning models).
Since our approach functions as an end-to-end LbC method, it is amenable to more unconventional applications, including control synthesis from images or any sort of state embedding [44]. Since we leverage black-box modeling to predict state transitions, as long as we can formulate constraints from the available state representation, we can apply our method for LbC with probabilistic safety guarantees. We relegate exploration of our method for embedding-based LbC to future work.

VIII. CONCLUSION
This paper presents an end-to-end distributionally robust model-based control algorithm. It addresses the problem of safety during learning-based control with strong limitations on our available knowledge and subject matter expertise. We adopt a stochastic MPC formulation where we augment constraints with random variables corresponding to empirical distributions of modeling residuals. By applying Wasserstein ambiguity sets to optimize over the worst-case modeling error, we translate an out-of-sample safety guarantee subject to new data and experience. We validate this finding through simulation experiments. This method is applicable to nonlinear MPC, and when applied to convex MPC programs it preserves convexity of the optimization program.
Our results provide the basis for several meaningful insights. It is clear that the supporting research for Wasserstein ambiguity sets provides an ideal base for its application to learning-based control. Our numerical experiments indicate our approach is highly effective at providing probabilistic safety guarantees even in challenging cases of online learning-based control nearly from scratch.

we see trivially that the feasible region defined by (70-73) is identical to that defined solely by (73). This pattern continues for any $m \in \mathbb{N}$ of $R \in \mathbb{R}^m$.

Evolutionary Strategies and Random Search
In our paper, we utilize a $(1 + \lambda)$ evolutionary strategy to approximately solve the numerical MPC optimization program. This is a form of random search, where instead of utilizing gradients for optimization, we utilize a random strategy to iteratively test mutations of our initial guess until converging to a reasonable, approximately optimal solution. This is a subset of what is generally referred to as a $(\mu/\rho + \lambda)$ evolutionary strategy, whose precise definition can be referenced in [45]. A $(\mu/\rho + \lambda)$ evolutionary strategy is a very simple form of a genetic algorithm, whereby at each generation/iteration of optimization, we have some number of "parents" who are mutated, and the parents are replaced by the highest performing mutated offspring. Random search has been shown to be a highly effective method for solving optimization problems in reinforcement learning literature [46]. Random search is also highly amenable to constrained optimization (without equality constraints), as infeasible mutants can be pruned from selection. Furthermore, if no feasible mutants are found, the mutant that least violates the constraint boundary can be used as a default if additional computation is undesirable.
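One generation of such a strategy, including the pruning of infeasible mutants and the least-violation fallback, can be sketched as follows. This is an illustrative sketch rather than the solver used in the experiments: the function name `one_plus_lambda_step`, the mutation scale `sigma`, and the convention that `violation(u) <= 0` means feasible are all assumptions, and `cost` and `violation` stand in for rollouts of the learned model.

```python
import numpy as np

def one_plus_lambda_step(u_parent, cost, violation, n_mutants=1000,
                         sigma=0.1, seed=0):
    """One generation of a (1 + lambda) evolutionary strategy.

    The parent competes with its Gaussian mutants. Infeasible candidates
    (violation > 0) are pruned; if none is feasible, the candidate with the
    smallest violation is returned instead.
    """
    rng = np.random.default_rng(seed)
    u_parent = np.asarray(u_parent, dtype=float)
    mutants = u_parent + sigma * rng.standard_normal((n_mutants, u_parent.size))
    candidates = np.vstack([u_parent, mutants])

    viol = np.array([violation(c) for c in candidates])
    feasible = viol <= 0.0
    if feasible.any():
        pool = candidates[feasible]
        costs = np.array([cost(c) for c in pool])
        return pool[np.argmin(costs)]
    return candidates[np.argmin(viol)]      # least-violation fallback

# Toy problem: minimize ||u||^2 subject to u[0] >= 0.5.
cost_fn = lambda u: float(np.sum(u ** 2))
viol_fn = lambda u: 0.5 - u[0]
u_best = one_plus_lambda_step([1.0, 1.0], cost_fn, viol_fn)
```

Warm-starting `u_parent` from the previous timestep's solution, as done in the case studies, keeps the search local and makes a single generation with many mutants a workable approximation.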

Slow Model Adaptation
To accommodate potential cases where the true plant dynamics change slowly over time, we can adopt the following approach, which preserves the safety guarantees of the Wasserstein DRO framework. We have system dynamics $x \in \mathbb{R}^n$ with no finite escape time. Furthermore, $g(x, u, \theta^*) \le 0$ is our constraint function. Suppose it holds that the function $g$ behaves

where $\theta^*_t$ is the parameterization of the true plant at time $t$, and $\theta_t$ is the learned model at time $t$. If we add a value to the residual $R(t)$

we accommodate worst-case model adaptation in our algorithm. This scheme, coupled with a judiciously designed moving window of residuals, can accommodate model adaptation in the true underlying plant. This provides a conservative but robust means to address additional model adaptation throughout the learning process. Ideally, the bound $C$ on the change of the constraint function is small, meaning the true plant changes gradually over time. In this case, the additional offset will present a relatively small additional contribution to the overall robust offset.

Visualization of DRO Offset and PoE Demonstration
To visualize both an added PoE component and the DRO offset, we run the following additional experiment, plotting the evolution of the offset throughout time.Here, we consider a set N of additive Uniform noise to the control input capped at ±5. Figure 7 shows these results.

Fig. 2. Comparison of nonlinear MPC controller with and without DRO for lithium-ion battery fast charging. Run 1 is shown here.

Fig. 3. Comparison of nonlinear MPC controller with and without DRO for lithium-ion battery fast charging. Run 4 is shown here.

Fig. 4. Comparison of nonlinear MPC controller with and without DRO for vehicle obstacle avoidance. In this run, the DRO controller does not violate the constraints at all. This figure shows run 1, with the bottom plots revealing close-ups of the areas with the highest constraint violation.

Fig. 5. Comparison of nonlinear MPC controller with and without DRO for vehicle obstacle avoidance. This figure shows run 3, with the bottom plots revealing close-ups of the areas with the highest constraint violation.

Fig. 7. Battery experiment showing time evolution of the DRO offset and added PoE component. The PoE component adds noise to the input signal while maintaining probabilistic feasibility. We cap the DRO offset at $r_{DRO;\max} = 0.4$; the uncapped value temporarily reached 14.24, which would create an empty feasible set. Remark 8 in Section IV.F describes how implementation works when the DRO feasible set is ostensibly empty.

TABLE I
SAFETY, COMPUTATIONAL, AND PERFORMANCE COMPARISON FOR DRO-MPC AND MPC WITH BATTERY FAST CHARGING. ACTIVATION OF THE DRO OFFSET BEGINS AT minResidNum = 2.

TABLE IV
SAFETY COMPARISON FOR DRO-MPC AND MPC WITH VEHICLE OBSTACLE AVOIDANCE. THE MAX VIOLATION IS IN TERMS OF THE EUCLIDEAN DISTANCE. THE NUMBERS IN PARENTHESES ARE THE TOTAL NUMBER OF TIMESTEPS WHERE CONSTRAINTS ARE VIOLATED, WITH THE DENOMINATOR BEING THE NUMBER OF TIMESTEPS BEFORE THE VEHICLE LEAVES THE 100 × 100 SIZED ENVIRONMENT.