Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations. We assume that a model of the system dynamics and a state estimator are available along with corresponding error bounds, e.g., estimated from data in practice. We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety, as defined through controlled forward invariance of a safe set. We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior, e.g., data collected from a human operator or an expert controller. We show that, when the parametrization of the ROCBF is linear, the optimization problem is convex under mild assumptions. Along with the optimization problem, we provide verifiable conditions in terms of the density of the data, the smoothness of the system model and state estimator, and the size of the error bounds that guarantee validity of the obtained ROCBF. Towards obtaining a practical control algorithm, we propose an algorithmic implementation of our theoretical framework that accounts in practice for the assumptions made in our framework. We validate our algorithm in the autonomous driving simulator CARLA and demonstrate how to learn safe control laws from simulated RGB camera images.


Introduction
Safety-critical systems rely on robust control laws that can account for uncertainties in system dynamics and state estimation. For example, consider an autonomous car equipped with noisy sensors that navigates through urban traffic [1]. The state of the car is not exactly known and is estimated from output measurements, e.g., from a dashboard camera, while the dynamics of the car are not perfectly known either, e.g., due to unknown friction coefficients. A model of the system dynamics and a state estimator can usually be obtained, e.g., from first principles or estimated from data. Constructions of CBFs are summarized in [34]. The work in [35] considers the construction of higher order CBFs and their composition by, similarly to [32,33], alternating-descent heuristics to solve the arising bilinear SOS program. Such SOS-based approaches, however, are known to be limited in scalability and do not use potentially available expert demonstrations.
A promising research direction is to learn CBFs from data. The authors in [36] construct CBFs from safe and unsafe data using support vector machines, while the authors in [37] learn a set of linear CBFs for clustered datasets. The authors in [38] proposed learning limited duration CBFs, and the work in [39] learns signed distance fields that define a CBF. In [40], a neural network controller is trained episodically to imitate an already given CBF. The authors in [41] learn parameters associated with the constraints of a CBF to improve feasibility. These works present empirical validations, but no formal correctness guarantees are provided. The authors in [42][43][44][45] propose counter-example guided approaches to learn Lyapunov and barrier functions for known closed-loop systems, while Lyapunov functions for unknown systems are learned in [46]. In [47][48][49], control barrier functions are learned and post-hoc verified, e.g., using Lipschitz arguments and satisfiability modulo theory, while [50] uses a counter-example guided approach. As opposed to these works, we make use of safe expert demonstrations. Expert trajectories are utilized in [51] to learn a contraction metric along with a tracking controller, while motion primitives are learned from expert demonstrations in [52]. In our previous work [53], we proposed to learn CBFs for known nonlinear systems from expert demonstrations. We provided the first conditions that ensure correctness of the learned CBF using Lipschitz continuity and covering number arguments. In [54] and [55], we extended this framework to partially unknown hybrid systems. In this paper, we focus on state estimation and provide sophisticated simulations of our method in CARLA.

Contributions
In this paper, we learn safe output feedback control laws for unknown systems. We first present robust output control barrier functions (ROCBFs) to establish safety under system dynamics and state estimation uncertainties. We then formulate a constrained optimization problem for constructing ROCBFs from safe expert demonstrations, and we present verifiable conditions that guarantee the validity of the ROCBF. While the optimization problem is in general nonconvex, we identify conditions under which the problem is convex. For the general case, we propose an approximate unconstrained optimization problem that we can solve efficiently. Finally, we propose an algorithmic implementation of our theoretical framework to learn ROCBFs in practice, and we present an empirical validation in CARLA [56].
In contrast to our previous works [53][54][55], in which we assume perfect state knowledge, we focus on dealing with state estimation errors. Our paper additionally differs from [53][54][55] in its practical focus. We discuss the algorithmic implementation of our framework to account for assumptions of our work in practice. For instance, our framework crucially relies on obtaining "unsafe" data, which is hard to obtain in practice, and we propose a new algorithm to obtain unsafe datapoints as boundary points from the set of safe expert demonstrations based on reverse k-nearest neighbors.

Background and Problem Formulation
At time t ∈ R≥0, let x(t) ∈ Rⁿ be the state of the dynamical control system described by the set of equations

ẋ(t) = F(x(t), t) + G(x(t), t)u(t), (1a)
x(0) := x₀, (1b)

where x₀ ∈ Rⁿ is the initial condition. The functions F : Rⁿ × R≥0 → Rⁿ and G : Rⁿ × R≥0 → Rⁿˣᵐ are only partially known, e.g., due to unmodeled dynamics or noise, and locally Lipschitz continuous in the first and piecewise continuous and bounded in the second argument.

Assumption 1. We assume known nominal models F̂ : Rⁿ × R≥0 → Rⁿ and Ĝ : Rⁿ × R≥0 → Rⁿˣᵐ together with functions ∆_F : Rⁿ × R≥0 → R≥0 and ∆_G : Rⁿ × R≥0 → R≥0 that bound their respective errors as

∥F(x, t) − F̂(x, t)∥ ≤ ∆_F(x, t), ∥G(x, t) − Ĝ(x, t)∥ ≤ ∆_G(x, t).

The functions F̂(x, t), Ĝ(x, t), ∆_F(x, t), and ∆_G(x, t) are assumed to be locally Lipschitz continuous in the first and piecewise continuous and bounded in the second argument.
The models F̂(x, t) and Ĝ(x, t) may be obtained by identifying model parameters or by system identification [57], while the assumption of error bounds ∆_F(x, t) and ∆_G(x, t) is standard in robust control [2]. We now define the sets of admissible system dynamics as

F(x, t) := {F̃ ∈ Rⁿ : ∥F̃ − F̂(x, t)∥ ≤ ∆_F(x, t)}, G(x, t) := {G̃ ∈ Rⁿˣᵐ : ∥G̃ − Ĝ(x, t)∥ ≤ ∆_G(x, t)}.

The output measurement map Y : Rⁿ → Rᵖ is only partially known and locally Lipschitz continuous. For instance, Y can describe a dashboard camera that is hard to model. We assume that there exists an inverse yet unknown map X : Rᵖ → Rⁿ that recovers the state x as X(Y(x)) = x. This means that a measurement y uniquely defines a corresponding state x, and it implies that p ≥ n. This way, we implicitly assume high-dimensional measurements y, such as from a dashboard camera, where the inverse map X recovers the position of the system, or even its velocity when a sequence of camera images is available. Since Y and X are unknown, one cannot, however, recover the state x from y. We present an example in the simulation study and refer to related literature using similar assumptions, such as [15,58].

Assumption 2. We assume to have a known model X̂ : Rᵖ → Rⁿ together with a function ∆_X : Rᵖ → R≥0 that bounds the error as

∥X(y) − X̂(y)∥ ≤ ∆_X(y).

The functions X̂(y) and ∆_X(y) are assumed to be locally Lipschitz continuous.
The state estimator X̂(y) and the error bound ∆_X(y) may be obtained using machine learning methods, see e.g., [15,58], or X̂(y) can encode the extended Kalman filter together with ∆_X(y), see e.g., [59,60]. We now define the set of admissible inverse output measurement maps as

X(y) := {x ∈ Rⁿ : ∥x − X̂(y)∥ ≤ ∆_X(y)}.

Finally, the function U : Rᵖ × R≥0 → U is the output feedback control law, where U ⊆ Rᵐ encodes input constraints. System (1) is illustrated in Fig. 1. Let a solution to (1) under an output feedback control law U(y, t) be x : I → Rⁿ, where I ⊆ R≥0 is the maximum definition interval of x. The goal in this paper is to learn an output feedback control law U(y, t) such that prescribed safety properties with respect to a geometric safe set S ⊆ Rⁿ are met by the system in (1). By geometric safe set, we mean that S describes the set of safe states as naturally specified on a subset of the state space (e.g., to avoid collision, vehicles must maintain a minimum separating distance).

Definition 1. A set C ⊆ Rⁿ is said to be robustly output controlled forward invariant with respect to the system in (1) if there exists an output feedback control law U(y, t) such that, for all initial conditions x(0) ∈ C, for all admissible system dynamics F(x, t) ∈ F(x, t) and G(x, t) ∈ G(x, t), and for all admissible inverse output measurement maps X(y) ∈ X(y), every solution x(t) to (1) under U(y, t) is such that: 1) x(t) ∈ C for all t ∈ I, and 2) the interval I is unbounded, i.e., I = [0, ∞). If the set C is additionally contained within the geometric safe set S, i.e., C ⊆ S, the system in (1) is said to be safe under the safe control law U(y, t).
Towards this goal, we assume a dataset of expert demonstrations consisting of N₁ input-output data pairs (y_i, u_i) ∈ Rᵖ × Rᵐ along with time stamps t_i ∈ R≥0,

Z_dyn := {(y_i, t_i, u_i)}ᵢ₌₁^{N₁},

that were recorded when the system was in a safe state X(y_i) ∈ int(S), where int(S) denotes the interior of the safe set S. We assume to have expert control inputs u_i available that can later be used for learning a safe control law. The pairs of expert demonstrations (y_i, t_i, u_i) have to be such that a system trajectory starting from a state x ∈ X(y_i) can be kept within the safe set S. If this were not the case, the later posed optimization problem (in equation (7)) would be infeasible. There are interesting observations as to what constitutes a "good" expert action u_i, see [53] for details.

Problem 1. Let the system in (1) and the set of safe expert demonstrations Z_dyn be given. Under Assumptions 1 and 2, learn a function h : Rⁿ → R from Z_dyn so that the set

C := {x ∈ Rⁿ : h(x) ≥ 0} (2)

is robustly output controlled forward invariant with respect to (1) and such that C ⊆ S, i.e., so that (1) is safe.
An overview of our proposed solution is shown in Fig. 2. We formulate a constrained optimization problem to learn a function h(x) so that the learned safe set C is robustly output controlled forward invariant and contained within the geometric safe set S, i.e., C ⊆ S. The optimization problem takes the system model M := (F̂, Ĝ, X̂, ∆_F, ∆_G, ∆_X) and the expert demonstrations {(y_i, u_i, t_i)} as inputs, and it imposes constraints on functions q and h(x) that will be derived in the sequel. We remark that the proofs of technical lemmas, propositions, and theorems can be found in the appendices. (This assumption is equivalent to X(Y) ⊇ C. The set X(Y) is typically the domain of interest in existing state-based CBF frameworks, see e.g., [4].)

Robust Output Control Barrier Functions
We recall that a function α : R → R is an extended class K function if it is strictly increasing with α(0) = 0. We can guarantee that C is robustly output controlled forward invariant if there exists a locally Lipschitz continuous extended class K function α : R → R such that, for all (y, t) ∈ Y × R≥0,

sup_{u∈U} inf_{x∈X(y)} inf_{F∈F(x,t), G∈G(x,t)} ⟨∇h(x), F + Gu⟩ + α(h(x)) ≥ 0, (3)

where ⟨•, •⟩ denotes the inner product between two vectors. Unfortunately, the condition (3) is difficult to evaluate due to the infimum operators. Towards a more tractable condition, we first define the function

B(x, t, u) := ⟨∇h(x), F̂(x, t) + Ĝ(x, t)u⟩ − ∥∇h(x)∥⋆ (∆_F(x, t) + ∆_G(x, t)∥u∥) + α(h(x)),

which lower bounds the inner infima in (3) by accounting for the worst-case model error, and we call a twice continuously differentiable function h(x) a robust output control barrier function (ROCBF) on Y if, for all (y, t) ∈ Y × R≥0,

sup_{u∈U} B(X̂(y), t, u) − Lip_B(y, t, u)∆_X(y) ≥ 0, (4)

where Lip_B(y, t, u) is a local Lipschitz constant of B in its first argument. Note that ROCBFs account for both system model and estimation error uncertainties. The standard CBF condition from [4] is recovered if the system is completely known, i.e., the sets F(x, t), G(x, t), and X(y) are singletons. Now define the set of safe control inputs induced by a ROCBF as

U_s(y, t) := {u ∈ U : B(X̂(y), t, u) − Lip_B(y, t, u)∆_X(y) ≥ 0}.

We next show that a control law U(y, t) ∈ U_s(y, t) renders the set C robustly output controlled forward invariant.

Theorem 1. Assume that h(x) is a ROCBF on a set Y such that Y ⊇ Y(C), and assume that the function U : Y × R≥0 → U is continuous in the first and piecewise continuous in the second argument and such that U(y, t) ∈ U_s(y, t). Then x(0) ∈ C implies x(t) ∈ C for all t ∈ I. If the set C is compact, it follows that C is robustly output controlled forward invariant under U(y, t), i.e., I = [0, ∞).
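To make the set U_s(y, t) concrete, the following Python sketch checks membership of candidate inputs for a hypothetical two-dimensional single integrator ẋ = u with nominal model F̂ = 0, Ĝ = I and h(x) = 1 − ∥x∥²; the error bounds, the linear α, and the surrogate Lipschitz constant are illustrative choices, not the paper's setup.

```python
import numpy as np

# Hypothetical 2-D single-integrator example: x_dot = u, with nominal model
# F_hat = 0, G_hat = I, error bounds Delta_F, Delta_G, and estimation error Delta_X.
# h(x) = 1 - ||x||^2 defines the candidate safe set C = {x : h(x) >= 0}.

def h(x):
    return 1.0 - np.dot(x, x)

def grad_h(x):
    return -2.0 * x

def B(x_hat, u, delta_F=0.05, delta_G=0.05, alpha=1.0):
    """Robust barrier term: nominal Lie derivative minus worst-case model error,
    plus the class-K term alpha * h (alpha linear here)."""
    g = grad_h(x_hat)
    nominal = np.dot(g, u)  # <grad h, F_hat + G_hat u> with F_hat = 0, G_hat = I
    robust = np.linalg.norm(g) * (delta_F + delta_G * np.linalg.norm(u))
    return nominal - robust + alpha * h(x_hat)

def in_U_s(x_hat, u, lip_B_hat=3.0, delta_X=0.01):
    """u is admissible if B minus the surrogate-Lipschitz estimation margin is >= 0."""
    return B(x_hat, u) - lip_B_hat * delta_X >= 0.0

x_hat = np.array([0.9, 0.0])     # state estimate near the boundary of C
u_away = np.array([-1.0, 0.0])   # points back toward the interior
u_toward = np.array([1.0, 0.0])  # points out of the safe set
print(in_U_s(x_hat, u_away), in_U_s(x_hat, u_toward))  # True False
```

As expected, the input steering toward the interior of C is admissible while the input steering outward is not.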

Learning ROCBFs from Expert Demonstrations
The previous section provides safety guarantees when h(x) is a ROCBF. However, one is still left with the potentially difficult task of constructing a twice continuously differentiable function h(x) such that (i) the set C defined in equation (2) is contained within the set S and has a sufficiently large volume, and (ii) it satisfies the barrier constraint (4) on an open set Y such that Y ⊇ Y(C). In fact, ensuring that a function h(x) satisfies the constraint (4) can involve verifying complex relationships between the vector fields F̂(x, t) and Ĝ(x, t), the state estimate X̂(y), the function h(x), and its gradient ∇h(x), while accounting for the error bounds ∆_X(y), ∆_F(x, t), and ∆_G(x, t).
This challenge motivates the approach taken in this paper, wherein we propose an optimization-based approach to learning a ROCBF from safe expert demonstrations.

The Datasets
We first define the finite set of safe datapoints

Z_safe := {X̂(y_i) : (y_i, t_i, u_i) ∈ Z_dyn}

as the projection of all datapoints y_i in Z_dyn via the state estimator X̂ into the state domain. For ϵ > 0, define the set of admissible states D ⊆ Rⁿ as

D := D′ \ bd(D′) with D′ := ∪_{x_i ∈ Z_safe} B_ϵ(x_i),

where B_ϵ(x_i) := {x ∈ Rⁿ : ∥x − x_i∥ ≤ ϵ} is the closed norm ball of size ϵ centered at x_i and where bd(•) denotes the boundary of a set. Conditions on ϵ will be specified later to ensure validity of the learned control law. The set D′ is the union of these ϵ norm balls, see Fig. 3 (left and centre). The set of admissible states D is equal to the set D′ without its boundary, so that D is open. Note that D is based on the expert demonstrations y_i via the state estimator X̂(y_i). The expert demonstrations y_i in Z_dyn define an ϵ-net of D. In other words, for each x ∈ D there exists a y_i with (y_i, t_i, u_i) ∈ Z_dyn such that ∥X̂(y_i) − x∥ ≤ ϵ. We additionally assume that D is such that D ⊆ S, which can easily be achieved by adjusting ϵ or by omitting y_i from Z_dyn in the definition of Z_safe when datapoints X̂(y_i) are close to bd(S). Note here that S is typically known as part of the safety specification. This additional requirement is necessary to later ensure safety in the sense that the learned safe set satisfies C ⊆ S.
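The ϵ-net property of Z_safe can be checked empirically by sampling near the datapoints and measuring the distance to the nearest element of Z_safe. A toy sketch (the grid datapoints are hypothetical; the real Z_safe comes from X̂(y_i)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical safe datapoints on a coarse grid in [0, 1]^2 with spacing 0.1.
eps = 0.15
grid = np.linspace(0.0, 1.0, 11)
Z_safe = np.array([[a, b] for a in grid for b in grid])

# Empirically check the eps-net property: every point drawn from a small
# neighborhood of some x_i must lie within eps of the set Z_safe.
samples = Z_safe[rng.integers(0, len(Z_safe), 500)] + \
          rng.uniform(-eps / np.sqrt(2), eps / np.sqrt(2), size=(500, 2))
dists = np.linalg.norm(samples[:, None, :] - Z_safe[None, :, :], axis=2).min(axis=1)
print(dists.max() <= eps)  # True: Z_safe is an eps-net of these samples
```

The same nearest-neighbor distance computation also yields an empirical estimate of ϵ for a given dataset.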
We define the set of admissible output measurements as

Y := {Y(x) : x ∈ D},

i.e., as the projection of the set D under the unknown output measurement map Y. We remark that the set Y, illustrated in Fig. 3 (left), is consequently also unknown. Note, however, that the set Y is open, as required in Theorem 1.
For σ > 0, we define the set of unsafe labeled states as

N := (D′ ⊕ B_σ(0)) \ D,

where ⊕ is the Minkowski sum operator. The set N should be thought of as a layer of width σ surrounding the set D, see Fig. 3 (right) for a graphical depiction. As will be made clear in the sequel, by enforcing that the value of the learned function h(x) is negative on N, we ensure that the set C (defined as the zero-superlevel set of h(x)) is contained within D, and hence also within S. This is why we refer to N as the set of unsafe labeled states. To ensure that h(x) < 0 for all x ∈ N, we assume that points are sampled from N such that Z_N forms an ϵ_N-net of N, i.e., for each x ∈ N there exists an x_i ∈ Z_N such that ∥x − x_i∥ ≤ ϵ_N. Conditions on ϵ_N will be specified later. We emphasize that no control inputs u_i are needed for the samples in Z_N as these points are not generated by the expert and are instead obtained by computational methods such as gridding or uniform sampling (see Section 5 for details). While the definition of the set C in (2) is specified over all of Rⁿ, i.e., the definition of C considers all x ∈ Rⁿ such that h(x) ≥ 0, we make a minor modification to this definition in order to restrict the domain of interest to N ∪ D as

C := {x ∈ N ∪ D : h(x) ≥ 0}. (5)

This restriction is natural, as we are learning a function h(x) from data sampled only over N ∪ D. The size of the set D affects the size of the set C, i.e., C may be conservative if only few expert demonstrations are available; e.g., consider Fig. 3 but with fewer expert demonstrations, in which case the green and grey regions would simply shrink.
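When membership in D′ can only be tested through the datapoints themselves, samples for Z_N can be generated by rejection sampling from the σ-layer around D′. A toy sketch with hypothetical data (the disk-shaped safe region and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: safe datapoints fill the unit disk; D' is the union of
# eps-balls around them, and N is a layer of width sigma around D'.
eps, sigma = 0.1, 0.2
Z_safe = rng.uniform(-1.0, 1.0, size=(2000, 2))
Z_safe = Z_safe[np.linalg.norm(Z_safe, axis=1) <= 1.0]

def dist_to_Z_safe(x):
    return np.linalg.norm(Z_safe - x, axis=1).min()

# Rejection sampling: keep points whose distance to Z_safe lies in (eps, eps + sigma],
# i.e., points outside D' but inside its sigma-dilation (the Minkowski-sum layer).
Z_N = []
while len(Z_N) < 200:
    x = rng.uniform(-1.5, 1.5, size=2)
    if eps < dist_to_Z_safe(x) <= eps + sigma:
        Z_N.append(x)
Z_N = np.array(Z_N)
```

Here the distance to the point cloud Z_safe serves as a proxy for the distance to D′; Section 5 describes the boundary point detection used in practice when even this layer cannot be sampled directly.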

The Constrained Optimization Problem
We first state the constrained optimization problem for learning valid ROCBFs, and then provide conditions in Section 4.3 under which a feasible solution is a valid ROCBF.
Let H be a normed function space of twice continuously differentiable functions h : Rⁿ → R. Define

q(y, t, u) := B(X̂(y), t, u) − L̂ip_B(y, t, u)∆_X(y) (6)

analogously to (4), but using a known surrogate function L̂ip_B(y, t, u) in place of the Lipschitz constant Lip_B(y, t, u). The function L̂ip_B(y, t, u) will be a hyperparameter in our algorithm, as discussed in Section 5, and will be adjusted to ensure that L̂ip_B(y, t, u) ≥ Lip_B(y, t, u).
We formulate the following constrained optimization problem to learn a ROCBF from expert demonstrations:

min_{h∈H} ∥h∥ (7a)
s.t. h(x_i) ≥ γ_safe for all x_i ∈ Z̄_safe, (7b)
h(x_i) ≤ −γ_unsafe for all x_i ∈ Z_N, (7c)
q(y_i, t_i, u_i) ≥ γ_dyn for all (y_i, t_i, u_i) ∈ Z_dyn, (7d)

where the set Z̄_safe is a subset of Z_safe, i.e., Z̄_safe ⊆ Z_safe, as detailed in the next section, and where γ_safe, γ_unsafe, γ_dyn > 0 are hyperparameters. Instead of global hyperparameters γ_safe, γ_unsafe, and γ_dyn, one can use individual hyperparameters for each datapoint. Note that expert demonstrations (y_i, t_i, u_i) indicate feasibility of the control problem at hand, and hence indicate feasibility of (7).
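For intuition, the following sketch instantiates constraints of the form (7b)-(7d) for a toy one-dimensional system with linear parametrization h(x) = ⟨ϕ(x), θ⟩, linear α, dynamics ẋ = u, and (for illustration only) vanishing error bounds, in which case all constraints are linear in θ and feasibility can be checked with an off-the-shelf LP solver. The datapoints and the expert rule u = −x are hypothetical, and the paper's objective over H is replaced by a pure feasibility check.

```python
import numpy as np
from scipy.optimize import linprog

# Toy 1-D sketch: h(x; theta) = <phi(x), theta>, phi(x) = [1, x, x^2],
# alpha(r) = r, x_dot = u, and vanishing error bounds (illustration only).
phi = lambda x: np.array([1.0, x, x * x])
dphi = lambda x: np.array([0.0, 1.0, 2.0 * x])
g_safe, g_unsafe, g_dyn = 0.5, 0.25, 0.5

safe_x = [-0.5, 0.0, 0.5]
unsafe_x = [-1.2, 1.2]
expert = [(x, -x) for x in safe_x]          # hypothetical expert input u = -x

A_ub, b_ub = [], []
for x in safe_x:                            # (7b):  h(x_i) >= gamma_safe
    A_ub.append(-phi(x)); b_ub.append(-g_safe)
for x in unsafe_x:                          # (7c):  h(x_i) <= -gamma_unsafe
    A_ub.append(phi(x)); b_ub.append(-g_unsafe)
for x, u in expert:                         # (7d):  <grad h, u> + alpha(h) >= gamma_dyn
    A_ub.append(-(u * dphi(x) + phi(x))); b_ub.append(-g_dyn)

res = linprog(c=np.zeros(3), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-5, 5)] * 3)         # feasibility LP over a bounded theta box
print(res.status)  # 0 means a feasible theta was found
```

A bowl-shaped certificate such as h(x) = 1 − x² is feasible here, so the LP terminates successfully.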
With increasing sizes of the uncertainty sets F(x, t), G(x, t), and X (y), the optimization problem (7) may however become infeasible.

Conditions guaranteeing learned safe ROCBFs
We now derive conditions under which a feasible solution to the constrained optimization problem (7) is a safe ROCBF.

Guaranteeing C ⊂ D ⊆ S
We begin by establishing the requirement that C ⊂ D ⊆ S. First note that constraint (7b) ensures that the set C, as defined in equation (5), has non-empty interior when Z̄_safe ≠ ∅. We next state conditions under which the constraint (7c) ensures that the learned function h from (7) satisfies h(x) < 0 for all x ∈ N, which in turn ensures that C ⊂ D ⊆ S.

Proposition 1. Let h(x) be Lipschitz continuous with local Lipschitz constant Lip_h(x_i) within the set B_{ϵ_N}(x_i) for datapoints x_i ∈ Z_N. Let γ_unsafe > 0, let Z_N be an ϵ_N-net of N, and let

Lip_h(x_i) < γ_unsafe/ϵ_N (8)

for all x_i ∈ Z_N. Then, the constraint (7c) ensures that h(x) < 0 for all x ∈ N.

In summary, Proposition 1 says that a larger Lipschitz constant of the function h requires a larger margin γ_unsafe and/or a finer net of unsafe datapoints as indicated by ϵ_N.
We next discuss the choice of Z̄_safe. Assume first that Z̄_safe = Z_safe in constraint (7b). In this case, the constraints (7b) and (7c), as well as the condition in (8) of Proposition 1, may be conflicting, leading to infeasibility of the optimization problem (7). This infeasibility arises from the fact that we are simultaneously asking the value of h(x) to vary from γ_safe to −γ_unsafe over a short distance of at most ϵ + ϵ_N while having a small Lipschitz constant. In particular, as posed, the constraints require that |h(x_s) − h(x_u)| ≥ γ_safe + γ_unsafe for safe and unsafe samples x_s ∈ Z_safe and x_u ∈ Z_N, respectively, but the sampling requirements (Z_safe and Z_N being ϵ- and ϵ_N-nets of D and N, respectively) imply that ∥x_s − x_u∥ ≤ ϵ_N + ϵ for at least some pair (x_s, x_u), which in turn implies that

Lip_h(x_u) ≥ (γ_safe + γ_unsafe)/(ϵ + ϵ_N).

The local Lipschitz constant Lip_h(x_u) may hence get too large if γ_safe and γ_unsafe are chosen to be too large, and we may exceed the required upper bound γ_unsafe/ϵ_N in equation (8). We address this issue as follows: for fixed γ_safe, γ_unsafe, and desired Lipschitz constant L_h < γ_unsafe/ϵ_N, we define

Z̄_safe := {x_s ∈ Z_safe : ∥x_s − x_u∥ ≥ (γ_safe + γ_unsafe)/L_h for all x_u ∈ Z_N}, (9)

which corresponds to a subset of admissible safe states, i.e., Z̄_safe ⊂ Z_safe. Intuitively, this introduces a buffer region across which h(x) can vary in value from γ_safe to −γ_unsafe for the desired Lipschitz constant L_h. Enforcing (7b) over Z̄_safe allows for smoother functions h(x) to be learned at the expense of a smaller invariant safe set C. Note finally that, if h(x) is such that C ⊂ D (i.e., under the conditions in Proposition 1), then Y(C) ⊆ Y.
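The buffer construction can be sketched as a simple distance filter on the safe datapoints: keep only those far enough from every unsafe-labeled point that h can fall from γ_safe to −γ_unsafe at slope at most L_h. The toy datasets and the Euclidean distances below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Retain only safe datapoints at distance >= (g_safe + g_unsafe) / L_h
# from every unsafe-labeled point (toy data, Euclidean norm).
g_safe, g_unsafe, L_h = 0.3, 0.3, 2.0
buffer_width = (g_safe + g_unsafe) / L_h     # = 0.3

Z_safe = rng.uniform(0.0, 1.0, size=(400, 2))
Z_N = np.array([[x, 0.0] for x in np.linspace(0, 1, 21)])  # unsafe layer along y = 0

d_to_unsafe = np.linalg.norm(Z_safe[:, None, :] - Z_N[None, :, :], axis=2).min(axis=1)
Z_safe_bar = Z_safe[d_to_unsafe >= buffer_width]
print(len(Z_safe_bar), "of", len(Z_safe), "safe points retained")
```

Safe points within the buffer band near the unsafe layer are dropped, trading a smaller certified safe set for a smoother learnable h.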

Increasing the volume of C
We next explain how to avoid learning a safe set C consisting of many disconnected sets, which would not be practical, and show simultaneously how to increase the volume of C. Let

D̄ := ∪_{x_i ∈ Z̄_safe} B_ϵ(x_i)

and note that Z̄_safe is an ϵ-net of D̄ by definition. We next show conditions under which h(x) ≥ 0 for all x ∈ D̄.

Proposition 2. Let h(x) be Lipschitz continuous with local Lipschitz constant Lip_h(x_i) within the set B_ϵ(x_i) for datapoints x_i ∈ Z̄_safe. Let γ_safe > 0 and let

Lip_h(x_i) ≤ γ_safe/ϵ (10)

for all x_i ∈ Z̄_safe. Then, the constraint (7b) ensures that h(x) ≥ 0 for all x ∈ D̄.
The previous result can be used to guarantee that the set C defined in equation (5) contains the set D̄, i.e., D̄ ⊆ C. Hence, the set D̄ can be seen as the minimum volume of the set C that we can guarantee. Note that, under the provided conditions, it holds that C satisfies D̄ ⊆ C ⊂ D ⊆ S.
We note that the amount of data needed to satisfy conditions (8) and (10) in Propositions 1 and 2 grows exponentially with the dimension n, see e.g., [61, Section 4.2].

Guaranteeing that h(x) is a ROCBF
Propositions 1 and 2 guarantee that the level-sets of the learned function h(x) satisfy the desired geometric safety properties. We now derive conditions that ensure that h(x) is a ROCBF, i.e., that the ROCBF constraint (4) is also satisfied.
To satisfy constraint (4) for each (y, t) ∈ Y × R≥0, there must exist a control input u ∈ U such that q(y, t, u) ≥ 0. We follow a similar idea as in Propositions 1 and 2 and note in this respect that the y components of Z_dyn form an ε-net of Y (see Appendix D for a proof), where

ε := Lip_Y (ϵ + 2∆̄_X) (11)

with ∆̄_X := sup_{y∈Y} ∆_X(y) denoting the maximum estimation error and Lip_Y being the Lipschitz constant of the output measurement map Y within the set D̃ := D ⊕ B_{2∆̄_X}(0). We additionally assume to know a bound on the difference of the function q for different times t. More formally, for each ȳ ∈ B_ε(y), let Bnd_q(y, u) be such that

|q(ȳ, t, u) − q(ȳ, t′, u)| ≤ Bnd_q(y, u) for all t, t′ ∈ R≥0.

The bound Bnd_q(y, u) exists and can be obtained as all components of q(y, t, u) are bounded in t. This is a natural assumption to obtain formal guarantees on the function q(y, t, u) from a finite dataset Z_dyn, since it is not possible to sample the time domain R≥0 densely with a finite number of samples. It can be seen that Bnd_q(y, u) = 0 when the system (1) is independent of t.

Proposition 3. Let q(y, t, u) be Lipschitz continuous in y for fixed t and u with local Lipschitz constant Lip_q(y_i, t_i, u_i) within the set B_ε(y_i) for each (y_i, t_i, u_i) ∈ Z_dyn. Let γ_dyn > 0 and

ε Lip_q(y_i, t_i, u_i) + Bnd_q(y_i, u_i) ≤ γ_dyn (12)

for all (y_i, t_i, u_i) ∈ Z_dyn. Then, the constraint (7d) ensures that, for each (y, t) ∈ Y × R≥0, there exists a u ∈ U such that q(y, t, u) ≥ 0. If additionally Lip_B(y, t, u) ≤ L̂ip_B(y, t, u) for each (y, t, u) ∈ Y × R≥0 × U, then h(x) is a ROCBF on Y.

In summary, Proposition 3 says that a larger Lipschitz constant of the function q requires a larger margin γ_dyn and/or a smaller ε, i.e., a finer net of safe datapoints as indicated by ϵ and/or a reduction in the estimation error bound ∆_X.
The next theorem summarizes our results, follows from the previous results, and is provided without proof.

Theorem 2. Let h(x) be a twice continuously differentiable function. Let the sets S, C, Y, D, and N as well as the datasets Z_safe, Z_dyn, and Z_N be defined as above. Suppose that Z_N forms an ϵ_N-net of N and that the conditions (8), (10), and (12) are satisfied. Assume also that Lip_B(y, t, u) ≤ L̂ip_B(y, t, u) for each (y, t, u) ∈ Y × R≥0 × U. If h(x) satisfies the constraints (7b), (7c), and (7d), then h(x) is a ROCBF on Y and it holds that the set C is non-empty and such that D̄ ⊆ C ⊆ S.

Algorithmic Implementation
In this section, we present the algorithmic implementation of the previously presented results.We will discuss various aspects related to solving the constrained optimization problem (7), the construction of the involved datasets, and estimating Lipschitz constants of the functions h(x) and q(y, t, u).

The Algorithm
We summarize our algorithm to learn safe ROCBFs h(x) in Algorithm 1. We first construct the set of safe datapoints Z_safe from the expert demonstrations Z_dyn (line 3). We construct the set of unsafe labeled datapoints Z_N from Z_safe (line 4), i.e., Z_N ⊆ Z_safe, by identifying boundary points in Z_safe and labeling them as unsafe (details can be found in Section 5.2). We then re-define Z_safe by removing the unsafe labeled datapoints Z_N from Z_safe (line 5). Following our discussion in Section 4.3, we obtain Z̄_safe according to equation (9) (line 6). We then solve the constrained optimization problem (7) by an unconstrained relaxation defined in (13) (line 7), as discussed in Section 5.3. Finally, we check if the constraints (7b)-(7d) and the conditions (8), (10), (12) are satisfied by the learned function h(x) (line 8). If they are not satisfied, the hyperparameters are adjusted and the process is repeated (line 9). We discuss Lipschitz constant estimation of h and q and the hyperparameter selection in Section 5.4.

Algorithm 1 Learning ROCBF from Expert Demonstrations
Input: Set of expert demonstrations Z_dyn,
1: system model (F̂, Ĝ, X̂, ∆_F, ∆_G, ∆_X),
2: hyperparameters (α, γ_safe, γ_unsafe, γ_dyn, L_h, L̂ip_B, k, η)
Output: Safe ROCBF h(x)
3: Z_safe ← ∪_{(y_i,t_i,u_i)∈Z_dyn} X̂(y_i) # safe datapoints
4: Z_N ← BPD(Z_safe, k, η) # unsafe datapoints obtained by boundary point detection (BPD) in Algorithm 2
5: Z_safe ← Z_safe \ Z_N
6: Z̄_safe ← according to (9)
7: h(x) ← solution of (13) # relaxation of the constrained optimization problem in (7)
8: while constraints (7b)-(7d), (8), (10), (12) are violated do
9: modify hyperparameters and start from line 3
10: end while

While our algorithmic implementation is an approximate solution of the proposed framework, we mention that solving an unconstrained relaxation of (7) and bootstrapping hyperparameters is a common technique in machine learning when solving nonconvex constrained optimization problems [62]. Such techniques are necessary for learning-based methods to be applied to realistic systems. As we reported in earlier works, see e.g., [54] for hybrid systems, such techniques perform well in practice and can even outperform experts.

Construction of the Datasets
Due to the conditions in equations (8), (10), and (12), the first requirement is that the datasets Z_safe and Z_N are dense, i.e., that ϵ and ϵ_N are small. It is also required that Z_N is an ϵ_N-net of the set of unsafe labeled states N. In order to construct the ϵ_N-net Z_N of N, a simple randomized algorithm, which repeatedly uniformly samples from N, works with high probability, see e.g., [61]. Hence, as long as we can efficiently sample from N, e.g., when N is a basic primitive set or has a set-membership oracle, uniform sampling or gridding methods are viable strategies.
However, as this is in general not possible, we use a boundary point detection algorithm in line 4 of Algorithm 1. The idea is to instead obtain the set of unsafe labeled datapoints Z_N from the set of safe datapoints Z_safe. To perform this step efficiently, our approach is to detect geometric boundary points of the set Z_safe. This subset of boundary points is labeled as Z_N, while we re-define Z_safe to exclude the boundary points Z_N in line 5 of Algorithm 1. In particular, we detect boundary points in Z_safe based on the concept of reverse k-nearest neighbors, see e.g., [63]. The main idea is that boundary points typically have fewer reverse k-nearest neighbors than interior points. For k > 0, we find the k-nearest neighbors of each datapoint x_i ∈ Z_safe. Then, we find the reverse k-nearest neighbors of each datapoint x_i ∈ Z_safe, that is, the datapoints x'_i ∈ Z_safe that have x_i among their k-nearest neighbors. Finally, we choose a threshold η > 0 and label as boundary points all datapoints x_i ∈ Z_safe whose number of reverse k-nearest neighbors is below η.
Algorithm 2 summarizes the boundary point detection algorithm. We first compute the pairwise distances between each of the N₁ safe datapoints in Z_safe (line 1). The result is a symmetric N₁ × N₁ matrix M where the element at position (i, j) represents the pairwise distance between the states x_i and x_j, i.e., M_ij := ∥x_i − x_j∥. Next, we calculate the k-nearest neighbors of each x_i, denoted by kNN_i, as the set of indices corresponding to the k smallest entries in the ith row of M (line 2). We then calculate the reverse k-nearest neighbors of each x_i as RkNN_i := |{x_j ∈ Z_safe : x_i ∈ kNN_j}| (line 3). Finally, we threshold each RkNN_i by η (line 4), i.e., z_i := 1(RkNN_i ≤ η), where 1 is the indicator function. We obtain a tuple (z_1, ..., z_{N₁}) ∈ {0, 1}^{N₁}, where the indices i with z_i = 1 correspond to boundary points x_i.
Algorithm 2 Boundary Point Detection - BPD(Z_safe, k, η)
Input: Set of safe states Z_safe, number of nearest neighbors k, neighbor threshold η > 0
Output: Set of boundary points, i.e., the set of unsafe labeled states Z_N
1: M ← compute_pairwise_dists(Z_safe) # pairwise distances between elements in Z_safe
2: kNN_i ← indices of the k smallest entries in the ith row of M, for each x_i ∈ Z_safe
3: RkNN_i ← |{x_j ∈ Z_safe : x_i ∈ kNN_j}|, for each x_i ∈ Z_safe
4: z_i ← 1(RkNN_i ≤ η), for each x_i ∈ Z_safe
5: return Z_N := {x_i ∈ Z_safe : z_i = 1}

We have not yet specified the parameters ϵ and ϵ_N needed to check the conditions (8), (10), and (12). While the value of ϵ merely defines the set of admissible states D and determines the size of the safe set C, as discussed earlier, the value of ϵ_N is important as the set of unsafe labeled states N should fully enclose D. This imposes an implicit lower bound on ϵ_N to guarantee safety. Therefore, one can artificially sample additional datapoints in proximity of Z_N and add these to the set Z_N. One way to get an estimate of ϵ_N is to calculate, for each datapoint in Z_N, the distance to its closest other datapoint in Z_N. Taking the maximum or an average over these values then gives a good estimate of ϵ_N.
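A compact NumPy sketch of Algorithm 2 (the grid data and the parameter values k = 8, η = 6 are illustrative):

```python
import numpy as np

def bpd(Z_safe, k, eta):
    """Boundary point detection via reverse k-nearest neighbors (Algorithm 2 sketch).
    Z_safe: (N, n) array of safe states. Returns a boolean mask of boundary points."""
    N = len(Z_safe)
    M = np.linalg.norm(Z_safe[:, None, :] - Z_safe[None, :, :], axis=2)  # pairwise dists
    np.fill_diagonal(M, np.inf)            # a point is not its own neighbor
    knn = np.argsort(M, axis=1)[:, :k]     # indices of the k nearest neighbors
    rknn = np.zeros(N, dtype=int)          # reverse-kNN counts
    for j in range(N):
        for i in knn[j]:
            rknn[i] += 1                   # x_j has x_i among its k nearest neighbors
    return rknn <= eta                     # few reverse neighbors -> boundary point

# Toy check on a 10x10 grid: corners should be flagged, interior points should not.
g = np.linspace(0.0, 1.0, 10)
Z = np.array([[a, b] for a in g for b in g])
mask = bpd(Z, k=8, eta=6)
print(mask[0], mask[55])   # corner (0,0) vs. interior point
```

On a uniform grid, interior points are reverse neighbors of all eight adjacent cells, while corner points accumulate fewer reverse neighbors and fall below the threshold.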
Finally, we discuss what behavior expert demonstrations in Z_dyn should exhibit. We focus on the ROCBF constraint (4), which must be verified to hold for some u ∈ U, by using the expert demonstration (y_i, t_i, u_i) in (7d). The more transverse the vector field of the input dynamics ⟨Ĝ(X̂(y_i), t_i), u_i⟩ is to the level sets of the function h(X̂(y_i)) (i.e., the more parallel it is to the inward-pointing normal ∇h(X̂(y_i))), the larger the inner-product term in constraint (7d) will be without increasing the Lipschitz constant of h(x). This means that the expert demonstrations should demonstrate how to move away from the unsafe labeled set.

Solving the Constrained Optimization Problem
Some remarks are in order with respect to the optimization problem (7). If the extended class K function α is linear and H is parameterized as H := {⟨ϕ(x), θ⟩ | θ ∈ Θ}, where Θ ⊆ Rˡ is a convex set and ϕ : Rⁿ → Rˡ is a known basis function, then the optimization problem (7) is convex. Note here in particular that ∥∇h(x)∥⋆ is convex in θ since 1) ∇h(x) is linear in θ, 2) norms are convex functions, and 3) the composition of a convex function with a linear function preserves convexity. We remark that rich function classes such as infinite-dimensional reproducing kernel Hilbert spaces can be approximated to arbitrary accuracy with such an H [64].
In the more general case when H := {h(x; θ) | θ ∈ Θ}, such as when h(x; θ) is a deep neural network or when α is a general nonlinear function, the optimization problem (7) is nonconvex. Due to the computational complexity of general nonlinear constrained programming, we propose an unconstrained relaxation of the optimization problem (7). Our unconstrained relaxation results in minimizing

λ_s Σ_{x_i ∈ Z̄_safe} [γ_safe − h(x_i; θ)]_+ + λ_u Σ_{x_i ∈ Z_N} [h(x_i; θ) + γ_unsafe]_+ + λ_d Σ_{(y_i,t_i,u_i) ∈ Z_dyn} [γ_dyn − q(y_i, t_i, u_i; θ)]_+ (13)

over θ, where [r]_+ := max{r, 0} for r ∈ R and where the function q(y, t, u; θ) is as in (6) but now defined via h(x; θ). The positive parameters λ_s, λ_u, and λ_d are dual variables. While the unconstrained optimization problem (13) is in general a nonconvex optimization problem, it can be solved efficiently in practice by iteratively solving the outer and inner optimization problems with respect to (λ_s, λ_u, λ_d) and θ, respectively, with stochastic first-order gradient methods such as Adam or stochastic gradient descent [62].
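The following toy sketch illustrates the hinge-penalty relaxation and subgradient descent for a one-dimensional example with linear parametrization, fixed dual weights, and (for illustration only) vanishing error bounds; all datapoints and the expert rule u = −x are hypothetical.

```python
import numpy as np

# Toy sketch of the relaxation (13): h(x; theta) = <phi(x), theta>,
# phi(x) = [1, x, x^2], x_dot = u, alpha(r) = r, fixed dual weights.
phi = lambda x: np.array([1.0, x, x * x])
dphi = lambda x: np.array([0.0, 1.0, 2.0 * x])

g_safe, g_unsafe, g_dyn = 0.3, 0.3, 0.1
lam_s = lam_u = lam_d = 1.0
safe_x, unsafe_x = [-0.5, 0.0, 0.5], [-1.2, 1.2]
expert = [(x, -x) for x in safe_x]

def penalty_and_grad(theta):
    """Value and a subgradient of the hinge penalty in (13)."""
    val, grad = 0.0, np.zeros(3)
    for x in safe_x:                        # [gamma_safe - h]_+
        r = g_safe - phi(x) @ theta
        if r > 0: val += lam_s * r; grad -= lam_s * phi(x)
    for x in unsafe_x:                      # [h + gamma_unsafe]_+
        r = phi(x) @ theta + g_unsafe
        if r > 0: val += lam_u * r; grad += lam_u * phi(x)
    for x, u in expert:                     # [gamma_dyn - q]_+ with q = u h' + h
        a = u * dphi(x) + phi(x)
        r = g_dyn - a @ theta
        if r > 0: val += lam_d * r; grad -= lam_d * a
    return val, grad

theta, best = np.zeros(3), np.inf
for _ in range(5000):                       # plain subgradient descent
    val, grad = penalty_and_grad(theta)
    best = min(best, val)
    theta -= 0.01 * grad
print(best)   # the penalty is driven near zero on this toy problem
```

In practice one would replace this loop with mini-batched Adam updates of θ and an outer update of the dual weights, as described above.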

Hyperparameters and Lipschitz Constant Estimation
We treat (α, γ_safe, γ_unsafe, γ_dyn, L_h, Lip_B, k, η) as hyperparameters and bootstrap over them. This is a common technique in machine learning and usually done via grid search. Due to the nonconvexity of the optimization problem, one may not be able to satisfy all constraints in (7b)-(7d), (8), (10), and (12). We hence terminate the while loop in line 9 of Algorithm 1 when a satisfactory empirical behavior is achieved. The conditions in equations (8), (10), and (12) depend on Lipschitz constants of the functions h and q. Since we assume that h is twice continuously differentiable and we restrict ourselves to a compact domain N ∪ D, both h and ∇h are uniformly Lipschitz over N ∪ D. In [53], we showed two examples of H (DNNs and functions parametrized by random Fourier features) for which an upper bound on the Lipschitz constants of h ∈ H and its gradient ∇h(x) can be efficiently estimated, and we refer the interested reader to [53].
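As a complement to the certified estimators in [53], a crude sampling-based estimate of a Lipschitz constant over a compact domain can be obtained from pairwise difference quotients. Note that this yields only a lower bound in general; it is shown here purely as an illustration:

```python
import numpy as np

def empirical_lipschitz(h, samples):
    """Largest pairwise difference quotient of h over the given sample points.

    A lower bound on the true Lipschitz constant; tight only if the samples
    cover the region where the gradient of h is largest.
    """
    best = 0.0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            dx = np.linalg.norm(samples[i] - samples[j])
            if dx > 1e-9:
                best = max(best, abs(h(samples[i]) - h(samples[j])) / dx)
    return best
```

For a linear function the estimate is exact; for a learned h it should be combined with an upper-bounding method such as those in [53] before being used in conditions (8), (10), and (12).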

Simulations
We construct a safe ROCBF within the autonomous driving simulator CARLA [56] for a car driving on a road by using camera images, see Fig. 4. In particular, our goal is to learn a ROCBF for the lateral control of the car, i.e., a lane keeping controller, while we use a built-in controller for longitudinal control. Lane keeping in CARLA is achieved by tracking a set of predefined waypoints. The control problem at hand is challenging, which makes it difficult to satisfy all constraints in equations (7b)-(7d) and (8), (10), (12). As described in Section 5.4, we search over the hyperparameters of Algorithm 1 until satisfactory behavior is achieved. The code for our simulations and videos of the car under the learned safe ROCBFs are available at https://github.com/unstable-zeros/learning-rocbfs.
As we have no direct access to the system dynamics of the car, we identify a system model. The model for longitudinal control is estimated from data and consists of the velocity v of the car and the integrator state d of the PID controller. For the lateral control of the car, we consider a bicycle model where p_x and p_y denote the position in a global coordinate frame, θ is the heading angle with respect to the global coordinate frame, and L := 2.51 is the distance between the front and the rear axles of the car. The control input is the steering angle δ, which we design such that the car tracks waypoints provided by CARLA. Treating u := tan(δ) as the control input yields a control affine system.
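The identification step can be sketched as a least-squares fit. The regressor structure [v, v², d, 1] mirrors the structure of the identified longitudinal model, while the coefficients and data below are synthetic placeholders, not the values identified in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
v = rng.uniform(0.0, 20.0, n)          # logged velocities (synthetic)
d = rng.uniform(-1.0, 1.0, n)          # logged PID integrator states (synthetic)

# Placeholder "true" coefficients used only to generate synthetic vdot data.
theta_true = np.array([-0.095, -0.007, -0.152, 3.74])

# Regressor matrix [v, v^2, d, 1] and noisy derivative measurements.
Phi = np.column_stack([v, v**2, d, np.ones_like(v)])
vdot = Phi @ theta_true + 0.01 * rng.standard_normal(n)

# Ordinary least squares recovers the model coefficients from logged data.
theta_hat, *_ = np.linalg.lstsq(Phi, vdot, rcond=None)
```

In practice, v̇ would be obtained by numerically differentiating logged velocity traces, which makes the noise level and filtering choices more delicate than in this sketch.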
To be able to learn a ROCBF in a data-efficient manner, we convert the above lateral model (defined in a global coordinate frame) into a local coordinate frame. We do so relative to the waypoints that the car has to follow. We consider the cross-track error c_e of the car. In particular, let wp_1 be the waypoint that is closest to the car and let wp_2 be the waypoint following wp_1. Then the cross-track error is defined as c_e := ∥w∥ sin(θ_w), where w ∈ R^2 is the vector pointing from wp_1 to the car and θ_w is the angle between w and the vector pointing from wp_1 to wp_2. We further consider the error angle θ_e := θ − θ_t, where θ_t is the angle between the vector pointing from wp_1 to wp_2 and the global coordinate frame. The error bounds of the model are estimated as ∆_F(x, t) := 0.1 and ∆_G(x, t) := 0.1 (calculated from simulations).
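The cross-track error defined above can be computed as a signed 2-D cross product between the unit track direction and the vector to the car; this helper and its argument names are illustrative:

```python
import numpy as np

def cross_track_error(car_xy, wp1, wp2):
    """Signed cross-track error c_e = ||w|| sin(theta_w).

    w points from the closest waypoint wp1 to the car, and theta_w is the
    angle between w and the track direction from wp1 to wp2. The identity
    ||w|| sin(theta_w) = t_x * w_y - t_y * w_x (with t the unit track
    direction) gives the signed lateral distance to the track.
    """
    w = np.asarray(car_xy, float) - np.asarray(wp1, float)
    t = np.asarray(wp2, float) - np.asarray(wp1, float)
    t = t / np.linalg.norm(t)        # unit vector along the track
    return t[0] * w[1] - t[1] * w[0]
```

The sign convention (positive on one side of the track, negative on the other) is what makes c_e usable directly as a state in the local lateral model.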
For collecting safe expert demonstrations Z_dyn, we use an "expert" PID controller u(x) that uses full state knowledge of x. Throughout this section, we use the parameters α(r) := r, γ_safe := γ_unsafe := 0.05, and γ_dyn := 0.01 to train safe ROCBFs h(x). For the boundary point detection algorithm in Algorithm 2, we select k := 200 and choose η such that 40 percent of the points in Z_safe are labeled as boundary points.
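Algorithm 2 is not reproduced here; the following is a hedged sketch of one common k-nearest-neighbor boundary-detection heuristic in its spirit, where a point is flagged as a boundary point when the mean of the unit vectors to its k nearest neighbors has norm above η (interior points see neighbors in all directions, so the mean is small; boundary points see them one-sided):

```python
import numpy as np

def boundary_points(X, k, eta):
    """Flag boundary points of a point cloud via a k-NN direction heuristic.

    This is an illustrative stand-in for Algorithm 2, not its exact form.
    """
    X = np.asarray(X, float)
    flags = []
    for x in X:
        d = np.linalg.norm(X - x, axis=1)
        idx = np.argsort(d)[1:k + 1]          # k nearest neighbors, self excluded
        dirs = (X[idx] - x) / d[idx, None]    # unit vectors towards neighbors
        flags.append(np.linalg.norm(dirs.mean(axis=0)) > eta)
    return np.array(flags)
```

In the experiments, η would then be tuned until the desired fraction (here 40 percent) of Z_safe is labeled as boundary points.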

State-based ROCBF
We first learn a ROCBF controller in the case that the state x is perfectly known, i.e., the model of the output measurement map is such that X̂(y) = X(y) = x and the error is ∆_X(y) := 0. The trained ROCBF h(x) is a two-layer DNN with 32 and 16 neurons per layer.
The safety controller applied to the car is then obtained as the solution of the convex optimization problem min_{u∈U} ∥u∥ subject to the constraint q(u, y, t) ≥ 0. In Fig. 5a, example trajectories of c_e(t) and θ_e(t) under this controller are shown. Solid lines indicate the learned ROCBF controller, while dashed lines indicate the expert PID controller for comparison. Colors in both subplots match the corresponding trajectories. The initial conditions d(0) and v(0) are set to zero in all cases here, as in all other plots in the remainder. Fig. 5b shows different initial conditions c_e(0) and θ_e(0) and how the ROCBF controller performs relative to the expert PID controller on the training course. In particular, each point in the plot indicates an initial condition from which system trajectories under both the ROCBF and expert PID controllers are collected. The color map shows max_t |c_e^ROCBF(t)| − max_t |c_e^Exp(t)|, where c_e^ROCBF(t) and c_e^Exp(t) denote the cross-track errors under the ROCBF and expert PID controllers, respectively. Fig. 5c shows the same plot, but for the test course from which no data has been collected to train the ROCBF. In this plot, one ROCBF trajectory resulted in a collision as detected by CARLA. We assign by default a value of 2.5 in case of a collision (see the yellow point in Fig. 5c).
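For a scalar steering input and a constraint q that is affine in u, the minimum-norm safe input admits a closed form, so no QP solver is needed. In the sketch below, a and b are placeholders for the collected Lie-derivative, class-K, and robustness terms of the ROCBF condition, so that q(u) = a*u + b:

```python
def min_norm_safe_input(a, b):
    """Solve min |u| subject to a*u + b >= 0 for scalar u.

    a and b stand in for the terms of the ROCBF constraint evaluated at the
    current (y, t); their values here are illustrative assumptions.
    """
    if b >= 0.0:
        return 0.0            # u = 0 already satisfies a*u + b >= 0
    if a == 0.0:
        raise ValueError("constraint infeasible for any u")
    return -b / a             # smallest-magnitude u with a*u + b = 0
```

For vector-valued inputs or input constraints U beyond a norm bound, the same problem becomes a small quadratic program solved at every control step.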

Perception-based ROCBF
We next learn a ROCBF in the case that y corresponds to images taken from an RGB camera mounted on the dashboard of the car. To train a perception map X̂, we resized the images as shown in Fig. 4c. We assume knowledge of θ_e, v, and d, while we estimate c_e from y, i.e., x̂ := [v, d, X̂(y), θ_e]^T. The architecture of X̂ is a ResNet18, i.e., a convolutional neural network with 18 layers. Its performance on training data within the operating range c_e ∈ [−2, 2] is shown in Fig. 6a. Based on this plot, we set ∆_X(y) := 0.5 to account for estimation errors within this range. We remark that we observed larger estimation errors outside this range; however, larger ∆_X(y) resulted in learning infeasible ROCBFs. We additionally selected the hyperparameter Lip_B(y, t, u) := Lip_1 + Lip_2 ∥u∥. We achieved the best results by using Lip_1 = Lip_2 := 0.1 during testing, while using the norms of the partial derivatives of ⟨∇h(x), F̂(x, t)⟩ + α(h(x)) − ∥∇h(x)∥_2 ∆_F(x, t) and ⟨∇h(x), Ĝ(x, t)⟩ − ∥∇h(x)∥_2 ∆_G(x, t), respectively, during training. The trained ROCBF h(x) is again a two-layer DNN with 32 and 16 neurons per layer. Figs. 6b-6c show the same plots as in the previous section and evaluate the ROCBF relative to the expert PID controller. Importantly, note that the expert PID controller uses state knowledge, while the ROCBF uses RGB images from the dashboard camera as inputs, so it is no surprise that the relative gap between the two becomes larger, as shown in Figs. 6b-6c.
We further performed a comparison with our prior work [53], in which we learn CBFs, corresponding to the setting where ∆_F = ∆_G = ∆_X = 0. The result of the learned CBF is shown in Figure 7. In direct comparison with Figure 6c, we can see the benefit of learning ROCBFs.

Conclusion and Summary
In this paper, we have shown how safe control laws can be learned from expert demonstrations under system model and measurement map uncertainties. We first presented robust output control barrier functions (ROCBFs) as a means to enforce safety, which is here defined as the ability of a system to remain within a safe set using the notion of forward invariance. We then proposed an optimization problem to learn such ROCBFs from safe expert demonstrations, and presented verifiable conditions for the validity of the ROCBF. These conditions are stated in terms of the density of the data and of Lipschitz and boundedness constants of the learned function as well as of the models of the system dynamics and the measurement map. We proposed an algorithmic implementation of our theoretical framework to learn ROCBFs in practice. Finally, our simulation studies show how to learn safe control laws from RGB camera images within the autonomous driving simulator CARLA.

Appendix A Proof of Theorem 1
Recall that f(t) := F(x(t), t), g(t) := G(x(t), t), y(t) := Y(x(t)), and u(t) := U(y(t), t) according to (1c)-(1f), and define for convenience f̂(t) := F̂(x̂(t), t). Due to the chain rule and since U(y, t) ∈ U_s(y, t), each solution x satisfies ḣ(x(t)) = ⟨∇h(x(t)), f(t) + g(t)u(t)⟩. Note now that the term Lip_B(y(t), t, u(t)) ∆_X(y(t)) appears on the right-hand side, where the implication in (c) follows from the preceding bounds. Next note that v̇(t) = −α(v(t)) with v(0) ≥ 0 admits a unique solution v(t) that is such that v(t) ≥ 0 for all t ≥ 0 [65, Lemma 4.4]. Using the Comparison Lemma [65, Lemma 3.4] and assuming that h(x(0)) ≥ 0, it follows that h(x(t)) ≥ v(t) ≥ 0 for all t ∈ I, i.e., x(0) ∈ C implies x(t) ∈ C for all t ∈ I. Recall that (1) is defined on X(Y) and that Y ⊇ Y(C), so that X(Y) ⊇ C. Since x(t) ∈ C for all t ∈ I and since C is compact, it follows by [65, Theorem 3.3] that I = [0, ∞), i.e., C is forward invariant under U(y, t).

B Proof of Proposition 1
Note that, for any x ∈ N, there exists a point x_i ∈ Z_N satisfying ∥x − x_i∥ ≤ ϵ_N since Z_N is an ϵ_N-net of N. For any x ∈ N, we now select such an x_i ∈ Z_N. Note that inequality (a) follows from constraint (7c), while inequality (b) follows by Lipschitz continuity. Inequality (c) follows by the assumption of Z_N being an ϵ_N-net of N and, finally, the strict inequality in (d) follows due to (8).
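The inequality chain itself is elided in the text; a hedged reconstruction consistent with the labels (a)-(d) explained above (using constraint (7c), Lipschitz continuity, the ϵ_N-net property, and condition (8)) reads:

```latex
\begin{align*}
h(x) &= \big(h(x) - h(x_i)\big) + h(x_i)\\
     &\overset{(a)}{\leq} \big(h(x) - h(x_i)\big) - \gamma_{\mathrm{unsafe}}\\
     &\overset{(b)}{\leq} \mathrm{Lip}_h(x_i)\,\|x - x_i\| - \gamma_{\mathrm{unsafe}}\\
     &\overset{(c)}{\leq} \mathrm{Lip}_h(x_i)\,\epsilon_{\mathcal{N}} - \gamma_{\mathrm{unsafe}}\\
     &\overset{(d)}{<} 0.
\end{align*}
```

This shows h(x) < 0 for all x ∈ N, i.e., the learned function is strictly negative on the set labeled as unsafe.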

C Proof of Proposition 2
The proof follows similarly to the proof of Proposition 1. For any x ∈ D, we select an x_i ∈ Z_safe with ∥x − x_i∥ ≤ ϵ, which is possible since the set Z_safe is an ϵ-net of D. It follows that 0 ≤(a) h(x_i) − γ_safe = h(x_i) − h(x) + h(x) − γ_safe ≤(b) Lip_h(x_i)∥x − x_i∥ + h(x) − γ_safe ≤(c) Lip_h(x_i)ϵ + h(x) − γ_safe ≤(d) h(x). Inequality (a) follows from constraint (7b), inequality (b) follows by Lipschitz continuity, inequality (c) follows by Z_safe being an ϵ-net of D, and inequality (d) follows due to (10).

E Proof of Proposition 3
Note first that, for each y ∈ Y, there exists a triple (y_i, t_i, u_i) ∈ Z_dyn satisfying ∥y − y_i∥ ≤ ε since the y components of Z_dyn form an ε-net of Y by Lemma 1. For any pair (y, t) ∈ Y × R_{≥0}, we now select such a triple (y_i, t_i, u_i) ∈ Z_dyn satisfying ∥y − y_i∥ ≤ ε, for which then
0 ≤(a) q(y_i, t_i, u_i) − γ_dyn
≤ |q(y_i, t_i, u_i) − q(y, t_i, u_i)| + q(y, t_i, u_i) − γ_dyn
≤(b) Lip_q(y_i, t_i, u_i)∥y_i − y∥ + q(y, t_i, u_i) − γ_dyn
≤(c) Lip_q(y_i, t_i, u_i)ε + q(y, t_i, u_i) − γ_dyn
≤ Lip_q(y_i, t_i, u_i)ε + |q(y, t_i, u_i) − q(y, t, u_i)| + q(y, t, u_i) − γ_dyn
≤(d) Lip_q(y_i, t_i, u_i)ε + Bnd_q(y_i, u_i) + q(y, t, u_i) − γ_dyn
≤(e) q(y, t, u_i).
Inequality (a) follows from constraint (7d). Inequality (b) follows by Lipschitz continuity, while inequality (c) follows since the y components of Z_dyn form an ε-net of Y. Inequality (d) follows by the bound Bnd_q(y_i, u_i), which bounds the variation of the function q over all values of t. Inequality (e) follows from (12). Consequently, q(y, t, u_i) ≥ 0 for all (y, t) ∈ Y × R_{≥0}.
If now the estimate Lip_B(y, t, u) used in the definition of q upper bounds the true local Lipschitz constant of B, as stated per assumption, it follows that (4) holds and that h(x) is a ROCBF.

Figure 1: Uncertain system under consideration. The goal in this paper is to learn an output feedback control law U(y, t) such that prescribed safety properties with respect to a geometric safe set S ⊆ R^n are met by the system in (1). By geometric safe set, we mean that S describes the set of safe states as naturally specified on a subset of the state space (e.g., to avoid collisions, vehicles must maintain a minimum separating distance).

Figure 2: Proposed framework to learn safe control laws.
Let h : R^n → R be a twice continuously differentiable function, and assume that h is such that the set C in (2) has non-empty interior. Let Y ⊆ R^p be a sufficiently large open set such that Y ⊇ Y(C), where Y(C) denotes the image of C under Y.

Figure 3: Problem setup (left): the set of observed safe expert demonstrations Z_dyn (black lines); also shown is the set of admissible output measurements Y (orange ring). Transformation into the state domain (centre): the geometric safe set S (red box) and the set of admissible safe states D (green region), defined as the union of ϵ-balls centered at X̂(y_i). Learned safe set (right): the set of states labeled as unsafe N (golden ring), which defines the σ-layer surrounding D.

Figure 4: Simulation environment in CARLA. The cars track desired reference paths on different courses. Left: the training course from which training data, during a left turn, was generated to train and test the ROCBF. Middle: an unknown test course on which the learned ROCBF is tested. Right: downsampled RGB dashboard camera image.

The simplified local model is ċ_e = v sin(θ_e) and θ̇_e = (v/2.51) u − θ̇_t. In summary, we have the state x := [v, d, c_e, θ_e]^T and the control input u := tan(δ), as well as the external input θ̇_t, given during runtime.

Figure 5: (a) Trajectories c_e(t) and θ_e(t) for three randomly chosen initial conditions. Solid lines correspond to the learned ROCBF, dashed lines correspond to the expert (PID). (b) Shown are 250 different initial conditions on the training course. The color legend encodes max_t |c_e^ROCBF(t)| − max_t |c_e^Exp(t)|. (c) Shown are 300 different initial conditions on the test course. The color legend encodes max_t |c_e^ROCBF(t)| − max_t |c_e^Exp(t)|.
Figure 6: (a) Performance of the perception map X̂ on training data. Blue and orange denote boundary and non-boundary points from Algorithm 2. (b) Trajectories c_e(t) and θ_e(t) for randomly chosen initial conditions. Solid lines correspond to the learned ROCBF, dashed lines correspond to the expert (PID). (c) Shown are 200 different initial conditions on the training course. The color legend encodes max_t |c_e^ROCBF(t)| − max_t |c_e^Exp(t)|.
D Proof that the y components of Z_dyn form an ε-net of Y
Lemma 1. Let ε := Lip_Y(ϵ + ∆̄_X), where Lip_Y is the Lipschitz constant of the function Y within the set D̄ := D ⊕ B²_{∆̄_X}(0) and where ∆̄_X := sup_{y∈Y} ∆_X(y). Then the y components of Z_dyn form an ε-net of Y.
Proof: For each y ∈ Y, there exists (y_i, t_i, u_i) ∈ Z_dyn such that ∥X(y) − X̂(y_i)∥ ≤ ϵ by definition of Y as Y = Y(D) and since the y components of Z_dyn transformed via X̂ form an ϵ-net of D. By Assumption 2, we also know that ∥X̂(y_i) − X(y_i)∥ ≤ ∆̄_X. By Lipschitz continuity of Y, it follows that ∥y − y_i∥ = ∥Y(X(y)) − Y(X(y_i))∥ ≤ Lip_Y ∥X(y) − X(y_i)∥ ≤ Lip_Y (∥X(y) − X̂(y_i)∥ + ∥X̂(y_i) − X(y_i)∥) ≤ Lip_Y (ϵ + ∆̄_X) =: ε. Consequently, the y components of Z_dyn form an ε-net of Y.
≥ B(x(t), t, u(t)) − B(x̂(t), t, u(t)), where (a) follows since x(t) ∈ X(y(t)) due to Assumption 2 and where (b) simply follows since Lip_B(y(t), t, u(t)) is the local Lipschitz constant of the function B(x, t, u) within the set X(y(t)). From (14) and the definitions of the functions B(x, t, u) and Lip_B(y, t, u), it hence follows that ≥ |B(x(t), t, u(t)) − B(x̂(t), t, u(t))|.