Learning Discrete-Time Uncertain Nonlinear Systems With Probabilistic Safety and Stability Constraints

This paper presents a method for learning discrete-time dynamical system models from demonstrations while providing probabilistic guarantees on the safety and stability of the learned model. The controlled dynamic model of a discrete-time system with zero-mean Gaussian process noise is approximated using an Extreme Learning Machine (ELM) whose parameters are learned subject to chance constraints derived using a discrete-time control barrier function and a discrete-time control Lyapunov function in the presence of the ELM reconstruction error. To estimate the ELM parameters, a quadratically constrained quadratic program (QCQP) is developed, subject to constraints that are only required to be evaluated at sampled points. Simulations validate that the system model learned using the proposed method can reproduce the demonstrations inside a prescribed safe set while converging to the desired goal location starting from various initial conditions inside the safe set. Furthermore, it is shown that the learned model can adapt to changes in goal location during reproductions without violating the stability and safety constraints.


I. INTRODUCTION
Advancements in data-driven dynamic model learning [1], [2], [3] provide new possibilities to learn dynamic models with applications to autonomous systems operating in remote, hazardous, and/or confined environments. Behavior cloning (BC), one of the methods of imitation learning, learns a dynamic system model and/or a policy from expert demonstrations [4], [5]. Motions of autonomous systems or robots generated from expert demonstrations often exhibit stability and safety properties that should be preserved while learning the system model. BC presents advantages such as increased sample efficiency compared to reinforcement learning and can be useful in many robotics and engineering applications [6], [7], [8], [9], [10], [11]. Dynamic system learning-based BC methods have incorporated stability [8], [9], [11], [12], [13] and safety [5], [10], [14] constraints in learning linear or nonlinear continuous-time system models. However, incorporating these constraints for uncertain, noisy discrete-time nonlinear systems represented using neural networks (NNs) poses technical challenges for constrained learning of such system models.
This paper addresses this challenge by presenting a nonlinear dynamic system model learning method subject to probabilistic stability and safety constraints. The controlled dynamic model of a discrete-time system with zero-mean Gaussian process noise is approximated using an extreme learning machine (ELM), which is a computationally efficient approximation for the system model compared to an NN [15]. Probabilistic constraints on the ELM parameter learning are derived using control Lyapunov function (CLF) and control barrier function (CBF) theory for discrete-time systems. A chance-constrained quadratically constrained quadratic program (QCQP) optimization problem is then formulated to learn the ELM parameters with the probabilistic stability and safety constraints, which are further simplified to make the optimization problem tractable. A schematic view of the proposed approach is illustrated in Fig. 1.
The concept of safety is centered around the idea of constraining the behavior of the system model to a prescribed set, by ensuring forward invariance of the set with respect to the system model. Barrier functions (BFs) are commonly used to certify the forward invariance of a closed set with respect to a system model. For safety-critical controller synthesis, CBFs have been used via quadratic programming (QP) [16], [17]. CLFs and CBFs are merged to synthesize stable and safe controllers by solving a QP in [18], [19]. Standard CBF methods, however, do not directly handle constraints on systems with relative degree greater than one. In [20], a backstepping-inspired methodology is used to obtain a CBF that renders a closed set forward invariant when the system has relative degree greater than one. However, demonstrating a backstepping-based CBF design for systems with relative degree greater than two is challenging in practice [21], [22]. An exponential control barrier function (ECBF) that can handle state-dependent constraints for nonlinear systems of any relative degree is introduced in [22]. In [23], a more general form of control barrier function, termed the high-order control barrier function (HOCBF), is proposed. HOCBFs are not restricted to exponential functions and are determined by a set of class K functions.
In [24] an adaptive CBF (aCBF) is proposed, which ensures the forward invariance of a closed set with respect to a nonlinear control-affine system with parametric uncertainties. In [25] the aCBF method is merged with an adaptive data-driven safety controller for contracting systems. The authors in [26] develop a method to learn a robustly safe controller along with learning the system parameters and corresponding uncertainty bounds. A metric based on the covariance of the parameter estimates is used to determine whether the data are sufficient to update the parameters. These methods mainly focus on the controller synthesis problem with a known dynamic system model or with model adaptation.
The controller synthesis methods assume knowledge of the barrier function. Methods for learning barrier functions and safe regions directly from sensor data are developed in [16], [27], and [28]. Although there is much work on the safe controller synthesis problem, very few methods have been developed for learning safe and stable closed-loop autonomous systems using ELM dynamic system modeling that incorporates safety and stability constraints.
Classical linear system identification techniques utilize parametric model structure and guarantee the stability of learned models [2], [3]. Machine learning techniques, such as NN and Gaussian processes (GPs), have been used for nonlinear system identification [11], [29]. Model uncertainties, in the form of a NN approximation error, are accounted for in [14] along with stability and safety constraints using reciprocal BFs, where the uncertainty is bounded by a constant. In the context of Bayesian learning, GP state-space models are learned with stability constraints in [11]. In [30], a simultaneous continuous system model learning and control synthesis method is developed, where GPs are used for model learning, and probabilistic CBF and CLF are used to achieve safety and stability.
Compared to our previous work in [31], a controlled discrete-time nonlinear system with a zero-mean Gaussian noise is considered in this paper, resulting in a more generic formalism for learning closed-loop autonomous systems. Detailed proofs of theorems and lemmas are presented while including the ELM reconstruction error, and the method is evaluated on three different simulation examples. The contributions of the paper are summarized as follows.
• A learning method for discrete-time nonlinear controlled systems approximated using an ELM with additive zero-mean Gaussian process noise is developed, while verifying the probabilistic safety and stability properties of the learned system.
• A discrete-time zeroing control barrier function (DT-ZCBF) and a discrete-time control Lyapunov function (DT-CLF) [32] are used to obtain safety and stability chance constraints that yield a computationally tractable system identification with probabilistic guarantees on the system's safety and stability, respectively.
• An ELM network is used to represent an uncertain discrete-time system, and the ELM reconstruction error is regarded as a bounded signal.
• The constrained ELM parameter learning problem is formulated as a QCQP, which contains a finite number of decision variables with infinitely many constraints, leading to a semi-infinite program. To make the optimization tractable, an equivalent optimization problem is derived that requires the constraints to be evaluated only at a finite number of sampled points [10], [33].
• The constrained model learning approach is demonstrated in simulation using three examples: (1) a 2 degree-of-freedom (DoF) planar robot whose joint positions are lower and upper bounded using a zeroing barrier function, (2) learning motion trajectories of one of the complex shapes from a publicly available dataset, and (3) a wheeled mobile robot that travels along certain trajectories in a confined space.

Notation: The sets of real numbers and integers are denoted by R and Z, respectively. The symbols R^+ and Z^+ denote the sets of non-negative real numbers and non-negative integers, respectively. For ease of notation, the dependency of variables is abbreviated; for example, x(k) is denoted by x_k, unless necessary for clarity. The standard Euclidean norm of a vector is denoted by ‖·‖, and ‖·‖_F is the Frobenius norm. The weighted norm of a vector x ∈ R^n by a positive definite matrix A is denoted by ‖x‖_A = √(xᵀAx). The symbol λ_max{·} denotes the maximum eigenvalue of a matrix. The symbol [·]_τ denotes the function evaluation at a point that belongs to the discretized state-space D_τ. ĝ(s_k) ∈ R^{n_h+1} denotes the sigmoid activation function computed using the estimated ELM input weight matrix Û and the estimated internal slopes and biases â_p and b̂_p. ε(s_k) ∈ R^n denotes the ELM model reconstruction error. The arguments of ĝ and ε are dropped for brevity, unless necessary for clarity.

II. PRELIMINARIES
In this section, preliminaries of the DT-CLF and DT-ZCBF are presented. Consider an unknown nonlinear discrete-time system of the form

x_{k+1} = f(x_k, u_k) + ν_k,   (1)

where f : D × U → D is a continuous function, x(k) ∈ D ⊂ R^n is the state of the system at time step k ∈ Z^+, and u(k) = π(e(k)) ∈ U ⊂ R^m is the system control input with a policy π(·) that depends on the error e(k) = x(k) − x*. ν(k) is independent and identically distributed zero-mean Gaussian process noise with covariance Σ, which is a symmetric and positive definite matrix, and x* is the system equilibrium.

A. DISCRETE-TIME CONTROL LYAPUNOV FUNCTION
The classical stability analysis of continuous-time nonlinear dynamical systems using Lyapunov theory extends to the discrete-time domain. It entails finding a positive definite function V : D → R that decreases along the trajectories of the system given in (1). Refer to [32] and references therein for a comprehensive study of discrete-time Lyapunov theorems.

Definition 1 (DT-CLF): Given the discrete-time system (1), V(x_k) is said to be a DT-CLF if it can be bounded as α_1(‖x_k‖) ≤ V(x_k) ≤ α_2(‖x_k‖) and there exists a control input u_k ∈ U such that

V(x_{k+1}) − V(x_k) ≤ −β(‖x_k‖),   (2)

where α_1, α_2, and β belong to class K.

B. DISCRETE-TIME ZEROING CONTROL BARRIER FUNCTION
Safety properties can be achieved by constraining the solutions of the discrete-time dynamical system to a pre-specified closed set S = {x k ∈ D : h(x k ) ≥ 0}, where h : R n → R is a continuous function associated with a barrier function [32].
Definition 2 (DT-ZCBF): Given the discrete-time system (1) and the set S, a continuously differentiable function h : R^n → R is said to be a DT-ZCBF if there exist a class K function η satisfying η(r) < r, ∀r > 0, a set D with S ⊆ D ⊂ R^n, and a control input u_k ∈ U such that

h(x_{k+1}) − h(x_k) ≥ −η(h(x_k)).   (3)

The set S is forward invariant with respect to the discrete-time system in (1) if and only if there exists a DT-ZCBF as defined in Definition 2, which means that any initial condition x_0 ∈ S implies x_k ∈ S, ∀k ∈ Z^+ [34].
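For intuition (a numeric sketch, not from the paper), the invariance mechanism of Definition 2 can be checked with the linear class-K choice η(r) = γr, 0 < γ < 1 (the choice later used in Lemma 1): any sequence obeying the DT-ZCBF inequality and starting with h(x_0) ≥ 0 can never become negative.

```python
# Illustrative sketch: forward invariance under the DT-ZCBF condition
# with the linear class-K choice eta(r) = gamma * r, 0 < gamma < 1.
gamma = 0.8

def worst_case_h(h0, steps):
    """Propagate the tightest sequence allowed by the DT-ZCBF
    inequality: h_{k+1} >= (1 - gamma) * h_k."""
    h = [h0]
    for _ in range(steps):
        h.append((1.0 - gamma) * h[-1])
    return h

hs = worst_case_h(h0=1.0, steps=50)
assert all(v >= 0.0 for v in hs)   # h never goes negative -> x_k stays in S
```

The worst case decays geometrically toward the boundary h = 0 but never crosses it, which is exactly the forward-invariance claim.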
Assumption 1: The function f : D × U → D is locally Lipschitz continuous and bounded in D.

III. CHANCE-CONSTRAINED ELM LEARNING PROBLEM
Consider the discrete-time nonlinear dynamical system in (1); it is assumed that the unknown function f(·) can be parameterized as f(x_k, π(e_k)) = Wᵀg(s_k), where W ∈ R^{(n_h+1)×n} is the unknown ideal bounded constant output layer weight matrix, g(s_k) = [ψ(qs_k)ᵀ, 1]ᵀ : R^{2n+1} → R^{n_h+1}, s_k = [x_kᵀ, e_kᵀ, 1]ᵀ ∈ R^{2n+1}, ψ(·) ∈ R^{n_h} is the vector sigmoid activation function, and n_h is the number of neurons in the hidden layer of the ELM. The vector sigmoid activation function ψ(·) is given elementwise by the logistic function

ψ_i(z) = 1/(1 + e^{−z_i}),   (4)

evaluated at z = qs_k, where q is constructed from the ideal internal slopes a_p ∈ R^{n_h}, the ideal biases b_p ∈ R^{n_h}, and the ideal bounded input layer weight matrix U ∈ R^{2n×n_h}. Other activation functions, such as radial basis functions or the tangent sigmoid, can also be used [35], [36]. The nonlinear function f(x_k, u_k), defined in (1), is modeled using an ELM given by

f(x_k, u_k) = Ŵᵀĝ(s_k) + ε(s_k),   (5)

where Ŵ ∈ R^{(n_h+1)×n} is the estimated ELM output weight matrix, ε(s_k) ∈ R^n is the function reconstruction error, and ĝ(s_k) = [ψ(q̂s_k)ᵀ, 1]ᵀ, where q̂ is constructed using the estimated ELM input weight matrix Û and the estimated internal slopes and biases â_p and b̂_p. The parameters â_p, b̂_p, and Û of the ELM are computed using intrinsic plasticity (IP) or batch intrinsic plasticity (BIP) algorithms. See [36] and [14] for preliminaries of the ELM.

Assumption 2: The ideal weights of the ELM are bounded by known positive constants, i.e., ‖W‖_F ≤ W̄, ‖U‖_F ≤ Ū [35].
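To make the parameterization concrete, here is a minimal numerical sketch (not the authors' code): the random draws stand in for the BIP-estimated Û, â_p, b̂_p, the dimensions are illustrative, and the slopes/biases are applied directly to the hidden pre-activations rather than through an explicit q̂ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_h = 2, 25                        # state dimension and hidden neurons

# Stand-ins for the BIP-estimated input-layer parameters (fixed in an ELM):
U_hat = rng.standard_normal((2 * n, n_h))
a_hat = rng.uniform(0.5, 2.0, n_h)    # internal slopes
b_hat = rng.uniform(-1.0, 1.0, n_h)   # internal biases

def g_hat(x, e):
    """Sigmoid feature vector g(s_k) in R^{n_h + 1} for s_k built from x and e."""
    s = np.concatenate([x, e])
    z = a_hat * (s @ U_hat) + b_hat
    psi = 1.0 / (1.0 + np.exp(-z))    # elementwise logistic sigmoid
    return np.append(psi, 1.0)        # constant bias feature

W_hat = 0.1 * rng.standard_normal((n_h + 1, n))   # output weights (decision variable)

x, x_star = np.array([0.5, -0.3]), np.zeros(n)
x_next_pred = W_hat.T @ g_hat(x, x - x_star)      # one-step ELM prediction
assert x_next_pred.shape == (n,)
```

Only the output weights enter the learning problem; the input-layer parameters stay fixed, which is what makes the downstream optimization quadratic in the decision variable.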

Remark 2: The nonlinear function f(x_k, u_k) is within a real and positive constant ε̄ of the ELM range if there exist a finite number of hidden neurons n_h and constant weights such that for all x_k ∈ S the approximation in (5) holds with ‖ε(s_k)‖ ≤ ε̄. The boundedness of the reconstruction error ε(s_k) follows from the universal approximation property of single-layer neural networks [37] and the fact that for any activation function the reconstruction error vanishes as the number of hidden neurons grows [15], [29], i.e., lim_{n_h→∞} ‖ε(s_k)‖ = 0.

Remark 3: Using Assumptions 1-2 and Remark 1, the right-hand side of (5), and hence f(x_k, u_k), is bounded on D.

Let X^{1:N}_{1:T_m} denote a set of N demonstrations with T_m time samples of the system's trajectory of (1), where each demonstration ends at a goal location x*, and let G^{1:N}_{0:T_m−1} denote their corresponding sigmoid functions. The process noise ν_k together with the model (5) yields the likelihood function

p(X|G, Ŵ) = ∏_k N(x_{k+1}; Ŵᵀĝ(s_k) + ε(s_k), Σ),   (6)

where the indexes for X and G are dropped for brevity. The Gaussian distribution of the process noise induces a distribution over the DT-CLF and DT-ZCBF constraints in (2) and (3), which can be used to obtain probabilistic stability and safety guarantees in the constrained parameter learning of the ELM, respectively.

A. PROBLEM FORMULATION
Consider a set of state trajectories X^{1:N}_{1:T_m} and their corresponding sigmoid functions G^{1:N}_{0:T_m−1} generated by a model of a controlled system, where there exists u_k ∈ U that stabilizes the system's trajectories to a desired goal location while keeping them in a confined space. The constrained model learning problem is then formulated as a parameter learning problem for an ELM approximation of f(x_k, u_k) given in (5), such that the system in (1) satisfies probabilistic safety and stability properties.
Given the safe set S defined in Section II, the ELM-parameterized model in (5), the initial state x_0 ∈ S, and the likelihood function p(X|G, Ŵ) in (6), the ELM parameter learning problem can be formulated as the chance-constrained regularized maximum log-likelihood estimation

Ŵ* = arg min_Ŵ  −log p(X|G, Ŵ) + μ_W ‖Ŵ‖_F²,   (7a)
s.t.  Pr(C_B ≥ ζ) ≥ p_k,   (7b)
      Pr(C_L ≤ δ) ≥ p_k,   (7c)

∀k ∈ Z^+, where C_B and C_L denote the DT-ZCBF and DT-CLF constraint functions obtained from (3) and (2), respectively, μ_W ∈ R^+ is the regularization parameter, and Ŵ* denotes the optimal solution for Ŵ, which is used in (5) to represent the learned system. p_k ∈ (0, 1) is a user-specified risk tolerance, and ζ, δ ∈ R. Expanding the Gaussian likelihood in (6), the negative log-likelihood objective is, up to constants,

−log p(X|G, Ŵ) ∝ Σ_k ‖x_{k+1} − Ŵᵀĝ(s_k) − ε(s_k)‖²_{Σ⁻¹}.   (8)
The negative log-likelihood function in (8) contains the ELM reconstruction error term, which is not measurable in practice. To circumvent this issue and make (8) implementable, only the terms with the decision variable Ŵ are kept in the objective function. For deriving the cost function, consider the expression

‖x_{k+1} − Ŵᵀĝ(s_k) − ε(s_k)‖²_{Σ⁻¹} = ‖x_{k+1} − Ŵᵀĝ(s_k)‖²_{Σ⁻¹} − 2(x_{k+1} − Ŵᵀĝ(s_k))ᵀΣ⁻¹ε(s_k) + ‖ε(s_k)‖²_{Σ⁻¹}.   (9)

Using Young's inequality, the second term in (9) can be upper bounded, so we have

‖x_{k+1} − Ŵᵀĝ(s_k) − ε(s_k)‖²_{Σ⁻¹} ≤ 2‖x_{k+1} − Ŵᵀĝ(s_k)‖²_{Σ⁻¹} + 2‖ε(s_k)‖²_{Σ⁻¹},   (10)

where ‖ε(s_k)‖²_{Σ⁻¹} can be ignored since it is not associated with the decision variable Ŵ. Hence, the cost function J(Ŵ) to be implemented takes the following form:

J(Ŵ) = Σ_k ‖x_{k+1} − Ŵᵀĝ(s_k)‖²_{Σ⁻¹} + μ_W ‖Ŵ‖_F².   (11)
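The resulting cost is a Σ⁻¹-weighted one-step prediction error plus a Frobenius regularizer; a small sketch (hypothetical data shapes) of evaluating it:

```python
import numpy as np

def cost_J(W_hat, X_next, G_feats, Sigma_inv, mu_W):
    """Implementable cost J(W): weighted one-step prediction error plus the
    Frobenius-norm regularizer (the unmeasurable reconstruction-error
    term is dropped, as in the derivation above)."""
    J = mu_W * np.sum(W_hat ** 2)            # mu_W * ||W||_F^2
    for x_next, g in zip(X_next, G_feats):
        r = x_next - W_hat.T @ g             # one-step residual
        J += r @ Sigma_inv @ r               # ||r||^2 weighted by Sigma^{-1}
    return J

rng = np.random.default_rng(0)
n, n_h, N = 2, 5, 20
G_feats = rng.standard_normal((N, n_h + 1))
W_hat = rng.standard_normal((n_h + 1, n))
X_next = G_feats @ W_hat                      # zero-residual synthetic targets
Sigma_inv = np.linalg.inv(0.02 * np.eye(n))

# With zero residuals only the regularizer remains:
assert np.isclose(cost_J(W_hat, X_next, G_feats, Sigma_inv, 0.01),
                  0.01 * np.sum(W_hat ** 2))
```
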

IV. CHANCE-CONSTRAINED OPTIMIZATION

A. ENCODING PROBABILISTIC SAFETY CONSTRAINTS
The DT-ZCBF defines a forward invariant set such that solutions of the nonlinear dynamical system that start in the set remain in it for all time. A hyperellipsoid is used to represent the DT-ZCBF h(x_k) : R^n → R, given by

h(x_k) = 1 − (x_k − x̄)ᵀA(x_k − x̄),   (12)

where A ∈ R^{n×n} is a symmetric positive definite matrix and x̄ is the vector that denotes the center of gravity of the hyperellipsoid.
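A two-line sketch of this barrier (illustrative axes and center, not the paper's fitted ellipse):

```python
import numpy as np

A = np.diag([1 / 2.0**2, 1 / 1.0**2])   # illustrative semi-axes 2 and 1
x_bar = np.array([0.0, 0.0])            # ellipse center of gravity

def h(x):
    """Hyperellipsoid DT-ZCBF: h >= 0 exactly on the safe set S."""
    d = x - x_bar
    return 1.0 - d @ A @ d

assert h(np.array([0.0, 0.0])) > 0          # center is safe
assert np.isclose(h(np.array([2.0, 0.0])), 0.0)   # boundary of S
assert h(np.array([3.0, 0.0])) < 0          # outside S
```
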
To account for the uncertainty in safety constraint satisfaction, the DT-ZCBF condition is rewritten as

Pr(C_B ≥ ζ) = F_{C_B}(ζ) ≥ p_k,   (13)

where F_{C_B}(ζ) is given by

F_{C_B}(ζ) = 1 − Φ((ζ − E[C_B]) / √Var[C_B]),   (14)

and Φ(·) is the cumulative distribution function of the standard Gaussian. Note that as the variance tends to zero, F_{C_B}(ζ) tends to one whenever E[C_B] > ζ: as the uncertainty in the system dynamics decreases to zero, safety can be guaranteed with probability one. Substituting (14) into (13), the probabilistic safety constraint is given by

E[C_B] − ζ ≥ c(p_k) √Var[C_B],   (15)

where c(p_k) = √2 erf⁻¹(2p_k − 1). Lemma 1 is presented next to compute the parameters required for the cumulative distribution in (15).

Lemma 1: Given the controlled system in (1) modeled using (5), the process noise ν_k induces a distribution over the DT-ZCBF constraint in (3) with the following parameters:

E[C_B] = 1 − (Ŵᵀĝ + ε − x̄)ᵀA(Ŵᵀĝ + ε − x̄) − tr(AΣ) − (1 − γ)h(x_k),   (16)

Var[C_B] = 4(Ŵᵀĝ + ε − x̄)ᵀAΣA(Ŵᵀĝ + ε − x̄) + 2tr(AΣAΣ),   (17)

where the class K function η in Definition 2 is defined as η(h(x_k)) = γh(x_k), and γ is a positive constant.

Proof: Substituting (12) into (3) for h(x_{k+1}) and using (1) and (5) yields

C_B = 1 − (Ŵᵀĝ + ε + ν_k − x̄)ᵀA(Ŵᵀĝ + ε + ν_k − x̄) − (1 − γ)h(x_k).   (18)

Taking the expectation of C_B and using E[ν_kᵀAν_k] = tr(AΣ) yields (16), and the variance is

Var[C_B] = Var[2(Ŵᵀĝ + ε − x̄)ᵀAν_k + ν_kᵀAν_k].   (20)

Utilizing properties such as that the odd moments of a multivariate Gaussian distribution are zero and that the trace is invariant under circular permutations, (20) can be written as

Var[C_B] = 4(Ŵᵀĝ + ε − x̄)ᵀAΣA(Ŵᵀĝ + ε − x̄) + E[(ν_kᵀAν_k)²] − tr²(AΣ).   (21)

After substituting E[(ν_kᵀAν_k)²] = tr²(AΣ) + 2tr(AΣAΣ) in (21), the result in (17) is obtained.
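The moments in Lemma 1 rest on standard Gaussian quadratic-form identities. A quick Monte Carlo sanity check of those identities, plus the quantile c(p_k) (computed here via the equivalent standard-normal inverse CDF), under illustrative stand-in values:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 2
A = np.diag([0.5, 1.0])                 # illustrative barrier shape matrix
Sigma = 0.02 * np.eye(n)                # process-noise covariance
z = np.array([0.3, -0.2])               # stand-in for W^T g + eps - x_bar

# Gaussian quadratic-form identities used in the proof of Lemma 1:
mean_q = z @ A @ z + np.trace(A @ Sigma)                       # E[(z+nu)^T A (z+nu)]
var_q = 4 * z @ A @ Sigma @ A @ z + 2 * np.trace(A @ Sigma @ A @ Sigma)

nu = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
q = np.einsum('ki,ij,kj->k', z + nu, A, z + nu)
assert abs(q.mean() - mean_q) < 2e-3
assert abs(q.var() - var_q) < 1e-3

# c(p_k) = sqrt(2) * erfinv(2 p_k - 1) equals the standard-normal quantile:
c = lambda p: NormalDist().inv_cdf(p)
assert abs(c(0.5)) < 1e-12 and c(0.8) > 0   # no tightening at p_k = 0.5
```
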

B. ENCODING PROBABILISTIC STABILITY CONSTRAINTS
In this subsection, a probabilistic stability constraint is derived using Definition 1 for the uncertain discrete-time system given in (5) and the Lyapunov candidate

V(x_k) = (x_k − x*)ᵀP(x_k − x*),

where P ∈ R^{n×n} is a symmetric positive definite matrix. To account for the uncertainty in stability constraint satisfaction, the DT-CLF condition in (2) can be rewritten as

Pr(C_L ≤ δ) = F_{C_L}(δ) ≥ p_k,   (22)

where

F_{C_L}(δ) = Φ((δ − E[C_L]) / √Var[C_L]).   (23)

Substituting (23) into (22), the probabilistic stability constraint is given by

δ − E[C_L] ≥ c(p_k) √Var[C_L],   (24)

where E[C_L] and Var[C_L] are given in Lemma 2, which is presented next.

Lemma 2: Given the controlled system in (1) modeled using (5), the process noise ν_k induces a distribution over the DT-CLF constraint in (2) with the following parameters:

E[C_L] = (Ŵᵀĝ + ε − x*)ᵀP(Ŵᵀĝ + ε − x*) + tr(PΣ) − (1 − ρ)V(x_k),   (25)

Var[C_L] = 4(Ŵᵀĝ + ε − x*)ᵀPΣP(Ŵᵀĝ + ε − x*) + 2tr(PΣPΣ),   (26)

where the class K function β(‖x_k‖) in Definition 1 is defined as β(‖x_k‖) = ρV(x_k) and 0 < ρ < 1.

Proof: Substituting the Lyapunov candidate into (2) for V(x_{k+1}) and using (1) and (5) yields

C_L = (Ŵᵀĝ + ε + ν_k − x*)ᵀP(Ŵᵀĝ + ε + ν_k − x*) − (1 − ρ)V(x_k).   (27)

Taking the expectation of C_L and using E[ν_kᵀPν_k] = tr(PΣ) yields (25), and the variance is

Var[C_L] = Var[2(Ŵᵀĝ + ε − x*)ᵀPν_k + ν_kᵀPν_k].   (29)

Utilizing properties that odd moments of a multivariate Gaussian distribution are zero, and that the trace, when the matrices are of suitable dimensions, is invariant under circular permutations, (29) can be written as

Var[C_L] = 4[ĝᵀŴPΣPŴᵀĝ + εᵀPΣPε + x*ᵀPΣPx* + 2ĝᵀŴPΣPε − 2ĝᵀŴPΣPx* − 2εᵀPΣPx*] + E[(ν_kᵀPν_k)²] − tr²(PΣ).   (30)

After substituting E[(ν_kᵀPν_k)²] = tr²(PΣ) + 2tr(PΣPΣ) in (30), the result in (26) is obtained.

Remark 4: The square-root terms √Var[C_B] and √Var[C_L] in (15) and (24) are not Lipschitz continuous on [0, 1], as the slope of the tangent line to their corresponding arguments becomes steeper as they approach zero. Therefore, Assumption 3 is made to satisfy the constraints' Lipschitz continuity requirement.

Assumption 3: Var[C_B] ≥ 1 and Var[C_L] ≥ 1.
Remark 5: If Assumption 3 does not hold, i.e., Var[C_B] < 1 and Var[C_L] < 1, the right-hand side (RHS) of the constraints in (15) and (24) can be modified by introducing a constant ξ > 1 such that ξ²Var[C_B] ≥ 1 and ξ²Var[C_L] ≥ 1. For example, for the safety constraint in (15) we get

E[C_B] − ζ ≥ (c(p_k)/ξ) √(ξ²Var[C_B]),   (31)

for which the RHS of (31) can be bounded by c(p_k) ξ Var[C_B], since ξ²Var[C_B] ≥ 1. Scaling by the selected ξ is equivalent to solving the optimization problem with a tighter constraint.
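A small numeric check of this scaling (illustrative values; it assumes the tightened form of (15) in which the square root is replaced by a variance bound):

```python
import math
from statistics import NormalDist

p_k = 0.8
c = NormalDist().inv_cdf(p_k)      # c(p_k) = sqrt(2)*erfinv(2 p_k - 1) ~ 0.84

var_CB = 0.25                      # Assumption 3 violated: Var[C_B] < 1
xi = 2.5                           # any xi > 1 with xi^2 * Var[C_B] >= 1 works
assert xi ** 2 * var_CB >= 1.0

# The scaled surrogate margin dominates the original chance-constraint
# margin, so enforcing it is sufficient (a tighter constraint):
assert c * xi * var_CB >= c * math.sqrt(var_CB)
```
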

C. ENCODING PROBABILISTIC SAFETY AND STABILITY CONSTRAINTS VIA QCQP
In this section, the chance-constrained ELM learning problem is formulated as a QCQP.

Proposition 1: Given the controlled system in (1) modeled using (5) and Assumption 3, the ELM parameter learning problem with the probabilistic safety and stability constraints given in (7b)-(7c) can be formulated as the QCQP

min_Ŵ  J(Ŵ),   (32a)
s.t.  1 − (Ŵᵀĝ + ε − x̄)ᵀÃ(Ŵᵀĝ + ε − x̄) − tr(AΣ) − 2c(p_k)tr(AΣAΣ) − (1 − γ)h(x_k) − ζ ≥ 0,   (32b)
      (Ŵᵀĝ + ε − x*)ᵀP̃(Ŵᵀĝ + ε − x*) + tr(PΣ) + 2c(p_k)tr(PΣPΣ) − (1 − ρ)V(x_k) − δ ≤ 0,   (32c)

with the cost function J(Ŵ) in (11), where Ã = A + 4c(p_k)AΣA and P̃ = P + 4c(p_k)PΣP.

Proof: Using Assumption 3, the square root on the RHS of (15) can be bounded by the variance itself, and the constraint can be rewritten as

E[C_B] − ζ ≥ c(p_k) Var[C_B].   (33)

Substituting E[C_B] and Var[C_B] from (16) and (17) into (33), we have

1 − (Ŵᵀĝ + ε − x̄)ᵀA(Ŵᵀĝ + ε − x̄) − tr(AΣ) − (1 − γ)h(x_k) − ζ ≥ c(p_k)[4(Ŵᵀĝ + ε − x̄)ᵀAΣA(Ŵᵀĝ + ε − x̄) + 2tr(AΣAΣ)].   (34)

Expanding and rearranging (34), we get

1 − (Ŵᵀĝ + ε − x̄)ᵀ(A + 4c(p_k)AΣA)(Ŵᵀĝ + ε − x̄) − tr(AΣ) − 2c(p_k)tr(AΣAΣ) − (1 − γ)h(x_k) − ζ ≥ 0,   (35)

which, after substituting Ã for the middle term in (35), yields the result in (32b). Similarly, using Assumption 3 to bound the RHS of (24), we have

δ − E[C_L] ≥ c(p_k) Var[C_L],   (36)

and when E[C_L] and Var[C_L] from (25) and (26) are substituted into (36), we get

(Ŵᵀĝ + ε − x*)ᵀP(Ŵᵀĝ + ε − x*) + tr(PΣ) − (1 − ρ)V(x_k) + c(p_k)[4(Ŵᵀĝ + ε − x*)ᵀPΣP(Ŵᵀĝ + ε − x*) + 2tr(PΣPΣ)] ≤ δ.   (37)

Expanding and rearranging (37), we get

(Ŵᵀĝ + ε − x*)ᵀ(P + 4c(p_k)PΣP)(Ŵᵀĝ + ε − x*) + tr(PΣ) + 2c(p_k)tr(PΣPΣ) − (1 − ρ)V(x_k) − δ ≤ 0,   (38)

which, after substituting P̃ for the middle term in (38), yields the result in (32c).

The constraints (32b) and (32c) must be satisfied for all x_k ∈ D, which leads to a semi-infinite program. To circumvent the challenge of having an uncountably infinite constraint set, Theorem 1 proves that enforcing modified constraints at a finite number of sampled points is sufficient for obtaining a solution of the optimization problem in (32a)-(32c).
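For concreteness, a toy instance of such a QCQP (stand-in Ã as the identity, a handful of sampled constraint points, random synthetic features, and a general-purpose NLP solver instead of the CVX toolchain the paper uses; everything here is illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, n_h, N = 2, 5, 40
G = rng.standard_normal((N, n_h + 1))           # features at sampled points
X_next = 0.1 * rng.standard_normal((N, n))      # synthetic next-state targets
A_t = np.eye(n)                                 # stand-in for A_tilde (PSD)
mu_W = 0.01

def unpack(w):
    return w.reshape(n_h + 1, n)

def cost(w):
    """Quadratic cost: prediction error + Frobenius regularizer."""
    W = unpack(w)
    return np.sum((X_next - G @ W) ** 2) + mu_W * np.sum(W ** 2)

# Quadratic safety-type constraints at a few sampled feature points:
# 1 - z^T A_t z >= 0 with z = W^T g (z affine in W -> quadratic constraint).
samples = G[:3]
cons = [{'type': 'ineq',
         'fun': (lambda w, g=g: 1.0 - unpack(w).T @ g @ A_t @ (unpack(w).T @ g))}
        for g in samples]

res = minimize(cost, np.zeros((n_h + 1) * n), method='SLSQP', constraints=cons)
assert res.success
for g in samples:                               # verify feasibility
    z = unpack(res.x).T @ g
    assert z @ A_t @ z <= 1.0 + 1e-6
```
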
Let D_τ ⊂ D be a discretization of the state space D, with the closest point in D_τ to x_k ∈ D satisfying ‖x_k − [x_k]_τ‖ ≤ τ/2, where τ is the discretization resolution. Thus, sampling x_k from D_τ, we have ‖s_k − [s_k]_τ‖ ≤ τ/√2.

Theorem 1: Suppose Assumptions 2-3 hold. If the optimization problem in (32a)-(32c) is solved with modified safety and stability conditions (39) and (40), which tighten (32b) and (32c) evaluated at the sampled points [x_k]_τ ∈ D_τ by margins (41a)-(41h) constructed from the discretization resolution τ and the bounds W̄, Ū, ε̄, and ā_p = ‖diag(a_p)‖_F, then the safety and stability conditions in (32b) and (32c) are satisfied ∀x_k ∈ D.
Proof: The modified safety constraint in (39) can be written in terms of quantities evaluated at [x_k]_τ; if a lower bound on its left-hand side holds, then the safety constraint in (32b) is satisfied. Consider the difference O_B between the safety constraint evaluated at x_k and at [x_k]_τ, which by the triangle inequality can be upper bounded. After expanding each term and using Assumption 2 and Remark 1, an upper bound on the RHS of (44) is obtained in (45). Using (43) and the upper bound in (45) yields (46). The first term on the RHS of (46) contains [ε]_τ and is hence not measurable. However, expanding the norm squared and upper bounding the terms that include [ε]_τ results in (47). From (42), (47), and (41a)-(41d), it is concluded that the safety constraint in (32b) is satisfied ∀x_k ∈ D.

In a similar fashion, the modified stability constraint in (40) can be written in terms of quantities evaluated at [x_k]_τ; if a lower bound on its left-hand side holds, then the stability constraint in (32c) is satisfied. Consider the corresponding difference, which by the triangle inequality can be upper bounded. After expanding each term and using Assumption 2 and Remark 1, the RHS of (50) can be upper bounded by (51). Using (49) and the bound in (51) yields (52). The first term on the RHS of (52) contains [ε]_τ and is hence not measurable. However, after expanding the norm squared, the terms associated with [ε]_τ can be upper bounded, which gives (53). From (48), (53), and (41e)-(41h), it is concluded that the stability constraint in (32c) is satisfied ∀x_k ∈ D.

Remark 6: Theorem 1 proves that enforcing the modified safety and stability constraints at a finite number of sampled points is sufficient to obtain an optimal solution of the QCQP with the cost in (11).
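One simple way to realize a discretization satisfying ‖x_k − [x_k]_τ‖ ≤ τ/2 in two dimensions is an axis-aligned grid with per-axis spacing τ/√2 (an assumption on how D_τ is built; the paper does not specify the construction):

```python
import numpy as np

tau = 0.1
d = tau / np.sqrt(2.0)     # per-axis spacing so the 2-D Euclidean gap is <= tau/2

def nearest_grid_point(x):
    """Snap x in R^2 to the closest point [x]_tau of the grid D_tau."""
    return np.round(x / d) * d

rng = np.random.default_rng(2)
for x in rng.uniform(-1.0, 1.0, size=(1000, 2)):
    assert np.linalg.norm(x - nearest_grid_point(x)) <= tau / 2 + 1e-12
```

Each coordinate is at most d/2 from a grid line, so the Euclidean distance is at most √2·d/2 = τ/2, matching the resolution bound used in Theorem 1.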
A relaxation variable ϑ can also be introduced in the modified Lyapunov constraint given in (40) to maintain feasibility of the optimization problem by softening the Lyapunov constraint. The modified probabilistic discrete-time control barrier and Lyapunov function-based QCQP (PDCBFLF-QCQP) is given by:

PDCBFLF-QCQP:

min_{Ŵ, ϑ}  J(Ŵ) + c ϑ²,   (54a)
s.t.  modified safety constraint (39), ∀[x_k]_τ ∈ D_τ,   (54b)
      modified stability constraint (40) relaxed by ϑ, ∀[x_k]_τ ∈ D_τ,   (54c)

where c is a positive constant and J(Ŵ) is given in (11).

Remark 7: In practice, unconstrained learning of the ELM can be used to obtain the Frobenius norm of the output layer weight matrix Ŵ stated in Assumption 2, to be included as a constraint in the optimization problem in (54a). Moreover, the unconstrained learning can also be used to obtain the bound ε̄ and related constants used in constraints (54b)-(54c).
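The unconstrained learning step of Remark 7 has the usual ridge-regression closed form; a sketch (synthetic data, hypothetical shapes) of using it to extract a Frobenius bound:

```python
import numpy as np

def unconstrained_elm_fit(G, X_next, mu_W):
    """Regularized least-squares ELM output weights (the unconstrained
    learning of Remark 7): W = (G^T G + mu_W I)^(-1) G^T X_next."""
    n_feat = G.shape[1]
    return np.linalg.solve(G.T @ G + mu_W * np.eye(n_feat), G.T @ X_next)

rng = np.random.default_rng(3)
G = rng.standard_normal((200, 11))                 # sigmoid features
W_true = rng.standard_normal((11, 2))
X_next = G @ W_true + 0.01 * rng.standard_normal((200, 2))

W_hat = unconstrained_elm_fit(G, X_next, mu_W=0.01)
W_bar = np.linalg.norm(W_hat, ord='fro')           # candidate bound for Assumption 2
assert np.allclose(W_hat, W_true, atol=0.05)
```

The resulting W_bar can then be imposed as a norm constraint when the full PDCBFLF-QCQP is solved.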

V. NUMERICAL EVALUATIONS
In this section, three numerical examples are presented to evaluate the proposed method. For each example, a set of point-to-point demonstrations of a motion governed by the controlled difference equation in (1) is considered. Each demonstration, consisting of the state trajectories, is assumed to be a solution of the underlying discrete-time dynamical system. The objective is to learn the parameters of the ELM network with theoretical guarantees such that the solutions of the learned dynamics remain inside a prescribed safe region while converging to the desired goal location. An ellipse, which is a commonly used approximation for representing the region of operation, is selected such that it encloses all the demonstrations. Once the model is learned, the parameters are used to forward propagate the dynamics from various initial conditions and to validate that the trajectories remain inside the ellipse at all times, without crossing its boundary, and converge to the desired goal location. The optimization problem is solved using the CVX package in MATLAB 2020b. The random initialization process in the ELM algorithm may lead to saturated or constant neurons, which are not desired while learning the model [38]. To circumvent this problem, a BIP learning rule is used to estimate the slopes and biases of the ELM [38]. The active sampling strategy introduced in [14] is also used to select informative points for the constraint computations of the optimization problem in (54a)-(54c). For all three examples, a risk tolerance of p_k = 0.8, a covariance Σ = 0.02I for ν_k, and a regularization parameter μ_W = 0.01 are chosen. The subsequent results are generated using an ELM network with n_h = 25 hidden neurons. 100 Monte Carlo simulation runs are conducted to test the model for bound violations during trajectory reproduction from various initial conditions.
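The Monte Carlo reproduction test can be sketched generically as follows (the contraction map and unit-disk barrier are toy stand-ins for the learned ELM model and the fitted ellipse, not the paper's systems):

```python
import numpy as np

rng = np.random.default_rng(5)

def rollout_is_safe(step, h, x0, steps=200, noise_var=0.005):
    """Forward-propagate x_{k+1} = step(x_k) + nu_k, nu_k ~ N(0, noise_var I),
    and report whether the barrier h stayed nonnegative along the run."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = step(x) + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)
        if h(x) < 0.0:
            return False
    return True

step = lambda x: 0.8 * x            # toy "learned" contraction toward x* = 0
h = lambda x: 1.0 - x @ x           # unit-disk barrier
violations = sum(not rollout_is_safe(step, h, rng.uniform(-0.5, 0.5, 2))
                 for _ in range(100))
assert violations == 0
```
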
To motivate our first example, consider a scenario in a manufacturing industry where humans and robots must work collaboratively and close to one another. In such settings, robots must keep their joint movements bounded when moving in a confined space to ensure the safety of humans and that the manufacturing operation is carried out effectively. Robot manipulators are modeled using Euler-Lagrange (EL) dynamics. To demonstrate the applicability of the proposed method to such cases, we consider, for simplicity, a two-link planar robot manipulator with lengths L_1 = 1 [m] and L_2 = 1 [m] and masses m_1 = 1 [kg] and m_2 = 1 [kg], whose dynamics are described by the EL equation M(q)q̈ + C(q, q̇)q̇ + G(q) = u, where M(q) ∈ R^{2×2} denotes the inertia matrix, C(q, q̇) ∈ R^{2×2} denotes the centripetal-Coriolis matrix, G(q) ∈ R^2 denotes the gravity vector, u ∈ R^2 represents the control input vector, and q, q̇ ∈ R^2 denote the joint angles and angular velocities, respectively. A nonlinear black-box model of the EL dynamics of the form q_{k+1} = f(q_k, u_k) is considered.
To generate the state trajectory data q_k for training the ELM model, a PID set-point controller is designed to regulate the joint positions to a desired value q̄ = [π, −π/4]ᵀ. Through empirical investigation, a set of seven trajectories with randomly selected initial conditions is collected, and their PID gains are tuned individually. The joint positions satisfy the following bounds: 1.10 ≤ q(1) ≤ 3.49 and −1.90 ≤ q(2) ≤ 0.31, where q(1) and q(2) denote the joint position's first and second dimensions. To avoid over-sampling in the constraint computations, an invariant region in the form of an ellipse inscribed within the joint position bounds is selected, i.e., h(q_k) = 1 − (q_k − q̄)ᵀA(q_k − q̄), where q̄ ∈ R^2 denotes the hyperellipsoid's center of gravity, ι_1 and ι_2 are the major and minor axes of the ellipse, respectively, and α is the orientation of the ellipse, chosen such that the ellipse encloses the demonstration data. The other parameters γ = 0.8, ρ = 0.9, ζ = 0.8, and δ = 0.9 are selected empirically. Fig. 2 shows the results of the ELM parameter learning when the system parameters are subject to the probabilistic safety and stability constraints given in (54b) and (54c). Using the DT-ZCBF implies asymptotic stability of the set S, shown as an invariant region in Fig. 2. Hence, if any bounded disturbance pushes the state outside the invariant region, the set S is asymptotically reached. For this reason, when training the system model, the points are sampled from a more extensive set than the selected ellipse [19]. The robot joint position trajectory is reproduced from a different initial condition, selected at random from a uniform distribution between the initial condition used during training and a point 0.5 mm away. As seen in Fig. 2, the reproduced trajectory remains inside the desired invariant region for all time and converges to the desired set-point. The same results are observed for all the Monte Carlo runs. Fig. 3 shows the evolution of the joint angles using the learned ELM parameters, plotted as a function of time.
The reproduced joint position trajectories never cross the set boundaries in either direction and asymptotically converge to a point very close to the desired position.

Advanced driver-assistance systems (ADAS) are technologies that assist drivers by providing sensory information as warnings or by reducing their control efforts. Most modern-day vehicles are equipped with ADAS, an example of safety-critical control applications. The architecture of autonomous driving technology, which can also be viewed as ADAS but with less driver engagement, consists of sensing, perception, planning, and operation [39]. Motion planning computes the moving paths of a vehicle operating within a confined space. For the second example, a wheeled mobile robot (WMR) with the kinematic model

ṗ_x = v cos ψ − a ω sin ψ,
ṗ_y = v sin ψ + a ω cos ψ,
ψ̇ = ω,   (56)

is considered, where p_x and p_y denote the 2D position, ψ is the orientation, and v and ω denote the linear and angular velocities of the robot, respectively. Moreover, a represents the distance from the center of the wheel axis to the position of interest (p_x, p_y); for this example a constant a = 0.12 [m] is chosen. This model can be written as a nonlinear system given in (1) after discretization, where x_k = [p_x(k), p_y(k), ψ(k)]ᵀ ∈ R^3 is the state vector and u_k ∈ R^2 is the system control input. For the system in (56), an adaptive controller similar to [40] is first designed to track a desired trajectory. Through empirical investigation, five trajectories with randomly selected initial conditions are collected to be used as demonstration data for training the ELM network. The bounds on the states of (56) are selected as p_x ∈ (−2, ), and the invariant region is h(x_k) = 1 − (x_k − x̄)ᵀA_wmr(x_k − x̄), where x̄ ∈ R^3 denotes the ellipsoid's center of gravity and A_wmr = diag(ℓ_1⁻², ℓ_2⁻², ℓ_3⁻²), where ℓ_1, ℓ_2, and ℓ_3 are the semi-axes of the ellipsoid, whose values are chosen such that the ellipsoid encloses the demonstration data.
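A minimal Euler discretization of the WMR kinematics with the offset point (a sketch; the sample time T is an assumption, as the paper does not state its value):

```python
import numpy as np

a, T = 0.12, 0.05          # wheel-axis offset [m] and assumed sample time [s]

def wmr_step(x, u):
    """Euler-discretized offset-point kinematics of the wheeled mobile
    robot: state x = [p_x, p_y, psi], input u = [v, omega]."""
    px, py, psi = x
    v, w = u
    return np.array([
        px + T * (v * np.cos(psi) - a * w * np.sin(psi)),
        py + T * (v * np.sin(psi) + a * w * np.cos(psi)),
        psi + T * w,
    ])

x = wmr_step(np.zeros(3), np.array([1.0, 0.0]))   # drive straight along +x
assert np.allclose(x, [0.05, 0.0, 0.0])
```
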
Along with the states x_k, two additional inputs in the form of the error, i.e., e_x = p_x − p_x* and e_y = p_y − p_y*, are provided to the ELM network, where p_x* and p_y* are the target locations obtained from the demonstration data. The design parameters for this simulation are selected empirically as γ = 0.95, ρ = 0.90, ζ = 0.5, and δ = 1. Fig. 4 shows the significance of constrained model learning. As shown in Fig. 4, the green trajectory is generated using a model learned subject to the probabilistic safety and stability constraints, whereas the blue trajectory that goes outside the state constraints results from a model learned without any constraints. Moreover, the robustness of the described method to a change of goal location is tested, as shown in Fig. 5. The change of goal location is determined according to the guidelines presented in [41]. The constrained trajectories are bound to stay inside a user-defined safe region while reaching the target.
The purpose of imitation learning is to efficiently learn motion dynamics by showing an example; however, its application is not limited to physical systems. The LASA human handwriting dataset [41] consists of handwriting motions collected from a pen input using a tablet PC, with seven demonstrations per shape. For the third example, the proposed method is tested on the BendedLine shape from the dataset. An ellipse is chosen for the invariant region that encloses the BendedLine shape's seven demonstrations. The state trajectories of all 30 shapes in the LASA dataset converge to an equilibrium at the origin, i.e., x* = 0, causing redundancy in the input vector to the ELM network, that is, s_k = [x_kᵀ, x_kᵀ, 1]ᵀ. Thus, to circumvent this data redundancy issue for this particular dataset and to be able to compute the constraints derived in (54b) and (54c), the state trajectory of the selected shape is translated by a constant before training. Once the model is learned, the state trajectories are translated back to zero. The optimization design parameters for this simulation are selected empirically as γ = 0.85, ρ = 0.75, ζ = 0.1, and δ = 1. Fig. 6 shows that the trajectories generated by the learned model remain inside the safe set and converge to the system equilibrium. Since the constraints are derived for one-step-ahead prediction, no barrier violation occurs during the Monte Carlo runs if the initial condition is sufficiently close to the initial conditions of the demonstration trajectories. The average CPU time for training the ELM network across all simulations, computed over five independent runs with 1000 sampling points, is 3.4 seconds.

VI. DISCUSSION AND FUTURE WORK
This paper presents a dynamical system model learning method that provides theoretical guarantees on satisfying the safety and stability properties of the underlying nonlinear discrete-time dynamical system. The first two simulations are examples of physical systems whose motion dynamics are modeled using an ELM network; the network parameters are learned such that the state remains inside a predefined safe region of operation while converging to the desired goal location. The third simulation applies the imitation learning method to hand-drawn shapes collected from a pen input on a Tablet PC, again with theoretical guarantees on convergence to the goal location and on remaining within a predefined safe operating region. The scope of this work is learning a dynamic model that represents the trajectories of a controlled dynamical system, e.g., those considered in the simulation examples. The constrained learned model can be used in many ways: for instance, to produce reference trajectories for an autonomous system from different initial states to the goal, to be tracked by a designed controller, or to forecast human motion trajectories as in the third simulation example.
While designing identification algorithms for nonlinear discrete-time systems from data is fundamental and has significant use in many applications, studying the sample complexity of the proposed learning scheme is equally important. The sample complexity of the proposed learning scheme can be analyzed via the following two avenues of research, which will be pursued as future work. 1) Investigating the optimal number of demonstrations required to learn the state trajectories of constrained controlled dynamical systems. The sample-complexity properties of incremental-gain-stability-constrained iterative learning algorithms are analyzed in [42], where insights are provided into the relationship between system stability and sample complexity. An algorithm is introduced in [43] that learns the parameters of a stable nonlinear dynamical system from a single trajectory with optimal sample complexity (offset Rademacher complexity) and nearly-linear run time. 2) While the method presented in this paper uses the discretization resolution τ to make the semi-infinite program tractable, the authors in [7] use the notion of an (α-β)-grid to obtain modified constraints with an upper bound on the sample-point density in the region of attraction. Inspired by the method in [7], a study will be conducted to determine the minimum number of samples required to compute the CBF and CLF so that the safety and stability of the underlying system are satisfied point-wise in a closed set. In a related work [44], algorithms are proposed to determine how much data is required so that the certificate function (the Lyapunov function or contraction metric) is valid over a prescribed fraction of the relevant state space.

VII. CONCLUSION
A discrete-time nonlinear system model learning method is presented that uses an ELM with probabilistic DT-ZCBF and DT-CLF constraints. The system process noise is modeled as zero-mean Gaussian noise, which induces a distribution over the safety and stability constraints. Therefore, probabilistic forms of the barrier and Lyapunov constraints are derived such that safety and stability are guaranteed within a user-defined risk tolerance. The constrained ELM learning problem is formulated as a QCQP. To make the QCQP implementable, a theorem is developed which proves that, with a modification to the constraints, the quadratic program with an infinite number of constraints can be relaxed to one with a finite number of constraints. The applicability of the proposed model learning method is demonstrated using three simulation examples. The safe set is modeled as a hyperellipsoid in this work, which is a common representation for a region of operation or an obstacle. In future work, the safe set representation will be generalized to other sets by modifying the constraints.
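A hyperellipsoidal safe set admits a particularly simple barrier function, which is one reason it is a convenient representation. The following is a minimal sketch under assumed notation (the names `barrier`, `c`, and `P` are illustrative, not the paper's symbols): the set {x : h(x) ≥ 0} with h(x) = 1 − (x − c)ᵀP(x − c) is an ellipsoid centred at c, where the positive-definite matrix P fixes the semi-axes.

```python
import numpy as np

def barrier(x, c, P):
    """Ellipsoidal barrier: positive inside the set, zero on the boundary,
    negative outside. P must be positive definite."""
    d = x - c
    return 1.0 - d @ P @ d

c = np.zeros(2)
P = np.diag([1.0 / 4.0, 1.0 / 9.0])  # axis-aligned ellipse, semi-axes 2 and 3

print(barrier(np.array([0.0, 0.0]), c, P))  # 1.0 -> centre, well inside
print(barrier(np.array([2.0, 0.0]), c, P))  # 0.0 -> on the boundary
```

Generalizing the safe set, as proposed for future work, amounts to replacing this quadratic h with a barrier function for the new set shape and re-deriving the corresponding probabilistic constraints.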