Online Optimization of Dynamical Systems with Deep Learning Perception

This paper considers the problem of controlling a dynamical system when the state cannot be directly measured and the control performance metrics are unknown or partially known. In particular, we focus on the design of data-driven controllers to regulate a dynamical system to the solution of a constrained convex optimization problem where: i) the state must be estimated from nonlinear and possibly high-dimensional data; and, ii) the cost of the optimization problem -- which models control objectives associated with inputs and states of the system -- is not available and must be learned from data. We propose a data-driven feedback controller that is based on adaptations of a projected gradient-flow method; the controller includes neural networks as integral components for the estimation of the unknown functions. Leveraging stability theory for perturbed systems, we derive sufficient conditions to guarantee exponential input-to-state stability (ISS) of the control loop. In particular, we show that the interconnected system is ISS with respect to the approximation errors of the neural network and unknown disturbances affecting the system. The transient bounds combine the universal approximation property of deep neural networks with the ISS characterization. Illustrative numerical results are presented in the context of control of robotics and epidemics.


I. INTRODUCTION
Control frameworks for modern engineering and societal systems critically rely on the use of perceptual information from sensing and estimation mechanisms. Extraction of critical information for feedback control increasingly requires the processing of high-dimensional sensory data obtained from nonlinear sensory systems [1], [2], [3], [4], and the interpretation of information received from humans interacting with the system regarding the end-user perception of safety, comfort, or (dis)satisfaction [5], [6]. For example, control systems in autonomous driving rely on positioning information extracted from camera images [1] and must account for the perception of the safety of the vehicle occupants [7]. In power grids, state feedback is derived from nonlinear state estimators or pseudo-measurements [8], and control goals must account for comfort and satisfaction objectives of the end-users that are difficult to model [9].
Within this broad context, this paper considers the problem of developing feedback controllers for dynamical systems where the acquisition of information on the system state and on the control performance metrics requires a systematic integration of supervised learning methods in the controller design process. Further, our problem pertains to the design of feedback controllers to steer a dynamical system towards the solution of a constrained convex optimization problem, where the cost models objectives that are associated with the state and the controllable inputs. The design of feedback controllers inspired by first-order optimization methods has received significant attention during the last decade [10], [11], [12], [13], [14], [15], [16], [17], [18], [19]; see also the recent line of work on using online optimization methods for discrete-time linear time-invariant (LTI) systems [20], [21], [22], [23]. However, open research questions remain on how it is possible to systematically integrate learning methods in the control loop when information on the system and on the optimization model is not directly available, and on how to analyze the robustness and safety of optimization-based controllers in the presence of learning and estimation errors.
In this work, we investigate the design of feedback controllers based on an adaptation of the projected gradient flow method [18], [24] combined with learning components, where: i) estimates of the state of the system are provided by feedforward neural networks [25], [26] and residual neural networks [27], [28]; and ii) the gradient information is acquired via finite differences based on a deep neural network approximation of the costs. When the neural network-based controller is interconnected with the dynamical system, we establish conditions that guarantee input-to-state stability (ISS) [29], [30] by leveraging tools from the theory of perturbed systems [31, Ch. 9] and singular perturbation theory [31, Ch. 11]. In particular, the ISS bounds show how the transient and asymptotic behaviors of the interconnected system are related to the neural network approximation errors. When the system is subject to unknown disturbances, the ISS bounds also account for the time-variability of the disturbances.
Prior works: Perception-based control of discrete-time linear time-invariant systems is considered in, e.g., [1], [2], where the authors study the effect of state estimation errors on controllers designed via system-level synthesis. Further insights on the tradeoffs between learning accuracy and performance are offered in [4]. For continuous-time systems, ISS results for dynamical systems with deep neural network approximations of state observers and controllers are provided in [3]. Differently from [3], we consider the estimation of states and cost functions and the interconnection of optimization-based controllers with dynamic plants. Optimization methods with learning of the cost function are considered in, e.g., [6], [32], [33] (see also references therein); however, these optimization algorithms are not implemented in closed loop with a dynamic plant. Regarding control problems for dynamical systems, existing approaches leverage gradient-flow controllers [17], [34], proximal methods [14], prediction-correction methods [15], and hybrid accelerated methods [17]. Plants with nonlinear dynamics were considered in [11], [16], and switched LTI systems in [18]. A joint stabilization and regulation problem was considered in [13], [35]. See also the recent survey [19]. In all of these works, the states and outputs are assumed to be observable and the cost functions are known.
We also acknowledge works where controllers are learned using neural networks; see, for example, [3], [36], [37], and the work on reinforcement learning in [38]. Similarly to this literature, we leverage neural networks to supply state and gradient estimates to a projected gradient-flow controller. By analogy with dynamical systems, optimization has been applied to Markov decision processes in, e.g., [39].
Finally, we note that ISS of perturbed gradient flows was investigated in [40]. In this work, we consider interconnections between a perturbed, projected gradient flow and a dynamical system, and combine the theory of perturbed systems [31, Ch. 9] with singular perturbation theory [31, Ch. 11]. Our ISS bounds are then customized for feedforward neural networks [25], [26] and residual neural networks [27], [28].
We also acknowledge [41], where basis expansions are utilized to learn a function, which is subsequently minimized via extremum seeking.
Finally, the preliminary work [42] used a gradient-flow controller in cases where the optimization cost is learned via least-squares methods. Here, we extend [42] by accounting for systems with nonlinear dynamics, by using neural networks instead of parametric estimation techniques, by considering errors in the state estimates, and by combining ISS estimates with neural network approximation results.
Contributions: The contribution of this work is threefold. First, we characterize the transient performance of a projected gradient-based controller applied to a nonlinear dynamical system while operating with errors in the gradient. Our analysis is based on tools from ISS analysis of nonlinear dynamical systems. More precisely, we leverage Lyapunov-based singular-perturbation arguments to prove that the proposed control method guarantees that the controlled system is ISS with respect to the variation of exogenous disturbances affecting the system and the error in the gradient. This fact is remarkable because unknown exogenous disturbances introduce shifts in the equilibrium point of the system to control. Second, we propose a framework where optimization-based controllers are used in combination with deep neural networks. We tailor our results to two types of deep neural networks that can be used for this purpose: deep residual networks and deep feedforward networks. We then combine the universal approximation property of deep neural networks with the ISS characterization and provide an explicit transient bound for feedback-based optimizing controllers with neural-network state estimators. Third, we propose a novel framework where deep neural networks are used to estimate, from training data, the gradients of the cost functions characterizing the control goal. Analogously to the case above, we tailor our results to two cases: deep residual networks and feedforward networks. In this case, we leverage our ISS analysis to show how optimization-based controllers can be designed to accomplish the target control task, and we provide an explicit transient bound for these methods. Finally, we illustrate the benefits of the methods in: (i) an application in robotic control, and (ii) the problem of controlling the outbreak of an epidemic modeled using a susceptible-infected-susceptible model.
Overall, our results show for the first time that the universal approximation properties of deep neural networks can be harnessed, in combination with the robustness properties of feedback-based optimization algorithms, to provide guarantees in perception-based control.
In conclusion, we highlight that the assumptions and control frameworks outlined in this paper find applications in, for example, power systems [13], [14], [16], [34], traffic flow control in transportation networks [18], epidemic control [43], and in neuroscience [44]. When the dynamical model for the plant does not include exogenous disturbances, our optimization-based controllers can also be utilized in the context of autonomous driving [1], [2] and robotics [45].
Organization: The remainder of this paper is organized as follows. Section II describes the problem formulation and introduces some key preliminaries used in our analysis. Section III presents a main technical result that characterizes an error bound for gradient-type controllers with arbitrary gradient error. In Sections IV and V, we present our main control algorithms corresponding to the case of state perception and cost perception, respectively. Section VI and Section VII illustrate our simulation results and Section VIII concludes the paper.

II. PRELIMINARIES AND PROBLEM FORMULATION
We first outline the notation used throughout the paper and provide relevant definitions.
Notation: We denote by N, N_{>0}, R, R_{>0}, and R_{≥0} the set of natural numbers, the set of positive natural numbers, the set of real numbers, the set of positive real numbers, and the set of non-negative real numbers, respectively. For vectors x ∈ R^n and u ∈ R^m, ‖x‖ denotes the Euclidean norm of x, ‖x‖_∞ denotes the supremum norm, and (x, u) ∈ R^{n+m} denotes their vector concatenation; x^⊤ denotes transposition, and x_i denotes the i-th element of x. For a matrix A ∈ R^{n×m}, ‖A‖ is the induced 2-norm and ‖A‖_∞ the supremum norm.
The set B_n(r) := {z ∈ R^n : ‖z‖ < r} is the open ball in R^n with radius r > 0; B_n[r] := {z ∈ R^n : ‖z‖ ≤ r} is the closed ball. Given two sets X ⊂ R^n and Y ⊂ R^m, X × Y denotes their Cartesian product; moreover, X + B_n(r) is the open set defined as X + B_n(r) = {x + y : x ∈ X, y ∈ B_n(r)}. Given a closed and convex set C ⊂ R^n, Π_C{·} denotes the Euclidean projection onto C; i.e., Π_C{y} := arg min_{x ∈ C} ‖x − y‖. For a continuously differentiable function φ : R^n → R, ∇φ(x) ∈ R^n denotes its gradient. If the function is not differentiable at a point x, ∂φ(x) denotes its subdifferential.
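As a small illustration (ours, not part of the paper), when C is a box the Euclidean projection decouples coordinate-wise and reduces to a clip; the bounds below are arbitrary choices:

```python
import numpy as np

def project_box(y, lo, hi):
    """Euclidean projection of y onto the box C = [lo, hi]^n.

    For a box, arg min_{x in C} ||x - y||^2 decouples per coordinate,
    so the projection is a simple componentwise clip.
    """
    return np.clip(y, lo, hi)

# A point outside C = [-1, 1]^2 is mapped to the nearest point of C;
# a point already in C is left unchanged.
y = np.array([2.0, 0.5])
x = project_box(y, -1.0, 1.0)  # -> [1.0, 0.5]
```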
Partial ordering: The first-orthant partial order on R^n is denoted by ⪯ and is defined as follows: for any x, z ∈ R^n, we say that x ⪯ z if x_i ≤ z_i for i = 1, …, n. We say that a function φ : R^n → R^n is monotone if, for any x, z ∈ R^n such that x ⪯ z, we have φ(x) ⪯ φ(z). Finally, the interval [x, z], for some x, z ∈ R^n, is defined as [x, z] = {w ∈ R^n : x ⪯ w ⪯ z}.
Set covering: Let Q, Q_s ⊂ R^n, with Q compact. We say that Q_s is an ε-cover of Q, for some ε > 0, if for any x ∈ Q there exists z ∈ Q_s such that ‖x − z‖_∞ ≤ ε. We say that Q_s is an ε-cover of Q "with respect to the partial order ⪯," for some ε > 0, if for any x ∈ Q there exist w, z ∈ Q_s such that x ∈ [w, z] and ‖w − z‖_∞ ≤ ε [28].
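The covering notion above can be checked numerically on finite samples; the sketch below is ours (the compact set Q is stood in for by a dense finite sampling, so this is an illustration rather than a certification procedure):

```python
import numpy as np

def is_eps_cover(Q, Qs, eps):
    """Check whether the finite set Qs is an eps-cover of the finite
    sample Q under the sup norm: every x in Q must have some z in Qs
    with ||x - z||_inf <= eps."""
    Q, Qs = np.atleast_2d(Q), np.atleast_2d(Qs)
    # Pairwise sup-norm distances, shape (|Q|, |Qs|).
    d = np.abs(Q[:, None, :] - Qs[None, :, :]).max(axis=2)
    return bool((d.min(axis=1) <= eps).all())

# A grid {0, 0.5, 1} is a 0.25-cover of a fine sampling of [0, 1],
# but not a 0.2-cover (the point 0.25 is at sup-distance 0.25).
fine = np.linspace(0.0, 1.0, 101)[:, None]
coarse = np.linspace(0.0, 1.0, 3)[:, None]
print(is_eps_cover(fine, coarse, 0.25))  # -> True
```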

A. MODEL OF THE PLANT
We consider systems that can be modeled using continuous-time nonlinear dynamics:

ẋ = f(x, u, w_t),  x(t_0) = x_0,  (1)

where f : X × U × W → R^n, with X ⊆ R^n, U ⊆ R^{n_u}, W ⊆ R^{n_w} open and connected sets. In (1), x : R_{≥0} → X denotes the state, x_0 ∈ X is the initial condition, u : R_{≥0} → U is the control input, and w_t : R_{≥0} → W is a time-varying exogenous disturbance (the notation w_t emphasizes the dependence on time). In the remainder, we restrict our attention to cases where u ∈ U_c at all times, where U_c ⊂ U is compact.¹ Additionally, we assume that the vector field f(x, u, w) is continuously differentiable and Lipschitz continuous in its variables, with constants L_x, L_u, L_w, respectively. We make the following assumptions on (1).

Assumption 1 (Steady-state map): There exists a (unique) continuously differentiable function h : U × W → X such that, for any fixed ū ∈ U, w̄ ∈ W, f(h(ū, w̄), ū, w̄) = 0. Moreover, h(u, w) admits the decomposition h(u, w) = h_u(u) + h_w(w), where h_u and h_w are Lipschitz continuous with constants ℓ_{h_u} and ℓ_{h_w}, respectively.

Assumption 1 guarantees that, with constant inputs ū, w̄, system (1) admits a unique equilibrium point x̄ := h(ū, w̄). Notice that existence of h(u, w) is always guaranteed in cases where, in addition, ∇_x f(x, ū, w̄) is invertible for any ū, w̄. Indeed, in these cases, the implicit function theorem [16] guarantees that h(u, w) exists and is differentiable, since f(x, u, w) is continuously differentiable.
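To fix ideas, Assumption 1 can be verified on a simple linear example (ours, for illustration only):

```latex
% Example (ours): linear dynamics \dot{x} = Ax + Bu + Ew_t with A Hurwitz
% (hence invertible). Solving f(x,u,w) = Ax + Bu + Ew = 0 for x gives the
% steady-state map
%   h(u,w) = -A^{-1}Bu - A^{-1}Ew,
% which is exactly the additive decomposition required by Assumption 1;
% h_u and h_w are Lipschitz with constants \|A^{-1}B\| and \|A^{-1}E\|.
h(u, w) = \underbrace{-A^{-1}B\,u}_{h_u(u)} \;\underbrace{-\,A^{-1}E\,w}_{h_w(w)}
```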
In this work, we interpret w t as an unknown exogenous input modeling disturbances affecting the system. We make the following assumption on w t .

Assumption 2 (Properties of exogenous inputs): The map t ↦ w_t is locally absolutely continuous, and w_t ∈ W_c for all t ≥ t_0, where W_c ⊂ W is a compact set.
Assumption 2 imposes basic continuity and compactness requirements on the exogenous disturbances affecting (1). Following Assumption 2, in the remainder of this paper we denote by X_eq := h(U_c × W_c) the set of admissible equilibrium points of the system (1). We note that in Assumption 1 we consider a decomposition h(u, w) = h_u(u) + h_w(w) so that the Jacobian of h(u, w) with respect to u does not depend on the unknown disturbance w; this property will be leveraged in the implementation of our gradient-based controller. Notably, this assumption is satisfied in, e.g., power systems [13], [14], [16], [34], transportation networks [18], and neuroscience [44]. Our model clearly subsumes the case where no disturbance w is present, as in the models for, e.g., autonomous driving [1], [2] and robotics [45]. We also emphasize that the dynamics (1) can model both the dynamics of the physical system and of the stabilizing controllers; see, for example, [13], our previous work on LTI systems in [47], and the recent survey [19].
Remark 1 (Compactness of the equilibrium set): Notice that the equilibrium set X_eq is compact. This follows by noting that U_c × W_c is compact, h(u, w) is continuously differentiable, and by application of [48, Thm. 4.14]. Moreover, notice that ‖∇_u h(u, w̄)‖ ≤ ℓ_{h_u} for all u ∈ U_c, which follows from the compactness of U_c; see [48, Ch. 4].
Before proceeding, we let r denote the largest positive constant such that X_r := X_eq + B_n(r) satisfies X_r ⊆ X (see Fig. 1 for an illustration). For instance, if X = {x ∈ R^n : ‖x‖ < ρ} for some ρ > 0 and X_eq = {0}, then r = ρ.
Assumption 3 (Exponential stability): There exist a, k > 0 and a set of initial conditions X_0 := X_eq + B_n(r_0), with 0 < r_0 ≤ r, such that, for any fixed ū ∈ U_c, w̄ ∈ W_c, the bound

‖x(t) − h(ū, w̄)‖ ≤ k ‖x(t_0) − h(ū, w̄)‖ e^{−a(t−t_0)}  (2)

holds for all t ≥ t_0 and for every initial condition x(t_0) ∈ X_0; that is, the equilibrium x̄ = h(ū, w̄) is exponentially stable, uniformly in time. This, in turn, implies the existence of a Lyapunov function, as formalized in the following result, which is a direct application of [31, Thm. 4.14].
Lemma 1 (Existence of a Lyapunov function for (1)): Let Assumptions 1-3 hold and let X_0 be the set of initial conditions as in Assumption 3. Then, there exists a function W : X_0 × U × W → R that satisfies the inequalities:

d_1 ‖x − h(u, w)‖² ≤ W(x, u, w) ≤ d_2 ‖x − h(u, w)‖²,
(∂W/∂x) f(x, u, w) ≤ −d_3 ‖x − h(u, w)‖²,
‖∂W/∂x‖ ≤ d_4 ‖x − h(u, w)‖,

for some positive constants d_1, d_2, d_3, d_4.

Proof: We begin by noting that, under our assumptions, the vector field f(x, u, w) is Lipschitz on X_r × U_c × W_c, and thus its Jacobian ∂f/∂x is bounded on X_r, uniformly with respect to u and w. The proof then follows by iterating the steps of [31, Thm. 4.14] for fixed u ∈ U_c and w ∈ W_c, noting that Assumption 3 implies that solutions starting in X_0 do not leave X_r, so that (2) holds. Sensitivity with respect to u and w then follows from [31, Lemma 9.8] and [49].
In the following, we state the main optimization problem associated with (1) and formalize the problem statements.

B. TARGET CONTROL PROBLEM
In this work, we focus on the problem of controlling, at every time t, the system (1) to a solution of the following time-dependent optimization problem:

min_{u ∈ C, x ∈ X}  φ(u) + ψ(x)  (4a)
subject to:  x = h(u, w_t),  (4b)

where φ : U → R and ψ : X → R describe costs associated with the system's inputs and states, respectively, and C ⊂ U_c is a closed and convex set representing constraints on the input at optimality.

Remark 2 (Interpretation of the control objective):
The optimization problem (4) formalizes an optimal equilibrium selection problem, where the objective is to select an optimal input-state pair (u*_t, x*_t) that, at equilibrium, minimizes the cost specified by φ(·) and ψ(·). It is worth noting that, differently from stabilization problems (where the objective is to guarantee that the trajectories of (1) converge to some equilibrium point), the control objective here is to select, among all equilibrium points of (1), an equilibrium point that is optimal as described by the function φ(u) + ψ(x). In this sense, (4) can be interpreted as a high-level control objective that can be nested with a stabilizing controller (where the latter is used to guarantee the satisfaction of Assumption 3).
Two important observations are in order. First, the constraint (4b) is parametrized by the disturbance w_t, and thus the solutions of (4) are parametrized by w_t (or, equivalently, by time). In this sense, the pairs (u*_t, x*_t) are time-dependent and characterize optimal trajectories [50]. Second, since w_t is assumed to be unknown and unmeasurable, solutions of (4) cannot be computed explicitly.
By recalling that h(u, w) is unique for any fixed u, w, problem (4) can be rewritten as the reduced problem:

min_{u ∈ C}  φ(u) + ψ(h(u, w_t)),  (5)

obtained by eliminating the equality constraint (4b). We make the following assumptions on the costs of (5).
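For intuition, the gradient of the reduced cost in (5) follows from the chain rule together with the decomposition in Assumption 1; this is a routine computation, spelled out here in our own notation:

```latex
% With \Phi_w(u) := \phi(u) + \psi(h(u,w)) and h(u,w) = h_u(u) + h_w(w),
% the Jacobian of u \mapsto h(u,w) equals H(u) := \nabla h_u(u), which does
% not depend on the unknown disturbance w. By the chain rule,
\nabla \Phi_w(u) = \nabla \phi(u) + H(u)^{\top}\, \nabla\psi\big(h(u,w)\big).
```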

Assumption 4 (Smoothness and strong convexity): The following conditions hold: (i) φ is continuously differentiable and strongly convex, and its gradient ∇φ is Lipschitz continuous over U_c; (ii) ψ is convex and continuously differentiable, and its gradient ∇ψ is Lipschitz continuous over X_r.

Assumption 5 (Regularity of optimal trajectory map): There exists a continuous function J : W_c → C that maps each disturbance w ∈ W_c to the optimizer u*(w) of (5); moreover, J is Lipschitz continuous over W_c.

Assumption 5 imposes regularity assumptions on the function that maps w_t (which parametrizes the problem (5)) into the optimal solution u*_t [51, Ch. 2]; sufficient conditions can be obtained from standard arguments in parametric convex programming.

C. OPTIMAL REGULATION WITH PERCEPTION IN-THE-LOOP
Feedback-based optimizing controllers for (1)-(4) were studied in [18] when (1) has linear dynamics, and in [16] when (4) is unconstrained and w_t is constant. The authors consider low-gain gradient-type controllers of the form:

u̇ = η ( Π_C{ u − ∇φ(u) − H(u)^⊤ ∇ψ(x) } − u ),  (6)

where H(u) denotes the Jacobian of h_u(u) and η > 0 is a tunable controller parameter. The controller (6) has the form of a projected gradient-flow algorithm, often adopted to solve problems of the form (4), yet modified by replacing the true gradient ∇ψ(h(u, w_t)) with the gradient ∇ψ(x) evaluated at the instantaneous system state, thus making the dynamics (6) independent of the unknown disturbance w_t. Implementations of the controller (6) critically rely on exact knowledge of the system state x as well as of the gradients ∇φ(u) and ∇ψ(x). In this work, we consider two scenarios. In the first, the controller is used with an estimate x̂ of x provided by a deep neural network. More precisely, we focus on cases where x is not directly measurable; instead, we have access only to nonlinear and possibly high-dimensional observations of the state ξ = q(x), where q : X → R^{n_ξ} is an unknown map. In the second, the controller is used with estimates of the gradients ∇φ(u), ∇ψ(x) obtained by using a deep neural network. Similarly to before, we consider cases where the analytic expressions of the gradients are unknown and, instead, we have access only to functional evaluations of the cost functions. We formalize these two cases next.
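To make the structure of this controller concrete, the following sketch is entirely ours: the scalar plant, costs, constraint set, and gains are arbitrary illustrative choices, and the continuous flow is forward-Euler discretized for simulation.

```python
import numpy as np

# Illustrative scalar plant: xdot = -x + u + w, so h(u, w) = u + w and the
# Jacobian of h_u is H(u) = 1. Costs: phi(u) = (u-1)^2, psi(x) = x^2, and
# input constraint C = [-2, 2]. With w = 0 the optimizer of
# min_{u in C} phi(u) + psi(h(u, 0)) is u* = x* = 0.5.
eta, dt, w = 0.2, 0.01, 0.0
x, u = 1.5, -1.0                     # arbitrary initial conditions

for _ in range(30000):               # horizon of 300 time units
    grad = 2.0 * (u - 1.0) + 2.0 * x            # nabla phi + H^T nabla psi(x)
    u += dt * eta * (np.clip(u - grad, -2.0, 2.0) - u)  # projected flow
    x += dt * (-x + u + w)                       # plant dynamics

print(x, u)  # both approach 0.5
```

Note the timescale separation implicit in the small gain eta: the plant settles to h(u, w) faster than the controller moves u, matching the singular-perturbation viewpoint of the paper.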
Problem 1 (Optimization with state perception): Design a feedback controller to regulate inputs and states of (1) to the time-varying solution of (4) when x is unmeasurable and, instead, we have access only to state estimates x̂ = p̂(ξ) produced by a deep neural network p̂(·) trained as a state observer.
Problem 2 (Optimization with cost perception): Design a feedback controller to regulate inputs and states of (1) to the time-varying solution of (4) when ∇φ(u), ∇ψ(x) are unknown and, instead, we have access only to estimates φ̂(u), ψ̂(x) of φ(u), ψ(x) produced by a deep neural network trained as a function estimator.
We conclude by discussing, in the following remarks, the relevance of Problems 1-2 in applications.
Remark 3 (Motivating applications for Problem 1): In applications in autonomous driving, vehicle states are often reconstructed from perception-based maps ξ = q(x), where q describes images generated by cameras. In a power systems context, ξ = q(x) describes the nonlinear power flow equations relating net powers and voltages at the buses (described by ξ) to the generators' phase angles and frequencies (described by x). Finally, we note that a related observer design problem was considered in [3].

Remark 4 (Motivating applications for Problem 2):
When systems interact with humans, φ(u) is often used to model end-users' perception of safety, comfort, or (dis)satisfaction with the adopted control policy [6], [32], [33], [42], [52]. Due to the complexity of modeling humans, φ(u) is often unknown and must be learned from available historical data. In robotic trajectory-tracking problems, ψ(x) = ‖x − x_r‖², where x_r ∈ R^n models an unknown target to be tracked. In these cases, we have access only to measurements of the relative distance ‖x − x_r‖ between the robot and the target. Additional examples include cases where ψ(x) represents a barrier function associated with unknown sets [53], [54].

III. GENERAL ANALYSIS OF GRADIENT-FLOW CONTROLLERS WITH GRADIENT ERROR
In this section, we take a holistic approach to address Problems 1-2 and provide a general result characterizing gradient-type controllers of the form (6) that operate with general errors. More precisely, in this section we study the following plant-controller interconnection:

ẋ = f(x, u, w_t),  (7a)
u̇ = η ( Π_C{ u − F(x, u) − e(x, u) } − u ),  (7b)

where F(x, u) := ∇φ(u) + H(u)^⊤ ∇ψ(x) is the nominal gradient as in (6), and e : X × U → R^{n_u} models any state- or input-dependent error. It is worth noting three important features of the controller (7b). First, (7b) can be implemented without knowledge of w_t (similarly to (6), the true gradient ∇ψ(h(u, w_t)) is replaced by evaluations of the gradient at the instantaneous state, ∇ψ(x)). Second, since the vector field in (7b) is Lipschitz continuous, for any (x_0, u_0) the initial value problem (7) admits a unique solution that is continuously differentiable [18, Lemma 3.2], [24]. Third, the set C is attractive and forward-invariant for the dynamics (7b); namely, if u(t_0) ∈ U, then u(t) approaches C exponentially, and if u(t_0) ∈ C, then u(t) ∈ C for all t ≥ t_0.

To state our results, we let z := (x − x*_t, u − u*_t) denote the tracking error between the state of (7) and the optimizer of (4). Moreover, for a fixed s ∈ (0, 1), we define positive constants c_0, c_3 and κ_1, κ_2, κ_3, which enter the bounds below.

Theorem 1 (Transient bound for gradient flows with error): Consider the closed-loop system (7) and let Assumptions 1-5 be satisfied. Suppose that, for any x ∈ X_0 and u ∈ U, the gradient error satisfies the condition

‖e(x, u)‖ ≤ γ ‖z‖ + δ,

for some δ > 0 and γ ∈ [0, c_0/c_3). If η ∈ (0, η*), where η* > 0 is an explicit threshold determined by the constants above, then the tracking error satisfies

‖z(t)‖ ≤ κ_1 ‖z(t_0)‖ e^{−α(t−t_0)} + κ_2 ess sup_{t_0 ≤ τ ≤ t} ‖ẇ_τ‖ + κ_3 δ,  (11)

for all t ≥ t_0, where α = c_0 − γ c_3, for any x(t_0) ∈ D_0 := X_eq + B_n(r'), with r' < r a suitable radius, and for any u(t_0) ∈ U. The proof of this claim is postponed to the Appendix.

Theorem 1 asserts that if the worst-case estimation error e(x, u) is bounded by a term γ‖z‖ that vanishes at the optimizer plus a nonvanishing but constant term δ, then a sufficiently small choice of the gain η guarantees exponential convergence of the tracking error to a neighborhood of zero.
More precisely, the tracking error z is ultimately bounded by two terms: the first, κ_2 ess sup_{t_0 ≤ τ ≤ t} ‖ẇ_τ‖, accounts for the effects of the time-variability of w_t on the optimizer (u*_t, x*_t); the second, κ_3 δ, accounts for the effects of a nonvanishing error in the utilized gradient function. It follows that the bound (11) guarantees input-to-state stability (ISS) of (7) (in the sense of [30], [40], [55]) with respect to ẇ_t and δ.
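The role of the constant error level δ can also be observed numerically. The sketch below is entirely ours (the scalar plant, costs, constraint set, and gains are arbitrary choices, and the interconnection is forward-Euler discretized): a constant gradient error of magnitude delta shifts the closed-loop equilibrium by an amount proportional to delta, consistent with the κ_3 δ term in the bound (11).

```python
import numpy as np

def closed_loop_error(delta, steps=30000, dt=0.01, eta=0.2):
    """Simulate an illustrative scalar interconnection (our choice, not
    from the paper): plant xdot = -x + u, costs phi(u) = (u-1)^2 and
    psi(x) = x^2, constraint C = [-2, 2], with a constant gradient error
    of magnitude delta. Returns |u - u*| with u* = 0.5 the unperturbed
    optimizer."""
    x, u = 1.5, -1.0
    for _ in range(steps):
        grad = 2.0 * (u - 1.0) + 2.0 * x + delta   # nominal gradient + error
        u += dt * eta * (np.clip(u - grad, -2.0, 2.0) - u)
        x += dt * (-x + u)
    return abs(u - 0.5)

# The asymptotic tracking error scales (here linearly) with delta,
# mirroring an ISS-type bound in the constant error term.
errs = [closed_loop_error(d) for d in (0.0, 0.2, 0.4)]
print(errs)  # approximately [0.0, 0.05, 0.10]
```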

IV. OPTIMIZATION WITH NEURAL NETWORK STATE PERCEPTION
In this section, we propose an algorithm to address Problem 1 and tailor the conclusions drawn in Theorem 1 to characterize the performance of the proposed algorithm.

A. ALGORITHM DESCRIPTION
To produce estimates x̂ = p̂(ξ) of the system state, we assume that a set of training points {(ξ^{(i)}, x^{(i)})}_{i=1}^{N} is utilized to train a neural network via empirical risk minimization. More precisely, in the remainder, we will study two types of neural networks that can be used for this purpose: (i) feedforward neural networks and (ii) residual neural networks.² We thus propose to train a neural network to produce a map x̂ = p̂(ξ) that yields estimates of the system state given nonlinear and high-dimensional observations ξ. Accordingly, we modify the controller (6) to operate with the estimates x̂ of the system state produced by the neural network. The proposed framework is described in Algorithm 1 and illustrated in Fig. 2.

[Algorithm 1 (fragment) -- Given: set U, functions ∇φ, ∇ψ, H(u), neural network p̂, gain η; initial conditions.]

In the training phase of Algorithm 1, the map NN-learning(·) denotes a generic training procedure for the neural network via empirical risk minimization. The output of the training phase is the neural network mapping p̂(·). In the feedback control phase, the map p̂(·) is then used to produce estimates x̂ = p̂(ξ) of the state of the dynamical system in order to evaluate the gradient functions. Notice that, relative to the nominal controller (6), (12c) leverages a gradient that is evaluated at the approximate point x̂, and thus fits the more general model (7b).

² We refer the reader to the representative papers [25], [28] for an overview of feedforward and residual networks. Briefly, a neural network consists of inputs, various hidden layers, activation functions, and output layers, and can be trained for, e.g., functional estimation and classification. When the layers are sequential and the architecture is described by a directed acyclic graph, the underlying network is called feedforward; when some of these layers are bypassed, the underlying network is called residual.
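A minimal end-to-end sketch of the two phases follows. All ingredients are our own stand-ins and do not appear in the paper: a scalar plant, an invertible observation map q, and a random-feature least-squares fit used as a toy surrogate for the generic NN-learning routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown generative map xi = q(x), invertible on the training interval.
q = lambda x: x + 0.3 * np.sin(2.0 * x)

# --- Training phase: fit xhat = phat(xi) from samples {(xi_i, x_i)} by
# least squares on random tanh features (a stand-in for NN-learning).
x_train = np.linspace(-2.0, 2.0, 200)
xi_train = q(x_train)
a, b = rng.normal(size=50), rng.normal(size=50)
feats = lambda xi: np.tanh(np.outer(np.atleast_1d(xi), a) + b)  # (N, 50)
theta, *_ = np.linalg.lstsq(feats(xi_train), x_train, rcond=None)
phat = lambda xi: feats(xi) @ theta                # state estimator

# --- Feedback control phase: the controller uses xhat = phat(q(x)) in
# place of the unmeasured state x. Plant: xdot = -x + u, so h_u(u) = u;
# phi(u) = (u-1)^2, psi(x) = x^2, C = [-2, 2]; optimizer u* = 0.5.
eta, dt = 0.2, 0.01
x, u = 1.5, -1.0
for _ in range(30000):
    xhat = float(phat(q(x))[0])                    # perceived state
    grad = 2.0 * (u - 1.0) + 2.0 * xhat            # gradient at xhat
    u += dt * eta * (np.clip(u - grad, -2.0, 2.0) - u)
    x += dt * (-x + u)

print(u, x)  # both near 0.5, up to the estimation error of phat
```

The residual tracking error here is set by the approximation quality of phat, in line with the analysis of Section IV-B.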

B. ANALYSIS OF ALGORITHM 1
In what follows, we analyze the tracking properties of Algorithm 1. To this end, we introduce the following.
Assumption 6 (Generative and Perception Maps): The generative map x ↦ q(x) = ξ is such that, for any compact set X' ⊆ X_r, the image q(X') is compact. Moreover, there exists a continuous map p : R^{n_ξ} → R^n such that p(ξ) = x for any x ∈ X_r, where ξ = q(x).

Remark 5 (Relationship with System Observability):
We note that a standard approach in the literature for the state observer design problem is to leverage the concept of ℓ-observability, where the state is estimated based on ℓ + 1 samples of ξ and ℓ samples of the inputs u, w [3]. However, in our setup, we do not have access to measurements of the exogenous input w_t. Therefore, we rely on an approach similar to [1], [2], where x is estimated from the observation ξ.
To guarantee that network training is well posed, we assume that the N training points {x^{(i)}}_{i=1}^{N} for the state are drawn from a compact set X_train := X_eq + B_n[r_train], where r_train is such that r_0 ≤ r_train < r. Moreover, we let Q_train := q(X_train) denote the perception set associated with the training set X_train, and we denote by Q_train,s := {ξ^{(i)} = q(x^{(i)}), i = 1, …, N} ⊂ Q_train the set of available perception samples. Notice that the set Q_train is compact by Assumption 6. Compactness of Q_train will allow us to build on the results of [25] and [3], [27] to bound the perception error ‖p(ξ) − p̂(ξ)‖ on the compact set Q_train. With this background, we let sup_{ξ ∈ Q_train,s} ‖p(ξ) − p̂(ξ)‖_∞ denote the supremum norm of the approximation error over Q_train,s.
Remark 6 (Properties of the Training Set): Notice that the set of training data X_train is assumed to contain the set of initial conditions X_0. This allows us to guarantee that the neural network can be trained over the domain of definition of the Lyapunov function W in Lemma 1 (see Fig. 1 for an illustration). By contrast, if the set X_train were contained in X_0, then the set of initial conditions of (12) would have to be modified so that the trajectories do not leave the set X_train.
We begin by characterizing the performance of (12) when residual networks are utilized to reconstruct the system state. For simplicity of exposition, we outline the main result for the case where n = n ξ , and then discuss how to consider the case n < n ξ in Remark 7.
Proposition 1 (Transient Performance of Algorithm 1 with Residual Neural Network): Consider the closed-loop system (12), let Assumptions 1-6 be satisfied, and assume n = n_ξ. Assume that the training set Q_train,s is an ε-cover of Q_train with respect to the partial order ⪯, for some ε > 0. Let p_resNet : R^{n_ξ} → R^{n_ξ} describe a residual network, and assume that it can be decomposed as p_resNet = m + A, where m : R^{n_ξ} → R^{n_ξ} is monotone and A : R^{n_ξ} → R^{n_ξ} is a linear function. If Algorithm 1 is implemented with p̂ = p_resNet and η ∈ (0, η*), then the error z(t) = (x − x*_t, u − u*_t) of (12) satisfies (11) with κ_1, κ_2, κ_3 as in Theorem 1, γ = 0, and δ as in (13), where ω_p is a modulus of continuity of p on Q_train.

Proof: Start by noticing that (12c) can be written in the generic form (7b), with the error e(x, u) obtained by adding and subtracting the true gradient H(u)^⊤ ∇ψ(x) in (12c). A uniform bound on the norm of e(x, u) over the compact set Q_train is proportional to ‖p(ξ) − p̂(ξ)‖, where we have used Assumption 1 and the fact that the norm of the Jacobian of h_u is bounded over the compact set U_c. Next, notice that ‖p(ξ) − p̂(ξ)‖ ≤ √(n_ξ) ‖p(ξ) − p_resNet(ξ)‖_∞. Since Q_train,s is an ε-cover of Q_train with respect to the partial order ⪯, for some ε > 0, and p_resNet = m + A, the supremum norm of the estimation error can be upper bounded as

sup_{ξ ∈ Q_train} ‖p(ξ) − p_resNet(ξ)‖_∞ ≤ 3 sup_{ξ ∈ Q_train,s} ‖p(ξ) − p_resNet(ξ)‖_∞ + 2 ω_p(ε) + 2 ‖A‖_∞ ε,

as shown in [28, Theorem 7]. The result then follows from Theorem 1 by setting γ = 0 and δ as in (13).
Proposition 1 shows that the control method in Algorithm 1 guarantees convergence to the optimizer of (4) up to an error that depends only on the uniform approximation error of the adopted neural network. Notice that sup_{ξ ∈ Q_train,s} ‖p(ξ) − p_resNet(ξ)‖_∞ is a constant that denotes the worst-case approximation error over the set of training data Q_train,s. More precisely, the result characterizes the role of the approximation errors due to the use of a neural network in the transient and asymptotic performance of the interconnected system (12).
Remark 7 (Case n < n_ξ): In this case, we use the training set {((x^{(i)}, 0_{n_ξ − n}), ξ^{(i)})}_{i=1}^{N} to train the neural network implementing a map p_resNet : R^{n_ξ} → R^{n_ξ} [3]. Subsequently, the perception map p̂ that will be used in (12c) is given by p̂ = π ∘ p_resNet, where π : R^{n_ξ} → R^n is the projection map that returns the first n entries of its argument, namely, π(y) = (y_1, …, y_n) for any y ∈ R^{n_ξ}. In short, the training step NN-learning(·) in Algorithm 1 for this case involves the training of the map p_resNet, followed by the projection x̂ = p̂(ξ) = π(p_resNet(ξ)). Finally, we notice that, for this case, the claim in Proposition 1 holds unchanged with p(ξ) replaced by p̂(ξ). This follows by noting that the projection π is nonexpansive, so the estimation error of p̂ = π ∘ p_resNet is no larger than that of p_resNet.

Remark 8 (Density of Training Set):
Proposition 1 requires that the training set Q_train,s be an ε-cover of Q_train with respect to the partial order ⪯. As pointed out in [3], verifying this condition often involves computing the relative position of the training points and the points in the set Q_train. When this is not possible, [3, Lemma 2] shows that there exists a relationship between ε-covers of the set Q_train with respect to ⪯ and the density of the training points. In particular, the authors show that if the set of training points is an ε-cover of Q_train in the usual metric sense, then it is also an ε̃-cover, with respect to ⪯, of a set Q̃_train ⊂ Q_train, for some ε̃ > ε; see [3, Lemma 2].
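As a rough numerical illustration of the covering condition discussed above, the check below tests whether every point of a finite candidate set (standing in for Q_train) is within ε of some training sample in every coordinate; the sets and tolerance are illustrative placeholders, not the paper's actual training data, and the componentwise reading of ⪯ is our assumption.

```python
# Sketch: check that a finite sample set "covers" a candidate set within
# tolerance eps, in a componentwise sense. All data below is hypothetical.

def covers(samples, candidates, eps):
    """True if every candidate has some sample within eps in every coordinate."""
    return all(
        any(all(abs(c - s) <= eps for c, s in zip(cand, samp)) for samp in samples)
        for cand in candidates
    )

samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
candidates = [(0.1, -0.1), (0.9, 1.05), (2.05, 0.45)]
print(covers(samples, candidates, 0.2))   # covered at tolerance 0.2
print(covers(samples, candidates, 0.01))  # but not at tolerance 0.01
```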
In the remainder of this section, we focus on characterizing the performance of (12) when a feedforward network is utilized to reconstruct the system state. More precisely, we consider cases where the training set {(ξ^(i), x^(i))}_{i=1}^N is utilized to train n multilayer feedforward networks, each of them implementing a map p_feedNet,i : R^{n_ξ} → R that estimates the i-th component of the system state, x̂_i = p_feedNet,i(ξ). In this case, we assume that Algorithm 1 is implemented with p̂ = p_feedNet, where p_feedNet(ξ) := (p_feedNet,1(ξ), ..., p_feedNet,n(ξ)) in (12). Next, we recall that feedforward neural networks are capable of approximating any measurable function on a compact set to any desired degree of accuracy (see, for instance, [25], [26] and the bounds in [56], [57]).
The proof follows steps similar to those of Proposition 1 and is omitted. Proposition 2 shows that the control method in Algorithm 1 guarantees convergence to the optimizer of (4), up to an error that depends only on the uniform approximation error sup_{ξ∈Q_train} ‖p(ξ) − p_feedNet(ξ)‖_∞ (computed over the entire set Q_train). Notice that, compared with Proposition 1, the adoption of a feedforward network allows us to provide tighter guarantees in terms of the entire set Q_train (as opposed to the set of available samples Q_train,s). We conclude by noting that the bound (15) can be further customized for specific error bounds, given the architecture of the feedforward network [56], [57].
Remark 9 (Noisy Generative and Perception Maps): Assumption 6 is borrowed from [2], and it holds when, for example, q is injective. Although the model in Assumption 6 is used for simplicity, the subsequent analysis of our perception-based controllers can be readily extended to the case where: (i) the perception map imperfectly estimates the state, that is, p(ξ) = x + ν, with ξ = q(x), where ν ∈ R^n is a bounded error [1]; and (ii) unknown externalities enter the generative map. One way to collectively account for both externalities entering q and for an approximate perception map is to use the noisy model p(q(x)) = x + ν, with ν ∈ R^n a given error (bounded in norm). The results presented in this section can be readily modified to account for this additional error by adding a term proportional to the norm of ν in the parameter δ.

V. OPTIMIZATION WITH COST-FUNCTION PERCEPTION
In this section, we propose an algorithm to address Problem 2, and we tailor the conclusions drawn in Theorem 1 to characterize the performance of the proposed algorithm.

A. ALGORITHM DESCRIPTION
To determine estimates of the gradient functions ∇φ(u) and ∇ψ(x), we assume the availability of functional evaluations of the costs at sample points u^(i) ∈ C and x^(i) ∈ X_train. We then consider the training of two neural networks that approximate the functions u ↦ φ(u) and x ↦ ψ(x), respectively, to determine the estimates φ̂(u) and ψ̂(x). Accordingly, we modify the controller (6) to operate with the estimates φ̂ and ψ̂ produced by the neural networks. The proposed framework is described in Algorithm 2 and illustrated in Fig. 3.
In Algorithm 2, the gradients of the costs are obtained via centered differences applied to the estimated maps φ̂ and ψ̂, where ε > 0 is the difference step, b_i denotes the i-th canonical vector of R^{n_u}, and d_i is the i-th canonical vector of R^n. The computation of the approximate gradient ĝ_u (respectively, ĝ_x) thus requires 2n_u functional evaluations (respectively, 2n) of the neural network map φ̂ (respectively, ψ̂). The gradient estimates ĝ_u and ĝ_x are then utilized in the gradient-based feedback controller (16d).
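The centered-difference step above can be sketched in a few lines; the quadratic `phi_hat` below is a toy stand-in for the trained network, not the paper's cost.

```python
# Sketch of the centered-difference gradient estimate used in Algorithm 2:
# given a (possibly learned) scalar map f, approximate its gradient at u.

def centered_gradient(f, u, eps=1e-4):
    """Estimate grad f(u) componentwise: (f(u + eps*b_i) - f(u - eps*b_i)) / (2*eps)."""
    g = []
    for i in range(len(u)):
        up = list(u); up[i] += eps   # u + eps * b_i
        dn = list(u); dn[i] -= eps   # u - eps * b_i
        g.append((f(up) - f(dn)) / (2 * eps))
    return g

phi_hat = lambda u: (u[0] - 1.0) ** 2 + 2.0 * u[1] ** 2  # toy surrogate cost
g = centered_gradient(phi_hat, [0.0, 1.0])
# centered differences are exact (up to rounding) on quadratics:
# the true gradient at (0, 1) is (-2, 4)
print(g)
```

Note that, as stated in the text, each gradient call costs 2n_u evaluations of the surrogate (two per coordinate).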

B. ANALYSIS OF ALGORITHM 2
We begin by characterizing the performance of (16) when feedforward networks are utilized to estimate the costs.
Proposition 3 (Transient Performance of Algorithm 2 with Feedforward Neural Network): Suppose that the feedforward network maps φ̂_feedNet and ψ̂_feedNet approximate the costs φ and ψ over the compact sets C_train := C + B[ε] and X_train, respectively. Consider the interconnected system (16), with φ̂ = φ̂_feedNet and ψ̂ = ψ̂_feedNet, and let Assumptions 1-5 be satisfied. If η ∈ (0, η*), then the error z(t) = (x − x*_t, u − u*_t) satisfies (11) with κ_1, κ_2, κ_3 as in Theorem 1, γ = 0, and δ as in (19), where e_u,fd and e_x,fd are bounds on the centered-difference approximation errors for the functions φ and ψ, respectively.
Proof: For brevity, let g_u(u) and g_x(x) denote the centered-difference approximations of the true gradients ∇φ(u) and ∇ψ(x). Adding and subtracting g_u(u) and g_x(x), and using the triangle inequality and Assumption 1, we can bound ‖e(x, u)‖ in terms of ‖∇φ(u) − g_u(u)‖, ‖g_u(u) − ĝ_u(u)‖, and the analogous terms for ψ. The terms ∇φ(u) − g_u(u) and ∇ψ(x) − g_x(x) are errors due to the centered-difference approximation of the true gradients, and are bounded by e_u,fd and e_x,fd, respectively. On the other hand, g_u(u) − ĝ_u(u) can be bounded by using the fact that ‖b_i‖ = 1; similar steps can be used to bound the error term g_x(x) − ĝ_x(x), yielding the final expression for δ in (19). ∎
Proposition 3 shows that the control method in Algorithm 2 guarantees convergence to the optimizer of (4), up to an error that depends on the uniform approximation error of the neural networks and on the accuracy of the centered-difference method. More precisely, the result characterizes the role of the approximation errors due to the use of a feedforward neural network in the transient and asymptotic performance of the interconnected system (16).
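The error split in the proof above can be illustrated numerically: if the surrogate deviates from the true cost by at most e_nn in sup norm, each centered-difference component can deviate by at most e_nn/ε, so the gradient-estimate error is at most √n · e_nn/ε. The functions and constants below are toy stand-ins chosen for the check, not the paper's costs.

```python
import math

# Toy illustration of the bound ||g_u - g_hat_u|| <= sqrt(n) * e_nn / eps.

def centered_gradient(f, u, eps):
    g = []
    for i in range(len(u)):
        up = list(u); up[i] += eps
        dn = list(u); dn[i] -= eps
        g.append((f(up) - f(dn)) / (2 * eps))
    return g

e_nn, eps = 1e-3, 0.1
phi = lambda u: u[0] ** 2 + u[1] ** 2                    # "true" cost
phi_hat = lambda u: phi(u) + e_nn * math.sin(40 * u[0])  # surrogate, off by <= e_nn

u = [0.3, -0.7]
g_true = centered_gradient(phi, u, eps)
g_hat = centered_gradient(phi_hat, u, eps)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_true, g_hat)))
bound = math.sqrt(len(u)) * e_nn / eps
print(err <= bound)  # the observed error respects the a priori bound
```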
In the remainder of this section, we focus on characterizing the performance of (16) when residual networks are utilized to estimate the costs. To provide guarantees for residual networks, it is necessary to replace φ(·) by its lifted counterpart φ̃ : R^{n_u} → R^{n_u}, defined as φ̃ = ι_φ ∘ φ, where ι_φ : R → R^{n_u} is the injection ι_φ(z) = (z, 0, ..., 0) for any z ∈ R. Following [28], we consider a residual network map φ̂_resNet : R^{n_u} → R^{n_u} approximating the lifted map φ̃; the function φ̂ used in (16) is then given by φ̂(u) = φ̂_resNet(u)^⊤ b_1, where we recall that b_1 is the first canonical vector of R^{n_u}. Similarly, consider the lifted map ψ̃ : R^n → R^n defined as ψ̃ = ι_ψ ∘ ψ, where ι_ψ : R → R^n is such that ι_ψ(z) = (z, 0, ..., 0) for any z ∈ R, and consider a residual network map ψ̂_resNet : R^n → R^n approximating the lifted map ψ̃. Accordingly, it follows that ψ̂(x) = ψ̂_resNet(x)^⊤ d_1. With this setup, we have the following.
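The lifting construction is simple to make concrete; in the sketch below the "network" is just the exact lifted map, standing in for a trained residual network.

```python
# Sketch of the lifting used for residual networks: a scalar cost phi is lifted
# to R^{n_u} via iota(z) = (z, 0, ..., 0), a vector-valued network approximates
# the lifted map, and the scalar estimate is read back out with the first
# canonical vector b_1 (here, simply the first component).

n_u = 3
iota = lambda z: [z] + [0.0] * (n_u - 1)     # injection R -> R^{n_u}
phi = lambda u: sum(ui ** 2 for ui in u)     # toy scalar cost
phi_tilde = lambda u: iota(phi(u))           # lifted map R^{n_u} -> R^{n_u}
phi_resnet = phi_tilde                       # stand-in for the trained network
phi_hat = lambda u: phi_resnet(u)[0]         # read-out <phi_resnet(u), b_1>

print(phi_hat([1.0, 2.0, 0.0]))  # recovers phi(u) = 5.0
```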
Proposition 4 (Transient Performance of Algorithm 2 with Residual Neural Network): Suppose that the residual network maps φ̂_resNet and ψ̂_resNet approximate the lifted functions φ̃ and ψ̃ over the compact sets C_train and X_train, respectively. Suppose that the set of training points C_train,s := {u^(i)}_i is an ε_u-cover of C_train with respect to the partial order ⪯, for some ε_u > 0, and that X_train,s := {x^(i)}_i is an ε_x-cover of X_train with respect to ⪯, for some ε_x > 0. Moreover, suppose that the residual network maps can be decomposed as φ̂_resNet = m_u + A_u and ψ̂_resNet = m_x + A_x, where m_u : R^{n_u} → R^{n_u} and m_x : R^n → R^n are monotone, and A_u, A_x are linear functions. Consider the interconnected system (16), with φ̂(u) = φ̂_resNet(u)^⊤ b_1 and ψ̂(x) = ψ̂_resNet(x)^⊤ d_1, and let Assumptions 1-5 be satisfied. If η ∈ (0, η*), then the error z(t) = (x − x*_t, u − u*_t) satisfies (11) with κ_1, κ_2, κ_3 as in Theorem 1, γ = 0, and with δ given by the sum of e_u,fd, e_x,fd and terms, scaling as n^{3/2}, that bound the residual-network approximation errors, where e_u,fd and e_x,fd are defined as in Proposition 3, and ω_u, ω_x are the moduli of continuity of φ̃ and ψ̃, respectively. The proof of Proposition 4 follows steps similar to the proof of Proposition 3 and is omitted due to space limitations. Proposition 4 shows that the control method in Algorithm 2 guarantees convergence to the optimizer of (4) up to an error that depends on the uniform approximation error of the neural networks and on the accuracy of the centered-difference method. Notice that, compared with the characterization in Proposition 3, the use of residual neural networks allows us to characterize the error with respect to the accuracy of the available sets of samples C_train,s and X_train,s.

VI. APPLICATION TO ROBOTIC CONTROL
In this section, we illustrate how to apply the proposed framework to control a unicycle robot, whose position is accessible only through camera images, so that it tracks an optimal equilibrium point. We consider a robot described by unicycle dynamics with state x = (a, b, θ), where r := (a, b)^⊤ ∈ R^2 denotes the position of the robot in a two-dimensional plane, and θ ∈ (−π, π] denotes its orientation with respect to the a-axis [45]. The unicycle dynamics are ȧ = v cos θ, ḃ = v sin θ, θ̇ = ω, where v, ω ∈ R are the controllable inputs. Given the dynamics (20), we assume that the state x = (a, b, θ) is not directly measurable for control purposes; instead, at every time, x can be observed only through a noisy camera image, denoted by ξ = q(x); see, e.g., [1], [2]. To tackle the desired problem, we consider an instance of (4) whose cost penalizes the distance of the robot from a target position, where r_f ∈ R^2 denotes the desired final position of the robot.
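A minimal Euler simulation of the unicycle kinematics can illustrate the setup; the "turn-toward-target, drive-forward" law and the gains below are illustrative stand-ins, not the stabilizing controller (23) designed in the paper.

```python
import math

# Euler simulation of the standard unicycle kinematics
#   a' = v cos(theta), b' = v sin(theta), theta' = omega,
# with a simple proportional law: drive toward the target and turn toward it.
# Gains, step size, and target are hypothetical choices for illustration.

def simulate(x0, target, k_v=1.0, k_w=2.0, dt=0.01, steps=2000):
    a, b, theta = x0
    for _ in range(steps):
        dist = math.hypot(target[0] - a, target[1] - b)
        bearing = math.atan2(target[1] - b, target[0] - a)
        # wrap the heading error to (-pi, pi]
        err = math.atan2(math.sin(bearing - theta), math.cos(bearing - theta))
        v, omega = k_v * dist, k_w * err
        a += dt * v * math.cos(theta)
        b += dt * v * math.sin(theta)
        theta += dt * omega
    return a, b, theta

a, b, _ = simulate((-1.5, -1.5, -math.pi / 2), (-0.5, -0.5))
print(math.hypot(a + 0.5, b + 0.5) < 1e-2)  # robot reaches the target position
```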
To address this problem, we consider a two-level control architecture, in which an onboard (low-level) stabilizing controller is first used to stabilize the unicycle dynamics (20) (to guarantee satisfaction of Assumption 3) and, subsequently, the control framework outlined in Algorithm 1 is utilized to select optimal high-level control references. To design a stabilizing controller, let u = (u_a, u_b) ∈ R^2 denote the instantaneous high-level control input, and consider the standard change of variables from rectangular to polar coordinates (ξ, φ); in the new variables, the dynamics read as in (22). The following lemma provides a stabilizing control law for (22).
Lemma 2 (Stability of Unicycle Dynamics): The unicycle dynamics (22) in closed loop with the control law (23), with gain k > 0, admit a unique equilibrium point (ξ, φ) = (0, 0) that is globally exponentially stable.
Proof: Noticing that the dynamics of ξ and φ are decoupled, consider the Lyapunov function V(φ) = (1/2)φ². Along the trajectories of (22)-(23), V̇(φ) ≤ 0, with V̇(φ) = 0 if and only if φ = 0. The exponential stability of ξ follows immediately from [45, Lem. 2.1]. ∎
According to Lemma 2, the dynamics (22) with the onboard control law (23) satisfy Assumption 3. We next apply the perception-based control framework outlined in Algorithm 1 to design the reference input u to be utilized in (21).
For our perception-based controller, we use a residual neural network to estimate the state perception map; in particular, the neural network returns estimated state coordinates from aerial images, following the procedure of Algorithm 1. To prepare for the execution of Algorithm 1, we generate 122,880 square, blurry RGB images of size 64 × 64 pixels to represent the location and orientation of the robot on a two-dimensional plane. These images were built using the MATLAB Image Processing Toolbox. First, we build a base image of a maroon pentagon, representing the robot, over a grassy background (using ranges of RGB values for different green colors), and then add blurriness to the image using the MATLAB function imgaussfilt, which filters the base image with a 2-D Gaussian smoothing kernel with a standard deviation of 0.5; see, for example, the sample images in Fig. 4(b). For our residual network, we used the resnet50 network structure from the MATLAB Deep Learning Toolbox and tailored the input and output sizes to our particular setting. Specifically, we set the input layer to read images of size 64 × 64 pixels and set the number of possible output labels to 64² = 4096, to account for all possible locations (coordinates) of the pixelized image. The number of training images was chosen so that all possible locations of the pixelized image are generated for 30 different orientations, totaling 122,880 training images. After training, the residual network returns estimated locations in the 2-D plane by identifying the center pixel of the robot. During the time evolution of the closed-loop dynamics with the residual network, the returned labels are converted into (a, b) coordinates. We note that, while this implementation classifies images into output labels, future efforts will look at implementations of residual networks for regression.
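The label-to-coordinate conversion mentioned above can be sketched as follows; the row-major label ordering and the plane extent [−2, 2] × [−2, 2] are assumptions for illustration, not the paper's exact convention.

```python
# Sketch of the post-processing step: the classifier returns a label in
# {1, ..., 64*64 = 4096} identifying the robot's center pixel, which is then
# converted to (a, b) plane coordinates (and back, for building training labels).

GRID, LO, HI = 64, -2.0, 2.0  # hypothetical grid size and plane extent

def label_to_ab(label):
    row, col = divmod(label - 1, GRID)
    cell = (HI - LO) / GRID
    a = LO + (col + 0.5) * cell   # pixel centers, hence the 0.5 offset
    b = HI - (row + 0.5) * cell   # image rows grow downward
    return a, b

def ab_to_label(a, b):
    cell = (HI - LO) / GRID
    col = min(GRID - 1, max(0, int((a - LO) / cell)))
    row = min(GRID - 1, max(0, int((HI - b) / cell)))
    return row * GRID + col + 1

lab = ab_to_label(-0.5, -0.5)
print(label_to_ab(lab))  # (a, b) quantized to the nearest pixel center
```

The quantization error of this conversion is at most half a cell width per coordinate, which is the pixel-resolution effect on the (a, b) estimates noted in the simulation discussion.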
Simulation results are presented in Fig. 4 for the initial conditions (a_0, b_0, θ_0, u_{a,0}, u_{b,0}) = (−1.5, −1.5, −π/2, 1, 1); the optimal solution of the problem is r* = (−0.5, −0.5). As evidenced in Fig. 4(a), differences persist between the trajectories produced by the nominal controller, which utilizes perfect state information, and those produced by the perception-based controller. Importantly, imperfect estimates of (a, b) are also due to the fact that the labels returned by the trained network correspond to pixels of the image, which limits how finely the (a, b) values can be represented. Fig. 4(b) shows sample images at times t = 4.32 and t = 70.36; these images are used as inputs to the neural network. As expected, the ideal controller converges to the reference state arbitrarily well, while the perception-based controller converges to within a neighborhood of the reference state whose size depends on the error associated with the perceived state. Overall, these simulations demonstrate the claims made in Proposition 1.

VII. APPLICATION TO EPIDEMIC CONTROL
In this section, we apply the proposed framework to regulate the spread of an epidemic by controlling contact-relevant transmissions; this is achieved by, e.g., selecting the intensity of restrictions such as mask-wearing, social restrictions, school closures, and stay-at-home orders. To describe the epidemic evolution, we adopt a susceptible-infected-susceptible (SIS) model [58]:
ṡ = μ − βuxs + γx − μs,  ẋ = βuxs − (γ + μ)x,  (24)
where s ∈ R and x ∈ R describe the fractions of susceptible and infected population, respectively, with s + x = 1 at all times, u ∈ (0, 1] is an input modeling the reduction in contact-relevant transmissions, β > 0 is the transmission rate, μ > 0 is the death/birth rate, and γ > 0 is the recovery rate. The model parameters of (24) are chosen as β = 4, γ = 1/9, μ = 10⁻⁴. As characterized in [58, Thm. 2.1 and Lem. 2.1], (24) admits a unique equilibrium point with x = 0 (called the disease-free equilibrium), which is unstable, and a unique equilibrium point with x ≠ 0 (called the endemic equilibrium) that is exponentially stable [58, Thm. 2.4], thus satisfying Assumption 3. We utilize the control problem (4) to determine an optimal balance between a desired fraction of infections x_ref and a desired level of restrictions u_ref. More formally, we consider an instance of (4) with φ(u) = w_φ(u − u_ref)² and ψ(x) = w_ψ(x − x_ref)², where u_ref ∈ (0, 1] and x_ref ∈ [0, 1] are desired reference inputs and states, and w_φ, w_ψ ∈ R_{≥0} are weighting factors; the cost-function parameters are chosen as u_ref = 0.36, x_ref = 0.85, w_φ = w_ψ = 1. For our simulations, we perform the change of variables (x, ũ) = (x, 1/u); in the new variables, h(ũ) = 1 − ((μ + γ)/β)ũ satisfies Assumptions 1 and 4. In order to isolate the effects of perception on the control loop, in what follows we assume w_t = 0, so that all the tracking error can be attributed to perception errors.
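Under a constant input u, the infected fraction of the SIS model should approach the endemic equilibrium x* = 1 − (γ + μ)/(βu), which is the steady-state map h of the paper expressed in the original variables. A quick Euler check with the paper's parameters (the fixed input u and initial condition below are our choices):

```python
# Euler simulation of the controlled SIS dynamics on the invariant manifold
# s = 1 - x:   x' = beta*u*x*(1 - x) - (gamma + mu)*x.
# Parameters follow the paper; u = 0.5 and x0 = 0.05 are illustrative.

beta, gamma, mu = 4.0, 1.0 / 9.0, 1e-4

def simulate_sis(x0, u, dt=0.01, steps=20000):
    x = x0
    for _ in range(steps):
        x += dt * (beta * u * x * (1.0 - x) - (gamma + mu) * x)
    return x

u = 0.5
x_inf = simulate_sis(0.05, u)
x_star = 1.0 - (gamma + mu) / (beta * u)   # endemic equilibrium
print(abs(x_inf - x_star) < 1e-6)          # trajectory settles at x*
```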

A. OPTIMIZATION WITH STATE PERCEPTION
We begin by illustrating the case of state perception (Section IV). One of the main challenges in predicting and controlling the outbreak of an epidemic is the difficulty of estimating the true number of infections from incomplete information describing documented infected individuals, i.e., those with symptoms severe enough to be confirmed. During the outbreak of COVID-19, researchers proposed several methods to overcome these challenges by using several sources of data [59], including detected cases, recovered cases, deaths, test positivity [60], and mobility data [46], [61]. In our simulations, we adopted the approach in Algorithm 1 to achieve this task. For the purpose of illustration, we utilized a map q(·) composed of a set of 4 Gaussian basis functions with means μ_b = (1, 5, 9, 13) and variance σ = I to determine the perception signal ξ. The training phase of Algorithm 1 was then performed using a feedforward neural network to determine the map x̂ = p̂(ξ) that reconstructs the state of (24). Simulation results are illustrated in Fig. 5. As illustrated by the state trajectories in Fig. 5(a)-(b), the use of a neural network introduces discrepancies between the true state and the estimated state, especially during transient phases. Fig. 5(c) compares the tracking error of the ideal controller (6) (which uses the exact state) and of the controller in Algorithm 1. The numerics illustrate that, while the exact controller (6) is capable of converging to the desired optimizer with arbitrary accuracy, the controller in Algorithm 1 yields an error of the order of 10⁻², due to uncertainties in the reconstructed system state. Overall, the simulations validate the convergence claim made in Proposition 2.
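A Gaussian-basis perception map of the kind described above can be sketched as follows; treating μ_b as centers in the same units as the state, and σ as a unit standard deviation, is our reading of the setup.

```python
import math

# Sketch of the perception signal in the epidemic example: the scalar state x
# is observed through q(x), a vector of 4 Gaussian basis functions with means
# mu_b = (1, 5, 9, 13) as in the paper.

MEANS = (1.0, 5.0, 9.0, 13.0)

def q(x, sigma=1.0):
    """Perception map: Gaussian basis features of the scalar state x."""
    return [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in MEANS]

xi = q(2.0)
print(len(xi), all(0.0 < v <= 1.0 for v in xi))  # 4 features, each in (0, 1]
```

The neural network p̂ is then trained on pairs (ξ^(i), x^(i)) to invert this map, as in the NN-learning step of Algorithm 1.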

B. OPTIMIZATION WITH COST-FUNCTION PERCEPTION
Next, we illustrate the case of cost perception (Section V). For illustrative purposes, we focus on the case where the analytic expression of φ(u) in (4) is known, while ψ(x) is unknown. As described in Algorithm 2, we utilized a set of samples {(x^(i), ψ(x^(i)))}_{i=1}^M to train a feedforward neural network approximating the function ψ(x). Simulation results are illustrated in Fig. 6. Fig. 6(a) shows the set of samples used for training and compares the true gradient ∇ψ(x) with the approximate gradient ĝ_x(x) obtained through the neural network. Fig. 6(b) compares the state trajectories obtained with the ideal controller (6) and those obtained with the perception-based controller in Algorithm 2. Fig. 6(c) compares the tracking error of the ideal controller (6) (which uses the exact gradients of the cost) and of the controller in Algorithm 2. The simulations illustrate that, while the exact controller (6) converges to the desired optimizer with arbitrary accuracy, the controller in Algorithm 2 yields a nontrivial steady-state error due to uncertainties in the gradient function. Overall, the simulations validate the convergence claim made in Proposition 3.

VIII. CONCLUSION
We proposed algorithms to control and optimize dynamical systems when the system state cannot be measured and the cost functions of the optimization problem are unknown, and both can only be accessed via neural network-based perception. Our results show for the first time how feedback-based optimizing controllers can be adapted to operate with perception in the control loop. Our findings crucially hinge on recent results on the uniform approximation properties of deep neural networks. We believe that the results can be further refined to account for cases where the training of the neural networks is performed online, and we are currently investigating this possibility. While this paper provided conditions to obtain exponential ISS results for the interconnection of a plant with a projected gradient-flow controller, future efforts will analyze the interconnection of plants and controllers that are (locally) asymptotically stable when considered individually.