Reachability Analysis of Neural Feedback Loops

Neural Networks (NNs) can provide major empirical performance improvements for closed-loop systems, but they also introduce challenges in formally analyzing those systems' safety properties. In particular, this work focuses on estimating the forward reachable set of \textit{neural feedback loops} (closed-loop systems with NN controllers). Recent work provides bounds on these reachable sets, but the computationally tractable approaches yield overly conservative bounds (and thus cannot be used to verify useful properties), while the methods that yield tighter bounds are too computationally intensive for online use. This work bridges the gap by formulating a convex optimization problem for the reachability analysis of closed-loop systems with NN controllers. While the solutions are less tight than those of previous (semidefinite program-based) methods, they are substantially faster to compute, and some of those computational time savings can be used to refine the bounds through new input set partitioning techniques, which are shown to dramatically reduce the tightness gap. The new framework is developed for systems with uncertainty (e.g., measurement and process noise) and nonlinearities (e.g., polynomial dynamics), and is thus applicable to real-world systems. To inform the design of an initial state set when only the target state set is known/specified, a novel algorithm for backward reachability analysis is also provided, which computes the set of states that are guaranteed to lead to the target set. The numerical experiments show that our approach (based on linear relaxations and partitioning) gives a $5\times$ reduction in conservatism in $150\times$ less computation time compared to the state-of-the-art. Furthermore, experiments on quadrotor, 270-state, and polynomial systems demonstrate the method's ability to handle uncertainty sources, high dimensionality, and nonlinear dynamics, respectively.


I. INTRODUCTION
Neural Networks (NNs) are pervasive in many fields because of their ability to express highly general input-output relationships, such as for perception, planning, and control tasks in robotics. However, before deploying NNs on safety-critical systems, there must be techniques to guarantee that the closed-loop behavior of systems with NNs, which we call neural feedback loops (NFLs), will meet desired specifications. The goal of this paper is to develop a framework for guaranteeing that systems with NN controllers will reach their goal states while avoiding undesirable regions of the state space, as in Fig. 1.
Despite the importance of analyzing closed-loop behavior, much of the recent work on formal NN analysis has focused on NNs in isolation (e.g., for image classification) [1]-[6], with an emphasis on efficiently relaxing NN nonlinearities [7]-[13]. On the other hand, closed-loop system reachability has been studied for decades, but traditional approaches, such as Hamilton-Jacobi methods [14], [15], do not consider NNs in the loop.

FIGURE 1: Forward Reachability Analysis. The objective is to compute the blue sets R_t(X_0), to ensure a system starting in X_0 (yellow) ends in G (green) and avoids A_0, A_1 (red). This is challenging for systems with NN control policies.

VOLUME X, XXXX
arXiv:2108.04140v2 [eess.SY] 2 Feb 2022
Several recent works [16]-[21] propose methods that compute forward reachable sets of NFLs. A key challenge is maintaining computational efficiency while still providing tight bounds on the reachable sets. In addition, large initial state sets present challenges, as the underlying NN relaxations become loose, leading to loose reachable set estimates. The literature also typically assumes perfect knowledge of the system dynamics, with no stochasticity. Furthermore, the literature focuses on forward reachability analysis (what states could the system end in, starting from a given X_0?), but backward reachability analysis (what states can the system start from, to end in a given X_T?) is an equally important problem in safety analysis. While forward and backward reachability differ only by a change of variables in classical systems [15], [22], both the nonlinearities and matrix dimensions of NN controllers raise fundamental challenges that prevent propagating sets backward through an NFL.
To address these challenges, this work's contributions include:
• A linear programming-based formulation of forward reachability analysis for NFLs, which provides a computationally efficient method for verifying safety properties,
• The use of input set partitioning techniques to provide tight bounds on the reachable sets despite large initial state sets,
• The consideration of measurement noise, process noise, and nonlinear dynamics, which improves the applicability to real systems with uncertainty, and
• A framework for backward reachability analysis of NFLs, which enables estimating which starting states will lead to a target set despite non-invertible NN weight matrices and activations.
Numerical experiments show that the new method provides 5× better accuracy in 150× less computation time compared to [21] and that the method can handle uncertainty sources, high dimensionality, and nonlinear dynamics via applications on quadrotor, 270-state, and polynomial systems. This article extends our conference paper [23] by providing an improved problem formulation, notation, and algorithmic details. It also includes:
• A closed-form solution to provide an order-of-magnitude computational speedup for the same bounds in §IV-E,
• An extension of state-of-the-art partitioning techniques to the NFL setting in §V,
• Formulations that handle nonlinearities (control limits and polynomial terms) in the dynamics in §VI,
• A framework for backward reachability analysis in §VII.
Open-source software implementations of this paper's algorithms and results can be found at https://github.com/mit-acl/nn_robustness_analysis.

II. RELATED WORK
Related work on reachability analysis can be categorized into works on NNs in isolation, closed-loop systems without NNs, and closed-loop systems with NNs. For instance, machine learning literature includes many methods to verify properties of NNs, often motivated by defending against adversarial examples [24]. These methods broadly range from exact [25] to tight [13] to efficient [10] to fast [7]. Although these tools are not designed for closed-loop systems, the NN relaxations from [10] provide a key foundation here.
Recent reachability analysis approaches that do account for NN control policies face a tradeoff between computation time and conservatism. [16]-[18] use polynomial approximations of NNs to make the analysis tractable. Most works consider NNs with ReLU activations, whereas [19] considers sigmoidal activations. [20], [33] introduce conservatism by assuming the NN controller could output its extreme values at every state. Most recently, [21] formulated the problem as an SDP, called Reach-SDP. This work builds on both [20] and [21]: it makes the latter more scalable by reformulating the SDP as a linear program, introduces sources of uncertainty in the closed-loop dynamics, and shows further improvements by partitioning the input set.

III. PRELIMINARIES

A. CLOSED-LOOP SYSTEM DYNAMICS
Consider a discrete-time linear time-varying system,
$$x_{t+1} = A_t x_t + B_t u_t + c_t + \omega_t, \qquad y_t = C_t^T x_t + \nu_t, \tag{1}$$
where $x_t \in \mathbb{R}^{n_x}$, $u_t \in \mathbb{R}^{n_u}$, $y_t \in \mathbb{R}^{n_y}$ are state, control, and output vectors, $A_t, B_t, C_t$ are known system matrices, $c_t \in \mathbb{R}^{n_x}$ is a known exogenous input, and $\omega_t \sim \Omega$ and $\nu_t \sim N$ are process and measurement noises sampled at each timestep from unknown distributions with known, finite support (i.e., $\omega_t \in [\underline{\omega}_t, \bar{\omega}_t]$, $\nu_t \in [\underline{\nu}_t, \bar{\nu}_t]$ element-wise).
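As a concrete illustration, the linear dynamics above can be stepped in a few lines of NumPy. The double-integrator-style matrices below are illustrative values of our own choosing, not taken from the paper's experiments.

```python
import numpy as np

def step(A, B, c, x, u, omega):
    """One step of the linear dynamics x_{t+1} = A x + B u + c + omega."""
    return A @ x + B @ u + c + omega

# Illustrative double-integrator-style matrices (not the paper's values)
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
c = np.zeros(2)

x = np.array([1.0, 0.0])
u = np.array([-0.1])
omega = np.zeros(2)  # process noise, here sampled at zero from its known support
x_next = step(A, B, c, x, u, omega)
```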
We assume an output-feedback controller $u_t = \pi(y_t)$ parameterized by an m-layer feed-forward NN, optionally subject to control constraints, $u_t \in U_t$. We denote the closed-loop system with dynamics (1) and control policy $\pi$ as
$$x_{t+1} = f(x_t; \pi). \tag{2}$$
This NFL is visualized in Fig. 2. At each timestep, the state x enters the perception block and is perturbed by sensor noise ν to create observation y. The observation is fed into the learned policy, which is a NN that selects the control input u. The control input enters the plant, which is defined by the A, B, c matrices and subject to process noise ω. The output of the plant is the state vector at the next timestep.

B. REACHABLE SETS
For the closed-loop system (2), we denote $R_t(X_0)$ the forward reachable set at time t from a given set of initial conditions $X_0 \subseteq \mathbb{R}^{n_x}$, which is defined by the recursion
$$R_{t+1}(X_0) = f(R_t(X_0); \pi), \qquad R_0(X_0) = X_0. \tag{3}$$
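Sampled rollouts give a cheap (non-verified) reference for these reachable sets, and the paper later uses exactly this idea to guide partitioning. A minimal sketch, where `f` stands in for one step of the closed-loop dynamics and the function name is ours:

```python
import numpy as np

def sampled_reachable_sets(f, x0_lo, x0_hi, horizon, n_samples=1000, seed=0):
    """Hyper-rectangular under-approximations of R_t(X0) from Monte Carlo rollouts.

    f maps a batch of states (n_samples, n_x) to a batch of next states.
    Returns per-timestep (lo, hi) bounds on the sampled trajectories.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(x0_lo, x0_hi, size=(n_samples, len(x0_lo)))
    bounds = []
    for _ in range(horizon):
        x = f(x)  # closed-loop step applied to all samples at once
        bounds.append((x.min(axis=0), x.max(axis=0)))
    return bounds
```

Because the samples are finite, these boxes under-approximate the true reachable sets; the verified sets of §IV are over-approximations instead.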

C. FINITE-TIME REACH-AVOID VERIFICATION PROBLEM
The finite-time reach-avoid property verification problem is defined as follows. Given a goal set $G \subseteq \mathbb{R}^{n_x}$, a sequence of avoid sets $A_t \subseteq \mathbb{R}^{n_x}$, and a sequence of reachable set estimates $\hat{R}_t \subseteq \mathbb{R}^{n_x}$, verifying that every state in the final estimated reachable set lies in the goal set and that no state in the estimated reachable sets enters an avoid set requires computing set intersections,
$$\mathrm{VERIFIED}(G, A_{0:T}, \hat{R}_{0:T}) \iff \hat{R}_T \subseteq G \ \text{ and } \ \hat{R}_t \cap A_t = \emptyset \ \ \forall t.$$
In the case of our nonlinear closed-loop system (2), where computing the reachable sets exactly is computationally intractable, we can instead compute outer-approximations of the reachable sets, $\hat{R}_t(X_0) \supseteq R_t(X_0)$. This is useful because if the finite-time reach-avoid properties hold for the outer-approximations, then they hold for the exact closed-loop system as well. Tight outer-approximations of the reachable sets are desirable: they enable verification of tight goal and avoid set specifications, and they reduce the chance that verification fails even though the exact system meets the specifications.
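For axis-aligned box set descriptions, the reach-avoid check above reduces to elementwise comparisons. A minimal sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def box_contains(inner_lo, inner_hi, outer_lo, outer_hi):
    """True if the box [inner_lo, inner_hi] lies inside [outer_lo, outer_hi]."""
    return np.all(inner_lo >= outer_lo) and np.all(inner_hi <= outer_hi)

def boxes_disjoint(a_lo, a_hi, b_lo, b_hi):
    """True if the two axis-aligned boxes do not intersect."""
    return np.any(a_hi < b_lo) or np.any(a_lo > b_hi)

def verified(reach_sets, goal, avoid_sets):
    """Reach-avoid check on box over-approximations.

    reach_sets: list of (lo, hi) per timestep; goal: (lo, hi);
    avoid_sets: list of (lo, hi), one per timestep.
    """
    final_lo, final_hi = reach_sets[-1]
    if not box_contains(final_lo, final_hi, *goal):
        return False
    return all(boxes_disjoint(lo, hi, *a)
               for (lo, hi), a in zip(reach_sets, avoid_sets))
```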

D. CONTROL POLICY NEURAL NETWORK STRUCTURE
Using notation from [10], for the m-layer neural network used in the control policy, the number of neurons in each layer is $n_l, \forall l \in [m]$, where $[a]$ denotes the set $\{1, 2, \ldots, a\}$. Let the l-th layer weight matrix be $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ and bias vector be $b^{(l)} \in \mathbb{R}^{n_l}$, and let $F_l : \mathbb{R}^{n_y} \to \mathbb{R}^{n_l}$ be the operator mapping from network input (measured output vector $y_t$) to layer l,
$$F_l(y_t) = \sigma\left(W^{(l)} F_{l-1}(y_t) + b^{(l)}\right), \ \forall l \in [m-1],$$
where $\sigma(\cdot)$ is the coordinate-wise activation function. The framework applies to general activations, including ReLU, $\sigma(z) = \max(0, z)$. The network input $F_0(y_t) = y_t$ produces the (unclipped) control input,
$$\pi(y_t) = W^{(m)} F_{m-1}(y_t) + b^{(m)}.$$
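The layer recursion described here amounts to a standard feedforward pass; a minimal NumPy sketch with ReLU activations and a linear output layer:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def nn_policy(y, weights, biases):
    """m-layer feedforward policy: F_0 = y, F_l = relu(W_l F_{l-1} + b_l) for
    the hidden layers, with a linear final layer producing the (unclipped)
    control input."""
    a = y
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    return weights[-1] @ a + biases[-1]
```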

E. NEURAL NETWORK ROBUSTNESS VERIFICATION
A key step in quickly computing reachable sets of the NFL (2) is to relax the nonlinear constraints induced by the NN's nonlinear activation functions. The relaxation converts each nonlinearity into a linear upper and lower bound, where each bound holds within the known range of inputs to the activation.
To represent the range of inputs to the activation functions, we first define the $\epsilon$-ball. Denote the $\epsilon$-ball (also called the $\ell_p$-ball) under the $\ell_p$ norm, centered at $\mathring{x}$, with scalar radius $\epsilon$,
$$B_p(\mathring{x}, \epsilon) = \{x \mid \|x - \mathring{x}\|_p \leq \epsilon\}.$$
The $\epsilon$-ball is extended to the case of a vector radius $\boldsymbol{\epsilon} \in \mathbb{R}^n_{\geq 0}$ (i.e., $\boldsymbol{\epsilon}$-ball), defined as
$$B_p^{\boldsymbol{\epsilon}}(\mathring{x}) = \{x \mid \|(x - \mathring{x}) \oslash \boldsymbol{\epsilon}\|_p \leq 1\},$$
where $\oslash$ denotes element-wise division.
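Membership in the vector-radius ball can be checked directly from the definition (element-wise division by the radius, then an ℓ_p norm); the function name is ours:

```python
import numpy as np

def in_eps_ball(x, center, eps, p):
    """Membership test for the vector-radius ball:
    ||(x - center) / eps||_p <= 1, with element-wise division by eps."""
    return np.linalg.norm((x - center) / eps, ord=p) <= 1.0
```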
In a closed-loop system, Theorem III.1 bounds the control output for a particular measurement y. Moreover, if all that is known is $y \in B_p(\mathring{y}, \epsilon)$, Theorem III.1 provides affine relationships between y and u (i.e., bounds valid within the known set of possible y). These relationships enable efficient calculation of NN output bounds, using Corollary 3.3 of [10].
We could leverage [10] to compute reachable sets by first bounding the possible controls, then bounding the next state set by applying the extreme controls from each state. This is roughly the approach in [20], [33], for example. However, this introduces excessive conservatism, because both extremes of control would not be applied at every state (barring pathological examples). To produce tight bounds on the reachable sets, we instead leverage the relationship between measured output and control in §IV.
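The conservative baseline described here (bound the controls over the state set, then apply the extreme controls at every state) can be sketched with interval bound propagation. This is a simplified stand-in for the relaxations used in [20], [33], not the paper's method, and the helper names are ours:

```python
import numpy as np

def interval_linear(W, b, lo, hi):
    """Interval bounds on W x + b for x in [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def control_bounds_ibp(weights, biases, y_lo, y_hi):
    """Bounds on the NN controls over an interval of observations (IBP-style)."""
    lo, hi = y_lo, y_hi
    for W, b in zip(weights[:-1], biases[:-1]):
        lo, hi = interval_linear(W, b, lo, hi)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    return interval_linear(weights[-1], biases[-1], lo, hi)

def next_state_bounds(A, B, c, x_lo, x_hi, u_lo, u_hi):
    """Conservative next-state box: extreme controls applied at every state."""
    xn_lo, xn_hi = interval_linear(A, c, x_lo, x_hi)
    B_pos, B_neg = np.maximum(B, 0.0), np.minimum(B, 0.0)
    return (xn_lo + B_pos @ u_lo + B_neg @ u_hi,
            xn_hi + B_pos @ u_hi + B_neg @ u_lo)
```

Decoupling the control bounds from the state in the last step is exactly the source of the conservatism the paper's joint formulation avoids.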

IV. FORWARD REACHABILITY ANALYSIS
Recall that our goal is to find the set of all possible next states, $x_{t+1} \in X_{t+1}$, given that the current state lies within a known set, $x_t \in X_t$. This will allow us to compute reachable sets recursively, starting from the initial set $X_0$.
The approach follows the architecture in Fig. 3. After first relaxing the NN controller using Theorem III.1, we then associate linearized extreme controllers with extreme next states in §IV-B. Then, using the linearized extreme controller, we optimize over all states in the input set to find extreme next states in §IV-C. We extend the formulation to handle control limits in §VI-A, then describe how to convert the solutions of the optimization problems into reachable set descriptions in §IV-D.

A. ASSUMPTIONS & PROBLEM STATEMENT
This work assumes that X_t is described by either:
• an $\ell_p$-ball for some norm $p \in [1, \infty]$, radius $\boldsymbol{\epsilon}$, and centroid $\mathring{x}$, s.t. $X_t = B_p^{\boldsymbol{\epsilon}}(\mathring{x})$; or
• a convex polytope,
and shows how to compute $X_{t+1}$ as described by either:
• an $\ell_\infty$-ball with radius $\boldsymbol{\epsilon}$ and centroid $\mathring{x}$, s.t. $X_{t+1} = B_\infty^{\boldsymbol{\epsilon}}(\mathring{x})$; or
• a polytope for a specified $A^{out} \in \mathbb{R}^{m_{out} \times n_x}$, meaning we will compute $b^{out} \in \mathbb{R}^{m_{out}}$ s.t. $X_{t+1} = \{x \in \mathbb{R}^{n_x} \mid A^{out} x \leq b^{out}\}$.
We assume that either $A^{out}$ is provided (in the case of polytope output bounds), or that $A^{out} = I_{n_x}$ (in the case of $\ell_\infty$ output bounds). Note that we use i to index polytope facets, j to index the control vectors, and k to index the state vectors. In this section, we assume that $U_t = \mathbb{R}^{n_u}$ (no control input constraints) for cleaner notation; this assumption is relaxed in §VI-A.
The exact 1-step closed-loop reachability problem is as follows. For each row i in $A^{out}$, solve the optimization problem
$$(b^{out}_i)^* = \max_{x_t \in X_t,\ \omega_t \in \Omega,\ \nu_t \in N} A^{out}_{i,:}\, x_{t+1} \quad \text{s.t.}\ x_{t+1} = f(x_t; \pi), \tag{9}$$
where $(b^{out})^*$ defines the tightest description of $X_{t+1}$ associated with $A^{out}$. However, the nonlinearities in $\pi$ make solving this problem intractable in practice, so this section describes a relaxation that provides bounds on (9).
B. BOUNDS ON x_{t+1} FROM A PARTICULAR x_t

Proof. For any particular measurement $y_t$, after relaxing the NN according to Theorem III.1, let $\Pi(y_t) = \{\pi \mid \pi^L_j(y_t) \leq \pi_j(y_t) \leq \pi^U_j(y_t)\ \forall j \in [n_u]\}$ denote the set of possible effective control policies. Denote the control policy $\pi^U_{CL:,i} \in \Pi(y_t)$ as one that induces the least upper bound on the i-th facet of the next state polytope. The resulting control input $\forall i \in [m_{t+1}], j \in [n_u]$ is given by (19); writing (19) in matrix form results in (10). The proof of the lower bound follows similarly.

FIGURE 3: Approach Overview for a simple 1D system. Theorem III.1 relaxes the NN to give affine relationships between observation $y_t$ and control: $\pi^U, \pi^L$. Lemma IV.1 uses the system dynamics to associate $\pi^U, \pi^L$ with the next state set. Lemma IV.2 optimizes the closed-loop dynamics over all states $x_t \in X_t$ to compute bounds on the next state, $\gamma^U_{t+1}, \gamma^L_{t+1}$.

C. BOUNDS ON x_{t+1} FROM ANY x_t ∈ X_t
Now that we can bound each facet of the next state polytope given a particular current state and observation, we can form bounds on the next state polytope facet given a set of possible current states. This is necessary to handle initial state set constraints and to compute "t > 1"-step reachable sets recursively as in (3). We assume x t ∈ X t .
Proof. Bound the next state polytope's i-th facet above, where (30) substitutes the definition of $\pi^U_{CL:,i}$ from Lemma IV.1, (31) substitutes the observation from (2), (32) separates terms that depend on $x_t$, and (33) introduces the worst-case realizations of process and measurement noise. Substituting $M^U_{i,:}, n^U_i$ results in (20). The proof of the lower bound follows similarly.
The optimization problems in Eqs. (20) and (21) have convex cost with convex constraints $x_t \in X_t$ (e.g., polytope $X_t$). We solve the resulting linear programs (LPs) with cvxpy [34].

D. CONVERTING STATE CONSTRAINTS INTO REACHABLE SETS

1) Reachable Sets as ℓ_∞-balls
Let $\hat{R}_0(X_0) = X_0$. Using the results of the previous section, the elementwise bounds $\gamma^L_{t+1}, \gamma^U_{t+1}$ define the $\ell_\infty$-ball estimate with centroid $(\gamma^U_{t+1} + \gamma^L_{t+1})/2$ and radius $(\gamma^U_{t+1} - \gamma^L_{t+1})/2$.

2) Reachable Sets as Polytopes
Assume $X_0$ is an $\ell_p$-ball or polytope. Using the results of the previous section, the facet bounds $b^{out}$ computed for the specified $A^{out}$ define the polytope reachable set estimates.

In both cases, $\hat{R}_t(X_0) \supseteq R_t(X_0)\ \forall t \geq 0$, so these $\hat{R}_t$ can be used to verify the original closed-loop system (2).

E. CLOSED-FORM SOLUTION
Rather than employing an LP solver as in (34), the optimization problem in (20) can be solved in closed form when $X_t$ is described by an $\ell_p$-ball.
with $\odot$ denoting element-wise multiplication (Hadamard product). Recall that $n^U_i$ does not depend on $x_t$. From (40) to (41), we substitute $x_t := \mathring{x}_t + \boldsymbol{\epsilon} \odot \zeta$, to shift and rescale the observation to within the unit ball around zero, $\zeta \in B_p(0, 1)$. The commutative property of the Hadamard product allows (41) to (42). The maximization in (42) is equivalent to an $\ell_q$-norm in (43) by the definition of the dual norm $\|a\|_q = \sup\{a^T b : \|b\|_p \leq 1\}$ and the fact that the $\ell_q$ norm is the dual of the $\ell_p$ norm for $p, q \in [1, \infty]$ with $1/p + 1/q = 1$. The proof of the lower bound follows similarly.
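The resulting closed-form bound is just the dual-norm (Hölder) identity; a sketch for a generic linear objective over the ε-ball (the function name is ours), checkable against the brute-force corner maximum for p = ∞:

```python
import numpy as np

def max_linear_over_ball(a, center, eps, p):
    """Closed-form max of a @ x over {x : ||(x - center)/eps||_p <= 1}:
    a @ center + ||a * eps||_q, where q is the dual norm, 1/p + 1/q = 1."""
    q = np.inf if p == 1 else (1 if p == np.inf else p / (p - 1))
    return a @ center + np.linalg.norm(a * eps, ord=q)
```

This is why the LP of (34) is unnecessary for ball-shaped input sets: each facet bound reduces to one dual-norm evaluation.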

F. ALGORITHM FOR COMPUTING FORWARD REACHABLE SETS
The proposed procedure for estimating forward reachable sets from X 0 for a neural feedback loop is provided in Algorithm 1 (the term "propagator" will be explained in §V). After initializing the zeroth forward reachable set as the initial set (Line 1), we recursively computeR t (X 0 ) forward in time (Line 2). For each timestep, we determine which observations the NN could receive, then pass those through CROWN [10] (or another NN relaxation method, provided it returns the corresponding terms). Depending on whether polytope facets or p -balls are used, the upper/lower bounds are computed facet-by-facet or state-by-state. Then, the γ terms are converted into set representations as in §IV-D. After following this procedure for each timestep, the algorithm returns a sequence of outer-approximations of the forward reachable sets (Line 12).
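The recursion just described has a simple skeleton, with `clprop_step` (our name) standing in for one timestep of a closed-loop propagator such as CL-CROWN:

```python
def reachable_sets(clprop_step, x0_set, horizon):
    """Outline of the recursion: R̂_0 = X_0, R̂_{t+1} = CLProp(R̂_t).

    clprop_step maps one set description to the next timestep's set."""
    sets = [x0_set]
    for _ in range(horizon):
        sets.append(clprop_step(sets[-1]))
    return sets[1:]  # R̂_1, ..., R̂_τ
```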

Algorithm 1 Closed-Loop CROWN Propagator
Input: initial state set X_0, trained NN control policy π, dynamics f, time horizon τ
Output: forward reachable set approximations R̂_{1:τ}(X_0)

FIGURE 4: System Architecture. A closed-loop propagator uses the trained NN control policy and dynamics to estimate reachable sets, and a closed-loop partitioner decides how to split the initial state set into pieces. This is the closed-loop extension of the architecture from our prior work [37].

V. TIGHTER REACHABLE SETS BY PARTITIONING THE INITIAL STATE SET
NN relaxation methods can be improved by partitioning the input set [36], particularly when the input set is large but of low dimension. Here, we achieve tighter bounds by splitting X 0 into subsets, computing N -step reachable sets for each of the subsets separately, then returning the union of all reachable sets from each subset.
Recall that [37] introduced an architecture composed of a partitioner and propagator for analyzing NNs in isolation. The partitioner is an algorithm to split the NN input set in an intelligent way (e.g., uniform gridding [36], simulation guidance [20], greedy simulation guidance [37]), and the propagator is an algorithm to estimate bounds on the NN outputs given a NN input set (e.g., IBP [7], Fast-Lin [8], CROWN [10], SDP [13]). The partitioner can query the propagator repeatedly to refine the estimated output set bounds. The choice and design of partitioners and propagators has important implications on the bound tightness and computational runtime.

Algorithm 2 Closed-Loop Uniform Partitioner
Input: initial state set X_0, trained NN control policy π, dynamics f, time horizon τ, number of partition cells r
Output: forward reachable set approximations R̂_{1:τ}(X_0)

This work extends the framework from [37] to closed-loop systems, as visualized in Fig. 4. In particular, CL-CROWN and Reach-SDP represent closed-loop propagators (given an initial state set, they compute reachable sets for a trained NN control policy and closed-loop dynamics), and in this section we extend the partitioners discussed in [37] to be closed-loop partitioners. Altogether, we call the nested architecture a closed-loop analyzer.

A. CLOSED-LOOP PARTITIONERS
The simplest closed-loop partitioner splits the input set into a uniform grid, as described in Algorithm 2. After splitting $X_0$ into $\prod_{k=1}^{n_x} r_k$ cells (Line 2), each cell of $X_0$ is passed through a closed-loop propagator, CLProp (e.g., CL-CROWN (Algorithm 1) or Reach-SDP [21]). For each timestep, the returned estimate of the reachable set for all of $X_0$ is thus the union of the reachable sets of all cells of the $X_0$ partition.
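A minimal sketch of the uniform gridding and the per-timestep union (here the tightest bounding box of the cell-wise boxes), with hypothetical helper names:

```python
import numpy as np
from itertools import product

def uniform_partition(lo, hi, r):
    """Split the box [lo, hi] into prod(r) cells, r[d] cells per dimension d."""
    edges = [np.linspace(l, h, n + 1) for l, h, n in zip(lo, hi, r)]
    cells = []
    for idx in product(*(range(n) for n in r)):
        c_lo = np.array([edges[d][i] for d, i in enumerate(idx)])
        c_hi = np.array([edges[d][i + 1] for d, i in enumerate(idx)])
        cells.append((c_lo, c_hi))
    return cells

def union_bounds(boxes):
    """Tightest box containing a union of boxes (the per-timestep union)."""
    los, his = zip(*boxes)
    return np.min(los, axis=0), np.max(his, axis=0)
```

Each cell is smaller than X_0, so the NN relaxation over each cell is tighter, which is the mechanism behind the reduced conservatism reported in the experiments.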
There are several reasons to use an iterative partitioner rather than the uniform partitioner presented before, as discussed in [20], [37]. Thus, the closed-loop variant of the greedy simulation-guided partitioner from [37] is described in Algorithm 3. After acquiring N Monte Carlo samples from X 0 (Line 1), those sampled points are simulated forward in time according to the dynamics and control policy (Line 2). The extrema of these samples can be used to define the sampled reachable sets at each timestep (Line 3). Then, the full initial state set X 0 is passed through a closed-loop propagator (Line 4) and the input and reachable set estimates are added to a stack, M .
An iterative process is used to refine the reachable set estimates from the full X_0. The element with the largest distance from the sampled reachable set estimate is popped from M (e.g., using the same idea from Fig. 3a of [37] on the final timestep's reachable set). If the chosen element's reachable set is within the sampled reachable set for all timesteps, there is no need to further refine that cell, and its reachable set can be added to the set that will eventually be returned. Otherwise, the input set of that element is bisected (Line 11) and each of the two input sets is passed through CLProp and added to M. Termination conditions, such as an empty M, a maximum number of CLProp calls, or a timeout, can be implemented depending on the application. Finally, the remaining elements of the stack are added to the set that will be returned.

Algorithm 3 Closed-Loop Greedy Sim-Guided Partitioner
Input: initial state set X_0, number of MC samples N, trained NN control policy π, dynamics f, time horizon τ
Output: forward reachable set approximations R̂_{1:τ}(X_0)

FIGURE 5: Control Limits. The NN control policy (red) is bounded by π^U, π^L from CROWN (blue), but none of these respect the control limits (green). While adding a clip could be non-convex (right block), §VI-A borrows the idea from [21] to append a few layers to the NN to handle control limits.
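The stack-based bisection loop can be outlined as follows. This sketch keeps cells whose propagated set already fits inside a target estimate (passed in directly here, rather than computed from Monte Carlo samples as in Algorithm 3), and bisects the rest along their widest dimension; all names are ours:

```python
import numpy as np

def refine(clprop, x0_lo, x0_hi, target_lo, target_hi, max_calls=50):
    """Greedy refinement outline: bisect cells whose propagated set exceeds
    a target estimate; keep cells that already fit (cf. Algorithm 3)."""
    stack = [(x0_lo, x0_hi)]
    kept, calls = [], 0
    while stack and calls < max_calls:
        lo, hi = stack.pop()
        r_lo, r_hi = clprop(lo, hi)
        calls += 1
        if np.all(r_lo >= target_lo) and np.all(r_hi <= target_hi):
            kept.append((r_lo, r_hi))  # tight enough; no further refinement
        else:
            d = int(np.argmax(hi - lo))        # bisect the widest dimension
            mid = 0.5 * (lo[d] + hi[d])
            lo2, hi1 = lo.copy(), hi.copy()
            lo2[d], hi1[d] = mid, mid
            stack.extend([(lo, hi1), (lo2, hi)])
    # over-approximate any leftover cells so the result stays sound
    kept.extend(clprop(lo, hi) for lo, hi in stack)
    return kept
```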

VI. HANDLING NONLINEARITIES IN THE DYNAMICS

A. ACCOUNTING FOR CONTROL LIMITS, U_t
The key terms in Lemma IV.1 can be modified to account for control input constraints, as
$$\pi^U_{CL:,i}(y_t) = \mathrm{Proj}_{U_t}\left(\Upsilon_{i,:,:}\, y_t + \Gamma_{:,i}\right).$$

A common example is box control input constraints, with element-wise control input
$$u_{t,j} = \mathrm{clip}\left(\pi_j(y_t), \underline{u}_j, \bar{u}_j\right),$$
where clip saturates the control if it exceeds the limits. However, clip could be non-convex depending on the domain of $x_t$ (and violates DCP rules in cvxpy [34] regardless), as visualized in Fig. 5. We instead utilize the insight from [21] that clip looks like two ReLUs stitched together (one inverted). In particular, we append corresponding layers to the NN; all of the preceding results then apply to this modified NN for a system with box control input constraints. Note that adding these ReLUs does not limit which types of activations can be used in other layers of the NN.
While these additional layers exactly mimic clip for any particular y t , they can induce additional conservatism because the additional ReLUs must be relaxed as well. Additionally, while CROWN provides several options for the slope of the lower bound for a ReLU relaxation, note that the two additional ReLUs should be relaxed with a zero-slope lower bound to ensure no values exceed the control limits.
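For intuition, saturation can indeed be written with two ReLUs. The particular identity below, clip(u, lo, hi) = lo + ReLU(u − lo) − ReLU(u − hi), is one standard way to do it and may differ in form from the exact layer construction in [21]:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def clip_via_relus(u, lo, hi):
    """Saturation expressed with two ReLUs:
    below lo both ReLUs are zero (output lo); between lo and hi the first
    ReLU is active (output u); above hi the second ReLU cancels the excess
    (output hi)."""
    return lo + relu(u - lo) - relu(u - hi)
```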

B. EXTENSION TO POLYNOMIAL DYNAMICS
This section extends the above framework to systems with polynomial dynamics. Consider the system with dynamics
$$x_{t+1} = \tilde{f}(x_t) + B_t u_t + c_t + \omega_t,$$
where $\tilde{f}(x) : \mathbb{R}^n \to \mathbb{R}^n$ is a polynomial and the other terms are defined in the same way as in (1). As a result, the counterparts of (20) and (21) in Lemma IV.2 are (48) and (49), respectively, where $f_j(x_t) = A^{out}_{i,:} \tilde{f}(x_t)$. Because of the polynomial term $f_j(x_t)$ in (48) and (49), these objectives can be non-convex and NP-hard to solve to global optimality [38]. Thus, we adopt a principled convex relaxation technique [39] to compute a guaranteed upper and lower bound for (48) and (49), respectively.
We first transform the polynomial term $\tilde{f}(x_t)$ into a linear or quadratic term by introducing new variables $s \in \mathbb{R}^{n_s}$ and new quadratic constraints. For example, $\tilde{f}(x_t) = x_{t,1} x_{t,2} x_{t,3} x_{t,4}$ is equivalent to $\tilde{f}(s) = s_1 s_2$, with $s_1 = x_{t,1} x_{t,2}$ and $s_2 = x_{t,3} x_{t,4}$. Then (48) can be transformed into a quadratically constrained quadratic program (QCQP), (50), where the subscripts are dropped for clarity, $\bar{s} = [x^T, s^T]^T \in \mathbb{R}^{n+n_s}$, $Q_0 \in \mathbb{S}^{n+n_s}$, $b_0 \in \mathbb{R}^{n+n_s}$, and $\mathcal{Q} = \{\bar{s} \mid \bar{s}^T Q_i \bar{s} + b_i^T \bar{s} + c_i = 0,\ Q_i \in \mathbb{S}^{n+n_s},\ b_i \in \mathbb{R}^{n+n_s},\ c_i \in \mathbb{R},\ \forall i = 1, \ldots\}$ is formulated by a sequence of quadratic constraints. For QCQPs, the semidefinite relaxation gives the tightest bounds among existing convex relaxation techniques [39]. The semidefinite relaxation (51) of (50) is equivalent to (50) if an extra non-convex constraint $\mathrm{rank}(S) = 1$ is added. As a result, solving the convex program (51) yields a guaranteed upper bound for (48). The lower bound for (49) follows similarly.
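The variable substitution in the example above can be checked numerically; the helper names are ours:

```python
import numpy as np

def lift(x):
    """Lift x = [x1..x4] to s_bar = [x1..x4, s1, s2] with the quadratic
    constraints s1 = x1*x2 and s2 = x3*x4."""
    return np.concatenate([x, [x[0] * x[1], x[2] * x[3]]])

def lifted_f(s_bar):
    """The quartic monomial x1*x2*x3*x4 rewritten as the quadratic s1*s2
    in the lifted variable."""
    return s_bar[4] * s_bar[5]
```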

VII. BACKWARD REACHABILITY ANALYSIS
So far, this paper has considered the forward reachability problem: starting from a set X 0 , what are all the possible states the system could occupy at time t? In this section, we discuss the opposite problem of backward reachability: given a target set X T , from which states will the system end up in the target set? Depending on whether the initial or target state sets are known a priori and/or change frequently, one of these two paradigms would be a better fit for a given application. Furthermore, the target set could represent a goal set or avoid/collision set -either way, this section provides a framework for guaranteeing the system will reach/avoid the target set.
Traditionally, it is straightforward to switch between forward and backward reachability analysis through a change of variables [15], [22]. However, including a NN in the analysis adds new challenges. In particular, a key challenge with propagating sets backward through NNs is that common NN activations have finite range (e.g., ReLU(x) = 0 could correspond to any x ≤ 0), which causes the sets to quickly explode to ∞. Moreover, even using an infinite range activation (e.g., Leaky ReLU), the weight matrices may not be invertible (e.g., singular, insufficient rank). Our approach is able to avoid these issues and still leverage forward propagation tools (e.g., CROWN [10]) despite thinking backward in time.
Recent related work has developed specific invertible-by-design NN architectures [40] and modifications to the training procedure to regularize for invertibility [41]. In contrast, our approach can be applied to any NN for which an affine relaxation can be computed, i.e., the same broad class of NN architectures (e.g., feedforward NNs with general activation functions) as in the forward analysis.

FIGURE 6: Backreachable, backprojection, and target sets. Given a target set, X_T, the backreachable set R^B_{-1}(X_T) contains all states for which some control exists to move the system to X_T in one timestep. The backprojection set, P_{-1}(X_T), contains all states for which the NN controller leads the system to X_T. Inner-approximation P̂_{-1}(X_T) contains all states for which all controls that the relaxed NN could apply, ∀u ∈ [π^L(y_{-1}), π^U(y_{-1})], lead the system to X_T.

Fig. 6 illustrates the important concepts for this analysis. Three different sets (left) contain states that lead to the target set X_T under different control inputs, as defined below. The nomenclature of these sets is motivated by [45], but with slightly different definitions.

A. BACKREACHABLE & BACKPROJECTION SETS
The first of these sets is the backward reachable (backreachable) set, $R^B_{-t}(X_T)$, with $R^B_0(X_T) = X_T$, which consists of states $x_t$ for which some admissible control trajectory, $u_{t:T}$, takes the system to $X_T$. To simplify the notation, in this section we assume no noise (i.e., $\underline{\omega}_t = \bar{\omega}_t = \underline{\nu}_t = \bar{\nu}_t = 0$) and perfect observations, $C_t^T = I \Rightarrow y_t = x_t$. Backreachable sets are particularly useful when the controller is truly a "black box" or has not yet been defined (since the definition only uses the fact that $u \in U$, rather than $\pi$).
In the case of a trained NN control policy, we can be more precise about the closed-loop system's safety properties. In particular, for the closed-loop system f from (2), we denote $P_{-t}(X_T)$ the backprojection set, which contains all states that will lead to a given target set $X_T \subseteq \mathbb{R}^{n_x}$ in exactly t timesteps, as defined by the recursion
$$P_{-(t+1)}(X_T) = \{x \mid f(x; \pi) \in P_{-t}(X_T)\}, \qquad P_0(X_T) = X_T.$$
Recall that in §III-C, we computed over-approximations of the reachable sets because the NN makes exact reachability analysis computationally challenging. Analogously, we will compute under-approximations of the backprojection sets, $\hat{P}_{-t}(X_T) \subseteq P_{-t}(X_T)$. Under-approximations are still useful because if $\hat{P}_{-t}(X_T) \supseteq X_0$, then $P_{-t}(X_T) \supseteq X_0$ as well, meaning that for every state in $X_0$, the control policy $\pi$ is guaranteed to lead the system to $X_T$ in t timesteps. Thus, it is desirable for $\hat{P}_{-t}(X_T)$ to be as close to $P_{-t}(X_T)$ as possible.

B. DERIVING BACKPROJECTION SETS
We first provide a summary of the proposed approach:
1) Ignoring the NN, find (hyper-)rectangular bounds on the backreachable set (this is a superset of the backprojection set associated with the particular NN controller);
2) Within this region, relax the NN controller to acquire upper/lower affine bounds on the control;
3) Compute the states that will lead to the target set for every control effort within the upper/lower bounds.
This last step provides an under-approximation of the backprojection set, which is the set of interest.
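For a scalar system, step 3 reduces to two linear inequalities per state. The sketch below checks them on a grid for the toy 1-D dynamics x_{t+1} = a x_t + b u_t with b > 0, given affine relaxation bounds ψ x + α ≤ π(x) ≤ φ x + β over the backreachable interval; it is a stand-in for the polytope construction in Lemma VII.1, and all names are ours:

```python
import numpy as np

def backprojection_1d(a, b, psi, alpha, phi, beta, t_lo, t_hi, x_lo, x_hi):
    """Under-approximate the 1-step backprojection set for x+ = a*x + b*u
    (b > 0). A state is kept only if *every* control the relaxed NN could
    apply lands in the target interval [t_lo, t_hi]:
        a*x + b*(phi*x + beta) <= t_hi   (worst case from the upper bound)
        a*x + b*(psi*x + alpha) >= t_lo  (worst case from the lower bound)
    Checked on a grid over the backreachable interval [x_lo, x_hi]."""
    xs = np.linspace(x_lo, x_hi, 1001)
    ok = ((a * xs + b * (phi * xs + beta) <= t_hi) &
          (a * xs + b * (psi * xs + alpha) >= t_lo))
    return (xs[ok].min(), xs[ok].max()) if ok.any() else None
```

With an exact affine controller π(x) = −0.5x (so ψ = φ = −0.5, α = β = 0) and a = b = 1, the closed loop is x+ = 0.5x, and the states guaranteed to reach the target [−0.25, 0.25] are those in [−0.5, 0.5].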
The following lemma provides polytope bounds on $P_{-1}(X_T)$ for a single timestep, which is the key component of the recursive algorithm introduced in the next section.
Lemma VII.1. Given an m-layer NN control policy $\pi : \mathbb{R}^{n_y} \to \mathbb{R}^{n_u}$, closed-loop dynamics $f : \mathbb{R}^{n_x} \times \Pi \to \mathbb{R}^{n_x}$ as in Eqs. (1) and (2), and bounds on the state set, $\underline{x}_{t+1} \leq x_{t+1} \leq \bar{x}_{t+1}$, the polytope bounds (54) on the previous state set hold, where $Z^{B\Phi\Psi}, Z^{B\Psi\Phi} \in \mathbb{R}^{n_x \times n_y}$ and $z^{B\beta\alpha}, z^{B\alpha\beta} \in \mathbb{R}^{n_x}$ are defined $\forall k \in [n_x]$ (by row/element).

Proof. Given the dynamics from Eqs. (1) and (2), solve the optimization problems (59) to (64) for each state $k \in [n_x]$, which provide a (hyper-)rectangular outer bound, $\underline{x}_t \leq x_t \leq \bar{x}_t$, on the backreachable set. Note that these are LPs for the convex $X_{t+1}, U_t$ used here. Given that bound, Theorem III.1 provides $\Psi, \Phi, \alpha, \beta$, which lead to inequalities based on Lemma IV.1 with $A^{out} = I_{n_x}$, $\forall k \in [n_x]$. After substituting in for $y_t$, grouping the $x_t$ terms together, and using Eqs. (55) to (58), we require that no NN control pushes the system into a state beyond $[\underline{x}_{t+1}, \bar{x}_{t+1}]$; writing the resulting inequalities in matrix form gives (69). To arrive at (54), append $\underline{x}_t \leq x_t \leq \bar{x}_t$ to (69).
Corollary VII.2. The setP −1 (X T ) composed of all x t that satisfy (54) is a subset of the one-step backprojection set, P −1 (X T ).

C. ALGORITHM FOR COMPUTING BACKPROJECTION SETS
Lemma VII.1 provides a formula for computing $\hat{P}_{-1}(X_T)$, i.e., an under-approximation of the backprojection set for a single timestep. This section extends that idea to enable computing $\hat{P}_{-\tau:0}(X_T)$ for a time horizon τ and also describes how partitioning can help improve the results, particularly when (54) is the empty set. The proposed procedure is summarized in Algorithm 4.
After initializing the zeroth backprojection set as the target set (Line 1), we recursively computeP t (X T ) backward in time (Line 2). For each timestep, we first compute an overapproximation on the backreachable set,R B t (X T ), by solving the LPs in Eqs. (59) to (61) and Eqs. (62) to (64) (Line 4).
Algorithm 4 Backprojection Set Estimation
Input: target state set X_T, trained NN control policy π, time horizon τ, partition parameter r
Output: backprojection set approximations P̄_{-τ:0}(X_T)

Since R^B_t(X_T) ⊇ P_t(X_T), we can relax the NN over R^B_t(X_T), and that relaxation will also hold for all states in P_t(X_T) and P̄_t(X_T). However, R^B_t(X_T) could be large, which could lead to a loose NN relaxation. Thus, instead of analyzing R^B_t(X_T) as a single set, we uniformly partition R^B_t(X_T) into r ∈ N^{n_x} cells (defined per dimension). For each cell in the partition, we have a (hyper-)rectangular set of states, [x̲_t, x̄_t]_i, that could be passed as inputs to the NN controller. The NN can be relaxed over this set using CROWN (Theorem III.1) or a similar type of algorithm. In particular, we extract Ψ, Φ, α, β from CROWN, which define P and p, the constraints in (54) (Lines 8 and 9). These constraints define a polytope of states that are guaranteed to lead to P̄_{t+1}(X_T) at the next timestep (for any control within CROWN's bounds), so those states are added to P̄_t(X_T) (Line 11). Note that it is possible, despite partitioning, that A = ∅. After following this procedure for each timestep, the algorithm returns a sequence of inner-approximations of the backprojection sets (Line 14).
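To make the per-cell constraint construction concrete, the following sketch builds the polytope {x : P x ≤ p} for one cell. A linear stand-in policy u = Kx replaces the NN (so its affine "relaxation" is exact), and a sign-split of B takes the worst case over the relaxed control band; all matrices and numbers are illustrative, not the paper's:

```python
import numpy as np

# Illustrative double integrator with a linear stand-in policy u = K x in
# place of the trained NN (hypothetical gains).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
K = np.array([[-0.5, -1.0]])

def relax_policy(cell_lo, cell_hi):
    """Hypothetical CROWN-style bounds Psi x + alpha <= pi(x) <= Phi x + beta
    over a cell; exact for the linear stand-in policy."""
    return K, np.zeros(1), K, np.zeros(1)

def backproject_cell(cell_lo, cell_hi, tgt_lo, tgt_hi):
    """Polytope {x : P x <= p} of cell states guaranteed to reach the target box."""
    Psi, alpha, Phi, beta = relax_policy(cell_lo, cell_hi)
    Bp, Bm = np.maximum(B, 0), np.minimum(B, 0)  # sign-split B for the worst case
    # Worst-case highest next state over the control band must stay below tgt_hi ...
    P_up = A + Bp @ Phi + Bm @ Psi
    p_up = tgt_hi - (Bp @ beta + Bm @ alpha)
    # ... and the worst-case lowest next state must stay above tgt_lo.
    P_lo = -(A + Bp @ Psi + Bm @ Phi)
    p_lo = -(tgt_lo - (Bp @ alpha + Bm @ beta))
    # Intersect with the cell itself.
    P = np.vstack([P_up, P_lo, np.eye(2), -np.eye(2)])
    p = np.concatenate([p_up, p_lo, cell_hi, -cell_lo])
    return P, p

tgt_lo, tgt_hi = np.array([-0.25, -0.25]), np.array([0.25, 0.25])
P, p = backproject_cell(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), tgt_lo, tgt_hi)

# Any x satisfying P x <= p provably lands in the target box one step later.
x = np.array([0.2, 0.0])
assert np.all(P @ x <= p + 1e-9)
x_next = A @ x + B @ (K @ x)
```

The sign-split mirrors the Z_{BΦΨ}/Z_{BΨΦ} pairing in Lemma VII.1: positive entries of B take the upper control bound for the upper state bound, and vice versa.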

VIII. EXPERIMENTAL RESULTS
This section demonstrates our convex reachability analysis tool, Reach-LP, on simulated scenarios. We first show an example verification task and quantify the improvement in runtime vs. bound tightness over the state-of-the-art [21] for a double integrator system. We then apply the algorithm on a 6D quadrotor model subject to multiple sources of noise.

A. DOUBLE INTEGRATOR
Consider the LTI double integrator system from [21], with c_t = 0, C_t = I_2, and no noise, discretized with sampling time t_s = 1 s. As in [21], we implemented a linear MPC with prediction horizon N_MPC = 10, weighting matrices Q = I_2, R = 1, and terminal weighting matrix P_∞ synthesized from the discrete-time Riccati equation, subject to state constraints A_C = [−5, 5] × [−1, 1] and input constraint u_t ∈ [−1, 1] ∀t. We used the MPC to generate 2420 samples of state-input pairs, then trained an NN with Keras [46] for 20 epochs with batch size 32.

Table 1 and Fig. 7 compare several algorithms on the double integrator system using an NN with [5, 5] neurons and ReLU activations. The key takeaway is that Reach-LP-Partition provides a 5× improvement in reachable set tightness over the prior state-of-the-art, Reach-SDP [21] (which does not use input set partitioning), while requiring 150× less computation time. We implemented Reach-SDP in Python with cvxpy and MOSEK [47]. All computation times (until §VIII-I) are reported from an i7-6700K CPU with 16 GB RAM on Ubuntu 20.04. We also note the large error accumulation that occurs when the NN and dynamics are analyzed separately, as in [20], [33], since that approach assumes the NN could output its extreme values from any state in X_t (top two rows of Table 1). CL-CROWN is a simple baseline where, at each timestep, the NN output bounds U^NN_t are computed with CROWN [10], then (9) is solved (in closed form) assuming u_t ∈ U^NN_t. Similarly, CL-SG-IBP is the method from [20], where U^NN_t is instead computed with SimGuided-IBP to resolution ε = 0.1. Fig. 7a shows sampled trajectories, where each colored cluster of points represents sampled reachable states at a particular timestep (blue→orange→green, etc.). Recall that sampled trajectories could miss possible reachable states, whereas these algorithms are guaranteed to over-approximate the reachable sets.

TABLE 1: Double Integrator. The first two methods analyze the NN and dynamics separately, leading to high error accumulation, while the next four methods analyze the NN and dynamics together. Reach-LP is 4,000× faster to compute but 7× looser than Reach-SDP [21]. Reach-LP-Partition refines the Reach-LP bounds by splitting the input set into 16 subsets, giving 150× faster computation time and 5× tighter bounds than Reach-SDP [21].

Fig. 7: (a) Reachable set estimates; (b) over-approximation error. Reachable set bounds are visualized for various algorithms: Reach-SDP [21], Reach-LP, and those two algorithms after partitioning the input set into 16 cells.

B. COMPARISON WITH BASELINE
The key takeaway is that while all approaches provide outer bounds on the sampled trajectories, they differ substantially in how tight those bounds are.
We quantify tightness as the ratio of the provided reachable set's area to that of the smallest axis-aligned bounding box on the sampled points, minus 1, shown in Fig. 7b as the system progresses forward in time. As expected, all algorithms degrade as the number of timesteps increases, but Reach-LP-Partition and Reach-SDP-Partition perform the best, and similarly to one another. This provides a numerical comparison of the rectangle sizes from Fig. 7a.
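A minimal sketch of this error metric, assuming axis-aligned rectangular reachable set estimates:

```python
import numpy as np

def tightness_error(samples, reach_lo, reach_hi):
    """Over-approximation error: area(reachable box) / area(bounding box of samples) - 1.
    Zero means the estimate is exactly as tight as the sampled bounding box."""
    bb_lo, bb_hi = samples.min(axis=0), samples.max(axis=0)
    sample_area = np.prod(bb_hi - bb_lo)
    reach_area = np.prod(np.asarray(reach_hi) - np.asarray(reach_lo))
    return reach_area / sample_area - 1.0

# Illustrative: samples fill [0, 1]^2, estimated set is the looser [-0.1, 1.1]^2.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(1000, 2))
err = tightness_error(pts, [-0.1, -0.1], [1.1, 1.1])
```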
Note that both Reach-LP and Reach-SDP methods could be improved by properly choosing the direction of polytope facets. Additionally, while Reach-SDP can provide ellipsoidal bounds given the quadratic nature of the formulation, we implement only the polytope bounds in this comparison.

C. VERIFICATION
A primary application of reachable sets is to verify reach-avoid properties. In Fig. 7a, we consider a case with an avoid set A = {x | x_1 ≥ 0.35} (orange) and a goal set G = [−0.5, 0.5] × [−0.25, 0.25] (cyan). Reach-LP and Reach-SDP-Partition verify these properties for this 5-step scenario, highlighting the importance of tight reachable sets.

D. SCALABILITY TO DEEP NNS
To demonstrate the scalability of the method, we trained NNs with 1-10 hidden layers of 5 neurons each and report the average runtime over 5 trials of reachability analysis of the double integrator system. In Fig. 8a, while Reach-SDP's runtime appears to grow exponentially (taking > 800 s for a 10-layer NN), our proposed Reach-LP methods remain very efficient (< 0.75 s for Reach-LP on all NNs). Note that we omit Reach-SDP-Partition (∼16× the runtime of Reach-SDP) from this plot to maintain a reasonable scale. Recall that CROWN itself has O(m²n³) time complexity [10] for an m-layer network with n neurons per layer and n outputs.

E. ABLATION STUDY: ℓ∞-BALLS VS. POLYTOPES
Recall that §IV described reachable sets as either polytopes or ℓ∞-balls. Fig. 8b shows the effect of that choice: as the number of sides of the polytope increases, the reachable set size decreases. The tradeoff is that the computation time scales linearly with the number of sides of the polytope. Note that an ℓ∞-ball is a polytope with 4 sides (in 2D), and that X_0 was chosen to show a different scenario from Fig. 7.
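The facet tradeoff can be seen on a toy target set: the tightest k-facet outer polytope of the unit disk (support value 1 along every facet normal) shrinks in area toward π as k grows. A small sketch, unrelated to the paper's dynamics:

```python
import numpy as np

def circumscribed_area(k):
    """Area of the tightest k-facet outer polytope of the unit disk, whose
    support value is 1 along each of k evenly spread facet normals."""
    # Adjacent tangent lines intersect at radius 1/cos(pi/k); list the
    # vertices and apply the shoelace formula.
    vert_angles = 2 * np.pi * np.arange(k) / k + np.pi / k
    r = 1.0 / np.cos(np.pi / k)
    vx, vy = r * np.cos(vert_angles), r * np.sin(vert_angles)
    return 0.5 * abs(np.dot(vx, np.roll(vy, -1)) - np.dot(vy, np.roll(vx, -1)))

# More facets -> smaller (tighter) set, approaching the disk's area, pi.
areas = [circumscribed_area(k) for k in (4, 8, 16, 64)]
```

With k = 4 (the ℓ∞-ball case) the area is 4; by k = 64 it is within 0.3% of π, at 16× the number of support computations.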

F. ABLATION STUDY: CLOSED-FORM VS. LP
The computational benefit of the closed-form solution in Lemma IV.3 is shown in Table 2. For both the double integrator (a) and the 6D quadrotor (b), the runtime of the closed-form (C.F.) solution is over an order of magnitude faster than with the LP solver. This speedup is observed across various partition resolutions, with times reported as the mean ± one standard deviation of 5 repeats. Note that the C.F. and LP solvers return the same reachable sets, so there is no tightness tradeoff, solely a computational speedup.

G. REFINEMENT VIA PARTITIONING

Fig. 9 shows the refinement of reachable sets under the Closed-Loop Greedy Sim-Guided partitioner and CL-CROWN propagator. The blue reachable set estimates become much tighter to the Monte Carlo samples, while still providing a guaranteed over-approximation of the true system's forward reachable sets. Note that the partition of X_0 is not uniform, as seen most clearly in Fig. 9e.
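The benefit of partitioning can be isolated in one dimension: interval arithmetic over-approximates g(t) = t² − t on [0, 1] badly because it treats the two occurrences of t independently, and splitting the input interval shrinks the union of the per-cell bounds. A toy sketch (not the paper's propagator):

```python
import numpy as np

def g_interval(lo, hi):
    """Interval over-approximation of g(t) = t^2 - t on [lo, hi] (0 <= lo <= hi),
    treating the two occurrences of t independently, as interval arithmetic does."""
    return lo**2 - hi, hi**2 - lo

def partitioned_bound(r):
    """Propagate [0, 1] through g after splitting it into r uniform cells,
    then take the bounding interval of the union of per-cell outputs."""
    edges = np.linspace(0.0, 1.0, r + 1)
    outs = [g_interval(a, b) for a, b in zip(edges[:-1], edges[1:])]
    return min(o[0] for o in outs), max(o[1] for o in outs)

widths = []
for r in (1, 2, 4, 8):
    lo, hi = partitioned_bound(r)
    widths.append(hi - lo)
# Conservatism shrinks monotonically toward the true range width of 0.25
# (g attains [-0.25, 0] on [0, 1]).
```

The unpartitioned bound has width 2 (8× the true range); 8 cells already cut that to 0.5, which is the same mechanism by which Reach-LP-Partition tightens Reach-LP.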

H. 6D QUADROTOR WITH NOISE
Consider the 6D nonlinear quadrotor from [21], [48]. Our setup differs from [21], [48] in that we add ω_t as uniform process noise and measure the output as in (1) with C_t = I_6, subject to uniform sensor noise. As in [21], the state vector contains 3D positions and velocities, [p_x, p_y, p_z, v_x, v_y, v_z], while nonlinearities from [48] are absorbed into the control as functions of θ (pitch), φ (roll), and τ (thrust), subject to the same actuator constraints as [21]. We implemented a nonlinear MPC similar to [21] in MATLAB to collect (x_t, u_t) training pairs, then trained a [32, 32] NN with Keras as above. We use Euler integration to account for (71) in our discrete-time formulation. Fig. 10 shows the reachable sets with and without noise. Note that while these plots only show the (x, y, z) positions, the reachable sets are estimated in all 6 dimensions. The first key takeaway is that the blue boxes (Reach-LP with ℓ∞-balls) provide meaningful bounds over a long horizon (12 steps, 1.2 s shown). Second, unlike Reach-SDP, Reach-LP is guaranteed to bound worst-case noise realizations. Computing these bounds took a total of 0.11 ± 0.003 seconds.
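The Euler-integration step is straightforward; the sketch below uses a simplified quadrotor-style model in which positions integrate velocities, with velocity dynamics as a hypothetical placeholder rather than the paper's exact nonlinear terms:

```python
import numpy as np

def euler_step(x, u, ts, f):
    """One explicit-Euler step of continuous-time dynamics x' = f(x, u)."""
    return x + ts * f(x, u)

# Quadrotor-style placeholder dynamics (hypothetical, not the paper's exact
# model): positions integrate velocities; the velocity derivatives are
# driven by pitch, roll, and thrust controls u = [theta, phi, tau].
def f(x, u):
    v = x[3:]
    g = 9.8
    dv = np.array([g * np.tan(u[0]), -g * np.tan(u[1]), u[2] - g])
    return np.concatenate([v, dv])

# Hovering: zero attitude, thrust canceling gravity -> state is unchanged.
x1 = euler_step(np.zeros(6), np.array([0.0, 0.0, 9.8]), 0.1, f)
```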

I. HIGH-DIMENSIONAL SYSTEM
To demonstrate that Reach-LP scales to high-dimensional systems, Fig. 11 uses the International Space Station (ISS) Component 1R model, which has n_x = 270 and n_u = 3. After training an NN in the same manner as §VIII-A, reachable sets for all 270 states were computed, with only (x_0, x_1) shown. The average time for each reachability analysis is about 25.34 s on a desktop computer with an i7-9700K CPU @ 3.60 GHz (8 cores) and 32 GB RAM.

J. POLYNOMIAL DYNAMICS
To show the SDP-based method from §VI-B on a system with polynomial dynamics, we consider the Duffing oscillator, where x ∈ R² and u ∈ R are the state and control input, respectively, and ζ = 0.3 is the damping coefficient. Furthermore, c_t = 0, C_t = I_2, zero noise is considered, and the dynamics are discretized with sampling time t_s = 0.3 s. We fit a neural network controller as described in [49]. Applying the techniques introduced in §VI-B, the reachable sets are illustrated in Fig. 13. The estimated reachable sets contain the sampled states, but due to the relaxation of the dynamics, the bounds are relatively loose compared to those for the linear systems, which required no such relaxation.

Fig. 10: Only the (x, y, z) states are shown, even though the reachable sets are computed in 6D. Blue boxes (Reach-LP) bound the clusters of sampled points at each discrete timestep, starting from the blue X_0. It took 0.11 s to compute the 12 reachable sets per scenario. In (b), ν ∼ Unif(±0.001 · 1_6), ω ∼ Unif(±0.005 · 1_6).
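For reference, a discretized rollout of a standard damped Duffing form (ẋ₁ = x₂, ẋ₂ = −x₁ − 2ζx₂ − x₁³ + u, assumed here; the paper's exact equation may differ) with t_s = 0.3 s and a zero-control placeholder policy:

```python
import numpy as np

# Assumed damped Duffing form (illustrative; the paper's exact equation may differ):
#   x1' = x2,   x2' = -x1 - 2*zeta*x2 - x1**3 + u
zeta, ts = 0.3, 0.3

def duffing_step(x, u):
    """One Euler step of the (assumed) Duffing dynamics with sampling time ts."""
    dx = np.array([x[1], -x[0] - 2 * zeta * x[1] - x[0] ** 3 + u])
    return x + ts * dx

# Roll out 6 steps from one initial state under a placeholder policy u = 0.
x = np.array([0.5, 0.0])
traj = [x]
for _ in range(6):
    x = duffing_step(x, 0.0)
    traj.append(x)
traj = np.array(traj)
```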

K. BACKWARD REACHABILITY
For the double integrator system and NN described in §VIII-A, Fig. 12 shows one-step backprojection sets for a given target set, X_T. Recall that this NN is not invertible, both because it uses ReLU activations and because some weight matrix dimensions prevent inversion. The estimated backprojection set of each cell of the partition is computed and visualized as a red polytope; thus, the estimated backprojection set of X_T is the union of all red polytopes. In practice, if X_T represents a goal set, one could check that the system's starting state is inside one of these polytopes before "pressing go" on the system. To demonstrate that points inside the backprojection set will move the system to the target set under the NN control policy, we uniformly sampled points in each polytope of the estimated backprojection set and simulated the system forward one step (using the NN control policy). In Fig. 12, the blue points represent these samples at timestep T−1, and the orange points are the resulting system states at timestep T. All of the orange points lie within the green rectangle of X_T, which suggests the backprojection sets were computed properly.

Fig. 13: Reachable sets for the Duffing oscillator. Starting from X_0 (gray rectangle, blue samples), the Reach-LP extension from §VI-B is used to estimate reachable sets for 6 timesteps into the future, despite the nonlinear (polynomial) terms in the dynamics.
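The sampling check described above can be sketched with a linear stand-in policy and a mock relaxation band of width ε around it; states are kept only if the entire band maps into the target box, so forward simulation with the true policy must land inside. All numbers here are illustrative:

```python
import numpy as np

# Double integrator with a stand-in linear policy u = K x (hypothetical gains,
# not the trained NN from the paper).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
K = np.array([[-0.5, -1.0]])
Acl = A + B @ K                                   # closed-loop one-step map
tgt_lo, tgt_hi = np.array([-0.5, -0.25]), np.array([0.5, 0.25])

# Mock relaxation: the "NN" output is only known to lie within +/- eps of K x,
# so a state is kept only if the *entire* control band lands in the target box
# (making the estimate an under-approximation, as in the paper).
eps = 0.05
slack = (B * eps).ravel()

rng = np.random.default_rng(1)
cand = rng.uniform(-3.0, 3.0, size=(20000, 2))
pred = cand @ Acl.T
inside = np.all((pred + slack <= tgt_hi) & (pred - slack >= tgt_lo), axis=1)
samples = cand[inside]

# Simulate one step with the true policy: every kept sample must reach the target.
nxt = samples @ Acl.T
ok = bool(np.all((nxt >= tgt_lo) & (nxt <= tgt_hi)))
```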
Note that in (a), the under-approximation of the backprojection set has some clear gaps. There are a few reasons for this, but the key issue is that A = ∅ (from Algorithm 4) for some of the cells of the X_T partition. What causes A = ∅? If X_T is large, the backreachable set could be as well. Thus, even a partitioned X_T could lead to a large backreachable set partition, which is used as the NN input set for CROWN. If the input set to CROWN is large, the NN relaxations can be relatively loose. Loose bounds on the NN restrict the size of P̄, since P̄ only includes states for which the entire range of relaxed control inputs will lead to X_T (see bottom arrow of Fig. 6). To get around this issue, (b)-(d) repeat the analysis with finer partitions, which substantially improves the coverage of samples across X_T. Future work could investigate better methods for partitioning X_T (e.g., using simulation as guidance [20], [37]).

IX. FUTURE DIRECTIONS
Many open research directions remain in analyzing NFLs. We first describe several natural extensions of this work. One existing challenge is mitigating the conservatism due to the accumulation of approximation error over many timesteps. This could be addressed, for example, by replacing the recursive calculations with a method that computes n-step reachable sets at once. In addition, capturing other types of uncertainties and nonlinearities (e.g., uncertainty in the A_t and B_t matrices) will expand the set of neural feedback loops that can be analyzed. Continuous-time systems, which likely have hybrid dynamics due to the NN control policy, present additional opportunities for expanding the framework beyond the Euler integration approach described here.
More broadly, an open challenge is in synthesizing provably robust control policies. While this work focused on analyzing NFLs with a pre-trained NN, future work could explore modifications to the training process to enable faster or tighter online analysis. Moreover, bringing similar ideas to stability analysis could provide additional notions of robustness beyond reachability.
Finally, the adoption of these analysis methods on real safety-critical systems will require realistic measurement/perception models, as many modern systems use high-dimensional sensors (e.g., camera, lidar), which are often fed directly into perception NNs (e.g., for object detection or segmentation). The effects of uncertainty propagating through these modular pipelines present new challenges that must be addressed before such systems are ready to be deployed in safety-critical settings, such as robots operating alongside humans.

X. CONCLUSION
This paper proposed a convex relaxation-based algorithm for computing forward reachable sets and backprojection sets of NFLs, i.e., closed-loop systems with NN controllers. Prior work is limited to shallow NNs or is computationally intensive, which limits applicability to real systems. Furthermore, our method accounts for measurement (sensor) noise and process noise, as well as nonlinearities in the dynamics. The results show that this work advances the state-of-the-art in guaranteeing properties of systems that employ NNs in the feedback loop.

GOLNAZ HABIBI is currently a Research Scientist with the Department of Aeronautics and Astronautics at MIT. Golnaz received her B.Sc. in electrical and control engineering from K. N. Toosi University of Technology, Iran, in 2005, her M.Sc. in control engineering from Tarbiat Modares University, Iran, in 2007, and her Ph.D. in computer science from Rice University in 2015. Golnaz is broadly interested in robotics, control systems, machine learning, and multi-agent systems. Her current research focuses on visual navigation, reliable communication, and improving the safety and reliability of autonomous agents. Her paper was nominated for the best student paper award at DARS 2012, and she received the K2I award from Chevron Co. in 2013.

He is the Director of the Ford-MIT Alliance and was a member of the USAF Scientific Advisory Board (SAB) from 2014-17. His research focuses on robust planning and learning under uncertainty with an emphasis on multiagent systems, and he was the planning and control lead for the MIT DARPA Urban Challenge team. His work has been recognized with multiple awards, including the 2020 AIAA Intelligent Systems Award. He is a Fellow of IEEE and AIAA.