Verifying Controllers with Convolutional Neural Network-based Perception: A Case for Intelligible, Safe, and Precise Abstractions

Convolutional Neural Networks (CNNs) for object detection, lane detection, and segmentation now sit at the head of most autonomy pipelines, and yet their safety analysis remains an important challenge. Formal analysis of perception models is fundamentally difficult because their correctness is hard, if not impossible, to specify. We present a technique for inferring intelligible and safe abstractions for perception models from system-level safety requirements, data, and program analysis of the modules that are downstream from perception. The technique can help trade off safety, size, and precision in creating abstractions and in the subsequent verification. We apply the method to two significant case studies based on high-fidelity simulations: (a) a vision-based lane keeping controller for an autonomous vehicle and (b) a controller for an agricultural robot. We show how the generated abstractions can be composed with the downstream modules and how the resulting abstract system can then be verified using program analysis tools like CBMC. Detailed evaluations of the impact of size, safety requirements, and environmental parameters (e.g., lighting, road surface, plant type) on the precision of the generated abstractions suggest that the approach can help guide the search for corner cases and safe operating envelopes.

Neural network-based perception is now central for decision and control in autonomous systems. Not coincidentally, research on formally verifying isolated neural networks (NNs) has received a degree of attention [6,15,26,30,46,51]. System-level safety assurance brings a somewhat different set of opportunities and challenges and has been addressed in a small number of recent efforts [12,13,31].
Even though writing formal requirements for CNN perception models may be ill-posed [48], safety requirements for autonomous systems using those very models are usually fairly obvious. A "lane" may be difficult to specify in terms of pixel intensity thresholds, but the safety requirements of a lane keeping control system are less mysterious. It is well-known that CNNs have fragile decision boundaries and are susceptible to adversarial inputs [50]. Since existing NN verification tools verify properties in a small neighborhood of the NN's input space, the presence of adversarial inputs makes NN verification results conservative. System-level safety analysis, on the other hand, has to deal with the temporal evolution of the whole system (including the NNs), and therefore can benefit from the smoothness of the physical world and be more robust and precise by overlooking occasional misclassifications. Such robustness has indeed been empirically observed [36].
In this paper, we propose a safety assurance technique for control systems that use CNN models for perception. Since completely formal specification and verification of such CNN models is hard, if not impossible, our method creates abstractions of the CNN-based perception subsystem. An abstraction (or over-approximation) of a CNN perception subsystem is obviously useful for safety analysis: if the abstract system obtained by replacing the concrete perception subsystem with its abstraction can be verified safe, then we can infer safety of the original concrete system. However, there are two barriers to this approach.
First, the problem of verifying the CNN is now shifted to the problem of proving that the constructed function is indeed an abstraction of the CNN-based perception subsystem. For the same reasons mentioned above, we will not attempt to solve this problem formally. Instead, we aim to provide statistical evidence about the precision of the abstraction.
Second, the abstraction should not only prove safety of the system; it should also be intelligible. Safety assurances should come not only from tests, proofs, processes, and verification artifacts showing that the system is correct, but also from explications of why it is so [1,5]. We agree with this sentiment and will aim to create abstractions that are intelligible to human designers, testers, and auditors.
These three axes, safety, intelligibility, and precision, define a space for exploring different safety assurance methodologies for autonomous systems. In this paper, we present a particular method that constructs property-guided, piece-wise affine abstractions. To our knowledge, this is the first abstraction-based approach to verify control systems that use CNNs for perception. We use a piece-wise affine template for the abstraction: suppose the ground truth perception input to the control system is m*(x, e), for a given state x and a set of environmental parameters e. These parameters could include lighting conditions, road surface, weather conditions, etc. The abstraction at (x, e) will then be a set, whose center (mean or bias) is a piece-wise affine function A m*(x, e) + b of the ground truth m*(x, e). We need not know the ground truth function m* or its precise dependence on the environmental parameters; we can infer the linear model using regression on data generated by running the CNN model with different x and e inputs. In the case of synthetic data generated using a simulator, as we do in our experiments, we can also label the (x, e) data with the corresponding ground truth value. This improves the precision of the abstraction by indirectly reducing the error with respect to the quantity that the CNN-based perception subsystem is trained to estimate, namely, the ground truth.
While the center (mean) of the output set is defined by training data, the size and shape of the set (variance) is inferred from safety. Assume that the control system is safe with respect to a given unsafe set when it uses perfect perception. Using program analysis tools like CBMC [11] and IKOS [9] on the code for the controller, we infer the set of unsafe perception outputs for any x. Then, the set-valued output of the abstraction at (x, e) is determined to be the largest set, centered at A m*(x, e) + b, that keeps the system safe. The computation of this largest set is an optimization problem.
Our method produces intelligible abstractions. The resulting output abstraction is a piece-wise affine set-valued function of the actual variable that the perception system is trying to estimate. The abstraction is guaranteed to be safe relative to the given property. We check this using CBMC by plugging the abstraction into the downstream modules of the control system. This safety-first approach can be precision-challenged under some conditions. In our experiments with two end-to-end autonomous systems, vision-based lane keeping for an electric vehicle and a vision-based corn row scouting robot, we evaluated the abstractions over large variations of environments, such as roads with varying numbers of lanes, lighting conditions, and different types of crops and fields, generated with the high-fidelity simulator Gazebo. We observe that in certain parts of the state space, the computed safe abstractions are not able to match the original perception subsystem very accurately, due to the strong inductive invariant that we used as the requirement. Under other conditions, the precision of the abstraction can exceed a 90% match (explained in detail in Section 4.5) with finer abstractions, narrower environmental variations, and alternative inductive requirements.
We believe that the trifecta of safety, intelligibility, and precision provides a useful framework for constructing abstractions and verifying autonomous systems. In addition to providing assurances for safety requirements, the notion of precision of abstractions developed here can shed light on the parts of the state space and environment where the CNN-based perception system is fragile and likely to violate requirements. These quantitative insights can inform the design of perception and control subsystems, as well as the definition of system-level operating design domains (ODDs) [33].
In summary, the contributions of this work are:
• Formalization of intelligible, safe, and precise abstractions for CNN-based perception as piece-wise affine set-valued functions.
• An approach to find such abstractions through a combination of linear regression and constrained optimization.
• Demonstration that the abstractions can be composed with the downstream modules and verified using existing tools.
• Quantitative evaluation of the precision of the abstractions with respect to the original CNN-based perception subsystem, and an interpretation of its dependence on various factors.

RELATED WORK
XAI. The explainable AI (XAI) and interpretable machine learning (IML) research areas have grown explosively over the past five years. Figure 1 of [38] suggests that in 2020 there were 400+ publications related to interpretability alone. The survey articles [1,8,38] provide a systematic overview of the terminology and the available techniques for different types of AI models for text, images, and tables. Some of the prominent techniques rely on the notions of feature importance [3], Shapley values [27], and counterfactual explanations [34]. Our piece-wise affine abstractions for CNNs are a natural interpretable model, but we have not seen this form used in the literature so far.
Using the terminology of the XAI literature, our method provides model-agnostic, global interpretations of image-based AI models. The notion of interpretability itself has differing definitions in the literature. Our interpretations help with transparency, or equivalently intelligibility, in that they help a human understand the functioning of the black-box CNN. Our interpretations and methods are model-agnostic because they do not rely on the internal structure or workings of the CNN models; instead, they only require input-output data. Also, our interpretations are global in the sense that they cover the entire domain.
Analysis of closed loop systems with NNs. The closest related works are VerifAI [13] and a recent work [31]. VerifAI [13] and related publications provide a comprehensive framework to analyze a closed loop system with ML-based perception components. They focus on falsification of the system specification, data augmentation, and redesigning the neural network. Their techniques include fuzz testing, simulation, counterexample-guided data augmentation, and synthesis of hyper-parameters and model parameters. Our work provides a safe abstraction, and therefore complements the falsification approaches of VerifAI.
Our work is similar in spirit to the white paper [42] and the work reported in [31], in that they propose using abstractions/contracts for perception components. In [31], the authors train generative adversarial networks (GANs) to produce a simpler neural network that abstracts away image sensing and image-based perception. The simpler network directly transforms states and environment parameters to estimates, similar to our abstraction. In comparison, our work provides an intelligible set-valued function instead of a simpler network.
In addition, [12] considers synthesizing robust perception-based controllers. Many recent works focus on verification [24,51], reachability analysis [16,17,19,23,26], statistical model checking [20], and synthesis [25] for neural feedback systems with neural network controllers. [28,29] specifically focus on developing and verifying a neural network replacement for the ACAS-Xu collision avoidance decision tables. Such controller NNs are typically very small compared to perception CNNs. Our work is the first to provide intelligible abstractions for CNN-based perception together with safety guarantees for the closed loop system.
Isolated neural network verification. [39] evaluates the safety assurance of ML-based perception and control in UAS via simulation and discusses the challenges of integrating existing neural network verification tools into system-level analysis. The authors point out that the neural network used for perception in small UAS is larger than any NN previously analyzed in the competition [6]. GAS [4] analyzes the impact of NN perception uncertainty on vehicles using an approximated perception model and Generalized Polynomial Chaos. Unlike our approach, GAS only estimates the probability that the vehicle will reach an unsafe state, and its perception model approximation does not incorporate system-level specifications.

SYSTEM-LEVEL SAFETY ASSURANCE
The problem of assuring safety of a control system can be stated as follows: given a control system or a program Sys on a state space X, we would like to check that it satisfies an invariant I ⊆ X. For example, for a lane keeping control system for a vehicle, the invariant requirement is that the vehicle always remains within the lanes. This seemingly textbook statement of the problem is complicated by two factors in an actual autonomous system. First, Sys uses deep neural networks (DNNs) for perception, converting pixels to percepts such as the deviation from the lane center, and such perception systems are not amenable to formal specification and verification. Second, the DNN's output depends on environmental factors such as lighting, texture, and pavement moisture. These dependencies are neither well-understood nor controllable.

System description
Before stating our safety analysis problem more precisely, we set up a mathematical representation of Sys. We view Sys as a discrete time transition system with four interconnected components transforming different types of data and ultimately defining the state transitions ( Figure 2).
The DNN is a deep neural network model for perception. It takes an image (or a high-dimensional vector) p as an input and produces a percept, a low-dimensional estimate vector z = h(p), as the output. In a lane tracking system, this percept z could be, for example, the position of the camera relative to the lanes seen in the image. That is, we model the DNN as a function h : P → Z mapping the space P of images to the space Z of percepts.
The control module is a program that takes a percept z as an input and produces a control action u = g(z) as the output. In a lane tracking system, the control action u is a vector of throttle, steering, and brake signals. The implementation of the control module may involve a number of sub-modules including navigation, planning, and optimization. For simplicity, we model the control as a function g : Z → U mapping the space of percepts to the space of control actions.
The dynamics defines the evolution of the system state x as a function of the previous state and the output from the control module. We model the dynamics as a function f : X × U → X. In our example, the state x of the vehicle includes its position, orientation, velocity, etc., and the dynamics function defines how the state changes with a given control action u ∈ U. In this paper, we consider discrete time models, and write the state at time t + 1 as x_{t+1} = f(x_t, u_t), where x_t and u_t are the state and the control action at time t. This state transition function could be generalized to a relation to accommodate uncertainty, without significantly affecting our framework or the results.
The final component closing the loop is the sensor, which defines the image p as a function of the current state x and a set of non-time-varying environmental parameters e. In our example, these parameters include lighting conditions, the nature of the road surface, the types of markings defining lanes, etc. We model the sensor as a function s : X × E → P, where E is the space of environmental parameter values. In a real system, we may not know all the environmental parameters, they may not be time-invariant, and their precise functional influence on the image will also be unknown. Therefore, it does not make sense to prove anything mathematically about s. For the purpose of generating abstractions of h • s, we reasonably assume that we can sample inputs of s according to some distribution over E and X. In our experiments, we generate synthetic data using a simulator; the same could also be done with the actual vehicle platform at a higher cost.

Assurances for closed-loop system
The behaviors of the overall system are modeled as sequences of states called executions. Given an initial state x_0 ∈ X and an environmental parameter value e ∈ E, an execution of the overall system from (x_0, e) is a sequence of states x_0, x_1, x_2, . . . such that for each index t in the sequence, x_{t+1} = f(x_t, g(h(s(x_t, e)))). In an ideal world, we would like to have methods that can assure that, given a range of environmental parameter values E_0 ⊆ E, an unsafe subset of X, and a set of initial conditions X_0 ⊆ X, none of the resulting executions of the system from X_0 can reach the unsafe set under any choice of e ∈ E_0. Such a method would be a useful tool for checking safety of autonomous systems. Also, by indirectly helping determine the set E_0 for which the system can be assured to be safe, such methods can serve as a scientific basis for specifying the operating design domain (ODD) [32] for the control system.
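The execution semantics above can be sketched as a simple rollout loop (illustrative Python; the five callables are placeholders for s, h, g, f, and the unsafe-set membership test, which are instantiated later in the paper):

```python
def execute(x0, e, sensor, dnn, controller, dynamics, unsafe, steps):
    """Roll out an execution x0, x1, ... of Sys and check safety.

    sensor    : s(x, e) -> image p
    dnn       : h(p)    -> percept z
    controller: g(z)    -> control u
    dynamics  : f(x, u) -> next state
    unsafe    : predicate on states (True iff the state is unsafe)
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        p = sensor(x, e)
        z = dnn(p)
        u = controller(z)
        x = dynamics(x, u)
        if unsafe(x):
            return xs + [x], False  # execution reached the unsafe set
        xs.append(x)
    return xs, True
```

A toy 1-D instantiation (identity sensing, proportional control) suffices to exercise the loop; real instantiations plug in the vision pipeline and vehicle model.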
Since the functions s and h are partially unknown, with unknown dependence on e and x, it is unreasonable to look for the above type of method. Instead, in this paper, we develop a method for the following weaker problem: Problem. Given an unsafe set in X and a range of environmental parameters E_0 ⊆ E, find an abstraction of the perception system h • s such that it is: (a) Safe, that is, using it in the closed loop system in place of h • s makes the resulting system safe with respect to the unsafe set. (b) Intelligible, i.e., human designers can understand the behavior of the abstraction. (c) Precise, that is, the abstraction and h • s are close.
The goal of this paper is to explore strategies for creating such abstractions. Note that the construction of the abstraction will rely on both the knowledge of the unsafe set and the range of environmental parameters under consideration.
For the substitution to make sense, the abstraction must have the same type as h • s; however, to allow it higher precision across different environments, we make it a set-valued function from X × E to subsets of the percept space. Since the actual perception system h • s and its dependence on the environment are incompletely understood, any assertion about its closeness to the abstraction has to be empirically evaluated. There are many options for measuring closeness that can factor in information about the environmental parameters. We will see later that such fine-grained measurement of closeness is indeed possible. We will also discuss how such comparisons can guide both the process of data collection for more precise empirical evaluations and the inference of operating design domains for the system.

An example: Vision-based lane keeping
As an example, we consider a computer vision-based lane keeping control system as shown in Figures 1 and 2.
Dynamics and control. The vehicle state x ∈ X consists of the 2D position (x, y) of the center of the front axle in a global coordinate system, and the heading angle θ with respect to the x-axis. The input u ∈ U is the steering angle δ. The discrete time model for the above state vector is the well-known kinematic bicycle function [41] f(x, u): x_{t+1} = x_t + v cos(θ_t + δ_t) Δt, y_{t+1} = y_t + v sin(θ_t + δ_t) Δt, θ_{t+1} = θ_t + (v/L) sin(δ_t) Δt, where v is the forward velocity, L is the wheel base, and Δt is a time discretization parameter. We discuss the impact of different vehicle models on our methodology in Section 7. The input to the dynamics function f comes from the decision and control program.
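One common discretization of the front-axle kinematic bicycle model can be sketched as follows (the default values of v, L, and dt are illustrative, not the paper's):

```python
import math

def bicycle_step(x, y, theta, delta, v=1.0, L=2.5, dt=0.1):
    """One discrete step of the kinematic bicycle model, with the state
    anchored at the front-axle center. delta is the steering angle; v
    (forward velocity), L (wheel base), and dt are assumed parameters."""
    x_new = x + v * math.cos(theta + delta) * dt
    y_new = y + v * math.sin(theta + delta) * dt
    theta_new = theta + (v / L) * math.sin(delta) * dt
    return x_new, y_new, theta_new
```

With zero steering and zero heading, the model simply advances the vehicle along the x-axis, which is a quick sanity check on the sign conventions.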
Here we use the standard Stanley controller [22] for lateral control of vehicles. This controller uses the percept z ∈ Z, which consists of the heading difference ψ and the cross-track distance d from the center of the lane to the ego vehicle. In Figure 3 the heading θ coincides with the negation of the heading difference −ψ, but this happens only in the special case where the lane is aligned with the x-axis.
The controller function g(z) is defined as: g(z) = clamp(ψ + arctan(k d / v), −δ_max, δ_max), where δ_max is the steering angle limit and k is a controller gain parameter.
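The Stanley control law can be sketched as below (sign conventions and the default values of k, v, and the steering limit are assumptions; the paper leaves them symbolic):

```python
import math

def stanley_control(d, psi, k=1.0, v=1.0, delta_max=0.61):
    """Stanley lateral control sketch. d: cross-track distance,
    psi: heading difference. atan2 is used instead of atan(k*d/v)
    so that v -> 0 does not cause a division by zero."""
    delta = psi + math.atan2(k * d, v)
    return max(-delta_max, min(delta_max, delta))
```

Zero error yields zero steering, and a large cross-track error saturates at the steering limit.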
Perception. Now we describe the complex perceptual part of the system that estimates the heading difference and cross-track distance using computer vision. First, the sensor function s uses cameras to capture an image, and processes the image through a sequence of computer vision algorithms, including cropping the region of interest, undistortion, morphological transformations, resizing, etc., to prepare the input image p for the DNN. The particular DNN used here is LaneNet [40], which uses 512 × 256 RGB images to detect lane pixels. Internally, LaneNet contains two sub-nets for the identification and instance segmentation of lane marking pixels; a curve fitting algorithm is then applied to the identified lane marking pixels to represent each detected lane as a polynomial function. Further, a perspective warp is applied to map the polynomial function to a bird's eye view, which gives the final percept z = (d, ψ): the relative position of the vehicle to the lane center (d) and the heading difference (ψ), as shown in Figure 2.
System safety requirement. A common specification for lane keeping control is to avoid going out of the lane boundaries. We assume that the vehicle is driving on a straight road with lane width W. To simplify the exposition, we assume that the center line is aligned with the x-axis of the global coordinate system. Thus, the unsafe set can be specified as the set of states with |y| > W/2.
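The unsafe-set membership test is then a one-liner (the lane width value below is illustrative; the paper leaves W symbolic):

```python
def in_unsafe_set(y, lane_width=3.7):
    """Unsafe iff the front-axle center leaves the lane: |y| > W/2.
    Assumes a straight road whose center line is the x-axis."""
    return abs(y) > lane_width / 2.0
```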

SAFE INTELLIGIBLE ABSTRACTIONS
In this section we discuss our method for constructing the abstraction for the perception system h • s. The abstraction will be a piece-wise affine set-valued function of the ground truth value that h • s is supposed to estimate. Section 4.1 sets the stage by showing how a set-valued abstraction of h • s defines an abstraction of the original control system Sys, and therefore can be useful for verification. Section 4.2 presents the main algorithm that learns, from perception data, the center (mean) of the abstraction's output set at (x, e). Section 4.3 defines the next step in the construction: it analyzes the control program to optimize the shape and the size of the output set around the mean, so as to assure the safety of the abstract system with respect to the unsafe set. Section 4.4 establishes the safety of the constructed abstraction, not only at the theoretical model level; it also shows how the abstraction can be plugged into the rest of the Sys code and verified using program analysis tools, namely CBMC [11] in our work. Finally, Section 4.5 discusses our methods for empirically evaluating the precision of the abstraction.

Abstract perception in closed-loop
We will construct a set-valued perception function that abstracts the complex perception system h • s and meets the three requirements of safety, intelligibility, and precision. For the safety requirement, our constructed function, written M : X × E → 2^Z, should be such that when it is "substituted" into the closed loop system of Equation (1), the resulting system is safe with respect to the requirement. Formally, substituting h • s(x, e) with M(x, e), the result is the non-deterministic system Sys(M) given by: x_{t+1} ∈ {f(x_t, g(z)) | z ∈ M(x_t, e)}. That is, when the actual system state is x_t (and the environmental parameters are e), the output from the abstract perception function can be anything in the set M(x_t, e). This set-valued approach is a standard way of modeling noisy sensors.
This definition ensures that the abstraction's output set covers all possible percepts from h • s(x, e) for all states and environments. Writing M for such an abstraction of h • s, it follows that Sys(M) is an abstraction of Sys, that is, the set of executions of Sys(M) contains the executions of Sys. Therefore, any state invariant I ⊆ X for Sys(M) carries over as an invariant of Sys. Fixing an arbitrary initial state x_0 and an environment e, this follows immediately from Equation (3) by deriving: f(x, g(h • s(x, e))) ∈ {f(x, g(z)) | z ∈ M(x, e)}. Definition 4.1 is too general to be useful for constructing safe, intelligible, and precise abstractions. At one extreme, it allows the definition M(x, e) := {h • s(x, e)}, which is exactly the same as the original perception system but helps with neither intelligibility nor safety. At the other extreme, we can make M(x, e) all of the percept space, which may be intelligible but is not useful for safety. Our approach is to utilize available information about the safety of the control system without perception. Informally, consider a version of the closed loop control system that uses the ground truth values of the percepts, instead of relying on cameras and DNNs to estimate these values. To prove safety of this ideal system, we can use standard invariant assertions [37, 43-45, 47]. We will construct the abstraction for Sys in a way that utilizes the knowledge of such invariants. Finding an invariant-preserving abstraction satisfying Definition 4.3 will guide us towards creating more practical abstractions of the perception system.

Learning piece-wise abstractions from data
For the abstract perception function to be intelligible, for any state x ∈ X and environment e, the output set should be related to the ground truth value z* that the perception system is supposed to estimate. For example, for a given state x = (x, y, θ) of the vehicle in the lane keeping system, and a given configuration of the lanes defined by e, the ground truth z* = (d*, ψ*), consisting of the relative position to the lane center (d*) and the angle with the lane orientation (ψ*), is uniquely determined by the geometry of the vehicle, the camera, and the lanes. The perception system h • s is designed to capture this functional relationship between x and z (and it is affected by the environment e). For the sake of this discussion, let m*(x) = z* be the idealized function that gives the ground truth percept z* for any state x. We may not know m* and may only have access to it as a black-box function. Nevertheless, a well-trained and well-designed perception system h • s should minimize the error ||m*(x) − h • s(x, e)|| over relevant states and environmental conditions. Since the abstraction over-approximates h • s, to achieve precision, it should also minimize the error with respect to m*(x).
In this paper, we consider a piece-wise affine structure for the abstraction. This is an expressive class of functions with conceptual and representational simplicity, and hence it is human readable and comprehensible. First, given a partition {X_i}_{i=1..k} of the target invariant domain, i.e., I = ∪_{i=1..k} X_i, we define the abstraction piece-wise: on each X_i, its output is R_i(m*(x)), where we search for set-valued functions R_i, from percepts to sets of percepts, that return a neighborhood around m*(x).
In what follows, we show how the R_i's can be derived as linear functions of m*(x) that are both safe with respect to the target invariant I and minimize the error with respect to the training data samples available from the perception system. ComputeAbstraction gives our algorithm for computing this abstraction for each partition X_i. Algorithm 1: Construction of the abstraction for partition X_i. The output set is represented by a center defined by a transformation matrix A_i and a vector b_i, and a ball around the center with radius ρ_i.
To find a candidate R_i for a given subset X_i ⊆ X, we consider that, when given z* as input, R_i returns a parameterized ball defined as follows: R_i(z*) = {z | ||z − (A_i z* + b_i)||_2 ≤ ρ_i}, where the parameters A_i and b_i define an affine transformation from z* to the ball's center, and ρ_i defines the radius. Here we are using a ball defined by the ℓ2 norm on the percept space. Our approach generalizes to other norms and linear coordinate transformations. We discuss other norms and their effects in Section 7.
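For concreteness, the parameterized ball can be realized as a membership predicate (a sketch; `r_i` is a name introduced here, and its arguments mirror the partition parameters A_i, b_i, ρ_i):

```python
import numpy as np

def r_i(z_star, A, b, rho):
    """R_i(z*): returns a membership test for the l2 ball of radius rho
    around the affine center A z* + b."""
    center = A @ np.asarray(z_star) + b
    return lambda z: np.linalg.norm(np.asarray(z) - center) <= rho
```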
We start with the input to ComputeAbstraction in Algorithm 1. Besides the subset X_i ⊆ X, the invariant I, and the aforementioned modules f, g, and m*, ComputeAbstraction also requires a training set of pairs (z*, z), where z* = m*(x) is the ground truth and z = h • s(x, e) is the percept obtained with the vision and DNN-based perception. These pairs can be obtained from existing labeled data for training DNNs: a labeled data point for the DNN h is already an image p = s(x, e) sampled from X and E, together with its labeled ground truth z* = m*(x). In practice, the state x = (x, y, θ) can be obtained from other, more accurate sensors such as GPS to label the images. We filter the states x by the subset X_i and obtain the ground truth z* = m*(x). We then simply collect the perceived z = h(p) by applying the DNN to the image p.
ComputeAbstraction first uses the training set of pairs (z*, z) to learn A_i and b_i using multivariate linear regression. The next section describes how it infers a safe radius ρ_i around the center A_i m*(x) + b_i by solving a constrained optimization problem.
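The regression step can be sketched with ordinary least squares (a hedged stand-in for the algorithm's multivariate linear regression; `fit_affine_center` is a name we introduce here):

```python
import numpy as np

def fit_affine_center(Z_star, Z):
    """Fit z ~= A z* + b by least squares over training pairs.
    Z_star, Z: (n, d) arrays of ground-truth and perceived percepts."""
    n, d = Z_star.shape
    X = np.hstack([Z_star, np.ones((n, 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)  # (d+1, d) solution
    A, b = W[:d].T, W[d]
    return A, b
```

On exactly affine data, least squares recovers A and b, which is a useful check before feeding noisy DNN outputs.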

Making abstractions safe and precise
The A_i and b_i computed by multivariate linear regression in Line 2 of ComputeAbstraction implicitly define the center A_i m*(x) + b_i that minimizes the distance to the training data in X_i. Next, we would like to infer a safe radius ρ_i around this center. There is a tension between safety and precision in the choice of ρ_i. On one hand, we want a larger radius to cover more samples; on the other hand, the ball must avoid every percept that can drive the system out of the invariant. Formally, the set of unsafe percepts is {z | ∃x ∈ X_i. f(x, g(z)) ∉ I}, and the safe neighborhood should be disjoint from it. Figure 4 illustrates such a safe neighborhood for one particular state x. Note that R_i has to cover all states x ∈ X_i, and hence we need a radius that is safe for every x ∈ X_i, i.e., bounded by the minimum over x. At the same time, we would like ρ_i to be as large as possible to cover more perceived values. Further, as Figure 5 shows, we have to infer a ρ_i for every X_i in the partition.
Our solution is to find a ρ_i just below the minimum distance ρ* from the center A_i m*(x) + b_i to the set of unsafe percepts. This is formalized as the constrained optimization problem below: minimize ||z − (A_i m*(x) + b_i)||_2 over x, x', and z, subject to x ∈ X_i, x' ∉ I, and x' = f(x, g(z)). Observe that x ∈ X_i is a set of simple bounds on each state variable, by design of the partition, and x' ∉ I is simply the negated invariant predicate over state variables. However, the third constraint x' = f(x, g(z)) encodes the controller g and the dynamics f as optimization constraints. Encoding the dynamics model f as optimization constraints is a common technique in Model Predictive Control. Encoding the controller g can be achieved with a program analysis tool that converts each if-branch of the control laws into equality constraints between z and the controller output u = g(z). An example template for the Gurobi solver [21] is shown in MinDist.
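As a coarse, sampling-based stand-in for this optimization (illustrative only; MinDist encodes the constraints exactly for a solver rather than gridding them), the minimum distance to the unsafe percepts can be estimated as follows:

```python
import math

def min_unsafe_distance(center, z_grid, x_grid, f, g, invariant):
    """Estimate rho* = min ||z - center|| over sampled percepts z for
    which some sampled state x in the partition leads outside the
    invariant in one step. Returns math.inf if no sampled percept is
    unsafe (the analogue of an INFEASIBLE solver status)."""
    best = math.inf
    for z in z_grid:
        unsafe = any(not invariant(f(x, g(z))) for x in x_grid)
        if unsafe:
            best = min(best, math.dist(z, center))
    return best
```

Such a grid search is only a sanity check on a solver encoding: it can overestimate ρ* between grid points, whereas the solver-based MinDist gives a bound with a known gap.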
We now argue that ComputeAbstraction computes a function R_i that returns a safe neighborhood for any ground truth percept m*(x).

Proposition 4.4.
For each x ∈ X_i, the set R_i(m*(x)) computed by ComputeAbstraction is disjoint from the unsafe percepts, i.e., ∀x ∈ X_i. R_i(m*(x)) ∩ {z | ∃x' ∈ X_i. f(x', g(z)) ∉ I} = ∅. Proof. We analyze the possible outcome statuses from the optimization solver and propagate each outcome through our functions. At Line 5, the solver may return the following statuses: (1) When status=OPTIMAL or status=SUBOPTIMAL, the solver returns a distance ρ̂ and a gap bound ε such that the true minimum ρ* is within the bound, i.e., ρ̂ ≥ ρ* and ρ̂ − ρ* < ε. Modern solvers all provide such a bound to address numerical error or sub-optimal solutions. Consequently, setting ρ_i = ρ̂ − ε at Line 7 ensures ρ_i < ρ*, and hence the ball with radius ρ_i is disjoint from the unsafe set. (2) When status=INFEASIBLE, the constraints are unsatisfiable, i.e., the unsafe set {z | ∃x ∈ X_i. f(x, g(z)) ∉ I} is proven to be empty. We let ρ_i = +∞, and thus R_i returns the whole space of percepts.

Verifying with abstraction: Theory & code
In this subsection, we summarize the claim that the abstraction computed by ComputeAbstraction indeed assures safety of the abstract system, and we show how it can be used for code-level verification. At the mathematical level, the safety of the abstraction follows essentially from its construction in ComputeAbstraction. Using Proposition 4.4, we can show that the constructed abstraction preserves the invariant I. Proof. Let us fix x ∈ X_i; the corresponding ground truth percept is m*(x), and R_i(m*(x)) represents all the percepts allowed by R_i. Using the R_i computed by ComputeAbstraction, we have shown in Proposition 4.4 that R_i(m*(x)) does not intersect any percept that can cause the next state f(x, g(z)) to leave I. We can rewrite this as: for each x ∈ X_i, any percept z ∈ R_i(m*(x)) preserves the invariant I, i.e., ∀x ∈ X_i. ∀z ∈ R_i(m*(x)). f(x, g(z)) ∈ I.
(4) Therefore, the invariant I is preserved for each subset X , The proof of Proposition 4.5 is then to expand Definition 4.3 with the function body of and extend the guarantee from Equation (4) to all x ∈ I simply because {X } =1... covers I. □ More importantly, the constructed abstraction can be plugged into the models of the system Sys, with different levels of detail, and verified using any number of powerful formal verification tools that have been developed over the past decades. For example, the abstract perception system could be plugged into the controller g and dynamics f functions represented by complex, explicit models, code, and differential equations, and we can verify the resulting system rigorously.
To illustrate this point, in this paper we showcase how to compose R with C code implementations of g and f and verify the resulting system with CBMC [11] to gain a high level of assurance for the control system. Recall that our set-valued abstraction is a ball of radius ε around the affine center: R(z*) = {z | ‖z − (A z* + b)‖ ≤ ε}. This can be directly translated into program contracts, that is, preconditions and postconditions, supported by numerous existing program analysis tools [7,11,18,35]. For instance, we can implement the abstraction as C code using CBMC's APIs, and then verify the whole system integrating the controller and the dynamics with CBMC. A notable detail is that, in C, an arctangent with a division in its argument is often implemented with atan2 instead, to handle zero denominators correctly; it then becomes unclear whether a proof for the mathematical model still holds at the code level. It is therefore crucial to use CBMC to automatically check that the safety requirements still hold.

Precision of abstraction
How close is the computed abstraction to the actual perception system s ∘ h? As we discussed earlier, it is difficult, if not impossible, to answer this question rigorously, because the perception system (and therefore the learned abstraction R) depends on the environment e in complex and unknown ways. A simple answer is also unlikely to be satisfactory: we might care more about precision in certain parts of the state space and under certain environmental conditions than in others, and arguing about which environmental conditions are more likely to arise in the real world can be complicated.
We propose a simple and fine-grained empirical measure of precision. We fix a range of environmental parameter values E. For each partition X_i, we collect a testing set of pairs (z*, z) by sampling across X_i × E using some distribution D, where, as before, z = h ∘ s(x, e) is the actual perception output and z* = m*(x, e) is the ground truth. We call a pair (z*, z) that satisfies z ∈ R(z*) a positive pair. The fraction of positive pairs then gives us the empirical probability, with respect to D, that the actual perception system (with the DNN) outputs percepts that are proved safe in Sys(R) with respect to the invariant. It may be tempting to interpret this as a probability of system-level safety, but without additional information about how D relates to the actual distributions over X and E, we cannot draw such conclusions.
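As a concrete sketch of this measure (the helper names are ours, not from the paper's artifact), a pair is positive when the observed percept z lands inside the ε-ball around the affine center A z* + b, and precision is the positive fraction:

```python
import math

def in_ball(z, center, eps):
    """Check z ∈ R(z*): Euclidean distance to the center is at most eps."""
    return math.dist(z, center) <= eps

def precision(pairs, A, b, eps):
    """Fraction of (z_star, z) pairs with z inside R(z_star).

    The center of R(z_star) is the affine map A @ z_star + b,
    written out by hand here for a 2-D percept.
    """
    positive = 0
    for z_star, z in pairs:
        center = [
            A[0][0] * z_star[0] + A[0][1] * z_star[1] + b[0],
            A[1][0] * z_star[0] + A[1][1] * z_star[1] + b[1],
        ]
        if in_ball(z, center, eps):
            positive += 1
    return positive / len(pairs)
```

Averaging this score per cell X_i is what produces one entry of the precision heatmaps discussed below.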
In the experiments discussed in the following sections, we take D to be the uniform distribution. Each of the heatmaps in Figure 6 illustrates the precision of different safe abstractions over X. A darker green cell X_i means that a higher fraction of outputs from the perception system matches the provably safe abstraction R.

CASE STUDY 1: VISION-BASED LANE KEEPING WITH LANENET
Recalling our motivating example from Section 3.3, we study the Polaris GEM e2 Electric Vehicle and its high-fidelity Gazebo simulation [14]. The perception module uses LaneNet [40] for lane detection. In this section, we first discuss the construction of the safe abstraction in Section 5.1. In Section 5.2, we discuss the interpretation of the precision heatmaps. We aim to study the impact of the following three factors on the precision of abstractions: RQ 1. Selection of partitions {X_i}. RQ 2. Environment parameter distributions D. RQ 3. Abstractions for different invariant requirements I.

Next, we discuss the invariant I we will use to prove safety. A standard induction-based proof for control systems defines an error function (a Lyapunov function) over the perceived values and then proves by induction that the error is non-increasing. Formally, a tracking error function is e : Z → R≥0, with e(0_z) = 0 and e(z) > 0 when z ≠ 0_z, where 0_z ∈ Z is the equilibrium percept. The different error functions used in this paper are shown in Table 1.
Take e_1 in Table 1 as an example: the ideal perceptual output of a state (x, d, θ) is (d*, θ*) = m*(x, d, θ), and the tracking error is e_1(d*, θ*). The function m* may in general be complicated and depend on the geometry of the lanes. The next state according to Equation (1) is (x′, d′, θ′) = f((x, d, θ), g(m*(x, d, θ))), and the next percept is (d*′, θ*′) = m*(x′, d′, θ′). We then define the invariant of non-increasing error I_1 ⊆ X as the set of states where e_1(d*′, θ*′) ≤ e_1(d*, θ*). We give detailed descriptions and the values of the constant symbols used in f and g, as well as the perfect estimation m*, in Appendix A.
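To make the invariant check concrete, here is a small sketch; the exact form of e_1 is given in Table 1, so the weighted absolute error below is only a hypothetical stand-in:

```python
def e1(d, theta, k=0.5):
    """Hypothetical tracking error over a percept (d, theta);
    zero only at the equilibrium (0, 0). The true e_1 is defined
    in Table 1 of the paper."""
    return abs(d) + k * abs(theta)

def invariant_holds(percept, next_percept):
    """I_1: the tracking error must not increase in one step."""
    return e1(*next_percept) <= e1(*percept)
```

Checking this predicate on the ground-truth percepts of a state and its successor is exactly the membership test for I_1.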
To infer the safe abstraction R, we cover the invariant with partitions {X_i}, with the deviation d within ±1.2 meters and the heading angle θ within ±15°. Further, we consider three different partition sizes, N ∈ {8×5, 8×10, 8×20}; larger numbers partition more finely and produce refinements of the coarser abstractions. We do not partition along x, (a) for better visualization and (b) because the lanes are aligned with the x-axis, so partitioning the x-axis does not produce interesting results.
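For concreteness, a uniform rectangular partition over the (d, θ) ranges stated above can be generated as follows (a sketch with our own helper name):

```python
def uniform_partition(d_cells, theta_cells, d_max=1.2, theta_max=15.0):
    """Split [-d_max, d_max] x [-theta_max, theta_max] into a
    d_cells x theta_cells grid of rectangles (the subsets X_i)."""
    dw = 2 * d_max / d_cells
    tw = 2 * theta_max / theta_cells
    cells = []
    for i in range(d_cells):
        for j in range(theta_cells):
            d_lo = -d_max + i * dw
            t_lo = -theta_max + j * tw
            cells.append(((d_lo, d_lo + dw), (t_lo, t_lo + tw)))
    return cells
```

With the 8×10 partition, for example, each cell spans 0.3 meters in d and 3° in θ, matching the cell sizes reported below.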
To prepare the training data for learning A and b to construct R, we use the Gazebo simulator from [14] to generate camera images p labeled with their ground truth percepts z*. Each image is sampled from a uniform distribution D over the subspace X_i as well as the environment space E. The environment parameter space is defined by (i) three types of straight roads: two-lane, four-lane, and six-lane, and (ii) two different lighting conditions: day and dawn. The ground truth percept z* = m*(x) is then calculated using information from the simulator, and we ensure that at least 300 images are collected for each X_i to learn A and b for R.
For each partition X_i, given A and b learned from the data via multivariate linear regression, MinDist, implemented in Gurobi [21], solves a nonlinear optimization problem to find ε. Each X_i covers an interval of 0.3 meters in d and 3° in θ. As an example, we discuss the optimization problem for the particular subset X_i that covers d from 0.9 to 1.2 meters and θ from 12° to 15°, i.e., X_i = {(x, d, θ) | d ∈ [0.9, 1.2] ∧ θ ∈ [12°, 15°]}.
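MinDist encodes f and g as Gurobi constraints, which we do not reproduce here. Purely for intuition, the quantity it computes, the distance from the ball's center to the nearest unsafe percept, can be approximated by scanning a finite grid of candidate percepts (a naive stand-in, not the paper's encoding; the unsafe-percept predicate would itself be derived from f, g, and I):

```python
import math

def min_dist_sampled(center, is_unsafe_percept, z_grid):
    """Brute-force stand-in for MinDist: approximate the distance from
    the ball's center to the nearest unsafe percept by scanning a
    finite grid of candidate percepts. Returns math.inf when no
    sampled percept is unsafe (mirroring the INFEASIBLE case)."""
    dists = [math.dist(center, z) for z in z_grid if is_unsafe_percept(z)]
    return min(dists, default=math.inf)
```

Unlike the optimization encoding, this sampled estimate carries no soundness guarantee; it only illustrates what the solver is minimizing.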
All the computed abstractions were composed with the code for the controller g and the dynamics f and verified for the corresponding invariant with CBMC. Figure 6 shows the precision maps for three abstractions resulting from three increasingly finer partitions and two sets of testing environments. First, we discuss the broad trends and then delve into the details.

Interpretation of precision of abstractions
At equilibrium, the abstraction breaks, but it does not matter. All six heatmaps exhibit a common trend: a band of white (low score) cells going from the second to the fourth quadrant. These are areas where the safe radius ε of R is too small for R to be an abstraction of the perception system. This phenomenon can be informally understood as follows. The center (equilibrium) of the plot corresponds to near-zero error in deviation d and heading θ. As the tracking error of a vehicle state approaches 0, the percept must also approach the ground truth so that the next state keeps the error from increasing; consequently, the safe radius ε → 0. Recall that ε is minimized with respect to all states x ∈ X_i. We can view the overall system Sys(R) as a fixed-resolution quantized control system, and it is well known that such a system cannot achieve perfect asymptotic stability [10]: the feedback does not have enough resolution to drive the state to the equilibrium, but it can converge to some neighborhood around it. This explains why the error function cannot be non-increasing around the origin. In other words, here the abstraction is "failing to be safe" because of the control. Not being able to prove safety around the origin is less of a problem, however, because these are precisely the states where the vehicle is centered between the lanes and its heading is aligned.
Weak invariants can break safe abstraction. Second, along the diagonal line through the origin, we have states where the vehicle's deviation from the lane center and its heading are in opposing directions. By inspecting e_1 in Table 1, we see that d and θ have opposite signs at the equilibrium points of e_1(d, θ); hence the band of white cells goes from the second to the fourth quadrant. The tracking error cannot be non-increasing in these states in one step, as required by I. These regions of the precision map are white because, as mentioned, the safe radius ε is too small for R to be an abstraction of the perception system. In these regions, the abstraction is failing because the invariant I we are trying to prove is too weak to be proven in one step. A remedy is to come up with stronger inductive invariants for the system with perfect perception.
Finer partitions improve precision of safe abstractions. We observe from each row of heatmaps in Figure 6 that finer partitions generate more precise abstractions; in the finest partition, several cells achieve over 90 percent. The reasons are twofold. (1) With a finer partition, linear regression can better fit a smaller interval of the original nonlinear perception. (2) The safe radius ε is minimized over all states x ∈ X_i; if a smaller subset X′_i ⊂ X_i excludes the worst-case state, the radius can improve and cover a larger safe neighborhood.
Fewer environmental variations improve precision. For RQ 2, we observe each column of heatmaps in Figure 6. We generated two testing sets under different distributions over the environment space E: (1) the same uniform distribution used for the training set, and (2) a uniform distribution over the subspace with only the two-lane road. The colors become darker at the same cell locations, as expected: the variance in the values perceived by the DNN is reduced because of the fewer environmental variations, so the same radius ε can now cover more samples in the testing set.
Variations with different safety requirements. Finally, for RQ 3, we consider invariants that use the different tracking error functions e_2 and e_3 (listed in Table 1). e_2 considers only the lane deviation error d, and e_3 uses the vector norm as the error. Both can be used to prove avoidance of the same unsafe set. Figure 7 shows three heatmaps for each tracking error function, for the same three partitions, with the two-lane-road testing set. Comparing Figures 6 and 7, we see a white band now surrounding the line d = 0 for e_2 and a white spot around the origin for e_3. This validates our explanation that the abstraction breaks owing to the stringent requirement of non-increasing error.

CASE STUDY 2: CORN ROW FOLLOWING AGBOT
Our second case study is the visual navigation system of an under-canopy agricultural robot (AgBot), CropFollow, developed in [49]. The system is responsible for lateral control as the vehicle traverses the space between two rows of crops. As in our first case study, the system captures the image in front of the vehicle with a camera (Figure 8), applies a ResNet-18 CNN to the camera image to perceive the relative positions of the corn rows to the ego vehicle, and uses a modified Stanley controller to reduce the lateral deviation.
Here we give the detailed definition of each component. In CropFollow [49], the vehicle dynamics is approximated with a kinematic differential model of a skid-steering mobile robot. The state x consists of the 2D position x and y and the heading θ, and the input u is the desired angular velocity ω, evolving under the dynamics f(x, u). Likewise, the modified Stanley controller g takes a percept z ∈ Z composed of the heading difference ψ and the cross-track distance d to an imaginary center line between the two corn rows, and outputs the angular velocity ω.

For the farm robot, we wish to avoid two undesirable outcomes: (1) if |d| > 0.228 meters, the vehicle will hit the corn, and (2) if |ψ| > 30°, the neural network output becomes highly inaccurate and recovery may be impossible. We define the unsafe set accordingly. As before, we consider three different partition sizes, N ∈ {5×5, 10×10, 20×20}, to cover the invariant; the whole space ∪_i X_i covers ±0.228 meters in d and ±30° in ψ. We follow the same procedure to sample images and derive the safe neighborhood function R for each X_i. For this case study, the environment parameter space is defined by five different plant fields: three stages of corn (baby, small, and adult) and two stages of tobacco (early and late). We use the uniform distribution over the state space X and the five environment parameters for both the training and testing sets.

Figure 9 shows the precision heatmaps for the abstractions inferred with the three partitions. We observe almost identical broad trends compared to Figure 6, including the white band around the equilibrium, the white spots in the upper-right and lower-left corners close to the violation of the invariant, and higher precision scores with finer partitions. This case study reaffirms the validity of our interpretation of the precision heatmaps in Section 5. It also showcases that our analysis applies to a vastly different vision- and DNN-based perception system with a similar percept space.
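CropFollow's exact dynamics and modified Stanley law appear in [49]; as a hedged sketch, the code below uses standard unicycle kinematics and a generic Stanley-style law as stand-ins (the gains k and v, the time step, and the sign conventions are our assumptions, not the paper's):

```python
import math

def g(z, k=2.5, v=1.0):
    """Generic Stanley-style control law (a stand-in for CropFollow's
    modified controller): percept z = (psi, d) -> angular velocity.
    atan2 is used, as in the C implementation discussed earlier,
    to avoid dividing by a zero speed."""
    psi, d = z
    return -(psi + math.atan2(k * d, v))

def f(x, u, v=1.0, dt=0.1):
    """Unicycle / skid-steer kinematic step: state (px, py, theta),
    input u = desired angular velocity, forward speed v, step dt."""
    px, py, theta = x
    return (px + v * math.cos(theta) * dt,
            py + v * math.sin(theta) * dt,
            theta + u * dt)
```

At the equilibrium percept (0, 0) the commanded angular velocity is zero, and the vehicle simply advances along its heading.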

DISCUSSION AND FUTURE DIRECTIONS
Safety assurance of autonomous systems that use machine learning models for perception is an important challenge. We presented an approach for constructing abstractions for verifying control systems that use convolutional neural networks for perception. The approach learns piece-wise affine, set-valued abstractions of the perception system from data, and it maximizes the size of these sets to improve precision while assuring a given safety requirement. Viewing abstractions of perception systems along the triple axes of safety, intelligibility, and precision may be a productive perspective for tackling the problems of safety assurance of autonomous systems. We discuss some of the lessons learned and the future research directions they suggest.
Within the space of intelligible abstractions, we have explored one corner with piece-wise affine models. Needless to say, this was a somewhat arbitrary choice, and many other options should be explored, for example, decision trees, polynomial models, and space partitioning algorithms. Developing algorithms for computing such abstractions, as well as verifying the end-to-end abstract system Sys(R), would be interesting directions for future research.
Our piece-wise affine abstractions used uniform rectangular partitions. We observed that the size of the partitions has a significant impact on the precision of the safe abstractions. The results suggest that non-uniform or adaptive partitioning (e.g., finer partitions nearer the equilibrium) would yield more precise abstractions. Using domain knowledge and symmetries in creating the abstractions should substantially improve their precision and size.
As expected, the safety requirement (or invariant) guiding the construction of the abstraction significantly impacts its precision. The precision maps shed light on the parts of the state space and environment where the DNN-based perception system is most fragile and most likely to violate requirements. Such quantitative insights can inform design decisions for the perception system, the control system, and the definition of system-level operational design domains (ODDs).
Finally, we chose to use discrete-time models and used CBMC for verifying the closed-loop system with the abstraction. Extending the approach to continuous-time and hybrid models would be interesting and would require nontrivial extensions of existing verification tools.