Context-triggered Abstraction-based Control Design

We consider the problem of automatically synthesizing a hybrid controller for non-linear dynamical systems which ensures that the closed loop fulfills an arbitrary \emph{Linear Temporal Logic} (LTL) specification. Moreover, the specification may take into account logical context switches induced by an external environment or by the system itself. Finally, we want to avoid classical brute-force time- and space-discretization for scalability. We achieve these goals by a novel two-layer strategy synthesis approach, where the controller generated in the lower layer provides invariant sets and basins of attraction, which are exploited in an abstract way at the upper, logical layer. To achieve this, we provide new techniques for both the upper- and lower-level synthesis. Our new methodology allows us to leverage both the computing power of state-space control techniques and the intelligence of finite game solving for complex specifications, in a scalable way.


I. INTRODUCTION
The problem of synthesizing controllers for different classes of non-linear systems with respect to temporal logic specifications has received considerable attention in the last decades, especially in the context of cyber-physical systems (CPS) design. The goal of these methods is to allow for fully automated synthesis of feedback controllers which enforce temporal logic constraints, and hence to allow for a much larger spectrum of specifications than classical feedback controller synthesis techniques. In order to achieve this goal, techniques from the formal methods and the control communities need to be combined.
While there has been enormous progress towards this goal in the last decade, documented by various recent textbooks on this problem, e.g., [1], [2], [3], most of the existing approaches still tackle the overall problem mainly from either the control or the formal methods side. Thereby, the potential of techniques available in the respective other domain is not fully exploited, leading to unsatisfying solutions in settings where low-layer continuous control and high-layer logical decision making are tightly intertwined. Such problems occur, for example, in the control of autonomous robots deployed in warehouses [4], underwater inspection [5], [6], or rescue and evacuation scenarios [7], [8]. In these applications, the robots need to (a) directly compensate environment uncertainty during their movement (such as rough terrain or sensor/actuator noise), and (b) strategically react to any logical context change, e.g., a newly arriving package that needs to be re-located in the warehouse, a leak in an oil pipeline that needs to be fixed under water, or a door that got closed and needs to be re-opened to reach a target in a rescue scenario. These context changes are triggered by the external environment and can occur at any time. They must directly result in (high-level) strategic reactions of the robots that trigger new objectives of the (low-level) feedback control policy, which, on the other hand, must be able to correctly actuate non-trivial non-linear dynamical systems. Control problems with a similar required integration of logical decision making and low-layer feedback control occur, for example, in sustainable building management [9], smart energy grid operation [10], or safety-critical medical operations [11].
This paper presents a novel approach to such integrated control problems, which automatically computes a provably correct hybrid controller that seamlessly reacts to (high-layer) logical context switches. Therein, the main contribution of our work is twofold: the new game-solving formalism we present (i) provides a certified and reactive interface between the higher and the lower control layers via control Lyapunov functions, while (ii) dismissing grid-based discretization of both the input and the state spaces. In the same vein, our approach does not require discretization of time ab initio. Rather, it considers time implicitly at the high-level strategy design, and defers the actual discretization of time to the low-level controller design, in an opportunistic way. Thereby, it enhances scalability and avoids numerical problems due to small sampling time intervals.
Moreover, the full class of LTL specifications can be considered for a large class of non-linear continuous dynamics.

A. Motivating Example, Challenges and Contributions
Throughout this paper, we re-visit the following simple robot control example to outline the challenges and contributions of our new hybrid controller synthesis approach.
Example. We consider a simple moving robot r in a setting composed of two neighboring rooms, connected by a sliding door, as depicted in Fig. 1. There are three target sets: T_1, T_2 in the left room and T_3 in the right room. An external user (the environment) chooses, at each instant of time, a mode among M_i, i ∈ {1, 2, 3}, indicating the current desired target T_i for the robot. Moreover, the opening status of the door can be controlled by the robot: entering the target T_1 or T_3 opens the door (if it was previously closed), while entering the target T_2 closes it (if it was previously open). This can be expressed by an LTL formula. The goal is to design a feedback control policy that reacts to the external environment decisions M_i by moving to the chosen target T_i while adhering to additional safety constraints, i.e., not hitting the walls W (including the door if it is closed). This, too, can be expressed by an LTL formula. Summarizing formally, the overall specification for the robot is φ_A ⇒ φ_G, i.e., the robot needs to guarantee its goal φ_G while assuming that φ_A holds.
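As an illustration only (the paper's exact displayed formulas are not reproduced here and may differ), assumption/guarantee requirements of this kind are typically written in LTL along the following lines:

```latex
% Hedged sketch -- illustrative notation, not the paper's exact formulas.
% Door mechanics: entering T1 or T3 opens the door D, entering T2 closes it;
% the environment always selects some mode M_i.
\varphi_A = \square\bigl((T_1 \lor T_3) \Rightarrow \ocircle D\bigr)
  \land \square\bigl(T_2 \Rightarrow \ocircle \lnot D\bigr)
  \land \square\bigl(M_1 \lor M_2 \lor M_3\bigr)
% Goal: whenever mode M_i remains active, eventually reach T_i,
% while never hitting the walls W.
\varphi_G = \bigwedge_{i \in \{1,2,3\}}
  \square\bigl(\square M_i \Rightarrow \lozenge T_i\bigr)
  \land \square \lnot W
```

Here ◯, □, and ◊ denote the next, always, and eventually operators, respectively.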
Challenges. This example showcases three main challenges that are tackled by our new controller synthesis approach (see Section II-B for an introduction to linear temporal logic (LTL)). First, the environment can change the mode at any time. Considering a real application where targets might be far away from each other, we would like the robot to immediately adapt its motion towards the new target, and not only after "completing" the previously assigned task of reaching another target. We achieve this direct reactivity by autonomously switching the low-layer controller in reaction to a mode change. This, however, requires caution to avoid well-known instability problems in switched control settings.
Second, as the robot itself controls a part of the logical context (by being able to open and close the door), a hybrid controller cannot naively switch between low-layer controllers for different targets based on the active mode. If, for example, the desired target is set to T_3 and the robot is currently in the left room while the door is closed, the robot should automatically decide to first visit the target T_1 to open the door. Scaling this to applications (e.g., in warehouses) where many logical requirements interact requires a principled way to design a correct strategy for the robot to react to context changes such that a given formal specification, for instance φ_A ⇒ φ_G, is satisfied.
Third, it is important that the low-layer control design does not simply implement what should be done (i.e., which target should be reached) but also what should not be done. For example, if the robot is in the left room moving towards T_3 while the door is open, it must not pass over T_2, as this would close the door. In addition, the door can be both an obstacle and a target, depending on the current context.
To design a correct-by-construction hybrid controller tackling the last two challenges, one needs (i) a formally correct mechanism to translate strategic choices from the higher layer into feedback-control problems (with suitable guarantees) in the lower layer, and (ii) a way to incorporate all necessary information about the workspace and the low-layer closed-loop properties into the high-layer strategy synthesis problem.
Contribution. This paper achieves these two goals by a new game-solving formalism for high-layer strategy synthesis, which (i) computes strategy templates instead of single strategies and (ii) allows for progress group augmentations. We show that (i) strategy templates provide a certified top-down interface by allowing a direct translation into context-dependent reach-while-avoid (RWA) controller synthesis problems, which, in turn, can be certifiably solved via control Lyapunov functions. This leads to provably correct low-layer controllers implementing high-layer strategy choices. Further, we show that (ii) progress group augmentations provide a certified bottom-up interface that enables a non-conservative and discretization-free incorporation of low-layer closed-loop properties into the higher-layer strategy synthesis game.

B. Literature Review
Existing approaches tackling the outlined integrated controller synthesis problem can roughly be divided into three different research lines. First, discretization-based abstraction techniques can be used to incorporate low-level dynamics into the high-level strategy synthesis games (see e.g., [1], [12] for an overview and [13], [14], [15], [16], [17] for tool support). These approaches are able to handle the full problem class we tackle, but are known to suffer heavily from the curse of dimensionality and from conservatism introduced by the abstraction. Second, both the specification and the dynamics of the system can be mapped into a large optimization problem that searches for an optimal control law ensuring that both the logical specification and the dynamical constraints are satisfied (see e.g., [18] for a survey). These methods, however, scale poorly with the number of logical constraints and cannot handle external environment inputs. Third, a constrained system can be generated which searches for certificates on the lower-level dynamical system to enforce a temporal specification (see e.g., [2, Ch. 12] for an overview). This approach is usually restricted to particular classes of logical specifications and non-linear dynamics.
Within this paper, we mainly follow the third approach, utilizing certificates, in particular control Lyapunov functions, to realize reach-while-avoid objectives. What distinguishes our work from existing ones (e.g., [10], [19], [20], [21]) is the presence of logical inputs operated by the external environment. In the absence of these, the resulting synthesis problem reduces to a temporal logic planning problem, which does not require a reactive strategy on the higher layer, i.e., a single plan can be computed and executed in an open-loop fashion. Our approach instead produces closed-loop controllers in both layers.
While recent methods combining certificates with high-granularity abstractions (e.g., [22]) also produce closed-loop solutions, there, environment inputs can only be handled at transition points between abstract states. In our example, the robot would need to complete one motion (reaching a particular target) before it could receive a new objective, leading to unsatisfying closed-loop behavior.
In addition, our new game-solving formalism is also related to other work in the reactive synthesis community. While strategy templates have been introduced very recently in [23], [24], progress group annotations appeared previously in [25] for a restricted class of temporal specifications and only induced by uncontrolled dynamics. Further, [26] also tackles the problem of reactive control for dynamical systems via parity games, but only presents sufficient conditions for the existence of certificates and controllers, while our method is fully constructive.

II. PRELIMINARIES
In this section we recall, in condensed form, the main concepts and results from dynamical control systems theory and formal methods.

A. Dynamical Systems
Let us introduce the state-space setting and the main stabilization/control techniques that we consider in order to achieve the logical specifications described in previous sections. First, we introduce the notion of continuous-time control systems considered in this manuscript. Definition 1. A (continuous-time) control system is defined by a triple S := (X, U, f) where: Given a control system S := (X, U, f) and a measurable function u : X → U, a solution of S for u starting at x ∈ X is a function ξ_{x,u} : [0, T) → X (for some T > 0 and possibly T = +∞) such that ξ_{x,u}(0) = x, ξ_{x,u}(t) ∈ X for all t ∈ [0, T), and ξ̇_{x,u}(t) = f(ξ_{x,u}(t), u(ξ_{x,u}(t))) for almost all t ∈ [0, T).
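As a concrete (purely illustrative) instance of Definition 1, the following sketch numerically approximates a solution ξ_{x,u} by forward-Euler integration; the scalar system ẋ = u with feedback u(x) = -x and all names below are our assumptions, not objects from the paper.

```python
# Sketch: forward-Euler approximation of a solution xi_{x,u} of a
# control system S = (X, U, f) under a feedback law u : X -> U.
# The concrete system (f(x, u) = u, u(x) = -x) is illustrative only.

def simulate(f, u, x0, dt=1e-3, T=5.0):
    """Approximate xi_{x0,u} on [0, T) with step dt; return the trajectory."""
    xs, x, t = [x0], x0, 0.0
    while t < T:
        x = x + dt * f(x, u(x))   # Euler step of dx/dt = f(x, u(x))
        xs.append(x)
        t += dt
    return xs

f = lambda x, u: u          # single integrator
u = lambda x: -x            # stabilizing feedback
traj = simulate(f, u, x0=1.0)
# The closed loop dx/dt = -x contracts toward the origin.
assert abs(traj[-1]) < abs(traj[0])
```

The same integration scheme is reused, with different feedback laws, in the sketches below.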
To cope with reach-while-avoid objectives, we must design control policies driving the solutions to desired targets, possibly avoiding obstacles and/or staying in safe regions. Thus, we aim to design feedback control strategies using the formalism of control Lyapunov functions (CLF). Let us recall in what follows the main definitions and concepts from the CLF-based feedback design literature (for an overview, see [27], [28], [29]). To ease notation, we denote by C^1(X, R) the set of continuously differentiable functions from X to R; given a function w : X → R and any c ∈ R, we denote by X_w(c) := {x ∈ X : w(x) ≤ c} the corresponding sublevel set. In this case, the set X_w := X_w(C) is the basin of attraction of w. If X = R^{n_x}, w is radially unbounded, and inequality (4) holds in R^{n_x} \ X_w(c), then w is said to be a global CLF.
Intuitively, condition (4) implies that, whenever x ∈ X_w \ X_w(c), there exists a u ∈ U for which the directional derivative of w along the vector f(x, u) is strictly negative, and thus the value of the Lyapunov function is decreasing along solutions of (2) following such a direction. This observation motivates the following CLF-based design result.
Lemma 1. Consider a control system S := (X, U, f), a compact target set X_T ⊂ X, and suppose that w ∈ C^1(X, R) is a CLF in the sense of Definition 2. Consider a continuous u satisfying (5). The proof follows from classical Lyapunov theory and a comparison argument; we therefore refer to [30], [28] or related literature for a detailed demonstration.
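A minimal numerical sketch of the CLF mechanism behind Lemma 1, under our own illustrative assumptions (quadratic CLF w(x) = x², single-integrator dynamics, feedback u(x) = -x; none of these come from the paper):

```python
# Sketch: along the closed loop dx/dt = -x, the candidate CLF
# w(x) = x^2 has derivative dw/dt = 2x * (-x) = -2 x^2 < 0 away from
# the origin, so solutions descend the sublevel sets of w and
# eventually enter the target sublevel set X_w(c).

def euler(f, x0, dt, steps):
    xs, x = [x0], x0
    for _ in range(steps):
        x = x + dt * f(x)
        xs.append(x)
    return xs

w = lambda x: x * x                     # candidate CLF
closed_loop = lambda x: -x              # f(x, u(x)) with u(x) = -x
traj = euler(closed_loop, x0=2.0, dt=1e-3, steps=6000)

c = 0.1                                 # target sublevel set X_w(c)
# (i) w is non-increasing along the solution (invariance of X_w),
assert all(w(a) >= w(b) - 1e-12 for a, b in zip(traj, traj[1:]))
# (ii) the solution eventually enters X_w(c) (reachability).
assert w(traj[-1]) <= c
```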
We note that Definition 2 considers basins of attraction X_w which are sublevel sets of CLFs. Hence, these sets are safe by construction, that is, all solutions under a control u satisfying (5) will always stay inside X_w (in addition to eventually reaching X_w(c)). As such, the CLFs considered in this paper allow us to enforce reach-while-avoid objectives, by provably avoiding an unsafe region while reaching a target region within the state space. As the computation of such CLFs can introduce some conservatism, we note that more general approaches, such as control Lyapunov barrier functions (see e.g., [21], [19], [31]), can similarly be used for the purpose of guaranteeing safety, if no convergence property is required.
Remark 1 (CLF-based feedback design: literature review). Definition 2 is stated in a form particularly suited for our purposes, and many extensions/modifications are possible.
First of all, let us point out that some technical issues can arise, even in the restricted context of Definitions 1 and 2, when considering feedback control laws satisfying (5). Indeed, functions u : X → U satisfying (5) can be necessarily discontinuous, and thus special care must be taken in defining tailored solution concepts for the closed loop ẋ = f(x, u(x)). For the interested reader, this technical topic is discussed in [29, Section 8]. In the control-affine case, i.e., when U = R^{n_u} and f(x, u) = h(x) + g(x)u for some functions h ∈ C^1(X, R^{n_x}) and g ∈ C^1(X, R^{n_x × n_u}), a smooth CLF as in Definition 2 induces a continuous feedback law, as defined in [32] and well summarized in [33]. Moreover, for notational simplicity, in Definition 2 we require the candidate CLF to be continuously differentiable. This hypothesis can be relaxed by considering locally Lipschitz candidate control Lyapunov functions. In this case, Dini derivatives or the Clarke gradient formalism should be used in (4), since the classical gradient is not defined for locally Lipschitz functions. We stress that, for the classical stabilizability problem of control systems, it is necessary, in order to avoid any conservatism, to consider non-smooth (but locally Lipschitz) CLFs; see [28] and references therein.

B. Linear Temporal Logic
In this section, we introduce the syntax and semantics of Linear Temporal Logic (LTL) in order to formally describe the logical specifications. For a complete overview, we refer to [34, Chapter 5]. Atomic Propositions. An atomic proposition is a Boolean variable (i.e., a variable that can either be true or false) which signals important information to the higher, logical control layer. In this paper, we consider three different (finite) sets of atomic propositions: (i) state propositions AP_S, (ii) observation propositions AP_O, and (iii) control propositions AP_C. State propositions (e.g., T_1, T_2, T_3 in Fig. 1) are associated with subsets of the state space such that T_i ∈ AP_S is true at time t if the current state x(t) of the underlying dynamical system is within this subset², i.e., x(t) ∈ T_i ⊆ X. Observation propositions AP_O denote all other aggregated information observed by the logical controller from the underlying continuous control system (e.g., D in Fig. 1) and the external environment (e.g., M_1, M_2, and M_3 in Fig. 1). Control propositions AP_C denote a finite set of feedback control strategies that the high-level logical controller can choose (these will be introduced in Section IV-C). We denote by AP := AP_S ∪ AP_O ∪ AP_C the set of all propositions.
Given a control system S = (X, U, f), the state propositions AP_S define a labelling function L : X → 2^{AP_S} such that, for all X ∈ AP_S, it holds that X ∈ L(x) ⇔ x ∈ X. In addition, Υ : R_+ → 2^{AP_O} denotes a piecewise-constant and right-continuous³ logical disturbance function modelling the sequence of observation propositions acting on the system over time. We collect all logical disturbance functions acting on S in the set D.
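A small sketch of such a labelling function L, under the illustrative assumption that each state proposition is an axis-aligned box in the plane (the names T1, T2, T3 and the box coordinates are invented for the example):

```python
# Sketch: a labelling function L : X -> 2^{AP_S}, where each state
# proposition is identified with a subset of the state space, here
# axis-aligned boxes in the plane.

BOXES = {  # proposition -> ((xmin, xmax), (ymin, ymax)), illustrative
    "T1": ((0.0, 1.0), (0.0, 1.0)),
    "T2": ((2.0, 3.0), (0.0, 1.0)),
    "T3": ((5.0, 6.0), (2.0, 3.0)),
}

def L(x):
    """Return the set of state propositions true at the point x."""
    return {p for p, ((x0, x1), (y0, y1)) in BOXES.items()
            if x0 <= x[0] <= x1 and y0 <= x[1] <= y1}

assert L((0.5, 0.5)) == {"T1"}   # inside T1 only
assert L((4.0, 4.0)) == set()    # outside every labelled region
```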
Traces. For a set A, we write A^ω to denote the set of all infinite sequences a_0 a_1 ... with a_i ∈ A for each i ≥ 0.
Linear Temporal Logic (LTL). We consider requirement specifications written in Linear Temporal Logic [35]. LTL formulas over a set of atomic propositions AP are given by the grammar, where p ∈ AP and ϕ is an LTL formula.
A trace π = l_0 l_1 ... ∈ (2^{AP})^ω is defined to satisfy an LTL formula φ, written as π ⊨ φ, recursively as follows:
² With a slight abuse of notation, we denote the state subset associated with a state proposition by the same symbol.
³ A function L : R_+ → S, with S a finite set, is piecewise-constant if it has a finite number of discontinuities in any bounded subinterval of R_+; it is right-continuous if lim_{s↘t} L(s) = L(t) for all t ∈ R_+.
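The semantics can be made concrete on ultimately periodic traces. The following sketch (our own simplification, covering only the "always" and "always eventually" fragments rather than full LTL) evaluates formulas on a lasso-shaped trace stem·loop^ω:

```python
# Sketch: evaluating two common LTL fragments on an ultimately
# periodic trace pi = stem . loop^omega, each letter being the set
# of atomic propositions that hold at that position.

def always(stem, loop, p):
    """pi |= G p : proposition p holds in every letter of pi."""
    return all(p in letter for letter in stem + loop)

def always_eventually(stem, loop, p):
    """pi |= G F p : p holds infinitely often, i.e., somewhere in the loop."""
    return any(p in letter for letter in loop)

stem = [{"M1"}, {"M1"}]
loop = [{"M1"}, {"M1", "T1"}]           # T1 is visited once per loop
assert always(stem, loop, "M1")
assert always_eventually(stem, loop, "T1")
assert not always(stem, loop, "T1")
```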

C. Games on Graphs
In this section, we define the games on graphs and related techniques which will be used to compute a high-level logical controller satisfying a given LTL specification. Game Graphs. A (labelled) game graph over a set of atomic propositions AP is a tuple G = (V, E, ℓ) consisting of a finite set of vertices V, partitioned into Player 0's (controller player) vertices and Player 1's (environment player) vertices, a set of edges E ⊆ V × V, and a labelling function ℓ : V → 2^{AP}. We write V_i to denote Player i's vertices, and E_i to denote the edges with source in V_i. Games. An (alternating) two-player game is a pair G = (G, WIN) consisting of a game graph G = (V, E, ℓ) such that E ∩ (V_i × V_i) = ∅, and a winning condition WIN ⊆ V^ω. Every winning condition that we consider in this paper can equivalently be expressed as an LTL formula⁴ φ_WIN over a set of propositions interpreted as subsets of V, and we use both characterizations interchangeably. A play ρ is winning if ρ ends in a Player 1 dead-end or ρ ∈ WIN (or, equivalently, ρ ⊨ φ_WIN).
A (memoryless) strategy for Player i is a function mapping each Player i vertex to one of its successors. A Player 0 strategy σ is winning from a vertex v if every σ-play from v is winning. Moreover, if such a strategy exists for a vertex v, then v is said to be winning. We collect all such winning vertices in the winning region, and a Player 0 strategy is said to be winning if it is winning from every vertex in the winning region. Note that we have defined winning strategies only for Player 0, as only Player 0 wants to satisfy the specification in such a (zero-sum) game. Parity Games. A parity game is a game with a parity winning condition PARITY(P) defined via a priority function P : V → [0, d] that assigns to each vertex a priority. A play ρ = v_0 v_1 ... is winning w.r.t. PARITY(P) if the maximum priority seen infinitely often along ρ is even. The parity winning condition PARITY(P) can be represented by an LTL formula whose atomic propositions are subsets P_i ⊆ V collecting all states with priority i. LTL to Parity Games. It is well-known⁵ that every LTL formula φ over some finite proposition set AP can be translated into an equivalent (labelled) parity game G = (G, PARITY(P)). This translation requires a partition AP = AP_0 ∪̇ AP_1 such that Player i (i.e., the controller or the environment player, respectively) chooses the propositions in AP_i. We will see that, for the synthesis problems considered in this paper, this partition is naturally given. In addition, plays ρ = v_0 v_1 ... ∈ V^ω are translated into traces π = l_0 l_1 ... ∈ (2^{AP})^ω (called generated by ρ) via the labelling function ℓ of G, s.t. l_i = ℓ(v_{2i+1}) ∪ ℓ(v_{2i+2}) for each i ≥ 0. Furthermore, we say a game G or game graph G is total w.r.t. AP′ ⊆ AP if for every trace π′ over AP′, there exists a trace π generated by a play in G such that π|_{AP′} = π′.
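On ultimately periodic plays the parity condition reduces to a simple check, since the priorities seen infinitely often are exactly those occurring in the loop. A minimal sketch (the priority lists are illustrative):

```python
# Sketch: deciding the parity winning condition on an ultimately
# periodic play rho = stem . loop^omega. Only loop priorities occur
# infinitely often, so Player 0 wins iff their maximum is even.

def parity_winning(loop_priorities):
    """True iff the maximum priority seen infinitely often is even."""
    return max(loop_priorities) % 2 == 0

assert parity_winning([0, 2, 1])        # max priority 2 is even: winning
assert not parity_winning([0, 1])       # max priority 1 is odd: losing
```

This is exactly the check used informally in Example 1 below, where loops with maximum priority 1 are losing for the controller.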
With this, we recall the following well-known result.
With Lemma 2, the problem of computing a logical controller which satisfies a given specification φ in interaction with an uncontrolled environment reduces to computing a winning strategy in a parity game G.

D. Strategy Templates
While it is well known how to compute a single winning strategy for a parity game G, it was recently shown that strategy templates [23], which characterize an infinite number of winning strategies in a succinct manner, are particularly useful in the context of CPS control design. They are utilized within this paper to obtain a novel translation of high-level logical control actions into low-level feedback controllers.
Strategy templates are constructed from three types of local edge conditions, i.e., safety, co-live, and live-group templates. Formally, given a game G = (G = (V, E, ℓ), WIN), a strategy template is a tuple (S, D, H) consisting of a set of unsafe edges S ⊆ E_0, a set of co-live edges D ⊆ E_0, and a set of live-groups H ⊆ 2^{E_0}. This strategy template can also be represented by an LTL formula ψ = ψ_UNSAFE(S) ∧ ψ_COLIVE(D) ∧ ψ_LIVE(H). Here, an edge e = (u, v) represents the LTL formula u ∧ ◯v, and SRC(H) denotes the set of source vertices of the edges in H. A Player 0 strategy σ satisfies a strategy template ψ if it is winning in the game (G, ψ). Intuitively, a Player 0 strategy σ satisfies a strategy template (S, D, H) if every σ-play ρ satisfies the following: (i) ρ never uses the unsafe edges in S; (ii) eventually, ρ stops using the co-live edges in D; and (iii) if ρ visits SRC(H) infinitely many times, then it also uses the edges in H infinitely many times. Moreover, a strategy template ψ is winning if every strategy satisfying ψ is winning in the original game G. Note that the sources of all edges in these templates are Player 0 vertices. The algorithm to compute a winning strategy template in a parity game lies in the same time complexity class as the standard algorithm for solving parity games, i.e., Zielonka's algorithm [38]. This leads to the following result:
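Conditions (i)–(iii) above can be checked mechanically on ultimately periodic plays. The following sketch does so for lasso plays stem·loop^ω; the vertex names and edge sets are borrowed from Example 1 later in the paper, but the function itself is our own illustrative simplification:

```python
# Sketch: checking whether a lasso play rho = stem . loop^omega
# satisfies a strategy template (S, D, H):
# (i)  no unsafe edge of S is ever used,
# (ii) no co-live edge of D appears in the loop (so each is used at
#      most finitely often), and
# (iii) every live-group whose source set is visited in the loop
#      contributes an edge to the loop.

def edges_of(path):
    return set(zip(path, path[1:]))

def satisfies_template(stem, loop, S, D, H):
    stem_edges = edges_of(stem + loop[:1])   # includes stem -> loop entry
    loop_edges = edges_of(loop + loop[:1])   # loop wraps around
    if (stem_edges | loop_edges) & S:        # (i) unsafe edge used
        return False
    if loop_edges & D:                       # (ii) co-live edge used forever
        return False
    for group in H:                          # (iii) live-groups
        sources = {u for (u, _) in group}
        if sources & set(loop) and not (loop_edges & group):
            return False
    return True

S = {("c", "f"), ("d", "f")}                 # unsafe edges
D = {("c", "b"), ("d", "b")}                 # co-live edges
H = [frozenset({("d", "e")})]                # live-group from vertex d
assert satisfies_template(["a"], ["c", "d", "e"], S, D, H)
assert not satisfies_template(["a"], ["c", "b", "d"], S, D, H)
```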

III. PROBLEM STATEMENT
This section gives a formal definition of the problem we tackle in this paper. Our goal is to automatically synthesize a reactive hybrid controller that operates a non-linear control system based on external logical inputs. Towards a formal problem statement, we first define a hybrid state-feedback control policy which controls a system S while reacting to logical context switches induced by the sequence of observation propositions Υ ∈ D acting on S as logical disturbances. Definition 3. Let S = (X, U, f) be a control system and Υ : R_+ → 2^{AP_O} a disturbance function. A hybrid state-feedback policy is a function p : R_+ × X × D → U. A solution of S for p starting at x ∈ X under Υ is a function ξ_{x,p,Υ} : [0, T) → X (for some T > 0 and possibly T = +∞). This leads us to the following problem statement. Problem 1. Given a control system S = (X, U, f) with labelling function L : X → 2^{AP_S} and an LTL specification φ over the propositions AP_S ∪ AP_O, find a set of winning initial conditions X_win ⊆ X and a hybrid state-feedback policy p : R_+ × X × D → U s.t. for all x ∈ X_win, all disturbance functions Υ ∈ D, and all solutions ξ_{x,p,Υ}, it holds that (i) ξ_{x,p,Υ}(t) ∈ X_win for all t ∈ R_+, and (ii) every trace π ∈ Traces_{L,Υ}(ξ_{x,p,Υ}) satisfies φ.
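A minimal numerical sketch of a hybrid state-feedback policy in the sense of Definition 3, under our own illustrative assumptions (scalar single-integrator dynamics, two modes driving the state to targets at +1 and -1, and a single context switch at t = 2):

```python
# Sketch: a hybrid state-feedback policy p(t, x, Upsilon) that
# switches between low-level feedback laws whenever the logical
# disturbance (the active mode) changes. Dynamics, laws and the
# switching time are illustrative only.

U_LAWS = {
    "M1": lambda x: -(x - 1.0),   # drive x toward the target at +1.0
    "M2": lambda x: -(x + 1.0),   # drive x toward the target at -1.0
}

def p(t, x, Upsilon):
    """Hybrid policy: apply the low-level law selected by the active mode."""
    return U_LAWS[Upsilon(t)](x)

Upsilon = lambda t: "M1" if t < 2.0 else "M2"   # piecewise-constant context

x, dt, t, mid = 0.0, 1e-3, 0.0, None
for _ in range(5000):                 # Euler integration over [0, 5)
    x = x + dt * p(t, x, Upsilon)
    t += dt
    if mid is None and t >= 2.0:
        mid = x                       # state at the context switch
assert abs(mid - 1.0) < 0.2           # first target nearly reached
assert abs(x - (-1.0)) < 0.2          # policy re-targets after the switch
```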
The remainder of this paper illustrates our solution to Problem 1 by first providing an overview of the entire multi-step synthesis algorithm in Section IV, then highlighting additional details for selected steps in Section V and Section VI, and showing simulation results for the motivating example from Section I-A in Section VII.

IV. SYNTHESIS OVERVIEW
This section overviews our automated synthesis procedure, which consists of five steps, schematically depicted in Fig. 2. First, in Section IV-A (Fig. 2, green), we solve a high-level logical game induced by the specification. Then, in Section IV-B (Fig. 2, pink), we build a top-down interface which allows us to translate strategic choices from the logical level into certified low-level feedback control policies. Afterwards, in Section IV-C (Fig. 2, cyan), we build a bottom-up interface to include relevant information about the low-level closed-loop dynamics into the logical synthesis game via augmentations. We then solve the resulting augmented high-level synthesis game in Section IV-D (Fig. 2, violet). Finally, in Section IV-E (Fig. 2, orange), the obtained winning strategy is used to construct a hybrid controller which is proven to solve Problem 1.

A. High-Level Logical Synthesis
This initial step only considers the (high-level) logical strategy synthesis problem induced by the LTL specification φ (realizing the green marked transitions in Fig. 2). As formalized in Problem 1, the specification φ only contains state and observation propositions, i.e., AP = AP_S ∪ AP_O. The definition of control propositions AP_C is part of our synthesis framework and will be discussed in Section IV-B.
In order to use Lemma 2 to construct the initial parity game G_I from φ, we need to divide AP into controller (Player 0) and environment (Player 1) propositions. To do this, we optimistically assume that the controller can instantly activate/deactivate all state propositions in AP_S, thus defining AP_0 := AP_S. This ignores the dynamics of S and how the state propositions are geometrically represented in the state space. This is done on purpose to enable a lazy synthesis framework: our framework only adds those aspects of the dynamics and the geometric constraints which prove to be relevant to the synthesis problem in a later step, discussed in Section IV-C.
As observation propositions are not under the control of the system or the controller, they are naturally interpreted as environment propositions, i.e., AP_1 := AP_O. Intuitively, the initial game G_I constructed from φ via Lemma 2 reveals all logical dependencies of propositions relevant to the synthesis problem at hand. After constructing G_I from φ (i.e., going from 1 to 2 in Fig. 2), we can directly apply the algorithm from [23] to synthesize a winning strategy template ψ_I on G_I (i.e., going from 2 to 3 in Fig. 2), as discussed in Section II-D. This gives the following result, which is a direct consequence of Lemma 2 and the definition of strategy templates.
Proposition 1. Given the LTL specification φ over AP = AP_S ∪ AP_O, translated via Lemma 2 into an initial parity game G_I that is total w.r.t. AP, and a winning strategy template ψ_I for G_I, the following holds: for every Player 0 strategy σ that satisfies the strategy template ψ_I, the trace generated by any σ-play in the initial game G_I satisfies the specification φ.
Example 1. For the example from Section I-A, a part of the resulting parity game G_I is depicted in Fig. 3.
A winning strategy template for the part of the parity game G_I depicted in Fig. 3 is given in terms of edges e_{vv′}, where e_{vv′} denotes the edge from v to v′. The strategy template ψ_I forces the plays to never use the unsafe edges {e_cf, e_df} (indicated schematically by dotted red arrows), as they lead to vertex f where proposition W is true, signaling that the robot hits the wall. Furthermore, ψ_I forces the plays to eventually stop using the co-live edges {e_cb, e_db} (indicated schematically by dashed blue arrows). This is because if Player 0 (i.e., the controller) keeps using these edges, then Player 1 (i.e., the environment) can force a play to loop in one of the cycles (cbde)^ω or (db)^ω, which does not lead to a winning play, as the maximum priority seen infinitely often in these cycles is odd (i.e., 1).

B. The Top-Down Interface
While Section IV-A utilizes existing techniques from reactive synthesis, this section contains the first technical contribution of the paper, namely the translation of strategy templates into certified low-level feedback control policies (realizing the pink marked transitions in Fig. 2).
1) Reach-While-Avoid Objectives: The strategy template ψ_I computed in the last step defines, for all Player 0 vertices v, eventually required transitions (contained in H) and (eventually) prohibited transitions (contained in S or D) for strategies that result in a correct closed-loop behavior. While the game-solving engine assumes that these transitions can be instantaneously enabled (resp. disabled), they actually have to be enforced (resp. prevented) by a suitable actuation of the underlying dynamical system (e.g., the robot). The main observation that we exploit in this step is that the edge constraints for a Player 0 vertex v induced by a strategy template ψ_I naturally translate into context-dependent reach-while-avoid objectives for the lower-layer synthesis problem. Definition 4. A context-dependent reach-while-avoid objective (cRWA) is defined as a triple Ω := (κ, R, A), where κ ⊆ AP_O is the context, R ∈ 2^{AP_S} is the target set (to be reached), and A ∈ 2^{AP_S} is the obstacle set (to be avoided). A control proposition C ∈ AP_C is said to implement the reach-while-avoid objective Ω if condition (6) holds. In practice, the translation of winning strategy templates into reach-while-avoid objectives (i.e., going from 3 to 4 in Fig. 2) is done per vertex v ∈ V_0 (whose label defines the context) and reflects required and prohibited successors as targets and obstacles in the cRWA, respectively. In particular, as the final hybrid controller will make strategic decisions corresponding to exactly one transition, we compute cRWA's per required/allowed transition, while collecting all prohibited successors in the obstacle set A of these cRWA's, as formalized next.
Definition 5. Let G be a parity game with game graph G = (V, E, ℓ) and ψ a winning strategy template. We collect all such cRWA's for the strategy template ψ in the set cRWA(G, ψ).
Intuitively, for such cRWA's, A_a consists of the propositions that need to be avoided "always", whereas A_e consists of the propositions that need to be avoided "eventually always". This definition is illustrated by the following example.
Example 2. Consider the winning strategy template ψ_I computed in Example 1 for the parity game given in Fig. 3. From vertex d, the strategy template ψ_I forces Player 0 to never use edge e_df and to eventually stop using edge e_db. That means, Player 0 eventually has to only use edge e_de from vertex d. The labels of the vertices imply that, whenever mode M_1 is active and the door is closed, the system "always" has to reach T_1 while avoiding walls W, and "eventually always" has to reach T_1 while avoiding both walls W and target T_2. 2) Feedback-Control Policies: Within this step, we utilize existing techniques to synthesize a feedback-control policy u : X → U associated to the cRWA problem Ω = (κ, R, A) (i.e., going from 4 to 5 in Fig. 2), s.t. all traces generated by solutions of S for u satisfy (6), given that C and κ are true for all t ∈ R_+, where C ∈ AP_C is a control proposition that flags that the feedback control policy u is currently applied to S. This part of our controller design strategy comes with unavoidable conservatism. Indeed, it is well-known that even very particular cases of the control problems that we tackle here face strong controllability barriers, such as undecidability and NP-hardness (see [39]). For this reason, we rely here on control techniques that are intrinsically conservative, but provide, when they converge, a satisfactory solution.
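The per-vertex translation illustrated in Example 2 can be sketched in code. Note that this is our reading of the construction (Definition 5 is only summarized in the text): each allowed outgoing edge of a Player 0 vertex yields a cRWA whose target is the label of that edge's successor, while the labels of successors reached via unsafe edges populate A_a ("avoid always") and those reached via co-live edges additionally populate A_e ("avoid eventually always"). All names below are illustrative.

```python
# Sketch (hypothetical reading of Definition 5): extract cRWA's for
# one Player 0 vertex from a strategy template (S, D, _).

def cRWAs_for_vertex(v, succs, label, S, D, context):
    """succs: successors of v; label: vertex -> set of propositions.
    Returns tuples (context, target, A_a, A_a union A_e)."""
    unsafe = {u for u in succs if (v, u) in S}     # via unsafe edges
    colive = {u for u in succs if (v, u) in D}     # via co-live edges
    A_a = set().union(*(label[u] for u in unsafe)) if unsafe else set()
    A_e = set().union(*(label[u] for u in colive)) if colive else set()
    return [(context, label[u], A_a, A_a | A_e)
            for u in succs if u not in unsafe | colive]

# Data mirroring Example 2: from vertex d, edge to f (wall W) is
# unsafe, edge to b (target T2) is co-live, edge to e (target T1) stays.
label = {"e": {"T1"}, "f": {"W"}, "b": {"T2"}}
S = {("d", "f")}
D = {("d", "b")}
out = cRWAs_for_vertex("d", ["e", "f", "b"], label, S, D, context={"M1"})
# One cRWA remains: reach T1, always avoid W, eventually also avoid T2.
assert out == [({"M1"}, {"T1"}, {"W"}, {"W", "T2"})]
```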
As an example of such approaches, which fits particularly well with our setting, we utilize existing techniques based on control Lyapunov functions (CLF), as introduced in Section II-A, to define u from a given Ω = (κ, R, A). This is achieved by constructing a CLF w : X → R (recall Definition 2) w.r.t. the target R and enforcing that the basin of attraction X_w ⊆ X excludes A, i.e., A ∩ X_w = ∅.
We thus have the following definition.
Definition 6. Given the control system S = (X, U, f), consider a cRWA Ω = (κ, R, A). We say that a CLF w (as in Definition 2) with basin of attraction X_w and the corresponding feedback map u_w : X_w → U realize the cRWA Ω if w is a CLF w.r.t. the target R and A ∩ X_w = ∅.

Section VI-A will discuss a particular technique to synthesize X_w and u_w realizing a cRWA for particular classes of dynamical systems and state propositions. For any such realization of a cRWA we have the following guarantees on the resulting closed-loop system under a constant context, i.e., w.r.t. a trivial disturbance function Υ := κ^ω, which are a direct consequence of Lemma 1 and Definition 6.
Proposition 2. Given the control system S = (X, U, f) with labelling function L, let Ω = (κ, R, A) be a cRWA and let u_w : X_w → U be a feedback-control policy induced by a CLF w associated to Ω with basin of attraction X_w. Then, for all x ∈ X_w and for all solutions ξ_{x,u_w} of S, it holds that (i) ξ_{x,u_w}(t) ∈ X_w for all t ∈ R_+, and (ii) every trace π ∈ Traces_{L,Υ}(ξ_{x,u_w}) satisfies φ_{C_w} in (6), with C_w ∈ AP_C being the control proposition associated to w and Υ := κ^ω inducing a constant context.
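Both items of Proposition 2 can be sanity-checked numerically on a toy instance. The sketch below assumes single-integrator dynamics ẋ = u and the quadratic CLF w(x) = ||x - r||^2; a forward-Euler simulation then exhibits the invariance of sublevel sets (item (i)) and convergence to the target (item (ii)):

```python
import numpy as np

def clf_policy(x, r, gain=1.0):
    """u = -gain * (x - r): the quadratic CLF w(x) = ||x - r||^2
    strictly decreases along every closed-loop solution."""
    return -gain * (x - r)

def simulate(x0, r, dt=1e-2, steps=2000):
    """Forward-Euler integration of the closed loop x' = clf_policy(x)."""
    x = np.asarray(x0, dtype=float)
    r = np.asarray(r, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * clf_policy(x, r)
        traj.append(x.copy())
    return np.array(traj)
```

Any sublevel set of w acts as a (trivially A-free) basin of attraction here; for nontrivial obstacle sets the sublevel set must additionally be chosen to exclude A, as required by Definition 6.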
Example 3. Consider the robot example given in Fig. 1. For this example, two CLFs w_a and w_e were synthesized; their basins of attraction X_a and X_e, respectively, are depicted in Fig. 4.

C. The Bottom-Up Interface
The synthesis procedure from Section IV-B results in a finite set W of CLFs with a finite set U of control policies, such that each control policy u_w ∈ U (resulting from a CLF w ∈ W) is equipped with a basin of attraction X_w ⊆ X, associated to a given Ω ∈ cRWA(G_I, ψ_I) resulting from a particular edge in the high-level synthesis game G_I. This implies that whenever w is non-global, i.e., if X_w ⊊ X, the control policy u_w cannot be applied everywhere.
Thinking back to the logical strategy computed in Section IV-A, policy u_w must be used when its corresponding cRWA Ω for an edge e is "activated" by a logical control strategy "choosing" the edge e in G_I. By constructing the cRWA's for winning edges as defined in Definition 5, we essentially equip the resulting controller with a direct actuation capability over the underlying dynamical system: it must choose between available feedback-control policies. To reflect this change of actuation capabilities in the higher-level game, we introduce a controller proposition C_w ∈ AP_C for every available feedback-control policy u_w, which flags that u_w ∈ U should be used to actuate S. Further, as every u_w is equipped with a basin of attraction X_w, the resulting hybrid controller is implementable only if the current continuous state x is in X_w. We therefore need to track this information in the synthesis game. For this purpose, we introduce a new state proposition X_w for every u_w ∈ U that flags whether the state is in its basin of attraction, and we define AP_S^+ := AP_S ∪ {X_w | w ∈ W} as the set of all state propositions including all additional propositions X_w.
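The extension of the labelling by the basin propositions X_w is a one-liner. The following sketch (hypothetical encoding: propositions as strings, basins as membership predicates) computes L^+(x) = L(x) ∪ {X_w : x ∈ X_w}:

```python
def extended_label(x, base_label, basins):
    """L+(x): the base labels of x together with the proposition of
    every basin of attraction that currently contains x."""
    return base_label(x) | {name for name, inside in basins.items() if inside(x)}
```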
The next four steps provide an algorithm that ensures that this information gets translated from the lower to the higher layer in a certified way (realizing the cyan marked transitions in Fig. 2), such that the resulting higher-layer synthesis game allows us to synthesize a hybrid controller that solves Problem 1.
1) Changing Actuation Capabilities: As discussed before, in the initial game, the controller can activate/deactivate all state propositions in AP_S. However, in order to prepare the high-layer initial game G_I from Section IV-A for the incorporation of a refined system model, we need to incorporate the control propositions AP_C and make sure that these are the only propositions the controller can choose with its strategy, leading to the desired direct actuation of lower-level feedback-control policies. In particular, we first need to ensure that all state propositions and observation propositions can only be activated/deactivated by the environment player. This is achieved by updating the initial game to a merged game G_M (i.e., going from 2 to 6 in Fig. 2) while preserving the parity condition and a one-to-one correspondence between the traces generated by plays in G_I and the ones generated by plays in G_M.

Definition 7. Given an initial game G_I = (G_I, PARITY(P_I)) with game graph G_I = (V_I, E_I, ℓ_I), the merged game G_M is constructed as follows:
• The set of Player 1 vertices is preserved, i.e., these vertices keep their priorities but carry empty labels.
• Every Player 0 vertex connecting a pair of Player 1 vertices is replaced by a new Player 0 vertex whose label is the union of its own label and the label of its Player 1 successor.

This leads to the following lemma.
Lemma 4. Let G_I be the parity game constructed from φ over AP as in Proposition 1 and G_M its merged version constructed via Definition 7. Then G_M is total w.r.t. AP, and every winning play in G_M generates a trace which satisfies φ.
Proof. Let ρ = v_0 v_1 ⋯ be a winning play in G_M with v_2k ∈ V_1^M for every k ≥ 0, and let π = l_0 l_1 ⋯ be the trace generated by the play ρ. Then, by construction, the vertices v_2k also belong to V_1^I with the same priority, i.e., P_M(v_2k) = P_I(v_2k) for every k ≥ 0. Furthermore, for every v_2k+1 ∈ V_0^M, there exists a corresponding vertex in V_0^I.

Fig. 5: Corresponding merged game for the initial game given in Fig. 3, where labels of Player 1 vertices are empty sets.
Hence, the corresponding play ρ′ in G_I is a winning play, as the maximum priority seen infinitely often in ρ′ w.r.t. P_I is the same as the maximum priority seen infinitely often in ρ w.r.t. P_M. Now, let π′ = l′_0 l′_1 ⋯ be the trace generated by ρ′ in G_I; then, by construction of game G_I, π′ satisfies the specification φ. Moreover, since ℓ_M(v_2k+2) = ∅ for every k ≥ 0, the traces π and π′ agree on all propositions by definition, and hence π satisfies the specification φ.
Using similar arguments, it can be shown that for every play in G_I, there exists a corresponding play in G_M that generates the same trace. Hence, as G_I is total w.r.t. AP, so is G_M.

Example 4. Consider the initial game G_I given in Fig. 3. The resulting merged game G_M is depicted in Fig. 5. As shown in the figure, the Player 1 vertices, i.e., vertices b, e, f, are preserved with the same priorities but empty labels. For every pair of Player 1 vertices connected via a Player 0 vertex in G_I, there is a new vertex, with a label containing all necessary propositions, that connects the pair in G_M; e.g., for vertices b and f connected via d in G_I, the new vertex d_2, carrying the labels of both d and f, connects b and f. Note that we still have not explicitly incorporated the control propositions in the merged game. In the next steps, we will introduce the control propositions that are realizable by low-level feedback control and incorporate them into the high-level game graph.
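The contraction performed by Definition 7 can be sketched operationally. The code below is a hedged reading of the construction (the carried-over priority of the new Player 0 vertex is an assumption): Player 1 vertices keep their priorities but get empty labels, and each path u → d → v through a Player 0 vertex d is contracted to a new Player 0 vertex (d, v) labeled with the union of the labels of d and v:

```python
def merge_game(V0, V1, E, label, prio):
    """Sketch of the merged game construction (Def. 7)."""
    labelM = {v: frozenset() for v in V1}        # Player 1: empty labels
    prioM = {v: prio[v] for v in V1}             # ... same priorities
    EM = set()
    for d in V0:
        for v in (t for (s, t) in E if s == d):  # Player 1 successors of d
            m = (d, v)                           # new merged Player 0 vertex
            labelM[m] = frozenset(label[d] | label[v])
            prioM[m] = prio[d]                   # assumption: priority of d kept
            EM |= {(u, m) for (u, t) in E if t == d} | {(m, v)}
    return labelM, prioM, EM
```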
2) Control Graph Construction: In this step we construct a game graph that captures the interplay of the environment and observation propositions contained in the context κ of a given cRWA (i.e., going from 5 to 7 in Fig. 2) with the newly introduced control and state propositions C_w ∈ AP_C and X_w ∈ AP_S^+. Intuitively, this graph captures which context changes an application of a particular feedback-control policy u_w for a CLF w (triggered by C_w) might cause. When composed with the modified game graph G_M from Section IV-C, this leads to the lazy refinement of the logical synthesis game discussed earlier, which only includes relevant information about the low-level feedback control loop.
Let us denote the cRWA for which the CLF w was synthesized by Ω_w = (κ_w, R_w, A_w). Consider AP_S^+ ⊇ AP_S, the set of all state propositions including all additional state propositions X_w as defined above, and let L^+ : X → 2^{AP_S^+} be an extended version of the labelling function L defined by L^+(x) = {X ∈ AP_S^+ | x ∈ X} (and thus, L^+(x) ∩ AP_S = L(x) for all x ∈ X).

Definition 8. Given the control system S := (X, U, f) with labelling function L^+ and the set W of all CLFs computed as before, the control game graph G_C over AP_C ∪ AP_S^+ ∪ AP_O is defined as follows.
1) For each CLF w ∈ W, there are two Player 1 vertices in V_1^C, a transition vertex and an invariant vertex, both with label {C_w}.

2) For every subset of propositions c ⊆ AP_S^+ ∪ AP_O that contains R_w, there is an edge to the invariant vertex of w; otherwise, there is an edge to the transition vertex of w.
The construction of G_C via Definition 8 translates some characteristics of the low-level continuous closed-loop system captured by Proposition 2 into the higher-layer synthesis game. In addition, it ensures that a logical controller actuating a control policy u_w via control proposition C_w can only do so if the context κ_w is true and the continuous system is in the basin of attraction X_w (signaled by the system proposition X_w being true). These translations can be formalized via LTL formulas which are ensured to hold true on every play over G_C, as formalized in the next lemma.
Lemma 5. Given the premises of Definition 8, every trace π generated by a play over G_C satisfies

□(X_w ⇒ ¬A_w), (7)
□(C_w ⇒ (X_w ∧ κ_w)), (8)
□((R_w ∧ C_w) ⇒ ○R_w), (9)
□((X_w ∧ C_w) ⇒ ○X_w). (10)
Intuitively, given the premises of Lemma 5, equations (7)-(10) ensure the following low-level properties on the game-graph level. First, (7) ensures that the basin of attraction X_w does not intersect the avoid region A_w. Next, (8) ensures that the controller C_w can only be applied if the system is within the corresponding basin of attraction X_w and the context κ_w holds. Note that this does not restrict the environment from changing the context right after the feedback-control policy associated with C_w was applied. Finally, (9)-(10) ensure that if the system is within the target region R_w (resp. the basin of attraction X_w) and the controller C_w is applied, the system cannot leave R_w (resp. X_w).
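Guarantees of this shape are easy to monitor on finite trace prefixes. The following hedged sketch, with hypothetical names `check_guard` and `check_invariance`, checks the guard property (8) and the one-step invariance pattern of (9)/(10) on a trace given as a list of proposition sets:

```python
def check_guard(trace, Cw, Xw, ctx):
    """(8): C_w may only hold in steps where X_w and the whole
    context kappa_w hold as well."""
    return all(Xw in step and ctx <= step for step in trace if Cw in step)

def check_invariance(trace, Cw, region):
    """(9)/(10): whenever `region` holds and C_w is applied, the
    next step of the trace stays inside `region`."""
    return all(region in trace[i + 1]
               for i in range(len(trace) - 1)
               if region in trace[i] and Cw in trace[i])
```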
In total, the control game graph G_C models all the state proposition sequences generated by a trajectory ξ triggered by the controller policies associated with W as in Proposition 2. Furthermore, it also models the logical disturbances received as inputs via the disturbance function Υ ∈ D. This is formalized by the next lemma, which directly follows from items 2-4 of Definition 8.

Lemma 6. Given the premises of Definition 8, one of the following holds for every disturbance function Υ ∈ D:
• for some play in G_C, its generated trace π satisfies π|_{AP_O} = Υ, or
• for some play in G_C ending in a Player 0 dead-end, its generated trace π satisfies that π|_{AP_O} is a prefix of Υ.
Example 5. For the CLFs w_a and w_e given in Example 3 with basins of attraction X_a and X_e as shown in Fig. 4, the corresponding control game graph (without the Player 0 dead-ends) is depicted in Fig. 6. As shown in the figure, the transition vertices of w_a and w_e are vertices a and c, respectively, and the invariant vertices are vertices g and i, respectively. Note that both CLFs have context {M_1, D}. Hence, vertices with a label that contains X_a or X_e but not the propositions M_1 or D are Player 0 dead-ends (no outgoing edges are defined from them). For simplicity, those vertices are not shown in Fig. 6.
While we could now take the product of G_C with G_M from the previous step in order to obtain the new, refined logical synthesis game, we note that this typically does not lead to a game that actually has a winning strategy. The reason for this lies in the fact that the modification of G_I to G_M gives the environment the right to trigger state propositions, i.e., the controller now actuates AP_C and gets "notified" by the underlying dynamical system, via a triggering of propositions in AP_S, that the actuated controller actually resulted in the (hopefully desired) state proposition change. From a two-player-game perspective, the environment could now use its additional power to prevent the robot from reaching the target. E.g., in Fig. 6, starting from vertex b, if the controller keeps using the control policy for CLF w_e, then the environment can force the play to loop between vertices a and b instead of reaching target T_1, represented by vertex h. This is because the resulting logical game still misses essential information about the low-level closed-loop dynamics under a given feedback-control policy. We thus incorporate, in what follows, the information captured by item (ii) of Proposition 2.
3) Persistent Live-Groups: In order to capture item (ii) of Proposition 2 in the logical synthesis game, we construct so-called persistent liveness constraints (i.e., going from 5 to 8 in Fig. 2) to annotate the control game graph G_C, which are inspired by progress groups from [25].
Definition 9. Given a game graph G = (V, E), a persistent live-group is a tuple (S, C, T) consisting of sets S, T ⊆ V and C ⊆ E_0 such that T ⊆ S. The constraint represented by such a persistent live-group is expressed by the following LTL formula:

ψ_PERS(S, C, T) := □(□(S ∧ ψ_CONT(C)) ⇒ ♦T), (11)

where ψ_CONT(C) := SRC(C) ⇒ C. Moreover, the constraint represented by a set Λ of persistent live-groups is denoted by ψ_PERS(Λ) := ⋀_{(S,C,T)∈Λ} ψ_PERS(S, C, T).
Intuitively, ψ_CONT(C) ensures that edges in C are chosen whenever possible, as this is only possible from Player 0 vertices in S. Furthermore, (11) ensures that persistently choosing the edges in C from the source vertices S will eventually lead to a vertex in T.
For a CLF w ∈ W, we construct a persistent live-group (S_w, C_w, T_w) that captures Proposition 2 in the following way. Given the control graph G_C as defined before and a CLF w ∈ W, first, the persistent activation of C_w is captured via the set C_w collecting all (Player 0) edges that end in vertices labeled by C_w, i.e.,

C_w := {(u, v) ∈ E_0 | C_w ∈ ℓ_C(v)}. (12)

Always choosing an edge from C_w will force X_w to remain true within the same context κ_w, which is captured by the set S_w collecting all (Player 0) vertices labeled by X_w and the propositions in κ_w, and all (Player 1) vertices labeled by C_w, i.e.,

S_w := {v ∈ V_0 | {X_w} ∪ κ_w ⊆ ℓ_C(v)} ∪ {v ∈ V_1 | C_w ∈ ℓ_C(v)}. (13)

Finally, we know that always choosing an edge from C_w will eventually lead us to a vertex where R_w is true, captured by the set T_w collecting all vertices labeled by R_w, i.e.,

T_w := {v ∈ V | R_w ∈ ℓ_C(v)}. (14)

Example 6. Consider the control game graph shown in Fig. 6 for Example 5. For CLF w_1 of Ω_1, the corresponding persistent live-group is (S, C, T), where S = {a, b, g, h} corresponds to the region of the basin of attraction for w_1 with context κ_1 = {M_1, D} being true, C = {e_ba, e_hg} corresponds to the edges that represent using the control policy u_w, and T = {h} corresponds to the target region of Ω_1, i.e., the vertices labeled by T_1.
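The three sets above are plain label queries on the control graph and can be computed mechanically. The sketch below (propositions as strings, a hypothetical `live_group` helper) reproduces exactly the sets of Example 6 when run on a fragment of the graph of Fig. 6:

```python
def live_group(V0, V1, E, label, Cw, Xw, ctx, Rw):
    """Prose-level reading of the live-group construction: C collects
    Player 0 edges into C_w-labelled vertices, S the basin-with-context
    vertices plus the C_w-labelled Player 1 vertices, T the
    target-labelled vertices."""
    C = {(u, v) for (u, v) in E if u in V0 and Cw in label[v]}
    S = ({v for v in V0 if Xw in label[v] and ctx <= label[v]}
         | {v for v in V1 if Cw in label[v]})
    T = {v for v in V0 | V1 if Rw in label[v]}
    return S, C, T
```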
Given the set W of all CLFs as before, we collect the corresponding persistent live-groups for all CLFs in W in the set Λ_C. With the persistent live-group assumptions Λ_C, the control game graph G_C also ensures that item (ii) of Proposition 2 holds at the higher level, as formalized below.
Lemma 7. Let G_C be a control graph as in Definition 8 and W a set of CLFs with persistent live-groups (S_w, C_w, T_w) for all w ∈ W as in (12)-(14). Then a play over G_C satisfies ψ_PERS(S_w, C_w, T_w) if and only if its generated trace satisfies

□(□(C_w ∧ κ_w ∧ X_w) ⇒ ♦R_w). (15)

Moreover, (15) along with (7)-(10) ensures that every trace generated by plays in G_C satisfying ψ_PERS(S_w, C_w, T_w) also satisfies φ_{C_w} in (6). Conversely, every trace satisfying (6) is generated by a play in G_C satisfying ψ_PERS(S_w, C_w, T_w).
Proof. Let ρ be a play in G_C and let π = l_0 l_1 ⋯ be the trace generated by ρ. By the definition of the persistent live-groups as in (12)-(14), rewriting (11) in terms of propositions gives us that ρ satisfies ψ_PERS(S_w, C_w, T_w) if and only if the trace π satisfies (15). Furthermore, by Lemma 5, the trace π also satisfies (7)-(10). Now, suppose π satisfies (15); then we need to show that π also satisfies φ_{C_w} in (6). It suffices to show that for every k ≥ 0, the trace π_k = l_k l_{k+1} ⋯ satisfies □(C_w ∧ κ_w) ⇒ (♦□R_w ∧ □¬A_w). Suppose π_k satisfies □(C_w ∧ κ_w). Then, for every j ≥ k, l_j satisfies C_w, which implies, by (8), that l_j also satisfies X_w. Moreover, by (7), l_j also satisfies ¬A_w for each j ≥ k. Therefore, the trace π_k satisfies both □(C_w ∧ κ_w ∧ X_w) and □¬A_w, which then implies, by (15), that π_k also satisfies ♦R_w. That means there exists m ≥ k such that l_m satisfies R_w. As l_m also satisfies C_w, by (9), l_{m+1} satisfies R_w. Using the same argument inductively, we can show that l_i satisfies R_w for all i ≥ m. Therefore, π_k satisfies both ♦□R_w and □¬A_w. Conversely, suppose π satisfies (6); then we need to show that ρ satisfies ψ_PERS(S_w, C_w, T_w). It is enough to show that π satisfies (15), which trivially follows from (6).

4) Final Augmented Parity Game:
Given the three ingredients from the last steps, we are now ready to construct the final augmented (parity) game (i.e., going from 6, 7, 8 to 9 in Fig. 2), which serves as the new logical synthesis game for the final hybrid controller and is defined next.

Definition 10. An augmented game G is a tuple (G, φ, Λ) consisting of a game graph G, a set of persistent live-groups Λ over G and an LTL specification φ. Moreover, an augmented game (G, φ, Λ) is equivalent to the game (G, ψ_PERS(Λ) ⇒ φ).
Let us now describe how the final augmented parity game, i.e., an augmented game with a parity specification, is constructed. Recall that V_i^M and V_i^C are the vertices of Player i in the game graphs G_M and G_C, respectively.

Definition 11. Given the merged game G_M, the control game graph G_C, and the persistent live-groups Λ_C as computed before, the final augmented parity game G_F = (G_F, PARITY(P_F), Λ_F) with G_F = (V_F, E_F, ℓ_F) is constructed by taking the product of the game G_M and the tuple (G_C, Λ_C).

As the priority function P_F is defined by the priority function P_M of the merged game G_M, and every winning play in G_F satisfying ψ_PERS(Λ_F) needs to satisfy the parity condition PARITY(P_F), the next proposition directly follows from Lemma 4.

Proposition 3. Given the LTL specification φ, the initial game G_I, and the final game G_F with persistent live-groups Λ_F as in Definition 11, let π be a trace generated by a winning play satisfying ψ_PERS(Λ_F) in G_F; then π satisfies the specification φ.
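The concrete product rule of Definition 11 is not reproduced here, but one plausible synchronized product, pairing vertices whose labels agree on the shared propositions and stepping both components together, can be sketched as follows (all names hypothetical):

```python
def product_graph(VM, EM, lblM, VC, EC, lblC, shared):
    """Synchronized product sketch: a pair (m, c) exists iff the labels
    of m and c agree on the shared proposition set, and edges step both
    components simultaneously."""
    V = {(m, c) for m in VM for c in VC
         if lblM[m] & shared == lblC[c] & shared}
    E = {((m, c), (m2, c2)) for (m, m2) in EM for (c, c2) in EC
         if (m, c) in V and (m2, c2) in V}
    return V, E
```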

D. Solving the Final Augmented Game
As discussed in Section IV-A, the initial game G_I allowed the system to instantaneously activate or deactivate all state propositions in AP_S. However, this is no longer possible in the merged game G_M. But in the final game G_F, the persistent live-groups, using the results described in Lemma 7, enable the system to activate or deactivate specific state propositions, which are ensured to eventually become true (using the associated feedback-control policy) if no external context change is induced.

Fig. 7: The interconnection between the control system and the hybrid system H_{σ_F} defined in Definition 12.
The next obvious step of our synthesis procedure is to solve the final augmented game G_F, i.e., to compute a winning strategy in this game (realizing the violet marked transitions in Fig. 2, i.e., going from 9 to 10). Based on the observation made in Definition 10 that an augmented game (G, φ, Λ) is equivalent to the game (G, ψ_PERS(Λ) ⇒ φ), one can use standard game-solving techniques for this purpose. This, however, usually results in computationally intractable problems. We will therefore provide, in the subsequent Section V, a new algorithm for solving augmented parity games which has a similar algorithmic structure, and therefore also a similar worst-case time complexity, as the standard algorithm for solving classical (non-augmented) parity games, and which therefore allows for a computationally tractable solution.
For the time being, we assume that we have solved G_F, i.e., we have computed a winning region V_win^F ⊆ V_F and a winning strategy σ_F : V_0^F → V_1^F s.t. all resulting traces satisfy φ due to Proposition 3.

E. Constructing the Hybrid Controller
Given a winning region V_win^F ⊆ V_F and a winning strategy σ_F : V_0^F → V_1^F, we now construct a set of initial winning conditions X_win ⊆ X and a hybrid feedback-control policy p : R_+ × X × D → U (as in Definition 3) to solve Problem 1 (realizing the orange marked transitions in Fig. 2, i.e., going from 10 to 11).
We first observe that the winning region V_win^F ⊆ V_F naturally translates into a set of initial winning conditions X_win via the labeling function L^+ s.t.

X_win := {x ∈ X | L^+(x) = ℓ_F(v) for some v ∈ V_win^F}. (16)

In order to translate the winning strategy σ_F : V_0^F → V_1^F into a hybrid control policy p, we take a two-step approach. We first construct a map Γ which uses σ_F to translate the history of a continuous curve ζ : R_+ → X and a disturbance function Υ : R_+ → 2^{AP_O} into a piecewise-constant function ν : R_+ → V_1^F of Player 1 vertices of G_F. The hybrid controller p then translates each vertex ν(t) ∈ V_1^F into the feedback-control policy u_w : X → U associated with its (unique) label ℓ_F(ν(t)) = C_w ∈ AP_C, which is a single control proposition by construction of G_F. This control policy u_w is then applied to S via f. This is illustrated in Fig. 7 and formalized in the following definition.

Definition 12. Let S = (X, U, f) be a control system with labelling function L^+ and W the set of all CLFs. Consider a winning strategy σ_F : V_0^F → V_1^F over the final game G_F, a continuous curve ζ : R_+ → X and a disturbance function Υ : R_+ → 2^{AP_O}. The map Γ_{σ_F,ζ,Υ} : R_+ → V_1^F is then the piecewise-constant signal of Player 1 vertices obtained by mimicking the winning strategy σ_F at every discontinuity point of L^+(ζ(t)) ∪ Υ(t).

Intuitively, Definition 12 models the fact that the logical layer of the hybrid controller (modelled by the game) might actuate a change in the low-level feedback-control policy only when the context changes. This context change can either be induced externally (when Υ has a discontinuity point, i.e., an observation proposition changes) or when L^+(ζ(t)) changes, i.e., the underlying system dynamics cause state propositions to change. Both are detected by a discontinuity point in L^+(ζ(t)) ∪ Υ(t). At these triggering points (and only then), the map Γ_{σ_F} mimics the move of the winning strategy σ_F by moving to the environment vertex v selected by σ_F in G_F while respecting the current context.
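One discrete event of this map can be sketched as follows. The encoding is hypothetical: `succ0` resolves, for a Player 1 vertex ν and an observed label set, the environment's Player 0 move, and `sigma_F` answers it with the next Player 1 vertex, exactly mirroring the alternation in G_F; if the observation did not change, the active policy is kept:

```python
def gamma_step(nu, obs, prev_obs, succ0, sigma_F):
    """One event of the causal map Gamma: advance the play only at a
    discontinuity of L+(zeta(t)) | Upsilon(t)."""
    if obs == prev_obs:
        return nu                         # no context change: keep policy
    v0 = succ0[(nu, frozenset(obs))]      # environment move for the new label
    return sigma_F[v0]                    # strategic answer of sigma_F
```

Iterating `gamma_step` over the observed discontinuities of the trajectory produces the piecewise-constant signal ν online, as discussed next.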
We emphasize that the definition of the map Γ_{σ_F,ζ,Υ} is actually causal: it only uses the information from the past of ζ and Υ up to time point t⁻ to compute ν(t). This implies that we can actually use it online to dynamically generate the signal ν from the past observations of a state trajectory ξ and the past logical disturbances Υ, as depicted in Fig. 7. As, in this context, the state trajectory ξ is not known a priori, we slightly abuse notation and refer to Γ_{σ_F,ζ,Υ} as Γ_{σ_F,x,Υ}, where x is the starting point of ξ.
With this slight notation overload, we can define the final closed loop system as follows.
Definition 13. Given the premises of Definition 12, the final closed-loop system is given by

ξ̇(t) = f(ξ(t), p(ξ(t), ν(t))), (17)

where p(x(t), ν(t)) := u_w(x(t)) ∈ U and ν(t) is dynamically generated via Γ_{σ_F,x,Υ} by interpreting (the past of) a solution ξ_{x,p,Υ} : R_+ → X of (17) under p and Υ, with starting point x ∈ X, as (the past of) ζ in Definition 12.
This leads to the main result of this section establishing the correctness of our synthesis procedure.
Theorem 1. Consider a control system S = (X, U, f) with labelling function L and an LTL specification φ over the predicates AP_S ∪ AP_O. Consider the final game G_F, the set W of all CLFs, the extended labelling function L^+, the winning region V_win^F and the winning strategy σ_F as computed before. Then X_win as in (16) and p as in Definition 13 solve Problem 1.

The proof of Theorem 1 combines all correctness results established in Section IV-A-Section IV-D.
Proof. Since plays ending in Player 0 dead-ends are not winning and σ_F is a winning strategy in G_F, no σ_F-play ends in a Player 0 dead-end. Then, by Lemma 4 and Lemma 6, all possible changes in L^+ (triggered by applying control policies associated with W) and Υ are captured by the game graph G_C. In particular, every solution ξ_{x,p,Υ} corresponds to a play ρ such that every change in L^+ and Υ corresponds to a move by Player 1 to a vertex with the corresponding label in ρ. Furthermore, as x ∈ X_win(V_win^F), we have v_0 ∈ V_win^F. Moreover, by Definition 12, ρ is a σ_F-play starting from the winning region V_win^F of game G_F. So ρ is a winning play, and hence it always stays in V_win^F. This implies that ξ_{x,p,Υ}(t) also belongs to X_win(V_win^F) for all t ∈ R_+.
By the discussed correspondence between ξ_{x,p,Υ} and the play ρ, a trace π generated by ξ_{x,p,Υ} under L is also the trace generated by the play ρ. Furthermore, every play in G_F corresponds to a play in the control graph G_C as in Definition 8. Moreover, by Proposition 2, π satisfies (6). Then, by Lemma 7 and Definition 10, π is generated by a play in G_C satisfying ψ_PERS(Λ_F); hence, ρ satisfies ψ_PERS(Λ_F). Moreover, as ρ is a winning play in G_F, by Proposition 3, the trace π satisfies the specification φ.

V. SYNTHESIS DETAILS: HIGH-LAYER
The previous section described our synthesis framework and established its ability to solve Problem 1 in Theorem 1. The main hypotheses in this statement are the existence of 1) a winning strategy for the final game G_F, and 2) a CLF w for each cRWA. Within this section, we give a novel algorithm for efficiently solving the augmented parity games constructed in Section IV-C, thus tackling the first point. The second hypothesis is treated in the subsequent Section VI, which presents the construction of feedback-control policies implementing cRWA's via CLFs used in Section IV-B, together with a proof of the well-posedness of the arising closed loop (17).

A. Augmented Reachability Games
While an augmented parity game can be reduced to a Rabin game (by transforming each persistent group-liveness constraint into an additional Rabin pair) and then solved using classical algorithms for Rabin games [40], this method is not computationally tractable. This is due to the fact that existing algorithms are known to become intractable very quickly as the number of Rabin pairs grows. Therefore, we leverage the recent insight that local liveness constraints on the environment player typically fall into a class of synthesis problems that allow for an efficient direct synthesis procedure [25], [41]. The augmented games we consider are similar to the ones discussed by Sun et al. [25]. We, however, provide a novel algorithm that tackles the full class of parity games and thereby subsumes the restricted problem class considered in [25].
The practically most efficient known algorithm for solving classical (non-augmented) parity games is Zielonka's algorithm [38]. This algorithm recursively solves reachability games for both players to compute a winning region and a winning strategy of the controller player in the original parity game. In order to mimic Zielonka's algorithm for augmented games, we first discuss an algorithm to solve augmented reachability games. From this, our new algorithm essentially follows as a corollary.
An augmented reachability game is a tuple G = (G, φ, Λ) where the specification φ = ♦T requires to eventually reach a set T ⊆ V of target vertices. The new recursive algorithm that solves an augmented reachability game G is given in Algorithm 1.

Algorithm 1 SOLVEREACH(G, T, Λ). Require: an augmented game G = (G, φ, Λ) with φ = ♦T. Ensure: the winning region and a winning strategy in the augmented game G.

The main idea of the algorithm is to first compute the set of vertices A from which Player 0 can reach T even without the help of any persistent live-group constraint (line 2), along with the corresponding strategy σ for Player 0 (line 3). Afterwards, the algorithm computes the set of states B from which Player 0 has a strategy (i.e., σ_B) to reach A with the help of a persistent live-group (lines 5-7). If this set B enlarges the winning state set A (line 8), we use recursion to solve another augmented reachability game with target T := A ∪ B (line 12).
Within Algorithm 1, we use the following notation. Given a game graph G = (V, E) and a persistent live-group (S, C, T), we write G|_C to denote the restricted game graph (V, E′) such that E′ ⊆ E and, for every edge e = (v′, v) ∈ E′, either e ∈ C or there is no edge in C starting from v′. Furthermore, pre(T) ⊆ V is the set of vertices from which there is an edge to T.
For a set T of vertices, the attractor function ATTR_i(G, T) solves the (non-augmented) reachability game (G, ♦T), i.e., it returns the attractor set A := attr_i(G, T) ⊆ V and an attractor strategy σ_A of Player i. Intuitively, A collects all vertices from which Player i has a strategy (i.e., σ_A) to force every play starting in A to visit T in a finite number of steps. Moreover, the function SOLVE(G, φ) returns the winning region and a winning strategy in a game (G, φ) with φ = ♦A ∨ □S for some A, S ⊆ V. Both functions ATTR and SOLVE solve classical synthesis problems with standard algorithms (see, e.g., [42]). For the sake of a complete proof, we note that SOLVE can be implemented using the following remark.
Remark 2. Given a game G = (G = (V, E), φ) where φ = ♦A ∨ □S for some A, S ⊆ V, one can reduce the game to a smaller safety game (G′, φ′ = □S′), where S′ = S ∪ {v_A} and G′ is the game graph obtained from G by merging all vertices in A into a single new sink vertex v_A, i.e., all incoming edges to A are retained but v_A has only one outgoing edge, namely (v_A, v_A). In such a game, the winning region is V \ attr_1(G′, V \ S′); see [42].
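Remark 2 can be implemented in a few lines. The sketch below (hypothetical function names) merges the reach set into a sink and then removes the Player 1 attractor of the unsafe vertices, exactly as described:

```python
def attr1(V, V0, E, T):
    """Player 1 attractor of T, restricted to vertex set V: Player 1
    vertices need one successor in the attractor, Player 0 vertices all."""
    A = set(T) & set(V)
    changed = True
    while changed:
        changed = False
        for v in set(V) - A:
            succ = {t for (s, t) in E if s == v and t in V}
            if (v not in V0 and succ & A) or (v in V0 and succ and succ <= A):
                A.add(v)
                changed = True
    return A

def solve_reach_or_safe(V, V0, E, A, S):
    """Winning region of (G, reach A or always stay in S) via Remark 2:
    merge A into a winning sink v_A, then complement attr_1(unsafe)."""
    sink = ('v_A',)
    V2 = (set(V) - set(A)) | {sink}
    S2 = (set(S) & V2) | {sink}
    E2 = {(sink if s in A else s, sink if t in A else t) for (s, t) in E}
    E2.add((sink, sink))
    bad = attr1(V2, V0, E2, V2 - S2)
    return (V2 - bad - {sink}) | set(A)
```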
With this, we can prove the correctness of Algorithm 1.

Theorem 2. Algorithm 1 returns the winning region and a winning strategy of Player 0 in the augmented reachability game G.

Proof. Let V_win be the winning region in the augmented game G. Using induction on the number of times SOLVEREACH(•) is called, we show that the set returned by the algorithm is indeed V_win, and that the updated strategy σ returned by the algorithm is a winning strategy in G.
Base case: Suppose SOLVEREACH(•) is never called, i.e., the algorithm returns (A, σ) in line 13. Hence, we need to show that A = V_win.
First, let us show that A ⊆ V_win. By the definition of the attractor function ATTR_0(G, T), every σ_A-play from A eventually visits T, and hence satisfies φ (which is stronger than ψ_PERS(Λ) ⇒ φ). Therefore, every vertex in A is trivially winning in G, and hence A ⊆ V_win. Now, for the other direction, suppose v is a vertex such that v ∉ A. It is enough to show that v ∉ V_win. As v ∉ A = attr_0(G, T), Player 0 cannot force the plays to visit T. If v ∉ S for every (S, C, T) ∈ Λ, then the persistent group-liveness constraints are not relevant for vertex v. Now, suppose v ∈ S for some (S, C, T) ∈ Λ. As the algorithm did not reach line 12, for every persistent live-group, one of the conditional statements, the one in line 5 or the one in line 8, is not satisfied. If the statement in line 5 is not satisfied, i.e., (S \ A) ∩ pre(A) = ∅, then there is no edge from S \ A to A, and hence this persistent live-group constraint does not help in reaching A from V \ A anyway.
Next, if the statement in line 8 is not satisfied, then it holds that B ⊆ A. Hence, v ∉ B. As B is the winning region for the game (G|_C, φ_B) and such a game is determined [42], Player 1 has a strategy σ_1 such that every σ_1-play in this game starting from v satisfies ¬φ_B = □¬A ∧ ♦(T ∪ (V \ S)). Therefore, every σ_1-play trivially satisfies ψ_PERS(S, C, T) without ever reaching A. Hence, if Player 1 sticks to strategy σ_1, Player 0 cannot make the plays from v visit A ⊇ T using this constraint. Therefore, in any case, Player 0 has no strategy that can enforce a play from v to satisfy ψ_PERS(Λ) ⇒ ♦T. Hence, v ∉ V_win. Now, let us show that the returned strategy σ is indeed a winning strategy in G. As σ_A is the attractor strategy to reach T (line 3), it is easy to verify that every σ-play starting from A \ T eventually visits T, and hence satisfies φ. Therefore, every σ-play from A is winning.
Induction case: Suppose the algorithm returned (C, σ) in line 12 for some (S, C, T) ∈ Λ. By the induction hypothesis, C is the winning region and σ_C is a winning strategy in the augmented game (G, ♦(A ∪ B), Λ). First, let us show that V_win ⊆ C. By the definition of the attractor set attr_0(G, •), it is easy to see that T ⊆ A. So every play in G satisfies ♦T ⇒ ♦(A ∪ B). Therefore, a winning play in the augmented game (G, ♦T, Λ) is also winning in the augmented game (G, ♦(A ∪ B), Λ). Therefore, V_win ⊆ C. Now, for the other direction, let us first show that B ⊆ V_win. As σ_B is a winning strategy in the game G_B, every σ_B-play ρ starting in B satisfies φ_B. By definition of φ_B, either ρ satisfies ♦A or it satisfies □(S \ T). Furthermore, as ρ is a play in G|_C, in the latter case it also satisfies □(S ∧ ψ_CONT(C)). Hence, if ρ satisfies ψ_PERS(S, C, T), then it also satisfies ♦T. Therefore, ρ cannot satisfy both ψ_PERS(S, C, T) and □(S \ T). As a consequence, ρ satisfies ψ_PERS(S, C, T) ⇒ ♦A. Furthermore, as we know, A ⊆ V_win. Therefore, ρ satisfies ♦A ⇒ ♦V_win, and hence satisfies ψ_PERS(S, C, T) ⇒ ♦V_win. So every σ_B-play starting in B satisfies ψ_PERS(Λ) ⇒ ♦V_win. Then one can construct a Player 0 strategy σ_0 (i.e., one that uses σ_B until the play reaches the winning region V_win of game G, and then switches to a winning strategy of game G) such that every σ_0-play starting in B satisfies ψ_PERS(Λ) ⇒ ♦T. Therefore, B ⊆ V_win. Now, let us show the other direction for the induction case, i.e., C ⊆ V_win. As B ⊆ V_win and A ⊆ V_win, as proven by the arguments given in the base case, it holds that A ∪ B ⊆ V_win. So every play in G satisfies ♦(A ∪ B) ⇒ ♦V_win. Furthermore, as σ_C is a winning strategy in the game G_C, every σ_C-play starting in C satisfies ψ_PERS(Λ) ⇒ ♦(A ∪ B), and hence satisfies ψ_PERS(Λ) ⇒ ♦V_win. Then, as in the last paragraph, one can construct a Player 0 strategy σ_0 (i.e., one that uses σ_C until the play reaches the winning region V_win of game G, and then
switches to a winning strategy of game G) such that every σ 0 -play starting in C satisfies the following Hence, every σ 0 -play starting in C satisfies ψ PERS (Λ) ⇒ ♦T .Therefore, C ⊆ V win .Now, let us show that the returned strategy σ in Algorithm 1 is also a winning strategy in game G.As σ is follows strategy σ C for vertices in C \ (A ∪ B), every σ-play from C \ (A ∪ B) eventually visits A ∪ B when ψ PERS (Λ) holds.Now, let σ M be the updated strategy until line 9.Then, from line 3,9, it is easy to see that σ(v) = σ M (v) for every vertex v in A ∪ B. As σ B is a winning strategy in game G B , using line 9 and the discussion above, every σ-play from B \ A eventually visits A when ψ PERS (Λ) holds.Then, using arguments of base case, every σ-play from A \ T eventually visits T .Therefore, in total, as σ is a strategy, every σ-play from C eventually visits T when ψ PERS (Λ) holds.Hence, σ is indeed a winning strategy in game G.

B. Augmented Parity Games
Zielonka's algorithm [38] solves classical parity games by recursively using the attractor functions ATTR_0(G, T) and ATTR_1(G, T). The only difference between the attractor function ATTR_0(G, T) and our new function SOLVEREACH(G, T, Λ) from Algorithm 1 is the use of augmented live groups to solve reachability games. To solve an augmented parity game (G, φ, Λ), one can therefore simply replace every use of ATTR_0(G, T) with SOLVEREACH(G, T, Λ) within Zielonka's algorithm. Due to Theorem 2, the resulting algorithm correctly solves augmented parity games and returns a winning strategy, as summarized in the following corollary.
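To make the replacement concrete, the following is a minimal sketch (not the paper's implementation) of Zielonka's recursion with the reachability subroutine passed as a parameter. The interface reach(V, E, owner, T, player) and all data-structure choices are our own illustrative assumptions; the paper substitutes SOLVEREACH only for the Player-0 attractor.

```python
def attractor(V, E, owner, T, player):
    """Classical player-`player` attractor ATTR_player(G, T): all vertices
    from which `player` can force a visit to T."""
    A = set(T)
    changed = True
    while changed:
        changed = False
        for v in V:
            if v in A:
                continue
            succ = E[v]
            if (owner[v] == player and any(u in A for u in succ)) or \
               (owner[v] != player and succ and all(u in A for u in succ)):
                A.add(v)
                changed = True
    return A

def zielonka(V, E, owner, priority, reach=attractor):
    """Zielonka's recursive parity-game solver; returns (win0, win1).
    `reach` is the pluggable reachability subroutine; it must, like an
    attractor, return a set whose complement contains no dead-ends."""
    if not V:
        return set(), set()
    d = max(priority[v] for v in V)
    p = d % 2                                # player favored by priority d
    Z = {v for v in V if priority[v] == d}
    A = reach(V, E, owner, Z, p)
    Vs = V - A
    Es = {v: [u for u in E[v] if u in Vs] for v in Vs}
    W = zielonka(Vs, Es, owner, priority, reach)
    if not W[1 - p]:                         # player p wins everywhere
        win_p = A | W[p]
        return (win_p, set()) if p == 0 else (set(), win_p)
    B = reach(V, E, owner, W[1 - p], 1 - p)  # opponent's escape region
    Vb = V - B
    Eb = {v: [u for u in E[v] if u in Vb] for v in Vb}
    W2 = zielonka(Vb, Eb, owner, priority, reach)
    return (W2[0], W2[1] | B) if p == 0 else (W2[0] | B, W2[1])
```

Replacing `attractor` by an augmented-live-group reachability solver with the same interface yields the algorithm described above.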

VI. SYNTHESIS DETAILS: LOW-LEVEL
This section presents an efficient and flexible numerical method to design CLFs, which can then be used to design feedback-control policies via Lemma 1. We show that the resulting closed loop admits solutions from every feasible initial point, and we discuss boundedness of solutions.

A. Synthesis of Control Policies from cRWAs
It is well known that the problem of synthesizing CLFs (in the sense of Section IV-B) for general nonlinear control systems (as in Definition 1) over a generic state space X ⊆ R^{nx}, solving a generic cRWA problem Ω = (κ, R, A), is numerically intractable [39]. For this reason, particular characteristics of the system and its dynamics need to be exploited for tractability. In this section, we therefore restrict the discussion to systems with affine dynamics, as mature computational solutions exist for this class of systems. In particular, we present a novel approach to controller synthesis for cRWA problems over affine dynamical systems by means of semidefinite optimization, considering a class of quadratic control Lyapunov functions.
While this only provides a construction of the top-down interface of Section IV-B for affine dynamical systems, we note that our overall hybrid controller synthesis approach, discussed in Section IV and summarized in Fig. 2, can be applied to any dynamical system for which the generated cRWA problem can be solved. In particular, recent optimization-based approaches for enforcing logical constraints on more general nonlinear systems (see, e.g., [21], [19], [31]) can be utilized. We leave the integration of these methods into our synthesis framework for future work.
Assumption 1. The control system S = (X, U, f) has affine dynamics of the form f(x, u) = Ax + Bu + g, for some A ∈ R^{nx×nx}, B ∈ R^{nx×nu} and g ∈ R^{nx}. Moreover, we suppose that the input space is a convex polytope, i.e., U = H(p_U, H_U) := {u ∈ R^{nu} : H_U^⊤(u − p_U) ≤ 𝟙}, for some p_U and H_U of appropriate dimensions, where 𝟙 denotes the all-ones vector.
In addition, we restrict the shape of the state-space regions linked to the state propositions AP_S.

Assumption 2. Given a state proposition T ∈ AP_S, its corresponding state-space region is either an ellipsoid of the type E(q, S) = {x ∈ R^{nx} : (x − q)^⊤ S(x − q) ≤ 1} or a convex polytope H(p, H) = {x ∈ R^{nx} : H^⊤(x − p) ≤ 𝟙}, where S ∈ R^{nx×nx} is a symmetric positive semidefinite matrix, q, p ∈ R^{nx} are vectors, and H ∈ R^{nx×m}.

Under these assumptions, instead of searching for control Lyapunov functions over the whole set of C¹ functions, we restrict our search to quadratic functions of the form w(x) = (x − x_c)^⊤ P (x − x_c), (19) where x_c ∈ X is the center of w and P ∈ R^{nx×nx}, P ≻ 0.
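As a concrete illustration of the set representations in Assumption 2 and the quadratic candidate (19), the following sketch (helper names are our own, not from the paper) tests membership in E(q, S) and H(p, H) and evaluates w:

```python
import numpy as np

def in_ellipsoid(x, q, S):
    """Membership test for E(q, S) = {x : (x - q)^T S (x - q) <= 1}."""
    d = x - q
    return float(d @ S @ d) <= 1.0

def in_polytope(x, p, H):
    """Membership test for H(p, H) = {x : H^T (x - p) <= 1 componentwise},
    where H has one column per facet."""
    return bool(np.all(H.T @ (x - p) <= 1.0))

def quadratic_clf(x, x_c, P):
    """Candidate CLF w(x) = (x - x_c)^T P (x - x_c), P positive definite."""
    d = x - x_c
    return float(d @ P @ d)
```

For instance, with S = P = I and q = x_c = 0, E(q, S) is the unit ball and w is the squared Euclidean norm; a box [−1, 1]² is obtained by taking the columns of H as ±e_i.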
Inspired by the results in [20], we present a method to design a CLF w(x) of the form (19) associated with a cRWA problem Ω = (κ, R, A) (as in Definition 6) in three steps: (A) find x_c such that R ⊂ L(x_c) and A ∩ L(x_c) = ∅; (B) find a safe set S ⊆ X such that x_c ∈ S and A ∩ L(x) = ∅ for all x ∈ S; (C) construct a CLF w such that its basin of attraction is safe, i.e., X_w ⊆ S.
These steps must be performed with awareness of the context κ and the changes that it causes in the continuous state space.First, Item (A) is a necessary condition for the existence of a CLF that generates a feasible controller for Ω.However, given that the set difference between the convex regions where R and A hold is potentially non-convex, checking whether such x c exists is a very difficult problem.To avoid resorting to global optimization strategies such as branch-and-bound algorithms, we introduce another assumption.
Assumption 3. Given a cRWA problem Ω = (κ, R, A), for all x ∈ X such that R ⊂ L(x), we have x ∉ E_A for every ellipsoidal region E_A ⊆ X associated with a proposition in A.
Assumption 3 requires that any ellipsoidal set that is to be avoided in Ω does not intersect the region associated with R, i.e., the region to be reached. In practice, if this is not the case, one can replace ellipsoidal obstacles by polytopic over-approximations.

Lemma 8. A point x_c satisfying Item (A) exists if the following optimization problem is feasible: where E_R and P_R are, respectively, the sets of ellipsoids and polytopes associated with propositions in R, while P_A is the set of polytopic sets associated with propositions in A.
Proof. Applying the Schur Complement Lemma [43, p. 7], (21) becomes exactly the definition of an ellipsoid E(q_r, S_r). Condition (23) ensures that A ∩ L(x_c) = ∅. Finally, (24) enforces that x_c is a stationary point of the system under a constant input u_c. This last condition can be handled directly by semidefinite programs whenever U is also a polytope, i.e., U = H(p_U, H_U).
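For the affine dynamics of Assumption 1, the stationarity condition enforced by (24) reduces to the linear equation A x_c + B u_c + g = 0. As a minimal illustration (a plain least-squares solve, not the semidefinite program of Lemma 8, and ignoring the region constraints (21)–(23)), one can compute a candidate pair (x_c, u_c) as follows:

```python
import numpy as np

def stationary_pair(A, B, g):
    """Solve A x_c + B u_c + g = 0 in the least-squares sense:
    returns (x_c, u_c) and the residual norm, which is zero whenever an
    exact equilibrium pair exists."""
    n, m = A.shape[0], B.shape[1]
    M = np.hstack([A, B])                       # [A  B] [x_c; u_c] = -g
    z, *_ = np.linalg.lstsq(M, -g, rcond=None)  # minimum-norm solution
    x_c, u_c = z[:n], z[n:]
    res = float(np.linalg.norm(A @ x_c + B @ u_c + g))
    return x_c, u_c, res
```

In the actual synthesis, this linear condition is imposed jointly with the ellipsoid and polytope constraints inside one semidefinite program.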
To find a safe set S as required in Item (B), we search for the largest ellipsoid E(x_c, P_S) centered at x_c and shaped by P_S ∈ R^{nx×nx},
where ρ_i = x_c^⊤ P_S x_c + β_i q_a^⊤ P_a q_a − 1 − β_i, α(h) = (1 + h^⊤(p_a − x_c))², and cols(H_a) denotes the set of column vectors of H_a.
Proof. Note that (26) is an application of the S-procedure [43, p. 23], ensuring that x ∉ E(q_a, P_a) for all x ∈ E(x_c, P_S). On the other hand, (27) ensures that all polytopes in P_A have at least one hyperplane on their boundaries that separates them from the safe set S. Indeed, we can prove the following statement: for a given polytope H(p, H) and ellipsoid E(q, S), if there is h ∈ cols(H) such that (1 + h^⊤(p − q))² S ≻ hh^⊤, then H(p, H) ∩ E(q, S) = ∅. Indeed, since H(p, H) and E(q, S) are convex sets, the intersection H(p, H) ∩ E(q, S) is empty if there exists one column h ∈ R^{nx} of H such that h^⊤(x − p) > 1 for all x ∈ E(q, S). (28) This inequality defines a separating hyperplane between E(q, S) and H(p, H), since h^⊤(x − p) ≤ 1 for all x ∈ H(p, H), by definition. Since q ∈ E(q, S), we have h^⊤(q − p) > 1, and we can rewrite (28) as (1 + h^⊤(p − q))^{−1} h^⊤(x − q) < 1, for all x ∈ E(q, S). Also, since q ∈ R^{nx} is the center of E(q, S), this ellipsoid is also contained in the half-space defined by (1 + h^⊤(p − q))^{−1} h^⊤(x − q) > −1, for all x ∈ E(q, S). Thus (28) is equivalent to |(1 + h^⊤(p − q))^{−1} h^⊤(x − q)| < 1, for all x ∈ E(q, S). This, by definition, holds if and only if (1 + h^⊤(p − q))² S ≻ hh^⊤, concluding the proof.
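The matrix condition (1 + h^⊤(p − q))² S ≻ hh^⊤ from the proof above is easy to check numerically. A small sketch (an illustrative helper of our own, assuming h is the facing column with h^⊤(q − p) > 1 as in the proof):

```python
import numpy as np

def separates(h, p, q, S):
    """Check the sufficient disjointness condition from the proof of
    Lemma 9: (1 + h^T (p - q))^2 S - h h^T positive definite, which
    certifies H(p, H) and E(q, S) are disjoint via column h."""
    c = (1.0 + h @ (p - q)) ** 2
    M = c * S - np.outer(h, h)
    return bool(np.all(np.linalg.eigvalsh(M) > 0))
```

For example, with the unit ball E(0, I) and the half-space x_1 ≥ 3 (column h = −e_1, p = (4, 0)), the condition holds; moving the polytope to p = (1.5, 0), so that it overlaps the ball, the condition fails.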
Finally, having the safe set S = E(x_c, P_S) fully determined, we can proceed with constructing the CLF and extracting a feedback control policy from it, as required by Item (C). We summarize our sufficient conditions in the following statement.
Lemma 10. Suppose that the following semidefinite program, for a given decay rate ρ > 0, is feasible: Then, defining P = Z^{−1} and K = Y P, for the CLF defined by w(x) := (x − x_c)^⊤ P (x − x_c) and the surrogate controller u(x) := K(x − x_c) + u_0, it holds that 1) u(x) ∈ U for all x ∈ X_w, and 2) ⟨∇w(x), f(x, u(x))⟩ ≤ −ρ w(x) for all x ∈ X_w. In particular, the function w satisfies the conditions in Item (C).
Proof. First, (30) ensures safety since inverting both sides of the inequality implies that X_w(1) = E(x_c, P) ⊂ S. Then (31) ensures the descent condition (4). Condition (32) implies that u(x) ∈ U = H(p_U, H_U) for all x ∈ X_w(1). To show this, take any h_U ∈ cols(H_U), multiply the first row and column of the matrix in (32) by P, and apply the Schur Complement Lemma. The result is an equivalent matrix inequality. Multiplying it on the right by (x − x_c) and on the left by (x − x_c)^⊤, while using the assumption that x ∈ X_w(1) = E(x_c, P), yields h_U^⊤(u(x) − p_U) ≤ 1. By definition, this inequality being fulfilled for all h_U ∈ cols(H_U) and all x ∈ X_w(1) means exactly that u(x) ∈ U, concluding the proof.
Putting Lemmas 8, 9 and 10 together, it can be seen that the controller u(x) constructed in Lemma 10 is a feedback control policy satisfying Lemma 1, and hence also Proposition 2.
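For intuition on Item (C), the following simplified stand-in (not the SDP of Lemma 10, and ignoring the input and safety constraints (30) and (32)) obtains P for a hand-picked stabilizing gain K from the continuous-time Lyapunov equation, so that w(x) = (x − x_c)^⊤ P (x − x_c) satisfies a descent condition along the closed loop:

```python
import numpy as np

def clf_from_gain(A, B, K, Q):
    """Given a stabilizing gain K for x' = Ax + Bu, return P > 0 solving
    (A + BK)^T P + P (A + BK) = -Q via vectorization (small systems only),
    so that the derivative of w along the closed loop equals -d^T Q d."""
    Acl = A + B @ K
    n = A.shape[0]
    # vec((Acl^T) P + P Acl) = (Acl^T (x) I + I (x) Acl^T) vec(P)
    M = np.kron(Acl.T, np.eye(n)) + np.kron(np.eye(n), Acl.T)
    P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off
```

Along the closed loop, ⟨∇w(x), f(x, u(x))⟩ = 2 d^⊤ P A_cl d = −d^⊤ Q d with d = x − x_c, so choosing Q ⪰ ρP recovers a decay rate as in Lemma 10, Item 2); the SDP additionally co-designs K and enforces the set constraints.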
Having provided all details of the synthesis of a hybrid controller solving Problem 1, we now discuss two additional issues concerning the correctness of this controller which are not captured by Proposition 3.

B. Existence of Solutions
In our statement of Problem 1, and in the control technique formalized and summarized in Theorem 1, we state that any (trace of a) solution of the closed-loop system (17) satisfies the considered LTL specification. However, we did not provide a well-posedness result establishing existence of solutions of (17) for any initial condition and any external logical perturbation. Indeed, it is known that closed-loop feedback systems with state-dependent piecewise-defined control inputs may exhibit pathological behaviors, such as chattering and sliding modes [44], [45], [46].
In what follows, we thus prove the existence of solutions, in the case studied in Section VI-A.
Proposition 4. Consider a control system S = (X, U, f) with labelling function L, an LTL specification φ over the predicates AP_S ∪ AP_O, the final game G_F and a winning strategy σ_F : V_0^F → V_1^F. Suppose that Assumptions 1, 2 and 3 hold, and that the set of required CLFs W is built following the procedure introduced in Subsection VI-A. Then, for every x ∈ X_win, there exists a solution ξ_{x,p,Υ} : R_+ → X of (17) starting at x, in the sense of Definition 3.
Proof. First, we recall that, by Assumptions 2 and 3 and by construction, any state proposition in AP_S^+ is associated with a compact (ellipsoidal or polyhedral) subset of X. Under Assumption 1, the closed loop (17) can be compactly rewritten as ẋ = Ax + B(K_w(x − x_{c_w}) + u_{0_w}) + g, for all x ∈ R^{nx} and all C_w ∈ AP_C, for some K_w, x_{c_w} and u_{0_w} of appropriate dimensions (recall Lemma 10). Thus, the time-varying vector field G : R_+ × X → R^{nx} is discontinuous in t and, recalling Definition 12, the discontinuity points are contained in the sequence of discontinuity points of L^+(ξ_{x,p,Υ}(·)) ∪ Υ(·). We have to show that this sequence has no accumulation point, thus ruling out the so-called Zeno phenomenon, see [45]. Since Υ ∈ D is piecewise constant by assumption, we have to check the behavior of the discontinuities of L^+(ξ_{x,p,Υ}(·)) for a fixed context κ ⊆ AP_O. By construction, these discontinuities can occur only if ξ_{x,p,Υ}(·) lies on the boundaries of the regions of attraction of the CLFs w ∈ W associated with a cRWA with context κ, i.e., the CLFs that can be activated at that instant of time. On the boundaries of these regions of attraction, the vector field G satisfies a transversality condition n(x)^⊤ G(t, x) < 0, where n(x) is the normal vector to the ellipsoid X_w at x, i.e., the vector field is "pointing inward" the set X_w. This follows by Item 2) in Lemma 10. This fact, also called the patchy vector field property, is a sufficient condition to ensure existence of solutions (in the sense of Definition 3), as proven in [47, Proposition 3.1], to which we refer for the details. The completeness of solutions, i.e., the fact that any solution is well-defined on the whole positive real line R_+, follows from the fact that, as proven in Theorem 1 and by Definition 12, a winning play ρ always stays in V_win^F. This implies that ξ_{x,p,Υ}(t) belongs to X_win(V_win^F) for all t ∈ R_+, concluding the proof.
For a more detailed discussion regarding (properties of) solutions of discontinuous differential equations and hybrid systems, we refer to [44], [45], [46].

C. Preventing Instability
As discussed above, since the external environment can change at any instant of time, the closed-loop system (17) exhibits hybrid behavior. This may lead to undesired phenomena on infinite horizons, as we highlight in the following simple example.
Example 7. Consider a control system of the form S := (R^{nx}, U, f), two compact target sets T_1, T_2 ⊂ R^{nx} such that T_1 ∩ T_2 = ∅, and AP_S = {T_1, T_2}. We consider the following desired mode-target game specification (for an overview of mode-target games, see [48]): (33) where M_1, M_2 ∈ AP_O are the input atomic propositions representing the modes activated by the external environment. Suppose we have global CLFs w_1, w_2 : R^{nx} → R with respect to the targets T_1, T_2, in the sense of Definition 2, and consider continuous u_i : R^{nx} → R^{nu} satisfying (5) globally in R^{nx} \ X_w(c), for each i ∈ {1, 2}. This provides a winning strategy for the game arising from (33): we activate the feedback law u_i when the mode M_i is active. Now consider a disturbance function Υ : R_+ → AP_O modeling the environment behavior.
Then the resulting hybrid closed-loop system can be written as ẋ(t) = g(x(t), Υ(t)), (34) where g(x, M_i) := f(x, u_i(x)) for i ∈ {1, 2}. Systems of the form (34) are known as switched systems and have been intensively studied in recent years (see [49], [45] for an overview). It is well known that, even if the targets T_1, T_2 are asymptotically stable for the corresponding subsystems, the external disturbance Υ : R_+ → AP_O can produce unbounded solutions from some initial conditions x ∈ R^{nx}, which is undesirable in many contexts; see for example [49, Chapter 1].
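The destabilization mechanism of Example 7 can be reproduced numerically: the two matrices below are Hurwitz (each subsystem is asymptotically stable on its own), yet an adversarial switching signal makes the state norm grow. The matrices and the worst-case switching rule are our own toy choices, not taken from the paper:

```python
import numpy as np

# Two Hurwitz subsystems: for each, trace < 0 and det > 0.
A1 = np.array([[-0.1,  1.0], [-10.0, -0.1]])
A2 = np.array([[-0.1, 10.0], [ -1.0, -0.1]])

def worst_case_switch(x0, h=1e-3, steps=4000):
    """Forward-Euler simulation of the switched system (34) where, at every
    step, the 'environment' picks the mode maximizing the instantaneous
    growth of ||x||^2, i.e., an adversarial disturbance signal."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g1, g2 = A1 @ x, A2 @ x
        # d/dt ||x||^2 = 2 x^T g_i; pick the mode with the larger value.
        x = x + h * (g1 if x @ g1 >= x @ g2 else g2)
    return x
```

Under this switching law the trajectory spirals outward even though each frozen mode decays, illustrating why additional mechanisms (a logical "wall" or a dwell-time assumption, discussed below) are needed for boundedness.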
There are many possible approaches to overcome the instability problem discussed in Example 7. Here, we informally highlight two of them.
First, consider a control system S = (X, U, f) and an LTL specification φ over AP_S ∪ AP_O. Suppose that the problem is global, i.e., X = R^{nx}. Consider a large enough compact set C ⊂ R^{nx} such that every region associated with a proposition in AP_S is contained in int(C). Consider its boundary ∂C, add ∂C to AP_S (intuitively, a large enough "wall"), and consider a "new" specification φ′ defined by φ′ = φ ∧ □¬∂C. Thus, paying the price of a more "convoluted" specification, we force, at the logical level, the solutions of S to stay in the compact set C.
Second, suppose that the environment, while being unpredictable, satisfies some assumptions on the frequency of its decisions. More formally, suppose there exists a dwell time τ > 0 such that, if t ∈ R_+ is a discontinuity point of the disturbance function Υ (i.e., an instant at which the external environment changes), then Υ(s) = Υ(t) for all s ∈ [t, t + τ). It is well known that, if all the subsystems are asymptotically stable, a large enough dwell time ensures boundedness of the solutions of the switched system (34). The technical details are not reported here; we refer to [49, Section 3.2].
While the above-mentioned approaches can provide a simple stability guarantee for the hybrid closed-loop system arising from our design method, we point out that the formal study of stability/instability phenomena induced by LTL-based control is a largely open direction for future research.

VII. EXPERIMENTAL RESULTS
In this section, we demonstrate the proposed techniques on an example. We consider the mode-target based example introduced in Section I-A in a 2-D space. The state space is constrained to the box [0, 10] × [0, 10], and the three target regions T_1, T_2 and T_3 are ellipsoidal balls of radius 0.2 located at coordinates (3, 4), (3, 6) and (5, 5), respectively. The sliding door is a vertical line from (4, 0) to (4, 10). The considered dynamical model for the motion of the robot is of the form introduced in Assumption 1, with a 2-dimensional input space.
We used our proposed techniques to solve Problem 1 for this example. All computations were done on a MacBook Pro (2.5 GHz, 16 GB RAM). We started by constructing the initial game G_I from the specification φ given in Example 1; G_I has 51 vertices and 182 edges and was constructed in 0.042 seconds. We then computed a strategy template for the initial game and translated it into several reach-while-avoid problems, which took 0.007 seconds. Next, we constructed the control game graph G_C, with 159 vertices and 1704 edges, in 6.13 seconds, and then the final augmented game G_F, with 826 vertices and 17604 edges, in 0.652 seconds. Finally, we solved the final game to compute a winning strategy in 112.495 seconds, which is used as a hybrid controller in the state space. In total, our algorithm took 120 seconds to solve Problem 1 for this example.
Furthermore, we also conducted a simulation⁷ of this example using the hybrid controller computed by our algorithm. A screenshot from the simulation video at 16.30 s is shown in Fig. 8. The left part of the figure depicts the continuous state space, with the three targets, i.e., T_1 as a red dot (blurred), T_2 as a green dot (blurred) and T_3 as a blue dot, the robot as a black dot in motion, and two basins of attraction for each target, represented by the ellipsoids around it. The smaller ellipsoids, i.e., the green, red and blue ones around T_2, T_1 and T_3, respectively, are the basins of attraction for the corresponding targets when the door is closed, whereas the bigger gray ones are the basins of attraction when the door is open. This left part also shows the current state of the system: the highlighted blue target T_3 indicates that mode M_3 is currently active, the thick black line in the middle indicates that the door is closed, and the movement of the black dot from the location of T_2 towards T_1 indicates that the robot is currently moving from target T_2 to T_1. The upper-right part of the figure shows the current state of the play in the final augmented game. Currently, the play is looping between vertex 25 and vertex 144. The label of the edge from the environment player's vertex (i.e., vertex 25) indicates that the robot is currently inside the intersection of the basins of attraction X_1 and X_2, the door is closed, and mode M_3 is active. The label of the edge from the controller player's vertex (i.e., vertex 144) indicates that the control policy associated with C_1 is currently being applied persistently. Intuitively, as mode M_3 is active, the robot needs to reach target T_3, and since the door is closed, the robot first needs to visit target T_1 in order to open the door. Specifically, in the video, the
trajectory from 16.00 s to 17.00 s, during which the mode M_3 remains consistently active, can be described as follows: initially, at 16.00 s, the robot is positioned at target T_2 with the door closed. Subsequently, the robot moves towards target T_1, as depicted in the screenshot shown in Fig. 8. At 16.60 s, the robot reaches T_1, causing the door to open. Following that, the robot proceeds towards target T_3 and successfully arrives at the target by 17.00 s.
Returning to Fig. 8, the lower-right part of the figure presents the time-responses of the two components of the control input, namely u 1 and u 2 , which emerge from the hybrid feedback control policy defined in Subsection IV-E.

VIII. CONCLUSION
In this paper, we proposed a method to synthesize feedback controllers for continuous-time systems in order to fulfill general LTL specifications. We presented our main algorithm, which, at the logical level, rewrites the general problem in the form of an augmented parity game. To perform our method efficiently, a new solving algorithm for augmented games was proposed. At the continuous state-space level, the winning strategy is implemented via a control Lyapunov function approach, which provides a natural and flexible feedback design for a large class of dynamical systems.
We believe that our work paves the way towards a new generation of symbolic controllers for which formal guarantees are still available, thanks to rigorous techniques at both the logical and the dynamical level, but with satisfactory scalability, because the (time- and space-) discretizations are computed endogenously, in an event-triggered philosophy. As further directions of research, we plan to extend our approach to more general logical/dynamical system settings and to formally investigate and improve both the numerical complexity and the theoretical conservatism of the proposed methods. In particular, we believe that our framework is well suited to an iterative, or active-learning, approach, where the solution, and the bottlenecks, at the logical level may be used as information to guide the low-level design, and vice versa.

Fig. 1: Motivating example: a robot must navigate to and remain at targets T_1, T_2 or T_3 as directed by an external environment, which imposes the respective modes M_1, M_2 and M_3, while avoiding any collision with the walls W and with the door D (if it is closed).

Fig. 2: Flowchart illustrating the overall algorithm given in Section IV. Nodes 0 and 1 are the inputs and node 11 is the output of our synthesis method. High-level and low-level synthesis steps are colored in dark and light grey, respectively, and discussed in the sections indicated at the arrows.

Fig. 3: Illustration of a part of the initial parity game for the motivating example, with Player 1 vertices (squares) and Player 0 vertices (circles) containing their priority in a black circle. A winning strategy template consists of unsafe edges, indicated by red dotted arrows, and co-live edges, indicated by blue dashed arrows.

Fig. 6: The corresponding control game graph (without Player 0 dead-ends) for the basins of attraction in Fig. 4.
Time complexity: Let k be the number of times SOLVEREACH(·) is called. If T = V, then A = V, and hence S \ A = ∅ for every (S, C, T) ∈ Λ, so SOLVEREACH(·) will never be called. Furthermore, if T ≠ V, then, by the definition of attr_0(G, ·), it holds that T ⊊ A. So, in line 5, at least one vertex is added to the target for the next call of SOLVEREACH(·). Hence, k can be at most |V|. Moreover, in each iteration, we might need to solve the game (G|_C, φ_B) for each (S, C, T) ∈ Λ; and, using Remark 2, solving such a game can be reduced to computing the attractor attr_1(G, ·). As computing such an attractor takes O(|E|) time [42], the algorithm takes O(|Λ| · |V| · |E|) time in total.
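The O(|E|) attractor computation referenced above is commonly implemented with per-vertex successor counters; the following is a sketch under assumed data structures (vertex sets, duplicate-free successor lists, an owner map):

```python
from collections import deque

def attr(V, E, owner, T, player):
    """Player-`player` attractor computed in O(|E|) with successor
    counters: an opponent vertex enters the attractor only after all of
    its successors have entered; each edge is inspected at most once."""
    pred = {v: [] for v in V}
    count = {}
    for v in V:
        count[v] = len(E[v])          # successors of v still outside A
        for u in E[v]:
            pred[u].append(v)
    A = set(T)
    queue = deque(T)
    while queue:
        u = queue.popleft()
        for v in pred[u]:
            if v in A:
                continue
            if owner[v] == player:    # one good successor suffices
                A.add(v)
                queue.append(v)
            else:                     # opponent: all successors needed
                count[v] -= 1
                if count[v] == 0:
                    A.add(v)
                    queue.append(v)
    return A
```

Each vertex is dequeued at most once and each edge decrements a counter at most once, which gives the linear bound used in the complexity analysis above.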

Fig. 8: A screenshot from the simulation video.

Fig. 4: X_a (region enclosed by the red dotted line) and X_e (region enclosed by the blue dashed line) illustrate possible basins of attraction for the CLFs implementing the cRWAs Ω_a(d, e) (ensuring to reach T_1 while avoiding only the walls) and Ω_e(d, e) (ensuring to reach T_1 while avoiding the walls and T_2), respectively, from Example 2.