Continuous-Time Behavior Trees as Discontinuous Dynamical Systems

Behavior trees represent a hierarchical and modular way of combining several low-level control policies into a high-level task-switching policy. Hybrid dynamical systems can also be seen in terms of task switching between different policies, and therefore several comparisons between behavior trees and hybrid dynamical systems have been made, but only informally, and only in discrete time. A formal continuous-time formulation of behavior trees has been lacking. Additionally, convergence analyses of specific classes of behavior tree designs have been made, but not for general designs. In this letter, we provide the first continuous-time formulation of behavior trees, show that they can be seen as discontinuous dynamical systems (a subclass of hybrid dynamical systems), which enables the application of existence and uniqueness results to behavior trees, and finally, provide sufficient conditions under which such systems will converge to a desired region of the state space for general designs. With these results, a large body of results on continuous-time dynamical systems can be brought to use when designing behavior tree controllers.


I. INTRODUCTION
Behavior trees (BTs) are a way to combine a set of controllers (policies) into higher-level controllers in a hierarchical and modular way. In this paper, we give the first continuous-time representation of BTs and provide sufficient conditions for convergence of general BTs.
Modularity is a key tool to handle complexity in software systems, as it enables different components to be developed and tested individually, and BTs have been shown to be optimally modular in comparison to other decision structures [1]. Hierarchical modularity, where each module may contain submodules, is also beneficial since a single level of modules in a large system either leads to very large and complex modules, or a very large number of smaller modules. Additionally, a hierarchical structure is more natural in many applications, as many tasks can be divided into subtasks in a hierarchical way, such as when a robot has to fetch an object, which might include subtasks such as navigation, door opening, object grasping, and so on.
Improved modularity is the reason that BTs were conceived in the first place [2] as an equally expressive [3] alternative to finite-state machines (FSMs) in the design of non-player characters in video games. In this virtual setting, the world is predictable by design and many low-level policies can be developed with relative ease. Thus, game developers started to put together large sets of low-level policies earlier than robot developers and therefore had a stronger need for modular tools. However, the interest in BTs from the robotics community has increased over time and they are now used in both open-source middleware, such as the Robot Operating System (ROS), and innovative industry software from Boston Dynamics and Nvidia.
Even though there is an increasing interest in BTs from the robotics and AI communities (see the recent survey in [4] with over 180 papers), there is still no continuous-time formulation available. The need for such a formulation is clear from the fact that almost all major branches of control theory, from linear systems to optimal control, have been developed for both continuous-time and discrete-time systems, but BTs have so far only had a discrete-time formulation. With the proposed continuous-time model, continuous-time control theory results, such as sliding mode control, can now be used to analyze BT designs. To date, the only efforts towards continuous-time models have been informal comparisons of BTs and hybrid dynamical systems (HDS) that consider discrete-time BTs and discrete-time HDS, different ways of doing event-based ticking, or letting the tick frequency go to infinity [5]-[8].
A key topic in control theory is stability and convergence to a particular equilibrium point, or region of the state space. For a BT, this translates to reaching the so-called success region, a state where the BT returns success. Important results on sufficient conditions for convergence to the success region have been presented in [9], [10], but in both cases the analysis was limited to a particular subclass of BTs. In this letter we propose sufficient conditions that can be used to analyze any BT design.
The main contributions of this letter are as follows. We provide the first formal formulation of BTs in continuous time (Definition 1). We show that the proposed formulation can be seen as a discontinuous dynamical system (DDS) (Theorem 2), with corresponding results regarding existence and uniqueness (Theorem 3). We provide sufficient conditions under which a BT execution will converge to a desired region of the state space (Theorem 4).
The organization of this letter is as follows. In Section II, we discuss how our contributions differ from those presented in related work. In Section III, we provide a brief overview of tools for analyzing ordered trees and results regarding DDSs. Then, in Section IV, we formulate continuous-time BTs and connect them to DDSs in Section V. In Section VI, we present a convergence proof, and in Section VII, we illustrate it with an example. Finally, in Section VIII, we state our conclusions.

II. RELATED WORK
In this section, we describe related work from a number of different aspects.
Continuous-time: In [6], a continuous-time BT is informally described as a discrete-time BT with an infinite tick rate, as a means to compare BTs to HDSs. In [8], instead of querying behaviors at a certain tick rate, behaviors run continuously and notify superior behaviors when their status changes. Our work addresses the same problems; however, it does so on the basis of a formal state-space definition of continuous-time BTs (Definition 1).
Hybrid dynamical systems: The first comparison of BTs to HDSs appears to have been made in [5]. Therein, it was described how BTs modularly represent HDSs and implicitly encode explicit state transitions through their tree structure. This discussion continued along the same lines in [7], and equivalence notions between discrete-time BTs and HDSs were presented in [6].
In these works, the interpretation of an HDS is such that a discrete state determines which behavior to use. However, as we will show, a BT is aptly described by a DDS [11], where the state's presence in certain regions solely determines which behavior is used. Thus, we go beyond related work by not only showing that BTs more closely correspond to DDSs [11], but also doing so formally (Theorem 2). As a result, we also address existence and uniqueness of solutions (Theorem 3).
Convergence analysis: It was shown in [7] that the composition of behaviors in Fallback BTs is similar to the idea of sequential composition [12]. Therein, sufficient conditions for convergence to a goal state were presented formally in terms of the attraction region of individual behaviors. These concepts were applied in [13] to guarantee BT performance in the presence of black-box controllers.
A version of BTs called Robust Logical-Dynamical Systems was proposed in [9], which uses an Implicit-Sequence BT structure like in [7]. Therein, they show convergence in the presence of uncontrolled behavior changes. Our work is related to all of the above in that we prove convergence in BTs (Theorem 4); however, our work is different in the sense that the results can be applied to general BT structures, not just special classes.
A concept of [12] not used in the above works is the "prepares graph", a directed graph of transitions induced by the composition of policies. In [12], this graph is used to construct a totally ordered subgraph of policies that lead to the goal state. This construction was extended in [14] to allow for multiple controllers in the subgraph to overlap in order to attain more flexibility in the presence of disturbances, thereby forming a partially ordered subgraph. We will use this notion of a prepares graph as a tool to prove the convergence of general BTs.

III. PRELIMINARIES
In this section we will first describe how two partial orders can be used for analyzing ordered trees, and then present some results on DDSs.

A. Ordered Trees
As we will see below, BTs are ordered trees, and as was discussed in [15], ordered trees can either be seen as graphs, as drawn in Fig. 1, or as a set of vertices with two partial orders, the so-called parent and sibling orders.
A directed graph is often defined as $G = (V, E)$, where $V$ is the set of vertices and $E \subset V^2$ is the set of edges. If the graph has no cycles and no two distinct paths from a starting vertex meet at the same ending vertex, it is called a tree; if one vertex is designated as the root, it is called rooted. Given a root, the usual concepts of parent/child can be applied to each edge, with the parent being closer to the root and the child further away. To create an ordering between siblings (children of the same parent), the vertices can be embedded in a plane (as drawn on paper) and the order given by clockwise or left/right positions. In Fig. 1, the root would be vertex 0, and its two children vertices 1 and 4 (in that order), and so on.
In this letter, we will use the graph model for BTs, but we will also make use of order theory for analyzing ordered trees, as described in [15]. As we will show, this formulation will support the analysis. We now use $(V, \le_S, \le_P)$ to define the tree, where $V$ is the vertex set as above, and $\le_S$, $\le_P$ are two partial orders on $V$, called the sibling and parent orders, respectively.
A partial order $\le$ on a set is a homogeneous binary relation that is reflexive, antisymmetric, and transitive. The order is partial, since two elements $x, y$ might satisfy neither $x \le y$ nor $y \le x$. If so, $x, y$ are said to be incomparable by $\le$. If all elements are comparable, the order is said to be a total order, instead of a partial order. We write $x < y$ if $x \le y$ and $x \ne y$, and for the reversed order $\ge$ we write $y \ge x$ if $x \le y$.
In Fig. 1, we have that $1 \le_S 4$, since 1 and 4 are siblings and 1 is to the left of 4. Note that 0 and 1 are incomparable by $\le_S$, since they have no sibling relation. Instead, they are comparable by $\le_P$, with $0 \le_P 1$. Furthermore, 0 and 3 are comparable by $\le_P$, with $0 \le_P 3$ by transitivity, but 0 and 3 are incomparable by $\le_S$.
We can also combine orders into new orders as
$$x \,(\le_A \circ \le_B)\, z \iff \exists\, y : x \le_A y \text{ and } y \le_B z. \tag{1}$$
In this way, we can define a generalized uncle relation from the sibling and parent relations as $<_{LU} \,:=\, <_S \circ \le_P$ (left uncle) and $>_{RU} \,:=\, >_S \circ \le_P$ (right uncle). These relations include several steps in both sibling and parent directions, thus including siblings, uncles, great uncles, great-great uncles, and so on. In Fig. 1, we have that $4 >_{RU} 2$ and $4 >_{RU} 3$ because 4 is a right uncle of 2 and 3.
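The uncle relations can be computed mechanically by composing the sibling and parent relations as sets of pairs. The following is a minimal sketch, not the authors' implementation. The tree shape (root 0 with children 1 and 4, vertex 1 with children 2 and 3) is an assumption chosen to agree with the statements made about Fig. 1, and all function names are hypothetical.

```python
# Assumed shape of the Fig. 1 tree: 0 -> (1, 4), 1 -> (2, 3).
CHILDREN = {0: [1, 4], 1: [2, 3]}
VERTICES = {0, 1, 2, 3, 4}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def strict_sibling_order(children):
    """Pairs (a, b) with a <_S b: a and b share a parent and a is left of b."""
    return {(cs[i], cs[j])
            for cs in children.values()
            for i in range(len(cs))
            for j in range(i + 1, len(cs))}


def parent_order(vertices, parent):
    """Pairs (a, b) with a <=_P b: a is b itself or an ancestor of b."""
    pairs = set()
    for v in vertices:
        k = v
        pairs.add((k, v))
        while k in parent:
            k = parent[k]
            pairs.add((k, v))
    return pairs


def compose(r1, r2):
    """x (r1 o r2) z iff there is a y with x r1 y and y r2 z."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


lt_S = strict_sibling_order(CHILDREN)
gt_S = {(b, a) for (a, b) in lt_S}
le_P = parent_order(VERTICES, PARENT)

lt_LU = compose(lt_S, le_P)  # (j, i): j is a left uncle of i
gt_RU = compose(gt_S, le_P)  # (j, i): j is a right uncle of i
```

Under the assumed tree, `gt_RU` contains `(4, 2)` and `(4, 3)`, matching the statement that 4 is a right uncle of 2 and 3.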
Independently of the graph or ordered set representations, we will use the parent map p : V → V , mapping a vertex to its parent.

B. Dynamical systems theory
In this section, we will remind readers of a result from [11] on the existence and uniqueness of the solutions to DDSs. The notation used here will be used in the following sections to show how BTs fit into this formalism.
Theorem 1 (Existence and uniqueness [11, Proposition 5, p. 53]). Let $X : \mathbb{R}^n \to \mathbb{R}^n$ be a piecewise continuous vector field, with $\mathbb{R}^n = D_1 \cup D_2$. Let $S_X = \partial D_1 = \partial D_2$, where $\partial$ is the boundary operator, be the set of points at which $X$ is discontinuous, and assume that $S_X$ is a $C^2$-manifold. Furthermore, assume that, for $i \in \{1, 2\}$, $X|_{\overline{D}_i}$ is continuously differentiable on $D_i$ and $X|_{\overline{D}_1} - X|_{\overline{D}_2}$ is continuously differentiable on $S_X$, where $X|_{\overline{D}_i}$ is the restriction of $X$ to $D_i$, continuously extended to the closure $\overline{D}_i$. If, for each $x \in S_X$, either $X|_{\overline{D}_1}$ points into $D_2$ or $X|_{\overline{D}_2}$ points into $D_1$, then there exists a unique Filippov solution to $\dot{x} = X(x)$ starting from each initial condition.
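To make the conditions of Theorem 1 concrete, consider the following scalar example. It is an illustration added here and does not appear in [11] in this form; the sets $D_1$, $D_2$ and the vector field are chosen purely for simplicity.

```latex
% Illustrative example with n = 1 (an assumption for exposition only).
\[
  X(x) =
  \begin{cases}
    -1, & x \in D_1 := \{x \ge 0\},\\
    +1, & x \in D_2 := \{x < 0\},
  \end{cases}
  \qquad
  S_X = \partial D_1 = \partial D_2 = \{0\}.
\]
% Both one-sided extensions are constant, hence continuously differentiable,
% and so is their difference X|_{\overline{D}_1} - X|_{\overline{D}_2} = -2.
% At x = 0, X|_{\overline{D}_1}(0) = -1 points into D_2
% (and X|_{\overline{D}_2}(0) = +1 points into D_1), so the theorem applies.
\[
  x(t) = \operatorname{sgn}(x_0)\,\max\bigl(|x_0| - t,\; 0\bigr)
\]
% is the unique Filippov solution from x(0) = x_0: the state reaches the
% switching surface in finite time |x_0| and stays there.
```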

IV. CONTINUOUS-TIME BTS
In this section, we will define continuous-time BTs, and see how the example of Fig. 1 forms a continuous-time controller.
As noted above, BTs are a hierarchical and modular way of combining controllers into new controllers. In this letter we let all controllers be state-feedback controllers, i.e. functions from the state space R n to some control space R m . If one wants to include some internal dynamics, such as a Kalman filter, in the controller, the state space can be extended.
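Definition 1 (Behavior tree). A function $T_i : \mathbb{R}^n \to \mathbb{R}^m \times \{R, S, F\}$ defined as
$$T_i(x) := (u_i(x), r_i(x)), \tag{2}$$
where $i \in V$ is an index, $u_i : \mathbb{R}^n \to \mathbb{R}^m$ is a controller, and $r_i : \mathbb{R}^n \to \{R, S, F\}$ is a metadata function, describing the progress of the controller in terms of the outputs: running ($R$), success ($S$), and failure ($F$). Define the metadata regions for $x \in \mathbb{R}^n$ as the running, success, and failure regions:
$$R_i := \{x : r_i(x) = R\}, \quad S_i := \{x : r_i(x) = S\}, \quad F_i := \{x : r_i(x) = F\}, \tag{3}$$
respectively, which are pairwise disjoint and cover $\mathbb{R}^n$.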
The metadata can intuitively be interpreted as follows. If x ∈ S i , T i has either succeeded with whatever it was supposed to do (such as opening a door), or the goal was already achieved to begin with (the door was open). Either way, it might make sense to execute another controller to achieve some other goal (perhaps a goal that was intended to be achieved after opening the door).
If x ∈ F i , T i has either failed (the door to be opened turned out to be locked), or has no chance of succeeding (the door is out of reach from the current position). Either way, it might make sense to execute another controller (either to open the door in some other way or to achieve a higher-level goal in a way that does not involve opening the door).
If x ∈ R i , it is too early to determine if T i will succeed or fail. In most cases, it makes sense to continue executing T i , but it could also be reasonable to change the controller if some other action is more important (e.g. low battery level indicates the need for recharging).
Definition 2 (Continuous BT execution). Given some dynamical system $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ that is to be controlled, and assuming the root of the BT is $T_0$ (has index 0), we have
$$\dot{x} = f(x, u_0(x)), \tag{4}$$
where $u_0(x)$ is given by (2).
Below we will describe the properties of this execution, and in particular show that it can be seen as a DDS, with corresponding results regarding the existence and uniqueness of solutions.
As described above, knowing if a lower-level controller failed, succeeded, or is still trying (running) is crucial for a higher-level controller to decide if another sequence should be initiated, or if some kind of fallback action needs to be invoked to achieve the desired outcome. These two cases are captured by the two fundamental BT composition types: Sequence and Fallback. The result of these behavior compositions is simply another BT that satisfies (2). This is what gives BTs their hierarchical modularity.
A Sequence is used to combine subtrees that are to be executed in order, where each one requires the success of the previous action. If any subtree fails, the whole sequence fails. In Fig. 2, node 0 is a Sequence. First, node 1 is executed to get into the kitchen, and then node 2 is executed to turn one of the lamps on. But it only makes sense to try turning the lamps on if the action of moving to the kitchen succeeds. Formally, a Sequence is defined as follows.
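Definition 3 (Sequence). A function Seq that composes an arbitrary finite sequence of $M \in \mathbb{N}$ BTs into a new BT as
$$\mathrm{Seq}[T_1, \dots, T_M](x) := \begin{cases} T_M(x) & \text{if } x \in S_1 \cap \dots \cap S_{M-1}, \\ \;\vdots & \\ T_2(x) & \text{else if } x \in S_1, \\ T_1(x) & \text{else.} \end{cases} \tag{5}$$
As can be seen in (5), a subtree $T_i$ is only executed if the state is in the success regions of all siblings to its left, $T_j$, $j < i$.
A Fallback, on the other hand, only executes the next subtree if the previous one fails. If any subtree succeeds, the Fallback returns success, but it only returns failure if all subtrees fail. In Fig. 2, node 2 is a Fallback, and the two subtrees correspond to turning on either lamp A or lamp B.
Definition 4 (Fallback). A function Fal that composes an arbitrary finite sequence of $M \in \mathbb{N}$ BTs into a new BT as
$$\mathrm{Fal}[T_1, \dots, T_M](x) := \begin{cases} T_M(x) & \text{if } x \in F_1 \cap \dots \cap F_{M-1}, \\ \;\vdots & \\ T_2(x) & \text{else if } x \in F_1, \\ T_1(x) & \text{else.} \end{cases} \tag{6}$$
The metadata regions (3) of the Sequence and Fallback compositions are given by the definitions, but can also be explicitly computed in terms of the children regions and the orders $<_S$, $<_P$ as follows.
Lemma 1. The metadata regions of a Sequence $T_i$ can be computed from the children metadata regions as follows:
$$R_i = \bigcup_{j : p(j) = i} \Big( R_j \cap \bigcap_{k <_S j} S_k \Big), \quad S_i = \bigcap_{j : p(j) = i} S_j, \quad F_i = \bigcup_{j : p(j) = i} \Big( F_j \cap \bigcap_{k <_S j} S_k \Big). \tag{7}$$
Proof. A straightforward application of (3) and (5). The running region of the Sequence is the union of the running region of the first child, the intersection of the success region of the first child with the running region of the second child, and so on. The failure region works similarly, whereas the success region is the intersection of all the children success regions, as the Sequence requires all children to succeed to return success.
Lemma 2. The metadata regions of a Fallback $T_i$ can be computed from the children metadata regions as follows:
$$R_i = \bigcup_{j : p(j) = i} \Big( R_j \cap \bigcap_{k <_S j} F_k \Big), \quad S_i = \bigcup_{j : p(j) = i} \Big( S_j \cap \bigcap_{k <_S j} F_k \Big), \quad F_i = \bigcap_{j : p(j) = i} F_j. \tag{8}$$
Proof. A straightforward application of (3) and (6). The running region is analogous to that of the Sequence above, with failure regions in place of success regions. The success region is built in the same way as the running region, but the failure region is different since it requires all children to fail before returning failure.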
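The Sequence and Fallback compositions described in this section can be sketched as higher-order functions that take BTs and return a BT. This is a minimal illustration, not the authors' code. The state encoding `(position, lamp_a_ok, lamp_on)`, the leaf controllers, and their metadata rules are hypothetical stand-ins for the kitchen-lamp example; each leaf's control output is simply its own node index so that the executing leaf is visible.

```python
from typing import Callable, Tuple

R, S, F = "R", "S", "F"
BT = Callable[[tuple], Tuple[float, str]]  # x -> (u_i(x), r_i(x))


def leaf(u: Callable, r: Callable) -> BT:
    """Wrap a controller u and a metadata function r into a BT."""
    return lambda x: (u(x), r(x))


def _compose(passing: str, children) -> BT:
    def tree(x):
        # Move right only while the left siblings return the passing status.
        for child in children[:-1]:
            out = child(x)
            if out[1] != passing:
                return out
        return children[-1](x)
    return tree


def seq(*children: BT) -> BT:
    """Sequence: a child executes only if all left siblings succeed."""
    return _compose(S, children)


def fal(*children: BT) -> BT:
    """Fallback: a child executes only if all left siblings fail."""
    return _compose(F, children)


# Hypothetical leaves; state x = (position, lamp_a_ok, lamp_on).
go_to_kitchen = leaf(lambda x: 1.0, lambda x: S if x[0] >= 1.0 else R)
lamp_a = leaf(lambda x: 3.0, lambda x: S if x[2] else (R if x[1] else F))
lamp_b = leaf(lambda x: 4.0, lambda x: S if x[2] else R)

root = seq(go_to_kitchen, fal(lamp_a, lamp_b))
```

For instance, `root((1.0, False, False))` evaluates the lamp B leaf, because going to the kitchen has succeeded and lamp A has failed. Because the result of `seq` or `fal` is itself a BT, compositions nest freely, which is the hierarchical modularity discussed above.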

V. BTS AS DISCONTINUOUS DYNAMICAL SYSTEMS
We need to show that the BT execution of (4) can be seen as a DDS. Thus we need to identify the operating regions Ω i of the BT, i.e. the regions where the root BT executes a particular subtree T 0 = T i . As we will see, the Ω i will depend on both the subtree T i itself, and its place in the surrounding BT. But, before we can define the operating region Ω i we need to define the influence region I i and the success and failure pathways S, F.
Informally, the influence region I i is the region where the design of T i influences the execution of T 0 , either by returning e.g. failure so another node executes or by executing itself (thus we will have I i ⊃ Ω i ).
We will be using the so-called left uncle (LU) order < LU :=< S • ≤ P defined in Section III. Note that T j : j < LU i are left siblings of either i or any ancestors of i. For a state to be in I i it needs to be in the success region of the left uncles that have a Sequence as a parent, and in the failure region of the left uncles that have a Fallback as a parent. Formally we write the following.
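Definition 5 (Influence Region). A subset of the state space defined for $T_i$ as
$$I_i := \bigcap_{\substack{j <_{LU} i \\ p(j) \text{ is a Sequence}}} S_j \;\cap \bigcap_{\substack{j <_{LU} i \\ p(j) \text{ is a Fallback}}} F_j, \tag{9}$$
where an intersection over an empty index set is taken to be the whole state space.
In the example of Fig. 2, assuming the state space is $\mathbb{R}^n$, we have that $I_0 = \mathbb{R}^n$, $I_1 = \mathbb{R}^n$, $I_2 = S_1$, $I_3 = S_1$, and $I_4 = S_1 \cap F_3$. Thus, a change in $T_1$ can influence $T_0$ in any part of the state space, but a change in $T_4$ can only influence $T_0$ if $x \in S_1 \cap F_3$, i.e., if going to the kitchen was successful and turning on lamp A failed.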
If the state is in I i and T i returns running, it will execute. But, it will also execute in the case when T i returns success or failure and that same metadata is progressed all the way up to the root. Thus we need to identify what subtrees are on the so-called success and failure pathways. We now make use of the right uncle (RU) order that was also defined in Section III, > RU :=> S • ≤ P . Similarly, T j : j > RU i are right siblings of either i or any ancestors of i.
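Informally, success pathways are vertices $i$ such that there are no right uncles, with Sequence parents, that can take over the execution when $T_i$ returns success. Similarly, failure pathways are vertices $i$ such that there are no right uncles, with Fallback parents, that can take over the execution when $T_i$ returns failure. We call them pathways since if $i$ is on the pathway then so is every other vertex on the path from $i$ to the root. Formally, we write the following.
Definition 6 (Success and Failure Pathways). Subsets of the vertex set defined as
$$S := \{ i \in V : \nexists\, j >_{RU} i \text{ such that } p(j) \text{ is a Sequence} \}, \tag{10}$$
$$F := \{ i \in V : \nexists\, j >_{RU} i \text{ such that } p(j) \text{ is a Fallback} \}. \tag{11}$$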
In the example of Fig. 2, we have that S = {0, 2, 3, 4}, since success from these nodes leads to success of the entire BT, and only success in going to the kitchen leads to other actions. Similarly, F = {0, 1, 2, 4}, since failure from these nodes leads to failure of the entire BT, and only a failure in turning on lamp A can be handled (by turning on lamp B).
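The influence regions and pathways of the example can be derived directly from the tree structure by walking from each vertex to the root and collecting left or right uncles together with the type of their parent. The sketch below assumes that Fig. 2 is the tree 0 = Sequence(1, 2), 2 = Fallback(3, 4), as the text describes; the helper names are hypothetical and the influence region is returned as a symbolic string.

```python
# Assumed structure of Fig. 2: 0 = Sequence(1, 2), 2 = Fallback(3, 4).
CHILDREN = {0: [1, 2], 2: [3, 4]}
KIND = {0: "Seq", 2: "Fal"}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}
VERTICES = [0, 1, 2, 3, 4]


def uncles(i, side):
    """(j, type of p(j)) for every left or right uncle j of vertex i."""
    found = []
    k = i
    while k in PARENT:
        p = PARENT[k]
        siblings = CHILDREN[p]
        pos = siblings.index(k)
        picked = siblings[:pos] if side == "left" else siblings[pos + 1:]
        found += [(j, KIND[p]) for j in picked]
        k = p  # continue with the siblings of the ancestor
    return found


def influence_region(i):
    """Symbolic form of I_i: S_j for Sequence parents, F_j for Fallback."""
    terms = [("S" if kind == "Seq" else "F") + str(j)
             for j, kind in sorted(uncles(i, "left"))]
    return " & ".join(terms) if terms else "R^n"


def success_pathway():
    return {i for i in VERTICES
            if all(kind != "Seq" for _, kind in uncles(i, "right"))}


def failure_pathway():
    return {i for i in VERTICES
            if all(kind != "Fal" for _, kind in uncles(i, "right"))}
```

With these assumptions, `influence_region(4)` yields `"S1 & F3"`, and the two pathway sets reproduce the sets stated for Fig. 2.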
We are now ready to define the operating regions.
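Definition 7 (Operating Region). A subset of the state space defined for $T_i$ as
$$\Omega_i := \begin{cases} I_i & \text{if } i \in S \cap F, \\ I_i \cap (R_i \cup S_i) & \text{if } i \in S \setminus F, \\ I_i \cap (R_i \cup F_i) & \text{if } i \in F \setminus S, \\ I_i \cap R_i & \text{else.} \end{cases} \tag{12}$$
In the example of Fig. 2, we have that $\Omega_0 = \mathbb{R}^n$, $\Omega_1 = R_1 \cup F_1$, $\Omega_2 = S_1$, $\Omega_3 = S_1 \cap (R_3 \cup S_3)$, and $\Omega_4 = S_1 \cap F_3$.
We will now show that a BT's operating region is partitioned by its children's operating regions.
Lemma 3. Operating regions of siblings are pairwise disjoint, $\Omega_i \cap \Omega_j = \emptyset$ for all $i <_S j$, and cover their parent's operating region, $\Omega_i = \bigcup_{p(j) = i} \Omega_j$.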
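Proof. As shown in [7], compositions can be expressed as follows:
$$\mathrm{Seq}[T_1, \dots, T_M] = \mathrm{Seq}[T_1, \mathrm{Seq}[T_2, \dots, T_M]], \qquad \mathrm{Fal}[T_1, \dots, T_M] = \mathrm{Fal}[T_1, \mathrm{Fal}[T_2, \dots, T_M]].$$
Thus, it is sufficient to analyze the case of two children.
The first case is ruled out because $1 \in S \cap F$ implies that $\nexists\, j : j >_{RU} 1$, and we know that $2 >_{RU} 1$.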
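The partition property can be sanity-checked by brute force on the kitchen-lamp tree: for every combination of leaf metadata, exactly one leaf should have the state in its operating region, and it should be the leaf that the recursive Sequence/Fallback semantics executes. The sketch below assumes the Fig. 2 structure 0 = Sequence(1, 2), 2 = Fallback(3, 4), and encodes the operating region as described informally in this section (in the influence region, and either running or returning a status on the matching pathway). It is a check on one example, not a proof, and all names are hypothetical.

```python
from itertools import product

# Assumed structure of Fig. 2: 0 = Sequence(1, 2), 2 = Fallback(3, 4).
CHILDREN = {0: [1, 2], 2: [3, 4]}
KIND = {0: "Seq", 2: "Fal"}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}
LEAVES = [1, 3, 4]


def passing(i):
    """Status that lets a composite node move on to its next child."""
    return "S" if KIND[i] == "Seq" else "F"


def status(i, leaf_status):
    """Metadata of node i, given the leaf metadata at the current state."""
    if i not in CHILDREN:
        return leaf_status[i]
    for c in CHILDREN[i]:
        s = status(c, leaf_status)
        if s != passing(i):
            return s
    return passing(i)


def executing_leaf(i, leaf_status):
    """Leaf whose controller the subtree rooted at i outputs."""
    if i not in CHILDREN:
        return i
    for c in CHILDREN[i][:-1]:
        if status(c, leaf_status) != passing(i):
            return executing_leaf(c, leaf_status)
    return executing_leaf(CHILDREN[i][-1], leaf_status)


def uncles(i, side):
    found, k = [], i
    while k in PARENT:
        p = PARENT[k]
        pos = CHILDREN[p].index(k)
        picked = CHILDREN[p][:pos] if side == "left" else CHILDREN[p][pos + 1:]
        found += [(j, KIND[p]) for j in picked]
        k = p
    return found


def in_operating_region(i, leaf_status):
    in_influence = all(
        status(j, leaf_status) == ("S" if kind == "Seq" else "F")
        for j, kind in uncles(i, "left"))
    right_kinds = {kind for _, kind in uncles(i, "right")}
    s = status(i, leaf_status)
    return in_influence and (
        s == "R"
        or (s == "S" and "Seq" not in right_kinds)   # on the success pathway
        or (s == "F" and "Fal" not in right_kinds))  # on the failure pathway


def check_partition():
    """True if, for all 27 leaf-metadata combinations, the executing leaf is
    the unique leaf whose operating region contains the state."""
    for combo in product("RSF", repeat=len(LEAVES)):
        leaf_status = dict(zip(LEAVES, combo))
        active = [i for i in LEAVES if in_operating_region(i, leaf_status)]
        if active != [executing_leaf(0, leaf_status)]:
            return False
    return True
```

Enumerating metadata combinations rather than states is sufficient here because the regions in question depend on the state only through the leaf metadata.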
We will now formally prove that the state's presence in Ω i is indeed a sufficient condition to conclude that T i is being executed.
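Theorem 2. Let $P$ be the set of leaf nodes whose operating regions are non-empty:
$$P := \{ i \in V : i \text{ is a leaf and } \Omega_i \neq \emptyset \}.$$
Then, we have $x \in \Omega_i,\ i \in P \implies \dot{x} = f(x, u_0(x)) = f(x, u_i(x))$ and $\bigcup_{i \in P} \Omega_i = \mathbb{R}^n$.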
Proof. We need to show that $x \in \Omega_i,\ i \in P \implies f(x, u_0(x)) = f(x, u_i(x))$ and that $\{\Omega_i\}_{i \in P}$ cover the state space.
We have that $\Omega_i \subset I_i$ by (12), and from (9) we see that no leaf to the left of $u_i$ can execute. Furthermore, by the construction of (12), either $x \in R_i$, or $x$ is in the success or failure region of a node on a success or failure pathway (respectively), so no leaf to the right of $u_i$ can execute. Thus, we conclude that $f(x, u_0(x)) = f(x, u_i(x))$.
Now we need to show that $\{\Omega_i\}_{i \in P}$ cover the state space. From Lemma 3 we have that the $\Omega_i$ for a set of siblings are pairwise disjoint and cover $\Omega_{p(i)}$. By definition, $I_0 = \mathbb{R}^n$, and since $0 \in S \cap F$ we have $\Omega_0 = I_0 = \mathbb{R}^n$ by (12). Applying Lemma 3 recursively down the tree, we see that for the leaves in $P$ the sets $\{\Omega_i\}_{i \in P}$ are pairwise disjoint and cover $\mathbb{R}^n$, i.e., $\bigcup_{i \in P} \Omega_i = \mathbb{R}^n$.
Theorem 3. The execution (4) will have a unique Filippov solution (see [11]) for each initial state if, for every pair of neighboring sets with index in $P$, i.e., sets $\Omega_i, \Omega_j$ with $i, j \in P$ and $\partial \Omega_i \cap \partial \Omega_j \neq \emptyset$, the sets $\Omega_i, \Omega_j$ and the vector field are such that the following holds with $D_1 = \Omega_i$ and $D_2 = \mathbb{R}^n \setminus D_1$. $S_X = \partial D_1 = \partial D_2$ is the set where $X(x)$ is discontinuous and $S_X$ is a $C^2$-manifold. Furthermore, for $i \in \{1, 2\}$, $X|_{\overline{D}_i}$ is continuously differentiable on $D_i$ and $X|_{\overline{D}_1} - X|_{\overline{D}_2}$ is continuously differentiable on $S_X$. For each $x \in S_X$, either $X|_{\overline{D}_1}$ points into $D_2$ or $X|_{\overline{D}_2}$ points into $D_1$.
Proof. A straightforward application of Theorem 1 for every neighboring pair of Ω i .
Sufficient conditions for the existence and uniqueness of BT executions can thus be found using the corresponding results for DDS in Theorem 1.

VI. CONVERGENCE ANALYSIS
Fig. 3. Prepares graph for the BT in Fig. 2.
In this section, we will state the conditions under which a general BT is convergent. The main idea of our convergence theorem is similar to the concept of prepares from [12]. Given a BT and its operating regions, the region of attraction of each policy invokes switching between operating regions, thereby inducing a partial order ≤ f of transitions.
The reflexive-transitive reduction of this partial order is a directed acyclical graph (prepares graph), as illustrated in Fig. 3 for the kitchen-lamp example in Section IV. The transitions (edges) of this graph are described as follows: (a, b) going to the kitchen and trying to turn on lamp A because it is closer, (a, d) going to the kitchen and trying to turn on lamp B because it is closer, (b, d) trying to turn on lamp B because lamp A did not work, (b, c) successfully turning on lamp A, (d, e) successfully turning on lamp B. Note, the dashed regions in Fig. 3 correspond to the success and failure pathways. Informally speaking, the BT will be convergent if this graph is acyclical and has all its sinks in success regions. We will now formally state the convergence theorem.
Theorem 4. If there exists a subset $L \subseteq P$ and a partial order $\le_f \subset L^2$ such that the constraint region $\Lambda_i$ is invariant under $f(x, u_i(x))$ for all $i \in L$, and there exists a finite time $\tau_i > 0$, such that if $x(t) \in \Omega_i \setminus S_0$ then $x(t + \tau_i) \notin \Omega_i \setminus S_0$ for all $i \in L$, then there exists a maximum number of transitions $N \in \mathbb{N}$ and a maximum duration $\bar{t} > 0$, such that if $x(0) \in \Lambda_i$ for any $i \in L$, then $x(t) \in S_0$ in bounded time $t \le \bar{t}$ within $N$ transitions.
Proof. We have that if totally ordered by ≤ f , and i ≤ f k for all k ∈ L 0 ∪ L 1 . In other words, L 0 and L 1 are the chains of transitions with the largest duration and cardinality, respectively.
We now have a tool to assess the convergence properties of a general BT. The key challenge is thus to design the structure of the BT itself and its controllers to satisfy Theorem 4. An extended version of this paper, with a longer example of the application of this result can be found in [16].
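The informal reading of the convergence condition (the prepares graph is acyclic and all its sinks lie in success regions) can be checked mechanically once the graph is written down. The sketch below uses the five edges listed for the kitchen-lamp example; the vertex labels are taken from that list, while the function names and the choice to report the longest chain as a vertex count are assumptions made here. How that count maps onto the bound $N$ is governed by Theorem 4 itself, not by this code.

```python
# Edges of the prepares graph as listed for the kitchen-lamp example.
EDGES = [("a", "b"), ("a", "d"), ("b", "d"), ("b", "c"), ("d", "e")]


def successors(edges):
    """Adjacency lists, with an entry for every vertex (sinks included)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    return succ


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    succ = successors(edges)
    indegree = {v: 0 for v in succ}
    for u in succ:
        for v in succ[u]:
            indegree[v] += 1
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return removed == len(succ)


def sinks(edges):
    """Vertices without outgoing transitions."""
    return {v for v, out in successors(edges).items() if not out}


def longest_chain(edges):
    """Number of vertices on the longest directed path (graph must be acyclic)."""
    succ = successors(edges)
    memo = {}

    def depth(v):
        if v not in memo:
            memo[v] = 1 + max((depth(w) for w in succ[v]), default=0)
        return memo[v]

    return max(depth(v) for v in succ)
```

For the listed edges the graph is acyclic, its sinks are `c` and `e` (the two vertices reached by successfully turning on a lamp), and the longest chain is `a, b, d, e`.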

VII. EXAMPLE
In this section we will illustrate Theorem 4 with a simple example. Consider the normalized inverted pendulum model
$$\ddot{\theta} = \sin(\theta) - u \cos(\theta), \tag{16}$$
where $\theta \in \mathbb{R}$ is the pendulum's angle from the vertical, $x := [\theta, \dot{\theta}]$ is the state, and $u \in \mathbb{R}$ is a control input. We want to stabilize (16) to the unstable equilibrium at the stationary-upright configuration, where $\cos(\theta) = 1$ and $\dot{\theta} = 0$. A popular technique for doing so is energy control. Following [17], define the energy of (16) as
$$E(x) := \tfrac{1}{2} \dot{\theta}^2 + \cos(\theta) - 1, \tag{17}$$
and a control policy with constants $k_E, u_m \in \mathbb{R}_{>0}$ as
$$u_a(x) := \mathrm{sat}_{u_m}\big( k_E\, E(x)\, \mathrm{sgn}(\dot{\theta} \cos(\theta)) \big), \tag{18}$$
where $\mathrm{sat}_{u_m}$ ensures that $u_a(x) \in [-u_m, u_m]$. Policies such as (18) are well known to exponentially stabilize the pendulum (16) to its homoclinic orbit about $E = 0$ (shown by the dashed lines in Fig. 5) starting from all states other than the stable stationary-downward configuration, where $\cos(\theta) = -1$ and $\dot{\theta} = 0$. Unfortunately, however, the stationary-upright configuration is only a saddle equilibrium of (16) under the influence of (18). Thus, the system would only periodically pass through the stationary-upright configuration. Therefore, we need to define a local controller to "switch on" and stabilize the system when close enough to the stationary-upright configuration. Define a linear-feedback policy $u_b(x)$ with constants $k_\theta, k_{\dot{\theta}} \in \mathbb{R}_{>0}$, referred to as (19). The policy (19) is exponentially stabilizing within some region of the state space around $\sin(\theta) = 0$ and $\dot{\theta} = 0$, which is met at both the stationary-upright and stationary-downward configurations.
To make sure that the controller is only used in the stationary-upright configuration, we define an error metric $\delta(x) := l_\theta (\cos(\theta) - 1)^2 + l_{\dot{\theta}} \dot{\theta}^2$, with constants $l_\theta, l_{\dot{\theta}} \in \mathbb{R}_{>0}$, which will only be zero at the stationary-upright configuration. Using this metric for the policies above, we define their metadata functions $r_a$, $r_b$ and the corresponding regions, where $\varepsilon_a, \varepsilon_b \in \mathbb{R}_{>0}$ are constants such that $\varepsilon_b \le \varepsilon_a$, and $S_a$ is a positively invariant set of $f(x, u_b(x))$ containing its equilibrium (the stationary-upright configuration). Since the energy-based policy $u_a$ is exponentially stabilizing to the zero-energy manifold, we have that $|E(t)| \le |E(0)| e^{-\alpha t}$ for some constant $\alpha \in \mathbb{R}_{>0}$. This implies that, for all $\varepsilon_a \in \mathbb{R}_{>0}$, there exists $\tilde{\tau}_a \in \mathbb{R}_{>0}$ such that $|E(\tilde{\tau}_a)| \le |E(0)| e^{-\alpha \tilde{\tau}_a} < \varepsilon_a$. Since the angular velocity $\dot{\theta}$ maintains positivity or negativity for the duration of each orbit (see Fig. 5), it is implied that, if $S_a \subseteq \{x : E(x) \le \varepsilon_a\}$, then there exists a finite bound $\tau_a \in \mathbb{R}_{\ge \tilde{\tau}_a}$, such that, if $x(0) \in R_a$ then $x(t) \in S_a$ in finite time $t \le \tau_a$ for the execution $\dot{x} = f(x, u_a(x))$.
Since the linear policy $u_b$ is exponentially stabilizing within the region $S_a$, we similarly know that there must exist a bound $\tau_b \in \mathbb{R}_{>0}$ such that if $x(0) \in S_a \cap R_b$ then $x(t) \in S_b$ in finite time $t \le \tau_b$ for the execution $\dot{x} = f(x, u_b(x))$.
With $T_1 := (u_a, r_a)$ and $T_2 := (u_b, r_b)$, we then define and label the BT in Fig. 4 with
$$T_0 := \mathrm{Seq}[T_1, T_2]. \tag{23}$$
The operating regions of (23) are then computed as
$$\Omega_1 = R_1 \cup F_1, \qquad \Omega_2 = S_1. \tag{24}$$
Therefore, the execution (4) is computed with
$$\dot{x} = \begin{cases} f(x, u_a(x)) & \text{if } x \in \Omega_1, \\ f(x, u_b(x)) & \text{if } x \in \Omega_2. \end{cases} \tag{25}$$
Based on the stability analysis above, we then have (26) in the context of Theorem 4, with $L := \{1, 2\}$ and $\le_f \,=\, \le_S$, where $N = 2$ because $L$ is totally ordered by $\le_S$. The conclusion from (26) is that, for the execution (4), if $x(0) \in \Omega_1$ then $x(t) \in S_0$, where $S_0 = S_1 \cap S_2$, in finite time $t \le \bar{t}$ within $N$ transitions.
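The switching structure of this example can be sketched numerically. The code below is an illustration under several assumptions that are not taken from the source: the pendulum is modeled as $\ddot{\theta} = \sin\theta - u\cos\theta$ with energy $E = \tfrac{1}{2}\dot{\theta}^2 + \cos\theta - 1$ (the normalized form commonly used with energy control), the linear policy is taken to be $u_b = k_\theta \sin\theta + k_{\dot\theta}\dot\theta$, the success region of the first subtree is taken to be $\{x : \delta(x) \le \varepsilon_a\}$, and all gains and thresholds are arbitrary. Only the energy-pumping behavior of $u_a$ is simulated; the closed-loop behavior of the full BT is not claimed.

```python
import math

# All numerical values below are assumptions chosen for illustration.
K_E, U_M = 1.0, 1.0           # energy-control gain and saturation level
K_TH, K_DTH = 10.0, 5.0       # gains of the assumed linear policy
L_TH, L_DTH, EPS_A = 1.0, 1.0, 0.05


def sat(v, limit):
    return max(-limit, min(limit, v))


def sgn(v):
    return (v > 0) - (v < 0)


def f(x, u):
    """Assumed normalized pendulum dynamics, state x = (theta, theta_dot)."""
    th, dth = x
    return (dth, math.sin(th) - u * math.cos(th))


def energy(x):
    th, dth = x
    return 0.5 * dth ** 2 + math.cos(th) - 1.0


def delta(x):
    th, dth = x
    return L_TH * (math.cos(th) - 1.0) ** 2 + L_DTH * dth ** 2


def u_a(x):
    """Saturated energy-control policy."""
    th, dth = x
    return sat(K_E * energy(x) * sgn(dth * math.cos(th)), U_M)


def u_b(x):
    """ASSUMED form of the linear-feedback policy (not given in the text)."""
    th, dth = x
    return K_TH * math.sin(th) + K_DTH * dth


def u_0(x):
    """Root policy of Seq[T_1, T_2]: u_b in the (assumed) success region of
    T_1, u_a elsewhere."""
    return u_b(x) if delta(x) <= EPS_A else u_a(x)


def simulate(policy, x0, t_end, dt=1e-3):
    """Fixed-step RK4 integration of x' = f(x, policy(x))."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        k1 = f(x, policy(x))
        x2 = (x[0] + 0.5 * dt * k1[0], x[1] + 0.5 * dt * k1[1])
        k2 = f(x2, policy(x2))
        x3 = (x[0] + 0.5 * dt * k2[0], x[1] + 0.5 * dt * k2[1])
        k3 = f(x3, policy(x3))
        x4 = (x[0] + dt * k3[0], x[1] + dt * k3[1])
        k4 = f(x4, policy(x4))
        x = (x[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             x[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return x


x_start = (math.pi - 0.5, 0.0)   # near, but not at, the downward configuration
x_end = simulate(u_a, x_start, 20.0)
```

Under the assumed model, differentiating the energy along trajectories gives $\dot{E} = -u\,\dot{\theta}\cos\theta$, so the policy $u_a$ drives $|E|$ towards zero; comparing `abs(energy(x_end))` with `abs(energy(x_start))` shows this decrease. A fixed-step integrator is used for simplicity even though the right-hand side is discontinuous; a study of sliding behavior at the switching surface would need a more careful treatment.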

VIII. CONCLUSIONS
In this letter, we have formulated BTs in continuous-time and shown how they fit the formalism of a DDS and the conditions under which solutions to their execution exist and are unique. To do this, we embedded the order of the BT structure itself into the formulation. These contributions allow the application of the rich literature in hybrid dynamical systems [18]- [20] to BTs in general. Finally, we have provided the conditions under which a general BT will be convergent to a goal.