HyHooVer : Veriﬁcation and Parameter Synthesis in Stochastic Systems With Hybrid State Space Using Optimistic Optimization

This article presents a new method for model-free veriﬁcation of a general class of control systems with unknown nonlinear dynamics, where the state space has both a continuum-based and a discrete component. Speciﬁcally, we focus on ﬁnding what choices of initial states or parameters maximize a given probabilistic objective function over all choices of initial states or parameters from such hybrid state space, without having exact knowledge of the system dynamics. We introduce the notion of set initialized Markov chains to represent such systems. Our method utilizes generalized techniques from multi-armed bandit theory on the continuum, in an attempt to make an efﬁcient use of the available sampling budget. We introduce a new algorithm called the Hybrid Hierarchical Optimistic Optimization (HyHOO) algorithm, which is designed to address the problem outlined in this paper. The algorithm combines elements of the existing Hierarchical Op-timistic Optimization (HOO) bandit algorithm with carefully chosen parameters to create a fresh perspective on the problem. By viewing the problem as a multi-armed bandit problem, we are able to provide theoretical regret bounds on sample efﬁciency of our tool, HyHooVer . This is achieved by making assumptions about the smoothness of the underlying system. The results of experiments in formal veriﬁcation and parameter synthesis of variety of scenarios, indicate that the proposed method is effective and efﬁcient when applied to realistic-sized problems and it performs well compared to other methods, speciﬁcally PlasmaLab, BoTorch, and the baseline HOO algorithm. Speciﬁcally, it demonstrates better efﬁciency when employed on models with large state space and when the objective function has sharp slopes in comparison with other tools.


I. INTRODUCTION
Our interest is in verifying safety and synthesizing parameters of autonomous and cyber-physical systems, where the system dynamics are not explicitly known.In other terms, our goal is to identify the choices of initial states or parameters that maximize a given objective function, while not having a detailed knowledge of the underlying system dynamics, and by only using noisy observations of the system.This problem is relevant in applications such as predictive monitoring or runtime verification of safety-critical systems like self-driving vehicles, drones and medical devices that incorporate complex black-box algorithms, where guaranteeing the safety of the system is essential.
Addressing this problem remains challenging because the system dynamics are not fully known and querying the system can be costly.Also typically the state space could be big and hybrid (with continuum-based and discrete components) which make the problem even harder.This problem can be categorized as a black-box optimization problem that involves expensive system queries.To address this problem, searching for black-box optimization methods that are sample efficient is logical.In the context of finite state-space systems, the theory of multi-armed bandits has demonstrated its efficiency in learning about unknown systems through adaptive sampling.Therefore, our approach not only utilizes the multi-armed bandits approach but also extends it to a new setting with a hybrid state space, marking the first time such an extension has been made.
The field of multi-armed bandits [1], [2], [3] has seen significant growth and development in both technique and practical use in the recent years [4], [5].Especially relevant to our research is X -armed bandit in [5] with its application in black-box optimization, where the goal is to find the maximum of an unknown function f with only noisy queries of the function (i.e. a query of x returns f (x)+ noise).One well-known algorithm in this area is the Upper Confidence Bound (UCB) [6] which utilizes the Principle of Optimism for adaptive search.Since the exact value of f (x) is unknown, this principle uses an estimated upper bound on the function -this is constructed from samples, and thus holds with high probability.The x chosen at each time is the one that maximizes this estimated upper bound.Thus, initially different parts of state space are explored, and as more information is gathered, it becomes less optimistic.This principle has been used in the context of black-box optimization by reformulating problems as tree search [4], [5], [7], [8].These algorithms use a tree structure to search through the unknown function's domain, balancing the exploration of less explored areas with the exploitation of high-value regions that have already been identified, to obtain an approximate solution within a given sampling budget.
These algorithms are assessed based on a measure called cumulative regret, which is the expected accumulated difference between the highest value found by the algorithm and the true maximum value.To ensure theoretical guarantees for the regret, these methods typically require that the objective function satisfy some smoothness properties.While the approach in [5] requires smoothness with respect to a semi-metric, the algorithms in [9], [10] relax the requirement of an exact semi-metric that captures the smoothness of f , and instead use two smoothness parameters to define their notion of smoothness.When estimating these parameters is not practical, [10] provides an algorithm to simultaneously test different parameters, and this approach has been extended to multi-fidelity settings in [9].
Given this, we can utilize the X -armed bandits method to tackle the problem at hand, where the concept of regret can be related to the error in verification and synthesis in our specific context (for more details refer to Section III-D).
In this article, we introduce a new algorithm for statistical verification and parameter synthesis of a new generalized class of discrete-time Markov chains (MCs) with hybrid state space, consisting of a continuum-based and a discrete part, which can be applied in situations where information about the transition probabilities are not known and sampling budget is limited.We carefully introduce the notion of set initialized Markov chains (SIMC) that captures the properties of discretetime MCs with hybrid state space.Our algorithm is called Hybrid Hierarchical Optimistic Optimization (HyHOO), that is built upon a tree-structured method called hierarchical optimistic optimization (HOO) [5] and is designed to maximize unknown functions.HyHOO extends our previous work in [11] to systems with hybrid state space and has the following features: r It expands HOO to cover hybrid systems whose states live in state space that has both continuum-based and discrete components.We use the terminology state to refer to continuous components, and the terminology mode to refer to discrete components.This allows the algorithm to be applied in a wider range of situations where the set of initial states or parameters includes both hyper-rectangles and finite sets.
r It employs the principle of optimism to explore the hy- brid state spaces, by strategically allocating the available budget to the regions of the state space that are likely to be optimal.
r It is designed with batched simulations feature which improves its memory usage and running time.To provide theoretical guarantees for the regret achieved by our algorithm HyHOO, we make assumptions about the smoothness of the function f .These assumptions relate to the smoothness of f in the vicinity of f (x * ) over the regions of the state space that correspond to each mode.This is less restrictive than the smoothness assumptions required for HOO and our previous work [11], which require the smoothness of f in the vicinity of f (x * ) over the entire state space.The theoretical bound on the regret of HyHOO is a function of the smoothness parameters, the sampling budget, the batch size parameter, and a property of the function called the near-optimality dimension which captures the steepness of the slope of the function around the optimum.This bound is different from the existing performance bounds found in the literature on statistical model checking.For example, the bounds relevant for tools like PlasmaLab [12] often use techniques like Monte Carlo sampling, Chernoff bounds, or sequential hypothesis testing.
We have built an open-source tool HyHooVer that utilizes the HyHOO algorithm, and have established benchmarks using Python to showcase its abilities and gauge its performance in terms of sample efficiency.These benchmarks involve traffic scenarios in a car simulation environment called "highway-env" [13], which offers a minimalist yet realistic framework for simulating cars 1 .Our tool's source code and usage instructions, along with the examples, are also available 2 .
Through our experiments on the benchmarks, we have observed the following: r Our tool is capable of performing verification and syn- thesis in scenarios with hybrid state spaces and it demonstrated better sample efficiency than other methods in situations with higher dimensions and multiple modes.
r The tool performs better in scenarios where the slope of the function is steeper around the maximum.This means that the tool is particularly well-suited to situations where there is a rare event involved.
r The batch size parameter can be adjusted to optimize the tool's performance in terms of running time and memory usage, without compromising the results of the verification and synthesis.

A. COMPARISON WITH RELATED APPROACHES
There is a substantial body of literature on model-based approaches for the verification and parameter synthesis of stochastic systems [14], [15], [16], [17], [18] (and the references therein), which rely on detailed knowledge of the probability transition kernel, which may not always be available.Statistical model checking (SMC) also addresses similar verification problems.SMC methods collect samples through system execution and use statistical tests to determine if the constraints have been met or violated [19], [20], [21], [22].Notable methods in this category include MODEST for probabilistic automata [23], PlasmaLab [12], the learning-based algorithm of [24], [25] in PRISM [26] and UPPAAL, and approaches for Markov Decision Processes with restricted classes of schedulers implemented in [27], [28], [29].
Comparing HyHooVer to other discrete-state SMC tools is challenging because the guarantees are different and it is hard to factor out platform-specific constants.We provide a detailed comparison with PlasmaLab in Section IV-D, which is a tool for model checking of unknown systems.HyHooVer generally have better estimation results with fewer samples than PlasmaLab, particularly for higher dimensional models and models with sharp slopes around the maxima.This suggests that HyHOO may be more effective in finding hard-to-detect bugs with fewer samples.The approach of [30] uses the original HOO algorithm of [5] but we could not find this tool online for running comparative experiments.Our approach differs from [30] in two important ways: (1) we use a search algorithm spawning different smoothness parameters and return the result of the best one; (2) we exploit batched simulations; and (3) we have extended HOO to hybrid state-space settings.
One of the prominent techniques for optimizing blackbox functions in stochastic systems is through Bayesian optimization-based approaches such as BoTorch [31].BoTorch is developed using PyTorch and is appropriate for systems with costly queries.We built upon our previous work [11] by providing a comparison of our tool with BoTorch in Section IV-D.Our results indicate that, overall, HyHooVer outperforms BoTorch in terms of running time, especially when a large number of queries are required to verify a model.This difference is particularly noticeable in cases where there are difficult-to-detect bugs or high-dimensional scenarios.

B. MOTIVATING EXAMPLE, TRAFFIC ROUNDABOUT
Consider a roundabout scenario of multiple cars which is simulated in highway-env and is depicted in Fig. 1.Each vehicle starts from an initial position which is randomly distributed from a distribution D and moves toward a target destination while meeting a target speed.Suppose the blue vehicles carry their motion according to Intelligent Driver Model (IDM) and Minimizing Overall Braking Induced by Lane Changes (MOBIL) decision policies (built-in policies in highway-env) while avoiding other cars.The green car, on the other hand is able to track its lane while meeting a target speed, however cannot detect rear-end hazards.Define collision as when distance between the green car and any of the blue cars become less than a threshold.We would like to find the most unsafe situation over a set of target destinations and intervals of initial speeds and target speeds of the cars.In this scenario the state space X can be constructed such that it consists of a discrete and a continuum-based component.Let us explain this with an example.In this scenario, let the letters N, E, W, and S represent the north, east, west, and south exits, respectively.Then let, for instance, the tuple (N, E) represent the north exit and east exit as the target destination for the green car and one of the blue cars.Additionally, let v l 0 , v u 0 denote lower and upper bounds for the initial speed of the green car, and let v l t , v u t denote lower and upper bounds for the target speed of the green car.Then an example of a set that consists of a discrete and a continuum-based component would be {(E, E), (N, E), (W, E), (S, N) ]. Detailed descriptions and Python simulators for our models are available from the HyHooVer source page 3 .

II. PROBLEM STATEMENT
Background and Notation: Let Z be a measurable space in R m and let X := [L] × Z, where [L] = k 1 , k 2 , . . ., k L for some integer L ≥ 1.Let the pair (X , F X ) be a measurable space, where F X is a σ -algebra over X and the elements of F X are referred to as measurable sets.Let P : X × F X → [0, 1] be a Markovian transition kernel on a measurable space (X , F X ), such that for all x ∈ X , P(x, •) is a probability measure on F X ; and for all A ∈ F X , P(•, A) is a F X -measurable function.Also let P β be a Markovian transition kernel that depends on parameter β ∈ R n .For a σ > 0, a real-valued random variable X is σ 2 -sub-Gaussian, if for all s ∈ R, E[exp(s(X − EX ))] ≤ exp(σ 2 s 2 /2) holds, where E denotes expectation.For a matrix z ∈ R n×m , z F denotes its Frobenius norm.
Definition 1: A set initialized Markov chain (SIMC) M is defined by a tuple ((X , F X ), P β , B, ), with: r (X , F X ), a measurable space over the state space X ; r P β : X × F X → [0, 1], a Markovian transition kernel depending on parameter β ∈ B; r B ⊆ X , a set of parameters; and r ⊆ X , a set of possible initial states.
In the examples in Section II-B, the state-dependent probabilistic choices are modeled by the Markov transition kernel P β .We reiterate that our algorithm will not rely on the knowledge of this kernel.Let α, a sequence of states α = x 0 x 1 • • • x k , be an execution of M of length k for any x 0 ∈ and any β ∈ B, where x i s ∈ X .Given x 0 , β and a sequence of measurable sets of states A 1 , . . ., A k ∈ F X , the measure of the set of executions {α | α 0 = x 0 and α i ∈ A i , ∀ i = 1, . . ., k} is given by: which is a standard result from the Ionescu Tulceȃ theorem [32], [33].We address two classes of problems:

A. VERIFICATION
Given an SIMCM and a measurable unsafe set U ∈ F X , we are interested in evaluating the worst-case probability of M hitting U over all possible nondeterministic choices of an initial state x 0 ∈ .Once an initial state x 0 ∈ is fixed, the probability of a set of paths is determined by the Markovian transition kernel in the model M, as described above.Circling 3 https://github.com/NeginMusavi/HyHooVer.git.back to our motivating example in Section II-B, the set would correspond to a set of target destinations and intervals of initial speeds and target speeds of the cars, and x 0 would be an element in this set.
We say that an execution α of length k hits the unsafe set U if there exists an integer i ∈ {0, . . ., k}, such that α i ∈ U .The complement of U , the safe subset of X , is denoted by S. The safe set is also a member of the σ -algebra F X since σ -algebras are closed under complementation.From a given initial state x 0 ∈ and a given parameter β ∈ B, the probability of M hitting U within k steps is denoted by (1) We are interested in finding the worst-case probability of hitting unsafe states over all possible initial states of the model M.This can be regarded as solving, for some k and some β ∈ B, the following optimization problem:

B. PARAMETER SYNTHESIS
Given an execution α of length k and a β ∈ B, let r(α, β ) be a real-valued objective function.Then, we are interested in evaluating the maximum of expected objective function over all possible nondeterministic choices of the parameter β ∈ B. This can be regarded as solving, the following related optimization problem: where the expectation is over the randomness of the transition and the initial state (drawn from a given distribution).

III. VERIFICATION AND PARAMETER SYNTHESIS WITH HIERARCHICAL OPTIMISTIC OPTIMIZATION
We will solve the optimization problems in ( 2) and ( 3) by developing the HyHOO algorithm with mini batches.This can be regarded as a variant of the HOO algorithm [5].The setup is as follows: suppose we have a sampling budget of N and want to maximize the function f : X → R based on receiving noisy observations of f , i.e., f + noise.It is assumed that f has a unique global maximum that achieves the value f * = sup x∈X f (x), where X = [L] × Z is as introduced in Section II.Our approach gets to choose a sequence of sample points (arms) x 1 , x 2 , . . .x N ∈ X , for which it receives the corresponding sequence of noisy observations (or rewards) y 1 , y 2 , . . ., y N .When the sampling budget N is exhausted, the algorithm has to decide the optimal point xN ∈ X with the aim of minimizing regret, which is defined as: Our approach has two key properties: r It is adaptive in the sense that the ( j + 1) st sample x j+1 should depend on the previous samples and their corresponding noisy observations; and r It does not rely on detailed knowledge of f but only on the sampled noisy observations.Algorithms with these properties are called black-box or zeroth-order algorithms.In order to derive rigorous bounds on the regret, however, we will need to make some assumptions on the smoothness of f (see Assumption 2) and on the relationship between f (x j ) and the corresponding observation y j .Assumption 1 formalizes the latter by stating that y j is distributed according to some (possibly unknown) distribution with mean f (x j ) and a strong tail-decay property.
Assumption 1: There exists a constant σ > 0 such that for each sampled x j , the corresponding noisy observation y j is distributed according to a σ 2 -sub-Gaussian distribution M x j satisfying udM x j (u) = f (x j ).
Hence any random variable that is bounded is a sub-Gaussian random variable and thus there exists a σ > 0 satisfies the conditions of this assumption.Next, we present the HyHOO algorithm.In Sections III-B and III-C we present its analysis leading to the regret bound and discuss the choice of its parameters as well as their implications.In Section III-D we discuss how HyHOO can be used for solving the optimization problems in ( 2) and (3).

A. HYBRID HIERARCHICAL TREE OPTIMIZATION
HyHOO (Algorithm 1) is a batched-sampling variant of HOO [5], which also takes into account a hybrid state space.HyHOO selects the next sample x j+1 by building a tree in which each height (or level) partitions the state space X into a number of regions.The algorithm samples states to estimate upper bounds on f over a region, and based on this estimate, decides to expand certain branches (i.e., repartition certain regions) to reduce the region sizes based on the smoothness of f .HyHOO allows us to execute batch simulations of size b to reduce the variance in the estimate of f (x i ) obtained from the noisy observations y i s for any state x i and, more importantly, maintain a lighter data structure.In Section III-C, we discuss the implications of the choice of batch size parameter b.The tree construction is based on the noisy tree-search algorithms (HOO and variants) [5].The root of the tree has L children, where the l-th child is the root of the l-th binary4 sub-tree constructed over k l × Z.Each node in the tree except for its root is labeled by a triple of integers (l, h, i), where l ∈ [L] is the sub-tree index that it belongs to, h ≥ 1 is the height, and i ∈ {1, . . ., 2 h−1 }, is its position within level h.Each node (l, h, i) can have two children (l, h + 1, 2i − 1) and (l, h + 1, 2i).Node (l, h, i) is associated with the region P l,h,i ⊆ k l × Z, where P l,h,i = 14: return a point among x 1 , x 2 , . .., x N chosen uniformly at random.P l,h+1,2i−1 ∪ P l,h+1,2i , and for each h these disjoint regions satisfy ∪ 2 h−1 i=1 P l,h,i = k l × Z.Thus, larger values of h represent finer partitions of k l × Z.It is noted that the root of tree is associated with the whole state space {k 1 , . . ., k L } × Z.For each node (l, h, i) in the tree, HyHOO computes the following quantities: r t l,h,i is the number of times the node is chosen or consid- ered for re-partitioning.r fl,h,i is the empirical mean of observations over points sampled in P l,h,i .r U l,h,i is an initial estimate of the upper-bound of f over P l,h,i based on the smoothness parameters.r B l,h,i is a tighter and optimistic upper bound for the same.The tree starts with a single root (−, 0, 1), with B-values At each iteration a path from the root to a leaf is found by traversing the child with the higher B-value (with ties broken arbitrarily), then a new node (lnew, hnew, inew) is added and all of the above quantities are updated.The partitioning continues until the sampling budget N is exhausted.After that the algorithm returns a point among x 1 , x 2 , . .., x N chosen uniformly at random.The details are provided in Algorithm 1.Hence, Algorithm 1 can be additionally adjusted to return the best-scoring input instead of a randomly chosen one, in order to enhance its performance in practice.

B. ANALYSIS OF REGRET BOUND
The notation and analysis of the regret bound for HyHOO closely follows that in [5] (and followups in [9], [10]).
Let l,h,i denote the sub-optimality gap of node (l, h, i), that is, l,h,i = f * − sup x∈P l,h,i f (x).A node (l, h, i) is optimal if l,h,i = 0 and it is sub-optimal if l,h,i > 0. We say that a node (l, h, i) is -optimal if l,h,i ≤ .These nodes are also called near-optimal nodes.Let l denote the suboptimality gap of the node associated with region k 1 × Z, that is, l = f * − sup x∈k 1 ×Z f (x).We will use two parameters ν > 0 and ρ ∈ (0, 1) to characterize the smoothness of f relative to the partitions (see Assumption 2).We define N h ( ) as in [10] as the number of -optimal nodes at depth h, that is, the number of nodes with l,h,i ≤ .
From the sampled estimate of f (x i ) at a single point x i , HyHOO attempts to estimate the maximum possible value that f * can take over X .This is achieved by assuming that f is locally smooth around x * , which is formalized in Assumption 2. It basically ensures that f does not change arbitrarily in the regions that are near-optimal.Based on the fact that the hierarchical partitioning is done over the region associated with each mode, this assumption relaxes the smoothness assumption in [9], which restricts the function to satisfy the smoothness assumption for all near-optimal regions across the entire state space.
Assumption 2: There exist ν > 0 and ρ ∈ (0, 1) such that for all (l, h, i) Here c is a parameter that relates the variation of f over all cνρ h -optimal nodes at all h ≥ 0. For instance, for c = 0, it implies that there exist smoothness parameters such that the gap between the f * and the value of f over all optimal nodes at all h ≥ 0 is bounded by νρ h .For instance, for c = 2, it implies that there exist smoothness parameters such that over all 2νρ h -optimal nodes, the gap between the f (x * ) and value of f over those nodes is bounded by 4νρ h -optimal, and so on.
Hence, for a finite sampling budget N the final constructed tree has a maximum height h max .Therefore it would be sufficient for f to satisfy the conditions of Assumption 2, for all h ∈ [0, h max ].This would allow f to have finite little jumps around x * .We now define a modified version of the near-optimality dimension which plays an important role in the analysis of black-box optimization algorithms [5], [10].This is a measure of closeness with respect to the number of nodes that have function values that are "close" to that of the optimum.
Definition 2: h max -bounded near-optimality dimension of f with respect to (ν, ρ) is: In other words, N h (2νρ h ) grow exponentially with h, and the near-optimality dimension gives the exponential rate of this growth.Thus, N h (2νρ h ) ≤ Bρ −d m (ν,ρ)h .The modified version of near-optimality dimension is adapted for a finite sampling budget case, and would allow the theoretical guarantees in Theorem 1 to hold for any f that satisfies Assumption 2 with a finite sampling budget.We are now ready to sketch a regret bound for HyHOO.
Theorem 1: With the input parameters satisfying Assumptions 1 and 2, a batch size parameter b, and a sampling budget of N, HyHOO achieves a regret bound of , where d m = d m (ν, ρ) is the h max -bounded near-optimality dimension and B is the constant appearing in Definition 2.
This theorem suggests that when the near-optimality dimension d m and/or the constant B decrease, indicating a decrease in the number of near-optimal regions and steeper slopes around the function's maxima, it theoretically leads to a decrease in the regret achieved by HyHOO.This aspect will be elaborated in the experiments.The proof of this theorem is presented below.
Proof: The proof closely follows the approach in [5] (see also [9], [10]), however, attention is needed when batched simulations are used (especially for large batches).Let R N = N i=1 ( f * − f (x i )) be the cumulative regret at the end of iteration N, where x i ∈ X is the point returned by HyHOO at iteration i.Let T be the tree constructed by HyHOO at the end of N iterations.Let H be a positive integer, that can be optimized for the best bound.As in [5], we divide T into three groups: T 1 that includes all 2νρ h -optimal nodes at h ≥ H, T 2 that includes all 2νρ h -optimal nodes at h ∈ [0, H − 1], and T 3 that includes sub-optimal nodes with sub-optimality gaps greater than 2νρ h for h ≥ 0. All nodes belonging to these groups contribute to the cumulative regret as where the first, second, and the third terms are the contributions of T 1 , T 2 , and T 3 to the cumulative regret, respectively.If b = 1, the third term dominates the second term, and we recover the bound in [5].However, if b is large, we can optimize H to get: , This can be easily converted to the desired simple regret bound of the algorithm (for more details refer to Remark 1 [5]).To be more precise, the connection between simple regret and cumulative regret can be represented in the following manner: N , which serves as the concluding statement of the proof.
A common alternative to Algorithm 1 where L > 1 is to equally distribute the sampling budget N among L modes and use HOO to address the following optimization problem sup We call such alternative approach the "baseline HOO" approach.With no loss of generality suppose that the optimum belongs to the region of the state space associated with the first .
Comparing this bound with HyHOO's regret bound reveals that the larger L and l for l = 1 get, the more HyHOO gains advantage over the baseline HOO.This is because the baseline HOO would spend a good portion of budget (i.e. (L−1)N L ) on searching for the optimum over the parts of state space that do not include the optimum.This is captured in the second term of the regret bound for this algorithm.

1) KNOWLEDGE OF SMOOTHNESS PARAMETERS:
It is worth noting that the exact values of the smoothness parameters in Assumption 2 are not crucial for the implementation of Algorithm 1.They are only necessary to achieve the most precise regret guarantees.If only upper bounds of these parameters are known, they can still be used in a parallel search algorithm.It is noted that determining the optimal smoothness parameters for a given verification problem is a difficult task and requires further research.When only approximate bounds for these parameters are available, which is often the case for physical processes, the search for the optimal parameters can be efficiently performed using the parallel optimistic optimization algorithm developed in [10] (Algorithm 2).This algorithm adaptively searches for the optimal smoothness parameters by creating multiple parallel instances of the HyHOO algorithm with different (ν, ρ) values.In the Section IV, the impact of the upper bound choice of (ν, ρ) on the performance of HyHOO will be discussed.

C. DISCUSSIONS ON CHOICE OF BATCH SIZE b
In the HyHOO algorithm, each node (l, h, i) is sampled multiple times (b times) as opposed to the original non-batched version where each node is only sampled once.This change in sampling affects the update rules for the values U l,h,i and B l,h,i .Indeed, by setting b = 1, the original HOO algorithm can be recovered and the simple regret bound is When comparing the regret bound of the batched version of HyHOO to the non-batched version, it is observed that the regret bound gets worse with larger b, but it has the benefit of reducing the number of nodes in the tree by a factor of b.This leads to a reduction in running time and makes the batched version more memory-efficient compared to the nonbatched version.Therefore, the parameter b holds significant importance as a hyperparameter and it is important to ensure that it does not compromise the accuracy of the verification or synthesis.This will be further elaborated in the experiment section.

D. VERIFICATION AND PARAMETER SYNTHESIS WITH HYHOO
Our objective now is to employ the HyHOO algorithm presented in Algorithm 1 to address both the verification problem in (2) and the synthesis problem in (3).To achieve this, we establish a connection between these problems and the optimization problem introduced earlier in Section III.Specifically, we need to identify the function f , the states x, and the noisy observations y in both problems.
r Verification: When using HyHOO for verification, we can choose the objective function as f (x) := p k,U ,β (x) for any initial state x ∈ and a given β ∈ B. Here, p k,U ,β (x) represents the probability of hitting the unsafe set U within k steps, starting from the initial state x and following the execution α.We define the noisy observation y as follows: y = 1 if α hits U within k steps, and 0 otherwise.Thus, for a given initial state x ∈ , y = 1 with probability p k,U ,β (x), and y = 0 with probability 1 − p k,U ,β (x).In other words, y is a Bernoulli random variable with a mean of p k,U ,β (x).
r Parameter Synthesis: When using HyHOO for parame- ter synthesis, we can choose the objective function as f (β ) := E[r(α, β )|β] for any parameter state β ∈ B. Here, r(α, β ) represents the outcome of the execution α starting from an initial state x 0 sampled from a given distribution, with the parameter set as β.The noisy observation y is defined as: where the mean of y is E[r(α, β )|β].By establishing this connection between the verification and synthesis problems and the optimization problem addressed by Algorithm 1, we can utilize HyHOO with a sampling budget of N to tackle these problems.Furthermore, we can present the following propositions that provide theoretical guarantees regarding the optimization error of HyHOO when applied to the verification and synthesis problems.
Proposition 1: Under the assumption that the noisy observations y 1 , . . ., y N in the verification problem satisfy Assumption 1, and the smoothness parameters (ν, ρ) satisfy Assumption 2 for the function f (x) = p k,U ,β (x), if Algorithm 1 returns xN ∈ , then the expected value of the The following remark discuss how Assumption 2 is satisfied by the verification and synthesis problems: Remark 1: Assumption 2 asserts that choices of initial states that are near x * (the initial states that make the system most unsafe) also lead to unsafe states in a verification problem.This requires the probability transition kernel to have a local Lipschitz property at x * ∈ .For the safety verification scenarios considered in this article, this assumption implies that initial configurations of the cars that are close to the most unsafe configuration are also unsafe, which aligns with our understanding of the physical dynamics involved.A similar argument holds for the parameter synthesis as well.If the probability transition kernel has a local Lipschitz property at x * ∈ B, then Assumption 2 holds for parameter synthesis problems.

IV. HyHooVer TOOL, EVALUATION, AND DISCUSSION
We devote this section to explain the structure of HyHooVer, to introduce several benchmark scenarios and to carefully evaluate its performance over instances of these scenarios.It is noted that our experiments were conducted on a Linux workstation with two Xeon Silver 4110 CPUs and 32 GB RAM.
HyHooVer consists of several components, as shown in Fig. 2. To use HyHooVer, the user needs to provide a Python code for the model that needs to be verified or synthesized (model.py),along with some input parameters.These input parameters include the sampling budget N, the upper bounds for the smoothness parameters (ν max , ρ max ), the number of HyHOO instances K, the noise parameter σ , the batch size parameter b, and the number of modes L. Once the input is provided, HyHooVer runs K instances of HyHOO with automatically calculated smoothness parameters (see Algorithm 2).For each instance i, it generates random trajectories from the model, receives noisy observations (rewards) (see Algorithm 1, line 5), and returns an estimate xi of the optimum.Once the sampling budget N is exhausted, HyHooVer returns the best xi by comparing the mean reward for each xi using Monte-Carlo simulations.For each xi , this is done by querying the model for n times (for some n determined by the user) and taking the average of the noisy observations returned by the model f ( xi ).Finally, HyHooVer outputs the x that gives the highest mean reward and its corresponding f ( x). 5In the following subsection, we describe the benchmarks used to evaluate the performance of HyHooVer.

A. BENCHMARKS
We evaluate the performance of HyHooVer using a variety of scenarios including a synthetic example, an LQR example and autonomous driving scenarios.It is noted that each of these scenarios presents unique characteristics.The synthetic example and the LQR example are designed to show the application of HyHooVer in parameter synthesis.The synthetic example, in addition, allows us to evaluate the performance of HyHooVer in state spaces with higher dimensions and modes.The purpose of the driving scenarios including BrokenLidar and Roundabout is to demonstrate how HyHooVer performs in safety verification.These scenarios are created specifically to address the challenges of testing the safety of automatic braking and collision warning systems, which are becoming standard features in vehicles.According to statistics, over 25% of accidents involve rear-end collisions, and about 85% of these occur on straight roads [34].The BrokenLidar and Roundabout scenarios are designed to capture these features.The BrokenLidar scenario assesses the performance of HyHooVer in rare event scenarios, while the Roundabout scenario is created using a car simulator called Highway-env.This simulator is equipped with motion planning algorithms and controllers that enable the creation of autonomous driving scenarios.The goal of the Roundabout scenario is to demonstrate the application and performance of HyHooVer in decision-making tasks related to autonomous driving.Examples of these scenarios are described in detail below.

1) Synthetic EXAMPLE
Let Z ⊂ R m , and let f : [L] × Z → R be defined as .This means that the near-optimal nodes of the function can be bounded by a constant B as defined in Definition 2. As a consequence, the near-optimality dimension of both g and f is also equal to zero.

2) LQR EXAMPLE
Consider the following linear system in discrete time: where A ∈ R n×n and B ∈ R n×m are unknown matrices, and we are interested in finding a state-feedback gain matrix W that minimizes the cost function c(W ) for the control policy u t = −W x t .The cost function is defined as follows: where D is a distribution that initial state x 0 is randomly drawn from N ( x0 , σ 2 I ) for some x0 and σ , and matrices Q ∈ R n×n 0 and R ∈ R m×m 0 that parameterize the cost.
The optimization problem we want to solve is sup where B is a given set and we only have access to noisy observations ∞ t=0 (x T t Qx t + u T t Ru t ) with x 0 ∼ D. This optimization problem is similar to (3) in Section II.

3) BrokenLidar SCENARIO
This scenario models a car running on a single-lane road and a pedestrian crossing the road in front of the car.The car is equipped with a broken Lidar device that somehow cannot detect obstacles in a specific direction.If the car detects the pedestrian, it starts braking.The probability of detecting the pedestrian is a function of the relative angle and distance (θ , d) between the car and the pedestrian is given by where θ b is a constant angle along which the pedestrians cannot be detected, and d b is the maximum distance that the Lidar can cover.Here s is a parameter controlling how fast the probability decreases when θ → θ b .The detecting probability also declines as the distance d increases.This model is almost always safe unless θ = θ b holds constantly for every step.This rarely happens when the speed and initial position of the car and the pedestrian satisfy some constraints.This is a realistic verification problem in the sense that the system is almost always safe unless it triggers some bugs.The goal of verification is to find such rare unsafe cases before deployment.
Let unsafety be defined as where the distance between the car and pedestrian falls below a certain threshold.We aim to solve the optimization problem for a given k, U and s where is a given set of possible initial position and speed of the car and the pedestrian.This optimization problem is similar to (2) in Section II.

4) Roundabout SCENARIO
This scenario is described in detail in Section I-B and is illustrated in Fig. 1.
It is worth noting that the unsafety probability/objective function in all these scenarios exhibits nonlinearity concerning the states/parameters.

B. PERFORMANCE OF HyHooVer WITH VARIOUS BATCH SIZE AND SMOOTHNESS PARAMETERS
The batch size parameter in the tool serves as a hyperparameter designed to reduce running time and memory usage.However, it requires careful consideration to ensure that it does not compromise the accuracy of the verification or synthesis process.Theoretical analysis, as presented in Theorem 1, shows that the simple regret concerning the batch size b scales as ) with a sampling budget of N. In Fig. 4, the convergence rate of the actual regret and the theoretical regret rate is depicted for instances of the Synthetic example with different sampling budgets N = 50K, 100K, 300K.The actual regret rate is shown to be upper-bounded by the theoretical rate provided by the theorem.While larger batch sizes can negatively impact the regret, choosing an appropriate batch size parameter b can significantly improve the running time and memory usage of HyHooVer without substantially sacrificing the quality of the final result.This observation is supported by Fig. 5 and Table 1.Fig. 5 illustrates the impact of the batch size parameter b on the performance of HyHOO for the Synthetic example.When the sampling budget exceeds 100K, the performance of   On the other hand, Table 1 demonstrates that with a sampling budget of 100K and batch size b ≤ 20, we can achieve comparable results with a smaller tree size, reduced memory usage, and faster running time.In essence, finding the optimal value for the hyperparameter b, which minimizes running time while keeping the actual regret below a specific threshold, is challenging to calculate precisely.Nevertheless, based on our observations, we recommend using a batch size ranging from 5 to 20 to achieve faster running times while maintaining satisfactory results.
As discussed in Section III-B, it is not always practical to determine the exact upper bound for ρ.To investigate the impact of different choices of ρ max on the performance of HyHooVer, experiments were conducted and the results are recorded in Table 2.According to the results, the performance of the tool is not influenced by different choices of ρ max for the Synthetic example.However, when ρ max is higher, the tool tends to perform a more aggressive search, resulting in more thorough exploration of the shallower levels of the tree due to uncertainty in observations over the region of state space associated with each node.Therefore, if the smoothness of the model is unknown, a higher ρ max can be chosen as a conservative approach, which still provides reasonably good estimates.

C. PERFORMANCE OF HyHooVer IN COMPARISON WITH OTHER METHODS
We found that among the available SMC tools, Plas-maLab [35] is the most similar to HyHooVer in terms of its ability to verify black-box systems with continuous/hybrid state spaces.It should be noted that Storm [36] and PRISM [26] do not have support for continuous state-space models.While it would be possible to compare HyHooVer with these tools using discrete versions of the examples, the comparison would not be appropriate as the guarantees offered by these tools are different.The SMC approach in [30], is related, but we were unable to find an implementation to compare against.Given this we conduct a more in-depth comparison with PlasmaLab.In addition, we conduct experiments with BoTorch [31] which is a Bayesian optimization based tool for black-box optimization of stochastic system and we provide in-depth comparison with HyHooVer's performance.For scenarios involving a hybrid state space, we also compare the performance of our tool with a traditional approach called baseline HOO.More information about these approaches is provided below: r PlasmaLab: It utilizes a smart sampling algorithm [37] to efficiently distribute the simulation budget among the schedulers of an MDP.To use this algorithm, one must set the parameters and δ in the Chernoff bound, with N max > ln (2/δ)/(2 2 ), where N max is the per-iteration simulation budget.We set the confidence parameter δ to 0.01, and then, for a given N max , obtained the precision parameter by = √ ln (2/δ)/(2 × 0.8 × N max ).To make a fair comparison, we developed a PlasmaLab r BoTorch: It is a software tool that is designed for black-box optimization of expensive-to-evaluate functions through Bayesian optimization techniques.It is a PyTorch-based library that can be utilized in a range of situations, including hyperparameter optimization for machine learning models, as well as scientific and engineering problems.To utilize BoTorch for our specific scenarios, we employ its Ax module.It is worth mentioning that we compare BoTorch with our tool in both parameter synthesis and safety verification scenarios.
r Baseline HOO: A common approach for addressing a problem with multiple modes is to divide the budget equally among the modes and conduct multiple HOOs and then compare the results.The theoretical guarantee of this approach is discussed in Section III-B.The evaluation of the four approaches (HyHooVer, Plas-maLab, BoTorch, and baseline HOO) involves the use of two metrics: sample complexity and running time.More information on the detail of these comparisons is provided as we go through the scenarios.Unless otherwise stated, we fix the batch size parameter to b = 10 and the maximum smoothness parameter to ρ max = 0.95 for all experiments conducted using HyHooVer.Additionally, we present the results of 10 independent runs for each approach.

1) PERFORMANCE OF HyHooVer ON PARAMETER SYNTHESIS
We tested HyHooVer on an instance of the Synthetic example with m = 10, L = 10 to evaluate the performance of HyHOO in high-dimensional state space with multiple modes.The results shown in Fig. 6 indicate that HyHOO needs fewer queries from the system compared to the baseline HOO, which is consistent with the theoretical guarantees provided for both methods in Section III-B.
Examining the plots more closely, it is evident that both methods show a decline in performance after initial improvement.This is mainly due to the non-monotonic behavior of the function f (x) in (5).Specifically, the UCB algorithm initially discovers regions of the state space that is close to optimal, but as it continues to explore, it finds other regions that are also near optimal but separate from the initial ones, and it then chooses to explore further to reduce the uncertainty of its observations.Remark 2: We did not compare the running time performance of our tool with PlasmaLab, despite noticing a significant lower running time of HyHOO compared to PlasmaLab to reach the same level of performance in the BrokenLidar and Roundabout scenarios.This decision was made because the PlasmaLab plug-in is developed in Java, and we used a simulation bridge between Python and PlasmaLab.The difference in performance could potentially be attributed to communication issues between the two languages, making a comparison between the two tools unfair.
We conducted a comparison between our tool and BoTorch by testing their performance on the Synthetic example in terms of their running time.The results are illustrated in Fig. 7.The graph demonstrates that HyHOO outperforms baseline HOO and BoTorch in terms of running time.Notice HyHOO achieves a result after only 0.04 hours of running time, whereas BoTorch could not achieve this level of performance even after running for 1.12 hours, indicating that HyHOO is approximately 30 times faster than BoTorch.It is noted that we expect the other approaches to achieve similar results as HyHOO achieves in a longer run; however, due to high computational times, we decided not to continue the runs for the other methods.In addition, note that BoTorch needed 1250 queries to achieve this result, whereas HyHOO requires 100K queries to achieve these results.Moreover, with 1250 queries, HyHOO attains a performance result of 0.677 ± 0.006, while BoTorch achieves a performance result of 0.734 ± 0.106 with the same number of queries.In summary, while HyHOO performs better in terms of running time, BoTorch performs better in terms of number of queries.
Remark 3: We did not compare the performance of Hy-HooVer and BoTorch based on the number of queries due to the long running time required for BoTorch to obtain its results.To explain further, Bayesian optimization is computationally expensive for two main reasons: (1) it requires an iterative numerical optimization step to fit a probabilistic model to the observed data, and (2) it involves optimizing an acquisition function to select the next point for evaluation at each iteration.In contrast, HyHooVer does not require such optimization steps, which is why it has a faster running time compared to BoTorch.Therefore, in scenarios where the cost of evaluating the system at each iteration is high and dominates the optimization process, BoTorch is expected to perform better than HyHooVer.However, in scenarios where the opposite is true, HyHooVer is expected to outperform BoTorch.The Synthetic example is an instance of the latter scenario.
We tested HyHooVer on an instance of the LQR example with n = 2, m = 1 to demonstrate its application in parameter synthesis in an optimal control setting.This problem has an analytical solution which can be computed by solving Algebraic Riccati Equation.The results shown in Fig. 8 indicate the accuracy of estimation is improved with increasing the sampling budget.

2) PERFORMANCE OF HyHooVer ON SAFETY VERIFICATION
We tested HyHooVer on instances of the BrokenLidar scenario with m = 4, L = 1 and different s parameters.Different parameters s result in functions with different slopes around their optimum.The smaller s corresponds to rarer unsafe  event.The results shown in Fig. 9 demonstrates comparison between performance of HyHooVer with s = 5e −5 , and Plas-maLab with s = 5e −2 .It suggests that as the unsafe event gets rarer (with s = 5e −5 ), HyHOO's performance is superior to PlasmaLab in terms of number of queries.This is because as s decreases, the number of near-optimal regions also decreases and from a theoretical standpoint, this corresponds to the smaller constant B introduced in Definition 2 which also corresponds to a lower simple regret in Theorem 1. Thus HyHOO offers an advantage if the function exhibits a sharper slope around the maxima in comparison with methods that rely on random sampling like PlasmaLab.outperforms BoTorch.In the case s = 1e −3 , this difference became even more significant.In this case HyHOO achieved its best performance in roughly 2 minutes, whereas BoTorch after 16 minutes still could not achieve one-fifth of HyHOO's result.It is noted that the BoTorch achieves these results with 500 queries.This scenario is another example of the scenarios where the cost of evaluating the system at each iteration is dominated by the cost of the optimization process in BoTorch and as a result HyHooVer performs better than BoTorch.
Finally, we ran HyHooVer on instances of the Roundabout scenario with m = 4, L = 15 and compared its performance with both the baseline HOO and PlasmaLab.As shown in Fig. 11, HyHOO generally performs better than the baseline HOO, which supports our claim that HyHooVer is effective when the number of modes increases.This is because the baseline HOO approach spends a significant portion of the budget on exploring regions of the state space associated with modes that do not contain the optimum.In addition, HyHooVer is more sample-efficient than PlasmaLab because the later relies on random exploration which is not efficient in state spaces with multiple modes.

V. CONCLUSION AND FUTURE FOCUS
We introduced HyHOO, an optimistic mini-batched tree search algorithm designed for verification and parameter synthesis in a specific class of discrete-time Markov chains.These Markov chains have states consisting of both discrete and continuum-based components.HyHOO operates by sequentially sampling executions of the Markov chain in batches, leveraging a mild assumption about the smoothness of the objective function.Its objective is to find solutions that are near-optimal, minimizing the corresponding regret.We provided theoretical regret bounds that consider factors such as the sampling budget, smoothness, near-optimality dimension, and sampling batch size.Importantly, HyHOO exhibits effective performance without requiring exact parameters or quantities.We created a tool named HyHooVer and assessed its effectiveness by analyzing various benchmark models and we showed that our approach competes favorably with BoTorch (a Bayesian-based tool), PlasmaLab (a tool based on random sampling), and the baseline HOO method, in terms of sample efficiency and/or running time.
Our current approach focuses on minimizing regret but could be extended to minimizing the sampling budget.This presents an intriguing area for further investigation and exploration.

FIGURE 1 .
FIGURE 1. Roundabout scenario which can be instantiated with different number of vehicles and initialization.

FIGURE 2 .
FIGURE 2. HyHooVer components.Input parameters N, K, (ρ max , ν max ), b, and σ stand for sampling budget, number of HyHOO instances, upper bounds for smoothness parameters, batch size parameter, and noise parameter, respectively.We fix K = 1 and ν max = 1.0 throughout all experiments in Section IV.We will discuss the choice of ρ max and b later in this section.
where g(z) = 0.5 sin(13z 1 ) sin(27z 1 ) z i the i-th component of z.Suppose we have access to noisy observations y(x) = f (x) + with ∼ N (0, σ ) for some σ > 0. Our goal is to solve the optimization problemsup x∈B f (x)for a given set B ⊂ [L] × Z.This problem is similar to the one presented in (3) in Section II.The function sin(13z 1 ) sin(27z 1 ) is an example of a function with a near-optimality dimension d m = 0, as introduced in[5]

FIGURE 4 .
FIGURE 4. Actual regret rate and theoretical rate versus different batch sizes b on an instance of the Synthetic example with m = 10, L = 1, and σ = 0.1.Here, ρ max = 0.95, and any data represented by "*" is averaged over 100 runs.It is noted that the lines are fitted to the data using the least square method, and the slopes of the lines are 0.25, 0.21, 0.22 for the three sampling budgets, respectively.

FIGURE 5 .TABLE 1
FIGURE 5. Impact of batch size b on performance of HyHooVer on instance of Synthetic example with m = 10, L = 1 and σ = 0.1.Here ρ max = 0.95 and results are averaged over 10 runs.

TABLE 2
Impact of smoothness parameter ρ max on the results returned by HyHooVer on instance of Synthetic example with m = 10, L = 1 and σ = 0.1.Here N = 200K, b = 20 and results are averaged over 10 runs.

FIGURE 6 .
FIGURE 6. Performance of HyHooVer in comparison with baseline HOO on instance of Synthetic example with m = 10, L = 10, and σ = 0.1.

FIGURE 8 .
FIGURE 8. Performance of HyHooVer on instance of LQR example with n = 2, m = 1, and σ 2 = 0.001.Here c refers to the value of cost evaluated at W returned by HyHooVer.In addition, W * and c * refers to the optimal state feedback gain and optimal cost computed by solving Algebraic Riccati Equation, respectively.

FIGURE 9 .
FIGURE 9. Performance of HyHooVer in comparison with PlasmaLab on instance of BrokenLidar scenario with m = 4 and L = 1.

FIGURE 10 .
FIGURE 10.Performance of HyHooVer in comparison with BoTorch on instance of BrokenLidar scenario with m = 4 and L = 1.

Fig. 10
displays the performance of HyHOO and BoTorch in terms of running time with different values of s parameter s = 5e −2 , s = 1e −2 , and s = 1e −3 .The graph indicates that as the value of s decreases, which implies a rarer event, HyHOO

FIGURE 11 .
FIGURE 11.Performance of HyHooVer in comparison with baseline HOO and PlasmaLab on Roundabout scenario with m = 4 and L = 15.