Control of Black-Box Embedded Systems by Integrating Automaton Learning and Supervisory Control Theory of Discrete-Event Systems

The paper presents an approach to the control of black-box embedded systems by integrating automaton learning and supervisory control theory (SCT) of discrete-event systems (DES), where automaton models of both the system and the requirements are unavailable or hard to obtain. First, the system is tested against the requirements. If all the requirements are satisfied, no supervisor is needed and the process terminates. Otherwise, a supervisor is synthesized to force the system to satisfy the requirements. To apply SCT and automaton learning efficiently, the system is abstracted into a finite discrete model. Then, a $C^{*}$ learning algorithm based on the classical $L^{*}$ algorithm is proposed to infer a Moore automaton describing both the behavior of the system and the conjunctive behavior of the system and the requirements. Subsequently, a supervisor for the system is derived from the learned Moore automaton and patched onto the system. Finally, the controlled system is tested again to check the correctness of the supervisor. If the requirements are still not satisfied, a larger Moore automaton is learned and a refined supervisor is synthesized. The whole process iterates until the requirements hold in the controlled system. The effectiveness of the proposed approach is demonstrated through two realistic case studies.

Note to Practitioners: Supervisory control theory of DES can synthesize maximally permissive supervisory controllers to ensure the correctness of software-controlled processes. The application of SCT relies on automaton models of the plant and the specifications; however, the required models are often unavailable and difficult to obtain for black-box embedded systems. Automaton learning is an effective method for inferring models of black-box systems. This paper integrates the two technologies so that SCT becomes applicable to the development of black-box embedded software systems. The proposed approach is implemented in a toolchain that connects automaton learning algorithms, SCT, and testing algorithms via scripts. The obtained supervisor is implemented as a software patch that monitors and controls the original system online.


I. INTRODUCTION
IN THE development of complicated embedded systems, third-party components or legacy subsystems are reused to reduce development costs. The source code of the reused components is often unavailable, so if some requirements of the new application are not satisfied by a reused component, the system cannot be amended by modifying its source code. Although fault identification methods [1], [2] can detect the violations, they cannot correct the errors online. An alternative approach is to use a "supervisor" component that monitors the reused component and corrects it when necessary. Such a supervisory component can be designed by the supervisory control theory (SCT) of discrete-event systems (DES) [3].
In SCT, both the plant and the requirements are specified by formal models, such as finite-state automata [3], [4] and Petri nets [5]–[7]; however, formal models of reused components are often unknown in realistic systems. Since the source code of the reused components is unavailable, it is difficult to obtain a logical model of the system by analyzing its source code [4]. This paper considers the reused component as a black-box system [8] of which only the input and output sequences are observable. Moreover, user requirements are often given as textual descriptions, Unified Modeling Language (UML) diagrams, structured natural language, logical formulas, and so on, and it is difficult to build appropriate automaton models for textual requirements [4]. Due to the lack of formal models, SCT can hardly be used for the verification and control of black-box systems. Fortunately, automaton learning algorithms can infer automaton models of systems and check the validity of requirements from the input and output trajectories of the system [9]–[12]. Building on these learning methods, this paper proposes an approach that synthesizes a supervisor for a system by integrating automaton learning algorithms and SCT, so that the requirements are satisfied even when automaton models of both the system and the requirements are unavailable.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Approaches that integrate automaton learning and SCT to synthesize supervisors are investigated in [13]–[20]. When the model of the system is confined to a finite look-ahead window [21], an optimal supervisor is learned by an adapted $L^{*}$ learning algorithm [13], [14]. When the plant model is completely unavailable, the $L^{*}$ algorithm is modified to infer supervisors by observing uncontrollable illegal strings [15]–[17]. When the model of the system is known but that of the requirement is not, Hiraishi [18] proposes the $K^{*}$ algorithm to construct a reduced supervisor for the system. We have studied supervisor synthesis for black-box systems in recent years. In [19], a deterministic finite automaton (DFA) model is inferred for a black-box system by LBTest [22], the requirements specified in linear temporal logic (LTL) are converted into automata by tools such as LTL2BA [23], and a supervisor is then computed using SCT. In [20], an $S^{*}$ learning algorithm is proposed to directly infer a supervisor for the system; if the learned supervisor is blocking, SCT is used to compute a nonblocking supervisor in a further step.
In SCT, the first step in computing a supervisor is to calculate the synchronous product of the automaton models of the plant and the requirement [3], [24]. A supervisor is then derived from the synchronous product automaton by iteratively removing blocking and uncontrollable states. Motivated by this, this paper proposes a learning algorithm that infers a Moore automaton describing both the behavior of the system and the conjunctive behavior of the system and the requirements. The learned Moore automaton is then used by an SCT algorithm to synthesize the supervisor. The new approach is more general than our previous method in [19], where the automaton models of the plant and the requirement are obtained separately. If the requirement cannot be described by a regular language but the intersection of the system and the requirement can, the method in [19] cannot learn an automaton model of the requirement, and consequently a supervisor cannot be computed by SCT. In contrast, the new approach in this paper can still find the supervisor as long as the intersection of the system and the requirement is regular.
Integrating automaton learning algorithms and SCT, this paper presents a novel approach to the supervisory control of black-box embedded systems, where the automaton models of both the systems and the requirements are not directly available or are hard to obtain. First, the system is tested against the requirements. If all the requirements are satisfied, the approach terminates. Otherwise, the system is abstracted into a discrete one if its state space is continuous or extremely large. Subsequently, the $C^{*}$ algorithm, adapted from the $L^{*}$ learning algorithm, is used to construct a Moore automaton describing both the behavior of the system and the conjunctive behavior of the system and the requirements. Then, a supervisor for the system is computed from the learned automaton by SCT. Next, the supervisor is implemented as a patch that monitors and controls the original system. The controlled system is then tested again to check the correctness of the supervisor. Automaton learning, supervisor computation, and system testing are performed iteratively until the requirement holds. The $C^{*}$ algorithm is implemented based on LearnLib [25], [26], and scripts connecting LearnLib, TCT [27], and LBTest [22] are developed to automate the process. This paper provides two main contributions.
1) An approach for the supervisory control of black-box embedded systems is presented, where the automaton models of both the system and the requirements are not directly available or are hard to obtain.
2) A $C^{*}$ learning algorithm based on the $L^{*}$ algorithm is proposed to infer a Moore automaton describing both the behavior of the system and the conjunctive behavior of the system and the requirements, such that the supremal nonblocking supervisor of the problem can be synthesized from the learned automaton by SCT.

The remainder of the paper is organized as follows. Section II briefly reviews the preliminaries on automata and SCT and sketches the $L^{*}$ learning algorithm. Section III presents the proposed approach in detail. A small example is given in Section IV to illustrate the implementation details of the proposed approach. Experimental studies on realistic systems are reported in Section V. Finally, Section VI concludes the paper.

A. Basics of Automata
An alphabet $\Sigma$ is a nonempty finite set of symbols. All finite strings over $\Sigma$, including the empty string $\varepsilon$, form $\Sigma^{*}$. Let $\sigma \in \Sigma$ and $s \in \Sigma^{*}$. The symbol $\#\sigma(s)$ denotes the number of occurrences of $\sigma$ in $s$. A language $L$ over $\Sigma$ is a subset of $\Sigma^{*}$. The concatenation of strings $w_1$ and $w_2$ in $\Sigma^{*}$ is written $w_1.w_2$, or $w_1 w_2$ for short. The concatenation of two languages $L_1$ and $L_2$ is $L_1.L_2 = \{l_1 l_2 \mid l_1 \in L_1,\ l_2 \in L_2\}$. Given two strings $s'$ and $s$ in $\Sigma^{*}$, if there is a string $u \in \Sigma^{*}$ such that $s'.u = s$, then $s'$ is a prefix of $s$. The prefix closure $\overline{L}$ of a language $L$ consists of all the prefixes of all the strings in $L$. If $L = \overline{L}$, then $L$ is prefix-closed. If there exist strings $u$ and $s'$ in $\Sigma^{*}$ such that $u.s' = s$, then $s'$ is a suffix of $s$. All the suffixes of all the strings in $L$ form the suffix closure of $L$. If $L$ equals its suffix closure, then $L$ is suffix-closed.
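The string and language operations above can be made concrete with a small Python sketch. It assumes finite languages and encodes a string over $\Sigma$ as a tuple of event names, so $\varepsilon$ is the empty tuple `()`; the helper names are illustrative and not part of the paper's toolchain.

```python
from typing import Iterable, Set, Tuple

Word = Tuple[str, ...]  # a string over Sigma; () is the empty string epsilon


def count(sigma: str, s: Word) -> int:
    """#sigma(s): number of occurrences of event sigma in s."""
    return sum(1 for e in s if e == sigma)


def concat(l1: Iterable[Word], l2: Iterable[Word]) -> Set[Word]:
    """L1.L2 = {l1 l2 | l1 in L1, l2 in L2}."""
    l2 = list(l2)
    return {u + v for u in l1 for v in l2}


def prefix_closure(lang: Iterable[Word]) -> Set[Word]:
    """All prefixes (including epsilon) of all strings in lang."""
    return {s[:i] for s in lang for i in range(len(s) + 1)}


def suffix_closure(lang: Iterable[Word]) -> Set[Word]:
    """All suffixes (including epsilon) of all strings in lang."""
    return {s[i:] for s in lang for i in range(len(s) + 1)}


def is_prefix_closed(lang: Iterable[Word]) -> bool:
    lang = set(lang)
    return prefix_closure(lang) == lang


def is_suffix_closed(lang: Iterable[Word]) -> bool:
    lang = set(lang)
    return suffix_closure(lang) == lang
```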
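A DFA $A$ over an alphabet $\Sigma$ is a quintuple $A = (Q, \Sigma, \delta, q_0, Q_m)$, where $Q$ is the set of states, $\Sigma$ is the alphabet, $\delta: Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the initial state, and $Q_m \subseteq Q$ is the set of marker states.

The function $\delta$ is partial in a DFA, i.e., for $q \in Q$ and $\sigma \in \Sigma$, $\delta(q, \sigma)$ is not always defined. The function $\delta$ is extended from $Q \times \Sigma$ to $Q \times \Sigma^{*}$ recursively by $\delta(q, \varepsilon) = q$ and $\delta(q, s\sigma) = \delta(\delta(q, s), \sigma)$ whenever $\delta(q, s)$ and $\delta(\delta(q, s), \sigma)$ are both defined. The sets $L(A) = \{s \in \Sigma^{*} \mid \delta(q_0, s) \text{ is defined}\}$ and $L_m(A) = \{s \in L(A) \mid \delta(q_0, s) \in Q_m\}$ are the closed and marked languages of $A$, respectively. A Moore automaton $M$ over an alphabet $\Sigma$ is a six-tuple $M = (Q, \Sigma, O, \delta, \lambda, q_0)$, where $Q$, $\Sigma$, and $q_0$ have the same meanings as in a DFA, the finite set $O$ consists of output symbols, and $\lambda: Q \to O$ is the output function defined on states. The transition function $\delta$ and the output function $\lambda$ of a Moore automaton are total. If $O = \{\mathit{true}, \mathit{false}\}$ and $\lambda(q) = \mathit{true} \Leftrightarrow q \in Q_m$, the Moore automaton $M$ is isomorphic to a DFA $A$.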
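A minimal Python sketch of the two automaton types, assuming states are arbitrary hashable values and transition functions are dictionaries keyed by (state, event). `DFA.run` implements the recursive extension of $\delta$ to strings, and `dfa_from_boolean_moore` illustrates the correspondence between a Moore automaton with $O = \{\mathit{true}, \mathit{false}\}$ and a DFA. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Optional, Sequence, Tuple

State = Hashable
Event = str


@dataclass
class DFA:
    """A = (Q, Sigma, delta, q0, Qm) with a *partial* transition map."""
    delta: Dict[Tuple[State, Event], State]
    q0: State
    marked: FrozenSet[State]

    def run(self, s: Sequence[Event]) -> Optional[State]:
        """Extended transition function delta(q0, s); None when undefined."""
        q = self.q0
        for sigma in s:
            if (q, sigma) not in self.delta:
                return None
            q = self.delta[(q, sigma)]
        return q

    def in_closed(self, s: Sequence[Event]) -> bool:  # s in L(A)
        return self.run(s) is not None

    def in_marked(self, s: Sequence[Event]) -> bool:  # s in Lm(A)
        q = self.run(s)
        return q is not None and q in self.marked


@dataclass
class Moore:
    """M = (Q, Sigma, O, delta, lambda, q0) with a *total* transition map."""
    delta: Dict[Tuple[State, Event], State]
    q0: State
    out: Dict[State, object]  # lambda: Q -> O

    def output(self, s: Sequence[Event]) -> object:
        q = self.q0
        for sigma in s:
            q = self.delta[(q, sigma)]  # a KeyError here means delta is not total
        return self.out[q]


def dfa_from_boolean_moore(m: Moore) -> DFA:
    """O = {True, False}: mark exactly the states whose output is True."""
    return DFA(dict(m.delta), m.q0, frozenset(q for q, o in m.out.items() if o is True))
```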

B. Supervisory Control Theory
SCT is a control technique that enforces a plant to satisfy its requirement in the DES framework [3], [28]. Assume that an uncontrolled DES, namely a plant, is modeled by a DFA $G$ with an alphabet $\Sigma$, where $\Sigma$ is partitioned into two disjoint subsets $\Sigma_c$ and $\Sigma_u$, i.e., $\Sigma = \Sigma_c \,\dot{\cup}\, \Sigma_u$. The set $\Sigma_c$ consists of controllable events that can be disabled, and the set $\Sigma_u$ consists of uncontrollable events that cannot be prevented from occurring. In SCT, the requirement that the plant must satisfy is modeled by a finite automaton $H$ with the same alphabet as $G$.
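The premise is that the behavior of $G$ may violate the requirement and must be modified through feedback control. To alter the behavior of $G$, a supervisor implements a control function $V: L(G) \to \Gamma$, where $\Gamma = \{\gamma \subseteq \Sigma \mid \Sigma_u \subseteq \gamma\}$ is the set of control patterns and $V(s)$ is the set of events enabled after $s$. Controllability is a fundamental concept in SCT [28].

Definition 1: A language $K \subseteq \Sigma^{*}$ is controllable with respect to a prefix-closed language $L \subseteq \Sigma^{*}$ and an uncontrollable event set $\Sigma_u$ if $\overline{K}\Sigma_u \cap L \subseteq \overline{K}$.

Given a language $M \subseteq L$, the set of all sublanguages of $M$ that are controllable with respect to $L$ is $C(M, L) = \{K \subseteq M \mid K \text{ is controllable with respect to } L\}$. The supremal element of $C(M, L)$ is denoted by $\mathrm{supC}(M, L)$ [28].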
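For finite languages, controllability can be checked directly from the standard Ramadge–Wonham condition $\overline{K}\Sigma_u \cap L \subseteq \overline{K}$. The sketch below assumes finite languages encoded as sets of tuples, with $L$ prefix-closed. The function `sup_c_bruteforce` is a hypothetical helper that computes $\mathrm{supC}(M, L)$ as the union of all controllable subsets of $M$ (valid because controllability is closed under union); it is exponential, only feasible for tiny examples, and not how TCT computes supervisors.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Word = Tuple[str, ...]


def prefix_closure(lang: Iterable[Word]) -> Set[Word]:
    return {s[:i] for s in lang for i in range(len(s) + 1)}


def is_controllable(K: Iterable[Word], L: Set[Word], sigma_u: Set[str]) -> bool:
    """Check prefix(K).Sigma_u ∩ L ⊆ prefix(K) for finite K and finite,
    prefix-closed plant language L."""
    K_bar = prefix_closure(K)
    for s in K_bar:
        for u in sigma_u:
            if s + (u,) in L and s + (u,) not in K_bar:
                return False  # the plant can leave prefix(K) via an uncontrollable event
    return True


def sup_c_bruteforce(M: Set[Word], L: Set[Word], sigma_u: Set[str]) -> Set[Word]:
    """supC(M, L) as the union of all controllable subsets of a *tiny* finite M."""
    words = sorted(M)
    result: Set[Word] = set()
    for r in range(len(words) + 1):
        for subset in combinations(words, r):
            if is_controllable(subset, L, sigma_u):
                result |= set(subset)
    return result
```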

C. Essential Description of the L * Algorithm
The $L^{*}$ algorithm is a classical active automaton learning algorithm for identifying unknown regular languages [9]. Suppose that a set $U$ is the target language defined over an alphabet $\Sigma$. The $L^{*}$ algorithm infers $U$ by constructing a canonical DFA $A$ such that $L_m(A) = U$.
In the learning process, the learner generates membership queries and equivalence queries. A membership query asks whether a string $s \in \Sigma^{*}$ belongs to $U$, and an equivalence query asks whether the learned language equals $U$. A minimally adequate teacher (MAT) is assumed to answer both types of queries. The key structure of the learning algorithm is an observation table $(S, E, T)$, where $S$ is a nonempty prefix-closed set, $E$ is a nonempty suffix-closed set, and $T: (S \cup S.\Sigma).E \to \{0, 1\}$ is a function with $T(s) = 1$ if and only if $s \in U$. The table is extended until it is closed and consistent, and a conjecture DFA is then constructed from it. After a conjecture is learned, an equivalence query is asked to check whether the conjecture equals the target. If the answer is positive, the $L^{*}$ algorithm terminates. Otherwise, the MAT provides a counterexample, and all the prefixes of the counterexample are added to $S$ to refine the conjecture. A new conjecture is then constructed, and the learning process iterates until no counterexample is found.
Let $m$ be the number of states of the canonical DFA that accepts $U$, and let $n$ be the upper bound on the length of any counterexample provided by the MAT. Angluin [9] proves that the total running time of the $L^{*}$ algorithm is bounded by a polynomial in $m$ and $n$.
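The closedness and consistency checks that drive $L^{*}$ can be sketched on the observation table $(S, E, T)$ as follows. This is a simplified illustration under assumptions: strings are tuples, `member` plays the role of the MAT's membership oracle, and the witness-returning method names are hypothetical. A table is closed when every row of $S.\Sigma$ already appears as a row of $S$, and consistent when strings of $S$ with equal rows keep equal rows after every one-event extension.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Word = Tuple[str, ...]


class ObservationTable:
    """(S, E, T) with T evaluated lazily through a membership oracle."""

    def __init__(self, alphabet: Sequence[str], member: Callable[[Word], Hashable]):
        self.alphabet = list(alphabet)
        self.member = member          # answers "is s in U?" (e.g. 1 or 0)
        self.S: List[Word] = [()]     # prefix-closed, starts as {epsilon}
        self.E: List[Word] = [()]     # suffix-closed, starts as {epsilon}

    def row(self, s: Word) -> Tuple:
        return tuple(self.member(s + e) for e in self.E)

    def closedness_witness(self) -> Optional[Word]:
        """A string of S.Sigma whose row does not occur among the rows of S."""
        rows_S = {self.row(s) for s in self.S}
        for s in self.S:
            for a in self.alphabet:
                if self.row(s + (a,)) not in rows_S:
                    return s + (a,)
        return None

    def consistency_witness(self) -> Optional[Word]:
        """A suffix a.e separating two strings of S that have equal rows."""
        for i, s1 in enumerate(self.S):
            for s2 in self.S[i + 1:]:
                if self.row(s1) != self.row(s2):
                    continue
                for a in self.alphabet:
                    for e in self.E:
                        if self.member(s1 + (a,) + e) != self.member(s2 + (a,) + e):
                            return (a,) + e
        return None
```

In $L^{*}$, a closedness witness is moved into $S$ and a consistency witness is added to $E$ until both checks return `None`.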
III. LEARNING AND CONTROL APPROACH

Given an embedded system and its requirements, our approach provides a complete solution, including testing and supervisor synthesis, to correct the system online. The approach is effective when the following assumptions hold.
1) The input and output of the system are observable.
2) Membership queries to the requirement can be answered by a computer program through analyzing input and output signals.
3) With proper abstraction, the system can be represented by a DFA.
4) The conjunction of the system and the requirement is representable by a regular language.

Fig. 1 illustrates the procedure of the proposed approach. The inputs of the approach are a black-box system and a requirement. First, the system is tested against the requirement. If the requirement is satisfied, the procedure terminates. Otherwise, the approach designs a supervisor that prevents the system, online, from reaching unacceptable states that violate the requirement. Since a realistic system may involve large or infinite input and output domains, the system must be abstracted into a discrete one so that automaton learning algorithms and SCT can work effectively. When the automaton models of the system and the requirement are unavailable, the $C^{*}$ algorithm is proposed to infer a Moore automaton capturing the conjunctive behavior of the system and the requirement. Then, in the fourth step, a supervisor is computed from the learned automaton by SCT. Finally, the supervisor is added to the system to ensure its correctness. To confirm that the controlled system satisfies the requirement, the system is tested again. If the requirement is still not satisfied in the controlled system, a larger automaton is inferred in the next learning round, and a new supervisor is computed and patched onto the original system. The whole procedure iterates until the controlled system satisfies the requirement. Sections III-A–III-E elaborate on the technical details of these steps.

A. Testing the System
The embedded system is tested against the requirement. If the system satisfies the requirement, no supervisor is needed and the approach terminates. Many techniques exist for testing black-box systems, such as boundary value analysis, equivalence partitioning, and error guessing [29]. This paper applies a different testing method to each of the two case studies: Section V-A applies learning-based testing [30] to a brake-by-wire (BBW) system, and Section V-B applies random testing to a vehicle platooning program.

B. System Abstraction
The purpose of system abstraction is to convert the continuous input and output domains into discrete ones. In embedded control systems, the input signals to the embedded control software are typically reference values determined by humans or other decision functions, together with sensor measurements. The output signals of the embedded control system are typically sensor measurements and system states actively sent out by the software. These signals are naturally observable.
Consider a system with $m$ input signals $x_1, \ldots, x_m$ and $n$ output signals $y_1, \ldots, y_n$, whose domains are $X_i$ $(i = 1, \ldots, m)$ and $Y_j$ $(j = 1, \ldots, n)$, respectively. Define the Cartesian products $X = X_1 \times \cdots \times X_m$ and $Y = Y_1 \times \cdots \times Y_n$. Evaluations of the input and output signals are represented by vectors $\mathbf{x} \in X \subseteq \mathbb{R}^{m}$ and $\mathbf{y} \in Y \subseteq \mathbb{R}^{n}$, where $\mathbb{R}$ is the set of real numbers.
In realistic systems, the input and output domains may be continuous, or finite but extremely large. Since automaton learning algorithms and SCT are applicable only to finite-state problems, the real inputs and outputs must be abstracted into discrete ones. To this end, two functions are defined: $f_{out}$ abstracts concrete outputs into abstract ones, and $f_{in}$ refines abstract input events into concrete inputs.
We intend to obtain an abstract system in which the desired requirement can be analyzed while irrelevant details are hidden. Generally, the input and output of the embedded system are abstracted by partitioning the continuous or large finite domains into finite grids. Based on the description of the requirement, the abstract output symbols are identified first, and then the output abstract function $f_{out}$ is defined. Assume that there are $p$ output symbols $\omega_1, \ldots, \omega_p$ considered in the requirement, whose corresponding domains are $\Omega_1, \ldots, \Omega_p$. An evaluation of the abstract output is a vector $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_p) \in \Omega = \Omega_1 \times \cdots \times \Omega_p$. Let $\Sigma$ be the set of all abstract input events.

Definition 5: The output abstract function of the system is $f_{out}: Y \to \Omega$.

Definition 6: The input refinement function is defined as $f_{in}: \Sigma \to X$.

The input refinement function is the dual of the output abstract function: it converts an abstract input event into a concrete evaluation of the input vector $\mathbf{x}$ and sends that vector to the black-box system.
Converting continuous input and output signals into abstract input and output events is both important and challenging when seeking an approximate discrete model of the real system. A poor choice of the output abstract function and the input refinement function may miss critical information about the system. In practice, the choice requires considerable prior knowledge about the system and iterative tests. Normally, the finer the discretization, the larger the conjecture model. A guideline for selecting a proper discretization resolution is to use the lowest resolution that still allows the user to synthesize an effective supervisor for the system. In future work, we shall also study more advanced automaton learning methods to reduce the complexity of the conjecture model.
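A minimal sketch of grid-based abstraction, assuming each output $y_j$ is partitioned by a sorted list of hypothetical cut points and each abstract input event is refined to one fixed representative input vector. The factory names, thresholds, and event names below are illustrative assumptions, not the abstraction used in the case studies.

```python
import bisect
from typing import Callable, Dict, Sequence, Tuple


def make_output_abstraction(thresholds: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Tuple[int, ...]]:
    """Build a grid-based f_out: thresholds[j] holds sorted cut points for
    output y_j, and component j is mapped to the index of its interval."""
    def f_out(y: Sequence[float]) -> Tuple[int, ...]:
        if len(y) != len(thresholds):
            raise ValueError("output vector has the wrong dimension")
        return tuple(bisect.bisect_right(cuts, yj) for cuts, yj in zip(thresholds, y))
    return f_out


def make_input_refinement(table: Dict[str, Tuple[float, ...]]) -> Callable[[str], Tuple[float, ...]]:
    """Build f_in: Sigma -> X by choosing one representative concrete input
    vector for every abstract input event."""
    def f_in(event: str) -> Tuple[float, ...]:
        return table[event]
    return f_in
```

Coarser cut lists give fewer abstract outputs and hence smaller conjectures, in line with the guideline above.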

C. Learning Moore Automata
Suppose that the marked and closed languages of the system are $L$ and $\overline{L}$, respectively, and let $H$ be the prefix-closed language of the requirement. In SCT, the supremal nonblocking supervisor is calculated by $\mathrm{supC}(L \cap H, \overline{L})$. Therefore, the sets $L \cap H$ and $\overline{L}$ must be determined to compute the supervisor for the system. To this end, the set $\Sigma^{*}$ is partitioned into three disjoint sets.
1) $L \cap H$ represents the marked system behaviors that also satisfy the requirement.
2) $\overline{L} - (L \cap H)$ represents all system behaviors that do not belong to the first subset.
3) $\Sigma^{*} - \overline{L}$ represents all behaviors that the system cannot generate.
The $C^{*}$ algorithm is proposed to infer a Moore automaton that distinguishes these three types of languages.
1) Membership Query: The $L^{*}$ algorithm relies on a MAT to answer membership queries correctly. This paper develops a software program to answer these queries: the program sends event sequences to the input of the black-box system, monitors the corresponding output sequences, and then determines the validity of the input event sequences. Formally, a membership query function is defined to answer membership queries.
Definition 7: The membership query function $f: \Sigma^{*} \to \{0, 1, 2\}$ is defined as
$$f(s) = \begin{cases} 2, & s \in L \cap H \\ 1, & s \in \overline{L} - (L \cap H) \\ 0, & s \in \Sigma^{*} - \overline{L}. \end{cases} \qquad (1)$$
It is trivial to prove the following proposition.

Proposition 2: The Moore automaton $M(S, E, T)$ defined in Definition 8 is closed in $Q$ and deterministic.

The proof of this proposition is similar to that of [20, Proposition 4].

Proposition 3: The Moore automaton $M(S, E, T)$ defined in Definition 8 is minimal.

The proof of this proposition is similar to that of [20, Proposition 6].
3) Equivalence Checking: Let the learned automaton be $M = (Q, \Sigma, O, \delta, \lambda, q_0)$. Equivalence queries ask whether the conjecture $M$ is isomorphic to the target automaton model. In the $L^{*}$ algorithm, a MAT is assumed to answer equivalence queries. In practice, however, no MAT is available because the target automaton model is unknown, so the isomorphism between $M$ and the target cannot be checked structurally. Instead, we perform equivalence checking by verifying the behavioral equivalence between $M$ and the target, whose behavior is described by the membership query function $f$ of Definition 7. The procedure for equivalence checking is illustrated in Fig. 2.
Definition 9: If there is a string $s \in \Sigma^{*}$ such that $f(s) \neq \lambda(\delta(q_0, s))$, then $s$ is a counterexample between $M$ and the system.
The challenge of the procedure in Fig. 2 is how to efficiently generate test strings and when to terminate the procedure. If a counterexample is found, the checking procedure terminates; if no counterexample is found after checking many test strings, one cannot decide whether the conjecture is indeed equivalent to the target or more test strings should be examined. The difficulty is similar to increasing the test coverage in software testing.
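The random-testing variant of the procedure in Fig. 2 can be sketched as follows, assuming the conjecture is given as a total transition dictionary, an initial state, and an output map, and that `f` answers membership queries as in Definition 7. The test-count and wall-clock budgets are hypothetical parameters; returning `None` only means that no counterexample was found within the budget, not that the conjecture is correct.

```python
import random
import time
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

Word = Tuple[str, ...]
State = Hashable


def moore_output(delta: Dict[Tuple[State, str], State], q0: State,
                 out: Dict[State, int], w: Word) -> int:
    """lambda(delta(q0, w)) for a Moore automaton with a total transition map."""
    q = q0
    for a in w:
        q = delta[(q, a)]
    return out[q]


def random_counterexample(delta, q0, out, f: Callable[[Word], int],
                          alphabet: Sequence[str], max_len: int, max_tests: int,
                          time_budget_s: float = 5.0, seed: int = 0) -> Optional[Word]:
    """Return a string s with f(s) != lambda(delta(q0, s)), or None if the
    budget (number of test strings or wall-clock time) runs out first."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    for _ in range(max_tests):
        if time.monotonic() > deadline:
            break
        w = tuple(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        if f(w) != moore_output(delta, q0, out, w):
            return w  # a counterexample in the sense of Definition 9
    return None
```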
In the field of software testing, methods such as the W-method [31], the Wp-method [32], complete exploration, and random testing are commonly used to generate test strings for finite-state automata. The choice among these methods depends on the specific system to be learned. In Section V-A, the length of the counterexamples found while learning the BBW system increases gradually; hence, the Wp-method is adopted. In Section V-B, however, the counterexamples of the platooning program are relatively long; hence, random testing is more suitable.
The random testing method generates random test strings in $\Sigma^{*}$ whose length lies between 0 and an upper bound. The test process stops when either a counterexample is found or a time limit is reached. The Wp-method generates test strings as follows. Assume that the numbers of states of the conjecture $M$ and the target automaton are $m$ and $n$, respectively. Let $V = \Sigma^{0} \cup \Sigma^{1} \cup \Sigma^{2} \cup \cdots \cup \Sigma^{n-m}$. Denote by $P$, $U$, and $W$ the transition cover set, the state cover set, and the characterization set of $M$, respectively, whose detailed definitions are given in [32]. The state cover set is a subset of the transition cover set, i.e., $U \subseteq P$. For every reachable state $q_i \in Q$, the Wp-method defines an identification set $W(q_i) \subseteq W$ [32].
The Wp-method consists of two steps.
Step 1 tests all strings in the set $U.V.W$.
Step 2, if no counterexample is found in Step 1, tests all strings $p.w$ with $p \in (P - U).V$ and $w \in W(\delta(q_0, p))$.
Under the assumption that the number of states $n$ of the target automaton is correctly estimated, Fujiwara et al. [32] prove that the conjecture $M$ is isomorphic to the target if no counterexample is found by the Wp-method. In practice, however, the number of states of a black-box software system can hardly be estimated. An equivalence query here is essentially software testing and hence cannot guarantee equivalence between the learned conjecture and the target even when no counterexample is found. This is an inherent limitation of learning-based methods.
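The state cover set $U$, a transition cover set $P$, the set $V$, and the Step 1 suite $U.V.W$ can be generated as in the sketch below. It assumes the conjecture's transitions are a dictionary keyed by (state, event), builds $U$ from shortest access strings found by breadth-first search, and takes the characterization set $W$ as given, since computing it is beyond this illustration. All function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

Word = Tuple[str, ...]
State = Hashable
Delta = Dict[Tuple[State, str], State]


def access_strings(delta: Delta, q0: State, alphabet: Sequence[str]) -> Dict[State, Word]:
    """Shortest string reaching each reachable state (breadth-first search)."""
    access = {q0: ()}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            nxt = delta.get((q, a))
            if nxt is not None and nxt not in access:
                access[nxt] = access[q] + (a,)
                queue.append(nxt)
    return access


def state_cover(delta: Delta, q0: State, alphabet: Sequence[str]) -> Set[Word]:
    return set(access_strings(delta, q0, alphabet).values())


def transition_cover(delta: Delta, q0: State, alphabet: Sequence[str]) -> Set[Word]:
    """U extended by every defined transition from every reachable state."""
    access = access_strings(delta, q0, alphabet)
    return set(access.values()) | {u + (a,) for q, u in access.items()
                                   for a in alphabet if (q, a) in delta}


def middle_set(alphabet: Sequence[str], k: int) -> Set[Word]:
    """V = Sigma^0 ∪ Sigma^1 ∪ ... ∪ Sigma^k, with k = n - m."""
    return {tuple(p) for i in range(k + 1) for p in product(alphabet, repeat=i)}


def wp_step1_suite(delta: Delta, q0: State, alphabet: Sequence[str],
                   W: Iterable[Word], k: int) -> Set[Word]:
    """All strings of U.V.W (Step 1 of the Wp-method)."""
    U = state_cover(delta, q0, alphabet)
    V = middle_set(alphabet, k)
    W = list(W)
    return {u + v + w for u in U for v in V for w in W}
```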
4) $C^{*}$ Learning Algorithm: Suppose that the conjunctive behavior of the black-box system and the requirements can be represented by a Moore automaton $M_c = (Q, \Sigma, O, \delta, \lambda, q_0)$ with $O = \{0, 1, 2\}$, where the languages $L_i = \{s \in \Sigma^{*} \mid \lambda(\delta(q_0, s)) = i\}$, $i = 0, 1, 2$, coincide with $\Sigma^{*} - \overline{L}$, $\overline{L} - (L \cap H)$, and $L \cap H$, respectively. We call this Moore automaton the target of the learning algorithm. To infer the target $M_c$, this paper proposes the $C^{*}$ learning algorithm in Algorithm 1, which is adapted from the classical $L^{*}$ algorithm [9]. The differences between the $L^{*}$ and $C^{*}$ algorithms are as follows.
1) The $L^{*}$ algorithm learns a DFA, whereas the $C^{*}$ algorithm constructs a Moore automaton with output set $O = \{0, 1, 2\}$.
2) The $L^{*}$ algorithm assumes that a MAT answers membership queries, whereas the $C^{*}$ algorithm checks whether the queried strings are acceptable to the system and the requirements by executing the system.
3) The $L^{*}$ algorithm assumes that the MAT always answers equivalence queries correctly, whereas the $C^{*}$ algorithm applies software testing techniques (e.g., random testing and the Wp-method) to check the equivalence between the conjecture automaton and the real system.

At line 16 of Algorithm 1, a testing method is called to find a counterexample $t$ between $M$ and the target. If $t$ is a counterexample, all its prefixes are added to the set $S$; the observation table is enlarged, and the conjecture is refined. If no counterexample exists, the algorithm returns the conjecture $M$ and terminates.
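The loop of Algorithm 1 can be approximated by the following Python sketch; it is not the LearnLib-based implementation. The observation table stores outputs in $\{0, 1, 2\}$, unclosed rows are moved into $S$, inconsistencies add a distinguishing suffix to $E$, and all prefixes of each counterexample are added to $S$, mirroring the description above. `bounded_exhaustive_oracle` is a hypothetical stand-in for the testing-based equivalence check that compares outputs on all strings up to a fixed length.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Word = Tuple[str, ...]
Row = Tuple[int, ...]
Conjecture = Tuple[Dict[Tuple[Row, str], Row], Row, Dict[Row, int]]


def c_star(alphabet: Sequence[str], f: Callable[[Word], int],
           find_cex: Callable[[Conjecture], Optional[Word]],
           max_rounds: int = 50) -> Conjecture:
    cache: Dict[Word, int] = {}

    def T(w: Word) -> int:  # membership query, executed at most once per string
        if w not in cache:
            cache[w] = f(w)
        return cache[w]

    S: List[Word] = [()]
    E: List[Word] = [()]

    def row(s: Word) -> Row:
        return tuple(T(s + e) for e in E)

    def unclosed() -> Optional[Word]:
        rows_S = {row(s) for s in S}
        for s in S:
            for a in alphabet:
                if row(s + (a,)) not in rows_S:
                    return s + (a,)
        return None

    def inconsistent() -> Optional[Word]:
        for i, s1 in enumerate(S):
            for s2 in S[i + 1:]:
                if row(s1) != row(s2):
                    continue
                for a in alphabet:
                    for e in E:
                        if T(s1 + (a,) + e) != T(s2 + (a,) + e):
                            return (a,) + e
        return None

    for _ in range(max_rounds):
        while True:  # make the table closed and consistent
            t = unclosed()
            if t is not None:
                S.append(t)
                continue
            e = inconsistent()
            if e is None:
                break
            E.append(e)
        # one state per distinct row; E[0] = () so row(s)[0] is the output
        delta = {(row(s), a): row(s + (a,)) for s in S for a in alphabet}
        out = {row(s): row(s)[0] for s in S}
        conjecture = (delta, row(()), out)
        cex = find_cex(conjecture)
        if cex is None:
            return conjecture
        for i in range(len(cex) + 1):  # add all prefixes of the counterexample to S
            if cex[:i] not in S:
                S.append(cex[:i])
    raise RuntimeError("no conjecture accepted within max_rounds")


def bounded_exhaustive_oracle(alphabet: Sequence[str], f: Callable[[Word], int], max_len: int):
    """Stand-in equivalence check: compare outputs on every string up to max_len."""
    def find_cex(conjecture: Conjecture) -> Optional[Word]:
        delta, q0, out = conjecture
        for n in range(max_len + 1):
            for w in product(alphabet, repeat=n):
                q = q0
                for a in w:
                    q = delta[(q, a)]
                if out[q] != f(w):
                    return w
        return None
    return find_cex
```

For instance, with `f(w)` equal to the number of `"a"` events in `w` modulo 3, the sketch returns a three-state Moore automaton.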
Proposition 4: The $C^{*}$ algorithm terminates. The proof of this proposition is similar to that of [20, Proposition 7].
Assume that the target $M_c$ has $n$ states and that the maximum length of the counterexamples provided by the equivalence checking procedure is $m$. In Algorithm 1, both $S$ and $E$ initially contain one element, so the initial conjecture has at least one state. If the observation table is not consistent, one element is added to $E$; if it is not closed, one element is added to $S$. Each time the table becomes closed and consistent again, the number of states of the conjecture has increased by at least 1. Therefore, the closedness and consistency checks fail at most $n - 1$ times in total, and at most $n - 1$ counterexamples are processed. For each counterexample, at most $m$ strings are added to $S$. Hence, the complexity of the $C^{*}$ algorithm is bounded by a polynomial in $m$ and $n$.
An example illustrating the $C^{*}$ algorithm is given in Section IV-A.

D. Computing Supervisors
Let $M = (Q, \Sigma, O, \delta, \lambda, q_0)$ be a Moore automaton constructed by the $C^{*}$ algorithm. The state set $Q$ is partitioned into three disjoint sets $Q_0$, $Q_1$, and $Q_2$, where $Q_i = \{q \in Q \mid \lambda(q) = i\}$, $i = 0, 1, 2$. Evidently, the set $Q_0$ contains all dump states.
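Definition 10: The automaton $M' = (Q', \Sigma, \delta', q_0, Q'_m)$ is the DFA derived from the Moore automaton $M$, where $Q' = Q_1 \cup Q_2$, $Q'_m = Q_2$, and, for $q \in Q'$ and $\sigma \in \Sigma$, $\delta'(q, \sigma) = \delta(q, \sigma)$ if $\delta(q, \sigma) \in Q'$ and is undefined otherwise.

In Definition 10, if $q_0 \notin Q'$, then $q_0 \in Q_0$ and $\varepsilon \notin \overline{L}$, i.e., the closed language $\overline{L}$ of the system is empty, which is rare for real systems. The languages $L(M')$ and $L_m(M')$ describe the closed behavior of the system and the conjunction of the marked behavior of the system and the requirements, respectively. An illustrative example of computing supervisors is presented in Section IV-B.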
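The derivation of the DFA $M'$ from the learned Moore automaton can be sketched as a filter, assuming $Q' = Q_1 \cup Q_2$ and $Q'_m = Q_2$, which is consistent with $L(M')$ and $L_m(M')$ describing the closed behavior and the requirement-satisfying marked behavior of the system. States with output 0 and all transitions into them are dropped. The dictionary encoding and function names are illustrative assumptions.

```python
from typing import Dict, Hashable, Optional, Sequence, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], State]


def dfa_from_cstar_moore(delta: Delta, q0: State,
                         out: Dict[State, int]) -> Optional[Tuple[Delta, State, Set[State]]]:
    keep = {q for q, o in out.items() if o in (1, 2)}  # Q1 ∪ Q2
    if q0 not in keep:
        return None  # q0 in Q0: the closed language of the system is empty
    delta_p = {(q, a): r for (q, a), r in delta.items() if q in keep and r in keep}
    marked = {q for q in keep if out[q] == 2}  # marker states Q2
    return delta_p, q0, marked


def accepts(dfa: Tuple[Delta, State, Set[State]], s: Sequence[str],
            marked_only: bool = False) -> bool:
    """s in L(M') or, with marked_only=True, s in Lm(M')."""
    delta_p, q, marked = dfa
    for a in s:
        if (q, a) not in delta_p:
            return False
        q = delta_p[(q, a)]
    return q in marked if marked_only else True
```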

E. Supervisor Implementation
The supervisor constructed in Section III-D is implemented as a patch that forces the behavior of the system to follow the requirement. Fig. 3 shows the detailed implementation of the feedback control. The entities of the physical world (e.g., the system, actuators, and sensors) are denoted by gray blocks. Two types of input signals are transmitted to the embedded software (SW) components. The first type comes from sensor measurements of the physical system and is uncontrollable. The second type consists of control commands from the embedded controller, which may be controllable or uncontrollable. The supervisor implements the control function by overriding the control commands from the embedded control software. The whole control mechanism works in a feedback loop; as a result, the supervisor forces the system to behave as desired.
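A hypothetical runtime wrapper can illustrate the patch of Fig. 3: it tracks the supervisor state from the events observed at the SW interface and replaces a controllable command that the supervisor disables with a fallback. The class name, the event encoding, and the fallback policy are assumptions made for illustration; a real patch would sit between the control software and the actuators.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

State = Hashable


class SupervisorPatch:
    """Hypothetical patch wrapping a synthesized supervisor (delta, q0).

    An event is enabled if it is uncontrollable or the supervisor has a
    transition for it in the current state."""

    def __init__(self, delta: Dict[Tuple[State, str], State], q0: State,
                 controllable: Iterable[str]):
        self.delta = delta
        self.state = q0
        self.controllable = set(controllable)

    def enabled(self, event: str) -> bool:
        return event not in self.controllable or (self.state, event) in self.delta

    def filter_command(self, command: str, fallback: Optional[str] = None) -> Optional[str]:
        """Pass an enabled command through; override a disabled one with `fallback`."""
        return command if self.enabled(command) else fallback

    def observe(self, event: str) -> None:
        """Advance the supervisor on every event that actually occurs."""
        if (self.state, event) not in self.delta:
            # Should not happen with a controllable supervisor and an accurate model.
            raise LookupError(f"event {event!r} not expected in state {self.state!r}")
        self.state = self.delta[(self.state, event)]
```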
The combination of the original system and the supervisor is regarded as a new system. The procedure then returns to Section III-A to test whether the requirement is satisfied. If the requirement is still not satisfied, a larger conjecture is inferred and a new supervisor is calculated. Automaton learning, supervisor computation, and system testing iterate until the requirement is satisfied.

IV. ILLUSTRATIVE EXAMPLE
The proposed approach is illustrated by a simple example from [4], assuming that the automaton models of the plant and the requirement are not available. The system contains two identical machines, which repeatedly perform the cycle $a_i b_i$, $i = 1, 2$. The system is at a marker state when both machines have identical numbers of $a_i$ and $b_i$. The alphabet of the system is $\Sigma = \{a_1, b_1, a_2, b_2\}$.

TABLE I: OBSERVATION TABLE I
TABLE II: OBSERVATION TABLE II
Requirement: After an occurrence of event b 1 , b 1 shall not occur again until event b 2 occurs at least once.
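Formally, a string $s \in \Sigma^{*}$ satisfies the requirement if either $\#b_1(s) \le 1$ or, for every decomposition $s = u.b_1.v.b_1.w$ with $u, v, w \in \Sigma^{*}$ and $\#b_1(v) = 0$, it holds that $\#b_2(v) \ge 1$. It is easy to check that the system does not satisfy the requirement. Since the alphabet of the system is already discrete, no abstraction is needed.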
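The membership query function of Definition 7 for this example can be sketched in Python. The plant is replayed as two machines that alternate $a_i$ and $b_i$, the requirement is checked by remembering whether a $b_1$ is still waiting for a $b_2$, and the three-valued output follows the partition of Section III-C. The encoding of events as strings such as `"a1"` and the function names are assumptions; in the approach itself, such answers come from executing the black-box system.

```python
from typing import Optional, Tuple

Word = Tuple[str, ...]


def plant_state(s: Word) -> Optional[Tuple[bool, bool]]:
    """Replay s on the two machines; return (busy1, busy2), or None if s is
    not in the closed language of the plant."""
    busy = {1: False, 2: False}
    for e in s:
        kind, i = e[0], int(e[1])
        if kind == "a" and busy[i]:
            return None  # machine i is already working
        if kind == "b" and not busy[i]:
            return None  # machine i has nothing to finish
        busy[i] = (kind == "a")
    return busy[1], busy[2]


def satisfies_requirement(s: Word) -> bool:
    """After b1, b1 must not occur again until b2 has occurred."""
    waiting_for_b2 = False
    for e in s:
        if e == "b1":
            if waiting_for_b2:
                return False
            waiting_for_b2 = True
        elif e == "b2":
            waiting_for_b2 = False
    return True


def f(s: Word) -> int:
    """2: marked and satisfies H; 1: in the closed language otherwise; 0: impossible."""
    st = plant_state(s)
    if st is None:
        return 0
    marked = st == (False, False)  # both machines idle
    return 2 if marked and satisfies_requirement(s) else 1
```

For example, `f(("a1", "a2", "b1", "b2"))` returns 2, and the string $c = a_1 b_1 a_1 b_1 a_1 a_2 b_1 b_2$ used in Section IV-A yields 1.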

A. Learning Moore Automata
The first step is to construct the target Moore automaton with respect to the system and the requirement by the proposed $C^{*}$ learning algorithm. The initial observation table $(S_1, E_1, T_1)$, with $S_1 = E_1 = \{\varepsilon\}$, is shown in Table I.
In Table I, since $row(a_1) = 1$ and $row(b_1) = 0$ differ from the only row of $S_1$, $T_1$ is not closed. Then, $a_1$ and $b_1$ are added to $S_1$, giving $S_2 = \{\varepsilon, a_1, b_1\}$, and the strings $a_1.a_1$, $a_1.a_2$, $a_1.b_1$, $a_1.b_2$, $b_1.a_1$, $b_1.a_2$, $b_1.b_1$, $b_1.b_2$ are added to $S_2.\Sigma$. The updated observation table is shown in Table II. For brevity, if $s \in S_2.\Sigma - S_2$ and $row(s)$ is a zero vector, the row is omitted. The conjecture $M_1 = (Q_1, \Sigma, O_1, \delta_1, \lambda_1, q^1_0)$ shown in Fig. 4 is constructed by Definition 8. Each row of a string in $S_2$ denotes a state of the conjecture, and the outputs of the states are labeled in the circles. If two strings in $S_2$ have the same row vector, they represent the same state in $Q_1$. If the row of a string is the zero vector, such as $b_1$ in $T_2$, the state denoted by the string is a dump state. For simplicity, the dump states of all automaton models in this paper are not shown in the figures.

TABLE III: OBSERVATION TABLE III
TABLE IV: OBSERVATION TABLE IV

To evaluate whether the automaton shown in Fig. 4 is equivalent to the system, an equivalence check is performed by random testing. Since $f(a_1 a_2 b_1 b_2) = 2$, i.e., the string is accepted by both the system and the requirement, but $\lambda_1(a_1 a_2 b_1 b_2) = 0$, the string $a_1 a_2 b_1 b_2$ is a counterexample. All prefixes of $a_1 a_2 b_1 b_2$ are added to $S_2$ of the observation table $T_2$. Then $S_3 = \{\varepsilon, a_1, b_1, a_1 a_2, a_1 a_2 b_1, a_1 a_2 b_1 b_2\}$, as shown in Table III.
In Table III, $row(a_1) = row(a_1 a_2)$ but $T_3(a_1 a_2) \neq T_3(a_1 a_2 a_2)$, and $row(a_1 a_2) = row(a_1 a_2 b_1)$ but $T_3(a_1 a_2 b_1) \neq T_3(a_1 a_2 b_1 b_1)$; hence, Table III is not consistent. The strings $a_2$ and $b_1$ are therefore added to the set $E_3$. The updated observation table $T_4$ has $S_4 = S_3$ and $E_4 = \{\varepsilon, a_2, b_1\}$, as shown in Table IV. This table is also closed. The conjecture $M_2 = (Q_2, \Sigma, O_2, \delta_2, \lambda_2, q^2_0)$ shown in Fig. 5 is constructed correspondingly. For clarity, each state is labeled by its row vector in $T_4$, and the output of each state is the first entry of that vector.
The equivalence query by random testing finds a counterexample $c = a_1 b_1 a_1 b_1 a_1 a_2 b_1 b_2$. In the Moore automaton $M_2$, the string $c$ reaches the initial state and the corresponding output is 2, i.e., $\lambda_2(c) = 2$; however, $c$ violates the requirement because event $b_1$ occurs three times before event $b_2$ occurs. Therefore, $f(c) = 1$. Adding all prefixes of $c$ to $S_4$ enlarges the set to $S_5 = S_4 \cup \overline{\{c\}}$, while the column set is unchanged: $E_5 = E_4$. Starting from the observation table $(S_5, E_5, T_5)$, the steps described in Algorithm 1 are repeated until no more counterexamples are found. The final Moore automaton $M_3$ is shown in Fig. 6 with the dump state omitted. Then, the DFA $M'$ is derived from $M_3$ by Definition 10; it has the same state transition function as $M_3$ and two marker states, as illustrated in Fig. 6.

B. Supervisor Computing
The supremal nonblocking supervisor is derived from the DFA $M$ by Theorem 1 and the standard supervisor synthesis algorithm. Taking the uncontrollable event set $\Sigma_u = \{b_1, b_2\}$, we obtain the supremal nonblocking supervisor shown in Fig. 7. One can verify that this supervisor is isomorphic to the one obtained directly from the given DFA models of the plant and the requirement.
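Theorem 1 is not reproduced here. As a hedged illustration of what the standard synthesis step computes, the sketch below implements the textbook fixpoint for the supremal controllable and nonblocking sublanguage of a plant DFA $G$ under a specification DFA $H$. It is not TCT's implementation, and the dictionary encoding of a DFA is an assumption made for illustration.

```python
from collections import deque
from typing import Any, Dict, Optional, Set

# A DFA is a dict: {"init": q0, "trans": {(q, event): q_next}, "marked": set_of_states}
DFA = Dict[str, Any]


def supcon(G: DFA, H: DFA, uncontrollable: Set[str]) -> Optional[DFA]:
    """Supremal controllable, nonblocking sublanguage of L_m(G) ∩ L_m(H) (textbook fixpoint)."""
    g_trans, h_trans = G["trans"], H["trans"]
    events = {e for (_, e) in g_trans}
    init = (G["init"], H["init"])
    # 1. Reachable synchronous product of plant G and specification H.
    states, trans, queue = {init}, {}, deque([init])
    while queue:
        x = queue.popleft()
        g, h = x
        for e in events:
            if (g, e) in g_trans and (h, e) in h_trans:
                y = (g_trans[(g, e)], h_trans[(h, e)])
                trans[(x, e)] = y
                if y not in states:
                    states.add(y)
                    queue.append(y)
    marked = {x for x in states if x[0] in G["marked"] and x[1] in H["marked"]}
    # 2. Iterate: drop states violating controllability, then keep only coaccessible ones.
    good = set(states)
    while True:
        bad = set()
        for x in good:
            for u in uncontrollable:
                # The plant can execute u here, but the successor is absent or already removed.
                if (x[0], u) in g_trans and trans.get((x, u)) not in good:
                    bad.add(x)
        keep = good - bad
        coacc = keep & marked
        changed = True
        while changed:
            changed = False
            for (x, _), y in trans.items():
                if x in keep and x not in coacc and y in coacc:
                    coacc.add(x)
                    changed = True
        if coacc == good:
            break
        good = coacc
    if init not in good:
        return None  # the supremal sublanguage is empty
    # 3. Restrict to states reachable from the initial state inside the good set.
    reach, queue = {init}, deque([init])
    while queue:
        x = queue.popleft()
        for (x0, _), y in trans.items():
            if x0 == x and y in good and y not in reach:
                reach.add(y)
                queue.append(y)
    sup_trans = {(x, e): y for (x, e), y in trans.items() if x in reach and y in reach}
    return {"init": init, "trans": sup_trans, "marked": marked & reach}
```

In a small example where the plant can execute an uncontrollable event `u` after a controllable event `a` and the specification forbids `u`, the sketch removes the state reached by `a`, so the resulting supervisor disables `a`.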

C. Supervisor Implementation
The supervisor is developed as an executable program according to the logical structure shown in Fig. 7. It then runs synchronously with the system to implement control actions online.
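A supervisor running in lockstep with the plant can be sketched as a small wrapper around the supervisor's transition table. This is a hypothetical illustration, not the authors' program: it assumes the supervisor is a DFA in the same dictionary encoding as above, that uncontrollable events are always permitted, and that a controllable event is enabled only when the supervisor has a transition for it in its current state.

```python
from typing import Any, Dict, Hashable, Set, Tuple


class SupervisorRuntime:
    """Tracks a supervisor DFA synchronously with the plant (hypothetical patch wrapper)."""

    def __init__(self, sup: Dict[str, Any], uncontrollable: Set[str]) -> None:
        self.trans: Dict[Tuple[Hashable, str], Hashable] = sup["trans"]
        self.state = sup["init"]
        self.uncontrollable = set(uncontrollable)

    def allows(self, event: str) -> bool:
        """Uncontrollable events can never be disabled; a controllable event is
        enabled only if the supervisor has a transition for it in the current state."""
        return event in self.uncontrollable or (self.state, event) in self.trans

    def observe(self, event: str) -> None:
        """Advance the supervisor after the plant has executed an event."""
        nxt = self.trans.get((self.state, event))
        if nxt is None:
            raise RuntimeError(f"event {event!r} is not admissible in state {self.state!r}")
        self.state = nxt
```

Before forwarding a controllable command to the plant, the patch would call `allows`; after the plant executes any event, it calls `observe` to keep the two in step.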
V. EXPERIMENTAL STUDY

A BBW system [33] and a platooning program [44] are studied on a personal laptop running Windows 10 with 8 GB of RAM and an Intel Core i7-4712MQ CPU at 2.30 GHz to illustrate the feasibility of the proposed approach on realistic systems. The $C^{*}$ algorithm is implemented on the LearnLib framework. A Java program is developed for checking the satisfiability of requirements. LBTest [22] is used for testing requirements, and TCT [27] is used for computing supervisors. The toolchain of the proposed approach is implemented with scripts connecting LearnLib, TCT, and LBTest, as depicted in Fig. 8. The learned conjecture is output as a dot file, whereas TCT reads and writes either text (.ADS) or binary (.DES) files. Scripts 1 and 2 transform dot files into ADS files and ADS files into jar files, respectively, and the resulting supervisor is used to control the realistic systems.

A. BBW System
The Brake-by-Wire (BBW) system is a hard real-time controller for automobiles [33]. Recently, many researchers have used the BBW as a case study for code validation [34], resource usage [35], requirements verification [36], and testing [33]. In this paper, the BBW system is an executable jar file developed by Feng et al. [33], and the requirements are described in natural language. Automaton models of both the system and the requirements are unavailable. Fig. 9 describes the hardware model of the BBW. Cubes denote the five electronic control units (ECUs), and the four wheels are represented by gray round-corner rectangles. The gas and brake pedals are connected to the central ECU, while the four wheels are attached to the other four ECUs. The input of the system is the position percentages of the gas and brake pedals. The software on the central ECU reads the input, calculates the global drive/brake torque, and distributes the torque to the four wheels. The software on the four wheel ECUs then carries out the anti-lock braking system (ABS) function, which releases the brake actuator when the wheel is slipping.
1) Abstracting and Testing the System: The input and output domains of the BBW system are continuous, so the system must be abstracted to a discrete one before the proposed approach can be applied effectively. To distinguish concrete from abstract signals, the names of concrete output signals start with a lowercase letter, whereas the names of abstract outputs start with a capital letter. The values of abstract outputs are written in lowercase.
Three requirements considered in [33] are described as follows.
Requirement 1: If the brake pedal is pressed and the wheel speed is greater than zero, the value of brake torque applied to the wheel by the corresponding ABS component shall eventually be greater than 0.
Requirement 2: If the brake pedal is pressed, the actual speed of the vehicle is larger than 10 km/h and a wheel is slipping, the corresponding brake torque at the wheel shall be zero.
Requirement 3: If both the brake and gas pedals are pressed, the actual vehicle speed shall be decreased.
For simplicity, the safety of the rear right wheel is studied as an example, and the functions $f^{in}$ and $f^{out}$ with respect to Requirement 2 are introduced in detail. The output signals of the concrete system are the vehicle speed (veh_Speed), the rotational speed of the rear right wheel (speed_RR), and the torque on the rear right wheel (torque_RR). Requirement 2 concerns three abstract output signals: vehicle status (VehStatus), brake torque on the wheel (TorqueRR), and wheel slipping status (SlipRR). Correspondingly, three component output abstract functions $f^{out}_1$, $f^{out}_2$, and $f^{out}_3$ are defined. Let $y$ denote the concrete output vector.
The range of the symbol VehStatus is {moving, still}, and its value is given by the component output abstract function $f^{out}_1$. The range of the symbol TorqueRR is {nonzero, zero}, and its value is given by the component output abstract function $f^{out}_2$. The range of the abstract output SlipRR is {slip, noslip}. Let $P = 2 \times \text{veh\_Speed}/3.6$ and $T = 10 \times (\text{veh\_Speed}/3.6 - \text{wSpeed\_RR} \times \text{whl\_Radius})$, where wSpeed_RR and whl_Radius are the angular velocity and the radius of the wheel, respectively. The component output abstract function of SlipRR is

$$
f^{out}_3(y) = \begin{cases}
\text{noslip}, & \text{veh\_Speed} < 10 \text{ or } T \le P \\
\text{slip}, & \text{veh\_Speed} \ge 10 \text{ and } T > P.
\end{cases}
$$
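The SlipRR abstraction $f^{out}_3$ can be written directly as a function of the three concrete signals it uses. This is a minimal sketch; the parameter names are hypothetical renderings of veh_Speed, wSpeed_RR, and whl_Radius. The units are assumptions: veh_Speed in km/h, as the division by 3.6 suggests, and wSpeed_RR and whl_Radius in rad/s and m, so that their product is a wheel-surface speed in m/s.

```python
def slip_status(veh_speed: float, w_speed_rr: float, whl_radius: float) -> str:
    """Component output abstraction f_out_3 for SlipRR.

    veh_speed: veh_Speed, assumed in km/h.
    w_speed_rr, whl_radius: wSpeed_RR and whl_Radius, assumed in rad/s and m.
    """
    P = 2 * veh_speed / 3.6
    T = 10 * (veh_speed / 3.6 - w_speed_rr * whl_radius)
    if veh_speed >= 10 and T > P:
        return "slip"
    return "noslip"  # veh_Speed < 10 or T <= P
```

Dividing $T > P$ by $10\,v$ with $v = \text{veh\_Speed}/3.6 > 0$ gives $(v - \omega r)/v > 0.2$. Under the unit assumption above, the wheel is therefore classified as slipping when its slip ratio exceeds 20% at vehicle speeds of at least 10 km/h.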
The input of the concrete system is the pair of brake and gas pedal positions, each given as a percentage in the range [0, 100] and denoted by [braPos, gasPos]. The abstract input alphabet is $\Sigma = \{\text{brake}, \text{acc}, \text{idle}\}$, and the input refinement function $f^{in}$ maps each abstract input to concrete pedal positions.

LBTest is a software testing tool that integrates learning-based testing with model checking and can check both safety and liveness properties [22]. Requirements 1-3 are tested by LBTest as presented in [33]. When Requirements 1 and 3 are tested, no counterexample appears after 300 testing iterations. When Requirement 2 is tested, a violation appears at the fifth testing round. The counterexample is "acc, acc, acc, acc, brake, brake," and the corresponding sequence of abstract outputs is "[ nonzero, slip]." The counterexample shows that a violation appears at the second brake event: the vehicle is moving and the wheel is slipping, yet the brake torque at the wheel is nonzero.
2) Inferring Moore Automata: The $C^{*}$ algorithm is used to infer the target Moore automaton with respect to the system and Requirement 2, so that a supervisor can be synthesized to make the system satisfy Requirement 2. The Wp-method introduced in Section III-C3 is used for equivalence checking. The correctness and efficiency of the Wp-method rely on an accurate estimate of the state size $n$ of the true automaton model of the black-box system. In real applications, however, such an estimate is often hard to obtain. If the estimate is smaller than the real value, the Wp-method may miss existing counterexamples, so that a wrong conjecture is accepted. If the estimate is too large, the Wp-method may not terminate within an acceptable period of time.
We propose an iterative approach to estimate $n$ on the basis of the state size of the conjecture automaton [20]. At the $i$th iteration of the $C^{*}$ algorithm, if the $(i-1)$th conjecture automaton has $m_{i-1}$ states, the estimated state size of the target automaton is $\hat{n}_i = m_{i-1} + l$, where $l$ is a nonnegative integer. Thus, the set $V$ required by the Wp-method becomes $V = \Sigma^0 \cup \Sigma^1 \cup \cdots \cup \Sigma^l$. In the following experiments, counterexamples, when they exist, are found with small values of $l$.
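The state estimate and the resulting middle set $V$ can be sketched as follows; the function names are hypothetical. The size of $V$ is $\sum_{k=0}^{l} |\Sigma|^k$, which grows exponentially in $l$, so keeping $l$ small directly bounds the number of Wp test strings.

```python
from itertools import product
from typing import List, Sequence, Tuple


def estimated_state_size(m_prev: int, l: int) -> int:
    """State estimate at iteration i: n_hat_i = m_{i-1} + l, with l >= 0."""
    if l < 0:
        raise ValueError("l must be nonnegative")
    return m_prev + l


def middle_strings(sigma: Sequence[str], l: int) -> List[Tuple[str, ...]]:
    """V = Sigma^0 ∪ Sigma^1 ∪ ... ∪ Sigma^l, i.e., all strings of length at most l."""
    return [w for k in range(l + 1) for w in product(sigma, repeat=k)]
```

With the BBW input alphabet {brake, acc, idle} and $l = 5$, $V$ already contains $1 + 3 + 9 + 27 + 81 + 243 = 364$ middle strings.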
Recall the Wp-method in Section III-C3. In both of its steps, a test string $s \in \Sigma^{*}$ consists of three substrings $u$, $v$, and $w$, i.e., $s = uvw$, where $u \in U$ (step 1) or $u \in P - U$ (step 2), $v \in V$, and $w \in W$ (step 1) or $w \in W(q_i)$ (step 2). We call $u$, $v$, and $w$ the "prefix string," "middle string," and "suffix string," respectively. The lengths of the prefix and suffix strings are determined by the conjecture automaton and are hence independent of the state estimate $\hat{n}_i$. By the definition of $V$, the maximal length of the middle string depends on $\hat{n}_i$, that is, on $l$. Fig. 10 shows the lengths of the three substrings of the counterexamples during the learning iterations. The length of the middle string remains almost constant at 1, except at the first iteration, where it is 5. Therefore, we take $l = 5$ for the Wp-method applied to the BBW system.

To obtain a reliable supervisor, a series of Moore automata approximating the target with respect to the BBW system and Requirement 2 are constructed by increasing the maximal number of learning iterations. The numbers of states (#State(M)) and transitions (#Trans(M)) of the learned automata, together with the time spent constructing the observation tables and performing equivalence checking in the $i$th learning iteration, are listed in rows 2 to 5 of Table V, where $i$ ranges from 1 to 9.
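How the prefix, middle, and suffix strings combine into Wp test strings can be sketched as below. This follows the textbook Wp-method: $U$ is a state cover, $P$ a transition cover, $W$ a characterization set, and $W(q)$ the identification set of state $q$. The function name, the `run` callback that returns the conjecture state reached by a string, and the dictionary `W_of` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Word = Tuple[str, ...]


def wp_test_strings(U: Iterable[Word], P: Iterable[Word], V: List[Word], W: List[Word],
                    W_of: Dict[Hashable, List[Word]], run: Callable[[Word], Hashable]) -> Set[Word]:
    """Assemble Wp-method test strings s = u v w.

    Step 1: u in U (state cover), v in V, w in W (characterization set).
    Step 2: u in P - U, v in V, w in W_of[q], where q = run(u v) is the conjecture
    state reached by u v and W_of[q] is its identification set.
    """
    U = list(U)
    tests: Set[Word] = set()
    for u in U:
        for v in V:
            for w in W:
                tests.add(u + v + w)
    for u in set(P) - set(U):
        for v in V:
            q = run(u + v)
            for w in W_of[q]:
                tests.add(u + v + w)
    return tests
```

Only $V$ depends on the estimate $\hat{n}_i$; $U$, $P$, $W$, and $W(q)$ are read off the current conjecture, which matches the observation that only the middle-string length is tied to $l$.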
3) Computing Supervisors, Controlling, and Testing the System: According to the basic principle of antilock braking, if the wheel is slipping, the BBW controller may ignore the brake request. Therefore, $\Sigma_c = \{\text{brake}\}$ is the controllable event set. Supervisors are computed by Theorem 1. The numbers of states (#State(S)) and transitions (#Trans(S)) of the supervisor $sup_i$ derived from each learned automaton are listed in the last two rows of Table V.
The first nontrivial supervisor is $sup_2$, which is implemented as a patch to control the system. The integrated system, consisting of BBW and $sup_2$, is then tested by LBTest. A Warning verdict is given at the 20th testing iteration with the counterexample "brake, acc, brake, idle, acc, brake." The corresponding sequence of outputs is "[still, zero, noslip], [still, zero, noslip], [moving, zero, noslip], [still, nonzero, noslip], [still, zero, slip], [moving, zero, slip], [still, nonzero, noslip]." The violation appears at the third "brake" event in the counterexample. Thus, $sup_2$ is not a correct supervisor for the system. Subsequently, $sup_3$ is used to control the BBW system, and the controlled system is tested again by LBTest without producing a counterexample. Since LBTest is a testing tool, the absence of counterexamples does not prove that the requirement holds. In our recent study [19], we proposed an approximate method, based on the definition of output patterns, for estimating whether LBTest has tested enough when no counterexample is found. Fig. 11 shows how the number of output patterns of the system controlled by $sup_3$ varies with the testing rounds. The number grows relatively slowly after 120 testing rounds and remains almost unchanged after 260 rounds. We therefore conclude that the test is sufficient.
It is also necessary to check whether the satisfaction of Requirements 1 and 3 is affected in the controlled system, so Requirements 1 and 3 are tested by LBTest. When Requirement 1 is tested, a counterexample, "idle, acc, acc, acc, brake, brake," is produced at the 131st testing iteration. Since Requirement 1 concerns only the two abstract outputs VehStatus and TorqueRR, the corresponding output of the controlled system is "[still, zero], [still, zero], [still, nonzero], [still, nonzero], [still, nonzero], [moving, zero], [still, nonzero], [still, nonzero]." Hence, $sup_3$ is not an appropriate supervisor, as it does not ensure Requirement 1.
To obtain a supervisor that satisfies all requirements, the supervisors listed in Table V are tested one by one. The testing results on the satisfaction of Requirements 1-3 are listed in Table VI. All tests are performed by LBTest with at most 300 testing iterations. Table VI shows that all three requirements are satisfied in the controlled system when $sup_7$ is used. We further examine the quality of the testing result by plotting, in Fig. 12, the numbers of output patterns [19] during the testing of $sup_7$ against the three requirements. For all three requirements, the numbers of output patterns of the controlled system become almost invariant after 250 testing iterations. We therefore conclude that $sup_7$ is a feasible supervisor that enforces all the requirements on the system.

B. Adaptive Cruise Control in a Platooning Program
The proposed approach can also be applied to platooning, which has proved effective for increasing traffic efficiency and decreasing automobile fuel consumption [38]-[41]. This paper studies a simple platooning problem with a leader and a follower, as illustrated in Fig. 13. The leader is driven by a human driver as usual. The follower is controlled by an adaptive cruise control (ACC) system that automatically controls the follower's speed to keep a short distance from the leader and avoid collision [42]. The follower vehicle measures its relative distance to the leader by radar and laser sensors, and the two vehicles also communicate via a wireless network [43]. The ACC adjusts the distance between the follower and the leader by controlling the throttle and brake commands of the follower.
Generally, two kinds of policies are used to regulate the distance between the vehicles in an ACC: the constant time-gap policy (CTP) and the constant range policy (CRP). CTP maintains a constant time gap within which the follower vehicle reaches the leader vehicle's present position, while CRP maintains a constant relative distance between the two vehicles. Daniel and Athanasios [44] implemented the ACC algorithm presented in [45] and [46] as an executable Java program, which serves as the black-box system to be controlled in this paper. Furthermore, Daniel and Athanasios [44] found in experiments that the proportional-derivative (PD) controller proposed in [45] using CRP cannot avoid collisions in the stop-and-go scenario. We therefore aim to eliminate the collision by adding a supervisor to the ACC using the proposed approach.
1) Abstracting the System: Safety is paramount in platooning.
Requirement 4: The two vehicles shall not crash.
A sufficient condition for meeting the requirement is to always keep a safe distance between the two vehicles. The output of the ACC software describes the relative distance between the leader and the follower vehicles at discrete time samples. The range of safe distances between the two vehicles is defined as follows [45]. The relative distance is a positive value denoted by $x_r$. Let $x_{min}$ and $x_{max}$ denote the reference minimum and maximum inter-vehicle safe distances between the follower and the leader vehicles, respectively, and let $Error$ denote the allowed error ratio. The minimum and maximum relative safe distances between the vehicles are $x_{min} - x_{min} \times Error$ and $x_{max} + x_{max} \times Error$, respectively. As suggested by Daniel and Athanasios [44], we set $x_{min} = x_{max} = 5$ m and $Error = 0.1$ in the experiments. Thus, the relative safe distance lies in $[4.5, 5.5]$ m.
To analyze the relative distance qualitatively, an output abstract function $f^{out}$ is defined as

$$
f^{out}(x_r) = \begin{cases}
\ldots, & x_r > 5.5 \\
\text{good}, & 4.5 \le x_r \le 5.5 \\
\text{tooclose}, & 0 < x_r < 4.5 \\
\text{crash}, & x_r \le 0.
\end{cases}
$$
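The safe-distance bounds and the distance abstraction can be sketched together as below. The bounds follow the formulas in the text. The case assignment follows the cases of $f^{out}$ above for $x_r \le 5.5$; the label returned for $x_r > 5.5$, `ABOVE_RANGE`, is a hypothetical placeholder, as are the function names.

```python
from typing import Tuple


def safe_bounds(x_min: float = 5.0, x_max: float = 5.0, error: float = 0.1) -> Tuple[float, float]:
    """Relative safe-distance bounds: [x_min - x_min*Error, x_max + x_max*Error]."""
    return x_min - x_min * error, x_max + x_max * error


def distance_status(x_r: float, bounds: Tuple[float, float] = (4.5, 5.5)) -> str:
    """Output abstraction f_out of the relative distance x_r (in m)."""
    lo, hi = bounds
    if x_r <= 0.0:
        return "crash"
    if x_r < lo:
        return "tooclose"
    if x_r <= hi:
        return "good"
    return "ABOVE_RANGE"  # hypothetical placeholder label for x_r above the upper bound
```

With the default parameters, `safe_bounds()` returns the interval [4.5, 5.5] m used in the experiments.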
The input signals of the leader are acceleration and deceleration commands given by pressing the gas and brake pedals, denoted by $gasPos_l$ and $braPos_l$, respectively, in the range from 0% to 100%. The input of the follower is calculated by the ACC and is either an acceleration or a deceleration, represented by $[gasPos_f, braPos_f]$. Let $x = [braPos_l, gasPos_l]$ be the input vector of the leader vehicle. In the experiment, the set of abstract input events is $\Sigma_l = \{0, l\_a_i, l\_b_i \mid i = 1, \ldots, 10\}$, and the input refinement function of the leader maps each of these events to a concrete input vector $x$. The acceleration request $acc_f$ of the follower vehicle is calculated by the PD controller of [45], given in (8), where $K_p$ and $K_d$ are the gains of the proportional and derivative parts, $v_l$ and $v_f$ denote the velocities of the leader and the follower, $x_l$ and $x_f$ represent their positions, $x_{min}$ is the minimal safe relative distance between the two vehicles, and $h$ is a constant time headway. If $acc_f > 0$, the input of the follower is an acceleration; if $acc_f < 0$, it is a deceleration; and if $acc_f = 0$, the current velocity of the follower is kept constant.
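The mapping from the PD acceleration request to the follower's input can be sketched as below. The control law in this sketch is an assumption: a common constant-time-headway PD form built from the symbols listed above, $acc_f = K_p(x_l - x_f - x_{min} - h v_f) + K_d(v_l - v_f)$, which may differ from (8) in [45]. The default parameter values are those used in the experiments, and the function name is hypothetical. The sign rule (accelerate, decelerate, or hold) follows the text.

```python
from typing import Tuple


def follower_command(x_l: float, x_f: float, v_l: float, v_f: float,
                     K_p: float = 3.59, K_d: float = 0.08,
                     x_min: float = 5.0, h: float = 0.98) -> Tuple[str, float]:
    """ASSUMED PD spacing law (common constant-time-headway form, not necessarily (8)):
    acc_f = K_p * (x_l - x_f - x_min - h * v_f) + K_d * (v_l - v_f)."""
    acc_f = K_p * (x_l - x_f - x_min - h * v_f) + K_d * (v_l - v_f)
    if acc_f > 0:
        mode = "accelerate"
    elif acc_f < 0:
        mode = "decelerate"
    else:
        mode = "hold"  # acc_f == 0: keep the current velocity
    return mode, acc_f
```

For example, with both vehicles at 10 m/s and a 10 m gap, the assumed law gives $acc_f = 3.59 \times (10 - 5 - 9.8) \approx -17.23$, i.e., a deceleration.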
2) Testing the System: In the experiments, we set the parameters $K_p = 3.59$, $K_d = 0.08$, $x_{min} = 5.0$, and $h = 0.98$. The leader vehicle runs independently without adapting to the follower, and its sampling period is 5 s. In contrast, the follower vehicle has to regulate its speed quickly to keep the safe distance, so the sampling period of the ACC function is 5 ms.
The ACC system is tested in two common traffic scenarios [44]. We perform similar tests in our experiments to show that Requirement 4 is not satisfied by the given ACC. The first scenario is the stop-and-go situation, whose worst case is that the leader vehicle decelerates quickly while running at a high speed. In the second scenario, the leader speeds up with smooth acceleration and then switches to the cruise mode. Multiple test cases are executed to test the behavior of the platoon under the two traffic scenarios. In Test case 1, the leader performs sudden accelerations and two sudden brakes; as a result of the abrupt braking, the leader almost comes to a standstill. In Test case 2, the leader performs smooth acceleration and gradually goes into cruising.
When Test case 1 is executed, the relative distance between the leader and follower vehicles is shown by the plot labeled "Before control" in Fig. 14. A collision happens at 105 s, when the leader vehicle suddenly brakes with input $l\_b_6$, which reveals that the ACC system is not safe in the first scenario. When Test case 2 is executed, no collision happens; the ACC works well in the second scenario.
3) Learning Moore Automata: In the platooning example, the external input of the system consists of the positions of the brake and gas pedals of the leader vehicle. If the ACC of the follower vehicle cannot prevent a collision, a supervisor of the ACC system should override the ACC output by commanding an emergency brake. The supervisor cannot control the actions of the leader vehicle. Therefore, the events in $\Sigma_l = \{0, l\_a_i, l\_b_i \mid i = 1, \ldots, 10\}$ are uncontrollable.
The input of the follower vehicle is calculated by (8). We introduce two controllable events, $f\_a$ and $f\_b$, to control the input of the follower vehicle, where $f\_a$ means that the ACC sends its computed decision value to the follower, and $f\_b$ means that the supervisor overrides the ACC's decision and applies the emergency brake. The alphabet of the follower is $\Sigma_f = \{f\_a, f\_b\}$, and the alphabet of the platooning program is $\Sigma = \{l\_a_i, l\_b_i, 0, f\_a, f\_b\}$.
Since the sampling periods of the leader and follower inputs are 5 s and 5 ms, respectively, 1000 speed control requests are calculated for the follower before the leader receives its next input signal. Consequently, there are 1000 states between two input events of the leader. The state space is therefore extremely large, and we infer automaton models of the follower only under specific test cases, e.g., Test case 1. Even so, the number of states becomes huge when a test case is even moderately long (e.g., the complete model of the system under Test case 1 has 30 000 states). Thus, we construct only approximate supervisors for the system in this study. Fig. 15 shows that state $L_i$ transitions to state $L_{i+1}$ in the model of the leader vehicle in Test case 1 when the leader vehicle receives event $l\_b_6$. While state $L_i$ transitions to state $L_{i+1}$, 1000 speed control decisions are calculated in the follower vehicle. A partial model of the follower vehicle is shown in Fig. 16.

4) Computing Supervisors, Controlling, and Testing the System: In this experiment, the learned Moore automaton of the first scenario has 30 000 states. In the supervisor, the event $f\_b$ is executed in states 20 150 to 20 825, sending the highest request to the brake pedal ($acc_f = -20$). The supervisor is patched onto the ACC system, and the controlled system is tested again. The relative distance between the leader and the follower vehicles is shown by the plot labeled "After control" in Fig. 14, which shows that the collision at 105 s is avoided in the controlled system under Test case 1. The controlled ACC system still works well with Test case 2. We conclude that the proposed approach works effectively in the platooning program for the tested scenarios.
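The override performed by the supervisor can be sketched as a thin patch between the ACC and the follower. This is a hypothetical illustration: it assumes the supervisor's states are numbered along Test case 1, so that the current state index is known at each 5 ms step, and it hard-codes the brake window (states 20 150 to 20 825) and the emergency request $acc_f = -20$ reported above. The function name is hypothetical.

```python
from typing import Tuple


def supervised_acc(state_index: int, acc_request: float,
                   brake_window: Tuple[int, int] = (20150, 20825),
                   emergency_acc: float = -20.0) -> Tuple[str, float]:
    """Hypothetical patch: inside the supervisor's brake window, event f_b overrides the
    ACC with an emergency brake; otherwise, event f_a forwards the ACC's own request."""
    lo, hi = brake_window
    if lo <= state_index <= hi:
        return "f_b", emergency_acc
    return "f_a", acc_request
```

Because the supervisor is approximate and learned for Test case 1 only, such a fixed window applies to that test case and does not cover stops at other times.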
To the best of our knowledge, this is the first attempt to apply SCT to platooning systems. In future work, we will seek a general supervisor for scenarios in which the leading vehicle stops unexpectedly at different times.

VI. CONCLUSION
We propose an approach to improve the quality of black-box embedded systems when formal models of the system and the requirement are not directly available. The approach integrates automaton learning and the SCT of DES. First, the system is tested against the requirement. If the requirement is not satisfied, the proposed $C^{*}$ algorithm infers a Moore automaton. Then, a supervisor is derived by SCT from the learned automaton and patched onto the system to correct its erroneous behavior. Finally, the controlled system is tested again to check the correctness of the supervisor. The procedure iterates until the requirement holds in the controlled system. The proposed approach is automated by a toolchain, and experiments on the BBW system and a platooning program illustrate its feasibility on realistic systems.