A Generative Verification Framework on Statistical Stability for Data-Driven Controllers

This study proposes a novel framework for evaluating the stability of data-driven controllers and the concept of statistical stability. The proposed framework can be used when it is challenging to show stability through conventional control theory. The novelty of this paper lies in that it provides a method for scientifically analyzing the stability of data-driven controllers, thereby improving the quality of data-driven controllers. The proposed framework consists of three parts: the generative model, controller optimizer, and verification model. A variational autoencoder is used to classify and randomly generate data, and the generated data are used to train the controller. A support vector machine is used to classify areas where the controller is statistically stable. The statistical stability of an optimal controller designed using a deep neural network structure is analyzed using the proposed framework.


I. INTRODUCTION
A mathematical relationship between the control input and system output must be derived to design controllers for dynamic systems. Then, based on this relationship, a controller that can achieve the desired control performance and control goals is designed. Stability is an essential requirement for all control systems. In addition to stability, control systems must meet requirements including tracking given reference signals and suppressing disturbances and noise [1]. A suitable controller must ensure the stability of the control system and be able to follow a given reference signal with small error. For linear time-invariant systems, stability can be analyzed algebraically [2]. For nonlinear systems, stability analysis can be more challenging and depends on the system's dynamic characteristics [3]. Several methodologies exist for analyzing the stability of nonlinear systems, including locally linearizing the system and using bifurcation theory [4]. These methodologies require analyzing the system's differential equations and often conducting input-output analysis. Because various types of nonlinearities exist, different mathematical tools are required for stability analysis [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Emanuele Crisostomi.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Another objective of the control system is to achieve optimal performance. The optimal control of dynamic systems is often conducted using the calculus of variations and solving the two-point boundary value problem (TPBVP) to obtain the optimal trajectory [6]. This approach is also known as trajectory optimization. Its benefit is that the optimal trajectories of the state and control input can be calculated, and the boundary conditions and path constraints can be considered [7]. However, in many cases, analytically solving the TPBVP is very challenging, and numerical methods are often utilized instead [8]. Therefore, the process often requires high computational power and is time-consuming. Another limitation of the trajectory optimization approach is that the method uses open-loop control. For a linear system, the optimal controller design is based on solving system matrix equations assuming full knowledge of the system. The optimal controller is designed offline by solving Hamilton-Jacobi-Bellman (HJB) equations such as the Riccati equations [9]. Therefore, although the controller has the form of feedback control, it is an open-loop control. In practical applications, it is often important to be able to design controllers online without having complete knowledge of the plant dynamics. Modeling uncertainties may exist, such as parameter inaccuracy, unmodeled dynamics, and disturbances. Feedback control approaches are used to compensate for the limitations of the open-loop control approach. In the case of linear systems, proportional-integral-derivative (PID) controllers can be designed for transfer function models.
For nonlinear systems, various types of nonlinear controllers such as sliding mode control (SMC) [10], feedback linearization (FL) [11], and backstepping control [12] can be designed and applied depending on the characteristics of the control system [13]. However, controllers designed using conventional methodologies are limited in that they must have a specific structure to ensure stability and optimality from a mathematical perspective. For example, a linear quadratic regulator (LQR) can only consider a cost function of quadratic form, and backstepping control can only be applied if the control system is in cascade form. To ensure stability using Lyapunov stability theory, it is necessary to find a suitable Lyapunov function, and a general methodology for finding the Lyapunov function does not exist. Thus, the stable region is often conservatively presented using quadratic functions [14]. A conservatively designed Lyapunov function may unnecessarily limit the performance of the nonlinear controller.
In this study, a new methodology that can overcome the limitations of the conventional controller design methodologies described above is proposed, and the statistical stability of the control system is analyzed. To this end, a generative verification model is proposed, and the stability and performance of data-driven control systems are statistically verified. Because deep neural networks can approximate arbitrary functions, a feedback controller can be built from a deep neural network by designing the network as a function of the state variables [15]. Furthermore, deep neural networks are flexible in their input-output structure, so arbitrary forms of information and state variables can be used as inputs. Traditional model-based controller design methodologies first derive the dynamic relationship between state variables and control commands and then select a design methodology based on that relationship. In contrast, deep neural network controllers place no restrictions on the controller's mathematical form, which can improve the control system's performance.
The contribution of this study is to propose the concepts of statistical stability and binary stability conditions for nonautonomous systems with reference signals, together with a framework to verify them. The binary Lyapunov stability condition is designed so that the definition of stability in conventional control theory can be applied to finite time intervals. Although asymptotic stability cannot be shown using the proposed stability analysis method, stability over time intervals long enough for practical use can be statistically guaranteed. A theorem on permutations of multisets is proven to extend the statistical stability to time-series data of arbitrary length.
This study is organized as follows. In Section II, the design of feedback controllers using deep neural networks is described. In Section III, the proposed framework is introduced, along with the design method and role of each of its elements. In Section IV, statistical stability and the binary stability conditions are defined. In Section V, an example of implementing the proposed model and its results are shown. Finally, the conclusions are described.

II. DEEP NEURAL NETWORK-BASED CONTROLLERS
Let us consider the following control system.
where x is the state vector, u is the control input, and y is the system output. The trajectory of the state vector is represented as follows.
In general, feedback controllers u(x) are designed in the form of functions for system state variables. Linear feedback controllers (including the PID controller and linear quadratic regulator) and nonlinear feedback controllers are examples of controllers in the form of functions for state variables.
In particular, backstepping control [16] and extended state observer-based control [17] may use state variables created using a dynamic extension or similar methods. Moreover, since a deep neural network can approximate arbitrary functions, a feedback controller consisting only of deep neural networks can be designed if the network is a function of the state variables. For example, if a linear activation function is used for a fully connected layer $N_D$ without a bias node and the state variable vector $x$ is used as the input of $N_D$, then the output of $N_D$ can be expressed as a linear equation using one matrix $W$, such as $N_D(x) = Wx$. Therefore, if the output of $N_D$ is used as a control command, $u = N_D(x)$ has the same form as a linear controller. Similar to $N_D$, a deep neural network $C$ can represent a feedback controller that is a function of the state variable. Thus, the input to the deep neural network is the state variable $x$, and the output can be expressed as $u = C(x)$. Nonlinear functions such as the rectified linear unit (ReLU) or sigmoid functions can be used as the activation function of $C$. Moreover, $C$ need not be a single fully connected layer; it can be a deep neural network with an arbitrary structure. Thus, $C(x)$ can be used as a nonlinear feedback controller with an arbitrary structure.
Deep neural network structures are designed in various forms according to the goals to be learned. For example, deep neural networks that process images, which are two-dimensional data, use convolutional layers to extract feature points, and deep neural networks that process time-series data can use structures such as recurrent neural networks (RNNs) and long short-term memory (LSTM) [18], [19]. Likewise, deep neural network controllers for use as nonlinear feedback controllers should be given structures that improve control performance.
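The statement that a bias-free layer with a linear activation reduces to a linear state-feedback law, while a nonlinear activation yields a nonlinear law, can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dense` and `relu`, the weight values, and the two-layer example are illustrative assumptions and are not taken from the authors' implementation.

```python
# Minimal sketch with hypothetical weights (not the authors' controller).

def dense(W, x, activation=None):
    """Fully connected layer without a bias node: activation(W x)."""
    out = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [activation(v) for v in out] if activation else out

def relu(v):
    return max(0.0, v)

# Linear activation, no bias: N_D(x) = W x, i.e. a linear feedback law.
W = [[-1.5, -0.4]]          # assumed 1x2 gain for a 2-state, 1-input system
x = [0.2, -1.0]
u_linear = dense(W, x)

# One hidden ReLU layer followed by a linear output layer: u = C(x).
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
W2 = [[-1.5, -0.4, 0.5, 0.4]]

def C(x):
    return dense(W2, dense(W1, x, relu))

u_pos = C(x)                # control command at x
u_neg = C([-0.2, 1.0])      # control command at -x
```

For a linear law, negating the state negates the command. Here `C(x)` and `C(-x)` are not negatives of each other, which illustrates that the ReLU network represents a nonlinear feedback law.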

III. GENERATIVE VERIFICATION MODEL
In this study, a generative verification model (GVM) is proposed as a methodology for verifying the performance and stability of nonlinear deep neural network controller models. Since these controller models are nonlinear functions with arbitrary shapes, techniques such as Lyapunov stability analysis, which are used in conventional nonlinear controller design, are difficult to apply. Because the GVM is applicable to any controller given in the form of an arbitrary nonlinear function, it can also be used for controllers that are not deep neural networks.
The GVM structure for the controller stability analysis is shown in Figure 1. The controller stability analysis structure using GVM consists of three parts: controller optimizer (CO), generative model (GM), and verification model (VM). In Fig. 1, the input for CO is the set of time-series data R, which is used as the reference signal for the controller. The output of the CO is the simulation result and its evaluation, including some labels, H . The inputs for the VM are the set of feature vectors Z distributed in the latent space and the corresponding set of labels H . A new set of feature vectors Z + is sampled in VM. GM generates a set of time-series data R for dataset Z during every iteration. More detailed explanations of each block represented in Fig. 1 are provided in the corresponding sections.

A. CONTROLLER OPTIMIZER
In the CO, the DNN-based controller is trained to have optimal performance for a given reference trajectory. Thus, the input of CO is a set of reference inputs. Since the reference input is a target to be followed by the control system's output, it is time-series data with various trajectories. For example, when designing a linear system controller such as a PID controller, a unit step function can be used as a reference trajectory to evaluate the time response. Additionally, since LQR aims to regulate the output to zero, it can be understood that the reference trajectory is a zero function.
In this study, it is assumed that time-series data of any shape are given as a reference input, rather than a specific shape of reference trajectory such as a unit step function or a zero function. Thus, CO considers time-series data with arbitrary shapes and a particular length as reference inputs. The right of Fig. 1 shows a block diagram of CO. $R$ is a batch of reference inputs given in the form of time-series data. The CO is implemented as a DNN, where the simulator network contained in CO performs numerical simulations on $R$. The simulator network includes a deep neural network controller model $C(x)$. The time-series dataset $S$ obtained from numerical simulation is used with $R$ in the loss function calculation. The deep neural network controller model $C(x)$ is trained to minimize the loss function. Then, the numerical simulation result $s \in S$ for every reference trajectory $r \in R$ is evaluated and classified. If the simulation result $s$ is satisfactory, then $s \in S_p \subset S$. Otherwise, $s \in S_f \subset S$. The classification result is recorded as a label $\lambda \in [0, 1]$. $H$ is the set of labels $\lambda$.
Meanwhile, the controller model $C(x)$ is optimized by CO for a given $R$, and its optimal performance is determined by the loss function. For example, a loss function can be designed as Eq. (3). If the DNN-based controller model $C(x)$ is designed as a fully connected layer without bias and a linear activation function is used, then the CO-learned $C(x) = Wx$ will be an LQR controller. However, the proposed CO is versatile because it allows an arbitrary nonlinear DNN-based controller model $C(x)$ and arbitrary loss functions to be considered depending on the control purpose.
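As a hedged illustration of a loss for which a bias-free linear controller would recover an LQR-type gain, one may assume a quadratic cost of the standard form below. The weighting matrices $Q \succeq 0$ and $R_u \succ 0$ are assumed symbols ($R_u$ is written with a subscript to avoid confusion with the reference set $R$), and the exact form used by the authors may differ.

```latex
J = \int_{t_0}^{t_f} \left( x(t)^\top Q\, x(t) + u(t)^\top R_u\, u(t) \right) dt,
\qquad Q \succeq 0,\; R_u \succ 0 .
```

Under this assumption, minimizing $J$ over the single weight matrix $W$ in $u = Wx$ is a gain-search counterpart of the classical LQR problem.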
A tracking controller is often designed by defining an error between the signal to be followed and the system output and designing a regulator for the error dynamics. This method can also be adopted when designing a deep neural network controller, but in this study, the state variables $x$ and the reference signal $r$ to be followed are used as inputs to the controller to emphasize the flexibility of the deep neural network controller structure. That is, $u = C(x, r)$. This implies that any beneficial input can be provided to a deep neural network controller.
Figure 1 shows that the GVM has a structure that iterates along the direction of the arrows. An $R$ output from the GM is given as an input to the CO, where a DNN-based controller model $C(x)$ is trained whenever a new $R$ is given in each iteration. Meanwhile, the nonlinear DNN-based controller model trained by the CO is the result of learning from a given dataset $R$ and is an arbitrary nonlinear function, so its stability is not guaranteed by CO alone. In other words, the tracking performance cannot be guaranteed if new time-series data $\bar{r} \notin R$, not included in the training dataset, are given as reference input. In this study, the GVM, which combines the CO with the GM and VM, is proposed to overcome this limitation.

B. GENERATIVE MODEL
GM generates a set R of time-series data. Since R is a training dataset that CO learns, it contains various forms of time-series data, making it valuable. GM is a kind of generative model. The generative model refers to a model that can generate new data that did not exist previously based on various input data. For example, training a generative model with time-series data as input data can generate new data similar to the training data but not included in the training dataset. These generative models are usually trained using structures such as generative adversarial networks (GANs) [20], variational autoencoders (VAEs) [21], and adversarial autoencoders (AAEs) [22].
GM not only needs to generate new time-series data but also needs to have a feature space because the VM must have a vector space to learn the stability region. A machine learning technique that can create a feature space based on given data is called feature learning or representation learning [23]. Representation learning is widely used in fields such as speech recognition [24], object recognition [25], and natural language processing [26]. Since the VAE can obtain feature space by automatically performing representation learning while training the generative model, VAE is adopted in this study.
The GM is designed with a VAE structure to generate arbitrary shapes of time-series data. A VAE is a type of autoencoder, a structure in which an encoder and a decoder are connected. The encoder of the VAE compresses the given input data and distributes them on a multivariate vector space, referred to as the latent space or feature vector space. Each input data point is mapped to a point in the latent space, which is restored by the decoder to the same form as the input data. Figure 2 shows the structure of the VAE used in this study. As in a typical autoencoder, the VAE is trained to minimize the reconstruction error between the input data and the data restored by the decoder [21]. The difference between a typical autoencoder and a VAE is that the latter adds a regularization term to the loss function so that the feature vectors obtained through the encoder follow a predetermined distribution in the latent space. Since the Gaussian distribution is considered in this study, the encoder structure, as shown in Figure 2, includes nodes that output the mean and standard deviation of the Gaussian distribution. The decoder of the VAE may be used as a generative model. Therefore, arbitrary time-series data can be generated by sampling any point in the latent space and passing it through the decoder. Meanwhile, output data obtained by decoding
In the VAE, the time-series data given as an input are compressed and converted into a feature vector in the latent space. The dimension of the latent space, $n_l \in \mathbb{N}$, is a hyperparameter that may be chosen freely when designing the VAE. A larger latent dimension allows the feature vector to carry more information, which tends to improve the classification performance of the VAE. Therefore, using a higher $n_l$ is recommended.
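The VAE objective described above, a reconstruction error plus a term that pulls the encoded distribution toward a predetermined Gaussian, can be sketched as follows. This is a minimal illustration under stated assumptions: the prior is taken to be a standard normal, the encoder is parameterized by the mean and the log-variance (equal to twice the logarithm of the standard deviation), and the function names and the weight `beta` are hypothetical rather than the authors' settings.

```python
import math

def reconstruction_error(x, x_hat):
    """Mean squared error between an input series and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction term plus the (assumed) Gaussian regularization term."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
```

The regularization term vanishes when the encoder outputs exactly the prior (zero mean, unit variance) and grows as the encoded distribution moves away from it.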
In this study, we leverage these VAE features to generate a set R of reference trajectories. In other words, the role of GM is to generate various reference trajectories in an arbitrary form for the control system of CO to learn. The shape of time-series data that is suitable may vary depending on the type of control system. To obtain a good GM, a set of time-series data suitable for the reference trajectory must be used as training data. This study uses sinusoidal time-series data with various wavelengths, amplitudes, and initial phases.
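A training set of sinusoidal reference trajectories of the kind described above could be generated as in the following sketch. The parameter ranges, the series length, the sampling period, and the names `make_sinusoid`, `make_dataset`, and `R_train` are assumptions made for illustration; the values used by the authors are not stated here.

```python
import math
import random

def make_sinusoid(amplitude, wavelength, phase, n_steps=100, dt=0.05):
    """Sample r(t) = A sin(2*pi*t / wavelength + phase) on a uniform grid."""
    return [amplitude * math.sin(2.0 * math.pi * (k * dt) / wavelength + phase)
            for k in range(n_steps)]

def make_dataset(n_series, seed=0):
    """Draw random amplitudes, wavelengths, and initial phases (assumed ranges)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_series):
        amplitude = rng.uniform(0.1, 1.0)
        wavelength = rng.uniform(0.5, 5.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        data.append(make_sinusoid(amplitude, wavelength, phase))
    return data

R_train = make_dataset(256)
```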
In GVM, GM uses a decoder from a pretrained VAE. Unlike the process of CO learning a new controller in every iteration, GM is not retrained in every iteration. After the VAE is trained, only the decoder part of the VAE may be used as a GM. Thus, as shown in Figure 1, for GM, the input Z is a set of vectors distributed in the latent space. The set Z is concatenated with the output Z + of the VM, and the number of elements gradually increases. GM generates a set of time-series data R for a new Z + , including an existing Z every iteration.

C. VERIFICATION MODEL
The latent space of the VAE is a vector space from which the characteristics of time-series data are extracted, and the distribution of the data on the vector space reflects the characteristics of time-series data. Therefore, decoding two different feature vectors at similar locations in the latent space results in time-series data with similar forms of time response. Since the VAE automatically classifies the feature vectors obtained by encoding input data, the physical meaning of each dimension cannot be specified in advance. For example, if a VAE learns sine waves with various wavelengths and amplitudes and maps them to a two-dimensional feature vector space, time-series data with similar wavelengths and amplitudes may be distributed at similar locations in the feature vector space. However, the corresponding two-dimensional vector coordinates do not precisely represent the wavelength and amplitude. Figure 3 shows an example of classifying different sinusoidal time-series data with different wavelengths and amplitudes into a two-dimensional latent space. It can be seen that time-series data with similar characteristics are mapped to similar positions. Moreover, structured data such as sinusoidal time-series data can be classified into parameters with physical meaning, such as wavelength and amplitude. In contrast, time-series data with arbitrary shapes cannot be classified as a finite number of parameters with physical meaning. However, the VAE automatically extracts statistically similar features and classifies time-series data.
In this study, the performance region of the control system is derived by performing binary classification in the latent space $F$, utilizing the VAE's ability to classify time-series data with arbitrary forms. The latent space is divided into a set of reference trajectories $Z_p$ ($\in F_p \subset F$) that the system output can follow using the deep neural network controller $C(x)$ and a set of reference trajectories that cannot be followed.
Subspaces F p and F f of latent space F represent areas where the system output can and cannot follow the reference trajectory using the controller C(x), respectively. If C(x) varies for the same latent space, then subspaces F p , F f can change. Therefore, we express them as F p | C and F f | C .
The inputs of the VM in Figure 1 are Z and H . Here, Z is the set of vectors z i in the latent space, and H is the set of labels λ i assigned for each vector z i . This label is determined by evaluating numerical simulation results using a CO-trained controller C(x). Given an ordered pair (z i , λ i ) of data and labels, a binary classification model D(z) that bisects the latent space using a support vector machine (SVM) can be derived. D(z) receives the feature vector z i and outputs a predicted value p i (∈ [0, 1]). Since the label λ i corresponding to a given z i in the learning dataset (z i , λ i ) can vary with the controller C(x), the D(z) that is learned also depends on the controller. Thus, it can be expressed as D| C (z).
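The idea of learning a classifier that bisects the latent space from labeled pairs can be sketched with a small linear soft-margin SVM. The sketch below is an illustration under stated assumptions, not the authors' model: it swaps in plain full-batch subgradient descent on the regularized hinge loss, uses a linear kernel, returns a hard 0/1 label rather than a value in [0, 1], and uses a hypothetical two-dimensional dataset `Z`, `H`.

```python
def train_linear_svm(Z, labels, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Z      : list of latent vectors z_i
    labels : list of labels in {0, 1} (1 = reference can be followed)
    """
    dim, n = len(Z[0]), len(Z)
    y = [1.0 if lab == 1 else -1.0 for lab in labels]  # SVM targets are +/-1
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for zi, yi in zip(Z, y):
            margin = yi * (sum(wj * zj for wj, zj in zip(w, zi)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j in range(dim):
                    grad_w[j] -= yi * zi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def D(z, w, b):
    """Predicted label: 1 if z falls on the followable side, else 0."""
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) + b >= 0.0 else 0

# Hypothetical latent vectors and labels.
Z = [(2, 2), (1, 3), (3, 1), (2, 1), (-2, -2), (-1, -3), (-3, -1), (-2, -1)]
H = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_linear_svm(Z, H)
predictions = [D(z, w, b) for z in Z]
```

Because the labels depend on the controller, retraining the controller and relabeling the same latent vectors would generally yield a different `w` and `b`, which mirrors the controller-dependent notation used in the text.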
Finally, D| C (z) is used for the performance analysis of the controller. Since any reference trajectory can be converted to the feature vector z, D| C (z) evaluates this feature vector to derive whether it is traceable. The latent space is bisected by D| C (z) into F p | C and F f | C , where the followable region F p | C within the latent space F is the performance region of controller C(x). It can be argued that the tracking performance and stability of the controller C(x) are statistically guaranteed within F p | C . The statistical stability is addressed in the following section.
Meanwhile, the output of the VM shown in Figure 1 is Z + , which is a newly extracted sample set from the latent space. This new set of samples is obtained by performing active sampling based on a classification model D| C trained on the VM. Since a new set of samples Z + is added to each iteration of the GVM and accumulates on the entire dataset, the performance of the controller C(x) trained by the CO and the classification model D| C (z) trained by the VM can gradually increase using more iterations.

IV. STABILITY OF DNN-BASED CONTROLLERS

A. STABILITY FOR NONAUTONOMOUS SYSTEMS REVISITED
The conventional definition of stability for nonautonomous systems is revisited to set up the definition of statistical stability of DNN-based controllers. A system is described as stable if starting the system somewhere near its desired operating point implies that it will stay around the point ever after [3]. In this study, the stability for nonautonomous systems is considered because the reference tracking controller $u = C(x, r)$ includes a predefined reference signal $r(t)$, which is a time-varying variable, i.e., an explicit function of time.
Definition 1 (Stability for Nonautonomous Systems): The equilibrium point 0 is stable at $t_0$ if for any $R > 0$, there exists a positive scalar $r(R, t_0)$ such that
$$\|x(t_0)\| < r \;\Rightarrow\; \|x(t)\| < R, \quad \forall t \ge t_0. \tag{4}$$
Otherwise, the equilibrium point 0 is unstable. The stability in Definition 1 is also known as stability in the sense of Lyapunov, and defines the uniform boundedness of the solution.
The solution is globally uniformly ultimately bounded if (5) holds for arbitrarily large a.
In general, the stability in the sense of Lyapunov is insufficient, and asymptotic stability is required to guarantee the performance of controllers.
Definition 3 (Asymptotic Stability for Nonautonomous Systems): The equilibrium point 0 is asymptotically stable at time $t_0$ if the system is stable and $\exists\, r(t_0) > 0$ such that
$$\|x(t_0)\| < r(t_0) \;\Rightarrow\; \|x(t)\| \to 0 \ \text{as} \ t \to \infty. \tag{6}$$
Note that to guarantee the asymptotic stability of the system, an infinite time interval needs to be considered. However, the verification model can only cover time-series data of finite length. The concept of statistical stability and binary stability conditions are proposed in this study for data-driven controllers and verification models in the following sections.

B. STATISTICAL STABILITY
The definition of statistical stability is presented in Definition 4.
Definition 4 (Statistical Stability): Consider a domain $F$, which is a set of finite-length trajectories $r(t)$, $t \in [t_0, t_f]$, and the initial state $x(t_0) \in U \subset X$, where $U$ is a bounded set in the state space $X$ of the autonomous control system in Eq. (1). The control system with a DNN-based controller $u(t) = C(x(t), r(t))$ is statistically stable in $F_p \subset F$ with respect to a binary stability condition function $\lambda(r(t), h(x(t), u(t)))$ if $\lambda(r(t), y(t)) = \text{true}$ $\forall r(t) \in F_p$ and $\forall x(t_0) \in U$.
The binary stability condition function $\lambda(r, y) : F \to \{\text{true}, \text{false}\}$ is a function that evaluates the binary stability between two different trajectories of the same time interval and returns a binary label. If $\lambda(r, y) = \text{true}$, then the two trajectories are binary stable. If $\lambda(r, y) = \text{false}$, then the two trajectories are not binary stable, i.e., binary unstable.
Binary stability conditions are proposed to consider the stability of nonautonomous systems based on time-series data of finite length. The binary stability conditions can be defined in various ways. One of the binary stability conditions can be defined from the stability in the sense of Lyapunov.

Definition 5 (Binary Lyapunov Stability Condition): The binary Lyapunov stability condition of two different trajectories a(t) and b(t) on the same finite time interval t ∈ [t 0 , t f ] is considered to be satisfied when the error between two trajectories, e(t) = a(t) − b(t), satisfies the following condition.
If the trajectories a and b are vectors, the binary stability condition is evaluated in an elementwise manner. Note that the binary Lyapunov stability condition is defined for a finite time interval. As stated in the previous section, the Lyapunov stability only guarantees the boundedness of the signal. The asymptotic stability condition is required for the performance of the tracking controller. However, asymptotic stability is only applicable for infinite time intervals, which is inapplicable for data-driven controllers that utilize time-series data of finite length. Another binary stability condition is defined to overcome this inapplicability. The definition of the binary integral stability condition is given in Definition 6.

Definition 6 (Binary Integral Stability Condition): The binary integral stability condition of two different trajectories a(t) and b(t) on the same finite time interval t ∈ [t 0 , t f ] is considered to be satisfied when the mean error e(a, b) between the two trajectories is below a constant threshold
The binary integral stability condition requires the integral error to be smaller than the given threshold, ε. The threshold is a design parameter that the designer can set to obtain a stability region of satisfactory tracking performance. That is, it is possible to obtain a stability region of the system that has a specific supremum value of the integral error. When the binary integral stability condition is invoked in defining the statistical stability, the mean error between the reference trajectory r(t) and the output trajectory y(t) is considered.
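The thresholding step described above can be sketched for sampled trajectories as follows. The sketch assumes that the mean error is the mean absolute difference on a uniform time grid, which is one plausible discretization; the exact error measure used by the authors may differ. The function names and the sample values are hypothetical.

```python
def mean_abs_error(a, b):
    """Discrete approximation of (1/(t_f - t_0)) * integral of |a(t) - b(t)| dt
    for two series sampled on the same uniform grid."""
    if len(a) != len(b):
        raise ValueError("trajectories must cover the same time interval")
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)

def binary_integral_stable(r, y, eps):
    """Binary label: True if the mean error is below the threshold eps."""
    return mean_abs_error(r, y) < eps

# Hypothetical reference and two hypothetical outputs.
r = [0.0, 0.5, 1.0, 0.5]
y_good = [0.0, 0.4, 0.9, 0.5]   # small tracking error
y_bad = [1.0, 1.5, 2.0, 1.5]    # constant offset of 1.0

label_good = binary_integral_stable(r, y_good, eps=0.1)
label_bad = binary_integral_stable(r, y_bad, eps=0.1)
```

A smaller `eps` shrinks the set of trajectories labeled as stable, which corresponds to demanding tighter tracking inside the resulting stability region.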
The term statistical stability implies that the stability is satisfied in a statistical manner. The binary integral stability condition in Definition 6 only guarantees the boundedness of the tracking error during a finite time interval. It seems insufficient, superficially. However, the statistical stability is sufficient to guarantee the boundedness during a much longer length of time for a data-driven controller with a split invariant loss function.

C. SPLIT INVARIANCE OF A LOSS FUNCTION
Consider a trajectory r(t), t ∈ [t 0 , t f ]. Splitting r(t) in N s time intervals gives short trajectories r i (t) as follows.
Then, the corresponding state trajectory x(t), t ∈ [t 0 , t f ] is also split as follows.
The split invariant loss function is defined as follows.
Definition 7 (Split Invariance): For the control system in Eqs. (1,2) using a DNN-based controller $u = C$ and the split sets of time series in Eqs. (9,10), a loss function $J(r(t), x(t))$ is split invariant if the following equation is satisfied.
$$J(r(t), x(t)) = \sum_{i=0}^{N_s - 1} J(r_i(t), x_i(t)) \tag{11}$$
Definition 7 represents the fact that if a loss function is split invariant, then the summation of the loss value evaluated on every split time interval is equivalent to the loss value evaluated on the concatenated time interval. Therefore, if a DNN-based controller $C$ is trained for a split invariant loss function, training with a reference trajectory $r(t)$, $t \in [t_0, t_f]$ and the initial state $x(t_0)$ is equivalent to training with a set of split time intervals $\{r_i(t) \mid i = 0, \cdots, N_s - 1\}$ and the set of initial states $\{x(t_0), \cdots, x(t_{N_s - 1})\}$.
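The additive property behind split invariance can be illustrated with a discrete loss that sums squared tracking errors over samples. This is a simplified sketch: the state series `x` is given directly rather than simulated from the initial state of each segment, the series length is assumed to be divisible by the number of splits, and all names and values are hypothetical.

```python
def loss(r, x):
    """Additive tracking loss: sum of squared errors over all samples."""
    return sum((ri - xi) ** 2 for ri, xi in zip(r, x))

def split(series, n_splits):
    """Split a series into n_splits consecutive segments of equal length."""
    seg = len(series) // n_splits
    return [series[i * seg:(i + 1) * seg] for i in range(n_splits)]

r = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
x = [0.0, 0.5, 1.5, 3.5, 2.5, 1.0, 0.5, -1.0]

whole = loss(r, x)
parts = sum(loss(ri, xi) for ri, xi in zip(split(r, 4), split(x, 4)))
```

Here `whole` and `parts` coincide because the loss is a plain sum over samples. A loss that averages over the segment length would not satisfy the same identity without rescaling.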

Definition 8 (Permutations of Multiset): For a domain $F$ that is a set of time-series data of finite length, the set of time-series data formed by concatenating a finite number of elements from $F$, allowing repetition, is defined as $P_M(F)$.
Note that $F \subset P_M(F)$ by definition. The statistical stability of a system on a given domain $F$ is expanded to $P_M(F)$ with the following theorem.
Proof: Consider a trajectory $r(t) \in P_M(F)$, $t \in [t_0, t_f]$ constructed by concatenating $N_s$ elements $r_i(t)$, $i \in [0, N_s - 1]$ from the domain $F$. If an arbitrary initial state $\zeta_0 \in U$ is chosen, then the resulting state trajectory of the system for the pair $(r(t), \zeta_0)$ is $x(t)$, $t \in [t_0, t_f]$, where $\zeta_0 = x(t_0)$. The statistical stability for a pair of trajectory and initial state $(r(t), \zeta_0)$ is satisfied if and only if the statistical stability is satisfied for all of the pairs $(r_i(t), \zeta_i = x(t_i))$. Assume that the system is statistically stable for a pair $(r_i(t), \zeta_i) \in (F, U)$. If the resulting state trajectory is $x(t)$, $t \in [t_i, t_{i+1}]$, then $\zeta_{i+1} = x(t_{i+1}) \in U$ because of the statistical stability. Then, the system is also statistically stable for the pair $(r_{i+1}(t), \zeta_{i+1})$. Meanwhile, the system is statistically stable for the trajectory $r_0(t) \in F$, $t \in [t_0, t_1]$ and the initial state $\zeta_0 = x(t_0) \in U$. By mathematical induction, the system is statistically stable for every pair $(r_i(t), \zeta_i = x(t_i))$ $\forall i \in [0, N_s - 1]$. Therefore, the system is statistically stable for domains $P_M(F)$ and $U$.
Theorem 1 suggests that a DNN-based controller can be effectively trained with a batch of short time-series data for the statistical stability of the control system, which is often operated over a longer timespan. Statistical stability and binary stability conditions should directly evaluate stability conditions for time-series data of a given length. However, the statistical stability of the control system can be secured for a time interval with an arbitrary length by Theorem 1. Theorem 1 extends the time interval length to make more practical use of the statistical stability and binary stability conditions defined in Definitions 4 to 6.

V. RESULTS
In this section, the components of the design framework proposed in Fig. 1 are realized one at a time and then interconnected. First, the DNN-based controller is designed and optimized for the system represented in Eqs. (1) and (2). Next, the generative model (GM) is realized. Then, the verification model (VM) is trained to obtain the stability region of the system in the latent space of the GM. An SVM is used to learn the decision boundary of the stability region. The training data for the VM are labeled using the binary integral stability condition from Eq. (8). Finally, the process is iterated as described in Fig. 1 to increase the performance of the DNN-based controller and the reliability of the stability region. After the process, the resulting optimal controller can be used over a much longer time interval than the length of the time-series data used in the controller optimization process, as described in Theorem 1. The longer time-series data are constructed as represented in Eqs. (9) and (10).

A. DEEP NEURAL NETWORK-BASED CONTROLLER DESIGN AND NUMERICAL SIMULATION
In this study, the proposed deep neural network controller is trained by applying it to linear systems, and the tracking performance for reference inputs is analyzed.

1) PLANT
The standard second-order system is considered as follows.
The state-space representation of the standard second-order system can be written as follows.
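As an illustration only, the plant can be sketched in code. The sketch assumes the textbook form of the standard second-order system, G(s) = ω_n^2 / (s^2 + 2 ζ ω_n s + ω_n^2), with state vector x = [y, dy/dt]. Here ζ denotes the damping ratio and is unrelated to the initial state ζ_0 used in Section IV. The numerical values of `omega_n` and `zeta` are hypothetical and are not taken from this study.

```python
def second_order_step(x, u, dt, omega_n=2.0, zeta=0.7):
    """One forward-Euler step of x' = A x + B u with x = [y, dy/dt].

    A = [[0, 1], [-omega_n^2, -2 zeta omega_n]], B = [0, omega_n^2]
    (textbook form; parameter values are illustrative assumptions).
    """
    y, v = x
    dy = v
    dv = -omega_n ** 2 * y - 2.0 * zeta * omega_n * v + omega_n ** 2 * u
    return [y + dt * dy, v + dt * dv]


def step_response(t_end=10.0, dt=0.001):
    """Output y(t_end) for a unit-step input from the zero initial state."""
    x = [0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        x = second_order_step(x, 1.0, dt)
    return x[0]
```

With unit DC gain, the step response settles at y = 1, which gives a quick sanity check of the state-space form before a controller is attached.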

2) DNN-BASED CONTROLLER
The objective of the controller is to make the output y(t) track the given reference signal r(t). Therefore, the state vector x(t) and reference signal r(t) are given as inputs for the controller.
The model structure of the DNN-based controller is shown in Fig. 4. The model includes fully connected layers with scaled exponential linear units (SELUs) and sigmoid activation functions.
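A forward pass through such a network can be sketched as follows. The layer sizes, the weight initialization, and the mapping of the sigmoid output to a symmetric command range [-u_max, u_max] are all assumptions made for illustration; the actual structure is the one shown in Fig. 4. Only the SELU constants are standard values.

```python
import math
import random

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(v):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (v if v > 0.0 else SELU_ALPHA * (math.exp(v) - 1.0))


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W @ inputs + b)."""
    return [activation(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (hypothetical initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def controller(params, x, r, u_max=1.0):
    """Map state x and reference r to a bounded command u (assumed scaling)."""
    (w1, b1), (w2, b2) = params
    hidden = dense(list(x) + [r], w1, b1, selu)
    s = dense(hidden, w2, b2, sigmoid)[0]
    return u_max * (2.0 * s - 1.0)
```

A usage example with two states and one reference input: `params = (init_layer(rng, 3, 8), init_layer(rng, 8, 1))` followed by `controller(params, [0.1, -0.2], 0.5)`. Because the output passes through a sigmoid, the command is bounded by construction in this sketch.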
In the conventional controller design methodology, it is necessary to mathematically define the input/output relationship of the system and then design a control command that can guarantee mathematical stability and performance. Therefore, conventional controller design methodologies can only be used to design controllers by clearly deriving mathematical relationships between specific variables and system outputs. In contrast, DNN-based controllers can select whatever information is available as input to deep neural networks, and it is not necessary to derive mathematical relational expressions directly for control commands.
Note that the relationship between the system output y and the state vector x is not given to the DNN-based controller. The controller is trained only to generate the control command u that minimizes the loss value, which is a function of the reference signal r and state vector x. The loss function considered in this study is given in Eq. (15).
Note that Eq. (15) is split invariant. Additionally, Eq. (15) can be directly used for the binary integral stability condition in Eq. (8). Because the objective is to train a tracking controller, the loss function in Eq. (15) minimizes the error between the reference signal r(t) and the output y(t). Here, the system output is included in the loss function. However, the system output is not provided as an input for the DNN-based controller. That is, it can be stated that the DNN-based controller can achieve the control objective with limited information.

3) TRAINING DATASET
In this study, the DNN-based controller with the structure shown in Fig. 4 is trained. The training data are a set of fixed-length sinusoidal reference signals with random magnitudes and initial phases. The parameters for training are summarized in Table 1. Figure 5 shows some example signals from the training dataset. The different colors in the figure represent different training data. The number of data points in the training dataset is 1024.
Figure 6 shows the simulation results for performance verification of the trained controller C(x). Several results for randomly generated sinusoidal reference signals are displayed at once. The different colors in the figure represent different example cases, and the data with the same color represent the same case. The training data are time-series data with a time interval of 0.1 sec and a length of 1 sec. In contrast, the verification data are time-series data with a length of 20 sec and the same time interval. The verification reference signals are sinusoidal time-series data with random magnitudes, wavelengths, and initial phases. The training data contain only signals with a fixed wavelength of 10 sec, whereas the verification data contain signals with various wavelengths. The simulation results show that the DNN-based controller tracks the given reference signal with a small error.
This section has shown that a tracking controller with good performance can be trained using the proposed DNN-based controller structure. However, the tracking performance cannot be guaranteed for reference signals not included in the training dataset. To overcome this limitation of DNN-based controllers, an approach of performance region analysis using the GVM is proposed in this study.
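The construction of such a training set can be sketched as follows. The time interval (0.1 sec), signal length (1 sec), fixed wavelength (10 sec), and dataset size (1024) follow the description above, with the dataset size interpreted here as the number of signals. The magnitude range is a hypothetical placeholder, since the values of Table 1 are not reproduced in this sketch.

```python
import math
import random


def make_training_set(n_signals=1024, dt=0.1, length=1.0,
                      wavelength=10.0, max_magnitude=1.0, seed=0):
    """Fixed-length sinusoids with random magnitude and initial phase.

    max_magnitude is an assumed value, not a parameter from Table 1.
    """
    rng = random.Random(seed)
    n_steps = int(round(length / dt))
    dataset = []
    for _ in range(n_signals):
        magnitude = rng.uniform(0.0, max_magnitude)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        dataset.append([
            magnitude * math.sin(2.0 * math.pi * k * dt / wavelength + phase)
            for k in range(n_steps)
        ])
    return dataset
```

Each 1 sec signal covers only a tenth of the 10 sec wavelength, so a single training sample is a short arc of a sinusoid rather than a full period.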

B. REALIZATION OF GENERATIVE MODEL
In this study, the VAE is used for representation learning of time-series data and for generating arbitrary time-series data. The dimension of the latent space, $n_l$, is set to three because the sinusoidal time-series data used to train the VAE are generated according to three characteristics: magnitude, wavelength, and phase. In general, however, the VAE may be trained using other types of time-series data, which may have more complicated features depending on the application of the control system. Moreover, because it is difficult to visualize the performance region classified using the SVM when the latent space has four or more dimensions ($n_l \geq 4$), $n_l = 3$ is used in this study. If visualization is not required, it is generally desirable to set $n_l$ to a larger value to enhance the performance of representation learning.
The latent space of the trained VAE is shown in Fig. 7, and each feature vector is colored according to the wavelength and amplitude of the time-series data used for training. In this study, $2^{15}$ time-series data with random magnitudes, wavelengths, and initial phases are used to train the GM. The magnitude is normalized to have values in the range of [−1, 1]. Note that the feature vectors are classified according to the wavelength and amplitude. Figure 8 compares the time-series data given to the encoder as input with the time-series data reconstructed at the decoder output to check the performance of the trained VAE. The trained VAE successfully reconstructs the input data. The decoder part of the trained VAE is used as the GM. The GM can generate random time-series data in the shape of a sinusoidal wave with various magnitudes, wavelengths, and initial phases. In Section V-A3, the training data for the CO included sinusoidal data with a fixed wavelength of 10 sec. In contrast, in this section, the time-series data generated from the GM are used to construct the training data for the CO.
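Two generic ingredients of a VAE with a 3-dimensional latent space can be sketched without committing to a network architecture, which is not specified here. The sketch shows the reparameterization step z = μ + σ ε, the closed-form KL divergence between the encoder distribution N(μ, σ²) and the standard normal prior, and sampling a latent vector from the prior, which is what a decoder used as a generative model would receive. The function names are hypothetical.

```python
import math
import random

LATENT_DIM = 3  # n_l = 3, as in this study


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def sample_prior(rng):
    """Latent vector drawn from the prior, to be passed to a decoder."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
```

The KL term is what pulls the encoded feature vectors toward the prior, so that vectors drawn with `sample_prior` fall in the region the decoder has learned to reconstruct.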

C. PERFORMANCE REGION ANALYSIS USING VERIFICATION MODEL
The performance region analysis using the GVM should be conducted to guarantee the stability and performance of the DNN-based controller. The DNN-based controller is repeatedly trained with increasing data, and its performance is improved using the GVM. The performance region in which the DNN-based controllers' performance and stability are guaranteed is also derived.

1) BINARY VERIFICATION USING SUPPORT VECTOR MACHINE
The VM is designed using the SVM in the latent space of the VAE. In Fig. 1, the inputs for the VM are the set of feature vectors distributed in the latent space and the corresponding set of labels. The labels are obtained by evaluating the simulation results. In this study, the label is determined by thresholding the loss value from Eq. (15). When the threshold value is $J_T$, the label $\lambda_i$ for feature vector $z_i$ is determined as follows.
where $r_i$ is the reference signal decoded from $z_i$, and $y_i|_C$ is the tracking simulation output on $r_i$ using controller C. The trained SVM L is directly used to analyze the performance region of C. Note that Eq. (16) with Eq. (15) is interpreted as a binary stability condition function from Definition 4 using the binary integral stability condition in Definition 6.
The VM is a machine learning model trained on the latent space. A true performance region $F_p^*$ that satisfies the statistical stability condition from Definition 4 exists in the latent space F. However, the true performance region cannot be obtained analytically. Therefore, the VM is proposed in this study and is used to obtain an approximated performance region $F_p$ in place of the true performance region.
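The thresholding step can be sketched as below. Both the loss and the label values are assumptions: `tracking_loss` is a hypothetical stand-in for Eq. (15) (a discrete integral of the squared tracking error), and the labels +1 and -1 are an assumed encoding of membership in F_p and F_f, chosen because it is a common convention for SVM training.

```python
def tracking_loss(reference, output, dt):
    """Hypothetical stand-in for Eq. (15): discrete integral of squared error."""
    return sum((r - y) ** 2 for r, y in zip(reference, output)) * dt


def label(reference, output, dt, j_threshold):
    """Assumed convention: +1 if the loss is at most J_T, otherwise -1."""
    return 1 if tracking_loss(reference, output, dt) <= j_threshold else -1
```

For example, with `reference = [0.0, 1.0, 0.0, -1.0]`, `dt = 0.1`, and `j_threshold = 0.05`, an output of `[0.0, 0.9, 0.0, -0.9]` is labeled +1, while an all-zero output is labeled -1.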

2) ACTIVE SAMPLING
The VM consists of the SVM and an active sampling process. After training the SVM L using the given dataset $(Z, H)$, a new set of feature vectors $Z^+$ is actively sampled. Fig. 1 shows that the output of the VM is $Z^+$. The newly sampled set $Z^+$ is concatenated to the previous set $Z$ to construct a new set with more elements. Then, $H$ is updated by the GM and CO, and the SVM is trained again. This process is iterated to retrain the controller C and the SVM L.
The active sampling algorithm is a technique for extracting the samples that are expected to change the current model L the most. With active sampling, informative sample points can be extracted more efficiently than with random sampling. Random sampling is conducted only at the very first iteration of the GVM because the active sampling algorithm requires a trained model L. Refer to [27], [28], [29], and [30] for detailed explanations of the active sampling algorithm and its applications to nonlinear system verification.
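A simple way to illustrate the idea is margin-based (uncertainty) sampling, which selects the candidates closest to the current decision boundary. This is a simpler stand-in for the model-change criterion described above, not the algorithm of [27]-[30]. The `decision_function` below is a hypothetical placeholder for the trained SVM L, positive inside a unit ball in the 3-dimensional latent space.

```python
import math
import random


def decision_function(z):
    """Hypothetical stand-in for the trained SVM L (positive inside unit ball)."""
    return 1.0 - math.sqrt(sum(c * c for c in z))


def random_candidates(n, rng, dim=3, span=2.0):
    """Uniformly distributed candidate feature vectors in the latent space."""
    return [[rng.uniform(-span, span) for _ in range(dim)] for _ in range(n)]


def select_near_boundary(candidates, n_select):
    """Pick the candidates with the smallest |decision value| (margin sampling)."""
    ranked = sorted(candidates, key=lambda z: abs(decision_function(z)))
    return ranked[:n_select]
```

Points near the boundary are the ones whose labels the current model is least sure about, so labeling them tends to move the boundary more than labeling points deep inside either region.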

3) PERFORMANCE REGION
The performance region of the deep neural network controller is derived using the GVM in the latent space for reference signals. The parameters for the GVM are summarized in Table 2. The generated time-series data are multiplied by the magnitude scaling coefficient so that reference signals with magnitudes larger than 1 are also considered. The loss function from Eq. (15) is used. The time-series data used for training the controller in Section V-A are sinusoidal signals with the features summarized in Table 1. In contrast, the time-series data used for training the controller in this section are randomly generated using the GM. Therefore, the reference signals may not be perfectly sinusoidal, and the ranges of wavelength and magnitude differ.
Figure 9 shows the convergence to the correct performance region of the controller as the number of iterations increases. The sample points used to train L are shown in the figure. The label for training the SVM is determined by Eq. (16). The decision boundary of the SVM is visualized in 2D planes that are orthogonal to each axis. In the figure, data points classified as belonging to $F_p$ are indicated by red circles, and data points classified as belonging to $F_f$ are indicated by blue circles. The decision boundary of the SVM that divides the stability region (performance region) $F_p$ from the instability region $F_f$ is indicated by a dotted line. The threshold $J_T$, which determines whether each sample is successful or unsuccessful, is set to a value that ensures that only 10% of the total samples are classified as successful in the first iteration.
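The choice of the threshold can be sketched as an empirical quantile of the first-iteration loss values. The function name and the handling of ties are assumptions; the 10% fraction follows the description above.

```python
def choose_threshold(losses, success_fraction=0.1):
    """Return J_T such that about success_fraction of losses satisfy loss <= J_T."""
    ordered = sorted(losses)
    n_success = max(1, int(round(len(ordered) * success_fraction)))
    return ordered[n_success - 1]
```

With 100 distinct loss values, the returned threshold is the 10th smallest loss, so exactly 10 samples are labeled successful. If several samples share the threshold value, slightly more than 10% can pass.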
An increasing number of samples are used in each iteration. In each iteration, the samples newly added by active sampling are extracted near the performance boundary of L from the previous iteration. Since the controller C is retrained in each iteration, the label from Eq. (16) for a particular sample may vary from one iteration to another. Figure 10 shows the feature vectors distributed in the latent space, colored according to the logarithm of the loss value of Eq. (15). Comparing this result with the result in Fig. 7 shows that the stability region of the controller is determined by the characteristics of the reference signal. In other words, the SVM is trained to classify the feature vectors of small magnitude and long wavelength as belonging to $F_p$ and the others as belonging to $F_f$.

D. SIMULATION CASE STUDY
In this section, numerical simulations are conducted to investigate whether the control stability region of the optimal controller trained using the proposed framework has been properly derived. In general, it is difficult to derive the control stability region analytically with linear control theory when nonlinearities such as controller saturation are present. In such cases, the stability region is often derived through Monte Carlo simulations. Since the control system considered in this study is a linear second-order system, it is possible to design a globally optimal controller using a linear controller. However, it becomes difficult to obtain the boundary of the stability region of the globally optimal controller because the entire vector space becomes the stability region. Therefore, controller saturation is considered in this study so that the boundary of the stability region can be obtained.
In this study, sinusoidal time-series data are used as a reference signal, so it is possible to check how the data are classified in the latent space, as shown in Fig. 7. In the figure, the magnitude of the signal shows a distribution proportional to the value of $z_2$. Figure 11 shows the simulation results for various magnitudes. The tracking error is small for the reference signals of relatively small magnitudes, and the error increases as the magnitude increases.
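The effect of input saturation on tracking can be reproduced qualitatively with a much simpler loop than the one studied here. The sketch below replaces the DNN-based controller with a hypothetical saturated proportional controller and uses illustrative plant parameters; only the mechanism, not the numbers, carries over. The error is normalized by the reference magnitude so that an unsaturated linear loop would give the same value for every magnitude.

```python
import math


def normalized_tracking_error(magnitude, wavelength=10.0, u_max=1.0,
                              kp=10.0, omega_n=2.0, zeta=0.7,
                              t_end=20.0, dt=0.001):
    """RMS tracking error divided by the reference magnitude.

    Plant: textbook second-order system; controller: saturated proportional
    feedback (both are illustrative assumptions, not the setup of this study).
    """
    y, v = 0.0, 0.0
    sq_err = 0.0
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        r = magnitude * math.sin(2.0 * math.pi * k * dt / wavelength)
        e = r - y
        sq_err += e * e
        u = max(-u_max, min(u_max, kp * e))  # controller saturation
        dv = -omega_n ** 2 * y - 2.0 * zeta * omega_n * v + omega_n ** 2 * u
        y, v = y + dt * v, v + dt * dv       # forward-Euler update
    return math.sqrt(sq_err / n_steps) / magnitude
```

Comparing `normalized_tracking_error(0.5)` with `normalized_tracking_error(3.0)` shows the relative error growing once the required command exceeds `u_max`, since the plant output can no longer reach the peaks of the reference.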
In Fig. 7, the initial phase changes as the location of the sample point rotates along the circumference of a circle centered on the origin of the $(z_0, z_1)$ plane. Figure 12 shows the simulation results for various initial phases. As expected, the magnitudes of the tracking error are similar for given reference signals of various initial phases.
In Fig. 7, the wavelength depends on the distance from the origin. Figure 13 shows the simulation result for various wavelengths. As the wavelength of the reference signal shortens, the tracking error increases because of the control input saturation.
When the distribution in the latent space according to the physical characteristics of the signal is known, it may be possible to predict or analyze the stability region based on this information. The stability region shown in Fig. 9 covers areas with small magnitudes and long wavelengths and does not seem to be affected by the phase. In general, however, it may not be known in advance how the signals learned by the GM are classified according to their physical characteristics. Even in that case, by analyzing the stability region obtained through the GVM, the reference signal characteristics that make a signal easier for the controller to follow can be identified. Such characteristics may vary greatly depending on the control system.
From Fig. 7, it can be seen that the reference signals are classified in the feature vector space according to the characteristics of the time series. Figs. 11-13 show that the reference signal extracted from the latent space can be followed using a DNN-based controller and that its tracking performance depends on the physical characteristics of each reference signal. Figure 9 shows that the proposed GVM framework further enhances the performance of the DNN-based controller while simultaneously deriving the performance domain of the controller. Consequently, the GVM framework can be used as an effective tool to analyze the performance of DNN-based controllers.

VI. CONCLUSION
In this study, a novel framework for designing a controller with a DNN structure was developed, and the stability and performance of the designed DNN-based controller were analyzed using statistical stability. The controller can be designed as a DNN of arbitrary structure. To derive a latent space for analyzing stability and performance, representation learning of time-series data was performed using a VAE. The stability region of the controller can be derived by training a classification model in this latent space. The proposed GVM structure uses more data in every iteration to train the controller and the classification model, improving the controller's performance and deriving accurate performance regions. The proposed controller verification technique differs from existing mathematical methods and has high versatility, as it places no constraints on the type of controller. Since this study focused on proposing a novel controller design framework, only state variables and reference values were used as inputs for conciseness, as in conventional controller design frameworks. As a further study, a DNN-based controller that uses more types of input data than state and reference signals can be implemented with the techniques proposed in this study to evaluate the utility of the DNN-based controller and the proposed framework.