A New Compensator Design for Optimal Static Output Feedback Control Across a Communication Channel Subject to Random Packet Dropouts

This paper studies compensation for networked control across an unreliable communication channel subject to random packet dropouts. By posing the problem as an optimization over two decision variables, the proposed design unifies control and compensation for static output feedback LQR. New governing equations are derived for the controller and compensator gains that satisfy the optimality conditions concurrently. A convergent algorithm that solves these equations for the optimal gains is also presented. Finally, to validate the design and verify its effectiveness, a numerical example is simulated and its performance is compared against that of three existing schemes. The simulation results show that the proposed design attains the lowest cost among the four methods.


I. INTRODUCTION
The age of the Internet of Things (IoT) has witnessed steady growth in control systems and applications realized through communication channels. In the past few decades, networked control systems (NCS) have attracted increasing attention and study effort. Among these topics, the Linear Quadratic Regulator (LQR) and LQ-related filtering across communication networks [1], [3] are active research subjects. The performance decline or even destabilization of NCS resulting from networking phenomena such as delays [1], [4], [5], data corruption [6], [7], packet loss [2], [4], and cyber-attacks [8], [10] may lead to serious consequences and has hence been extensively investigated in both industry and academia.
The associate editor coordinating the review of this manuscript and approving it for publication was Liang Hu.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
For example, based on the Tobit measurement model for censored measurements and adopting the Poisson distribution model for packet delay, Geng et al. [1] investigated the distributed federated Tobit Kalman filter fusion problem for NCS subject to measurement censoring and packet delays and proposed a two-step filtering fusion approach. The local estimator carries out a modified Tobit Kalman filtering scheme in the first step, and the fusion center runs a distributed federated modified Tobit Kalman filtering algorithm in the second step that follows the federated Kalman fusion rule. Under the redundant channel transmission protocol, Geng et al. [2] investigated a Tobit Kalman filtering problem. To account for the complexities introduced by measurement noises, transmission failures, and the redundant channels, the Tobit regression model was modified, and an optimal Tobit Kalman filter was proposed on that basis. Allik et al. [3] presented a Tobit Kalman filter that provides estimates of the state and state error covariance even when the measurements are highly censored. Based on the Markovian packet dropout model and the maximum principle, Li et al. [4] considered full state feedback for NCS subject to packet loss and input delay, and presented an optimal control design using forward and backward stochastic difference equations. Under the framework of Integral Quadratic Constraints (IQC), Yuan and Yu [5] considered both measurement delay and actuation delay in NCS and proposed a delay-scheduled impulsive controller whose synthesis conditions were established via a number of linear matrix inequalities through the specification of a piecewise linear storage function. Gao et al. [6] proposed a communication-reduced, cyber-resilient, and information-preserved method to recover information from quantized measurement data when some portion of the measurement data is corrupted. For sensor network applications, Xie et al. [7] employed a principal component analysis technique to identify data corruption and proposed a matrix completion scheme to recover corrupted and successively lost data with a high recovery rate. Anubi and Konstantinou [8] considered the resiliency problem of the state estimation of a cyber-physical system under cyber-attack. By combining a data-driven model with traditional compressive sensing regression, they showed that the solution of the optimization problem could recover the system's actual states. For risk assessment, Milošević et al. [9] proposed a framework to estimate the impact of a range of cyber-attack strategies in stochastic linear NCS and presented two impact metrics that can be used for stochastic systems. Liu et al. [10] investigated control design of NCS under sporadic cyber-attacks. A hybrid-triggering communication scheme was presented to save the limited communication resources, and a controller was designed to ensure closed-loop stability.
This paper studies the compensation problem of discrete-time optimal control across a channel subject to sporadic signal dropout. Specifically, the NCS under study is closed by an unreliable communication network, and compensation for the lost signal is intended; see Figure 1. The reader is referred to De Persis and Tesi [11] for a detailed comparison of dropout models. Sinopoli et al. [12] studied Kalman filtering for discrete-time linear systems with lossy intermittent observations. Two sets of evolution equations were presented for the time update and the measurement update, respectively. Schenato et al. [13] extended the work to include optimal controls. Similar problems were studied by Imer et al. [14].
Two well-known compensation schemes, zero-input and hold-input, have been and still are widely used. Under the former, a zero signal is applied when a packet is lost; under the latter, the most recently received signal is applied directly without any adjustment.
Based on the above classification, [13] and [14] fall into the zero-input category. Bae et al. [15] investigated the compensation problem of packet loss for a rehabilitation system that used a modified LQG controller with a disturbance observer employing zero-input compensation. Shi et al. [16] studied networked control problems subject to packet loss and used the latest signal directly, which belongs to the hold-input type. Yu et al. [17] adopted a switched system approach to tackle the stabilization problem of networked control, again using the latest control signals directly, i.e., the hold-input type. Moayedi et al. [18], [19] investigated networked LQG control across unreliable channels and proposed a generalized hold-input strategy in which the zero-input and the hold-input schemes are fused. The latest control therein is used for compensation but is scaled by a parameter whose value falls in the range [0, 1]. Zhang and Yu [20] approached the problem of exponential stabilization of networked systems subject to guaranteed cost and bounded packet losses. For compensation, the latest signals were used without any modification (again the hold-input type), with dynamic output feedback controllers rather than static ones.
There exist other performance measures. For example, the comparison between the two popular compensators, zero-input and hold-input, made by Guo et al. [21] is from the perspective of H∞ control. See also Yang and Han [22], which likewise takes an H∞ control approach.
From an implementation standpoint, static controllers and compensators are far less complicated and much less expensive to implement than their time-varying counterparts. This issue is critical in practice for real-time applications. As such, this paper focuses only on the class of static controllers and compensators. While the structural simplicity and the comparatively light computation and execution burden of the zero-input and hold-input compensation schemes are appealing, Schenato [23] demonstrated that, interestingly, neither of these two most popular compensation schemes can be claimed superior to the other. Guo et al. [21] and Gao et al. [24] came to the same conclusion. The reader is referred to [21], [23], and [24] for rigorous and in-depth analytical comparisons. Still, the mystery of which of these two widely used compensation strategies is superior remains to be unraveled.
Instead of using output feedback control directly, the recent account of Yu and Fu [25] considered a compensation strategy similar to that of this work but used state feedback under the LQG setting. Namely, it relies on an observer, which is dynamic, to provide an estimated full state for control purposes. For linear quadratic but non-Gaussian (LQnG) optimal control, the reader is referred to Battilotti et al. [26], wherein the unreliable network is represented by a Gilbert-Elliott channel. In particular, the packet dropout is modeled by a two-state Markov chain with a known transition probability matrix. Their solution is obtained by substituting the Kalman predictor of the LQG control law with an optimal predictor. Another interesting strategy, proposed by Maass et al. [27], is to construct an output estimator for the lost output; as such, the method belongs to the dynamic class of compensators. The fading channels in Su and Chesi [28] were modeled as multiplicative white noise processes. A necessary and sufficient condition for the existence of their controllers was obtained by solving a convex optimization problem in the form of a semi-definite program. By a three-step procedure, they designed a static output feedback controller directly to control systems over fading channels such that the closed loop is stable in the mean square sense. Stabilization there is purely controller based, and no compensator for the lost signal was employed in their approach.
It is well known, however, that output feedback stabilization/optimal control is much more difficult than its state feedback counterpart [29], which also motivates this study. The current work can therefore be viewed, in certain respects, as an attempt to extend the classical output feedback LQR theory to NCS. Under the output feedback LQR setting, a new design will be presented to solve the compensation problem involving an unreliable communication network. Recall that an observer is itself another system that introduces extra dynamics, hence increasing the overall system's complexity, computation burden, and implementation cost, which is comparatively disadvantageous, especially for large-scale systems. Issues of communication delay and reliability may also arise when the connection of the observer with the remaining subsystems is taken into account for real networked control systems. A direct output feedback controller does not have these drawbacks.
The contribution of this paper is twofold. First, the work presents a two decision-variable approach to the compensation problem, which has not been treated in the literature for optimal output feedback control, and provides a rigorous solution that does not utilize an observer. Second, unlike many existing results that design the controller and compensator separately, this paper defines a unified performance measure for the NCS, integrating them into a single framework and unraveling the aforementioned mystery, again under the direct output feedback control setting. In a later section, a new set of design equations will be derived and a convergent algorithm to solve them will be provided as well.
The rest of the paper is organized as follows. The system considered in this paper is given in Section II, where the mathematical model, basic assumptions, and objectives are presented. Also discussed in this section are the two commonly adopted compensation strategies, namely, the zero-input and the hold-input. In Section III, an integrated approach to tackle the compensation problem is proposed. An optimized design will be presented in Section IV, including the derivation of a new set of gain equations for the controller and the compensator satisfying optimality conditions concurrently. Section V is devoted to the development of a convergent algorithm to obtain the optimal gains. Section VI specializes to the generalized hold-input compensation policy. A numerical example is provided in Section VII to validate the new approach and compare performances of four different schemes. A conclusion is made in Section VIII.
Throughout the paper, x stands for the state of the dynamical system, u is the control input, y refers to the system's output, L stands for the output feedback gain, N refers to the compensator gain, γ_k stands for an independent and identically distributed binary Bernoulli random variable, and E[·] stands for the expected/mean value. The Kronecker product is denoted by ⊗, Tr refers to the trace of a square matrix, vec stands for matrix vectorization, an eigenvalue is denoted by λ, and the matrix norm is represented by ||·||. Subscripts k and j represent the time instants for a dynamical system. Superscript "+" stands for the Moore-Penrose inverse.

II. THE SYSTEM MODEL AND CONTROL OBJECTIVES
Figure 1 gives the schematic of the system configuration for the NCS under study. Specifically, the dynamics of the linear, discrete, time-invariant system studied in this paper are mathematically modeled as follows, where x_k is the state, u_k is the control, y_k is the output, and L is a stabilizing output feedback gain to be designed. The superscript "c" on u_k indicates that the signal is generated by the controller. It will become clear shortly from the context why the distinction between the two kinds of control signal is important and hence necessary. The above control law does not have any compensation mechanism and is often referred to as the zero-input policy [23] in the literature. Another popular and widely used scheme, often termed the hold-input compensation policy [23], assumes the following form. One may also notice that these two well-known compensation schemes lack a rationale for how and why they may or may not work, as they shed no light on the degree of success or failure of the compensation. The existence of such a gap is not surprising, as their gains are predetermined with no connection to the minimization of the LQ performance index.
The expected value of the binary Bernoulli random variable depicting the packet dropout phenomenon [12], [13], [23] can be expressed as follows, where γ stands for the dropout rate/probability. Since a random variable is involved, the results presented in this work should be interpreted in the expectation/average sense.
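To make the two baseline policies concrete, the following Python sketch simulates one realization of the loop. It is an illustration under stated assumptions rather than the model of this paper: the plant form x_{k+1} = A x_k + B u_k, y_k = C x_k, the sign convention u^c_k = L y_k, the weights Q = I and R = I, and the function name `simulate_baseline` are all assumed for the example.

```python
import numpy as np

def simulate_baseline(A, B, C, L, policy, drop_rate, x0, steps=50, seed=0):
    """Run one realization and return the accumulated cost with Q = I and R = I."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    u_prev = np.zeros(B.shape[1])
    cost = 0.0
    for _ in range(steps):
        u_c = L @ (C @ x)                      # controller signal (assumed sign convention)
        received = rng.random() >= drop_rate   # packet arrives with probability 1 - drop_rate
        if received:
            u = u_c
        elif policy == "zero":
            u = np.zeros_like(u_prev)          # zero-input: apply nothing on a dropout
        elif policy == "hold":
            u = u_prev                         # hold-input: reuse the most recent input
        else:
            raise ValueError(f"unknown policy: {policy}")
        cost += float(x @ x + u @ u)           # stage cost x'Qx + u'Ru with Q = R = I
        x = A @ x + B @ u
        u_prev = u
    return cost
```

Calling the function with the same seed for both policies exposes them to the same dropout sequence, which makes a single-realization comparison meaningful.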

III. A UNIFIED DESIGN FRAMEWORK
Unlike many existing methods that design the controller and the compensator separately, the new control law employs the following structure unifying both designs, where N stands for the compensator gain to be determined. The introduction of the matrix gain N for compensation purposes adds some computation burden, but fortunately not a heavy one. Its execution time is assumed to be within a sampling period. Note that N u_{k−1} is a substitute for the lost control signal of dimension m. Entry-wise, computing N u_{k−1} involves two types of arithmetic operations per entry: (i) m multiplications and (ii) m−1 additions.
The proposed compensator is static rather than dynamic, and the matrix N is constant. The gain is therefore computed off-line only once; its numeric value is then stored in the actuator's buffer and used whenever a packet dropout occurs.
The justification for employing such an optimal compensator lies in the benefits it brings, which usually outweigh the computation burden it poses. For the LQR problem, the cost is the only performance index that determines how good a design is. Conceivably, there exist many examples whose cost values are excessively high compared to the optimal one when suboptimal static compensators, such as the zero-input and hold-input, are used. Furthermore, there should also exist cases where the suboptimal compensators cease to function and fail to yield a finite cost value, which is not acceptable, especially for critical networked control applications.
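The actuator-side logic and the operation count discussed above can be sketched as follows. The function names are hypothetical, and the switching rule (apply the received controller signal, otherwise apply N u_{k−1}) is a reading of the statement that N u_{k−1} substitutes for the lost control signal; it is not copied from the control law of this paper.

```python
import numpy as np

def actuator_input(received, u_c, u_prev, N):
    """Input applied to the plant: the received signal, or N @ u_prev on a dropout."""
    return u_c if received else N @ u_prev

def compensation_op_count(m):
    """Scalar operations needed to form N @ u_prev for an m-dimensional input."""
    per_entry = {"mul": m, "add": m - 1}          # one row of N times u_prev
    total = {"mul": m * m, "add": m * (m - 1)}    # all m entries of the product
    return per_entry, total
```

Because N is constant, the only run-time work on a dropout is this single matrix-vector product.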
To begin, define a new augmented state [23]. The dynamics of the augmented system can then be written in terms of this state. The dynamic programming technique [12], [13], [23] will be employed. First, define the cost-to-go, where subscript k and superscript f stand for the current and terminal time instants, respectively. The weighting matrices are chosen as Q_j = Q and R_j = R, except R_f = 0. Combination of (7)-(9) and (11) leads to the following expression. To proceed, suppose there exists a symmetric positive semi-definite matrix Y_k such that the cost-to-go can be expressed in a quadratic form [13], [23], [29]. Define, for compact notation, the closed-loop matrix. Substitution of (14) into (9) yields the following expressions. Equation (18) should be solved backward in time with the following terminal/final condition.
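The backward-in-time recursion for the cost-to-go matrix can be illustrated with the Python sketch below. It is a derivation under assumptions, not a transcription of equations (7)-(18): the augmented state is taken as z_k = [x_k; u_{k−1}], the controller signal as L C x_k, the compensating signal as N u_{k−1}, and `drop_rate` as the probability that a packet is lost. The function names are hypothetical.

```python
import numpy as np

def mode_matrices(A, B, C, L, N, Q, R):
    """Augmented dynamics and stage-cost weights for the 'received' and 'dropped' modes."""
    n, m = B.shape
    LC = L @ C
    Znm, Zmn, Zmm = np.zeros((n, m)), np.zeros((m, n)), np.zeros((m, m))
    A_rx = np.block([[A + B @ LC, Znm], [LC, Zmm]])          # u_k = L C x_k
    A_drop = np.block([[A, B @ N], [Zmn, N]])                # u_k = N u_{k-1}
    Q_rx = np.block([[Q + LC.T @ R @ LC, Znm], [Zmn, Zmm]])
    Q_drop = np.block([[Q, Znm], [Zmn, N.T @ R @ N]])
    return A_rx, A_drop, Q_rx, Q_drop

def cost_to_go(A, B, C, L, N, Q, R, drop_rate, horizon):
    """Iterate the expected cost-to-go matrix backward from the terminal time."""
    n, m = B.shape
    A_rx, A_drop, Q_rx, Q_drop = mode_matrices(A, B, C, L, N, Q, R)
    # Terminal condition: only the state is penalized, since R_f = 0.
    Y = np.block([[Q, np.zeros((n, m))], [np.zeros((m, n)), np.zeros((m, m))]])
    p = 1.0 - drop_rate
    for _ in range(horizon):
        Y = (p * (Q_rx + A_rx.T @ Y @ A_rx)
             + drop_rate * (Q_drop + A_drop.T @ Y @ A_drop))
    return Y
```

Under these assumptions the expected cost from an initial augmented state z_0 is z_0^T Y z_0, so a large horizon approximates the steady-state matrix whenever the recursion converges.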

IV. THE OPTIMAL CONTROLLER AND COMPENSATOR
The current research is limited to the static class of compensation schemes. Furthermore, it is typically assumed that the initial state is uniformly distributed on the surface of a unit sphere [29], so that its autocorrelation satisfies the following condition. The subscript and superscript of the cost-to-go (denoting the starting and terminal time instants, respectively) are dropped for the infinite-horizon case, and the expected total cost can be rewritten accordingly. Conceivably, there exists a critical packet dropout rate beyond which J becomes infinite. As a general rule [23], Y_k becomes larger (in the sense of its norm) when the packet dropout rate increases, and as a result the cost increases as well. Under mild conditions it is assumed that the expected total cost is finite and Y_k exists [12], [13], [23]. Throughout the paper, it is assumed that the system under consideration is stabilizable under the given packet dropout rate, which falls below the critical rate. In other words, finding the critical dropout rate above which stabilization cannot be achieved is outside the scope of the current study.
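Although locating the critical rate is outside the scope of the paper, the notion can be illustrated with a standard mean-square stability test for a linear system that switches between two modes according to an i.i.d. Bernoulli variable. The sketch below is an aside under assumptions: `A_rx` and `A_drop` denote hypothetical augmented closed-loop matrices for the received and dropped modes, and the grid scan is a crude illustration, not a method from this paper.

```python
import numpy as np

def second_moment_operator(A_rx, A_drop, drop_rate):
    """Matrix that propagates vec(E[z z^T]) by one step under i.i.d. dropouts."""
    return ((1.0 - drop_rate) * np.kron(A_rx, A_rx)
            + drop_rate * np.kron(A_drop, A_drop))

def is_mean_square_stable(A_rx, A_drop, drop_rate):
    """True when the spectral radius of the second-moment operator is below one."""
    M = second_moment_operator(A_rx, A_drop, drop_rate)
    return max(abs(np.linalg.eigvals(M))) < 1.0

def first_unstable_rate(A_rx, A_drop, grid):
    """Scan an increasing grid and return the first rate that fails the test, if any."""
    for rate in grid:
        if not is_mean_square_stable(A_rx, A_drop, rate):
            return float(rate)
    return None
```

For fixed gains the first failing grid point gives a rough estimate of the rate beyond which the second moment of the state, and hence the quadratic cost, no longer stays bounded.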
The steady state of the Y_k equation can be written as follows. Suppose matrix Y is block-partitioned accordingly. Combination of (22) and (23) yields the following identities, where γ stands for the packet dropout rate (probability). Note that an equivalent expression to (21), according to (10) and (14), can be written [29] as follows. To meet the optimality conditions, the following first-order stationarity equations regarding the gains must hold. Following (24)-(29), one can obtain the optimal output feedback gain L and the compensator gain N. With additional definitions introduced for compact notation, the gain equations take a simpler form. Existence of the pseudo-inverse of LC is guaranteed by the following argument. First, perform a singular value decomposition on LC, where matrix U is unitary, and matrix S (containing the singular values of LC) and matrix V (unitary) are partitioned accordingly, with S_1 containing the nonzero singular values. The pseudo-inverse of LC can then be obtained from these factors. A solution algorithm for these equations is given next.
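The construction of a pseudo-inverse from the singular value decomposition can be sketched in Python as below. The helper name `pinv_via_svd` and the tolerance are assumptions for the example; the product LC is formed from illustrative matrices, not from the gains of this paper.

```python
import numpy as np

def pinv_via_svd(M, tol=1e-12):
    """Moore-Penrose inverse of a real matrix, inverting only nonzero singular values."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    s_inv = np.zeros_like(s)
    nonzero = s > tol * s.max()        # keep the block of nonzero singular values
    s_inv[nonzero] = 1.0 / s[nonzero]
    return Vh.T @ np.diag(s_inv) @ U.T

# Illustrative rank-deficient product LC (values are made up for the example).
L = np.array([[1.0, 2.0]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
LC = L @ C
LC_plus = pinv_via_svd(LC)
```

The pseudo-inverse exists for any matrix, including rank-deficient ones, because only the nonzero singular values are inverted.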

V. A CONVERGENT ALGORITHM TO FIND THE GAINS
A convergent algorithm to solve the above design equations for the gains is developed in this section. Matrix vectorization (denoted by a matrix with an arrow on top and interchangeably by vec), Kronecker product (denoted by ⊗), matrix permutation (denoted by P), and gradient flow method are put together and utilized as technical tools for the algorithm development.
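The vectorization tools named above rest on two standard identities, which the Python sketch below checks numerically: vec(AXB) = (B^T ⊗ A) vec(X) for column-stacking vec, and vec(M^T) = P vec(M) for a permutation (commutation) matrix P. Treating the paper's permutation matrices as commutation matrices is an assumption made for illustration, and the helper names are hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def commutation_matrix(rows, cols):
    """Permutation P with P @ vec(M) == vec(M.T) for any M of shape (rows, cols)."""
    P = np.zeros((rows * cols, rows * cols))
    for i in range(rows):
        for j in range(cols):
            # M[i, j] sits at j*rows + i in vec(M) and at i*cols + j in vec(M.T).
            P[i * cols + j, j * rows + i] = 1.0
    return P

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```

These identities turn matrix equations that are linear in a gain into ordinary linear systems in the vectorized gain.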
Firstly, gradient flows for the gain errors are derived. Secondly, definition of an error cost function is given. Thirdly, selection of gain update directions then follows. Finally, an iterative implementation procedure is provided.
Permutation matrices involving L and N are denoted as P_L and P_N, respectively. Suppose a stabilizing but non-optimal output feedback gain L and an arbitrarily guessed non-optimal compensator gain N are provided. Given the optimal gain equations (30)-(31), the errors and their associated vectorized counterparts can be defined, respectively. Following (22), one may obtain the gradient flow of Y. Given below is its counterpart after vectorization, where F_L and F_N are defined accordingly. Combination of (41), (43) and (45)-(47) leads to the following, where matrix W is partitioned into block components. Now the gradient flow of the error cost can be rewritten in a form where the gains' gradient flows are at our disposal. Recall that the objective here is to get a negative gradient flow for the error cost. As such, one may simply choose the gains' gradient flows as follows. The gradient flow of the error cost now becomes the following expression, which shows that the error will eventually approach zero.
Remarks: Alternatively, one may treat equation (50) as a linear time-varying system and resort to the method by Chen and Kao [30], which solves a forward Riccati equation for the gains. Consider again the error dynamics. According to these authors [30], one may take the gain's gradient flow as follows, where R_1 and R_2 are two matrices at our disposal. Given below are some interesting properties of U. It is shown in their work that the above gradient flow will drive the gain error to zero exponentially. Since this method is more complex (as the gain's gradient flow involves the solution of a time-varying Riccati equation), it will not be pursued further, and the reader is referred to [30] for more details.
As for implementation, a finite-difference method, for example the simplest Adams method (which coincides with the forward Euler method), can be adopted to solve the differential equations of the gains numerically. For that purpose the differential equation (56) is replaced by a difference equation that gives the gains' update direction. Equation (56) in that regard can be rewritten as follows, where the subscript "i" refers to the index of the integration step and ρ is the step size. Some comments are in order. As can be seen from the above solution procedure, an arbitrarily chosen initial compensator gain N_0 and a stabilizing output feedback gain L_0 must be provided to start the algorithm. Finding such a stabilizing output feedback gain is never an easy task. This issue, however, is beyond the scope of this paper. Maintaining Schur stability of the closed-loop matrix A_ci at each iteration is crucial for the algorithm to work successfully. This explains why the step size ρ is introduced above, as it prevents the gains' update from overshooting. Normally a sufficiently small number suffices. Again, how to find a non-overshooting step size is outside the scope of the present paper and will not be pursued further.
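The iteration pattern described above (a fixed-step update, a Schur stability check at every step, and termination once the error is small) can be sketched generically in Python. This is not the paper's update law: the `direction` callback stands in for the gain update direction, the toy target used in the demonstration is made up, and the step-halving safeguard is an added assumption, since step-size selection is left open in the text.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude; below one means Schur stable."""
    return max(abs(np.linalg.eigvals(M)))

def euler_gain_iteration(direction, is_admissible, gain0, rho=0.1, tol=1e-2,
                         max_iter=10_000):
    """Forward-Euler updates gain <- gain + rho * direction(gain) with a stability guard."""
    gain = np.array(gain0, dtype=float)
    for _ in range(max_iter):
        step = direction(gain)
        if np.linalg.norm(step) < tol:         # terminate once the error is small
            break
        trial_rho = rho
        while not is_admissible(gain + trial_rho * step):
            trial_rho *= 0.5                   # assumed safeguard: shrink, do not overshoot
            if trial_rho < 1e-12:
                raise RuntimeError("no admissible step size found")
        gain = gain + trial_rho * step
    return gain

# Toy demonstration: drive a scalar output feedback gain toward a made-up target
# while keeping the closed-loop matrix A + B G C Schur stable.
A = np.array([[0.5]]); B = np.array([[1.0]]); C = np.array([[1.0]])
target = np.array([[-0.3]])
gain = euler_gain_iteration(
    direction=lambda G: target - G,
    is_admissible=lambda G: spectral_radius(A + B @ G @ C) < 1.0,
    gain0=np.zeros((1, 1)),
)
```

A smaller ρ makes each step safer at the price of more iterations.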

VI. THE GENERALIZED HOLD-INPUT COMPENSATOR
The generalized hold-input (GHI) compensation scheme proposed by Moayedi et al. [18], [19] takes the following form, in which the compensator gain τ is a scalar. In fact, it turns out to be a special case of the proposed scheme in which the structure of the matrix gain N is constrained. One can also observe that the zero-input compensator and the hold-input compensator are two special cases of GHI. This observation helps explain why Moayedi et al. [18], [19] imposed the bounds on the scalar gain. The associated equations, except (24) and (30), are specialized accordingly. Following the same procedure, one may obtain the optimal compensator gain. An apparent advantage of the generalized hold-input scheme is its simplicity. However, a performance sacrifice is inevitable in comparison with its full-matrix counterpart, as a result of the trade-off between structural complexity and performance. This point will become clear through the numerical example given in a later section.
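One plausible reading of the structural constraint and its two special cases, written as a LaTeX block, is given below. It is inferred from the description (a scalar gain that scales the latest control, with zero-input and hold-input as the two ends of [0, 1]) and is offered as an assumption rather than as the paper's numbered equations.

```latex
% Assumed reading of the GHI structure: the matrix compensator gain
% is restricted to a scalar multiple of the m-by-m identity.
N = \tau I_m, \qquad
\tau = 0 \;\Rightarrow\; N = 0 \ \text{(zero-input)}, \qquad
\tau = 1 \;\Rightarrow\; N = I_m \ \text{(hold-input)}.
```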

A. CONVERGENT ALGORITHM FOR GHI
The gradient flow of Y for the generalized hold-input scheme by Moayedi et al. [18], [19] becomes the following. Given below is its counterpart after vectorization, where F_L and F_τ are defined as follows. The feedback gain's error expression and its vectorized counterpart change accordingly. The compensator gain error and its gradient flow are then defined. The following identity proves to be useful in terms of vectorization involving the product of two matrices. Replacement of the vectorized gradient flow of Y by equation (71), i.e., the vectorized gradient flows of L and τ, then follows.
τ = τ Tr(C^T L^T R L C)
Combination of (41), (43) and (45)-(47) leads to the following, where matrix W is partitioned into block components. Now the gradient flow of the error cost can be rewritten accordingly. One may simply choose the gains' gradient flows as follows. The gradient flow of the error cost now becomes the following expression, which implies that the error will eventually approach zero.

B. THE BOUNDS IMPOSED ON THE SCALAR GAIN
It is worth pointing out that imposing bounds on the scalar compensator gain in the GHI is unnecessary. Conceivably, there exist cases whose optimal compensator gain happens to be greater than unity or even negative. The downside of imposing such bounds on the gain is that its admissible range is greatly reduced, which results in an unnecessary sacrifice of optimality as well as performance degradation. For this reason, the bound constraint imposed by Moayedi et al. [18], [19] should be eliminated. Another important implication of GHI is that the hold-input compensator will perform better than the zero-input compensator if the optimal scalar gain happens to be near unity. Conversely, the opposite will hold if the optimal scalar gain turns out to be near zero.
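To explore the effect of the bounds numerically, one can evaluate the expected cost for a range of scalar gains that extends beyond [0, 1]. The Python sketch below is an illustration under assumptions, not the optimal-gain equation of this paper: it assumes N = τI, the augmented state [x_k; u_{k−1}], E[x_0 x_0^T] = I with u_{−1} = 0, and hypothetical function names, and it makes no claim about where the best τ lies for any particular system.

```python
import numpy as np

def ghi_expected_cost(A, B, C, L, tau, Q, R, drop_rate):
    """Expected infinite-horizon cost for N = tau * I; inf if not mean-square stable."""
    n, m = B.shape
    N = tau * np.eye(m)
    LC = L @ C
    Znm, Zmn, Zmm = np.zeros((n, m)), np.zeros((m, n)), np.zeros((m, m))
    A_rx = np.block([[A + B @ LC, Znm], [LC, Zmm]])          # packet received
    A_drop = np.block([[A, B @ N], [Zmn, N]])                # packet dropped
    Q_rx = np.block([[Q + LC.T @ R @ LC, Znm], [Zmn, Zmm]])
    Q_drop = np.block([[Q, Znm], [Zmn, N.T @ R @ N]])
    p = 1.0 - drop_rate
    M = p * np.kron(A_rx, A_rx) + drop_rate * np.kron(A_drop, A_drop)
    if max(abs(np.linalg.eigvals(M))) >= 1.0:
        return float("inf")
    Q_bar = p * Q_rx + drop_rate * Q_drop
    # Steady state of Y = Q_bar + E[A^T Y A], solved in vectorized form.
    y = np.linalg.solve(np.eye((n + m) ** 2) - M.T, Q_bar.reshape(-1, order="F"))
    Y = y.reshape((n + m, n + m), order="F")
    return float(np.trace(Y[:n, :n]))          # J = Tr(Y_xx) when E[x0 x0^T] = I

def best_tau(A, B, C, L, Q, R, drop_rate, taus):
    """Grid search over candidate scalar gains; returns (tau, cost) with the lowest cost."""
    costs = [ghi_expected_cost(A, B, C, L, t, Q, R, drop_rate) for t in taus]
    i = int(np.argmin(costs))
    return float(taus[i]), costs[i]
```

Because the candidate grid may include 0 and 1, the returned cost is never worse than that of the zero-input or hold-input special cases evaluated with the same feedback gain L.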

VII. VALIDATION
A numerical example is provided in this section to validate the new approach and compare its performance against that of three other schemes. The packet loss rate is assumed to be 20%. The weighting matrices for the state and control in the LQ cost function are, for simplicity, set to Q = I and R = I. The termination criterion of the algorithm is the error falling below 10^−2.
For performance comparison, the two commonly adopted compensation strategies and the GHI by Moayedi et al. [18], [19] are tested against the new one. Subscripts "opt", "M", "h.i.", and "z.i." are used to distinguish them, referring to the optimal one, the one by Moayedi et al. [18], [19], the hold-input, and the zero-input approach, respectively. Trajectories of both state cost and control cost are also shown to provide a glimpse into the transient responses of all schemes. The initial state is normalized to have unit magnitude, i.e., ||x_0|| = 1.
Example: A, B^T, C, x_0^T, λ(A), L_opt^T, L_h.i.^T, N_opt, N_h.i., and J_opt are as shown at the bottom of this page.
It can be seen from the results that the performance of the new method is the best among the four schemes, as it has the minimum cost, which is expected. Remarks: Apparently an optimized scalar gain, as in Moayedi et al. [18], [19], is hardly competitive when compared to a matrix gain. This explains why it comes in second even though it outperforms the other two.
As for the transient response of this specific example, see Figures 2-5, obtained from a numerical simulation under a packet dropout rate of 20%. It is worth pointing out that a Bernoulli process involving a random variable comes into play; as such, each realization of this random variable generally differs from every other one. In other words, Figures 2-5 represent the transient response from just one realization among infinitely many. The optimality obtained herein should therefore be interpreted in the mean/average sense.
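The distinction between a single realization and the mean can be illustrated by averaging the cost over many simulated dropout sequences. The Python sketch below is a minimal Monte Carlo illustration with Q = I and R = I; the plant form, the sign convention u^c_k = L C x_k, the compensating signal N u_{k−1}, and the function name are assumptions, and the numbers used in any call are made up rather than taken from the paper's example.

```python
import numpy as np

def monte_carlo_costs(A, B, C, L, N, drop_rate, x0, steps=200, runs=5000, seed=0):
    """Return the accumulated cost of each of `runs` independent dropout realizations."""
    rng = np.random.default_rng(seed)
    m = B.shape[1]
    X = np.tile(np.asarray(x0, dtype=float), (runs, 1))   # one state row per realization
    U_prev = np.zeros((runs, m))
    costs = np.zeros(runs)
    LC = L @ C
    for _ in range(steps):
        received = rng.random(runs) >= drop_rate           # True where the packet arrives
        U = np.where(received[:, None], X @ LC.T, U_prev @ N.T)
        costs += np.sum(X * X, axis=1) + np.sum(U * U, axis=1)
        X = X @ A.T + U @ B.T
        U_prev = U
    return costs
```

The sample mean of the returned array estimates the expected cost, while its spread shows how far an individual trajectory can deviate from that average.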
To further illustrate how the packet dropout rate γ quantitatively affects the cost function J , three more numerical experiments are conducted for the same system whose results are given below where only the computed cost values are listed due to limited space.
It is worth pointing out that when the dropout rate went up to 0.295, only the proposed optimal compensator functioned successfully (although yielding a large cost value); the other three schemes ceased to work and failed to yield finite cost values. The importance of optimal compensation can be better appreciated through this case: near the critical situation, where the underlying communication network of the NCS is highly unreliable, the system may be destabilized if the optimal compensator is not used.

VIII. CONCLUSION
The compensation problem is studied in the context of NCS for discrete linear time-invariant optimal output feedback control across an unreliable link. An integrated design framework for the static class of controller and compensator is proposed and a new set of design equations is derived. A convergent algorithm is presented to solve the new design equations and a numerical example is given to validate the proposed approach and for performance comparisons.
It is shown that the new method performs the best compared to the commonly adopted zero-input, hold-input, and the generalized hold-input compensation strategies as the latter three turn out to be just special cases of the proposed one. In fact, the new compensator is the optimal one of its kind.