Introduction
What if we could take an approach to systems with uncertainties that cautiously probed for information that could improve our confidence in achieving a desirable result? Reflect for a minute on how our knowledge evolved early in the COVID-19 pandemic: there was significant uncertainty about the dynamics of the virus and about the effectiveness of the various policies implemented to control it. Although we learned over time from historical data, taking an adaptive or passive learning approach, in this paper we make a case for dual control’s planned learning approach for complex nonlinear systems with significant uncertainties, using COVID-19 as an example.
Dual control approaches take actions to learn about uncertainties only when it is likely to result in a lower long-term cost, unlike adaptive control approaches. Existing adaptive controllers select control actions solely to achieve the control objective and only consider what can be learned about the uncertain parameters after the control actions have been made. This behaviour is known as passive adaptation, as no planning or effort went into achieving the reductions in the uncertainties. The ideal adaptive controller not only learns from the results of past decisions, but also plans future decisions to influence outcomes that will be favourable to learning [1]. This planned or active learning is what separates dual control from adaptive control, as uncertainty reduction is considered in the determination of the control policy and optimally balanced with achieving the control objective [2].
The dual objectives of objective function minimization and parameter identification were first identified by Feldbaum in 1960 [3], who recognized that the optimal dual control problem could be solved using stochastic dynamic programming. Dual control exhibits three key features: caution, probing, and selectiveness [1]. Caution is the act of not varying the control significantly when uncertainty is high, probing is the act of actively modifying the control signal to reduce uncertainty, and selectiveness is the act of focusing on identifying parameters of higher importance. Unfortunately, stochastic dynamic programming involves solving the Bellman equations, which is generally computationally intractable, a problem known as Bellman’s curse of dimensionality [4]. In order to achieve computationally tractable active learning, approximations of dual control must be developed that determine the relative importance of caution, probing, and selectiveness [5].
One approach to approximate dual control is to explicitly include extra terms in the objective function that produce the desired dual features of caution, probing, and selectiveness. Although this method is simple and can easily make existing optimal control methods dual, the relative importance of the regulation and identification functions must be defined [6]. This approach fixes the value of system information relative to the minimization of the rest of the cost function. Even with methods that vary the value of system information based on a specific measure, explicit approaches are likely to overvalue or undervalue system information at different points in time. For this reason, explicit dual approximations are considered inferior to implicit approaches [2].
In implicit approaches to the approximation of dual control, the relative importance of caution, probing, and selectiveness is determined by estimating the probability that high-cost probing actions will reduce total costs over the control horizon. This behaviour comes from approximating the Bellman equations and generally comes at the cost of higher computational effort compared to the explicit approach [7]. There are several existing dual implicit approaches (see [5] for a review), but each only considers limited control or parameter realizations to make the control problem tractable or is only applicable for systems with a limited number of states (e.g., [6], [8], [9]). The existing implicit dual approaches are limited by Bellman’s curse of dimensionality as they do not take a derivative-based approach in continuous state space.
In this paper, we present an implicit dual controller that avoids the curse of dimensionality by extending the iterative linear quadratic Gaussian (iLQG) method [10]. iLQG is a powerful control technique due to its ability to handle nonlinear and stochastic systems with multiplicative noise. Dual iLQG represents a fast (due to the linearization of the system) and feasible (due to working with derivatives about a nominal state-control trajectory) solution to implicit dual control of small and large systems while avoiding Bellman’s curse of dimensionality.
Dual iLQG can handle complex nonlinear systems with many uncertain parameters. To illustrate the significance of this approach to real-world problems, we look to the recent example of determining COVID-19 public policy regulations as our knowledge evolved throughout 2020. The control of the COVID-19 pandemic continues to represent an enormous challenge for governments all over the world. Although policies to limit deaths and hospitalizations have been determined, such as enforcing mask use and lockdowns, these policies have social and economic costs, and precisely how effective these policies are is uncertain [11]. Additionally, there are uncertainties associated with the dynamics of the virus’s spread through populations [12]. Having an objective of balancing deaths and hospitalizations with social and economic costs in a dynamic system that has uncertain parameters makes COVID-19 a suitable application for dual control.
Methods
A. iLQG
iLQG extends the well-known linear quadratic regulator (LQR) algorithm to systems that are nonlinear, stochastic, and do not have quadratic costs [10]. To handle non-quadratic cost functions, the cost function is “quadratized” about the nominal state-control trajectory. In a similar way, the system dynamics are linearized about this nominal state-control trajectory, allowing a version of the Riccati equation to be applied to the system. Since the states are uncertain, the measurement dynamics are also included in the iLQG algorithm, and a filter is required to estimate the states’ mean values and covariances.
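As a minimal sketch of the linearization step described above, the following computes the Jacobians of a dynamics function about a nominal state-control point using central differences. The function names `linearize` and `toy_dynamics`, the step size, and the two-state bilinear system are hypothetical and chosen only for illustration; an actual implementation may use analytic or automatic derivatives instead.

```python
def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at the point (x, u)."""
    n, m = len(x), len(u)
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * m for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp, u), f(xm, u)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2 * eps)
    for j in range(m):
        up, um = list(u), list(u)
        up[j] += eps
        um[j] -= eps
        fp, fm = f(x, up), f(x, um)
        for i in range(n):
            B[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return A, B


def toy_dynamics(x, u):
    # Hypothetical two-state system with a bilinear term, for illustration only.
    flow = x[0] * x[1] * (1.0 - u[0])
    return [-flow, flow - 0.1 * x[1]]


# Linearize about a nominal point x = [0.9, 0.1], u = [0.5].
A, B = linearize(toy_dynamics, [0.9, 0.1], [0.5])
```

The resulting matrices play the role of the local linear model to which the Riccati-type recursion is applied.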
The forward integration of the system dynamics to obtain the nominal state trajectory from the nominal control trajectory and the calculations of the derivatives required for the linearization of the system dynamics as well as the “quadratization” of the cost function are grouped together in what’s known as a forward pass. The next step in the algorithm is the estimator, which uses the noisy measurements to infer the value of the states and their covariances. Next, a backward pass is required to calculate a quadratic approximation to the cost-to-go function, and then the optimal control deviations can be found.
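The core of the backward pass can be illustrated with the classical LQR Riccati recursion. The sketch below is a deliberately reduced case, assuming a deterministic, scalar, time-invariant system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2; iLQG applies an analogous recursion to the time-varying linearized dynamics and quadratized cost, with additional noise-related terms that are omitted here. The function name and numerical values are hypothetical.

```python
def lqr_backward_pass(a, b, q, r, q_final, horizon):
    """Return feedback gains for u_k = -gains[k] * x_k and the initial cost-to-go weight."""
    p = q_final                       # quadratic cost-to-go coefficient at the final time
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        gain = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * gain   # Riccati update, one step back in time
        gains[k] = gain
    return gains, p


gains, p0 = lqr_backward_pass(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
```

For this particular choice of coefficients the recursion converges to the steady-state solution of p^2 - p - 1 = 0, which gives a quick sanity check on the implementation.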
Since the linear approximation of the nonlinear system dynamics loses accuracy for larger control deviations, a line search is implemented to iteratively reduce the control deviations if the solution’s estimated cost is not less than the cost of the initial nominal trajectory, improving the algorithm’s convergence.
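The line search described above can be sketched as a simple backtracking loop. This is an illustrative version under stated assumptions: the shrink factor, the iteration limit, and the quadratic `toy_cost` (standing in for the cost of a full trajectory rollout) are hypothetical and are not taken from the iLQG reference implementation.

```python
def backtracking_line_search(cost_fn, u_nominal, du, shrink=0.5, max_iters=10):
    """Scale the control deviation du until the candidate cost beats the nominal cost."""
    base_cost = cost_fn(u_nominal)
    alpha = 1.0
    for _ in range(max_iters):
        candidate = [u + alpha * d for u, d in zip(u_nominal, du)]
        if cost_fn(candidate) < base_cost:
            return candidate, alpha
        alpha *= shrink
    return list(u_nominal), 0.0       # no improving step found; keep the nominal controls


def toy_cost(u):
    # Hypothetical stand-in for the cost of rolling out a control trajectory.
    return sum((ui - 1.0) ** 2 for ui in u)


# The full step overshoots the minimum, so the deviation is halved once.
u_new, alpha = backtracking_line_search(toy_cost, [0.0, 0.0], [3.0, 3.0])
```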
Importantly, iLQG can handle multiplicative noise in both the state and measurement dynamics. Multiplicative noise occurs when there is an undesirable random signal that is multiplied by one or more states or controls of the system [13]. Many stochastic control approaches are based on only additive noise, where the noise term is independent of the state or control vectors, and iLQG’s more general approach is a beneficial feature.
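The difference between the two noise types can be seen in a small simulation. The sketch below assumes a hypothetical scalar system x' = a x + b u with an additive noise term and a control-dependent (multiplicative) noise term; the coefficients, noise levels, and function names are illustrative only.

```python
import random


def step(x, u, rng, sigma_add=0.0, sigma_mult=0.0):
    """One step of x' = a*x + b*u plus additive and control-dependent noise."""
    a, b = 0.9, 1.0                                  # hypothetical scalar dynamics
    w_add = sigma_add * rng.gauss(0.0, 1.0)          # independent of state and control
    w_mult = sigma_mult * u * rng.gauss(0.0, 1.0)    # scales with the control magnitude
    return a * x + b * u + w_add + w_mult


def sample_std(u, n=20000, **noise):
    """Empirical standard deviation of the next state for a fixed control u."""
    rng = random.Random(0)
    xs = [step(1.0, u, rng, **noise) for _ in range(n)]
    mean = sum(xs) / n
    return (sum((v - mean) ** 2 for v in xs) / n) ** 0.5


std_mult_small = sample_std(0.0, sigma_mult=0.2)   # no control, no multiplicative noise
std_mult_large = sample_std(2.0, sigma_mult=0.2)   # spread grows with |u|
std_add_small = sample_std(0.0, sigma_add=0.2)     # additive spread ignores the control
std_add_large = sample_std(2.0, sigma_add=0.2)
```

With multiplicative noise, larger control actions make the outcome less predictable, which is one reason a controller that accounts for it may act more cautiously.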
B. Adaptive iLQG
To extend iLQG to uncertain systems in an adaptive manner, two changes are made to the closed-loop iLQG approach shown in Fig. 1. First, the initial estimates of the uncertain parameters are provided to the iLQG inner loop as constants. Second, in the outer loop, the parameters are estimated along with the states in the filter. This is done by concatenating the uncertain states and parameters together into a single augmented state vector,\begin{align*} \mathbf {x}^{a^{p}} = \begin{bmatrix} \mathbf {x}^{p} \\ \mathbf {d}^{p} \end{bmatrix} \tag {1}\end{align*}
\begin{align*} \Sigma ^{\mathbf {x}^{a}} = \begin{bmatrix} \Sigma ^{\mathbf {x}} \ & \quad \mathbf {0} \\ \mathbf {0} \ & \quad \Sigma ^{\mathbf {d}} \end{bmatrix} \tag {2}\end{align*}
\begin{align*} \mathbf {c}^{a} = \begin{bmatrix} \mathbf {c} \\ \mathbf {d} \end{bmatrix}. \tag {3}\end{align*}
This approach allows adaptive iLQG to update its parameter estimates in the outer loop after getting new measurements and pass them to the inner loop iLQG algorithm as constants. Critically, because the parameters are treated as constants in the inner loop iLQG algorithm, the uncertainty associated with the parameters does not impact the determination of the control policy, which is known as the certainty equivalence principle. These changes are shown in the system diagram for closed-loop adaptive iLQG in Fig. 2.
Closed-loop adaptive iLQG system diagram. Differences from the non-adaptive system diagram are shown in blue.
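The augmentation in (1) and (2) amounts to stacking the state and parameter vectors and placing their covariances on the diagonal blocks of a larger matrix. A minimal sketch, using plain lists and a hypothetical helper name `augment`:

```python
def augment(x, d, cov_x, cov_d):
    """Stack states x and parameters d; build the block-diagonal covariance of (2)."""
    n, m = len(x), len(d)
    x_aug = list(x) + list(d)
    cov_aug = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            cov_aug[i][j] = cov_x[i][j]
    for i in range(m):
        for j in range(m):
            cov_aug[n + i][n + j] = cov_d[i][j]
    return x_aug, cov_aug


# Illustrative values only: two states and one uncertain parameter.
x_aug, cov_aug = augment(
    x=[0.9, 0.1], d=[0.3],
    cov_x=[[0.01, 0.0], [0.0, 0.02]], cov_d=[[0.5]],
)
```

The off-diagonal blocks start at zero, as in (2); a filter operating on the augmented state would generally fill them in as measurements correlate the states with the parameters.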
C. Dual iLQG
To extend this adaptive iLQG approach to be dual, the uncertainty associated with the parameters must influence the control policy. Therefore, instead of treating the parameters as constants in the inner loop iLQG algorithm, the parameters are treated as states and an augmented state vector is formed as shown in equation (1). An augmented state covariance is also created as shown in (2), and the augmented state dynamics must be formed as\begin{align*} d\mathbf {x}^{a^{p}} = \begin{bmatrix} \mathbf {f}(\mathbf {x}^{a^{p}}, \mathbf {u}^{p}) \\ \mathbf {f}_{d}(\mathbf {x}^{a^{p}}, \mathbf {u}^{p}) \end{bmatrix} dt = \mathbf {f}^{a}(\mathbf {x}^{a^{p}}, \mathbf {u}^{p}) dt \tag {4}\end{align*}
Closed-loop dual iLQG system diagram. Differences from the adaptive system diagram are shown in red.
In this way, the iLQG algorithm treats the parameters as unmeasured states and allows the control algorithm to predict how changes to the inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to-go function, to lower the total cost of the control trajectory. The parameter uncertainty influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector and the cost function at each time step. By calculating the derivatives of the cost function at each time step, dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory. Dual iLQG is an improvement on wide-sense dual control [15] and the dual controller presented in [1] in that dual iLQG is an implicit dual approximation instead of an explicit one and can handle multiplicative noise, which is common in many applications.
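Why treating a parameter as an unmeasured state lets the controller anticipate learning can be seen in a toy example. The sketch below assumes a hypothetical scalar system x' = x + θu + w with measurement y = x' + v, where θ is appended to the state as a constant. It performs one extended-Kalman-filter predict/update step and returns the posterior variance of θ as a function of the control u. The model, noise variances, and function name are illustrative assumptions and are not the system used in this paper.

```python
def theta_variance_after_update(u, p_x=0.1, p_theta=1.0, q=0.01, r=0.05):
    """Posterior variance of theta after one predict/update step with control u."""
    # Predict with F = [[1, u], [0, 1]] and prior P = diag(p_x, p_theta):
    # P_pred = F P F^T + diag(q, 0).
    p00 = p_x + u * u * p_theta + q   # predicted variance of x
    p01 = u * p_theta                 # predicted covariance between x and theta
    p11 = p_theta                     # theta is constant, so its variance is unchanged
    # Update with H = [1, 0]: only x is measured.
    s = p00 + r                       # innovation variance
    k_theta = p01 / s                 # Kalman gain acting on the parameter
    return p11 - k_theta * p01


no_probe = theta_variance_after_update(0.0)     # u = 0 reveals nothing about theta
mild_probe = theta_variance_after_update(1.0)
strong_probe = theta_variance_after_update(2.0)
```

In this toy case a zero control leaves the parameter variance untouched, while larger control actions reduce it. Because the filter gain depends on the planned control, a planner that differentiates through the filter can weigh this information gain against the control cost.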
D. System Model
To model the spread of infectious diseases through populations, compartmental models have been used since 1927 [16]. These models divide a population into a series of compartments that represent stages of the disease and then describe how these groups change over time. One of the simplest of these compartmental models is the SIR model [16] as shown in Fig. 4, where a population is divided into being Susceptible (S), Infected (I), or Removed (R) (that is, recovered or deceased). The movement of the population through these states can then be represented graphically as arrows between circles for each compartment (state), and these population flows can then be described with equations based on the states themselves as well as parameters and controls. These parameters generally represent infection and fatality rates for different populations, and the controls are methods of influencing these dynamics. For instance, the SIR model can be expressed as\begin{align*} \dot {S} & = -\frac {\beta S I}{N}, \tag {5}\\ \dot {I} & = \frac {\beta S I}{N} - \gamma I, \tag {6}\\ \dot {R} & = \gamma I, \tag {7}\end{align*}
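Equations (5) to (7) can be integrated numerically in a few lines. The sketch below uses a forward Euler scheme and reads β as the transmission rate, γ as the removal rate, and N as the total population, which is the usual interpretation of these symbols. The parameter values, step size, and function name are illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward Euler integration of the SIR equations (5)-(7)."""
    n = s0 + i0 + r0                  # total population, constant in this model
    s, i, r = s0, i0, r0
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / n
        removals = gamma * i
        s, i, r = (
            s - dt * new_infections,
            i + dt * (new_infections - removals),
            r + dt * removals,
        )
    return s, i, r


# Illustrative run: 1000 people, 10 initially infected, 100 days.
s, i, r = simulate_sir(beta=0.3, gamma=0.1, s0=990.0, i0=10.0, r0=0.0, days=100)
```

Because every outflow from one compartment is an inflow to another, the sum S + I + R stays equal to N throughout the simulation, which is a convenient check on any implementation.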
These compartmental models have been tailored to better represent the dynamics of the COVID-19 virus, with different compartments or states being considered by different researchers. In [11], eight compartments are considered: Susceptible, Infected (those that are asymptomatic, infected, and undetected), Diagnosed (those that are asymptomatic, infected, and detected), Ailing (those that are symptomatic, infected, and undetected), Recognized (those that are symptomatic, infected, and detected), Threatened (those that are acutely symptomatic, infected, and detected), Healed (either after being detected or not, and assumed immune after being infected), and Extinct (and assumed to be detected), giving the SIDARTHE model shown in Fig. 5.
In the SIDARTHE model, the infected populations other than the threatened population infect the susceptible population with different rates of transmission. Once infected, five different transitions between populations are considered, shown in different colours in Fig. 5: developing symptoms, getting diagnosed, getting healed, becoming critical, or dying. With the parameters shown in Fig. 5 describing these transitions between the states, the SIDARTHE model can be expressed as\begin{align*} \dot {S} & = {-} S \left ({{ \alpha I + \beta D + \gamma A + \beta R }}\right ), \tag {8}\\ \dot {I} & = S \left ({{ \alpha I + \beta D + \gamma A + \beta R }}\right ) - \left ({{ \epsilon + \zeta + \lambda }}\right ) I, \tag {9}\\ \dot {D} & = \epsilon I - \left ({{ \zeta + \lambda }}\right ) D, \tag {10}\\ \dot {A} & = \zeta I - \left ({{ \theta + \mu + \kappa }}\right ) A, \tag {11}\\ \dot {R} & = \zeta D + \theta A - \left ({{ \mu + \kappa }}\right ) R, \tag {12}\\ \dot {T} & = \mu A + \mu R - \left ({{ \sigma \left ({{ T }}\right ) + \tau \left ({{ T }}\right ) }}\right ) T, \tag {13}\\ \dot {H} & = \lambda I + \lambda D + \kappa A + \kappa R + \sigma \left ({{ T }}\right ) T, \tag {14}\\ \dot {E} & = \tau \left ({{ T }}\right ) T, \tag {15}\end{align*}
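The right-hand side of (8) to (15) can be written directly as a function of the eight states. The sketch below makes two simplifying assumptions that should be read as such: σ and τ are held constant rather than depending on T, and the outflow from the Infected compartment is taken as (ε + ζ + λ)I, which is the form that balances the inflows εI, ζI, and λI appearing in (10), (11), and (14). All numerical values are placeholders chosen for illustration and are not the values used in this paper.

```python
def sidarthe_rhs(state, p):
    """Time derivatives of (S, I, D, A, R, T, H, E) with constant sigma and tau."""
    S, I, D, A, R, T, H, E = state
    infection = S * (p["alpha"] * I + p["beta"] * D + p["gamma"] * A + p["beta"] * R)
    dS = -infection
    dI = infection - (p["epsilon"] + p["zeta"] + p["lambda"]) * I
    dD = p["epsilon"] * I - (p["zeta"] + p["lambda"]) * D
    dA = p["zeta"] * I - (p["theta"] + p["mu"] + p["kappa"]) * A
    dR = p["zeta"] * D + p["theta"] * A - (p["mu"] + p["kappa"]) * R
    dT = p["mu"] * A + p["mu"] * R - (p["sigma"] + p["tau"]) * T
    dH = (p["lambda"] * I + p["lambda"] * D + p["kappa"] * A + p["kappa"] * R
          + p["sigma"] * T)
    dE = p["tau"] * T
    return [dS, dI, dD, dA, dR, dT, dH, dE]


# Placeholder parameters and population fractions, for illustration only.
params = {
    "alpha": 0.4, "beta": 0.01, "gamma": 0.3, "epsilon": 0.1, "zeta": 0.1,
    "lambda": 0.05, "theta": 0.3, "mu": 0.01, "kappa": 0.02,
    "sigma": 0.02, "tau": 0.01,
}
state = [0.99, 0.005, 0.002, 0.001, 0.001, 0.0005, 0.0005, 0.0]
derivatives = sidarthe_rhs(state, params)
```

Since the model treats the population as closed, the eight derivatives sum to zero, which provides a simple consistency check on the equations.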
In the SIDARTHE model, the recovery and mortality rates of the Threatened state are modeled as being dependent on the Threatened state in order to represent the impact of the health care system being overwhelmed. In [11], this effect was achieved in a two-step process, whereby a model was created where the Threatened population was divided into those in the limited-capacity intensive care unit (ICU) and those not, and then this model was simplified to maintain the eight states described above. The compartmental diagrams for these two steps can be seen in Fig. 6.
Partial compartmental diagram for considering the impact of an overwhelmed ICU [11].
Defining \begin{align*} \dot {T}_{1} & = \mu _{1} \left ({{ A + R }}\right ) - \left ({{ \sigma _{1} + \tau _{1} }}\right ) T_{1}, \tag {16}\\ \dot {T}_{2} & = \mu _{2} \left ({{ A + R }}\right ) - \left ({{ \sigma _{2} \left ({{ T_{2} }}\right ) + \tau _{2} \left ({{ T_{2} }}\right ) }}\right ) T_{2}, \tag {17}\end{align*}
\begin{align*} \tau (T) & = \frac {\mu _{1}}{\mu } \tau _{1} + \frac {\mu _{2}}{\mu } \tau _{2}, \tag {18}\\ \sigma (T) & = \frac {\mu _{1}}{\mu } \sigma _{1} + \frac {\mu _{2}}{\mu } \sigma _{2}, \tag {19}\end{align*}
\begin{align*} \tau (T)T & = \frac {\mu _{1}}{\mu } \tau _{1} T + \max \left \lbrace {{\frac {\mu _{2}}{\mu } \tau _{2}~T, \tau _{2} T_{ICU} }}\right . \\ & \quad \left .{{ + \tau _{crit} \left ({{ \frac {\mu _{2}}{\mu } T - T_{ICU} }}\right ) }}\right \rbrace , \tag {20}\\ \sigma (T)T & = \frac {\mu _{1}}{\mu } \sigma _{1} T + \sigma _{2} \min \left \lbrace {{\frac {\mu _{2}}{\mu } T, T_{ICU} }}\right \rbrace , \tag {21}\end{align*}
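The saturation effect in (20) and (21) can be made concrete with a short function that returns the death and recovery flows out of the Threatened compartment. In this sketch, the recovery flow is written with T inside the minimum, so that both arguments are population counts and the result is a flow like the death term in (20). The parameter names and numbers are placeholders for illustration and are not values from [11] or from this paper.

```python
def threatened_flows(T, p):
    """Return (tau(T)*T, sigma(T)*T): death and recovery flows out of Threatened."""
    share1 = p["mu1"] / p["mu"]       # fraction of Threatened not needing the ICU
    share2 = p["mu2"] / p["mu"]       # fraction of Threatened needing the ICU
    icu_demand = share2 * T
    deaths = share1 * p["tau1"] * T + max(
        p["tau2"] * icu_demand,
        p["tau2"] * p["T_icu"] + p["tau_crit"] * (icu_demand - p["T_icu"]),
    )
    recoveries = share1 * p["sigma1"] * T + p["sigma2"] * min(icu_demand, p["T_icu"])
    return deaths, recoveries


# Placeholder values: ICU capacity of 100 and a higher death rate beyond capacity.
p = {"mu": 1.0, "mu1": 0.6, "mu2": 0.4, "tau1": 0.01, "tau2": 0.02,
     "tau_crit": 0.2, "sigma1": 0.1, "sigma2": 0.05, "T_icu": 100.0}
below = threatened_flows(100.0, p)    # ICU demand of 40, below capacity
above = threatened_flows(500.0, p)    # ICU demand of 200, above capacity
```

Below capacity, both flows are linear in T. Above capacity, the recovery flow from the ICU group saturates while the death flow grows at the higher rate τ_crit for the patients who cannot be admitted.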
To implement controls in this model, public health policies are seen to have a direct influence on the infection rates \begin{align*} \alpha (t) & = \alpha _{\max } + \left ({{ \alpha _{\min } - \alpha _{\max } }}\right ) u(t), \tag {22}\\ \gamma (t) & = \gamma _{\max } + \left ({{ \gamma _{\min } - \gamma _{\max } }}\right ) u(t), \tag {23}\end{align*}
E. SIDARTHE Model Limitations
Although the SIDARTHE model is able to capture the major aspects of the dynamics of COVID-19, there are several simplifications that were made to make the model less complex. First of all, this model represents the population as static, other than deaths due to COVID-19. SIDARTHE does not include population changes due to travel, births, or non-COVID related deaths. A more complex model that did include non-COVID related deaths would also be an interesting application for dual control.
Additionally, the SIDARTHE model only represents the public health policies as a single control input, lumping the impact of these policies into a single value representing the severity of the restrictions. Although this makes the implementation of the model much easier, it would be difficult for health agencies to get precise recommendations from such a lumped term. Additionally, this single control action limits the potential probing that a dual control method could implement, as in reality there are multiple policies that can be varied over time. For instance, media campaigns, enforcing social distancing and mask use, performing asymptomatic testing, performing symptomatic testing, quarantining of positive cases, increasing non-ICU hospital resources, and increasing ICU resources could all be considered independent controls, and the extension of the SIDARTHE model to include these controls will be discussed in the next section.
F. Changes to the SIDARTHE Model
A limitation of the SIDARTHE model is that it only represents the public health policies as a single control input (u), lumping the impact of these policies into a single value representing the severity of the restrictions. We extended the SIDARTHE model to have separate control inputs representing different types of public health policies, including media campaigns (
To extend the SIDARTHE model to have separate control inputs representing different types of public health policies, a similar approach to that of equations (22) and (23) was used. The public health policies that were considered were media campaigns (\begin{align*} \alpha & = \alpha _{\max } + \left ({{ \alpha _{\min } - \alpha _{\max } }}\right ) \left ({{ \eta _{\alpha _{1}} u_{1} + \eta _{\alpha _{2}} u_{2} }}\right ), \tag {24}\\ \beta & = \beta _{\max } + \left ({{ \beta _{\min } - \beta _{\max } }}\right ) \left ({{ \eta _{\beta _{1}} u_{1} + \eta _{\beta _{2}} u_{2} + \eta _{\beta _{5}} u_{5} }}\right ), \tag {25}\\ \gamma & = \gamma _{\max } + \left ({{ \gamma _{\min } - \gamma _{\max } }}\right ) \left ({{ \eta _{\gamma _{1}} u_{1} + \eta _{\gamma _{2}} u_{2} }}\right ), \tag {26}\\ \epsilon & = \epsilon _{\min } + \left ({{ \epsilon _{\max } - \epsilon _{\min } }}\right ) \left ({{ \eta _{\epsilon _{1}} u_{1} + \eta _{\epsilon _{3}} u_{3} }}\right ), \tag {27}\\ \theta & = \theta _{\min } + \left ({{ \theta _{\max } - \theta _{\min } }}\right ) \left ({{ \eta _{\theta _{1}} u_{1} + \eta _{\theta _{4}} u_{4} }}\right ), \tag {28}\end{align*}
\begin{equation*} \eta _{\beta _{1}} + \eta _{\beta _{2}} + \eta _{\beta _{5}} = 1, \tag {29}\end{equation*}
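The mappings in (24) to (28) share one pattern: a weighted sum of control inputs interpolates a rate between its value without any policy and its value under full policy. A minimal sketch of that pattern follows, applied to β with the weights constrained as in (29). It assumes each control input lies between 0 and 1; the helper name and all numbers are illustrative.

```python
def controlled_rate(rate_off, rate_on, weights, controls):
    """Interpolate a rate between its no-policy and full-policy values."""
    effort = sum(w * u for w, u in zip(weights, controls))
    return rate_off + (rate_on - rate_off) * effort


# Equation (25): beta depends on u1, u2, and u5, with weights summing to 1 as in (29).
beta_max, beta_min = 0.4, 0.1                 # placeholder bounds
eta_beta = [0.25, 0.5, 0.25]                  # placeholder effectiveness weights
beta_no_policy = controlled_rate(beta_max, beta_min, eta_beta, [0.0, 0.0, 0.0])
beta_full_policy = controlled_rate(beta_max, beta_min, eta_beta, [1.0, 1.0, 1.0])
beta_partial = controlled_rate(beta_max, beta_min, eta_beta, [1.0, 0.0, 0.0])
```

The constraint in (29) ensures that applying every relevant control at full strength moves the rate exactly to its bound. For the detection rates in (27) and (28), the same helper would be called with the minimum rate as `rate_off` and the maximum rate as `rate_on`.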
G. Controller Initialization
The states are constrained to \begin{equation*} J(\mathbf {x}, \mathbf {u}) = c_{x} \mathbf {x} + c_{u} \mathbf {u}^{2}, \tag {30}\end{equation*}
Results
To show the ability of dual iLQG to handle many uncertain parameters, it was compared to iLQG and adaptive iLQG for the modified SIDARTHE model with 16 uncertain parameters. The SIDARTHE model used for this comparison used 5 control inputs,
The dual controller was able to reduce Threatened and Extinct cases of COVID-19, resulting in a final cost that was 6.4% lower than that of the adaptive controller, as shown in Fig. 7. The dual iLQG controller did not start outperforming the adaptive iLQG controller until day 18 and did not outperform the iLQG controller until day 50. By the end of the 80-day simulation, the adaptive iLQG controller only outperformed the iLQG controller by 0.4%.
Cost comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.
The controls resulting from the three iLQG algorithms are shown in Fig. 8. With its low cost coefficient, all three controllers maximize the use of media campaigns (
Control comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.
Figure 9 shows the true and estimated states for each algorithm, along with the covariance of the estimates. Although the dual controller had lower controls, the case counts for dual iLQG are lower than or similar to the adaptive case. The dual controller has fewer case counts for the 6th and 8th states (Threatened and Extinct) that have non-zero cost terms.
State comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.
Comparing these results with a two-parameter simulation of the SIDARTHE COVID-19 model that had the same settings other than the number of parameters gives a sense of how the dual iLQG algorithm overcomes the curse of dimensionality. The simulation times for dual iLQG in this application with two and sixteen parameters were 973 and 2176 seconds, respectively. The sixteen-parameter time is roughly 2.2 times the two-parameter time, even though the number of uncertain parameters is eight times larger. This is not the exponential increase in simulation time that would be expected if Bellman’s curse of dimensionality held.
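The scaling argument can be checked with simple arithmetic. The first part of the sketch below uses only the two simulation times reported above. The second part is a hypothetical comparison: it assumes a grid-based dynamic program with `m` grid points per uncertain parameter, which is an illustrative construction and not a method evaluated in this paper.

```python
t_two, t_sixteen = 973.0, 2176.0          # seconds, as reported for dual iLQG
observed_ratio = t_sixteen / t_two        # growth in run time from 2 to 16 parameters

# Hypothetical comparison: a table with m grid points per uncertain parameter
# holds m**n entries, so going from 2 to 16 parameters multiplies its size by m**14.
m = 10                                    # illustrative grid resolution, not from the source
grid_growth = m ** (16 - 2)
```

Under these assumptions, the observed run time grows by a factor of about 2.24, whereas the size of the hypothetical grid grows by fourteen orders of magnitude.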
Discussion
As shown in this example, dual iLQG is able to control systems with uncertain parameters in such a way that the reduction of uncertainty is implicit in the minimization of a given cost function. The dual goals of system identification and objective function minimization are often at odds with each other, and this tension creates a set of three features that characterize dual control. Dual control demonstrates caution, minimizing the magnitude of control actions when uncertainties are high; probing, varying the control actions to gain information about the uncertain parameters; and selectiveness, only seeking to gain information on those parameters that are likely to cause a reduction in future costs [1].
This COVID-19 application demonstrates how dual iLQG can be used to inform government policy for uncertain nonlinear systems. The purpose of using COVID-19 as an application for dual control here is not to model the spread of the virus through a real population or to suggest that dual control could have saved lives, but to apply dual control to a complex nonlinear system that people understand. To that end, values for the states, parameters, and cost-weighting factors from the literature were used, and for the changes we made to the SIDARTHE COVID-19 model with the control effectiveness parameters, illustrative values were used.
In this example, dual iLQG outperformed adaptive iLQG by 6.4%, but it is not guaranteed to outperform other methods in every application as the probing nature of dual control does not always result in lower long-term costs. Dual iLQG also requires a dynamic model for each system that it is applied to, and the noise characteristics of that system need to be known. In the absence of a white box dynamic model, Gaussian Process regression can be used to provide the system dynamics [1], [17].
We also explored a comparison with another implicit dual control method, dual multi-stage NMPC [18], but the method could only accommodate a small subset of the uncertain parameters used in this example. In a comparison of adaptive and dual iLQG with dual multi-stage NMPC on the SIDARTHE COVID-19 model with only two uncertain parameters, dual iLQG outperformed adaptive iLQG and dual MS-SP-NMPC by 28% and 66% respectively [19].
Since uncertainty is common to many systems, dual iLQG is applicable to a wide spectrum of applications. In the 2017 review paper “Systems and Control for the future of humanity, research agenda: Current and future roles, impact and grand challenges” by Lamnabhi-Lagarrigue et al. [20], three requirements are listed that “call for a paramount role for data-driven modeling, which must be integrated into virtually all future complex engineering systems.” Dual iLQG addresses two of these three requirements, specifically the need for models to adapt to changing parameters as well as the need for approaches that enable active learning, which is described as “probing the system/environment to generate sensor information that is suitable for model adaptation”. The review paper by Lamnabhi-Lagarrigue et al. mentions a number of high-impact system and control applications for the future, and dual iLQG could be applied to many of them, including automotive control, spacecraft control, renewable energy and smart grid, assistive devices for people with disabilities, and advanced building control. Considering dual iLQG’s ability to handle nonlinear systems and efficiently handle large systems, dual iLQG is in a favourable position to meet these requirements to solve current and future control problems.