Beyond Adaptive Control: A Control Method for Nonlinear Systems With Uncertainties, Applied to COVID-19 | IEEE Journals & Magazine | IEEE Xplore

Closed-loop dual iLQG system diagram.

Abstract:

When the outcome of an action cannot be precisely known, it is difficult to select actions to get a desired result. This problem can be caused by uncertain parameters, such as not knowing how slippery a road is when driving in icy conditions. Adaptive control techniques can estimate uncertainties using past measurements, but the confidence in these estimates is not used to inform future control actions. Dual control, an improvement on adaptive control, can estimate the reductions in uncertainty that will result from control actions and probes the system to identify the uncertain parameters to a sufficient level to optimize the desired goal. However, existing dual control approaches have been computationally intractable for all but the simplest of control problems. Here we show that our novel and computationally efficient dual iterative linear quadratic Gaussian controller outcompetes an adaptive iterative linear quadratic Gaussian controller, using the control of COVID-19 as an example application. The dual controller performed 6.4% better than the adaptive controller in selecting policies to minimize the social and economic costs associated with both the policies and case counts using an established model of COVID-19 with sixteen uncertain parameters. Our results demonstrate that dual control is a powerful control tool that can handle complex, nonlinear, and stochastic systems in a robust and actively adaptive way while improving their performance.
Published in: IEEE Access (Volume: 13)
Page(s): 34667 - 34676
Date of Publication: 13 February 2025
Electronic ISSN: 2169-3536

SECTION I.

Introduction

What if we could take an approach to systems with uncertainties that cautiously probed for information that could improve our confidence in achieving a desirable result? Reflect for a minute on how our knowledge evolved early in the COVID-19 pandemic: there was significant uncertainty about the dynamics of the virus and the effectiveness of the various policies that were implemented to control it. Although we learned over time from historical data, taking an adaptive or passive learning approach, in this paper we make a case for dual control’s planned learning approach for complex nonlinear systems with significant uncertainties, using COVID-19 as an example.

Unlike adaptive control approaches, dual control approaches take actions to learn about uncertainties only when doing so is likely to result in a lower long-term cost. Existing adaptive controllers select control actions solely to achieve the control objective and only consider what can be learned about the uncertain parameters after the control actions have been made. This behaviour is known as passive adaptation, as no planning or effort goes into achieving the reductions in the uncertainties. The ideal adaptive controller not only learns from the results of past decisions, but also plans future decisions to influence outcomes that will be favourable to learning [1]. This planned or active learning is what separates dual control from adaptive control, as uncertainty reduction is considered in the determination of the control policy and optimally balanced with achieving the control objective [2].

The dual objectives of objective function minimization and parameter identification were first identified by Feldbaum in 1960 [3], who recognized that the optimal dual control problem could be solved using stochastic dynamic programming. Dual control exhibits three key features: caution, probing, and selectiveness [1]. Caution is the act of not varying the control significantly when uncertainty is high, probing is the act of actively modifying the control signal to reduce uncertainty, and selectiveness is the act of focusing on identifying parameters of higher importance. Unfortunately, stochastic dynamic programming involves solving the Bellman equations, whose computational cost grows rapidly with the number of states, a problem known as Bellman’s curse of dimensionality [4]. In order to achieve computationally tractable active learning, approximations of dual control must be developed that determine the relative importance of caution, probing, and selectiveness [5].

One approach to approximate dual control is to explicitly include extra terms in the objective function that produce the desired dual features of caution, probing, and selectiveness. Although this method is simple and can easily make existing optimal control methods dual, the relative importance of the regulation and identification functions must be defined [6]. This approach fixes the value of system information relative to the minimization of the rest of the cost function. Even with methods that vary the value of system information based on a specific measure, explicit approaches are likely to overvalue or undervalue system information at different points in time. For this reason, explicit dual approximations are considered inferior to implicit approaches [2].

In implicit approaches to the approximation of dual control, the relative importance of caution, probing, and selectiveness is determined by estimating the probability that high-cost probing actions will reduce total costs over the control horizon. This behaviour comes from approximating the Bellman equations and generally comes at the cost of higher computational effort compared to the explicit approach [7]. There are several existing dual implicit approaches (see [5] for a review), but each only considers limited control or parameter realizations to make the control problem tractable or is only applicable for systems with a limited number of states (e.g., [6], [8], [9]). The existing implicit dual approaches are limited by Bellman’s curse of dimensionality as they do not take a derivative-based approach in continuous state space.

In this paper, we present an implicit dual controller that avoids the curse of dimensionality by extending the iterative linear quadratic Gaussian (iLQG) method [10]. iLQG is a powerful control technique due to its ability to handle nonlinear and stochastic systems with multiplicative noise. Dual iLQG represents a fast (due to the linearization of the system) and feasible (due to working with derivatives about a nominal state-control trajectory) solution to implicit dual control of small and large systems while avoiding Bellman’s curse of dimensionality.

Dual iLQG can handle complex nonlinear systems with many uncertain parameters. To illustrate the significance of this approach to real-world problems, we look to the recent example of determining COVID-19 public policy regulations as our knowledge evolved throughout 2020. The control of the COVID-19 pandemic continues to represent an enormous challenge for governments all over the world. Although policies to limit deaths and hospitalizations have been determined, such as enforcing mask use and lockdowns, these policies have social and economic costs, and precisely how effective these policies are is uncertain [11]. Additionally, there are uncertainties associated with the dynamics of the virus’s spread through populations [12]. Having an objective of balancing deaths and hospitalizations with social and economic costs in a dynamic system that has uncertain parameters makes COVID-19 a suitable application for dual control.

SECTION II.

Methods

A. iLQG

iLQG extends the well-known linear quadratic regulator (LQR) algorithm to systems that are nonlinear, stochastic, and do not have quadratic costs [10]. To handle non-quadratic cost functions, the cost function is “quadratized” about the nominal state-control trajectory. In a similar way, the system dynamics are linearized about this nominal state-control trajectory, allowing a version of the Riccati equation to be applied to the system. Since the states are uncertain, the measurement dynamics are also included in the iLQG algorithm, and a filter is required to estimate the states’ mean values and covariances.

The forward integration of the system dynamics to obtain the nominal state trajectory from the nominal control trajectory and the calculations of the derivatives required for the linearization of the system dynamics as well as the “quadratization” of the cost function are grouped together in what’s known as a forward pass. The next step in the algorithm is the estimator, which uses the noisy measurements to infer the value of the states and their covariances. Next, a backward pass is required to calculate a quadratic approximation to the cost-to-go function, and then the optimal control deviations can be found.
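To make the backward pass concrete, the sketch below runs the Riccati recursion for the simplest possible case: a scalar, linear, deterministic system with quadratic costs. This is only an illustration of the kind of recursion that iLQG applies to the linearized and quadratized problem, not the authors' implementation; the function name `lqr_backward_pass`, the scalar model x_{k+1} = a x_k + b u_k, and all numerical values are assumptions introduced here.

```python
def lqr_backward_pass(a, b, q, r, q_final, horizon):
    """Scalar discrete-time LQR backward pass (illustrative assumption).

    Model: x[k+1] = a*x[k] + b*u[k]; stage cost q*x^2 + r*u^2;
    terminal cost q_final*x^2. Returns feedback gains (u = -gain * x),
    ordered from the first time step to the last, and the cost-to-go
    coefficient at the first time step.
    """
    p = q_final  # quadratic cost-to-go coefficient at the final time
    gains = []
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain  # Riccati update
        gains.append(gain)
    gains.reverse()  # gains were computed from the last step backwards
    return gains, p


gains, cost_to_go = lqr_backward_pass(a=1.0, b=1.0, q=1.0, r=1.0,
                                      q_final=1.0, horizon=50)
```

For these toy values the recursion settles to a constant gain well before the start of the horizon, which is the familiar steady-state LQR behaviour.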

Since the linear approximation of the nonlinear system dynamics loses accuracy for larger control deviations, a line search is implemented to iteratively reduce the control deviations if the solution’s estimated cost is not less than the cost of the initial nominal trajectory, improving the algorithm’s convergence.
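The line search can be sketched as a simple backtracking loop. This is a minimal illustration under stated assumptions, not the authors' code: `backtracking_step`, `toy_cost`, the halving factor, and the number of tries are all hypothetical choices made for the example.

```python
def backtracking_step(u_nominal, du, cost_of, shrink=0.5, max_tries=10):
    """Shrink the control deviation du until the cost beats the nominal cost.

    Returns (new_controls, step_scale). If no improving step is found,
    the nominal controls are returned with a step scale of 0.0.
    """
    nominal_cost = cost_of(u_nominal)
    scale = 1.0
    for _ in range(max_tries):
        candidate = [u + scale * d for u, d in zip(u_nominal, du)]
        if cost_of(candidate) < nominal_cost:
            return candidate, scale
        scale *= shrink  # the local model was not accurate this far out
    return list(u_nominal), 0.0


def toy_cost(u):
    """Hypothetical cost: a quadratic bowl with its minimum at u = 1."""
    return sum((ui - 1.0) ** 2 for ui in u)


# A full step of +4 overshoots the minimum, so the deviation is reduced.
new_u, scale = backtracking_step([0.0], [4.0], toy_cost)
```

In this toy case the full step and the half step are both rejected because they do not lower the cost, and the quarter step is accepted.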

Importantly, iLQG can handle multiplicative noise in both the state and measurement dynamics. Multiplicative noise occurs when there is an undesirable random signal that is multiplied by one or more states or controls of the system [13]. Many stochastic control approaches are based on only additive noise, where the noise term is independent of the state or control vectors, and iLQG’s more general approach is a beneficial feature.
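The difference between additive and multiplicative noise can be seen in a single Euler-Maruyama integration step. The sketch below is an assumption-laden toy, not part of iLQG itself: the scalar drift -x + u, the function name `euler_maruyama_step`, and the scale values are all introduced here for illustration.

```python
import random


def euler_maruyama_step(x, u, dt, rng, additive_scale=0.0,
                        multiplicative_scale=0.0):
    """One stochastic integration step for the toy system dx = (-x + u) dt + noise.

    The additive term has a fixed magnitude, while the multiplicative term
    is scaled by the current state x.
    """
    drift = (-x + u) * dt
    dw = rng.gauss(0.0, dt ** 0.5)  # Brownian increment over dt
    return x + drift + (additive_scale + multiplicative_scale * x) * dw


# With purely multiplicative noise, a state at zero receives no disturbance.
rng = random.Random(0)
at_origin = euler_maruyama_step(0.0, 1.0, 0.1, rng, multiplicative_scale=0.5)
```

With the same random increment, the multiplicative disturbance at x = 10 is ten times the disturbance at x = 1, whereas an additive disturbance would be identical in both cases.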

B. Adaptive iLQG

To extend iLQG to uncertain systems in an adaptive manner, two changes are made to the closed-loop iLQG approach shown in Fig. 1. First, the initial estimates of the uncertain parameters are provided to the iLQG inner loop as constants. Second, in the outer loop, the parameters are estimated along with the states in the filter. This is done by concatenating the uncertain states and parameters together into a single augmented state vector,
\begin{align*} \mathbf{x}^{p^{a}} = \begin{bmatrix} \mathbf{x}^{p} \\ \mathbf{d}^{p} \end{bmatrix} \tag{1}\end{align*}
where \mathbf{x}^{p} is the state vector, \mathbf{d}^{p} is the vector of uncertain parameters, and \mathbf{x}^{p^{a}} is the augmented state vector. The covariance matrices are also combined in a block diagonal manner,
\begin{align*} \Sigma^{\mathbf{x}^{a}} = \begin{bmatrix} \Sigma^{\mathbf{x}} & \mathbf{0} \\ \mathbf{0} & \Sigma^{\mathbf{d}} \end{bmatrix} \tag{2}\end{align*}
where \Sigma^{\mathbf{x}} is the state covariance matrix, \Sigma^{\mathbf{d}} is the parameter covariance matrix, and \Sigma^{\mathbf{x}^{a}} is the augmented covariance matrix. The augmentation of the state vector with the parameters for the closed-loop filter is an approach that is known as joint simultaneous state and parameter estimation [14]. To pass the parameters to the inner loop iLQG algorithm, an augmented constants vector \mathbf{c}^{a} is created similar to the augmented state vector, through the concatenation of the constants \mathbf{c} and the parameters \mathbf{d},
\begin{align*} \mathbf{c}^{a} = \begin{bmatrix} \mathbf{c} \\ \mathbf{d} \end{bmatrix}. \tag{3}\end{align*}
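The augmentation in equations (1) and (2) amounts to stacking vectors and building a block-diagonal matrix. The following sketch uses plain Python lists; the helper names `augment_state` and `block_diag` and the numerical values are assumptions made for illustration only.

```python
def augment_state(x, d):
    """Stack the state vector x and the parameter vector d, as in equation (1)."""
    return list(x) + list(d)


def block_diag(sigma_x, sigma_d):
    """Combine two covariance matrices block-diagonally, as in equation (2)."""
    n, m = len(sigma_x), len(sigma_d)
    top = [list(row) + [0.0] * m for row in sigma_x]
    bottom = [[0.0] * n + list(row) for row in sigma_d]
    return top + bottom


# Hypothetical example: two states and one uncertain parameter.
x = [10.0, 2.0]
d = [0.3]
sigma_x = [[1.0, 0.1], [0.1, 2.0]]
sigma_d = [[0.05]]

x_aug = augment_state(x, d)
sigma_aug = block_diag(sigma_x, sigma_d)
```

The zero off-diagonal blocks encode the initial assumption that state and parameter errors are uncorrelated; a filter operating on the augmented vector is free to build up cross-correlations as measurements arrive.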

FIGURE 1. Closed-loop iLQG system diagram.

This approach allows adaptive iLQG to update its parameter estimates in the outer loop after getting new measurements and pass them to the inner loop iLQG algorithm as constants. Critically, because the parameters are treated as constants in the inner loop iLQG algorithm, the uncertainty associated with the parameters does not impact the determination of the control policy, which is known as the certainty equivalence principle. These changes are shown in the system diagram for closed-loop adaptive iLQG in Fig. 2.

FIGURE 2. Closed-loop adaptive iLQG system diagram. Differences from the non-adaptive system diagram are shown in blue.

C. Dual iLQG

To extend this adaptive iLQG approach to be dual, the uncertainty associated with the parameters must influence the control policy. Therefore, instead of treating the parameters as constants in the inner loop iLQG algorithm, the parameters are treated as states and an augmented state vector is formed as shown in equation (1). An augmented state covariance is also created as shown in (2), and the augmented state dynamics must be formed as
\begin{align*} d\mathbf{x}^{a^{p}} = \begin{bmatrix} \mathbf{f}(\mathbf{x}^{a^{p}}, \mathbf{u}^{p}) \\ \mathbf{f}_{d}(\mathbf{x}^{a^{p}}, \mathbf{u}^{p}) \end{bmatrix} dt = \mathbf{f}^{a}(\mathbf{x}^{a^{p}}, \mathbf{u}^{p}) dt \tag{4}\end{align*}
where \mathbf{f}_{d}(\mathbf{x}^{a^{p}}, \mathbf{u}^{p}) is the parameter dynamics and \mathbf{f}^{a}(\mathbf{x}^{a^{p}}, \mathbf{u}^{p}) is the augmented system dynamics. The inclusion of the parameter dynamics makes dual iLQG able to handle time-varying parameters. These changes are represented in the system diagram in Fig. 3.
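Equation (4) can be read as stacking the state dynamics on top of the parameter dynamics. A minimal sketch follows, assuming a toy scalar system with one constant parameter; the names `make_augmented_dynamics`, `f`, and `f_d` and the toy model itself are illustrative assumptions, not the paper's system.

```python
def make_augmented_dynamics(f, f_d):
    """Return a function that stacks state and parameter rates, as in equation (4)."""
    def f_aug(x_aug, u):
        return list(f(x_aug, u)) + list(f_d(x_aug, u))
    return f_aug


def f(x_aug, u):
    """Toy state dynamics (assumption): x_dot = -theta * x + u."""
    x, theta = x_aug
    return [-theta * x + u[0]]


def f_d(x_aug, u):
    """Constant parameter: its rate of change is zero.

    A time-varying parameter would return a nonzero rate here.
    """
    return [0.0]


f_aug = make_augmented_dynamics(f, f_d)
rates = f_aug([2.0, 0.5], [3.0])  # augmented state is [x, theta]
```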

FIGURE 3. Closed-loop dual iLQG system diagram. Differences from the adaptive system diagram are shown in red.

In this way, the iLQG algorithm treats the parameters as unmeasured states and allows the control algorithm to predict how changes to the inputs and states can result in future reductions in the parameter uncertainty, through the backward pass of the cost-to-go function, to lower the total cost of the control trajectory. The parameter uncertainty influences the control actions through the inner loop Kalman filter gain, which impacts the estimates of the augmented state vector and the cost function at each time step. By calculating the derivatives of the cost function at each time step, dual iLQG can identify changes to the inputs that can decrease parameter uncertainties while also decreasing the total cost of the control trajectory. Dual iLQG is an improvement on wide-sense dual control [15] and the dual controller presented in [1] in that dual iLQG is an implicit dual approximation instead of an explicit one and can handle multiplicative noise, which is common in many applications.
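Why the choice of control can change how much is learned about a parameter can be seen in a one-line Kalman update. The sketch below is a deliberately simple scalar toy, not the dual iLQG algorithm: it assumes a measurement y = theta * u + noise, and the function name `posterior_parameter_variance` and all values are hypothetical.

```python
def posterior_parameter_variance(prior_var, u, meas_noise_var):
    """Kalman variance update for theta, given a measurement y = theta * u + noise.

    The measurement sensitivity to theta is u, so a larger input magnitude
    yields a more informative measurement.
    """
    gain = prior_var * u / (prior_var * u * u + meas_noise_var)
    return (1.0 - gain * u) * prior_var


no_probe = posterior_parameter_variance(1.0, 0.0, 1.0)  # nothing is learned
weak = posterior_parameter_variance(1.0, 0.1, 1.0)      # small input
strong = posterior_parameter_variance(1.0, 1.0, 1.0)    # large input
```

In this toy, a zero input leaves the parameter variance unchanged, while a unit input halves it. A dual controller weighs this kind of predicted uncertainty reduction against the cost of applying the larger input.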

D. System Model

To model the spread of infectious diseases through populations, compartmental models have been used since 1927 [16]. These models divide a population into a series of compartments that represent stages of the disease and then describe how these groups change over time. One of the simplest of these compartmental models is the SIR model [16], shown in Fig. 4, where a population is divided into being Susceptible (S), Infected (I), or Removed (R) (that is, recovered or deceased). The movement of the population through these states can then be represented graphically as arrows between circles for each compartment (state), and these population flows can then be described with equations based on the states themselves as well as parameters and controls. These parameters generally represent infection and fatality rates for different populations, and the controls are methods of influencing these dynamics. For instance, the SIR model can be expressed as
\begin{align*} \dot{S} & = -\frac{\beta S I}{N}, \tag{5}\\ \dot{I} & = \frac{\beta S I}{N} - \gamma I, \tag{6}\\ \dot{R} & = \gamma I, \tag{7}\end{align*}
where \beta is the transmission rate between the infected and susceptible populations, \gamma is the inverse infectious period, and N is the total population (S + I + R = N).
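Equations (5) to (7) can be integrated numerically with a simple forward Euler scheme. The sketch below is illustrative only; the function names, the step size, and the values of \beta and \gamma are assumptions chosen for the example rather than values from the paper.

```python
def sir_rates(s, i, r, beta, gamma):
    """Right-hand sides of equations (5) to (7)."""
    n = s + i + r
    new_infections = beta * s * i / n
    removals = gamma * i
    return -new_infections, new_infections - removals, removals


def simulate_sir(s, i, r, beta, gamma, dt, steps):
    """Forward Euler integration of the SIR model."""
    for _ in range(steps):
        ds, di, dr = sir_rates(s, i, r, beta, gamma)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return s, i, r


# Hypothetical values: 1000 people, 10 initially infected, 100 days.
s_end, i_end, r_end = simulate_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1,
                                   dt=0.1, steps=1000)
```

Because the three rates sum to zero, the total population S + I + R stays constant throughout the simulation.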

FIGURE 4. SIR model compartmental diagram [16].

These compartmental models have been tailored to better represent the dynamics of the COVID-19 virus, with different compartments or states being considered by different researchers. In [11], eight compartments are considered: Susceptible, Infected (those that are asymptomatic, infected, and undetected), Diagnosed (those that are asymptomatic, infected, and detected), Ailing (those that are symptomatic, infected, and undetected), Recognized (those that are symptomatic, infected, and detected), Threatened (those that are acutely symptomatic, infected, and detected), Healed (either after being detected or not, and assumed immune after being infected), and Extinct (and assumed to be detected), giving the SIDARTHE model shown in Fig. 5.

FIGURE 5. SIDARTHE model compartmental diagram [11].

In the SIDARTHE model, the infected populations other than the threatened population infect the susceptible population with different rates of transmission. For infected individuals, 5 different transitions between populations are considered, shown in different colours in Fig. 5: developing symptoms, getting diagnosed, getting healed, becoming critical, or dying. With the parameters shown in Fig. 5 describing these transitions between the states, the SIDARTHE model can be expressed as
\begin{align*}
\dot{S} & = -S \left( \alpha I + \beta D + \gamma A + \beta R \right), \tag{8}\\
\dot{I} & = S \left( \alpha I + \beta D + \gamma A + \beta R \right) - \left( \epsilon + \zeta + \lambda \right) I, \tag{9}\\
\dot{D} & = \epsilon I - \left( \zeta + \lambda \right) D, \tag{10}\\
\dot{A} & = \zeta I - \left( \theta + \mu + \kappa \right) A, \tag{11}\\
\dot{R} & = \zeta D + \theta A - \left( \mu + \kappa \right) R, \tag{12}\\
\dot{T} & = \mu A + \mu R - \left( \sigma(T) + \tau(T) \right) T, \tag{13}\\
\dot{H} & = \lambda I + \lambda D + \kappa A + \kappa R + \sigma(T) T, \tag{14}\\
\dot{E} & = \tau(T) T, \tag{15}
\end{align*}
where the description of the parameters can be found in Table 1.
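The eight rate equations can be written as a single function, which also makes it easy to check that the flows balance. In the sketch below, the outflow from the Infected compartment is taken as (\epsilon + \zeta + \lambda) I so that every term leaving one compartment enters another. The function name `sidarthe_rates`, the use of constant \sigma and \tau, the population-fraction state, and all parameter values are illustrative assumptions, not the values used in the paper.

```python
def sidarthe_rates(state, p, sigma, tau):
    """Rates of change for the eight SIDARTHE compartments.

    state is [S, I, D, A, R, T, H, E]; sigma and tau are treated as
    constants here, although the paper makes them depend on T.
    """
    S, I, D, A, R, T, H, E = state
    infection = S * (p["alpha"] * I + p["beta"] * D
                     + p["gamma"] * A + p["beta"] * R)
    dS = -infection
    dI = infection - (p["epsilon"] + p["zeta"] + p["lambda"]) * I
    dD = p["epsilon"] * I - (p["zeta"] + p["lambda"]) * D
    dA = p["zeta"] * I - (p["theta"] + p["mu"] + p["kappa"]) * A
    dR = p["zeta"] * D + p["theta"] * A - (p["mu"] + p["kappa"]) * R
    dT = p["mu"] * (A + R) - (sigma + tau) * T
    dH = p["lambda"] * (I + D) + p["kappa"] * (A + R) + sigma * T
    dE = tau * T
    return [dS, dI, dD, dA, dR, dT, dH, dE]


# Hypothetical parameter values and a state given as population fractions.
params = {"alpha": 0.57, "beta": 0.011, "gamma": 0.456, "epsilon": 0.171,
          "zeta": 0.125, "lambda": 0.034, "theta": 0.371, "mu": 0.017,
          "kappa": 0.017}
state = [0.99, 0.005, 0.001, 0.002, 0.001, 0.0005, 0.0004, 0.0001]
rates = sidarthe_rates(state, params, sigma=0.017, tau=0.01)
```

Summing the eight rates gives zero up to rounding, which confirms that the model only moves people between compartments.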

TABLE 1 Parameters for the SIDARTHE Model

In the SIDARTHE model, the recovery and mortality rates of the Threatened state are modeled as being dependent on the Threatened state in order to represent the impact of the health care system being overwhelmed. In [11], this effect was achieved in a two-step process, whereby a model was created where the Threatened population was divided into those in the limited-capacity intensive care unit (ICU) and those not, and then this model was simplified to maintain the eight states described above. The compartmental diagrams for these two steps can be seen in Fig. 6.

FIGURE 6. Partial compartmental diagram for considering the impact of an overwhelmed ICU [11].

Defining T_{1} as the Threatened population that does not require ICU treatment and T_{2} as the population that does, and assuming that there are no transfers between these two populations, the dynamics of these states can be represented as
\begin{align*} \dot{T}_{1} & = \mu_{1} \left( A + R \right) - \left( \sigma_{1} + \tau_{1} \right) T_{1}, \tag{16}\\ \dot{T}_{2} & = \mu_{2} \left( A + R \right) - \left( \sigma_{2}(T_{2}) + \tau_{2}(T_{2}) \right) T_{2}, \tag{17}\end{align*}
where \sigma_{1} and \tau_{1} are independent parameters and \sigma_{2} and \tau_{2} are dependent on T_{2}. This model is approximated by a lumped model as shown on the right of Fig. 6, for a defined ICU capacity of T_{ICU}, by assuming that if T_{2} \leq T_{ICU},
\begin{align*} \tau(T) & = \frac{\mu_{1}}{\mu} \tau_{1} + \frac{\mu_{2}}{\mu} \tau_{2}, \tag{18}\\ \sigma(T) & = \frac{\mu_{1}}{\mu} \sigma_{1} + \frac{\mu_{2}}{\mu} \sigma_{2}, \tag{19}\end{align*}
but if T_{2} \gt T_{ICU}, \tau(T) increases to \tau_{crit} for the remaining T_{2} population who require ICU treatment but cannot access it, and the recovery rate for this group also drops to 0. The \tau(T)T and \sigma(T)T terms can therefore be represented as
\begin{align*} \tau(T)T & = \frac{\mu_{1}}{\mu} \tau_{1} T + \max \left\lbrace \frac{\mu_{2}}{\mu} \tau_{2} T,\ \tau_{2} T_{ICU} + \tau_{crit} \left( \frac{\mu_{2}}{\mu} T - T_{ICU} \right) \right\rbrace, \tag{20}\\ \sigma(T)T & = \frac{\mu_{1}}{\mu} \sigma_{1} T + \sigma_{2} \min \left\lbrace \frac{\mu_{2}}{\mu} T,\ T_{ICU} \right\rbrace, \tag{21}\end{align*}
and used in equations (13) to (15).
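The ICU saturation logic can be sketched as two small functions that return the mortality flow \tau(T)T and the recovery flow \sigma(T)T. The sketch assumes \mu = \mu_{1} + \mu_{2}, so that (\mu_{2}/\mu) T is the share of the Threatened population needing ICU care, and it assumes both flows scale with T; the function names and all numerical values are hypothetical.

```python
def mortality_flow(T, mu1, mu2, tau1, tau2, tau_crit, T_icu):
    """tau(T) * T with ICU saturation.

    Assumes mu = mu1 + mu2 and tau_crit > tau2, so the max picks the
    saturated branch only when ICU demand exceeds capacity.
    """
    mu = mu1 + mu2
    T2 = mu2 / mu * T  # Threatened population needing ICU care
    saturated = tau2 * T_icu + tau_crit * (T2 - T_icu)
    return mu1 / mu * tau1 * T + max(tau2 * T2, saturated)


def recovery_flow(T, mu1, mu2, sigma1, sigma2, T_icu):
    """sigma(T) * T: only patients who obtain an ICU bed recover at rate sigma2."""
    mu = mu1 + mu2
    return mu1 / mu * sigma1 * T + sigma2 * min(mu2 / mu * T, T_icu)


# Hypothetical values: half of Threatened cases need ICU, capacity of 100 beds.
below = mortality_flow(100.0, 0.5, 0.5, 0.01, 0.02, 0.1, 100.0)
above = mortality_flow(400.0, 0.5, 0.5, 0.01, 0.02, 0.1, 100.0)
recovered_above = recovery_flow(400.0, 0.5, 0.5, 0.05, 0.04, 100.0)
```

Below capacity the flows reduce to the lumped rates of equations (18) and (19) multiplied by T; above capacity the mortality flow grows at the critical rate for every patient beyond T_{ICU}, and the ICU recovery flow is capped.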

To implement controls in this model, public health policies are assumed to directly influence the infection rates \alpha and \gamma, and this relationship can be modeled as
\begin{align*} \alpha(t) & = \alpha_{\max} + \left( \alpha_{\min} - \alpha_{\max} \right) u(t), \tag{22}\\ \gamma(t) & = \gamma_{\max} + \left( \gamma_{\min} - \gamma_{\max} \right) u(t), \tag{23}\end{align*}
where u(t) is constrained to [0, 1] and can be used to vary the infection rates between minimum values of \alpha_{\min} and \gamma_{\min} with u = 1 and maximum values of \alpha_{\max} and \gamma_{\max} with u = 0.

E. SIDARTHE Model Limitations

Although the SIDARTHE model is able to capture the major aspects of the dynamics of COVID-19, there are several simplifications that were made to make the model less complex. First of all, this model represents the population as static, other than deaths due to COVID-19. SIDARTHE does not include population changes due to travel, births, or non-COVID related deaths. A more complex model that did include non-COVID related deaths would also be an interesting application for dual control.

Additionally, the SIDARTHE model only represents the public health policies as a single control input, lumping the impact of these policies into a single value representing the severity of the restrictions. Although this makes the implementation of the model much easier, it would be difficult for health agencies to get precise recommendations from such a lumped term. This single control action also limits the potential probing that a dual control method could implement, as in reality there are multiple policies that can be varied over time. For instance, media campaigns, enforcing social distancing and mask use, performing asymptomatic testing, performing symptomatic testing, quarantining of positive cases, increasing non-ICU hospital resources, and increasing ICU resources could all be considered independent controls, and the extension of the SIDARTHE model to include these controls is discussed in the next section.

F. Changes to the SIDARTHE Model

We extended the SIDARTHE model to have separate control inputs representing different types of public health policies, using an approach similar to that of equations (22) and (23). The public health policies that were considered were media campaigns (u_{1}), enforcing social distancing and mask use (u_{2}), performing asymptomatic testing (u_{3}), performing symptomatic testing (u_{4}), and quarantining of positive cases (u_{5}). Since many of these public health policies influence more than one parameter in the SIDARTHE model to varying levels, effectiveness parameters were introduced. For instance, both media campaigns and enforcing social distancing and mask use will lower \alpha, but the media campaigns may do so less effectively. The impact of the control inputs on the SIDARTHE model parameters can therefore be represented as
\begin{align*}
\alpha & = \alpha_{\max} + \left( \alpha_{\min} - \alpha_{\max} \right) \left( \eta_{\alpha_{1}} u_{1} + \eta_{\alpha_{2}} u_{2} \right), \tag{24}\\
\beta & = \beta_{\max} + \left( \beta_{\min} - \beta_{\max} \right) \left( \eta_{\beta_{1}} u_{1} + \eta_{\beta_{2}} u_{2} + \eta_{\beta_{5}} u_{5} \right), \tag{25}\\
\gamma & = \gamma_{\max} + \left( \gamma_{\min} - \gamma_{\max} \right) \left( \eta_{\gamma_{1}} u_{1} + \eta_{\gamma_{2}} u_{2} \right), \tag{26}\\
\epsilon & = \epsilon_{\min} + \left( \epsilon_{\max} - \epsilon_{\min} \right) \left( \eta_{\epsilon_{1}} u_{1} + \eta_{\epsilon_{3}} u_{3} \right), \tag{27}\\
\theta & = \theta_{\min} + \left( \theta_{\max} - \theta_{\min} \right) \left( \eta_{\theta_{1}} u_{1} + \eta_{\theta_{4}} u_{4} \right), \tag{28}
\end{align*}
where the \eta terms are effectiveness factors for the controls, describing the influence they have on the SIDARTHE model parameters. For each of the SIDARTHE model parameters, the \eta terms sum to 1 such that the maximum and minimum values are maintained. For instance,
\begin{equation*} \eta_{\beta_{1}} + \eta_{\beta_{2}} + \eta_{\beta_{5}} = 1, \tag{29}\end{equation*}
and therefore when these effectiveness factors are used as uncertain parameters in dual iLQG, one of the factors from each set can be determined from the others.
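The mapping from control inputs to a model parameter in equations (24) to (28) has the same form for every parameter, so it can be sketched with one helper. The function name `controlled_parameter`, the dictionary layout, and the numerical values of the \eta factors and of the \beta bounds are assumptions made for this example.

```python
def controlled_parameter(value_at_u0, value_at_u1, eta, u):
    """Interpolate a parameter between its values at zero and full control.

    eta maps a control index to its effectiveness factor and u maps a
    control index to its value in [0, 1]. The factors must sum to 1,
    as in equation (29).
    """
    if abs(sum(eta.values()) - 1.0) > 1e-9:
        raise ValueError("effectiveness factors must sum to 1")
    effort = sum(weight * u[index] for index, weight in eta.items())
    return value_at_u0 + (value_at_u1 - value_at_u0) * effort


# Hypothetical values for beta, which depends on u1, u2, and u5 (equation (25)).
eta_beta = {1: 0.2, 2: 0.5, 5: 0.3}
beta_max, beta_min = 0.4, 0.1
no_controls = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
full_controls = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}

beta_open = controlled_parameter(beta_max, beta_min, eta_beta, no_controls)
beta_closed = controlled_parameter(beta_max, beta_min, eta_beta, full_controls)
```

For the testing rates \epsilon and \theta, the same helper applies with the minimum value passed as `value_at_u0` and the maximum as `value_at_u1`, since those rates increase with control effort.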

G. Controller Initialization

The states are constrained to [0, 82999999], and the true and estimated initial values of the state vector are both [82998999, 1000, 0, 0, 0, 0, 0, 0]^{\top}. The 5 control inputs are constrained to [0, 1]. The true and initial estimated values of the uncertain parameters used in this simulation are shown in Table 2. This system is simulated over 80 time steps that are 1.0 days long, with a rolling horizon of 40 time steps. The Diagnosed, Recognized, Threatened, and Extinct states are measured with a noise scaling factor of \mathrm{diag}([10, 7.5, 5, 2.5]). The noise scaling factor for dual iLQG was set to 10^{-5} for the state and parameter dynamics, and the noise was not implemented in the true system in the outer loop, which enables a deterministic comparison of the behaviour of the controllers. The cost function is given as
\begin{equation*} J(\mathbf{x}, \mathbf{u}) = c_{x} \mathbf{x} + c_{u} \mathbf{u}^{2}, \tag{30}\end{equation*}
where c_{x} = [0, 0, 0, 0, 0, 0.033, 0, 0.267] and c_{u} = [0.01, 10, 0.75, 0.75, 0.5]. This is a representative cost function; its values would have to be set by policymakers, but the benefits illustrated in this work are robust to various cost functions.
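As a worked example of equation (30), the sketch below evaluates the cost at a single time step. It assumes that c_{x} \mathbf{x} is an inner product and that \mathbf{u}^{2} denotes the element-wise square of the control vector; the function name `running_cost` and the example state and control values are hypothetical.

```python
C_X = [0, 0, 0, 0, 0, 0.033, 0, 0.267]  # weights on [S, I, D, A, R, T, H, E]
C_U = [0.01, 10, 0.75, 0.75, 0.5]       # weights on [u1, u2, u3, u4, u5]


def running_cost(x, u):
    """Cost of equation (30) at one time step, under the assumptions above."""
    state_cost = sum(c * xi for c, xi in zip(C_X, x))
    control_cost = sum(c * ui ** 2 for c, ui in zip(C_U, u))
    return state_cost + control_cost


# Hypothetical state: 100 Threatened and 10 Extinct; media campaigns fully on.
x_example = [0, 0, 0, 0, 0, 100.0, 0, 10.0]
u_example = [1.0, 0.0, 0.0, 0.0, 0.0]
cost = running_cost(x_example, u_example)
```

Only the Threatened and Extinct states carry a cost, and the weight on u_{2} is far larger than the weight on u_{1}, which is consistent with media campaigns being the cheapest control and distancing and mask enforcement the most expensive.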

TABLE 2 Parameter Values for the SIDARTHE Comparison With Sixteen Uncertainties

SECTION III.

Results

To show the ability of dual iLQG to handle many uncertain parameters, it was compared to iLQG and adaptive iLQG for the modified SIDARTHE model with 16 uncertain parameters. The SIDARTHE model used for this comparison had 5 control inputs, u_{1} to u_{5}, as described in equations (24) to (28).

The dual controller was able to reduce Threatened and Extinct cases of COVID-19, resulting in a final cost that was 6.4% lower than that of the adaptive controller, as shown in Fig. 7. The dual iLQG controller did not start outperforming the adaptive iLQG controller until day 18 and did not outperform the iLQG controller until day 50. By the end of the 80-day simulation, the adaptive iLQG controller only outperformed the iLQG controller by 0.4%.

FIGURE 7. Cost comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.

The controls resulting from the three iLQG algorithms are shown in Fig. 8. Because media campaigns (u_{1}) have a low cost coefficient, all three controllers maximize their use, with only the iLQG controller stopping them for a single time step. Enforcing social distancing and mask use (u_{2}) had the highest cost coefficient, and while on average the dual and adaptive controllers used this control nearly the same amount (0.7% difference), the iLQG controller used it 9% more. The use of asymptomatic testing (u_{3}) had the highest discrepancy between the controllers of any of the controls. Although the average use of asymptomatic testing by the adaptive iLQG and dual iLQG controllers differed by only 0.8%, they used it 80% more than the iLQG controller. For symptomatic testing (u_{4}), the three controllers show roughly similar results, although the dual iLQG controller used it on average 12.6% less than the adaptive iLQG controller but 65.9% more often than the iLQG controller. Lastly, the quarantining of positive cases (u_{5}) was used equally by the three controllers for the first 37 days, after which the adaptive iLQG controller used it 2.8% less on average, and the iLQG controller used it 76% less.

FIGURE 8. Control comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.

Figure 9 shows the true and estimated states for each algorithm, along with the covariance of the estimates. Although the dual controller used lower control inputs, the case counts for dual iLQG are lower than or similar to those of the adaptive case. The dual controller has fewer case counts for the 6th and 8th states (Threatened and Extinct), which have non-zero cost terms.

FIGURE 9. State comparison between dual and adaptive iLQG on modified SIDARTHE COVID-19 model with 16 uncertain parameters.

Comparing these results with a two-parameter simulation of the SIDARTHE COVID-19 model that had the same settings other than the number of parameters gives a sense of how the dual iLQG algorithm overcomes the curse of dimensionality. The simulation times for dual iLQG in this application with two and sixteen parameters were 973 and 2176 seconds, respectively. The sixteen-parameter time is roughly 2.2 times that of the two-parameter case, even though the number of parameters is eight times greater. This is not the exponential increase in simulation time that would be expected if Bellman’s curse of dimensionality held.
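The scaling comparison is simple arithmetic on the two reported simulation times. The short check below uses only the figures quoted above; the variable names are introduced here for illustration.

```python
time_two_params = 973.0       # seconds, two uncertain parameters
time_sixteen_params = 2176.0  # seconds, sixteen uncertain parameters

time_ratio = time_sixteen_params / time_two_params  # growth in run time
parameter_ratio = 16 / 2                            # growth in parameter count
```

The run time grows by a factor of a little over two while the parameter count grows by a factor of eight, so in this comparison the run time grows more slowly than the number of parameters.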

SECTION IV.

Discussion

As shown in this example, dual iLQG is able to control systems with uncertain parameters in such a way that the reduction of uncertainty is implicit in the minimization of a given cost function. The dual goals of system identification and objective function minimization are often at odds with each other, and this tension creates a set of three features that characterize dual control. Dual control demonstrates caution, minimizing the magnitude of control actions when uncertainties are high; probing, varying the control actions to gain information about the uncertain parameters; and selectiveness, only seeking to gain information on those parameters which are likely to cause a reduction in future costs [1].

This COVID-19 application demonstrates how dual iLQG can be used to inform government policy for uncertain nonlinear systems. The purpose of using COVID-19 as an application for dual control here is not to model the spread of the virus through a real population or to suggest that dual control could have saved lives, but to apply dual control to a complex nonlinear system that people understand. To that end, values for the states, parameters, and cost-weighting factors from the literature were used, and for the changes we made to the SIDARTHE COVID-19 model with the control effectiveness parameters, illustrative values were used.

In this example, dual iLQG outperformed adaptive iLQG by 6.4%, but it is not guaranteed to outperform other methods in every application as the probing nature of dual control does not always result in lower long-term costs. Dual iLQG also requires a dynamic model for each system that it is applied to, and the noise characteristics of that system need to be known. In the absence of a white box dynamic model, Gaussian Process regression can be used to provide the system dynamics [1], [17].

We also explored a comparison with another implicit dual control method, dual multi-stage NMPC [18], but the method could only accommodate a small subset of the uncertain parameters used in this example. In a comparison of adaptive and dual iLQG with dual multi-stage NMPC on the SIDARTHE COVID-19 model with only two uncertain parameters, dual iLQG outperformed adaptive iLQG and dual MS-SP-NMPC by 28% and 66% respectively [19].

Since uncertainty is common to many systems, dual iLQG is applicable to a wide spectrum of applications. In the 2017 review paper “Systems and Control for the future of humanity, research agenda: Current and future roles, impact and grand challenges” by Lamnabhi-Lagarrigue et al. [20], three requirements are listed that “call for a paramount role for data-driven modeling, which must be integrated into virtually all future complex engineering systems.” Dual iLQG addresses two of these three requirements, specifically the need for models to adapt to changing parameters as well as the need for approaches that enable active learning, which is described as “probing the system/environment to generate sensor information that is suitable for model adaptation”. The review paper by Lamnabhi-Lagarrigue et al. mentions a number of high-impact system and control applications for the future, and dual iLQG could be applied to many of them, including automotive control, spacecraft control, renewable energy and smart grid, assistive devices for people with disabilities, and advanced building control. Considering dual iLQG’s ability to handle nonlinear systems and efficiently handle large systems, dual iLQG is in a favourable position to meet these requirements to solve current and future control problems.
