SECTION I

INTRODUCTION

ONE OF THE fundamental principles of feedback theory is that the problems of optimal control and state estimation can be decoupled in certain cases [30]. This is known as the separation principle. The concept was coined early on in [17], [32] and is closely connected to the idea of certainty equivalence; see, e.g., [38]. In studying the literature on the separation principle of stochastic control, one is struck by the level of sophistication and technical complexity. The source of the difficulties can be traced to the circular dependence between control and observations. The goal of this paper is to present a rigorous approach to the separation principle in continuous time which is rooted in the engineering view of systems as maps between signal spaces.

The most basic setting begins with a linear system $$\cases{dx=A(t)x(t)dt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=C(t)x(t)dt+D(t)dw}\eqno{\hbox{(1)}}$$ with a state process $x$, an output process $y$ and a control $u$, where $w(t)$ is a vector-valued Wiener process, $x(0)$ is a zero-mean Gaussian random vector independent of $w(t)$, $y(0)=0$, and $A$, $B_{1}$, $B_{2}$, $C$, $D$ are matrix-valued functions of compatible dimensions, which we take to be continuous and of bounded variation. Moreover, $DD^{\prime}$ is nonsingular on the interval $[0,T]$, and if we want the noise processes in the state and output equations to be independent, as is often assumed but not required here, we take $B_{2}D^{\prime}\equiv 0$. All random variables and processes are defined over a common complete probability space $(\Omega,{\cal F},\mathbb{P})$.

The control problem is to design an output feedback law $$\pi:y\mapsto u\eqno{\hbox{(2)}}$$ over the window $[0,T]$ which maps the observation process $y$ to the control input $u$, in a nonanticipatory manner, so that the value of the functional $$J(u)=E\left\{\int_{0}^{T}x(t)^{\prime}Q(t)x(t)dt+\int_{0}^{T}u(t)^{\prime}R(t)u(t)dt+x(T)^{\prime}Sx(T)\right\}\eqno{\hbox{(3)}}$$ is minimized, where $Q$ and $R$ are continuous matrix functions of bounded variation, $Q(t)$ is positive semi-definite and $R(t)$ is positive definite for all $t$. How to choose the admissible class of control laws $\pi$ has been the subject of much discussion in the literature [27]. The conclusion, under varying conditions, has been that $\pi$ can be chosen to be linear in the data and, more specifically, in the form $$u(t)=K(t)\hat{x}(t),\eqno{\hbox{(4)}}$$ where $\hat{x}(t)$ is the Kalman estimate of the state vector $x(t)$ obtained from the Kalman filter $$d\hat{x}=A(t)\hat{x}(t)dt+B_{1}(t)u(t)dt+L(t)(dy-C(t)\hat{x}(t)dt),\quad\hat{x}(0)=0,\eqno{\hbox{(5)}}$$ and the gains $K$ and $L$ are computed by solving a pair of dual Riccati equations.

A result of this kind is far from obvious, and the early literature was marred by treatments of the separation principle where the non-Gaussian element introduced by an a priori nonlinear control law $\pi$ was overlooked. The subtlety lies in excluding the possibility that a nonlinear controller extracts more information from the data than is otherwise possible. This point will be explained in detail in Section II, where a brief historical account of the problem will be given. Early expositions of the separation principle often fall into one of two categories: either the subtle issues are overlooked and inadmissible shortcuts are taken; or the treatment is mathematically quite sophisticated and technically very demanding. The short survey in Section II will thus serve the purpose of introducing the theoretical challenges at hand, as well as setting up notation.

In this paper we take the point of view that feedback laws (2) should act on sample paths of the stochastic process $y$ rather than on the process itself. This is motivated by engineering thinking where systems and feedback loops process signals. Thus, our key assumption on admissible control laws (2) is that the resulting feedback loop is deterministically well-posed in the sense that the feedback equations admit a unique solution that causally depends on the input for each input sample path. For this class of control laws we prove that the separation principle stated above holds and moreover that it extends to systems driven by general martingale noise. More precisely, in this non-Gaussian situation the Wiener process $w$ in (1) is replaced by an arbitrary (square-integrable) martingale process with possible jumps such as a Poisson process martingale; see, e.g., [19, p. 87]. Then, we only need to replace the (linear) Kalman estimate $\hat{x}$ by the strict sense conditional mean $$\hat{x}(t)=E\{x(t)\mid{\cal Y}_{t}\},\eqno{\hbox{(6)}}$$ where $${\cal Y}_{t}:=\sigma\{y(\tau),\tau\in [0,t]\},\quad 0\leq t\leq T,\eqno{\hbox{(7)}}$$ is the filtration generated by the output process; i.e., the family of increasing sigma fields representing the data as it is produced. The estimate $\hat{x}$ needs to be defined with care so that it constitutes a sufficiently regular stochastic process and is realized by a map acting on observations [2, page 17], [11]. Unfortunately, the results in the present paper come at a cost since our key assumption of well-posedness excludes control laws for which the feedback system fails to be defined sample-wise. Existence of strong solutions of the feedback equations is not enough to ensure well-posedness in our sense as we will discuss below. In addition, the condition of deterministic well-posedness is often difficult to verify. Yet, besides the fact that we prove the separation principle for general martingale noise, the sample-wise viewpoint provides a simple explanation of why the separation principle may hold in the first place.

Before proceeding we recast the system model (1) in an integrated form which allows similar conclusions for more general linear systems in a unified setting. To this end, let $z(t)=\left(\matrix{x(t)\cr y(t)}\right)$. System (1) can now be expressed in the form $$\cases{z(t)=z_{0}(t)+\int_{0}^{t}G(t,\tau)u(\tau)d\tau\cr y(t)=Hz(t),}\eqno{\hbox{(8)}}$$ where $z_{0}$ is the process $z$ obtained by setting $u=0$ and $G$ is a Volterra kernel. This integrated form encompasses a considerably wider class of controlled linear systems including delay-differential equations, following [26], [27], which will be taken up in Section VI. The corresponding feedback configuration is shown in Fig. 1 where $$g:(t,u)\mapsto\int_{0}^{t}G(t,\tau)u(\tau)d\tau\eqno{\hbox{(9)}}$$ is a Volterra operator and $H$ is a constant matrix. As usual, Fig. 1 is a graphical representation of the algebraic relationship $$z=z_{0}+g\pi H z.\eqno{\hbox{(10)}}$$ For the particular model in (1), $H=[0,I]$, but in general $H$ could be any matrix or linear system. Setting $z:=x$ and $H=I$ we obtain the special case of complete state information.

Fig. 1. A feedback interconnection.

In a stochastic setting, the feedback equation (10) is said to have a unique strong solution if there exists a non-anticipating function $F$ such that $z=F(z_{0})$ satisfies (10) with probability one and all other solutions coincide with $z$ with probability one. It is important to note that in our sample-wise setting we require more, namely that such a unique solution exists and that (10) holds for all $z_{0}$, not only “almost all.” Consequences of this requirement will be further elaborated upon below.

The outline of the paper is as follows. In Section II we begin by reviewing the standard quadratic regulator problem and pointing out subtleties created by a possible nonlinear control law. We then review several strategies in the literature to establish a separation principle, chiefly restricting the class of admissible controls. Section III defines notions of signals and systems used in our framework, and in Section IV we establish necessary conditions for a feedback loop to make sense and deduce a basic fact about propagation of information in the loop through linear components. In Section V we state and prove our main results on the separation principle for linear-quadratic regulator problems, allowing also for more general martingale noise. Finally, in Section VI we prove a separation theorem for delay systems with Gaussian martingale noise.

SECTION II

HISTORICAL REMARKS

A common approach to establishing the basic separation principle stated at the beginning of Section I is a completion-of-squares argument similar to the one used in deterministic linear-quadratic-regulator theory; see e.g., [1]. For ease of reference, we briefly review this construction. Given the system (1) and the solution of the matrix Riccati equation $$\cases{\dot{P}=-A^{\prime}P-PA+PB_{1}R^{-1}B_{1}^{\prime}P-Q,\cr P(T)=S,}\eqno{\hbox{(11a)}}$$ Itô's differential rule (see, e.g., [19], [31]) yields $$d(x^{\prime}Px)=x^{\prime}\dot{P}xdt+2x^{\prime}Pdx+{\rm tr}(B_{2}^{\prime}PB_{2})dt,$$ where ${\rm tr}(M)$ denotes the trace of the matrix $M$. Then from (1) and (11a) it readily follows that $$d(x^{\prime}Px)=[-x^{\prime}Qx-u^{\prime}Ru+(u-Kx)^{\prime}R(u-Kx)]dt+{\rm tr}(B_{2}^{\prime}PB_{2})dt+2x^{\prime}PB_{2}dw,$$ where $$K(t):=-R(t)^{-1}B_{1}(t)^{\prime}P(t).\eqno{\hbox{(11b)}}$$ Integrating this from 0 to $T$ and taking mathematical expectation, we obtain the following expression for the cost functional (3): $$J(u)=E\left\{\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt\right\}+E\left\{x(0)^{\prime}P(0)x(0)\right\}+\int_{0}^{T}{\rm tr}(B_{2}^{\prime}PB_{2})dt.\eqno{\hbox{(12)}}$$ To ensure that $\int_{0}^{T}x^{\prime}PB_{2}dw$ has zero expectation, we need to check that the integrand is square integrable. It is clear that $u$ is square integrable, for otherwise $J(u)=\infty$. Then the state process $$x(t)=x_{0}(t)+\int_{0}^{t}\Phi (t,s)B_{1}(s)u(s)ds\eqno{\hbox{(13)}}$$ is square integrable as well. Here $x_{0}$ is the (square integrable) state process corresponding to $u=0$, and $\Phi$ is the transition matrix function of the system (1).
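To make (11a) and (11b) concrete, the following minimal sketch integrates the scalar, time-invariant version of the Riccati equation backward from $P(T)=S$ with a classical Runge-Kutta step and returns the gain $K$ along the way. The restriction to scalar constant coefficients, the function name `riccati_backward`, and the step count are assumptions made purely for illustration; none of them comes from the text.

```python
def riccati_backward(a, b, q, r, s, T, steps=1000):
    """Scalar, constant-coefficient version of (11a):
        dP/dt = -2*a*P + (b**2 / r) * P**2 - q,   P(T) = s,
    integrated backward in time with classical RK4 (illustrative sketch).
    Returns a list of (t, P(t), K(t)) with K = -b*P/r as in (11b)."""
    def pdot(p):
        return -2.0 * a * p + (b * b / r) * p * p - q

    h = T / steps
    p = s
    out = [(T, p, -b * p / r)]
    for i in range(steps):
        # One RK4 step from t to t - h (time runs backward).
        k1 = pdot(p)
        k2 = pdot(p - 0.5 * h * k1)
        k3 = pdot(p - 0.5 * h * k2)
        k4 = pdot(p - h * k3)
        p = p - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t = T - (i + 1) * h
        out.append((t, p, -b * p / r))
    out.reverse()  # sort by increasing time
    return out
```

A convenient sanity check is the special case $a=0$, $q=0$, $b=r=1$, where the scalar equation reduces to $\dot{P}=P^{2}$ with closed-form solution $P(t)=S/(1+S(T-t))$, so that the numerical value at $t=0$ can be compared with $S/(1+ST)$.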

Now, if we had complete state information with (1) replaced by $$\cases{dx=A(t)x(t)dt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr y=x}\eqno{\hbox{(14)}}$$ we could immediately conclude that the feedback law $$u(t)=K(t)x(t)\eqno{\hbox{(15)}}$$ is optimal, because the last two terms in (12) do not depend on the control. However, when we have incomplete state information with the control being a function of the observed process $\{y(s); 0\leq s\leq t\}$, things become more complicated. Mathematically we formalize this by requiring any control process to be adapted to the filtration (7); i.e., requiring $u(t)$ to be ${\cal Y}_{t}$-measurable for each $t\in [0,T]$. Then, with $\hat{x}$ given by (6), setting $$\tilde{x}(t):=x(t)-\hat{x}(t),\eqno{\hbox{(16)}}$$ we have $E\{[u(t)-K(t)\hat{x}(t)]\tilde{x}(t)^{\prime}\}=0$, and therefore $$E\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt=E\int_{0}^{T}[(u-K\hat{x})^{\prime}R(u-K\hat{x})+{\rm tr}(K^{\prime}RK\Sigma)]dt,\eqno{\hbox{(17)}}$$ where $\Sigma$ is the error covariance matrix function $$\Sigma (t):=E\{\tilde{x}(t)\tilde{x}(t)^{\prime}\}.\eqno{\hbox{(18)}}$$ A common mistake in the early literature on the separation principle is to assume without further investigation that $\Sigma$ does not depend on the choice of control. Indeed, if this were the case, it would follow directly that (12) is minimized by choosing the control as (4), and the proof of the separation principle would be immediate. (Of course, in the end this will be the case under suitable conditions, but this has to be proven.) This mistake probably originates from the observation that the control term in (13) cancels when forming (16) so that $$\tilde{x}(t)=\tilde{x}_{0}(t):=x_{0}(t)-\hat{x}_{0}(t),\eqno{\hbox{(19)}}$$ where $$\hat{x}_{0}(t):=E\{x_{0}(t)\mid{\cal Y}_{t}\}.\eqno{\hbox{(20)}}$$ However, in this analysis, we have not ruled out that $\hat{x}_{0}$ depends on the control or, what would follow from this, that the filtration (7) does. A detailed discussion of this conundrum can be found in [27]. In fact, since the control process $u$ is in general a nonlinear function of the data and thus non-Gaussian, so is the output process $y$. Consequently, the conditional expectation (20) might not in general coincide with the wide sense conditional expectation obtained by projections of the components of $x_{0}(t)$ onto the closed linear span of the components of $\{y(\tau),\tau\in [0,t]\}$, and therefore, a priori, it could happen that $\hat{x}$ is not generated by the Kalman filter (5).
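The orthogonality step behind (17) can be checked exactly in a static toy setting. The sketch below assumes a finite joint distribution of a scalar pair $(x,y)$, given as hypothetical `(prob, x, y)` triples, and an arbitrary nonlinear function `u` of $y$; it evaluates both sides of the scalar, single-instant analogue of (17) in exact rational arithmetic. Note that in this toy example the observation $y$ does not depend on the control, so it illustrates only the decomposition itself and not the circular dependence between control and filtration discussed above.

```python
from fractions import Fraction as F

def conditional_mean(pmf):
    """x_hat(y) = E{x | y} for a finite joint pmf of (prob, x, y) triples."""
    mass, moment = {}, {}
    for p, x, y in pmf:
        mass[y] = mass.get(y, 0) + p
        moment[y] = moment.get(y, 0) + p * x
    return {y: moment[y] / mass[y] for y in mass}

def both_sides(pmf, u, K, R=1):
    """Scalar, static analogue of (17):
         lhs = E{ R (u - K x)^2 }
         rhs = E{ R (u - K x_hat)^2 } + K R K Sigma,
       where Sigma = E{ (x - x_hat)^2 } and u is any function of y."""
    x_hat = conditional_mean(pmf)
    lhs = sum(p * R * (u(y) - K * x) ** 2 for p, x, y in pmf)
    sigma = sum(p * (x - x_hat[y]) ** 2 for p, x, y in pmf)
    rhs = sum(p * R * (u(y) - K * x_hat[y]) ** 2 for p, x, y in pmf)
    return lhs, rhs + K * R * K * sigma

# Hypothetical example data: probabilities sum to one, y takes values 0 and 1.
pmf = [(F(1, 4), F(-1), 0), (F(1, 4), F(1), 0),
       (F(1, 8), F(0), 1), (F(3, 8), F(2), 1)]
lhs, rhs = both_sides(pmf, u=lambda y: F(3) * y * y - 1, K=F(-2))
```

Because `u(y) - K * x_hat[y]` is a function of $y$ alone and the estimation error has zero conditional mean given $y$, the cross term vanishes and `lhs` and `rhs` agree exactly, whatever nonlinear `u` is chosen.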

To avoid these problems one might begin by uncoupling the feedback loop as in Fig. 2, and determine an optimal control process in the class of stochastic processes $u$ that are adapted to the family of sigma fields $${\cal Y}_{t}^{0}:=\sigma\{y_{0}(\tau),\tau\in [0,t]\},\quad 0\leq t\leq T,\eqno{\hbox{(21)}}$$ i.e., for each $t\in [0,T]$, $u(t)$ is a function of $\{y_{0}(s),\, 0\leq s\leq t\}$. This problem, where one optimizes over the class of all control processes adapted to a fixed filtration, was called a stochastic open loop (SOL) problem in [27]. It is not uncommon in the literature to assume from the outset that the control is adapted to $\{{\cal Y}_{t}^{0}\}$; see, e.g., [6, Section 2.3], [16], [40].

Fig. 2. A stochastic open loop (SOL) configuration.

In [27] it was suggested how to embed the class of admissible controls in various SOL classes in a problem-dependent manner, and then construct the corresponding feedback law. More precisely, in the present context, the class of admissible feedback laws was taken to consist of the nonanticipatory functions $u:=\pi (y)$ such that the feedback loop $$z=z_{0}+g\pi Hz\eqno{\hbox{(22)}}$$ has a unique solution $z_{\pi}$ and $u=\pi (Hz_{\pi})$ is adapted to $\{{\cal Y}_{t}^{0}\}$. Next, we shall give a few examples of specific classes of feedback laws that belong to this general class.

Example 1

It is common to restrict the admissible class of control laws to contain only linear ones; see, e.g., [12]. In a more general direction, let ${\cal L}$ be the class $$({\cal L})\quad u(t)=\bar{u}_{0}(t)+\int_{0}^{t}F(t,\tau)dy,\eqno{\hbox{(23)}}$$ where $\bar{u}_{0}$ is a deterministic function and $F$ is an $L_{2}$ kernel. In this way, the Gaussian property will be preserved, and $\hat{x}$ will be generated by the Kalman filter (5). Then it follows from (1) and (5) that $\tilde{x}$ is generated by $$d\tilde{x}=(A-LC)\tilde{x}dt+(B_{2}-LD)dw,\quad\tilde{x}(0)=x(0),$$ which is clearly independent of the choice of control. Then so is the error covariance (18), as desired. Even in the more general setting described by (8), it was shown in [26, pp. 95–96] that $${\cal Y}_{t}={\cal Y}_{t}^{0},\quad t\in [0,T],\eqno{\hbox{(24)}}$$ for any $\pi\in{\cal L}$, where (21) is the filtration generated by the uncontrolled output process $y_{0}$ obtained by setting $u=0$ in (8).

Example 2

In his influential paper [41], Wonham proposed the class of control laws $$u(t)=\psi (t,\hat{x}(t))\eqno{\hbox{(25)}}$$ in terms of the state estimate (6), where $\psi (t,x)$ is Lipschitz continuous in $x$. For pedagogical reasons, we first highlight a somewhat more restrictive construction due to Kushner [21]. Let $$\hat{\xi}_{0}(t):=E\{x_{0}(t)\mid{\cal Y}_{t}^{0}\}$$ be the Kalman state estimate of the uncontrolled system $$\cases{dx_{0}=A(t)x_{0}(t)dt+B_{2}(t)dw\cr dy_{0}=C(t)x_{0}(t)dt+D(t)dw.}\eqno{\hbox{(26)}}$$ Here we use the notation $\hat{\xi}_{0}$ to distinguish it from $\hat{x}_{0}$, defined by (20), which a priori might depend on the control. Then the Kalman filter takes the form $$d\hat{\xi}_{0}=A\hat{\xi}_{0}(t)dt+L(t)dv_{0},\;\hat{\xi}_{0}(0)=0,$$ where the innovation process $$dv_{0}=dy_{0}-C\hat{\xi}_{0}(t)dt,\; v_{0}(0)=0$$ generates the same filtration, $\{{\cal V}_{t}^{0}\}$, as $y_{0}$; i.e., ${\cal V}_{t}^{0}={\cal Y}_{t}^{0}$ for $t\in [0,T]$. This is well-known, but a simple proof is given in Section VI in a more general setting; see (64). Now, along the lines of (13), define $$\hat{\xi}(t)=\hat{\xi}_{0}(t)+\int_{0}^{t}\Phi (t,s)B_{1}(s)u(s)ds,$$ where the control is chosen as $$u(t)=\psi (t,\hat{\xi}(t)).\eqno{\hbox{(27)}}$$ Since $\psi$ is Lipschitz, $\hat{\xi}$ is the unique strong solution of the stochastic differential equation $$d\hat{\xi}=\big (A\hat{\xi}+B_{1}\psi (t,\hat{\xi})\big)dt+Ldv_{0},\;\hat{\xi}(0)=0,\eqno{\hbox{(28)}}$$ and it is thus adapted to $\{{\cal V}_{t}^{0}\}$ and hence to $\{{\cal Y}_{t}^{0}\}$; see, e.g., [19, p. 120]. Hence the selection (27) of control law forces $u$ to be adapted to $\{{\cal Y}_{t}^{0}\}$, and hence, due to $$dy=dy_{0}+\int_{0}^{t}C(t)\Phi (t,s)B_{1}(s)u(s)dsdt,\eqno{\hbox{(29)}}$$ obtained from (13), ${\cal Y}_{t}\subset{\cal Y}_{t}^{0}$ for $t\in [0,T]$. However, since the control-dependent terms cancel, $$dv_{0}=dy_{0}-C\hat{\xi}_{0}(t)dt=dy-C\hat{\xi}(t)dt,$$ which inserted into (28) yields a stochastic differential equation, obeying the appropriate Lipschitz condition, driven by $dy$ and having $\hat{\xi}$ as a strong solution. Therefore, $\hat{\xi}$ is adapted to $\{{\cal Y}_{t}\}$, and hence, by (27), so is $u$. Consequently, (29) implies that ${\cal Y}_{t}^{0}\subset{\cal Y}_{t}$ for $t\in [0,T]$ so that actually (24) holds. Finally, this implies that $\hat{\xi}=\hat{x}$, and thus $u$ is given by (25). However, it should be noted that the class of control laws (27) is a subclass of (25) as it has been constructed to make $u$ a priori adapted to $\{{\cal Y}_{t}^{0}\}$. Therefore, the relevance of these results, presented in [21], for the proof in [22, page 348] is unclear. In their popular textbook [20], widely used as a reference source for the validity of the separation principle over a general class of admissible (including nonlinear) controls, Kwakernaak and Sivan prove the separation principle over a class of linear laws but claim with reference to [21], [22] that it holds “without qualification” in general [20, p. 390]. (However, see Remark 6 below.)

In his pioneering paper [41] Wonham proved the separation theorem for controls in the class (25) even with a more general cost functional than (3). However, the proof is far from simple and marred by many technical assumptions. A case in point is the assumption that $C(t)$ is square and has a determinant bounded away from zero, which is a serious restriction. A later proof by Fleming and Rishel [15] is considerably simpler. They also prove the separation theorem with quadratic cost functional (3) for a class of Lipschitz continuous feedback laws, namely $$u(t)=\phi (t,y),\eqno{\hbox{(30)}}$$ where $\phi:\, [0,T]\times C^{n}[0,T]\to\mathbb{R}^{m}$ is a nonanticipatory function of $y$ which is Lipschitz continuous in this argument.

Example 3

It is interesting to note that if there is a delay in the processing of the observed data so that, for each $t$, $u(t)$ is a function of $y(\tau)$; $0\leq\tau\leq t-\varepsilon$, then (24) holds. To see this, let $n$ be a positive integer, and suppose that ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,n\varepsilon]$. For $t\in [0,(n+1)\varepsilon]$ the control $u(t)$ is ${\cal Y}_{t-\varepsilon}$-measurable and, since $t-\varepsilon\leq n\varepsilon$, the induction hypothesis shows that it is ${\cal Y}_{t-\varepsilon}^{0}$-measurable as well. Then, since $$y(t)=y_{0}(t)+\int_{0}^{t}HG(t,s)u(s)ds,$$ it follows that ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,(n+1)\varepsilon]$. Since ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,\varepsilon]$, (24) follows by induction.
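The induction in Example 3 has a direct sampled-data analogue, sketched below under assumptions that are not part of the text: time is discretized into steps, the kernel $HG$ is replaced by a hypothetical scalar function `G(k, j)`, the control law is a hypothetical function `phi` of the past observations, and the delay `d` is at least one step. With a delay, the map from the uncontrolled output `y0` to the closed-loop output `y` can be computed step by step, and so can its inverse, which is the sample-path counterpart of (24).

```python
def control(phi, y, k, d):
    """u[k] computed from the observations y[0], ..., y[k-d] only (d >= 1)."""
    return phi(y[:max(0, k - d + 1)])

def closed_loop(y0, G, phi, d=1):
    """Forward recursion y[k] = y0[k] + sum_{j<=k} G(k, j) * u[j]."""
    y, u = [], []
    for k in range(len(y0)):
        u.append(control(phi, y, k, d))   # uses only already computed outputs
        y.append(y0[k] + sum(G(k, j) * u[j] for j in range(k + 1)))
    return y, u

def recover_uncontrolled(y, G, phi, d=1):
    """Causal inverse: rebuild u from y, then subtract its contribution."""
    u = [control(phi, y, k, d) for k in range(len(y))]
    return [y[k] - sum(G(k, j) * u[j] for j in range(k + 1))
            for k in range(len(y))]

# Hypothetical data: a discontinuous (bang-bang type) law and a decaying kernel.
G = lambda k, j: 0.5 ** (k - j)
phi = lambda past: -1.0 if past and past[-1] > 0 else 1.0
y0 = [0.5, -1.0, 2.0, 0.25, -0.75]
y, u = closed_loop(y0, G, phi, d=1)
y0_recovered = recover_uncontrolled(y, G, phi, d=1)
```

Both directions use only past samples, so at every step the closed-loop observations and the uncontrolled observations determine each other, even for a discontinuous `phi`. This is only a finite-dimensional illustration; it does not replace the measure-theoretic argument of the example.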

Remark 4

Example 3 highlights the reason why the problem with possibly control-dependent sigma fields does not occur in the usual discrete-time formulation. Indeed, in this setting, the error covariance (18) will not depend on the control, while, as we have mentioned, some more analysis is needed to rule out that its continuous-time counterpart does. This invalidates a procedure used in several textbooks (see, e.g., [36]) in which the continuous-time $\Sigma$ is constructed as the limit of finite difference quotients of the discrete-time $\Sigma$, which, as we have seen in Example 3, does not depend on the control, and which simply is the solution of a discrete-time matrix Riccati equation. However, we cannot a priori conclude that the continuous-time $\Sigma$ satisfies this Riccati equation. For this we need (24), or alternatively arguments such as in Remark 6. Otherwise the argument is circular.

Remark 5

Historically, a popular approach was introduced in Duncan and Varaiya [14] and Davis and Varaiya [13] (see also [6, Section 2.4]) based on weak solutions of the relevant stochastic differential equation. In their analysis the driving noise is a Wiener process. The key element of their approach is to start with an uncontrolled system and, through a change of probability measure, to associate its solutions with those of a new system with a suitably defined control input and noise process. This control input, together with the conformably altered input process, leaves the filtration of the observation process unaffected, thereby bypassing the central issue dealt with in the current paper. Briefly, starting from a Wiener process $\tilde{w}$ of an uncontrolled system with an output process $y$ and any process $u$ adapted to $\{{\cal Y}_{t}\}$, by a suitable change of probability measure (that depends on $u$), $$dw=d\tilde{w}-B_{1}udt$$ can be transformed, using the Girsanov transformation, into a new Wiener process, which in the sense of weak solutions [19] is the same as any other Wiener process. Replacing $d\tilde{w}$ in the original uncontrolled system by $B_{1}udt+dw$ leaves the filtration $\{{\cal Y}_{t}\}$ unaffected.

Remark 6

Yet another approach to the separation principle is based on the fact that, although (1) with a nonlinear control is non-Gaussian, the model is conditionally Gaussian given the filtration $\{{\cal Y}_{t}\}$ [29, Chapter 16.1]. This fact can be used to show that $\hat{x}$ is actually generated by a Kalman filter [29, Chapters 11 and 12]. This last approach requires quite a sophisticated analysis and is restricted to the case where the driving noise $w$ is a Wiener process.

A key point for establishing the separation principle is to identify admissible control laws for which (24) holds. For each such control law $\pi$ we need a solution of the feedback equation (10), i.e., a pair $(z_{0},z)$ of stochastic processes that satisfies $$z=z_{0}+g\pi H z.\eqno{\hbox{(31)}}$$ Since $z_{0}$ is the driving process, it is natural to seek a solution $z$ which causally depends on $z_{0}$ and is unique. If this is the case then $z$ is a strong solution; otherwise it is a weak solution. There are well-known examples of stochastic differential equations that have only weak solutions [19, page 137], [5], [37]. Moreover, as we have mentioned in Remark 5, weak solutions circumvent the need to establish the equivalence (24) between filtrations. Thus, it has been suggested that the framework of weak solutions is the appropriate one for control problems [34, page 149]. Yet, from an applications point of view, where the control needs to be causally dependent on observed data, this is in our view questionable. In fact, there are control laws for which (31) only admits a weak solution and (24) does not hold (Remark 12). In the present paper we take an even more stringent view on the causal dependence. We require that (31) has a unique strong solution which in addition specifies a measurable map $z_{0}\to z$ between sample paths for every sample path of $z_{0}$ (cf. [19, Remark 5.2, p. 128], [34, p. 122]), thus modeling correspondence between signals; we further elaborate upon this in Section IV.

In short, we only allow control laws which are physically realizable in an engineering sense, in that they induce a signal that travels through the feedback loop. This comes at a price since there are stochastic differential equations having strong solutions that do not fall in this category (Remark 12). Moreover, verifying that a control law is admissible in our sense may be difficult in general. On the other hand, an advantage of the approach is that the class of control laws includes discontinuous ones and allows for statements about linear systems driven by non-Gaussian noise with possible jumps. We now proceed to develop the approach and the key property of deterministic well-posedness.

SECTION III

SIGNALS AND SYSTEMS

Signals are thought of as sample paths of a stochastic process with possible discontinuities. This is quite natural from several points of view. First, it encompasses the response of a typical nonlinear operation that involves thresholding and switching, and second, it includes sample paths of counting processes and other martingales. More specifically we consider signals to belong to the Skorohod space $D$; this is defined as the space of functions which are continuous on the right and have a left limit at all points, i.e., the space of càdlàg functions. It contains the space $C$ of continuous functions as a proper subspace. The notation $D[0,T]$ or $C[0,T]$ emphasizes the time interval where signals are being considered.

Traditionally, the comparison of two continuous functions in the uniform topology relates to how much their graphs need to be perturbed so as to be carried onto one another by changing only the ordinates, with the time-abscissa being kept fixed. However, in order to metrize $D$ in a natural manner one must recognize the effect of uncertainty in measuring time and allow a respective deformation of the time axis as well. To this end, let ${\cal K}$ denote the class of strictly increasing, continuous mappings of $[0,T]$ onto itself and let $I$ denote the identity map. Then, for $x$, $y\in D[0,T]$, $$d(x,y):=\inf_{\kappa\in{\cal K}}\max\{\Vert\kappa-I\Vert,\Vert x-y\kappa\Vert\}$$ defines a metric on $D[0,T]$ which induces the so-called Skorohod topology. A further refinement, so as to ensure bounds on the slopes of the chords of $\kappa$, renders $D[0,T]$ separable and complete, that is, $D[0,T]$ is a Polish space; see [7, Theorem 12.2].
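The role of the time deformation $\kappa$ can be seen on two unit steps that jump at slightly different times. The sketch below, written with hypothetical helper names and exact rational arithmetic, evaluates the quantity $\max\{\Vert\kappa-I\Vert,\Vert x-y\kappa\Vert\}$ for one particular piecewise-linear $\kappa$ that maps the first jump time onto the second. Since $d(x,y)$ is an infimum over all $\kappa$, the value obtained for any single $\kappa$ is an upper bound on it. The suprema are evaluated on a finite grid, which is exact here only because the grid is assumed to contain the breakpoint of $\kappa$.

```python
from fractions import Fraction as F

def step(jump_time):
    """Right-continuous unit step: 0 before jump_time, 1 from jump_time on."""
    return lambda t: 1 if t >= jump_time else 0

def time_change(t1, t2, T):
    """Piecewise-linear, strictly increasing map of [0, T] onto itself
    with kappa(t1) = t2 (requires 0 < t1 < T and 0 < t2 < T)."""
    def kappa(t):
        if t <= t1:
            return t * t2 / t1
        return t2 + (t - t1) * (T - t2) / (T - t1)
    return kappa

def cost_of_time_change(x, y, kappa, grid):
    """max{ |kappa(t) - t| , |x(t) - y(kappa(t))| } over the grid points."""
    time_shift = max(abs(kappa(t) - t) for t in grid)
    mismatch = max(abs(x(t) - y(kappa(t))) for t in grid)
    return max(time_shift, mismatch)

def uniform_distance(x, y, grid):
    """Sup-norm distance with the time axis kept fixed (kappa = identity)."""
    return max(abs(x(t) - y(t)) for t in grid)

T, t1, t2 = F(1), F(3, 10), F(2, 5)
grid = [F(k, 100) for k in range(101)]     # contains the breakpoint t1
x, y = step(t1), step(t2)
bound = cost_of_time_change(x, y, time_change(t1, t2, T), grid)
sup_norm = uniform_distance(x, y, grid)
```

With these values the uniform distance equals one, however close the two jump times are, whereas the time-changed comparison gives a value equal to the gap between the jump times. This is the sense in which the Skorohod topology tolerates small uncertainty in the timing of jumps.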

Systems are thought of as general measurable nonanticipatory maps from $D\to D$ sending sample paths to sample paths so that their output at any given time $t$ is a measurable function of past values of the input and of time. More precisely, let $$\Pi_{\tau}: x\mapsto\Pi_{\tau}x:=\cases{x(t)& for $t<\tau$\cr x(\tau)& for $t\geq\tau$.}$$ Then, a measurable map $f:\, D[0,T]\to D[0,T]$ is said to be a system if and only if $$\Pi_{\tau}f\;\Pi_{\tau}=\Pi_{\tau}f\quad\hbox{for all }\tau\in [0,T].$$ An important class of systems is provided by stochastic differential equations with Lipschitz coefficients driven by a Wiener process [34, Theorem 13.1]. These have pathwise unique strong solutions and induce maps between corresponding path spaces [34, page 127], [19, pages 126–128]. Also, under fairly general conditions (see e.g., [33, Chapter V]), stochastic differential equations driven by martingales with sample paths in $D$ have strong solutions which are semi-martingales.
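The defining identity $\Pi_{\tau}f\,\Pi_{\tau}=\Pi_{\tau}f$ can be tried out on sampled signals. In the sketch below, signals are finite lists, `truncate` is a discrete stand-in for $\Pi_{\tau}$, and the two maps `running_sum` and `look_ahead` are hypothetical examples chosen for illustration. Passing the check on one test signal is of course only a necessary condition; it does not prove that a map is nonanticipatory.

```python
def truncate(x, k):
    """Discrete analogue of Pi_tau: keep x[i] for i < k, hold x[k] for i >= k."""
    return x[:k] + [x[k]] * (len(x) - k)

def is_nonanticipatory(f, x):
    """Check Pi_k f Pi_k = Pi_k f on the sampled signal x for every index k."""
    return all(
        truncate(f(truncate(x, k)), k) == truncate(f(x), k)
        for k in range(len(x))
    )

def running_sum(x):
    """Discrete integrator: output at step i depends on x[0], ..., x[i] only."""
    out, total = [], 0
    for value in x:
        total += value
        out.append(total)
    return out

def look_ahead(x):
    """One-step predictor: output at step i is the next input sample."""
    return x[1:] + [x[-1]]

signal = [1, 4, 2, 7, 3]
causal_ok = is_nonanticipatory(running_sum, signal)
predictor_ok = is_nonanticipatory(look_ahead, signal)
```

The integrator satisfies the identity because freezing the input after index `k` cannot change the output up to `k`, while the look-ahead map fails it as soon as two consecutive samples differ.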

Besides stochastic differential equations in general, and those in (8) in particular, other nonlinear maps may serve as systems. For instance, discontinuous hysteresis nonlinearities as well as non-Lipschitz static maps such as $u\mapsto y:=\sqrt{\vert u\vert}$ are reasonable as systems from an engineering viewpoint. Indeed, these induce maps from $D\to D$ (or from $C\to D$, as in the case of relay hysteresis), are seen to be systems according to our definition, and can be considered as components of nonlinear feedback laws. We note that a nonlinearity such as $u\mapsto y={\rm sign}(u)$ is not a system in the sense of our definition since the output is not in general in $D$. Such nonlinearities, which often appear in bang-bang control, need to be approximated with a physically realizable hysteretic system.

SECTION IV

WELL-POSEDNESS AND A KEY LEMMA

It is straightforward to construct examples of deterministically well-posed feedback interconnections with elements as above. However, the situation is a bit more delicate when considering feedback loops since it is also perfectly possible that, at least mathematically, they give rise to unrealistic behavior. A standard example is that of a feedback loop with causal components that “implements” a perfect predictor. Indeed, consider a system $f$ which superimposes its input with a delayed version of it, i.e., $$f: z(t)\mapsto z(t)+z(t-t_{\rm delay}),$$ for $t\geq 0$, and assume initial conditions $z(t)=0$ for $t<0$. Then the feedback interconnection of Fig. 3 is unrealistic as it behaves as a perfect predictor. The feedback equation $$z(t)=z_{0}(t)+f(z(t))=z_{0}(t)+z(t)+z(t-t_{\rm delay})$$ gives rise to $0=z_{0}(t)+z(t-t_{\rm delay})$, and hence, $$z(t)=-z_{0}(t+t_{\rm delay}).$$ Therefore, the output process $z$ is not causally dependent on the input. The question of well-posedness of feedback systems has been studied from different angles for over forty years. See for instance the monograph by Jan Willems [39].
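The perfect-predictor loop can be reproduced on a sample grid. In the sketch below the delay is a whole number `d` of samples, the helper names are hypothetical, and the test input is chosen to vanish on its first `d` samples so that the zero initial condition is compatible with the loop equation; these are assumptions of the illustration only. The code confirms that $z(t)=-z_{0}(t+t_{\rm delay})$ satisfies the feedback equation and that it can only be evaluated where future input samples are available.

```python
def f(z, d):
    """z(t) + z(t - d) on a sample grid, with z(t) = 0 for t < 0."""
    return [z[k] + (z[k - d] if k >= d else 0) for k in range(len(z))]

def predictor(z0, d):
    """Candidate loop solution z[k] = -z0[k + d]; it needs d future samples
    of the input, so it is defined only on the first len(z0) - d steps."""
    return [-z0[k + d] for k in range(len(z0) - d)]

def loop_residual(z0, z, d):
    """Residual of the feedback equation z = z0 + f(z) where z is defined."""
    fz = f(z, d)
    return [z[k] - (z0[k] + fz[k]) for k in range(len(z))]

d = 2
z0 = [0, 0, 3, -1, 4, 1, -5, 9]     # vanishes on the first d samples
z = predictor(z0, d)
residual = loop_residual(z0, z, d)

# Changing a later input sample changes an earlier output sample.
z0_changed = z0[:5] + [100] + z0[6:]
z_changed = predictor(z0_changed, d)
```

Here `z[3]` and `z_changed[3]` differ although the two inputs agree up to index 4, which is the sampled version of the statement that the output does not depend causally on the input.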

Fig. 3. Basic feedback system.

In our present setting of stochastic control we need a concept of well-posedness which ensures that signals inside a feedback loop are causally dependent on external inputs. This is a natural assumption from a systems point of view.

Definition 7

A feedback system is deterministically well-posed if the closed-loop maps are themselves systems; i.e., the feedback equation Formula$z=z_{0}+f(z)$ has a unique solution Formula$z\in D$ for all inputs Formula$z_{0}\in D$ and the operator Formula$(1-f)^{-1}$ is itself a system.

Thus, now thinking about Formula$z_{0}$ and Formula$z$ in the feedback system in Fig. 3 as stochastic processes, deterministic well-posedness implies that Formula${\cal Z}_{t}\subset{\cal Z}^{0}_{t}$ for Formula$t\in [0,T]$, where Formula${\cal Z}_{t}$ and Formula${\cal Z}^{0}_{t}$ are the sigma-fields generated by Formula$z$ and Formula$z_{0}$, respectively. This is a consequence of the fact that Formula$(1-f)^{-1}$ is a system. Likewise, since Formula$(1-f)$ is also a system, Formula${\cal Z}_{t}^{0}\subset{\cal Z}_{t}$ so that in fact Formula TeX Source $${\cal Z}_{t}^{0}={\cal Z}_{t},\quad t\in [0,T].\eqno{\hbox{(32)}}$$

Next, we consider the situation in Fig. 1 and the relation between Formula${\cal Y}_{t}$ and the filtration Formula${\cal Y}^{0}_{t}$ of the process Formula$y_{0}=Hz_{0}$. The latter represents the “uncontrolled” output process where the control law Formula$\pi$ is taken to be identically zero. A key technical lemma for what follows states that the filtrations Formula${\cal Y}_{t}$ and Formula${\cal Y}^{0}_{t}$ are also identical if the feedback system is deterministically well-posed. This is not obvious at first sight, solely on the basis of the linear relationships Formula$y=Hz$ and Formula$y_{0}=Hz_{0}$, as the following simple example demonstrates: the two vector processes Formula${w\choose 0}$ and Formula${0\choose w}$ generate the same filtrations while Formula$(1\; 0){w\choose 0}$ and Formula$(1\; 0){0\choose w}$ do not.

Lemma 8

If the feedback interconnection in Fig. 1 is deterministically well-posed, Formula$g\pi$ is a system, and Formula$H$ is a linear system having a right inverse Formula$H^{-R}$ that is also a system, then Formula$(1-Hg\pi)^{-1}$ is a system and Formula${\cal Y}_{t}={\cal Y}^{0}_{t}$, Formula$t\in [0,T]$.

Remark 9

Note that, for the prototype problem involving (1), the conditions on Formula$H$ in Lemma 8 are trivial as Formula$H=[0,\, I]$ and hence Formula$H^{-R}:=H^{\prime}$ is a right inverse. The requirement in the lemma that Formula$g\pi$ is a system allows for a more general situation where Formula$\pi$ is not itself a system (e.g., generating outputs not in Formula$D$), but where the cascade connection is still admissible.

Proof

By well-posedness Formula$(1-g\pi H)^{-1}$ is a system. To show that Formula$(1-Hg\pi)^{-1}$ exists and is a system, first note that Formula TeX Source $$(1-Hg\pi)H=H-Hg\pi H=H (1-g\pi H).\eqno{\hbox{(33)}}$$ The first step is using left distributivity and the second is using the fact that Formula$H$ is linear. But then Formula TeX Source $$(1-Hg\pi)\underbrace{H(1-g\pi H)^{-1}H^{-R}}_{h}=I,\eqno{\hbox{(34)}}$$ where Formula$HH^{-R}=I$. Thus, Formula$h$ is a “right inverse” of Formula$p:=(1-Hg\pi)$ in that the composition Formula$p\circ h$ of the two maps is the identity. We claim that Formula$h$ is in fact the inverse of Formula$p$ (which is necessarily unique) in that Formula$y=h(y_{0})$ and Formula TeX Source $$(1-Hg\pi)y=y_{0}\eqno{\hbox{(35)}}$$ establish a bijective correspondence between Formula$y$ and Formula$y_{0}$, i.e., that both Formula$p\circ h$ as well as Formula$h\circ p$ are identity maps. We need to show the latter. The only potential problem would be if two distinct values Formula$y$ and Formula${\mathhat y}$ satisfy (35) for the same value for Formula$y_{0}$. We now show that this is not possible.

Since Formula$H$ is right invertible, Formula$y_{0}$ can be written in the form Formula$y_{0}=Hz_{0}$ for Formula$z_{0}=H^{- R}y_{0}$. Let Formula$z=(1-g\pi H)^{-1}z_{0}$ and Formula$y=Hz$. Then Formula$y=h(y_{0})$, so by (34) Formula$y$ is a particular solution of (35). Now let Formula${\mathhat y}$ be another solution, i.e., suppose that Formula TeX Source $$(1-Hg\pi){\mathhat y}=y_{0}\eqno{\hbox{(36)}}$$ and that Formula${\mathhat y}\ne y$. We begin by writing Formula${\mathhat y}$ in the form Formula${\mathhat y}=H{\mathhat z}$, which can always be done since Formula$H$ is right invertible. Next we set Formula${\mathhat z}_{0}:=(1-g\pi H){\mathhat z}$. Then, by well-posedness, Formula${\mathhat {z}}$ is the unique solution of Formula TeX Source $${\mathhat z}={\mathhat z}_{0}+g\pi H({\mathhat z}).\eqno{\hbox{(37)}}$$ Moreover, by (33) and (36), Formula$H{\mathhat z}_{0}=y_{0}$, and consequently Formula${\mathhat z}_{0}=z_{0}+v$ with Formula$Hv=0$. We now claim that Formula${\mathhat z}=z+v$ which would then contradict the assumption that Formula${\mathhat y}\ne y$. To show this, note that, since Formula$z=z_{0}+g\pi H z$ and Formula$H$ is linear, Formula TeX Source $$z+v=z_{0}+v+g\pi H(z+v).$$ But the solution to (37) is unique by well-posedness. Hence, Formula${\mathhat z}=z+v$ which proves our claim.

Therefore, finally, Formula$(1-Hg\pi)$ is invertible and Formula TeX Source $$(1-Hg\pi)^{-1}=h=H(1-g\pi H)^{-1}H^{-R}$$ is itself a system, being a composition of systems. Thus, the configuration in Fig. 4 is deterministically well-posed. Using (33) once again, Formula TeX Source $$H(1-g\pi H)^{-1}=(1-Hg\pi)^{-1}H.\eqno{\hbox{(38)}}$$ It now follows that Formula TeX Source $$\eqalignno{y=&\, H(1-g\pi H)^{-1}z_{0}=(1-Hg\pi)^{-1}Hz_{0}\cr=&\,(1-Hg\pi)^{-1}y_{0},&{\hbox{(39)}}}$$ while also (35) holds. Equation (39) shows that Formula${\cal Y}_{t}\subset{\cal Y}^{0}_{t}$, whereas (35) shows that Formula${\cal Y}^{0}_{t}\subset{\cal Y}_{t}$. Formula$\blackboxfill$
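The relation (39) between the loop in the variable z and the loop in the variable y can be illustrated in discrete time. The Python sketch below is a toy under stated assumptions: the nonlinear map `g_pi` is hypothetical and strictly causal (it uses past outputs only), the observation map picks the second component of z, and the test signals are arbitrary. It solves both loops by forward recursion and compares the outputs.

```python
import math


def g_pi(y_past):
    """Hypothetical nonlinear map using only strictly past outputs.

    Returns the increment added to the two components of z at the current step.
    """
    s = sum(math.tanh(v) for v in y_past)
    return (0.5 * s, -0.3 * s)


def H(z_k):
    """Observation map H = [0, 1]: pick the second component."""
    return z_k[1]


def solve_z_loop(z0):
    """Forward recursion for z = z0 + g_pi(H z)."""
    z = []
    for k in range(len(z0)):
        inc = g_pi([H(zj) for zj in z])
        z.append((z0[k][0] + inc[0], z0[k][1] + inc[1]))
    return z


def solve_y_loop(y0):
    """Forward recursion for y = y0 + H g_pi(y)."""
    y = []
    for k in range(len(y0)):
        y.append(y0[k] + H(g_pi(y)))
    return y


z0 = [(math.sin(0.7 * k), math.cos(1.3 * k)) for k in range(20)]
z = solve_z_loop(z0)
y0 = [H(v) for v in z0]
y = solve_y_loop(y0)

# y from the second loop should agree with H z from the first loop.
gap = max(abs(H(z[k]) - y[k]) for k in range(len(y)))
# Conversely, y0 is recovered causally from y, as in (35).
y0_recovered = [y[k] - H(g_pi(y[:k])) for k in range(len(y))]
recovery_gap = max(abs(a - b) for a, b in zip(y0, y0_recovered))
```

In this toy setting each of y and y0 is computed from the past of the other, which is the discrete-time counterpart of the two inclusions used at the end of the proof.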

Fig. 4. An equivalent feedback configuration.

The essence of the lemma4 is to underscore the equivalence between the configuration in Fig. 1 and that in Fig. 4. It is this equivalence which accounts for the identity Formula${\cal Y}_{t}={\cal Y}^{0}_{t}$ between the respective Formula$\sigma$-algebras. An analogous notion of well-posedness was considered by Willems in [40] where however, in contrast, the well-posedness of the feedback configuration in Fig. 4, and consequently the validity of Formula${\cal Y}_{t}={\cal Y}^{0}_{t}$, is assumed at the outset.

In the present paper we consider only feedback laws that render the feedback system deterministically well-posed. Therefore we highlight the conditions in a formal definition.

Definition 10

A feedback law Formula$\pi$ is deterministically well-posed for the system (8) if Formula$g\pi$ is a system and the feedback loop of Fig. 1 is deterministically well-posed.

If the feedback law Formula$\pi$ is deterministically well-posed, then, by Lemma 8, the feedback loop in Fig. 4 is also deterministically well-posed. Thus, in essence, given the assumption that Formula$z=z_{0}+g\pi H z$ can be uniquely and causally solved for every input sample path, so can Formula$y=y_{0}+Hg\pi y$.

Remark 11

For pedagogical reasons, we consider the case of complete state information, corresponding to (14). This corresponds to taking Formula$H=I$ and Formula$z=x$, and the basic feedback loop is as depicted in Fig. 5. Then the basic condition (32) implied by well-posedness states that the filtration Formula$\{{\cal X}_{t}\}$, where Formula${\cal X}_{t}:=\sigma\{x(s);\; s\in [0,t]\}$, is constant under variations of the control. Consequently, we do not need Lemma 8 to resolve an issue of circular control dependence. This is completely consistent with the analysis leading up to (15) in Section II.

Fig. 5. Feedback loop for complete state information.

Remark 12

We now present two examples of feedback systems which fail to be deterministically well-posed. Consider the system Formula TeX Source $$\cases{dx=udt+dw\cr y=x}$$ where Formula$w$ is a Wiener process, i.e., Formula$w=x_{0}$ in Fig. 5. First take the control law Formula$\pi$ to be the Tsirel'son functional Formula$u(t)=b(t,x)$ in [34, p. 156]. Then the solution of the feedback equation can only be defined in the weak sense and, remarkably, Formula${\cal Y}_{t}^{(0)}$ is strictly contained in Formula${\cal Y}_{t}$ for Formula$t>0$ (see, e.g., [34, Theorem (18.3)]). For a different example5, take the control law Formula$u=\pi (y)$ with Formula$\pi (y)=\min\{\vert y\vert^{2/3},1\}$, where Formula$y=x$. This is not deterministically well-posed although the stochastic differential equation Formula TeX Source $$dx=\pi (x)dt+dw$$ has a unique strong solution [18, Chapter 5, Proposition 5.17] in the sense that any other solution has the same sample paths with probability one (indistinguishable). The failure to be deterministically well-posed can be traced to the fact that this control law allows for multiple consistent responses for Formula$w\equiv 0$, a physically questionable situation. Indeed, the right-hand side of the ordinary differential equation Formula${\mathdot x}=\pi (x)$ is not Lipschitz at the origin, and the equation has infinitely many solutions starting from Formula$x(0)=0$.
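The multiplicity of responses for a vanishing noise path can be made concrete. In the Python sketch below, the drift is taken to be the 2/3 power of the absolute value of the state near the origin, saturated at 1; the candidate paths, the grid, and the helper names are assumptions made for illustration. Each candidate rests at zero until a time c and then leaves along a cubic, and a finite-difference check indicates that all of them satisfy the same ordinary differential equation with the same initial value.

```python
def drift(x):
    """Drift |x|**(2/3), saturated at 1 (non-Lipschitz at the origin)."""
    return min(abs(x) ** (2.0 / 3.0), 1.0)


def make_solution(c):
    """Candidate path: zero until time c, then ((t - c) / 3) ** 3."""
    def x(t):
        return 0.0 if t <= c else ((t - c) / 3.0) ** 3
    return x


def max_residual(x, ts, h=1e-6):
    """Largest mismatch between a central-difference derivative and the drift."""
    return max(abs((x(t + h) - x(t - h)) / (2.0 * h) - drift(x(t))) for t in ts)


ts = [0.05 * k for k in range(1, 41)]  # grid on (0, 2]
# Departure times; the last one lies beyond the grid, giving the zero path.
solutions = [make_solution(c) for c in (0.0, 0.5, 1.0, 10.0)]
residuals = [max_residual(x, ts) for x in solutions]
initial_values = [x(0.0) for x in solutions]
values_at_2 = [x(2.0) for x in solutions]
```

All four paths start at zero and have negligible residual, yet they take different values at time 2, so the response to the zero noise path is not determined by the feedback equation.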

SECTION V

SEPARATION PRINCIPLE

Our first result is a very general separation theorem for the classical stochastic control problem stated at the beginning of Section I.

Theorem 13

Given the system (1), consider the problem of minimizing the functional (3) over the class of all feedback laws Formula$\pi$ that are deterministically well-posed for (1). Then the unique optimal control law is given by (4), where Formula$K$ is defined by (11), and Formula${\mathhat {x}}$ is given by the Kalman filter (5).

Proof

By Lemma 8, (18) does not depend on the control. Therefore, given the analysis at the beginning of Section II, (4) is the unique optimal control provided it defines a deterministically well-posed control law. It remains to show this.

Inserting (4) into (5) yields Formula TeX Source $${\mathhat {x}}(t)=\int_{0}^{t}\Psi (t,s)L(s)dy(s),$$ where the transition matrix Formula$\Psi (t,s)$ of Formula$[A(t)+B_{1}(t)K(t)-L(t)C(t)]$ has partial derivatives in both arguments. Together with (4) this yields Formula TeX Source $$u(t)=(\pi_{\rm opt}y)(t):=\int_{0}^{t}M(t,s)dy(s),\eqno{\hbox{(40)}}$$ where Formula$M(t,s):=K(t)\Psi (t,s)L(s)$. Clearly Formula$s\mapsto M(t,s)$ has bounded variation for each Formula$t\in [0,T]$, and therefore integration by parts yields Formula TeX Source $$(\pi_{\rm opt}y)(t)=M(t,t)y(t)-\int_{0}^{t}d_{s}M(t,s)y(s),\eqno{\hbox{(41)}}$$ which is defined samplewise. Now inserting Formula$u=\pi_{\rm opt}Hz$ into (9) and (10) we obtain Formula TeX Source $$z=z_{0}+g\pi_{\rm opt}Hz,\eqno{\hbox{(42)}}$$ where Formula$g\pi_{\rm opt}Hz$ takes the form Formula TeX Source $$(g\pi_{\rm opt}Hz)(t)=\int_{0}^{t}N(t,s)dz(s)$$ with the kernel Formula$N$ given by Formula TeX Source $$N(t,s)=\int_{s}^{t}G(t,\tau)M(\tau,s)Hd\tau,$$ where Formula$G$ is the kernel of the Volterra operator (9). A simple calculation yields Formula TeX Source $${{\partial G}\over{\partial t}}(t,s)=\left[\matrix{A(t)\cr C(t)}\right]\Phi (t,s)B_{1}(s),$$ where Formula$\Phi (t,s)$ is the transition matrix of Formula$A$, and therefore Formula$Q(t,s):=(\partial N/\partial t)(t,s)$ is a continuous Volterra kernel, and so is the unique solution Formula$R$ of the resolvent equation Formula TeX Source $$R(t,s)=\int_{s}^{t}R(t,\tau)Q(\tau,s)d\tau+Q(t,s)\eqno{\hbox{(43)}}$$ [35], [42]. From (42), since Formula$N(t,t)=0$, we have Formula TeX Source $$dz=dz_{0}+\int_{0}^{t}Q(t,s)dz(s)dt$$ from which it follows that Formula TeX Source $$\int_{0}^{t}Q(t,s)dz(s)=\int_{0}^{t}R(t,s)dz_{0}(s).$$ Hence (42) has a unique solution Formula$z$ for each Formula$z_{0}$, given by Formula TeX Source $$[(1-g\pi_{\rm opt}H)^{-1}z_{0}](t)=z_{0}(t)+\int_{0}^{t}\int_{\tau}^{t}R(s,\tau)dsdz_{0}(\tau),$$ which is clearly a system. Hence the feedback loop is deterministically well-posed.
Formula$\blackboxfill$

Consequently, for a system driven by a Wiener process with Gaussian initial condition, the linear control law defined by (4) and (5) is optimal in the class of all linear and nonlinear control laws for which the feedback system is deterministically well-posed.

If we forsake the requirement that Formula${\mathhat {x}}$ is given by the Kalman filter (5), we can now allow Formula$x_{0}$ to be non-Gaussian and Formula$w$ to be a square-integrable martingale, even allowing jumps.

Theorem 14

Given the system (1), where Formula$w$ is a square-integrable martingale and Formula$x(0)$ is an arbitrary zero mean random vector independent of Formula$w$, consider the problem of minimizing the functional (3) over the class of all feedback laws Formula$\pi$ that are deterministically well-posed for (1). Then, provided it is deterministically well-posed, the unique optimal control law is given by (4), where Formula$K$ is defined by (11) and Formula${\mathhat {x}}$ is the conditional mean (6).

Proof

Given Lemma 8, we can use the same completion-of-squares argument as in Section II except that we now need to use Ito's differential rule for martingales (see, e.g., [19], [33]), which, in integrated form, becomes Formula TeX Source $$\displaylines{x(T)^{\prime}P(T)x(T)-x(0)^{\prime}P(0)x(0)=f_{\Delta}\hfill\cr\hfill+\int_{0}^{T}\{x(t)^{\prime}{\mathdot{P}}x(t)dt+2x(t_{-})^{\prime}Pdx+{\rm tr}\left (Pd[x,x^{\prime}]\right)\},\quad{\hbox{(44)}}}$$ where Formula$[x,x^{\prime}]$ is the quadratic variation of Formula$x$ and Formula$f_{\Delta}$ is an extra term which is in general nontrivial when Formula$w$ has a jump component. Now let Formula TeX Source $$q(t):=\int_{0}^{t}\Phi (t,s)\big (A(s)x(s)+B_{1}(s)u(s)\big) ds,$$ where Formula$\Phi$ is the transition function of (1) which is differentiable in both arguments. Then, Formula$x=q+v$, where Formula$dv=B_{2}dw$ and Formula$q$ is a continuous process with bounded variation. Therefore Formula TeX Source $$[x,x^{\prime}]=[q,q^{\prime}]+2[q,v^{\prime}]+[v,v^{\prime}]=[v,v^{\prime}].$$ In fact, Formula$[q,q^{\prime}]=[q,v^{\prime}]=0$ [19, Corollary 8.5]. Since Formula$v$ does not depend on the control Formula$u$, neither does the last term in the integral in (44). If Formula$w$ has a jump component, we have a nontrivial extra term in (44), namely Formula TeX Source $$\displaylines{f_{\Delta}=\sum_{s\leq T}\big [x(s)^{\prime}P(s)x(s)-x(s_{-})^{\prime}P(s)x(s_{-})\hfill\cr\hfill-2x(s_{-})^{\prime}P(s)\Delta_{s}-\Delta_{s}^{\prime} P(s)\Delta_{s}\big]}$$ where the sum is over all jump times Formula$s$ on the interval Formula$[0,T]$ and Formula$\Delta_{s}:=x(s)-x(s_{-})$ is the jump, and we need to ensure that this term does not depend on the control either. However, since Formula$x(s)=x(s_{-})+\Delta_{s}$, we have Formula$f_{\Delta}=0$.
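The cancellation of the jump term is a purely algebraic identity, and it can be checked for a single jump. The Python sketch below evaluates one summand of the jump term for randomly drawn data; the dimension, the random seed, and the helper names are assumptions, and the weighting matrix is taken to be symmetric, as one would expect of a Riccati solution.

```python
import random


def quad(a, P, b):
    """Bilinear form a' P b for vectors a, b and a square matrix P."""
    n = len(a)
    return sum(a[i] * P[i][j] * b[j] for i in range(n) for j in range(n))


def jump_term(x_minus, delta, P):
    """One summand of the jump term, with x(s) = x(s-) + delta."""
    x_plus = [xm + dl for xm, dl in zip(x_minus, delta)]
    return (quad(x_plus, P, x_plus) - quad(x_minus, P, x_minus)
            - 2.0 * quad(x_minus, P, delta) - quad(delta, P, delta))


random.seed(0)
n = 3  # state dimension (hypothetical)
A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Symmetrize to obtain a symmetric weighting matrix P.
P = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
x_minus = [random.uniform(-1.0, 1.0) for _ in range(n)]
delta = [random.uniform(-1.0, 1.0) for _ in range(n)]

f_delta = jump_term(x_minus, delta, P)
```

Up to rounding, the summand vanishes whatever the pre-jump state and the jump size are, so no control dependence can enter through this term.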

Then the rest of the proof that (4) with Formula${\mathhat {x}}$ given by (6) is the unique minimizer of (3) over all deterministically well-posed control laws follows from an argument as in Section II. More precisely, using (11) and completing the squares we obtain Formula TeX Source $$\eqalignno{&\int_{0}^{T}(x^{\prime}Qx+u^{\prime}Ru)dt+x(T)^{\prime}Sx(T)\cr&\quad=x(0)^{\prime}P(0)x(0)+\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt\cr&\quad+\int_{0}^{T}{\rm tr}\left(Pd[v,v^{\prime}]\right)+\int_{0}^{T}x(t_{-})^{\prime}PB_{2}dw.&{\hbox{(45)}}}$$ Next we claim that Formula$E\left\{\int_{0}^{T}x(t_{-})^{\prime}P(t)B_{2}(t)dw\right\}$ exists and equals zero. To see this note that the integrand is nonanticipatory [34, p. 122]. It also has finite variance, since Formula$w$ is a square-integrable martingale and Formula$u$ needs to be square-integrable for the cost to be finite. Therefore the integrand satisfies the condition [19, eqn. (8.8)], and hence Formula$\int_{0}^{T}x(t_{-})^{\prime}P(t)B_{2}(t)dw$ is a martingale as well and thus has zero mean. Consequently, the only control dependent term in (45) is the term appearing in (17). By Lemma 8, the estimation error Formula${\mathtilde{x}}$ does not depend on the control. Hence the statement of the theorem follows. Formula$\blackboxfill$

We note that in general the optimal control law does not belong to Formula${\cal L}$ and that Formula${\mathhat {x}}$ is not given by the Kalman filter (5) but by the conditional mean (6), which then has to be chosen with some care since it is only defined almost surely as a projection for each individual time Formula$t$. To this end it is standard to select the optional projection of Formula$x(t)$ on Formula${\cal Y}_{t}$ which is a stochastic process with a càdlàg version [2, page 17]. Often Formula${\mathhat x}$ is given by a nonlinear filter as in the following example. However, even in those cases, it is difficult to ascertain well-posedness. At present, we are unable to establish that the control law in the example is deterministically well-posed and hence optimal in our admissible class of controls. We expect that Theorem 14 can be strengthened by removing the a priori assumption of well-posedness for cases where the optimal filter can be expressed as a stochastic differential equation with suitably well-conditioned coefficients. Such a strengthening is what would be needed to prove optimality in the example.

Example 15

Fig. 6. Model for step change in white noise.

Consider the system in Fig. 6. Here, Formula$x$ represents a parameter which undergoes a sudden random step change due to a random external forcing Formula$v$. The step can be in either direction. Thus, as a stochastic process, Formula$v(t)$ is defined by Formula TeX Source $$v(t)=\cases{\theta &$t\geq\tau$\cr 0&$t<\tau$}\eqno{\hbox{(46)}}$$ where Formula$\theta=\pm 1$ with equal probability and Formula$\tau$ is a random variable uniformly distributed on Formula$[0,\,T]$. Clearly Formula$v$ is a martingale. Our goal is to maintain a value for the state Formula$x$ close to zero on the interval Formula$[0,T]$ via integral control action through Formula$u$, indirectly, by demanding that Formula TeX Source $$E\left\{\int_{0}^{T}(x^{2}+R u^{2}) dt\right\}$$ be minimal with Formula$R>0$. Here, Formula$u$ denotes the control. The process Formula$x$ is observed in additive white noise Formula${\mathdot w}$. The system is now written in the standard form (1) as follows: Formula TeX Source $$\cases{dx=u dt+dv,\; x(0)=0,\cr dy=x dt+\sigma dw}\eqno{\hbox{(47)}}$$ where Formula$w$ is a Wiener process. We solve the Riccati equation Formula${\mathdot k}=-k^{2}+R^{-1}$ with boundary condition Formula$k(T)=0$ to obtain Formula$k(t)=-R^{-1/2}\tanh\left(R^{-1/2}(T-t)\right)$. The control law in Theorem 14 is Formula TeX Source $$u(t)=k(t){\mathhat x}(t),\eqno{\hbox{(48a)}}$$ where the conditional expectation is determined separately using a (nonlinear) Wonham-Shiryaev filter Formula TeX Source $$\cases{d{\mathhat x}=k{\mathhat x}dt+{{1}\over{\sigma^{2}}}(1-\rho^{2}-2(T-t)\phi)(dy-{\mathhat x}dt)\cr d\rho={{1}\over{\sigma^{2}}}(1-\rho^{2}-2(T-t)\phi)(dy-{\mathhat x}dt)\cr d\phi=-{{1}\over{\sigma^{2}}}\phi\rho (t) (dy-{\mathhat x}dt)}\eqno{\hbox{(48b)}}$$ with Formula$\rho (0)=0$ and Formula$\phi (0)=1$. Following [16, page 222] we explain the steps for deriving the filter equations in Appendix VIII.
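The closed-form gain can be checked against the scalar Riccati equation stated in the example. The Python sketch below does this by central differences; the horizon T = 2, the weight R = 0.5, the step size, and the function names are hypothetical choices made only for the check.

```python
import math


def gain(t, T, R):
    """Closed-form gain k(t) = -R**(-1/2) * tanh(R**(-1/2) * (T - t))."""
    a = R ** -0.5
    return -a * math.tanh(a * (T - t))


def riccati_residual(t, T, R, h=1e-5):
    """Central-difference estimate of k'(t) minus the right-hand side -k**2 + 1/R."""
    k_dot = (gain(t + h, T, R) - gain(t - h, T, R)) / (2.0 * h)
    k = gain(t, T, R)
    return k_dot - (-(k ** 2) + 1.0 / R)


T, R = 2.0, 0.5  # hypothetical horizon and control weight
grid = [0.1 * j for j in range(1, 20)]
worst_residual = max(abs(riccati_residual(t, T, R)) for t in grid)
terminal_value = gain(T, T, R)
gains = [gain(t, T, R) for t in grid]
```

For these values the residual is at the level of the finite-difference error, the terminal condition holds, and the gain is nonpositive on the whole interval, so the feedback in (48a) acts against the estimated deviation.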

In order to conclude that the control law (48) is actually optimal we need to establish that the feedback loop is deterministically well-posed. This requires that (10) has a unique solution for each Formula$z_{0}=\left(\matrix{v& w}\right)^{\prime}$. Noting that the innovation Formula$dy-{\mathhat x}dt$ can be expressed as Formula TeX Source $$dy-{\mathhat x}\, dt=(v-\rho)dt+dw,$$ this means that the stochastic differential equations (47) and (48) must be uniquely solvable pathwise as a map from Formula$z_{0}=\left(\matrix{v& w}\right)^{\prime}$ to Formula$z=\left(\matrix{x& y}\right)^{\prime}$. There are conditions in the literature for when such maps between path spaces exist (see [34, page 126, Theorem 10.4], [19, page 128], and the references therein). However, we are not able at present to verify that these hold in our case.

In view of Remark 11 we immediately have the following corollary to Theorem 14 for the case of complete state information. A similar statement was given in [27] in a different context.

Corollary 16

Given the system (14), where Formula$w$ is a square-integrable martingale and Formula$x(0)$ is an arbitrary random vector independent of Formula$w$, consider the problem of minimizing the functional (3) over the class of all feedback laws Formula$\pi$ that are deterministically well-posed for (14). Then the unique optimal control law is given by (15), where Formula$K$ is defined by (11).

Proof

It just remains to prove that the control law (15) is deterministically well-posed. To this end, we first note that (with Formula$z=x$) the feedback equation (10) becomes Formula TeX Source $$x(t)=x_{0}(t)+\int_{0}^{t}Q(t,s)x(s)ds,$$ where Formula$Q(t,s)=\Phi (t,s)B_{1}(s)K(s)$ with Formula$\Phi$ (as before) being the transition matrix function of Formula$A$. Then a straightforward calculation shows that Formula TeX Source $$x(t)=x_{0}(t)+\int_{0}^{t}R(t,s)x_{0}(s)ds,$$ where Formula$R$ is the unique solution of the resolvent equation (43). This establishes well-posedness. Formula$\blackboxfill$
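The role of the resolvent in this proof can be illustrated on a time grid. The Python sketch below is a discretized toy, not the construction used in the paper: it assumes a scalar kernel of exponential form with hypothetical constants, a left-endpoint quadrature rule, and an arbitrary input path. It solves the Volterra equation by forward recursion, builds a discrete analogue of the resolvent from (43), and compares the two representations of the closed-loop state.

```python
import math


def kernel(t, s, a=-0.5, bk=-1.2):
    """Hypothetical scalar kernel Q(t, s) = exp(a * (t - s)) * bk."""
    return math.exp(a * (t - s)) * bk


def solve_direct(x0, Q, h):
    """Forward recursion for x[i] = x0[i] + h * sum_{j<i} Q[i][j] * x[j]."""
    x = []
    for i in range(len(x0)):
        x.append(x0[i] + h * sum(Q[i][j] * x[j] for j in range(i)))
    return x


def resolvent(Q, h):
    """Discrete analogue of (43): R[i][j] = Q[i][j] + h * sum_{j<m<i} R[i][m] * Q[m][j]."""
    n = len(Q)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i - 1, -1, -1):  # R[i][m] for m > j is already known
            R[i][j] = Q[i][j] + h * sum(R[i][m] * Q[m][j] for m in range(j + 1, i))
    return R


def solve_with_resolvent(x0, R, h):
    """Explicit representation x[i] = x0[i] + h * sum_{j<i} R[i][j] * x0[j]."""
    return [x0[i] + h * sum(R[i][j] * x0[j] for j in range(i))
            for i in range(len(x0))]


n, h = 40, 0.05
ts = [h * i for i in range(n)]
Q = [[kernel(ts[i], ts[j]) if j < i else 0.0 for j in range(n)] for i in range(n)]
x0 = [math.sin(3.0 * t) + 0.1 * t for t in ts]  # arbitrary uncontrolled path

x_direct = solve_direct(x0, Q, h)
x_resolvent = solve_with_resolvent(x0, resolvent(Q, h), h)
max_gap = max(abs(a - b) for a, b in zip(x_direct, x_resolvent))
```

On the grid, the two computations agree to rounding error, and the second one expresses each sample of the state through past samples of the uncontrolled path only, which is the causal dependence required by Definition 7.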

Example 17

Let the driving noise Formula$w$ in (14) be given by either a Poisson martingale [19, page 87], or a geometric Brownian motion [19, page 124] Formula TeX Source $$dw=\mu w(t)dt+\sigma w(t)dv,$$ where Formula$v$ is a Wiener process, or a combination. Then the control law Formula$u(t)=K(t)x(t)$ is optimal for the problem to minimize (3).

SECTION VI

SEPARATION PRINCIPLE FOR DELAY-DIFFERENTIAL SYSTEMS

The formulation (8) covers more general stochastic systems than the ones considered above. An example is a delay-differential system of the type Formula TeX Source $$\cases{dx=A_{1}(t)x(t)dt+A_{2}(t)x(t-h)dt\cr+\int_{t-h}^{t}A_{0}(t,s)x(s)dsdt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=C_{1}(t)x(t)dt+C_{2}(t)x(t-h)dt+D(t)dw}.$$ Apparently, stochastic control for various versions of such systems was first studied in [23], [24], [25], [26], [27], and [9], although [9] relies on the strong assumption that the observation Formula$y$ is “functionally independent” of the control Formula$u$, thus avoiding the key question studied in the present paper.

Here, as in [26], we shall consider the wider class of stochastic systems Formula TeX Source $$\cases{dx=\left(\int_{t-h}^{t}d_{s}A(t,s)x(s)\right) dt\cr\qquad\qquad+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=\left(\int_{t-h}^{t}d_{s}C(t,s)x(s)\right) dt+D(t)dw}\eqno{\hbox{(49)}}$$ where Formula$A$ and Formula$C$ are of bounded variation in the first argument and continuous on the right in the second, Formula$x(t)=\xi (t)$ is deterministic (for simplicity) for Formula$-h\leq t\leq 0$, and Formula$y(0)=0$. More precisely, Formula$A(t,s)=0$ for Formula$s\geq t$, Formula$A(t,s)=A(t,t-h)$ for Formula$s\leq t-h$, and the total variation of Formula$s\mapsto A(t,s)$ is bounded by an integrable function in the variable Formula$t$, and the same holds for Formula$C$. Moreover, to avoid technicalities we assume that Formula$w$ is now a (square-integrable) Gaussian (vector) martingale. Now, the first of equations (49) can be written in the form Formula TeX Source $$\displaylines{x(t)=\Phi (t,0)\xi (0)+\int_{-h}^{0}d_{\tau}\left\{\int_{0}^{t}\Phi (t,s)A(s,\tau)ds\right\}\xi(\tau)\hfill\cr\hfill+\int_{0}^{t}\Phi(t,s)B_{1}(s)u(s)ds+\int_{0}^{t}\Phi(t,s)B_{2}(s)dw\quad{\hbox{(50)}}}$$ [26, p. 85], where Formula$\Phi$ is the Green's function corresponding to the deterministic system [3] (also see, e.g., [26, p. 101]). In the same way, we can express the second equation in integrated form. Consequently, (49) can be written in the form (8), where Formula$K$ and Formula$H$ are computed as in [26, pp. 101–103]. The problem is to find a feedback law (2) that minimizes Formula TeX Source $$J(u):=E\{V_{0}(x,u)\}\eqno{\hbox{(51)}}$$ subject to the constraint (49), where Formula TeX Source $$V_{s}(x,u):=\left\{\int_{s}^{T}x^{\prime}Qx\,d\alpha (t)+\int_{s}^{T}u^{\prime}Ru\,dt\right\}\eqno{\hbox{(52)}}$$ and Formula$d\alpha$ is a positive Stieltjes measure.

Lemma 8 enables us to strengthen the results in [26]. To this end, to avoid technicalities, we shall appeal to a representation result from [27] rather than using a completion-of-squares argument, although the latter strategy would lead to a stronger result where Formula$w$ could be an arbitrary (square-integrable) martingale. A completion-of-squares argument for a considerably simpler problem was given in [8], but, as pointed out in [28], this paper suffers from a similar mistake as the one pointed out earlier in the present paper. In this context, we also mention the recent paper [4], which considers optimal control of a stochastic system with delay in the control. This paper assumes at the outset that the separation principle for delay systems is valid with a reference to [20]. Instead of basing the argument on [20], which is not quite appropriate here, their claim could be justified by noting that the delay in the control also implies a delay in information as in Example 3 above.

Now, it can be shown that the corresponding deterministic control problem obtained by setting Formula$w=0$ has an optimal linear feedback control law Formula TeX Source $$u(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)x(\tau),\eqno{\hbox{(53)}}$$ where we refer the reader to [26] for the computation of Formula$K$. The following theorem is a considerable strengthening of the corresponding result in [26].

Theorem 18

Given the system (49), where Formula$w$ is a Gaussian martingale, consider the problem of minimizing the functional (51) over the class of all feedback laws Formula$\pi$ that are deterministically well-posed for (49). Then the unique optimal control law is given by Formula TeX Source $$u(t)=\int_{t-h}^{t}d_{s}K(t,s){\mathhat {x}}(s\vert t),\eqno{\hbox{(54)}}$$ where Formula$K$ is the deterministic control gain (53) and Formula TeX Source $${\mathhat {x}}(s\vert t):=E\{x(s)\mid{\cal Y}_{t}\}\eqno{\hbox{(55)}}$$ is given by a linear (distributed) filter Formula TeX Source $$\eqalignno{d{\mathhat {x}}(t\vert t)=&\,\int_{t-h}^{t}d_{s}A(t,s){\mathhat {x}}(s\vert t)dt\cr&+B_{1}udt+X(t,t)dv&{\hbox{(56a)}}\cr d_{t}{\mathhat {x}}(s\vert t)=&\,X(s,t)dv,\; s\leq t &{\hbox{(56b)}}}$$ where Formula$v$ is the innovation process Formula TeX Source $$dv=dy-\int_{t-h}^{t}d_{s}C(t,s){\mathhat {x}}(s\vert t)dt,\quad v(0)=0,\eqno{\hbox{(57)}}$$ and the gain Formula$X$ is as defined in [26, p.120].

For the proof of Theorem 18 we shall need two lemmas. The first is a slight reformulation of Lemma 4.1 in [27] and only requires that Formula$v$ be a martingale.

Lemma 19 ([27])

Let Formula$v$ be a square-integrable martingale with natural filtration Formula TeX Source $${\cal V}_{t}=\sigma\{v(s), s\in [0,t]\},\quad 0\leq t\leq T\eqno{\hbox{(58)}}$$ satisfying Formula$[v_{j},v_{k}]=\beta_{j}\delta_{jk}$, where Formula$\beta_{k}$, Formula$k=1,2,\ldots,p$, are nondecreasing functions, and Formula$\delta_{jk}$ is the Kronecker delta equal to one for Formula$j=k$ and zero otherwise. With Formula$u$ a square-integrable control process adapted to Formula$\{{\cal V}_{t}\}$, let Formula TeX Source $$u(t)=\bar{u}(t)+\sum_{k=1}^{p}\int_{0}^{t}u_{k}(t,s)dv_{k}(s)+{\mathtilde{u}}(t)\eqno{\hbox{(59)}}$$ be the unique orthogonal decomposition for which Formula$\bar{u}$ is deterministic and, for each Formula$t\in [0,T]$, Formula${\mathtilde{u}}(t)$ is orthogonal to the linear span of the components of Formula$\{v(s), s\in [0,t]\}$. Moreover, let Formula$x_{0}$ be a square-integrable process adapted to Formula$\{{\cal V}_{t}\}$ and having a corresponding orthogonal decomposition Formula TeX Source $$x_{0}(t)=\bar{x}_{0}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}^{0}(t,s)dv_{k}(s)+{\mathtilde{x}}_{0}(t).\eqno{\hbox{(60)}}$$ Then Formula$x=x_{0}+g(u)$, defined by (8) exchanging Formula$z$ for Formula$x$, has the orthogonal decomposition Formula TeX Source $$x(t)=\bar{x}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(t,s)dv_{k}(s)+{\mathtilde{x}}(t),\eqno{\hbox{(61)}}$$ where Formula TeX Source $$\cases{\bar{x}(t)=\bar{x}_{0}(t)+\int_{0}^{t}G(t,\tau)\bar{u}(\tau)d\tau\cr x_{k}(t,s)=x_{k}^{0}(t,s)+\int_{s}^{t}G(t,\tau)u_{k}(\tau,s)d\tau\cr{\mathtilde{x}}(t)={\mathtilde{x}}_{0}(t)+\int_{0}^{t}G(t,\tau){\mathtilde{u}}(\tau)d\tau}\eqno{\hbox{(62)}}$$ and Formula TeX Source $$\displaylines{E\{V_{0}(x,u)\}=\sum_{k=1}^{p}\int_{0}^{T}V_{s}(x_{k}(\cdot,s),u_{k}(\cdot,s))d\beta_{k}\hfill\cr\hfill+E\{V_{0}(\bar{x},\bar{u})\}+E\{V_{0}({\mathtilde{x}},{\mathtilde{u}})\}.\quad{\hbox{(63)}}}$$

For a proof of this lemma, we refer the reader to [27].

Lemma 20

Let Formula$y$ be the output process of the closed-loop system obtained after applying a deterministically well-posed feedback law Formula$u=\pi (y)$ to the system (49). Then the innovation process (57) is a Gaussian martingale, and the corresponding filtration (58) satisfies Formula TeX Source $${\cal V}_{t}={\cal Y}_{t},\quad 0\leq t\leq T.\eqno{\hbox{(64)}}$$

Proof

As can be seen from the equation (50) and the remark following it, the process Formula$y_{0}$ obtained by setting Formula$u=0$ in (49) is given by Formula$dy_{0}=q_{0}(t)dt+D(t)dw$ for a process Formula$q_{0}$ adapted to Formula$\{{\cal W}_{t}\}$. Define Formula$dv_{0}=dy_{0}-{\mathhat {q}}_{0}(t)dt$, where Formula${\mathhat {q}}_{0}(t):=E\{q_{0}(t)\mid{\cal Y}_{t}^{0}\}$. Now, Formula$q_{0}$ and Formula$w$ are jointly Gaussian, and therefore, for each Formula$t\in [0,T]$, the components of Formula${\mathhat {q}}_{0}(t)$ belong to the closed linear span of the components of the semimartingale Formula$\{y_{0}(s),\, s\in [0,t]\}$, and hence Formula TeX Source $${\mathhat {q}}_{0}(t)=\int_{0}^{t}M(t,s)dy_{0}(s)$$ for some Formula$L^{2}$-kernel Formula$M$. Therefore, Formula$v_{0}$ is Gaussian, and its natural filtration Formula${\cal V}_{t}^{0}$ satisfies Formula${\cal V}_{t}^{0}\subset{\cal Y}_{t}^{0}$. Now let Formula$R$ be the resolvent of the Volterra equation with kernel Formula$M$; i.e., the unique solution of the resolvent equation Formula TeX Source $$R(t,s)=\int_{s}^{t}R(t,\tau)M(\tau,s)d\tau+M(t,s)$$ [35], [42]. Then Formula TeX Source $$\int_{0}^{t}R(t,s)dv_{0}(s)=\int_{0}^{t}M(t,s)dy_{0}(s)={\mathhat {q}}_{0}(t),$$ and hence Formula${\cal Y}_{t}^{0}\subset{\cal V}_{t}^{0}$. Consequently, in view of Lemma 8, Formula${\cal V}_{t}^{0}={\cal Y}_{t}^{0}={\cal Y}_{t}$. Next observe that Formula TeX Source $$dy=q(t)dt+D(t)dw,\quad q(t):=q_{0}(t)+h(u)(t),$$ where Formula$h(u)$ is a causal (linear) function of the control Formula$u$. Since Formula$h(u)$ is adapted to Formula$\{{\cal Y}_{t}\}$, Formula TeX Source $${\mathhat {q}}(t):={\mathhat {q}}_{0}(t)+h(u)(t),$$ and therefore the innovation process (57) satisfies Formula$dv=dy-{\mathhat {q}}(t)dt=dy_{0}-{\mathhat {q}}_{0}(t)dt=dv_{0}$. Equation (64) now follows.

Finally, to prove that the innovation process Formula$v$ is a martingale we need to show that Formula TeX Source $$E\{v(s)-v(t)\mid{\cal V}_{t}\}=0\quad\hbox{for all }s\geq t.$$ To this end, first note that Formula TeX Source $$\displaylines{E\left\{v(s)-v(t)\mid{\cal V}_{t}\right\}=E\left\{\int_{t}^{s}{\mathtilde{q}}(\tau)d\tau\mid{\cal V}_{t}\right\}\hfill\cr\hfill+E\left\{\int_{t}^{s}D(\tau)dw\mid{\cal V}_{t}\right\},\quad{\hbox{(65)}}}$$ where Formula${\mathtilde{q}}(t):=q(t)-{\mathhat {q}}(t)$. Since all the processes are jointly Gaussian (the control-dependent terms have been canceled in forming Formula${\mathtilde{q}}$), independence is the same as orthogonality. Since Formula${\mathtilde{q}}(\tau)\perp{\cal V}_{\tau}\supset{\cal V}_{t}$ for Formula$\tau\geq t$, the first term in (65) is zero. The second term can be written Formula TeX Source $$E\left\{E\left\{\int_{t}^{s}D(\tau)dw\mid{\cal W}_{t}\right\}\mid{\cal V}_{t}\right\},$$ which is zero since Formula$w$ is a martingale. Formula$\blackboxfill$

We are now in a position to prove Theorem 18. Lemma 20 shows that the innovation process (57) is a martingale. It is no restriction to assume that Formula$E\{v(t)v(t)^{\prime}\}$ is diagonal; if it is not, we just normalize the innovation process by replacing Formula$v(t)$ by Formula$R(t)^{-1/2}v(t)$, where Formula$R(t):=E\{v(t)v(t)^{\prime}\}>0$. Then we set Formula$\beta_{k}(t):=E\{v_{k}^{2}\}$, Formula$k=1,2,\ldots,p$. Since Formula${\cal V}_{t}={\cal Y}_{t}$ for Formula$t\in [0,T]$ (Lemma 20), admissible controls take the form (59). Moreover, the process Formula${\mathhat {x}}(t):=E\{x(t)\mid{\cal Y}_{t}\}$ is adapted to Formula$\{{\cal V}_{t}\}$, and hence, analogously to (59), it has the decomposition Formula TeX Source $${\mathhat {x}}(t)=\bar{x}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(t,s)dv_{k}(s)+{\mathtilde{x}}(t),\eqno{\hbox{(66)}}$$ which now will take the place of (61) in Lemma 19. As before, let Formula${\mathhat {x}}_{0}$ be the process Formula${\mathhat {x}}$ obtained by setting Formula$u=0$. By Lemma 8, Formula${\mathhat {x}}_{0}$ does not depend on the control Formula$u$. Moreover, since Formula$x_{0}$ and Formula$v$ are jointly Gaussian, Formula TeX Source $${\mathhat {x}}_{0}(t)=\bar{x}_{0}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}^{0}(t,s)dv_{k}(s),\eqno{\hbox{(67)}}$$ replacing (60) in Lemma 19. Moreover, Formula TeX Source $$E\{V_{0}(x,u)\}=E\{V_{0}({\mathhat {x}},u)\}+E\{V_{0}(x-{\mathhat {x}},0)\},$$ where the last term does not depend on the control, since Formula$x-{\mathhat {x}}=x_{0}-{\mathhat {x}}_{0}$. 
Hence, by Lemma 19, the problem is now reduced to finding a control (59) and a state process (66) minimizing Formula$E\{V_{0}({\mathhat {x}},u)\}$ subject to Formula TeX Source $$\eqalignno{\bar{x}(t)=&\,\bar{x}_{0}(t)+\int_{0}^{t}G(t,\tau)\bar{u}(\tau)d\tau&{\hbox{(68a)}}\cr x_{k}(t,s)=&\,x_{k}^{0}(t,s)+\int_{s}^{t}G(t,\tau)u_{k}(\tau,s)d\tau &{\hbox{(68b)}}\cr{\mathtilde{x}}(t)=&\,\int_{0}^{t}G(t,\tau){\mathtilde{u}}(\tau)d\tau &{\hbox{(68c)}}}$$ where the last equation has been modified to account for the fact that Formula${\mathtilde{x}}_{0}=0$. Clearly, this problem decomposes into several distinct problems. First, Formula$\bar{u}$ needs to be chosen so that Formula$V_{0}(\bar{x},\bar{u})$ is minimized subject to (68a). This is a deterministic control problem with the feedback solution Formula TeX Source $$\bar{u}(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)\bar{x}(\tau),\eqno{\hbox{(69)}}$$ where Formula$K$ is as in (53). Secondly, for each Formula$s\in [0,T]$ and Formula$k=1,2,\ldots,p$, Formula$u_{k}(t,s)$ has to be chosen so as to minimize Formula$V_{s}(x_{k}(\cdot,s),u_{k}(\cdot,s))$ subject to (68b). This again is a deterministic control problem with the optimal feedback solution Formula TeX Source $$u_{k}(t,s)=\int_{t-h}^{t}d_{\tau}K(t,\tau)x_{k}(\tau,s).\eqno{\hbox{(70)}}$$ Finally, Formula${\mathtilde{u}}$ should be chosen so as to minimize Formula$E\{V_{0}({\mathtilde{x}},{\mathtilde{u}})\}$ subject to (68c). This problem clearly has the solution Formula${\mathtilde{u}}=0$, and hence Formula${\mathtilde{x}}=0$ as well.
Combining these results and inserting them into (59) then yields the optimal feedback control Formula TeX Source $$u(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)\big (\bar{x}(\tau)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(\tau,s)dv_{k}(s)\big).$$ It remains to show that this is exactly the same as (54); i.e., that Formula TeX Source $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(\tau,s)dv_{k}(s).\eqno{\hbox{(71)}}$$ To this end, first note that, since the optimal control is linear in Formula$dv$, Formula${\mathhat {x}}(\tau\vert t)$ will take the form Formula TeX Source $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\int_{0}^{t}X_{t}(\tau,s)dv(s),$$ where Formula$\bar{x}(\tau)=E\{x(\tau)\}$, the same as in (71). Clearly Formula$E\{[x(\tau)-{\mathhat {x}}(\tau\vert t)]v(s)^{\prime}\}=0$ for Formula$s\in [0,t]$, and therefore Formula TeX Source $$E\{x(\tau)v(s)^{\prime}\}=E\{{\mathhat {x}}(\tau\vert t)v(s)^{\prime}\}=\int_{0}^{s}X_{t}(\tau,s)d\beta(s),$$ showing that the kernel Formula$X_{t}$ does not depend on Formula$t$; hence this index will be dropped. Now, setting Formula$\tau=t$, comparing with (66) and noting that Formula${\mathtilde{x}}=0$, we see that Formula$X(t,s)$ is the matrix with columns Formula$x_{1}(t,s),x_{2}(t,s),\ldots,x_{p}(t,s)$, establishing (71), which from now on we shall write Formula TeX Source $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\int_{0}^{t}X(\tau,s)dv(s).\eqno{\hbox{(72)}}$$ Hence, (54) is the optimal control, as claimed. Moreover, Formula TeX Source $${\mathhat {x}}(\tau\vert t)={\mathhat {x}}(s)+\int_{s}^{t}X(\tau,s)dv(s),$$ which yields (56a). To derive (56b), follow the procedure in [26].

It remains to show that the optimal control law (54) is deterministically well-posed. To this end, it is no restriction to assume that Formula$\bar{x}_{0}\equiv 0$ so that all processes have zero mean. Then it follows from (54) and the unsymmetric Fubini Theorem of Cameron and Martin [10] that Formula TeX Source $$\eqalignno{u(t)=&\,\int_{0}^{t}P(t,s)dv(s),\cr\noalign{\noindent \hbox{where}$\hfill$}P(t,s)=&\,\int_{t-h}^{t}d_{\tau}K(t,\tau)X(\tau,s),}$$ and likewise from (57) that Formula TeX Source $$\eqalignno{dv=&\,dy-\int_{0}^{t}S(t,s)dv(s)dt,\cr\noalign{\noindent \hbox{where}$\hfill$}S(t,s)=&\,\int_{t-h}^{t}d_{\tau}C(t,\tau)X(\tau,s).}$$ The function Formula$S$ is a Volterra kernel and therefore the Volterra resolvent equation Formula TeX Source $$V(t,s)=S(t,s)-\int_{s}^{t}V(t,\tau)S(\tau,s)d\tau$$ has a unique solution Formula$V$, from which it follows that Formula TeX Source $$dv=dy-\int_{0}^{t}V(t,s)dy(s)dt.$$ Then the optimal control law is given by (40), where now Formula$M$ is given by Formula TeX Source $$M(t,s)=P(t,s)-\int_{s}^{t}P(t,\tau)V(\tau,s)d\tau.$$ Now, for the optimal control law, Formula$s\mapsto X(t,s)$ is of bounded variation for each Formula$t$ [26], and hence so is Formula$s\mapsto M(t,s)$. Hence Formula$\pi_{\rm opt}$ can be defined samplewise as in (41). To complete the proof that the optimal feedback loop is deterministically well-posed we proceed exactly as in the proof of Theorem 13, noting that in the present setting Formula TeX Source $${{\partial G}\over{\partial s}}(t,s)=\int_{s}^{t}d_{\tau}\left[\matrix{A(t,\tau)\cr C (t,\tau)}\right]\Phi (\tau,s)B_{1}(s),$$ where Formula$\Phi (t,s)$ is the transition matrix of Formula$A$ [26, p. 101].
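The passage from the innovation form of the feedback to one driven directly by the observations can be illustrated numerically. The following is a minimal sketch, not the construction used in the proof: it assumes a scalar kernel, a uniform grid, and a left-endpoint discretization of the relation dv = dy - (∫ S dv) dt. The function names (`build_kernel`, `volterra_resolvent`, `innovations_forward`, `innovations_resolvent`) and the constant test kernel are hypothetical. The sign convention is the one obtained by substituting dv = dy - (∫ V dy) dt into that relation, namely V = S - ∫ V S.

```python
import math


def build_kernel(S, n, h):
    """Strictly lower-triangular samples L[i][j] = S(t_i, t_j), j < i, t_i = i*h."""
    return [[S(i * h, j * h) if j < i else 0.0 for j in range(n)] for i in range(n)]


def volterra_resolvent(L, h):
    """Solve the discrete resolvent equation W = L - h * W * L row by row.

    Row i is filled from j = i-1 down to 0, because W[i][j] only needs
    W[i][k] for j < k < i.
    """
    n = len(L)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i - 1, -1, -1):
            acc = sum(W[i][k] * L[k][j] for k in range(j + 1, i))
            W[i][j] = L[i][j] - h * acc
    return W


def innovations_forward(L, h, dy):
    """Recursive form: dv_i = dy_i - h * sum_{j<i} L[i][j] * dv_j."""
    dv = [0.0] * len(dy)
    for i in range(len(dy)):
        dv[i] = dy[i] - h * sum(L[i][j] * dv[j] for j in range(i))
    return dv


def innovations_resolvent(W, h, dy):
    """Explicit form: dv_i = dy_i - h * sum_{j<i} W[i][j] * dy_j."""
    return [dy[i] - h * sum(W[i][j] * dy[j] for j in range(i))
            for i in range(len(dy))]


if __name__ == "__main__":
    c, T, n = 2.0, 1.0, 200          # assumed test values
    h = T / n
    L = build_kernel(lambda t, s: c, n, h)
    W = volterra_resolvent(L, h)
    dy = [h * math.cos(3.0 * i * h) for i in range(n)]  # deterministic stand-in for dy
    a = innovations_forward(L, h, dy)
    b = innovations_resolvent(W, h, dy)
    print("max mismatch:", max(abs(x - y) for x, y in zip(a, b)))
```

For the constant kernel S = c the continuous resolvent under this convention is c e^{-c(t-s)}, which the discrete solution approaches as the grid is refined. The two ways of computing the increments of v agree up to rounding, which is the discrete counterpart of expressing the control in terms of y alone.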

Remark 21

It was shown in [27] that, in the case of complete state information Formula$(y=x)$, the control (53) is optimal even when Formula$w$ is an arbitrary (not necessarily Gaussian) martingale.

SECTION VII

CONCLUSIONS

In studying the literature on the separation principle of stochastic control, one encounters many expositions where subtle difficulties are overlooked and inadmissible shortcuts are taken. On the other hand, for most papers and monographs that provide rigorous derivations, one is struck by the level of mathematical sophistication and technical complexity, which make the material hard to include in standard textbooks in a self-contained fashion. It is our hope that our use of deterministic well-posedness provides an alternative mechanism for understanding the separation principle that is more palatable and transparent to the engineering community, while still rigorous. The new insights offered by the approach allow us to establish the separation principle also for systems driven by non-Gaussian martingale noise. However, in this more general framework the key issue of establishing well-posedness for particular control systems is challenging and more work needs to be done.

APPENDIX

Consider the “uncontrolled” observation process Formula TeX Source $$dy_{0}=v(t)dt+\sigma dw.$$

If Formula$d{\BBP}$ denotes the law of Formula$(\theta,\tau,w)$ and Formula TeX Source $$\Lambda (t)=e^{\sigma^{-2}\int_{0}^{t}v(s)dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}v(s)^{2}ds},$$ then, under a new measure Formula$d{\BBQ}:=\Lambda (T)^{-1}d{\BBP}$, Formula$y_{0}$ becomes a Wiener process while the law of Formula$v$ (i.e., of Formula$\theta$ and Formula$\tau$) is the same as before. Under Formula$d{\BBQ}$, the two processes Formula$y_{0}$ and Formula$v$ are independent. The conditional expectation is now given by Bayes' formula [16, p. 174] as Formula TeX Source $$\eqalignno{E_{\BBP}(v(t)\vert{\cal Y}_{t})=&\,{{E_{\BBQ}(v(t)\Lambda (t)\vert{\cal Y}_{t})}\over{E_{\BBQ}(\Lambda (t)\vert{\cal Y}_{t})}}\cr=&\,{{E_{\BBQ}(\theta I_{t\geq\tau}e^{\sigma^{-2}\int_{0}^{t}\theta I_{s\geq\tau}dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}I_{s\geq\tau}ds}\vert{\cal Y}_{t})}\over{E_{\BBQ}(e^{\sigma^{-2}\int_{0}^{t}\theta I_{s\geq\tau}dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}I_{s\geq\tau}ds}\vert{\cal Y}_{t})}}\cr=&\,{{E_{\BBQ}(I_{t\geq\tau}e^{-({{1}/{\sigma^{2}})}y_{0}(t\wedge\tau)-(1/2\sigma^{2})(t-\tau)^{+}}(e^{{y_{0}(t)}/{\sigma^{2}}}-e^{-{{y_{0}(t)}/{\sigma^{2}}}})\vert{\cal Y}_{t})}\over{E_{\BBQ}(e^{-({{1}/{\sigma^{2}})}y_{0}(t\wedge\tau)-(1/2\sigma^{2})(t-\tau)^{+}}(e^{{y_{0}(t)}/{\sigma^{2}}}-e^{-{{y_{0}(t)}/{\sigma^{2}}}})\vert{\cal Y}_{t})}}.&{\hbox{(73)}}}$$

Here Formula$t\wedge\tau:=\min (t,\tau)$, Formula$I_{t\geq\tau}(t)=1$ when Formula$t\geq\tau$ and 0 otherwise, and Formula$(t-\tau)^{+}=(t-\tau)I_{t\geq\tau}$. Note that Formula$v(t)=\theta I_{t\geq\tau}(t)$. We also define Formula$\rho (t):=E_{\BBP}(v(t)\vert{\cal Y}_{t})$ and Formula TeX Source $$\eqalignno{\Sigma (t):=&\,\int_{0}^{t}e^{(y_{0}(t)-y_{0}(s)-(1/2)(t-s))/\sigma^{2}}ds,\cr\bar\Sigma (t):=&\,\int_{0}^{t}e^{(-(y_{0}(t)-y_{0}(s))-(1/2)(t-s))/\sigma^{2}}ds.}$$ From (73), Formula$\rho (t)=N(t)/D(t)$, where Formula$N(t)=\Sigma (t)-\bar\Sigma (t)$ and Formula$D(t)=\Sigma (t)+\bar\Sigma (t)+2(T-t)$. By first noting that Formula$\Sigma$ and Formula$\bar\Sigma$ satisfy the stochastic differential equations Formula TeX Source $$d\Sigma=\sigma^{-2}\Sigma (t) dy_{0}+dt\quad{\rm and}\quad d\bar\Sigma=-\sigma^{-2}\bar\Sigma (t) dy_{0}+dt,$$ respectively, the Itô rule applied to the expression Formula$N(t)/D(t)$ for the conditional expectation gives the filter equations (setting Formula$\phi=D^{-1}$) Formula TeX Source $$\eqalignno{d\rho=&\,\sigma^{-2}(1-\rho^{2}-2(T-t)\phi)(dy_{0}-\rho dt) &{\hbox{(74a)}}\cr d\phi=&\,-\sigma^{-2}\phi\rho (dy_{0}(t)-\rho dt).&{\hbox{(74b)}}}$$ Finally, noting that the innovation Formula$dy_{0}-\rho dt$ is equal to Formula$dy-{\mathhat x}dt$ for the controlled system, we obtain the filter equations (48).
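The closed-form expression ρ(t) = N(t)/D(t) can be evaluated directly on a sampled observation path. The sketch below is illustrative only: it approximates Σ and its barred counterpart by left Riemann sums on a uniform grid, and the function name `conditional_mean`, the grid, and the noise-free test path are hypothetical choices. The form of D(t) suggests that θ takes the values ±1 with equal probability and that τ is uniform on [0,T]; the sketch relies on that reading, which is an assumption here.

```python
import math


def conditional_mean(y0, h, sigma, T):
    """Return rho(t_i) = N(t_i) / D(t_i) on the grid t_i = i*h.

    y0 is a list of path samples y0[i] = y_0(t_i), with y0[0] = 0.
    Sigma and Sigma_bar are approximated by left Riemann sums.
    """
    s2 = sigma * sigma
    rho = []
    for i, y_t in enumerate(y0):
        t = i * h
        sig = 0.0       # approximates Sigma(t)
        sig_bar = 0.0   # approximates Sigma-bar(t)
        for j in range(i):
            incr = y_t - y0[j]            # y_0(t) - y_0(s)
            damp = 0.5 * (t - j * h)      # (1/2)(t - s)
            sig += math.exp((incr - damp) / s2) * h
            sig_bar += math.exp((-incr - damp) / s2) * h
        num = sig - sig_bar                       # N(t)
        den = sig + sig_bar + 2.0 * (T - t)       # D(t)
        rho.append(num / den)
    return rho


if __name__ == "__main__":
    T, n, sigma = 1.0, 200, 0.2   # assumed test values
    h = T / n
    # Noise-free path for theta = +1 and a jump at tau = 0.3.
    step = [max(0.0, i * h - 0.3) for i in range(n + 1)]
    rho = conditional_mean(step, h, sigma, T)
    print("rho at t = 0.2, 0.5, 1.0:", rho[40], rho[100], rho[200])
```

Because both sums are nonnegative, the computed estimate always lies in [-1, 1], as a conditional mean of a ±1-valued step must. A path that is identically zero gives ρ ≡ 0 by symmetry, while a sustained upward drift pushes ρ toward +1.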

ACKNOWLEDGMENT

We are indebted to an anonymous referee for significant input, which has improved the paper considerably.

Footnotes

This work was supported by grants from AFOSR, NSF, VR, SSF and the Göran Gustafsson Foundation. Recommended by Associate Editor H. Zhang.

T. T. Georgiou is with the Department of Electrical & Computer Engineering, University of Minnesota, Minneapolis, Minnesota 55455 USA (e-mail: tryphon@umn.edu).

A. Lindquist is with the Department of Automation, Shanghai Jiao Tong University, Shanghai, China, and the Center for Industrial and Applied Mathematics (CIAM) and the ACCESS Linnaeus Center, Royal Institute of Technology, 100 44 Stockholm, Sweden (e-mail: alq@kth.se).

1. However, the model is conditionally Gaussian given the filtration Formula$\{{\cal Y}_{t}\}$; see Remark 6.

2. “continue à droite, limite à gauche” in French, alternatively RCLL (“right continuous with left limits”) in English.

3. More precisely, to be seen as a system, relay hysteresis needs to be preceded by a low-pass filter since its domain consists of continuous functions.

4. It is interesting to note, as was pointed out by a referee, that the proof of the lemma relies critically on the action of the operator Formula$(1-g\pi H)^{-1}$ on a null set, as the probability Formula${\BBP}(z_{0}=H^{-R}y_{0})=0$ for any nontrivial model. This fact may be disturbing from a probabilistic point of view but does not invalidate the lemma.

5. This was kindly suggested by a referee.

Authors

Tryphon T. Georgiou

Tryphon T. Georgiou (M'79–SM'99–F'00) was born in Athens, Greece, in 1956. He received the Diploma in mechanical and electrical engineering from the National Technical University of Athens, Greece, in 1979, and the Ph.D. degree from the University of Florida, Gainesville, in 1983.

He served on the faculty of Florida Atlantic and Iowa State universities before joining the University of Minnesota in 1989. He is currently a Professor of Electrical and Computer Engineering, a Co-Director of the Control Science and Dynamical Systems Center (1990-present), and holds the Vincentine Hermes-Luh chair of Electrical Engineering. He has served as an Associate Editor for the IEEE Transactions on Automatic Control, the IEEE Transactions on Signal Processing, the SIAM Journal on Control and Optimization, and Systems and Control Letters, and as a Member of the Board of Governors of the Control Systems Society (2002–2005). His research interests lie in the areas of control and systems theory, information theory, and applied mathematics.

Dr. Georgiou is a co-recipient of three George S. Axelby Outstanding Paper awards by the IEEE Control Systems Society, for the years 1992, 1999, and 2003. In both 1992 and 1999 he received the award for joint work with Prof. Malcolm C. Smith (Cambridge Univ., U.K.), and in 2003 for joint work with Professors Chris Byrnes (Washington Univ., St. Louis) and Anders Lindquist (KTH, Stockholm and SJTU, Shanghai). He is a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).

Anders Lindquist

Anders Lindquist (M'77–SM'86–F'89–LF'10) received the Ph.D. degree in 1972 from the Royal Institute of Technology (KTH), Stockholm, Sweden.

From 1972 to 1974 he held visiting positions at the University of Florida and Brown University. In 1974 he became an Associate Professor and in 1980 a Professor at the University of Kentucky, where he remained until 1983. In 1982, he was appointed as Professor of the Chair of Optimization and Systems Theory at the Royal Institute of Technology, and he remained in this position until 2010. Between 2000 and 2009 he was also the Head of the Mathematics Department there. He is now Zhiyuan Chair Professor and Qian Ren Scholar at Shanghai Jiao Tong University, Shanghai, China, and the Director of the Strategic Research Center for Industrial and Applied Mathematics at the Royal Institute of Technology.

Dr. Lindquist is a Member of the Royal Swedish Academy of Engineering Sciences, a Foreign Member of the Russian Academy of Natural Sciences, an Honorary Member of the Hungarian Operations Research Society, a Fellow of SIAM, and a Fellow of IFAC. He was awarded the 2009 W. T. and Idalia Reid Prize in Mathematics from SIAM, the George S. Axelby Outstanding Paper Award for the year 2003, and a SIGEST paper award from SIAM Review in 2001. He received an Honorary Doctorate (Doctor Scientiarum Honoris Causa) from Technion (Israel Institute of Technology), Haifa, in June 2010.
