SECTION I

## INTRODUCTION

ONE OF THE fundamental principles of feedback theory is that the problems of optimal control and state estimation can be decoupled in certain cases [30]. This is known as the separation principle. The concept was coined early on in [17], [32] and is closely connected to the idea of certainty equivalence; see, e.g., [38]. In studying the literature on the separation principle of stochastic control, one is struck by the level of sophistication and technical complexity. The source of the difficulties can be traced to the circular dependence between control and observations. The goal of this paper is to present a rigorous approach to the separation principle in continuous time which is rooted in the engineering view of systems as maps between signal spaces.

The most basic setting begins with a linear system $$\cases{dx=A(t)x(t)dt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=C(t)x(t)dt+D(t)dw}\eqno{\hbox{(1)}}$$ with a state process $x$, an output process $y$ and a control $u$, where $w(t)$ is a vector-valued Wiener process, $x(0)$ is a zero-mean Gaussian random vector independent of $w(t)$, $y(0)=0$, and $A$, $B_{1}$, $B_{2}$, $C$, $D$ are matrix-valued functions of compatible dimensions, which we take to be continuous and of bounded variation. Moreover, $DD^{\prime}$ is nonsingular on the interval $[0,T]$, and if we want the noise processes in the state and output equations to be independent, as is often assumed but not required here, we take $B_{2}D^{\prime}\equiv 0$. All random variables and processes are defined over a common complete probability space $(\Omega,{\cal F},\BBP)$.

The control problem is to design an output feedback law $$\pi:y\mapsto u\eqno{\hbox{(2)}}$$ over the window $[0,T]$ which maps the observation process $y$ to the control input $u$, in a nonanticipatory manner, so that the value of the functional $$\displaylines{J(u)=E\left\{\int_{0}^{T}x(t)^{\prime}Q(t)x(t)dt\right.\hfill\cr\hfill\left.+\int_{0}^{T}u(t)^{\prime}R(t)u(t)dt+x(T)^{\prime}Sx(T)\right\}\quad{\hbox{(3)}}}$$ is minimized, where $Q$ and $R$ are continuous matrix functions of bounded variation, $Q(t)$ is positive semi-definite and $R(t)$ is positive definite for all $t$. How to choose the admissible class of control laws $\pi$ has been the subject of much discussion in the literature [27]. The conclusion, under varying conditions, has been that $\pi$ can be chosen to be linear in the data and, more specifically, in the form $$u(t)=K(t){\mathhat x}(t),\eqno{\hbox{(4)}}$$ where ${\mathhat x}(t)$ is the Kalman estimate of the state vector $x(t)$ obtained from the Kalman filter $$\displaylines{d{\mathhat x}=A(t){\mathhat x}(t)dt+B_{1}(t)u(t)dt\hfill\cr\hfill+L(t)(dy-C(t){\mathhat x}(t)dt),\quad{\mathhat x}(0)=0,\quad{\hbox{(5)}}}$$ and the gains $K$ and $L$ are computed by solving a pair of dual Riccati equations.

A result of this kind is far from obvious, and the early literature was marred by treatments of the separation principle where the non-Gaussian element introduced by an a priori nonlinear control law $\pi$ was overlooked. The subtlety lies in excluding the possibility that a nonlinear controller extracts more information from the data than is otherwise possible. This point will be explained in detail in Section II, where a brief historical account of the problem will be given. Early expositions of the separation principle often fall into one of two categories: either the subtle issues are overlooked and inadmissible shortcuts are taken; or the treatment is mathematically quite sophisticated and technically very demanding. The short survey in Section II will thus serve the purpose of introducing the theoretical challenges at hand, as well as setting up notation.

In this paper we take the point of view that feedback laws (2) should act on sample paths of the stochastic process $y$ rather than on the process itself. This is motivated by engineering thinking where systems and feedback loops process signals. Thus, our key assumption on admissible control laws (2) is that the resulting feedback loop is deterministically well-posed in the sense that the feedback equations admit a unique solution that causally depends on the input for each input sample path. For this class of control laws we prove that the separation principle stated above holds and moreover that it extends to systems driven by general martingale noise. More precisely, in this non-Gaussian situation the Wiener process $w$ in (1) is replaced by an arbitrary (square-integrable) martingale process with possible jumps such as a Poisson process martingale; see, e.g., [19, p. 87]. Then, we only need to replace the (linear) Kalman estimate ${\mathhat {x}}$ by the strict sense conditional mean $${\mathhat {x}}(t)=E\{x(t)\mid{\cal Y}_{t}\},\eqno{\hbox{(6)}}$$ where $${\cal Y}_{t}:=\sigma\{y(\tau),\tau\in [0,t]\},\quad 0\leq t\leq T,\eqno{\hbox{(7)}}$$ is the filtration generated by the output process; i.e., the family of increasing sigma fields representing the data as it is produced. The estimate ${\mathhat x}$ needs to be defined with care so that it constitutes a sufficiently regular stochastic process and is realized by a map acting on observations [2, page 17], [11]. Unfortunately, the results in the present paper come at a cost since our key assumption of well-posedness excludes control laws for which the feedback system fails to be defined sample-wise. Existence of strong solutions of the feedback equations is not enough to ensure well-posedness in our sense, as we will discuss below. In addition, the condition of deterministic well-posedness is often difficult to verify.
Yet, besides the fact that we prove the separation principle for general martingale noise, the sample-wise viewpoint provides a simple explanation of why the separation principle may hold in the first place.

Before proceeding we recast the system model (1) in an integrated form which allows similar conclusions for more general linear systems in a unified setting. To this end, let $z(t)=\left(\matrix{x(t)\cr y(t)}\right)$. System (1) can now be expressed in the form $$\cases{z(t)=z_{0}(t)+\int_{0}^{t}G(t,\tau)u(\tau)d\tau\cr y(t)=Hz(t),}\eqno{\hbox{(8)}}$$ where $z_{0}$ is the process $z$ obtained by setting $u=0$ and $G$ is a Volterra kernel. This integrated form encompasses a considerably wider class of controlled linear systems including delay-differential equations, following [26], [27], which will be taken up in Section VI. The corresponding feedback configuration is shown in Fig. 1 where $$g:(t,u)\mapsto\int_{0}^{t}G(t,\tau)u(\tau)d\tau,\eqno{\hbox{(9)}}$$ is a Volterra operator and $H$ is a constant matrix. As usual, Fig. 1 is a graphical representation of the algebraic relationship $$z=z_{0}+g\pi H z.\eqno{\hbox{(10)}}$$ For the particular model in (1), $H=[0,I]$, but in general $H$ could be any matrix or linear system. Setting $z:=x$ and $H=I$ we obtain the special case of complete state information.

Fig. 1. A feedback interconnection.

In a stochastic setting, the feedback (10) is said to have a unique strong solution if there exists a non-anticipating function $F$ such that $z=F(z_{0})$ satisfies (10) with probability one and all other solutions coincide with $z$ with probability one. It is important to note that in our sample-wise setting we require more, namely that such a unique solution exists and that (10) holds for all $z_{0}$, not only “almost all.” Consequences of this requirement will be further elaborated upon below.

The outline of the paper is as follows. In Section II we begin by reviewing the standard quadratic regulator problem and pointing out subtleties created by a possible nonlinear control law. We then review several strategies in the literature to establish a separation principle, chiefly restricting the class of admissible controls. Section III defines notions of signals and systems used in our framework, and in Section IV we establish necessary conditions for a feedback loop to make sense and deduce a basic fact about propagation of information in the loop through linear components. In Section V we state and prove our main results on the separation principle for linear-quadratic regulator problems, allowing also for more general martingale noise. Finally, in Section VI we prove a separation theorem for delay systems with Gaussian martingale noise.

SECTION II

## HISTORICAL REMARKS

A common approach to establishing the basic separation principle stated at the beginning of Section I is a completion-of-squares argument similar to the one used in deterministic linear-quadratic-regulator theory; see e.g., [1]. For ease of reference, we briefly review this construction. Given the system (1) and the solution $P$ of the matrix Riccati equation $$\cases{{\mathdot{P}}=-A^{\prime}P-PA+PB_{1}R^{-1}B_{1}^{\prime}P-Q,\cr P(T)=S},\eqno{\hbox{(11a)}}$$ Itô's differential rule (see, e.g., [19], [31]) yields $$d(x^{\prime}Px)=x^{\prime}{\mathdot{P}}xdt+2x^{\prime}Pdx+{\rm tr}(B_{2}^{\prime}PB_{2})dt,$$ where ${\rm tr}(M)$ denotes the trace of the matrix $M$. Then from (1) and (11a) it readily follows that $$\displaylines{d(x^{\prime}Px)=[-x^{\prime}Qx-u^{\prime}Ru+(u-Kx)^{\prime}R(u-Kx)]dt\hfill\cr\hfill+{\rm tr}(B_{2}^{\prime}PB_{2})dt+2x^{\prime}PB_{2}dw,}$$ where $$K(t):=-R(t)^{-1}B_{1}(t)^{\prime}P(t).\eqno{\hbox{(11b)}}$$ Integrating this from 0 to $T$ and taking mathematical expectation, we obtain the following expression for the cost functional (3): $$\displaylines{J(u)=E\left\{\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt\right\}\hfill\cr\hfill+E\left\{x(0)^{\prime}P(0)x(0)\right\}+\int_{0}^{T}{\rm tr}(B_{2}^{\prime}PB_{2})dt.\quad{\hbox{(12)}}}$$ To ensure that $\int_{0}^{T}x^{\prime}PB_{2}dw$ has zero expectation, we need to check that the integrand is square integrable. It is clear that $u$ is square integrable, for otherwise $J(u)=\infty$. Then the state process $$x(t)=x_{0}(t)+\int_{0}^{t}\Phi (t,s)B_{1}(s)u(s)ds\eqno{\hbox{(13)}}$$ is square integrable as well. Here $x_{0}$ is the (square integrable) state process corresponding to $u=0$, and $\Phi$ is the transition matrix function of the system (1).
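As a numerical illustration of (11a) and (11b), the following minimal sketch integrates the Riccati equation backward from $P(T)=S$ and forms the gain $K$. It assumes constant coefficient matrices and a classical fourth-order Runge-Kutta step; the function name `riccati_backward` and the step count are hypothetical choices made for this example, not part of the text.

```python
import numpy as np

def riccati_backward(A, B1, Q, R, S, T, steps=2000):
    """Integrate (11a) backward from P(T) = S with classical RK4.

    Assumption for brevity: A, B1, Q, R are constant matrices (the text
    allows time-varying ones). Returns the time grid, P(t) on the grid,
    and the gain K(t) = -R^{-1} B1' P(t) of (11b).
    """
    Rinv = np.linalg.inv(R)

    def pdot(P):
        # Right-hand side of (11a): dP/dt = -A'P - PA + P B1 R^{-1} B1' P - Q
        return -A.T @ P - P @ A + P @ B1 @ Rinv @ B1.T @ P - Q

    h = T / steps
    ts = np.linspace(0.0, T, steps + 1)
    Ps = np.empty((steps + 1,) + S.shape)
    Ps[-1] = S
    for k in range(steps, 0, -1):
        P = Ps[k]
        # One RK4 step of size -h, i.e. stepping backward in time
        k1 = pdot(P)
        k2 = pdot(P - 0.5 * h * k1)
        k3 = pdot(P - 0.5 * h * k2)
        k4 = pdot(P - h * k3)
        Ps[k - 1] = P - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    Ks = np.array([-Rinv @ B1.T @ P for P in Ps])
    return ts, Ps, Ks
```

In the scalar case $A=0$, $B_{1}=Q=R=1$, $S=0$, equation (11a) reduces to ${\mathdot{P}}=P^{2}-1$ with $P(T)=0$, whose solution is $P(t)=\tanh (T-t)$; this gives a simple sanity check for the sketch.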

Now, if we had complete state information with (1) replaced by $$\cases{dx=A(t)x(t)dt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr y=x}\eqno{\hbox{(14)}}$$ we could immediately conclude that the feedback law $$u(t)=K(t)x(t)\eqno{\hbox{(15)}}$$ is optimal, because the last term in (12) does not depend on the control. However, when we have incomplete state information with the control being a function of the observed process $\{y(s); 0\leq s\leq t\}$, things become more complicated. Mathematically we formalize this by requiring any control process to be adapted to the filtration (7); i.e., requiring $u(t)$ to be ${\cal Y}_{t}$-measurable for each $t\in [0,T]$. Then, with ${\mathhat {x}}$ given by (6), setting $${\mathtilde{x}}(t):=x(t)-{\mathhat {x}}(t),\eqno{\hbox{(16)}}$$ we have $E\{[u(t)-K(t){\mathhat {x}}(t)]{\mathtilde{x}}(t)^{\prime}\}=0$, and therefore $$\displaylines{E\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt\hfill\cr\hfill=E\int_{0}^{T}[(u-K{\mathhat {x}})^{\prime}R(u-K{\mathhat {x}})+{\rm tr}(K^{\prime}RK\Sigma)]dt,\quad{\hbox{(17)}}}$$ where $\Sigma$ is the error covariance matrix function $$\Sigma (t):=E\{{\mathtilde{x}}(t){\mathtilde{x}}(t)^{\prime}\}.\eqno{\hbox{(18)}}$$ A common mistake in the early literature on the separation principle is to assume without further investigation that $\Sigma$ does not depend on the choice of control. Indeed, if this were the case, it would follow directly that (12) is minimized by choosing the control as (4), and the proof of the separation principle would be immediate. (Of course, in the end this will be the case under suitable conditions, but this has to be proven.)
This mistake probably originates from the observation that the control term in (13) cancels when forming (16) so that $${\mathtilde{x}}(t)={\mathtilde{x}}_{0}(t):=x_{0}(t)-{\mathhat {x}}_{0}(t),\eqno{\hbox{(19)}}$$ where $${\mathhat {x}}_{0}(t):=E\{x_{0}(t)\mid{\cal Y}_{t}\}.\eqno{\hbox{(20)}}$$ However, in this analysis, we have not ruled out that ${\mathhat {x}}_{0}$ depends on the control or, what would follow from this, that the filtration (7) does. A detailed discussion of this conundrum can be found in [27]. In fact, since the control process $u$ is in general a nonlinear function of the data and thus non-Gaussian, so is the output process $y$.${}^{1}$ Consequently, the conditional expectation (20) might not in general coincide with the wide sense conditional expectation obtained by projections of the components of $x_{0}(t)$ onto the closed linear span of the components of $\{y(\tau),\tau\in [0,t]\}$, and therefore, a priori, it could happen that ${\mathhat {x}}$ is not generated by the Kalman filter (5).
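For reference, the step from the orthogonality relation $E\{[u-K{\mathhat {x}}]{\mathtilde{x}}^{\prime}\}=0$ to (17) can be spelled out as follows. This is only an expansion of what is stated above, written pointwise in $t$ and with standard `\hat`, `\tilde` and `'` in place of the notation macros used elsewhere in the text.

```latex
\begin{align*}
u-Kx &= (u-K\hat{x}) - K\tilde{x},\\
E\{(u-Kx)'R(u-Kx)\}
  &= E\{(u-K\hat{x})'R(u-K\hat{x})\}
     - 2\,E\{(u-K\hat{x})'RK\tilde{x}\}
     + E\{\tilde{x}'K'RK\tilde{x}\},\\
E\{(u-K\hat{x})'RK\tilde{x}\}
  &= \operatorname{tr}\bigl(RK\,E\{\tilde{x}\,(u-K\hat{x})'\}\bigr) = 0,\\
E\{\tilde{x}'K'RK\tilde{x}\}
  &= \operatorname{tr}\bigl(K'RK\,E\{\tilde{x}\tilde{x}'\}\bigr)
   = \operatorname{tr}(K'RK\Sigma).
\end{align*}
```

Integrating over $[0,T]$ gives (17). Note that the expansion itself says nothing about whether $\Sigma$ depends on the control, which is exactly the point at issue.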

To avoid these problems one might begin by uncoupling the feedback loop as in Fig. 2, and determine an optimal control process in the class of stochastic processes $u$ that are adapted to the family of sigma fields $${\cal Y}_{t}^{0}:=\sigma\{y_{0}(\tau),\tau\in [0,t]\},\quad 0\leq t\leq T,\eqno{\hbox{(21)}}$$ i.e., for each $t\in [0,T]$, $u(t)$ is a function of $\{y_{0}(s),\, 0\leq s\leq t\}$. This problem, where one optimizes over the class of all control processes adapted to a fixed filtration, was called a stochastic open loop (SOL) problem in [27]. It is not uncommon in the literature to assume from the outset that the control is adapted to $\{{\cal Y}_{t}^{0}\}$; see, e.g., [6, Section 2.3], [16], [40].

Fig. 2. A stochastic open loop (SOL) configuration.

In [27] it was suggested how to embed the class of admissible controls in various SOL classes in a problem-dependent manner, and then construct the corresponding feedback law. More precisely, in the present context, the class of admissible feedback laws was taken to consist of the nonanticipatory functions $u:=\pi (y)$ such that the feedback loop $$z=z_{0}+g\pi Hz\eqno{\hbox{(22)}}$$ has a unique solution $z_{\pi}$ and $u=\pi (Hz_{\pi})$ is adapted to $\{{\cal Y}_{t}^{0}\}$. Next, we shall give a few examples of specific classes of feedback laws that belong to this general class.

#### Example 1

It is common to restrict the admissible class of control laws to contain only linear ones; see, e.g., [12]. In a more general direction, let ${\cal L}$ be the class $$({\cal L})\quad u(t)=\bar{u}_{0}(t)+\int_{0}^{t}F(t,\tau)dy,\eqno{\hbox{(23)}}$$ where $\bar{u}_{0}$ is a deterministic function and $F$ is an $L_{2}$ kernel. In this way, the Gaussian property will be preserved, and ${\mathhat {x}}$ will be generated by the Kalman filter (5). Then it follows from (1) and (5) that ${\mathtilde{x}}$ is generated by $$d{\mathtilde{x}}=(A-LC){\mathtilde{x}}dt+(B_{2}-LD)dw,\quad{\mathtilde{x}}(0)=x(0),$$ which is clearly independent of the choice of control. Then so is the error covariance (18), as desired. Even in the more general setting described by (8), it was shown in [26, pp. 95–96] that $${\cal Y}_{t}={\cal Y}_{t}^{0},\quad t\in [0,T],\eqno{\hbox{(24)}}$$ for any $\pi\in{\cal L}$, where (21) is the filtration generated by the uncontrolled output process $y_{0}$ obtained by setting $u=0$ in (8).
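The cancellation of the control in the error equation above can be checked numerically. The sketch below runs an Euler-Maruyama discretization of a scalar instance of (1) together with the filter recursion (5) on one fixed noise path and returns $x-{\mathhat x}$. All numerical values, the constant gain `L`, and the function name `simulate` are assumptions made for this illustration; in particular `L` is not the optimal Kalman gain, since the cancellation holds for any gain.

```python
import numpy as np

def simulate(control, L=0.8, T=1.0, steps=1000, seed=0):
    """Euler-Maruyama simulation of a scalar instance of (1) with recursion (5).

    Illustrative assumptions (not from the text): the coefficient values
    below and a constant gain L. Returns the path of x - xhat.
    """
    a, b1, b2, c, d = -0.5, 1.0, 0.3, 1.0, 0.2
    rng = np.random.default_rng(seed)
    dt = T / steps
    # w = (w1, w2): w1 drives the state and w2 the output, so B2 D' = 0
    dw = rng.normal(0.0, np.sqrt(dt), size=(steps, 2))
    x = rng.normal()        # x(0): zero-mean Gaussian
    xhat, y = 0.0, 0.0      # xhat(0) = 0 and y(0) = 0
    err = np.empty(steps + 1)
    err[0] = x - xhat
    for k in range(steps):
        u = control(k * dt, y, xhat)   # control uses past data only
        dy = c * x * dt + d * dw[k, 1]
        x_next = x + (a * x + b1 * u) * dt + b2 * dw[k, 0]
        xhat = xhat + (a * xhat + b1 * u) * dt + L * (dy - c * xhat * dt)
        x, y = x_next, y + dy
        err[k + 1] = x - xhat
    return err
```

For example, `simulate(lambda t, y, xhat: 0.0)` and `simulate(lambda t, y, xhat: -2.0 * xhat)` return the same error path up to floating-point rounding. The same holds for a nonlinear choice of `control`, but in that case the sketch only shows that the difference between $x$ and the output of recursion (5) is control-independent; whether that output is the conditional mean (6) is the separate question discussed in this section.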

#### Example 2

In his influential paper [41], Wonham proposed the class of control laws $$u(t)=\psi (t,{\mathhat {x}}(t))\eqno{\hbox{(25)}}$$ in terms of the state estimate (6), where $\psi (t,x)$ is Lipschitz continuous in $x$. For pedagogical reasons, we first highlight a somewhat more restrictive construction due to Kushner [21]. Let $${\mathhat {\xi}}_{0}(t):=E\{x_{0}(t)\mid{\cal Y}_{t}^{0}\}$$ be the Kalman state estimate of the uncontrolled system $$\cases{dx_{0}=A(t)x_{0}(t)dt+B_{2}(t)dw\cr dy_{0}=C(t)x_{0}(t)dt+D(t)dw}.\eqno{\hbox{(26)}}$$ Here we use the notation ${\mathhat {\xi}}_{0}$ to distinguish it from ${\mathhat {x}}_{0}$, defined by (20), which a priori might depend on the control. Then the Kalman filter takes the form $$d{\mathhat {\xi}}_{0}=A{\mathhat {\xi}}_{0}(t)dt+L(t)dv_{0},\;{\mathhat {\xi}}_{0}(0)=0$$ where the innovation process $$dv_{0}=dy_{0}-C{\mathhat {\xi}}_{0}(t)dt,\; v_{0}(0)=0$$ generates the same filtration, $\{{\cal V}_{t}^{0}\}$, as $y_{0}$; i.e., ${\cal V}_{t}^{0}={\cal Y}_{t}^{0}$ for $t\in [0,T]$. This is well-known, but a simple proof is given in Section VI in a more general setting; see (64). Now, along the lines of (13), define $${\mathhat {\xi}}(t)={\mathhat {\xi}}_{0}(t)+\int_{0}^{t}\Phi (t,s)B_{1}(s)u(s)ds,$$ where the control is chosen as $$u(t)=\psi (t,{\mathhat {\xi}}(t)).\eqno{\hbox{(27)}}$$ Since $\psi$ is Lipschitz, ${\mathhat {\xi}}$ is the unique strong solution of the stochastic differential equation $$d{\mathhat {\xi}}=\big (A{\mathhat {\xi}}+B_{1}\psi (t,{\mathhat {\xi}})\big)dt+Ldv_{0},\;{\mathhat {\xi}}(0)=0,\eqno{\hbox{(28)}}$$ and it is thus adapted to $\{{\cal V}_{t}^{0}\}$ and hence to $\{{\cal Y}_{t}^{0}\}$; see, e.g., [19, p. 120].
Hence the selection (27) of control law forces $u$ to be adapted to $\{{\cal Y}_{t}^{0}\}$, and hence, due to $$dy=dy_{0}+\int_{0}^{t}C(t)\Phi (t,s)B_{1}(s)u(s)dsdt,\eqno{\hbox{(29)}}$$ obtained from (13), ${\cal Y}_{t}\subset{\cal Y}_{t}^{0}$ for $t\in [0,T]$. However, since the control-dependent terms cancel, $$dv_{0}=dy_{0}-C{\mathhat {\xi}}_{0}(t)dt=dy-C{\mathhat {\xi}}(t)dt,$$ which inserted into (28) yields a stochastic differential equation, obeying the appropriate Lipschitz condition, driven by $dy$ and having ${\mathhat {\xi}}$ as a strong solution. Therefore, ${\mathhat {\xi}}$ is adapted to $\{{\cal Y}_{t}\}$, and hence, by (27), so is $u$. Consequently, (29) implies that ${\cal Y}_{t}^{0}\subset{\cal Y}_{t}$ for $t\in [0,T]$ so that actually (24) holds. Finally, this implies that ${\mathhat {\xi}}={\mathhat {x}}$, and thus $u$ is given by (25). However, it should be noted that the class of control laws (27) is a subclass of (25) as it has been constructed to make $u$ a priori adapted to $\{{\cal Y}_{t}^{0}\}$. Therefore, the relevance of these results, presented in [21], for the proof in [22, page 348] is unclear. In their popular textbook [20], widely used as a reference source for the validity of the separation principle over a general class of admissible (including nonlinear) controls, Kwakernaak and Sivan prove the separation principle over a class of linear laws but claim with reference to [21], [22] that it holds “without qualification” in general [20, p. 390]. (However, see Remark 6 below.)

In his pioneering paper [41] Wonham proved the separation theorem for controls in the class (25) even with a more general cost functional than (3). However, the proof is far from simple and marred by many technical assumptions. A case in point is the assumption that $C(t)$ is square and has a determinant bounded away from zero, which is a serious restriction. A later proof by Fleming and Rishel [15] is considerably simpler. They also prove the separation theorem with quadratic cost functional (3) for a class of Lipschitz continuous feedback laws, namely $$u(t)=\phi (t,y),\eqno{\hbox{(30)}}$$ where $\phi:\, [0,T]\times C^{n}[0,T]\to{\BBR}^{m}$ is a nonanticipatory function of $y$ which is Lipschitz continuous in this argument.

#### Example 3

It is interesting to note that if there is a delay in the processing of the observed data so that, for each $t$, $u(t)$ is a function of $y(\tau)$; $0\leq\tau\leq t-\varepsilon$, then (24) holds. To see this, let $n$ be a positive integer, and suppose that ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,n\varepsilon]$. Since $u(t)$ is ${\cal Y}_{t-\varepsilon}$-measurable on $[0,(n+1)\varepsilon]$, it is, by the induction hypothesis, ${\cal Y}_{t-\varepsilon}^{0}$-measurable as well. Then, since $$y(t)=y_{0}(t)+\int_{0}^{t}HG(t,s)u(s)ds,$$ it follows that ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,(n+1)\varepsilon]$. Since ${\cal Y}_{t}={\cal Y}_{t}^{0}$ for $t\in [0,\varepsilon]$, (24) follows by induction.

#### Remark 4

Example 3 highlights the reason why the problem with possibly control-dependent sigma fields does not occur in the usual discrete-time formulation. Indeed, in this setting, the error covariance (18) will not depend on the control, while, as we have mentioned, some more analysis is needed to rule out that its continuous-time counterpart does. This invalidates a procedure used in several textbooks (see, e.g., [36]) in which the continuous-time $\Sigma$ is constructed as the limit of finite difference quotients of the discrete-time $\Sigma$, which, as we have seen in Example 3, does not depend on the control, and which simply is the solution of a discrete-time matrix Riccati equation. However, we cannot a priori conclude that continuous-time $\Sigma$ satisfies this Riccati equation. For this we need (24), or alternatively arguments such as in Remark 6. Otherwise the argument is circular.

#### Remark 5

Historically, a popular approach was introduced in Duncan and Varaiya [14] and Davis and Varaiya [13] (see also [6, Section 2.4]) based on weak solutions of the relevant stochastic differential equation. In their analysis the driving noise is a Wiener process. The key element of their approach is to start with an uncontrolled system and, through a change of probability measure, relate its solutions to those of a new system with a suitably defined control input and noise process. This control input, together with the conformably altered input process, leaves the filtration of the observation process unaffected, thereby bypassing the central issue dealt with in the current paper. Briefly, starting from a Wiener process ${\mathtilde{w}}$ of an uncontrolled system with an output process $y$ and any process $u$ adapted to $\{{\cal Y}_{t}\}$, by a suitable change of probability measure (that depends on $u$), $$dw=d{\mathtilde{w}}-B_{1}udt$$ can be transformed, using the Girsanov transformation, into a new Wiener process, which in the sense of weak solutions [19] is the same as any other Wiener process. Replacing $d{\mathtilde{w}}$ in the original uncontrolled system by $B_{1}udt+dw$ leaves the filtration $\{{\cal Y}_{t}\}$ unaffected.

#### Remark 6

Yet another approach to the separation principle is based on the fact that, although (1) with a nonlinear control is non-Gaussian, the model is conditionally Gaussian given the filtration $\{{\cal Y}_{t}\}$ [29, Chapter 16.1]. This fact can be used to show that ${\mathhat {x}}$ is actually generated by a Kalman filter [29, Chapters 11 and 12]. This last approach requires quite a sophisticated analysis and is restricted to the case where the driving noise $w$ is a Wiener process.

A key point for establishing the separation principle is to identify admissible control laws for which (24) holds. For each such control law $\pi$ we need a solution of the feedback (10), i.e., a pair $(z_{0},z)$ of stochastic processes that satisfies $$z=z_{0}+g\pi H z.\eqno{\hbox{(31)}}$$ Since $z_{0}$ is the driving process, it is natural to seek a solution $z$ which causally depends on $z_{0}$ and is unique. If this is the case then $z$ is a strong solution; otherwise it is a weak solution. There are well-known examples of stochastic differential equations that have only weak solutions [19, page 137], [5], [37]. Moreover, as we have mentioned in Remark 5, weak solutions circumvent the need to establish the equivalence (24) between filtrations. Thus, it has been suggested that the framework of weak solutions is the appropriate one for control problems [34, page 149]. Yet, from an applications point of view, where the control needs to be causally dependent on observed data, this is in our view questionable. In fact, there are control laws for which (31) only admits a weak solution and (24) does not hold (Remark 12). In the present paper we take an even more stringent view on the causal dependence. We require that (31) has a unique strong solution which in addition specifies a measurable map $z_{0}\to z$ between sample paths for every sample path of $z_{0}$ (cf. [19, Remark 5.2, p. 128], [34, p. 122]), thus modeling correspondence between signals; we elaborate further on this in Section IV.

In short, we only allow control laws which are physically realizable in an engineering sense, in that they induce a signal that travels through the feedback loop. This comes at a price since there are stochastic differential equations having strong solutions that do not fall in this category (Remark 12). Moreover, verifying that a control law is admissible in our sense may be difficult in general. On the other hand, an advantage of the approach is that the class of control laws includes discontinuous ones and allows for statements about linear systems driven by non-Gaussian noise with possible jumps. We now proceed to develop the approach and the key property of deterministic well-posedness.

SECTION III

## SIGNALS AND SYSTEMS

Signals are thought of as sample paths of a stochastic process with possible discontinuities. This is quite natural from several points of view. First, it encompasses the response of a typical nonlinear operation that involves thresholding and switching, and second, it includes sample paths of counting processes and other martingales. More specifically we consider signals to belong to the Skorohod space $D$; this is defined as the space of functions which are continuous on the right and have a left limit at all points, i.e., the space of càdlàg functions.${}^{2}$ It contains the space $C$ of continuous functions as a proper subspace. The notation $D[0,T]$ or $C[0,T]$ emphasizes the time interval where signals are being considered.

Traditionally, the comparison of two continuous functions in the uniform topology relates to how much their graphs need to be perturbed so as to be carried onto one another by changing only the ordinates, with the time-abscissa being kept fixed. However, in order to metrize $D$ in a natural manner one must recognize the effect of uncertainty in measuring time and allow a respective deformation of the time axis as well. To this end, let ${\cal K}$ denote the class of strictly increasing, continuous mappings of $[0,T]$ onto itself and let $I$ denote the identity map. Then, for $x$, $y\in D[0,T]$, $$d(x,y):=\inf_{\kappa\in{\cal K}}\max\{\Vert\kappa-I\Vert,\Vert x-y\kappa\Vert\}$$ defines a metric on $D[0,T]$ which induces the so-called Skorohod topology. A further refinement, so as to ensure bounds on the slopes of the chords of $\kappa$, renders $D[0,T]$ separable and complete, that is, $D[0,T]$ is a Polish space; see [7, Theorem 12.2].

Systems are thought of as general measurable nonanticipatory maps from $D\to D$ sending sample paths to sample paths so that their output at any given time $t$ is a measurable function of past values of the input and of time. More precisely, let $$\Pi_{\tau}: x\mapsto\Pi_{\tau}x:=\cases{x(t)&\hbox{for } t<\tau\cr x(\tau)&\hbox{for } t\geq\tau.}$$ Then, a measurable map $f:\, D[0,T]\to D[0,T]$ is said to be a system if and only if $$\Pi_{\tau}f\;\Pi_{\tau}=\Pi_{\tau}f\quad\hbox{for all } \tau\in [0,T].$$ An important class of systems is provided by stochastic differential equations with Lipschitz coefficients driven by a Wiener process [34, Theorem 13.1]. These have pathwise unique strong solutions and induce maps between corresponding path spaces [34, page 127], [19, pages 126–128]. Also, under fairly general conditions (see e.g., [33, Chapter V]), stochastic differential equations driven by martingales with sample paths in $D$ have strong solutions which are semi-martingales.
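The defining identity $\Pi_{\tau}f\,\Pi_{\tau}=\Pi_{\tau}f$ has a direct analogue for sampled signals, which the following minimal sketch checks on a given input. The helper names (`truncate`, `is_nonanticipatory`, `running_sum`, `look_ahead`) are hypothetical, and passing the check on one input is only a necessary condition, not a proof that a map is a system.

```python
import numpy as np

def truncate(x, k):
    """Discrete analogue of Pi_tau: keep x[0..k], then hold the value x[k]."""
    out = np.array(x, dtype=float)
    out[k + 1:] = out[k]
    return out

def is_nonanticipatory(f, x):
    """Check Pi_k f Pi_k x == Pi_k f x at every sample index k, for one input x."""
    fx = f(x)
    return all(
        np.allclose(truncate(f(truncate(x, k)), k), truncate(fx, k))
        for k in range(len(x))
    )

def running_sum(x):
    # Causal: the output at index k uses x[0..k] only
    return np.cumsum(x)

def look_ahead(x):
    # Anticipatory: the output at index k equals the next input sample
    return np.append(x[1:], x[-1])
```

With `x = np.array([1.0, -2.0, 0.5, 3.0, 4.0])`, the running sum passes the check while the one-step look-ahead fails it, since truncating the input at index `k` changes the look-ahead output at that same index.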

Besides stochastic differential equations in general, and those in (8) in particular, other nonlinear maps may serve as systems. For instance, discontinuous hysteresis nonlinearities as well as non-Lipschitz static maps such as $u\mapsto y:=\sqrt{\vert u\vert}$ are reasonable as systems from an engineering viewpoint. Indeed, these induce maps from $D\to D$ (or from $C\to D$, as in the case of relay hysteresis), are seen to be systems according to our definition,${}^{3}$ and can be considered as components of nonlinear feedback laws. We note that a nonlinearity such as $u\mapsto y={\rm sign}(u)$ is not a system in the sense of our definition since the output is not in general in $D$. Such nonlinearities, which often appear in bang-bang control, need to be approximated with a physically realizable hysteretic system.

SECTION IV

## WELL-POSEDNESS AND A KEY LEMMA

It is straightforward to construct examples of deterministically well-posed feedback interconnections with elements as above. However, the situation is a bit more delicate when considering feedback loops since it is also perfectly possible that, at least mathematically, they give rise to unrealistic behavior. A standard example is that of a feedback loop with causal components that “implements” a perfect predictor. Indeed, consider a system $f$ which superimposes its input with a delayed version of it, i.e., $$f: z(t)\mapsto z(t)+z(t-t_{\rm delay}),$$ for $t\geq 0$, and assume initial conditions $z(t)=0$ for $t<0$. Then the feedback interconnection of Fig. 3 is unrealistic as it behaves as a perfect predictor. The feedback equation $$z(t)=z_{0}(t)+f(z(t))=z_{0}(t)+z(t)+z(t-t_{\rm delay})$$ gives rise to $0=z_{0}(t)+z(t-t_{\rm delay})$, and hence, $$z(t)=-z_{0}(t+t_{\rm delay}).$$ Therefore, the output process $z$ is not causally dependent on the input. The question of well-posedness of feedback systems has been studied from different angles for over forty years. See for instance the monograph by Jan Willems [39].
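The perfect-predictor example can be reproduced on sampled signals. In the sketch below, a delay of `d` samples stands in for $t_{\rm delay}$, and the helper names are hypothetical. Two assumptions are made explicit: the input is taken to vanish on its first `d` samples (with $z=0$ before time 0, the discrete loop equation has no solution at those samples otherwise), and the last `d` output samples, which would require input beyond the window, are set to zero.

```python
import numpy as np

def f(z, d):
    """Discrete analogue of z(t) -> z(t) + z(t - t_delay), with z = 0 before time 0."""
    delayed = np.concatenate([np.zeros(d), z[:-d]])
    return z + delayed

def closed_loop_solution(z0, d):
    """Candidate solution of z = z0 + f(z): z[k] = -z0[k + d].

    The last d samples would need input from beyond the window and are
    set to 0 here (an assumption of this sketch).
    """
    z = np.zeros_like(z0)
    z[:-d] = -z0[d:]
    return z
```

For `z0 = np.array([0, 0, 0, 1.0, 2.0, -1.0, 4.0, 0.5])` and `d = 3`, the returned `z` satisfies `z == z0 + f(z, d)`. Changing `z0[6]` alone changes `z[3]`, so the closed-loop map depends on future input samples, which is the failure of causality described above.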

Fig. 3. Basic feedback system.

In our present setting of stochastic control we need a concept of well-posedness which ensures that signals inside a feedback loop are causally dependent on external inputs. This is a natural assumption from a systems point of view.

#### Definition 7

A feedback system is deterministically well-posed if the closed-loop maps are themselves systems; i.e., the feedback equation $z=z_{0}+f(z)$ has a unique solution $z\in D$ for all inputs $z_{0}\in D$ and the operator $(1-f)^{-1}$ is itself a system.

Thus, now thinking about $z_{0}$ and $z$ in the feedback system in Fig. 3 as stochastic processes, deterministic well-posedness implies that ${\cal Z}_{t}\subset{\cal Z}^{0}_{t}$ for $t\in [0,T]$, where ${\cal Z}_{t}$ and ${\cal Z}^{0}_{t}$ are the sigma-fields generated by $z$ and $z_{0}$, respectively. This is a consequence of the fact that $(1-f)^{-1}$ is a system. Likewise, since $(1-f)$ is also a system, ${\cal Z}_{t}^{0}\subset{\cal Z}_{t}$ so that in fact $${\cal Z}_{t}^{0}={\cal Z}_{t},\quad t\in [0,T].\eqno{\hbox{(32)}}$$

Next, we consider the situation in Fig. 1 and the relation between ${\cal Y}_{t}$ and the filtration ${\cal Y}^{0}_{t}$ of the process $y_{0}=Hz_{0}$. The latter represents the “uncontrolled” output process where the control law $\pi$ is taken to be identically zero. A key technical lemma for what follows states that the filtrations ${\cal Y}_{t}$ and ${\cal Y}^{0}_{t}$ are also identical if the feedback system is deterministically well-posed. This is not obvious at first sight, solely on the basis of the linear relationships $y=Hz$ and $y_{0}=Hz_{0}$, as the following simple example demonstrates: the two vector processes ${w\choose 0}$ and ${0\choose w}$ generate the same filtrations while $(1\; 0){w\choose 0}$ and $(1\; 0){0\choose w}$ do not.

#### Lemma 8

If the feedback interconnection in Fig. 1 is deterministically well-posed, $g\pi$ is a system, and $H$ is a linear system having a right inverse $H^{-R}$ that is also a system, then $(1-Hg\pi)^{-1}$ is a system and ${\cal Y}_{t}={\cal Y}^{0}_{t}$, $t\in [0,T]$.

#### Remark 9

Note that, for the prototype problem involving (1), the conditions on $H$ in Lemma 8 are trivially satisfied, as $H=[0,\, I]$ and hence $H^{-R}:=H^{\prime}$ is a right inverse. The requirement in the lemma that $g\pi$ is a system allows for a more general situation where $\pi$ is not itself a system (e.g., generating outputs not in $D$), but where the cascade connection is still admissible.

##### Proof

By well-posedness $(1-g\pi H)^{-1}$ is a system. To show that $(1-Hg\pi)^{-1}$ exists and is a system, first note that $$(1-Hg\pi)H=H-Hg\pi H=H (1-g\pi H).\eqno{\hbox{(33)}}$$ The first step uses left distributivity and the second uses the fact that $H$ is linear. But then $$(1-Hg\pi)\underbrace{H(1-g\pi H)^{-1}H^{-R}}_{h}=I,\eqno{\hbox{(34)}}$$ where $HH^{-R}=I$. Thus, $h$ is a “right inverse” of $p:=(1-Hg\pi)$ in that the composition $p\circ h$ of the two maps is the identity. We claim that $h$ is in fact the inverse of $p$ (which is necessarily unique) in that $y=h(y_{0})$ and $$(1-Hg\pi)y=y_{0}\eqno{\hbox{(35)}}$$ establish a bijective correspondence between $y$ and $y_{0}$, i.e., that both $p\circ h$ as well as $h\circ p$ are identity maps. We need to show the latter. The only potential problem would be if two distinct values $y$ and ${\mathhat y}$ satisfy (35) for the same value of $y_{0}$. We now show that this is not possible.

Since $H$ is right invertible, $y_{0}$ can be written in the form $y_{0}=Hz_{0}$ for $z_{0}=H^{-R}y_{0}$. Let $z=(1-g\pi H)^{-1}z_{0}$ and $y=Hz$. Then $y=h(y_{0})$, so by (34) $y$ is a particular solution of (35). Now let ${\mathhat y}$ be another solution, i.e., suppose that $$(1-Hg\pi){\mathhat y}=y_{0}\eqno{\hbox{(36)}}$$ and that ${\mathhat y}\ne y$. We begin by writing ${\mathhat y}$ in the form ${\mathhat y}=H{\mathhat z}$, which can always be done since $H$ is right invertible. Next we set ${\mathhat z}_{0}:=(1-g\pi H){\mathhat z}$. Then, by well-posedness, ${\mathhat {z}}$ is the unique solution of $${\mathhat z}={\mathhat z}_{0}+g\pi H({\mathhat z}).\eqno{\hbox{(37)}}$$ Moreover, by (33) and (36), $H{\mathhat z}_{0}=y_{0}$, and consequently ${\mathhat z}_{0}=z_{0}+v$ with $Hv=0$. We now claim that ${\mathhat z}=z+v$, which would then contradict the assumption that ${\mathhat y}\ne y$, since it gives ${\mathhat y}=H{\mathhat z}=Hz+Hv=y$. To show this, note that, since $z=z_{0}+g\pi H z$, $H$ is linear and $Hv=0$, $$z+v=z_{0}+v+g\pi H(z+v).$$ But the solution to (37) is unique by well-posedness. Hence, ${\mathhat z}=z+v$, which proves our claim.

Therefore, finally, $(1-Hg\pi)$ is invertible and $$(1-Hg\pi)^{-1}=h=H(1-g\pi H)^{-1}H^{-R}$$ is itself a system, being a composition of systems. Thus, the configuration in Fig. 4 is deterministically well-posed. Using (33) once again, $$H(1-g\pi H)^{-1}=(1-Hg\pi)^{-1}H.\eqno{\hbox{(38)}}$$ It now follows that $$\eqalignno{y=&\, H(1-g\pi H)^{-1}z_{0}=(1-Hg\pi)^{-1}Hz_{0}\cr=&\,(1-Hg\pi)^{-1}y_{0},&{\hbox{(39)}}}$$ while also (35) holds. Equation (39) shows that ${\cal Y}_{t}\subset{\cal Y}^{0}_{t}$, whereas (35) shows that ${\cal Y}^{0}_{t}\subset{\cal Y}_{t}$. $\blackboxfill$

Fig. 4. An equivalent feedback configuration.

The essence of the lemma is to underscore the equivalence between the configuration in Fig. 1 and that in Fig. 4. It is this equivalence which accounts for the identity ${\cal Y}_{t}={\cal Y}^{0}_{t}$ between the respective $\sigma$-algebras. An analogous notion of well-posedness was considered by Willems in [40] where, in contrast, the well-posedness of the feedback configuration in Fig. 4, and consequently the validity of ${\cal Y}_{t}={\cal Y}^{0}_{t}$, is assumed at the outset.

In the present paper we consider only feedback laws that render the feedback system deterministically well-posed. Therefore we highlight the conditions in a formal definition.

#### Definition 10

A feedback law $\pi$ is deterministically well-posed for the system (8) if $g\pi$ is a system and the feedback loop of Fig. 1 is deterministically well-posed.

If the feedback law $\pi$ is deterministically well-posed, then, by Lemma 8, the feedback loop in Fig. 4 is also deterministically well-posed. Thus, in essence, given the assumption that $z=z_{0}+g\pi H z$ can be uniquely and causally solved for every input sample path, so can $y=y_{0}+Hg\pi y$.
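As a rough discrete-time illustration of this equivalence, the loop $y=y_{0}+Hg\pi y$ can be solved by forward recursion, and $y_{0}$ can be recovered causally from $y$ via (35). This is only a sketch: the one-step-delay discretization, the step size `h`, the function names, and the nonlinear law $u=-\tanh (y)$ are hypothetical choices that do not come from the paper.

```python
import math

def hgpi(y, k, h):
    """Value at step k of a hypothetical strictly causal map standing in for H g pi.

    Only the samples y[0], ..., y[k-1] are read, mimicking a nonanticipatory loop.
    """
    x_part = 0.0  # running integral of the control (state contribution)
    y_part = 0.0  # running integral of the state contribution (output contribution)
    for j in range(k):
        y_part += h * x_part
        x_part += h * (-math.tanh(y[j]))  # assumed nonlinear law u = -tanh(y)
    return y_part

def close_loop(y0, h):
    """Solve y = y0 + (H g pi)(y) sample by sample (forward recursion)."""
    y = []
    for k in range(len(y0)):
        y.append(y0[k] + hgpi(y, k, h))
    return y

def open_loop(y, h):
    """Recover y0 = (1 - H g pi) y, the discrete counterpart of (35)."""
    return [y[k] - hgpi(y, k, h) for k in range(len(y))]

h = 0.05
y0 = [math.sin(0.3 * k) for k in range(60)]  # an arbitrary input sample path
y = close_loop(y0, h)
y0_back = open_loop(y, h)
max_err = max(abs(a - b) for a, b in zip(y0, y0_back))
```

Each direction reads only past samples, so in this toy setting the histories of $y$ and $y_{0}$ up to any step determine one another, which is the discrete counterpart of ${\cal Y}_{t}={\cal Y}^{0}_{t}$.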

#### Remark 11

For pedagogical reasons, we consider the case of complete state information, corresponding to (14). This amounts to taking $H=I$ and $z=x$, and the basic feedback loop is as depicted in Fig. 5. Then the basic condition (32) implied by well-posedness states that the filtration $\{{\cal X}_{t}\}$, where ${\cal X}_{t}:=\sigma\{x(s);\; s\in [0,t]\}$, is invariant under variations of the control. Consequently, we do not need Lemma 8 to resolve an issue of circular control dependence. This is completely consistent with the analysis leading up to (15) in Section II.

Fig. 5. Feedback loop for complete state information.

#### Remark 12

We now present two examples of feedback systems which fail to be deterministically well-posed. Consider the system $$\cases{dx=udt+dw\cr y=x}$$ where $w$ is a Wiener process, i.e., $w=x_{0}$ in Fig. 5. First take the control law $\pi$ to be the Tsirel'son functional $u(t)=b(t,x)$ in [34, p. 156]. Then the solution of the feedback equation can only be defined in the weak sense and, remarkably, ${\cal Y}_{t}^{0}$ is strictly contained in ${\cal Y}_{t}$ for $t>0$ (see, e.g., [34, Theorem (18.3)]). For a different example, take the control law $u=\pi (y)$ with $\pi (y)=\min\{\vert y\vert^{2/3},1\}$. This is not deterministically well-posed although the stochastic differential equation $$dx=\pi (x)dt+dw$$ has a unique strong solution [18, Chapter 5, Proposition 5.17] in the sense that any other solution has the same sample paths with probability one (indistinguishable). The failure to be deterministically well-posed can be traced to the fact that this control law allows for multiple consistent responses for $w\equiv 0$, a physically questionable situation. Indeed, the right-hand side of the ordinary differential equation ${\mathdot x}=\pi (x)$ is not Lipschitz at the origin, and the equation has infinitely many solutions starting from $x(0)=0$.
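To make the non-uniqueness concrete, here is a short check for the scalar equation ${\mathdot x}=\vert x\vert^{2/3}$ with $x(0)=0$, the prototypical non-Lipschitz drift behind this example; the delay $c$ is an arbitrary constant introduced only for illustration. Besides $x\equiv 0$, each of the functions $$x_{c}(t)=\left({{(t-c)^{+}}\over{3}}\right)^{3},\quad c\geq 0,$$ is a solution, since for $t>c$ $${\mathdot x}_{c}(t)={{(t-c)^{2}}\over{9}}=\vert x_{c}(t)\vert^{2/3},$$ while both sides vanish for $t\leq c$. This holds as long as the drift coincides with $\vert x\vert^{2/3}$ along the path. Hence the response to the input $w\equiv 0$ is not determined by the input alone.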

SECTION V

## SEPARATION PRINCIPLE

Our first result is a very general separation theorem for the classical stochastic control problem stated at the beginning of Section I.

#### Theorem 13

Given the system (1), consider the problem of minimizing the functional (3) over the class of all feedback laws $\pi$ that are deterministically well-posed for (1). Then the unique optimal control law is given by (4), where $K$ is defined by (11), and ${\mathhat {x}}$ is given by the Kalman filter (5).

##### Proof

By Lemma 8, (18) does not depend on the control. Therefore, given the analysis at the beginning of Section II, (4) is the unique optimal control provided it defines a deterministically well-posed control law. It remains to show this.

Inserting (4) into (5) yields $${\mathhat {x}}(t)=\int_{0}^{t}\Psi (t,s)L(s)dy(s),$$ where the transition matrix $\Psi (t,s)$ of $[A(t)+B_{1}(t)K(t)-L(t)C(t)]$ has partial derivatives in both arguments. Together with (4) this yields $$u(t)=(\pi_{\rm opt}y)(t):=\int_{0}^{t}M(t,s)dy(s),\eqno{\hbox{(40)}}$$ where $M(t,s):=K(t)\Psi (t,s)L(s)$. Clearly $s\mapsto M(t,s)$ has bounded variation for each $t\in [0,T]$, and therefore integration by parts yields $$(\pi_{\rm opt}y)(t)=M(t,t)y(t)-\int_{0}^{t}d_{s}M(t,s)y(s),\eqno{\hbox{(41)}}$$ which is defined samplewise. Now inserting $u=\pi_{\rm opt}Hz$ into (9) and (10) we obtain $$z=z_{0}+g\pi_{\rm opt}Hz,\eqno{\hbox{(42)}}$$ where $g\pi_{\rm opt}Hz$ takes the form $$(g\pi_{\rm opt}Hz)(t)=\int_{0}^{t}N(t,s)dz(s)$$ with the kernel $N$ given by $$N(t,s)=\int_{s}^{t}G(t,\tau)M(\tau,s)Hd\tau,$$ where $G$ is the kernel of the Volterra operator (9). A simple calculation yields $${{\partial G}\over{\partial s}}(t,s)=\left[\matrix{A(t)\cr C(t)}\right]\Phi (t,s)B_{1}(s),$$ where $\Phi (t,s)$ is the transition matrix of $A$, and therefore $Q(t,s):=(\partial N/\partial s)(t,s)$ is a continuous Volterra kernel, and so is the unique solution $R$ of the resolvent equation $$R(t,s)=\int_{s}^{t}R(t,\tau)Q(\tau,s)d\tau+Q(t,s)\eqno{\hbox{(43)}}$$ [35], [42]. From (42) we have $$dz=dz_{0}+\int_{0}^{t}Q(t,s)dz(s)dt$$ from which it follows that $$\int_{0}^{t}Q(t,s)dz(s)=\int_{0}^{t}R(t,s)dz_{0}(s).$$ Hence $(1-g\pi_{\rm opt}H)$ is invertible, with inverse given by $$[(1-g\pi_{\rm opt}H)^{-1}z_{0}](t)=z_{0}(t)+\int_{0}^{t}\int_{\tau}^{t}R(s,\tau)dsdz_{0}(\tau),$$ which is clearly a system. Hence the feedback loop is deterministically well-posed. $\blackboxfill$

Consequently, for a system driven by a Wiener process with Gaussian initial condition, the linear control law defined by (4) and (5) is optimal in the class of all linear and nonlinear control laws for which the feedback system is deterministically well-posed.

If we forsake the requirement that ${\mathhat {x}}$ is given by the Kalman filter (5), we can now allow $x_{0}$ to be non-Gaussian and $w$ to be a square-integrable martingale, even allowing jumps.

#### Theorem 14

Given the system (1), where $w$ is a square-integrable martingale and $x(0)$ is an arbitrary zero mean random vector independent of $w$, consider the problem of minimizing the functional (3) over the class of all feedback laws $\pi$ that are deterministically well-posed for (1). Then, provided it is deterministically well-posed, the unique optimal control law is given by (4), where $K$ is defined by (11) and ${\mathhat {x}}$ is the conditional mean (6).

##### Proof

Given Lemma 8, we can use the same completion-of-squares argument as in Section II except that we now need to use Itô's differential rule for martingales (see, e.g., [19], [33]), which, in integrated form, becomes $$\displaylines{x(T)^{\prime}P(T)x(T)-x(0)^{\prime}P(0)x(0)=f_{\Delta}\hfill\cr\hfill+\int_{0}^{T}\{x(t)^{\prime}{\mathdot{P}}x(t)dt+2x(t_{-})^{\prime}Pdx+{\rm tr}\left (Pd[x,x^{\prime}]\right)\},\quad{\hbox{(44)}}}$$ where $[x,x^{\prime}]$ is the quadratic variation of $x$ and $f_{\Delta}$ is an extra term which is in general nontrivial when $w$ has a jump component. Now let $$q(t):=\int_{0}^{t}\Phi (t,s)\big (A(s)x(s)+B_{1}(s)u(s)\big) ds,$$ where $\Phi$ is the transition function of (1) which is differentiable in both arguments. Then, $x=q+v$, where $dv=B_{2}dw$ and $q$ is a continuous process with bounded variation. Therefore $$[x,x^{\prime}]=[q,q^{\prime}]+2[q,v^{\prime}]+[v,v^{\prime}]=[v,v^{\prime}].$$ In fact, $[q,q^{\prime}]=[q,v^{\prime}]=0$ [19, Corollary 8.5]. Since $v$ does not depend on the control $u$, neither does the last term in the integral in (44). If $w$ has a jump component, we have a nontrivial extra term in (44), namely $$\displaylines{f_{\Delta}=\sum_{s\leq T}\big [x(s)^{\prime}P(s)x(s)-x(s_{-})^{\prime}P(s)x(s_{-})\hfill\cr\hfill-2x(s_{-})^{\prime}P(s)\Delta_{s}-\Delta_{s}^{\prime} P(s)\Delta_{s}\big]}$$ where the sum is over all jump times $s$ on the interval $[0,T]$ and $\Delta_{s}:=x(s)-x(s_{-})$ is the jump, and we need to ensure that this term does not depend on the control either. However, since $x(s)=x(s_{-})+\Delta_{s}$, we have $f_{\Delta}=0$.
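The vanishing of $f_{\Delta}$ can be spelled out in one line; the following expansion is added here for convenience and uses only the symmetry of $P(s)$. Substituting $x(s)=x(s_{-})+\Delta_{s}$ gives $$x(s)^{\prime}P(s)x(s)=x(s_{-})^{\prime}P(s)x(s_{-})+2x(s_{-})^{\prime}P(s)\Delta_{s}+\Delta_{s}^{\prime}P(s)\Delta_{s},$$ so every summand in $f_{\Delta}$ cancels identically, whatever the control.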

Then the rest of the proof that (4) with ${\mathhat {x}}$ given by (6) is the unique minimizer of (3) over all deterministically well-posed control laws follows from an argument as in Section II. More precisely, using (11) and completing the squares we obtain $$\eqalignno{&\int_{0}^{T}(x^{\prime}Qx+u^{\prime}Ru)dt+x(T)^{\prime}Sx(T)\cr&\quad=x(0)^{\prime}P(0)x(0)+\int_{0}^{T}(u-Kx)^{\prime}R(u-Kx)dt\cr&\quad+\int_{0}^{T}{\rm tr}\left(Pd[v,v^{\prime}]\right)+2\int_{0}^{T}x(t_{-})^{\prime}PB_{2}dw.&{\hbox{(45)}}}$$ Next we claim that $E\left\{\int_{0}^{T}x(t_{-})^{\prime}P(t)B_{2}(t)dw\right\}$ exists and equals zero. To see this note that the integrand is nonanticipatory [34, p. 122]. It also has finite variance, since $w$ is a square-integrable martingale and $u$ needs to be square-integrable for the cost to be finite. Therefore the integrand satisfies the condition [19, eqn. (8.8)], and hence $\int_{0}^{T}x(t_{-})^{\prime}P(t)B_{2}(t)dw$ is a martingale as well and thus has zero mean. Consequently, the only control dependent term in (45) is the term appearing in (17). By Lemma 8, the estimation error ${\mathtilde{x}}$ does not depend on the control. Hence the statement of the theorem follows. $\blackboxfill$

We note that in general the optimal control law does not belong to ${\cal L}$ and that ${\mathhat {x}}$ is not given by the Kalman filter (5) but by the conditional mean (6), which then has to be chosen with some care since it is only defined almost surely, as a projection, for each individual time $t$. To this end it is standard to select the optional projection of $x(t)$ on ${\cal Y}_{t}$, which is a stochastic process with a càdlàg version [2, page 17]. Often ${\mathhat x}$ is given by a nonlinear filter as in the following example. However, even in those cases, it is difficult to ascertain well-posedness; at present, we are unable to establish that the control law in the example is deterministically well-posed and hence optimal in our admissible class of controls. We expect that Theorem 14 can be strengthened by removing the a priori assumption of well-posedness for cases where the optimal filter can be expressed as a stochastic differential equation with suitably well-conditioned coefficients. Such a strengthening would settle optimality for the example.

#### Example 15

Fig. 6. Model for step change in white noise.

Consider the system in Fig. 6. Here, $x$ represents a parameter which undergoes a sudden random step change due to a random external forcing $v$. The step can be in either direction. Thus, as a stochastic process, $v(t)$ is defined by $$v(t)=\cases{\theta &t\geq\tau\cr 0&t<\tau}\eqno{\hbox{(46)}}$$ where $\theta=\pm 1$ with equal probability and $\tau$ is a random variable uniformly distributed on $[0,\,T]$. Clearly $v$ is a martingale. Our goal is to maintain a value for the state $x$ close to zero on the interval $[0,T]$ via integral control action through $u$, indirectly, by demanding that $$E\left\{\int_{0}^{T}(x^{2}+R u^{2}) dt\right\}$$ be minimal with $R>0$. Here, $u$ denotes the control. The process $x$ is observed in additive white noise ${\mathdot w}$. The system is now written in the standard form (1) as follows: $$\cases{dx=u dt+dv,\; x(0)=0,\cr dy=x dt+\sigma dw}\eqno{\hbox{(47)}}$$ where $w$ is a Wiener process. We solve the Riccati equation ${\mathdot k}=-k^{2}+R^{-1}$ with boundary condition $k(T)=0$ to obtain $k(t)=-R^{-1/2}\tanh\left(R^{-1/2}(T-t)\right)$. The control law in Theorem 14 is $$u(t)=k(t){\mathhat x}(t),\eqno{\hbox{(48a)}}$$ where the conditional expectation is determined separately using a (nonlinear) Wonham-Shiryaev filter $$\cases{d{\mathhat x}=k{\mathhat x}dt+{{1}\over{\sigma^{2}}}(1-\rho^{2}-2(T-t)\phi)(dy-{\mathhat x}dt)\cr d\rho={{1}\over{\sigma^{2}}}(1-\rho^{2}-2(T-t)\phi)(dy-{\mathhat x}dt)\cr d\phi=-{{1}\over{\sigma^{2}}}\phi\rho (t) (dy-{\mathhat x}dt)}\eqno{\hbox{(48b)}}$$ with $\rho (0)=0$ and $\phi (0)=1$. Following [16, page 222], we explain the steps for deriving the filter equations in the Appendix.
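The closed-form gain can be checked numerically. The following sketch verifies that $k(t)=-R^{-1/2}\tanh\left(R^{-1/2}(T-t)\right)$ satisfies ${\mathdot k}=-k^{2}+R^{-1}$ and $k(T)=0$. The numerical values of `R` and `T`, the grid, and the central finite difference standing in for ${\mathdot k}$ are assumptions made only for this illustration.

```python
import math

R = 2.0   # control weight (hypothetical value)
T = 1.5   # horizon (hypothetical value)

def k(t):
    """Closed-form solution of the scalar Riccati equation from the example."""
    a = R ** -0.5
    return -a * math.tanh(a * (T - t))

def residual(t, eps=1e-5):
    """|k'(t) - (-k(t)^2 + 1/R)| with k' approximated by a central difference."""
    k_dot = (k(t + eps) - k(t - eps)) / (2 * eps)
    return abs(k_dot - (-k(t) ** 2 + 1.0 / R))

grid = [i * T / 100 for i in range(1, 100)]   # interior points of [0, T]
max_residual = max(residual(t) for t in grid)
terminal_value = k(T)
```

The gain is negative on $[0,T)$, so (48a) is a stabilizing integral action, and it vanishes at the terminal time because there is no terminal cost in this example.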

In order to conclude that the control law (48) is actually optimal we need to establish that the feedback loop is deterministically well-posed. This requires that (10) has a unique solution for each $z_{0}=\left(\matrix{v& w}\right)^{\prime}$. Noting that the innovation $dy-{\mathhat x}dt$ can be expressed as $$dy-{\mathhat x}\, dt=(v-\rho)dt+\sigma dw,$$ this requires that the stochastic differential equations (47) and (48) can be uniquely solved pathwise as a map from $z_{0}=\left(\matrix{v& w}\right)^{\prime}$ to $z=\left(\matrix{x& y}\right)^{\prime}$. There are conditions in the literature for when such maps between path spaces exist (see [34, page 126, Theorem 10.4], [19, page 128], and the references therein). However, we are not able at present to verify that these hold in our case.

In view of Remark 11 we immediately have the following corollary to Theorem 14 for the case of complete state information. A similar statement was given in [27] in a different context.

#### Corollary 16

Given the system (14), where $w$ is a square-integrable martingale and $x(0)$ is an arbitrary random vector independent of $w$, consider the problem of minimizing the functional (3) over the class of all feedback laws $\pi$ that are deterministically well-posed for (14). Then the unique optimal control law is given by (15), where $K$ is defined by (11).

##### Proof

It just remains to prove that the control law (15) is deterministically well-posed. To this end, we first note that (with $z=x$) the feedback equation (10) becomes $$x(t)=x_{0}(t)+\int_{0}^{t}Q(t,s)x(s)ds,$$ where $Q(t,s)=\Phi (t,s)B_{1}(s)K(s)$ with $\Phi$ (as before) being the transition matrix function of $A$. Then a straightforward calculation shows that $$x(t)=x_{0}(t)+\int_{0}^{t}R(t,s)x_{0}(s)ds,$$ where $R$ is the unique solution of the resolvent equation (43). This establishes well-posedness. $\blackboxfill$
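For intuition, the scalar constant-kernel case can be worked out numerically. In the sketch below, the kernel $Q(t,s)\equiv q$, the input $x_{0}\equiv 1$, the grid size, and the left-endpoint discretization are all assumptions made for illustration. For this kernel the resolvent equation (43) is solved by $R(t,s)=qe^{q(t-s)}$, so the causal solution of the loop and the resolvent formula should both agree with $e^{qt}$ up to discretization error.

```python
import math

q, T, n = -0.5, 1.0, 2000   # hypothetical gain, horizon, number of grid steps
h = T / n
x0 = [1.0] * (n + 1)        # assumed open-loop path x0(t) = 1

# Feedback equation x(t) = x0(t) + int_0^t q x(s) ds, solved causally
# (left-endpoint rule, so x_loop[k] uses only x_loop[0..k-1]).
x_loop = []
running = 0.0               # running approximation of int_0^t x(s) ds
for k in range(n + 1):
    x_loop.append(x0[k] + q * running)
    running += h * x_loop[k]

# Resolvent formula x(t) = x0(t) + int_0^t R(t,s) x0(s) ds with R(t,s) = q e^{q(t-s)}.
def x_resolvent(k):
    t = k * h
    integral = sum(h * q * math.exp(q * (t - j * h)) * x0[j] for j in range(k))
    return x0[k] + integral

err_loop = abs(x_loop[n] - math.exp(q * T))
err_resolvent = abs(x_resolvent(n) - math.exp(q * T))
```

Both errors shrink as `n` grows, and both computations read only past values of the input, which is the property that makes the closed-loop map a system.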

#### Example 17

Let the driving noise $w$ in (14) be given by either a Poisson martingale [19, page 87], or a geometric Brownian motion [19, page 124] $$dw=\mu w(t)dt+\sigma w(t)dv,$$ where $v$ is a Wiener process, or a combination. Then the control law $u(t)=K(t)x(t)$ is optimal for the problem of minimizing (3).

SECTION VI

## SEPARATION PRINCIPLE FOR DELAY-DIFFERENTIAL SYSTEMS

The formulation (8) covers more general stochastic systems than the ones considered above. An example is a delay-differential system of the type $$\cases{dx=A_{1}(t)x(t)dt+A_{2}(t)x(t-h)dt\cr+\int_{t-h}^{t}A_{0}(t,s)x(s)dsdt+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=C_{1}(t)x(t)dt+C_{2}(t)x(t-h)dt+D(t)dw}.$$ Apparently, stochastic control for various versions of such systems was first studied in [23], [24], [25], [26], [27], and [9], although [9] relies on the strong assumption that the observation $y$ is “functionally independent” of the control $u$, thus avoiding the key question studied in the present paper.

Here, as in [26], we shall consider the wider class of stochastic systems $$\cases{dx=\left(\int_{t-h}^{t}d_{s}A(t,s)x(s)\right) dt\cr\qquad\qquad+B_{1}(t)u(t)dt+B_{2}(t)dw\cr dy=\left(\int_{t-h}^{t}d_{s}C(t,s)x(s)\right) dt+D(t)dw}\eqno{\hbox{(49)}}$$ where $A$ and $C$ are of bounded variation in the first argument and continuous on the right in the second, $x(t)=\xi (t)$ is deterministic (for simplicity) for $-h\leq t\leq 0$, and $y(0)=0$. More precisely, $A(t,s)=0$ for $s\geq t$, $A(t,s)=A(t,t-h)$ for $s\leq t-h$, and the total variation of $s\mapsto A(t,s)$ is bounded by an integrable function in the variable $t$, and the same holds for $C$. Moreover, to avoid technicalities we assume that $w$ is now a (square-integrable) Gaussian (vector) martingale. Now, the first of equations (49) can be written in the form $$\displaylines{x(t)=\Phi (t,0)\xi (0)+\int_{-h}^{0}d_{\tau}\left\{\int_{0}^{t}\Phi (t,s)A(s,\tau)ds\right\}\xi(\tau)\hfill\cr\hfill+\int_{0}^{t}\Phi(t,s)B_{1}(s)u(s)ds+\int_{0}^{t}\Phi(t,s)B_{2}(s)dw\quad{\hbox{(50)}}}$$ [26, p. 85], where $\Phi$ is the Green's function corresponding to the deterministic system [3] (also see, e.g., [26, p. 101]). In the same way, we can express the second equation in integrated form. Consequently, (49) can be written in the form (8), where $K$ and $H$ are computed as in [26, pp. 101–103]. The problem is to find a feedback law (2) that minimizes $$J(u):=E\{V_{0}(x,u)\}\eqno{\hbox{(51)}}$$ subject to the constraint (49), where $$V_{s}(x,u):=\left\{\int_{s}^{T}x^{\prime}Qx\,d\alpha (t)+\int_{s}^{T}u^{\prime}Ru\,dt\right\}\eqno{\hbox{(52)}}$$ and $d\alpha$ is a positive Stieltjes measure.

Lemma 8 enables us to strengthen the results in [26]. To this end, to avoid technicalities, we shall appeal to a representation result from [27] rather than using a completion-of-squares argument, although the latter strategy would lead to a stronger result where $w$ could be an arbitrary (square-integrable) martingale. A completion-of-squares argument for a considerably simpler problem was given in [8], but, as pointed out in [28], that paper suffers from a mistake similar to the one pointed out earlier in the present paper. In this context, we also mention the recent paper [4], which considers optimal control of a stochastic system with delay in the control. That paper assumes at the outset that the separation principle for delay systems is valid, with a reference to [20]. Instead of basing the argument on [20], which is not quite appropriate here, their claim could be justified by noting that the delay in the control also implies a delay in information as in Example 3 above.

Now, it can be shown that the corresponding deterministic control problem obtained by setting $w=0$ has an optimal linear feedback control law $$u(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)x(\tau),\eqno{\hbox{(53)}}$$ where we refer the reader to [26] for the computation of $K$. The following theorem is a considerable strengthening of the corresponding result in [26].

#### Theorem 18

Given the system (49), where $w$ is a Gaussian martingale, consider the problem of minimizing the functional (51) over the class of all feedback laws $\pi$ that are deterministically well-posed for (49). Then the unique optimal control law is given by $$u(t)=\int_{t-h}^{t}d_{s}K(t,s){\mathhat {x}}(s\vert t),\eqno{\hbox{(54)}}$$ where $K$ is the deterministic control gain (53) and $${\mathhat {x}}(s\vert t):=E\{x(s)\mid{\cal Y}_{t}\}\eqno{\hbox{(55)}}$$ is given by a linear (distributed) filter $$\eqalignno{d{\mathhat {x}}(t\vert t)=&\,\int_{t-h}^{t}d_{s}A(t,s){\mathhat {x}}(s\vert t)dt\cr&+B_{1}udt+X(t,t)dv&{\hbox{(56a)}}\cr d_{t}{\mathhat {x}}(s\vert t)=&\,X(s,t)dv,\; s\leq t &{\hbox{(56b)}}}$$ where $v$ is the innovation process $$dv=dy-\int_{t-h}^{t}d_{s}C(t,s){\mathhat {x}}(s\vert t)dt,\quad v(0)=0,\eqno{\hbox{(57)}}$$ and the gain $X$ is as defined in [26, p. 120].

For the proof of Theorem 18 we shall need two lemmas. The first is a slight reformulation of Lemma 4.1 in [27] and only requires that $v$ be a martingale.

#### Lemma 19 ([27])

Let $v$ be a square-integrable martingale with natural filtration $${\cal V}_{t}=\sigma\{v(s), s\in [0,t]\},\quad 0\leq t\leq T\eqno{\hbox{(58)}}$$ satisfying $[v_{j},v_{k}]=\beta_{j}\delta_{jk}$, where $\beta_{k}$, $k=1,2,\ldots,p$, are nondecreasing functions, and $\delta_{jk}$ is the Kronecker delta equal to one for $j=k$ and zero otherwise. With $u$ a square-integrable control process adapted to $\{{\cal V}_{t}\}$, let $$u(t)=\bar{u}(t)+\sum_{k=1}^{p}\int_{0}^{t}u_{k}(t,s)dv_{k}(s)+{\mathtilde{u}}(t)\eqno{\hbox{(59)}}$$ be the unique orthogonal decomposition for which $\bar{u}$ is deterministic and, for each $t\in [0,T]$, ${\mathtilde{u}}$ is orthogonal to the linear span of the components of $\{v(s), s\in [0,t]\}$. Moreover, let $x_{0}$ be a square-integrable process adapted to $\{{\cal V}_{t}\}$ and having a corresponding orthogonal decomposition $$x_{0}(t)=\bar{x}_{0}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}^{0}(t,s)dv_{k}(s)+{\mathtilde{x}}_{0}(t).\eqno{\hbox{(60)}}$$ Then $x=x_{0}+g(u)$, defined by (8) exchanging $z$ for $x$, has the orthogonal decomposition $$x(t)=\bar{x}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(t,s)dv_{k}(s)+{\mathtilde{x}}(t),\eqno{\hbox{(61)}}$$ where $$\cases{\bar{x}(t)=\bar{x}_{0}(t)+\int_{0}^{t}G(t,\tau)\bar{u}(\tau)d\tau\cr x_{k}(t,s)=x_{k}^{0}(t,s)+\int_{s}^{t}G(t,\tau)u_{k}(\tau,s)d\tau\cr{\mathtilde{x}}(t)={\mathtilde{x}}_{0}(t)+\int_{0}^{t}G(t,\tau){\mathtilde{u}}(\tau)d\tau}\eqno{\hbox{(62)}}$$ and $$\displaylines{E\{V_{0}(x,u)\}=\sum_{k=1}^{p}\int_{0}^{T}V_{s}(x_{k}(\cdot,s),u_{k}(\cdot,s))d\beta_{k}\hfill\cr\hfill+E\{V_{0}(\bar{x},\bar{u})\}+E\{V_{0}({\mathtilde{x}},{\mathtilde{u}})\}.\quad{\hbox{(63)}}}$$

For a proof of this lemma, we refer the reader to [27].

#### Lemma 20

Let $y$ be the output process of the closed-loop system obtained after applying a deterministically well-posed feedback law $u=\pi (y)$ to the system (49). Then the innovation process (57) is a Gaussian martingale, and the corresponding filtration (58) satisfies $${\cal V}_{t}={\cal Y}_{t},\quad 0\leq t\leq T.\eqno{\hbox{(64)}}$$

##### Proof

As can be seen from equation (50) and the remark following it, the process $y_{0}$ obtained by setting $u=0$ in (49) is given by $dy_{0}=q_{0}(t)dt+D(t)dw$ for a process $q_{0}$ adapted to $\{{\cal W}_{t}\}$. Define $dv_{0}=dy_{0}-{\mathhat {q}}_{0}(t)dt$, where ${\mathhat {q}}_{0}(t):=E\{q_{0}(t)\mid{\cal Y}_{t}^{0}\}$. Now, $q_{0}$ and $w$ are jointly Gaussian, and therefore, for each $t\in [0,T]$, the components of ${\mathhat {q}}_{0}(t)$ belong to the closed linear span of the components of the semimartingale $\{y_{0}(s),\, s\in [0,t]\}$, and hence $${\mathhat {q}}_{0}(t)=\int_{0}^{t}M(t,s)dy_{0}(s)$$ for some $L^{2}$-kernel $M$. Therefore, $v_{0}$ is Gaussian, and its natural filtration ${\cal V}_{t}^{0}$ satisfies ${\cal V}_{t}^{0}\subset{\cal Y}_{t}^{0}$. Now let $R$ be the resolvent of the Volterra equation with kernel $M$; i.e., the unique solution of the resolvent equation $$R(t,s)=\int_{s}^{t}R(t,\tau)M(\tau,s)d\tau+M(t,s)$$ [35], [42]. Then $$\int_{0}^{t}R(t,s)dv_{0}(s)=\int_{0}^{t}M(t,s)dy_{0}(s)={\mathhat {q}}_{0}(t),$$ and hence ${\cal Y}_{t}^{0}\subset{\cal V}_{t}^{0}$. Consequently, in view of Lemma 8, ${\cal V}_{t}^{0}={\cal Y}_{t}^{0}={\cal Y}_{t}$. Next observe that $$dy=q(t)dt+D(t)dw,\quad q(t):=q_{0}(t)+h(u)(t),$$ where $h(u)$ is a causal (linear) function of the control $u$. Since $h(u)$ is adapted to $\{{\cal Y}_{t}\}$, $${\mathhat {q}}(t):={\mathhat {q}}_{0}(t)+h(u)(t),$$ and therefore the innovation process (57) satisfies $dv=dy-{\mathhat {q}}(t)dt=dy_{0}-{\mathhat {q}}_{0}(t)dt=dv_{0}$. Equation (64) now follows.

Finally, to prove that the innovation process $v$ is a martingale we need to show that $$E\{v(s)-v(t)\mid{\cal V}_{t}\}=0\quad\hbox{for all }s\geq t.$$ To this end, first note that $$\displaylines{E\left\{v(s)-v(t)\mid{\cal V}_{t}\right\}=E\left\{\int_{t}^{s}{\mathtilde{q}}(\tau)d\tau\mid{\cal V}_{t}\right\}\hfill\cr\hfill+E\left\{\int_{t}^{s}D(\tau)dw\mid{\cal V}_{t}\right\},\quad{\hbox{(65)}}}$$ where ${\mathtilde{q}}(t):=q(t)-{\mathhat {q}}(t)$. Since all the processes are jointly Gaussian (the control-dependent terms have been canceled in forming ${\mathtilde{q}}$), independence is the same as orthogonality. Since ${\mathtilde{q}}(\tau)\perp{\cal V}_{\tau}\supset{\cal V}_{t}$ for $\tau\geq t$, the first term in (65) is zero. The second term can be written $$E\left\{E\left\{\int_{t}^{s}D(\tau)dw\mid{\cal W}_{t}\right\}\mid{\cal V}_{t}\right\},$$ which is zero since $w$ is a martingale. $\blackboxfill$
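A discrete-time Gaussian analogue of (64) may help fix ideas. It is a sketch under assumptions not made in the paper: a finite sample, a hypothetical covariance `S` (a random walk observed in unit-variance noise), and normalized innovations obtained from a Cholesky factor. Because the factor is lower triangular with positive diagonal, each innovation depends only on past and present observations, and conversely.

```python
import math

def cholesky(S):
    """Lower-triangular L with positive diagonal such that L L' = S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def innovations(L, y):
    """Forward substitution e = L^{-1} y; e[k] depends only on y[0..k]."""
    e = []
    for k in range(len(y)):
        e.append((y[k] - sum(L[k][j] * e[j] for j in range(k))) / L[k][k])
    return e

def observations(L, e):
    """y = L e; y[k] depends only on e[0..k]."""
    return [sum(L[k][j] * e[j] for j in range(k + 1)) for k in range(len(e))]

n = 6
# Hypothetical covariance: random walk observed in unit-variance noise.
S = [[min(i, j) + 1.0 + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
L = cholesky(S)
y = [0.3, -1.2, 0.7, 2.0, -0.4, 1.1]   # an arbitrary sample of observations
e = innovations(L, y)
y_back = observations(L, e)
```

The two triangular maps are mutually inverse, so the observation history and the innovation history up to any step carry the same information, in the spirit of ${\cal V}_{t}={\cal Y}_{t}$.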

We are now in a position to prove Theorem 18. Lemma 20 shows that the innovation process (57) is a martingale. It is no restriction to assume that $E\{v(t)v(t)^{\prime}\}$ is diagonal; if it is not, we just normalize the innovation process by replacing $v(t)$ by $R(t)^{-1/2}v(t)$, where $R(t):=E\{v(t)v(t)^{\prime}\}>0$. Then we set $\beta_{k}(t):=E\{v_{k}(t)^{2}\}$, $k=1,2,\ldots,p$. Since ${\cal V}_{t}={\cal Y}_{t}$ for $t\in [0,T]$ (Lemma 20), admissible controls take the form (59). Moreover, the process ${\mathhat {x}}(t):=E\{x(t)\mid{\cal Y}_{t}\}$ is adapted to $\{{\cal V}_{t}\}$, and hence, analogously to (59), it has the decomposition $${\mathhat {x}}(t)=\bar{x}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(t,s)dv_{k}(s)+{\mathtilde{x}}(t),\eqno{\hbox{(66)}}$$ which now will take the place of (61) in Lemma 19. As before, let ${\mathhat {x}}_{0}$ be the process ${\mathhat {x}}$ obtained by setting $u=0$. By Lemma 8, ${\mathhat {x}}_{0}$ does not depend on the control $u$. Moreover, since $x_{0}$ and $v$ are jointly Gaussian, $${\mathhat {x}}_{0}(t)=\bar{x}_{0}(t)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}^{0}(t,s)dv_{k}(s),\eqno{\hbox{(67)}}$$ replacing (60) in Lemma 19. Moreover, $$E\{V_{0}(x,u)\}=E\{V_{0}({\mathhat {x}},u)\}+E\{V_{0}(x-{\mathhat {x}},0)\},$$ where the last term does not depend on the control, since $x-{\mathhat {x}}=x_{0}-{\mathhat {x}}_{0}$. Hence, by Lemma 19, the problem is now reduced to finding a control (59) and a state process (66) minimizing $E\{V_{0}({\mathhat {x}},u)\}$ subject to $$\eqalignno{\bar{x}(t)=&\,\bar{x}_{0}(t)+\int_{0}^{t}G(t,\tau)\bar{u}(\tau)d\tau&{\hbox{(68a)}}\cr x_{k}(t,s)=&\,x_{k}^{0}(t,s)+\int_{s}^{t}G(t,\tau)u_{k}(\tau,s)d\tau &{\hbox{(68b)}}\cr{\mathtilde{x}}(t)=&\,\int_{0}^{t}G(t,\tau){\mathtilde{u}}(\tau)d\tau &{\hbox{(68c)}}}$$ where the last equation has been modified to account for the fact that ${\mathtilde{x}}_{0}=0$. Clearly, this problem decomposes into several distinct problems.
First, $\bar{u}$ needs to be chosen so that $V_{0}(\bar{x},\bar{u})$ is minimized subject to (68a). This is a deterministic control problem with the feedback solution $$\bar{u}(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)\bar{x}(\tau),\eqno{\hbox{(69)}}$$ where $K$ is as in (53). Secondly, for each $s\in [0,T]$ and $k=1,2,\ldots,p$, $u_{k}(t,s)$ has to be chosen so as to minimize $V_{s}(x_{k}(\cdot,s),u_{k}(\cdot,s))$ subject to (68b). This again is a deterministic control problem with the optimal feedback solution $$u_{k}(t,s)=\int_{t-h}^{t}d_{\tau}K(t,\tau)x_{k}(\tau,s).\eqno{\hbox{(70)}}$$ Finally, ${\mathtilde{u}}$ should be chosen so as to minimize $E\{V_{0}({\mathtilde{x}},{\mathtilde{u}})\}$ subject to (68c). This problem clearly has the solution ${\mathtilde{u}}=0$, and hence ${\mathtilde{x}}=0$ as well. Combining these results and inserting them into (59) then yields the optimal feedback control $$u(t)=\int_{t-h}^{t}d_{\tau}K(t,\tau)\big (\bar{x}(\tau)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(\tau,s)dv_{k}(s)\big).$$ It remains to show that this is exactly the same as (54); i.e., that $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\sum_{k=1}^{p}\int_{0}^{t}x_{k}(\tau,s)dv_{k}(s).\eqno{\hbox{(71)}}$$ To this end, first note that, since the optimal control is linear in $dv$, ${\mathhat {x}}(\tau\vert t)$ will take the form $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\int_{0}^{t}X_{t}(\tau,s)dv(s),$$ where $\bar{x}(\tau)=E\{x(\tau)\}$, the same as in (71). Clearly $E\{[x(\tau)-{\mathhat {x}}(\tau\vert t)]v(s)^{\prime}\}=0$ for $s\in [0,t]$, and therefore $$E\{x(\tau)v(s)^{\prime}\}=E\{{\mathhat {x}}(\tau\vert t)v(s)^{\prime}\}=\int_{0}^{s}X_{t}(\tau,\sigma)d\beta(\sigma),$$ showing that the kernel $X_{t}$ does not depend on $t$; hence this index will be dropped.
Now, setting $\tau=t$, comparing with (66) and noting that ${\mathtilde{x}}=0$, we see that $X(t,s)$ is the matrix with columns $x_{1}(t,s),x_{2}(t,s),\ldots,x_{p}(t,s)$, establishing (71), which from now on we shall write $${\mathhat {x}}(\tau\vert t)=\bar{x}(\tau)+\int_{0}^{t}X(\tau,s)dv(s).\eqno{\hbox{(72)}}$$ Hence, (54) is the optimal control, as claimed. Moreover, $${\mathhat {x}}(\tau\vert t)={\mathhat {x}}(\tau\vert s)+\int_{s}^{t}X(\tau,\sigma)dv(\sigma),$$ which yields (56a). To derive (56b), follow the procedure in [26].

It remains to show that the optimal control law (54) is deterministically well-posed. To this end, it is no restriction to assume that $\bar{x}_{0}\equiv 0$ so that all processes have zero mean. Then it follows from (54) and the unsymmetric Fubini Theorem of Cameron and Martin [10] that $$\eqalignno{u(t)=&\,\int_{0}^{t}P(t,s)dv(s),\cr\noalign{\noindent \hbox{where}\hfill}P(t,s)=&\,\int_{t-h}^{t}d_{\tau}K(t,\tau)X(\tau,s),}$$ and likewise from (57) that $$\eqalignno{dv=&\,dy-\int_{0}^{t}S(t,s)dv(s)dt,\cr\noalign{\noindent \hbox{where}\hfill}S(t,s)=&\,\int_{t-h}^{t}d_{\tau}C(t,\tau)X(\tau,s).}$$ The function $S$ is a Volterra kernel and therefore the Volterra resolvent equation $$V(t,s)=\int_{s}^{t}V(t,\tau)S(\tau,s)d\tau+S(t,s)$$ has a unique solution $V$, from which it follows that $$dv=dy-\int_{0}^{t}V(t,s)dy(s)dt.$$ Then the optimal control law is given by (40), where now $M$ is given by $$M(t,s)=P(t,s)-\int_{s}^{t}P(t,\tau)V(\tau,s)d\tau.$$ Now, for the optimal control law, $s\mapsto X(t,s)$ is of bounded variation for each $t$ [26], and hence so is $s\mapsto M(t,s)$. Hence $\pi_{\rm opt}$ can be defined samplewise as in (41). To complete the proof that the optimal feedback loop is deterministically well-posed we proceed exactly as in the proof of Theorem 13, noting that in the present setting $${{\partial G}\over{\partial s}}(t,s)=\int_{s}^{t}d_{\tau}\left[\matrix{A(t,\tau)\cr C (t,\tau)}\right]\Phi (\tau,s)B_{1}(s),$$ where $\Phi (t,s)$ is the transition matrix of $A$ [26, p. 101].

#### Remark 21

It was shown in [27] that, in the case of complete state information $(y=x)$, the control (53) is optimal even when $w$ is an arbitrary (not necessarily Gaussian) martingale.

SECTION VII

## CONCLUSIONS

In studying the literature on the separation principle of stochastic control, one encounters many expositions where subtle difficulties are overlooked and inadmissible shortcuts are taken. On the other hand, for most papers and monographs that provide rigorous derivations, one is struck by the level of mathematical sophistication and technical complexity, which make the material hard to include in standard textbooks in a self-contained fashion. It is our hope that our use of deterministic well-posedness provides an alternative mechanism for understanding the separation principle that is more palatable and transparent to the engineering community, while still rigorous. The new insights offered by the approach allow us to establish the separation principle also for systems driven by non-Gaussian martingale noise. However, in this more general framework the key issue of establishing well-posedness for particular control systems is challenging and more work needs to be done.

## APPENDIX

Consider the “uncontrolled” observation process $$dy_{0}=v(t)dt+\sigma dw.$$

If $d{\BBP}$ denotes the law of $(\theta,\tau,w)$ and $$\Lambda (t)=e^{\sigma^{-2}\int_{0}^{t}v(s)dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}v(s)^{2}ds},$$ then, under a new measure $d{\BBQ}:=\Lambda (T)^{-1}d{\BBP}$, $y_{0}$ becomes a Wiener process while the law of $v$ (i.e., of $\theta$ and $\tau$) is the same as before. Under $d{\BBQ}$, the two processes $y_{0}$ and $v$ are independent. By Bayes' formula [16, p. 174], the conditional expectation is now given by $$\eqalignno{E_{\BBP}(v(t)\vert{\cal Y}_{t})=&\,{{E_{\BBQ}(v(t)\Lambda (t)\vert{\cal Y}_{t})}\over{E_{\BBQ}(\Lambda (t)\vert{\cal Y}_{t})}}\cr=&\,{{E_{\BBQ}(\theta I_{t\geq\tau}e^{\sigma^{-2}\int_{0}^{t}\theta I_{s\geq\tau}dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}I_{s\geq\tau}ds}\vert{\cal Y}_{t})}\over{E_{\BBQ}(e^{\sigma^{-2}\int_{0}^{t}\theta I_{s\geq\tau}dy_{0}-(1/2)\sigma^{-2}\int_{0}^{t}I_{s\geq\tau}ds}\vert{\cal Y}_{t})}}\cr=&\,{{E_{\BBQ}(I_{t\geq\tau}e^{-({{1}/{\sigma^{2}})}y_{0}(t\wedge\tau)-(1/2\sigma^{2})(t-\tau)^{+}}\null (e^{{y_{0}(t)}/{\sigma^{2}}}\null-e^{-{{y_{0}(t)}/{\sigma^{2}}}})\vert{\cal Y}_{t})}\over{E_{\BBQ}(e^{-({{1}/{\sigma^{2}})}y_{0}(t\wedge\tau)-(1/2\sigma^{2})(t-\tau)^{+}}\null (e^{{y_{0}(t)}/{\sigma^{2}}}\null+e^{-{{y_{0}(t)}/{\sigma^{2}}}})\vert{\cal Y}_{t})}}.&{\hbox{(73)}}}$$

Here $t\wedge\tau:=\min (t,\tau)$, $I_{t\geq\tau}(t)=1$ when $t\geq\tau$ and 0 otherwise, and $(t-\tau)^{+}=(t-\tau)I_{t\geq\tau}$. Note that $v(t)=\theta I_{t\geq\tau}(t)$. We also define $\rho (t):=E_{\BBP}(v(t)\vert{\cal Y}_{t})$ and $$\eqalignno{\Sigma (t):=&\,\int_{0}^{t}e^{(y_{0}(t)-y_{0}(s)-(1/2)(t-s))/\sigma^{2}}ds,\cr\bar\Sigma (t):=&\,\int_{0}^{t}e^{(-(y_{0}(t)-y_{0}(s))-(1/2)(t-s))/\sigma^{2}}ds.}$$ From (73), $\rho (t)=N(t)/D(t)$, where $N(t)=\Sigma (t)-\bar\Sigma (t)$ and $D(t)=\Sigma (t)+\bar\Sigma (t)+2(T-t)$. By first noting that $\Sigma$ and $\bar\Sigma$ satisfy the stochastic differential equations $$d\Sigma=\sigma^{-2}\Sigma (t) dy_{0}+dt\quad{\rm and}\quad d\bar\Sigma=-\sigma^{-2}\bar\Sigma (t) dy_{0}+dt,$$ respectively, the Itô rule applied to the expression $N(t)/D(t)$ for the conditional expectation gives the filter equations (setting $\phi=D^{-1}$) $$\eqalignno{d\rho=&\,\sigma^{-2}(1-\rho^{2}-2(T-t)\phi)(dy_{0}-\rho dt) &{\hbox{(74a)}}\cr d\phi=&\,-\sigma^{-2}\phi\rho (dy_{0}-\rho dt).&{\hbox{(74b)}}}$$ Finally, noting that the innovation $dy_{0}-\rho dt$ is equal to $dy-{\mathhat x}dt$ for the controlled system, we obtain the filter equations (48).
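For readers who wish to experiment, the following is a minimal numerical sketch of (74a)-(74b); it is not the authors' implementation. It assumes an Euler-Maruyama discretization, the initial values $\rho (0)=0$ and $\phi (0)=1/(2T)$ implied by $\Sigma (0)=\bar\Sigma (0)=0$, and an arbitrarily chosen sample ($\theta=1$, $\tau=0.3$, $\sigma=1$, $T=1$). All function names are hypothetical. As a cross-check, it also evaluates $\rho=N/D$ directly from the defining integrals of $\Sigma$ and $\bar\Sigma$ by a left-endpoint recursion.

```python
import math
import random


def simulate_increments(theta, tau, sigma, T, n, rng):
    """Increments of dy0 = v dt + sigma dw with v(t) = theta * 1{t >= tau}."""
    h = T / n
    return [(theta if k * h >= tau else 0.0) * h
            + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            for k in range(n)]


def filter_euler(dy, sigma, T):
    """Euler-Maruyama scheme for (74a)-(74b); returns the rho and phi paths."""
    n = len(dy)
    h = T / n
    rho, phi = 0.0, 1.0 / (2.0 * T)  # assumed initial values, from D(0) = 2T
    rho_path, phi_path = [rho], [phi]
    for k in range(n):
        t = k * h
        innov = dy[k] - rho * h  # innovation increment dy0 - rho dt
        d_rho = (1.0 - rho ** 2 - 2.0 * (T - t) * phi) * innov / sigma ** 2
        d_phi = -phi * rho * innov / sigma ** 2
        rho, phi = rho + d_rho, phi + d_phi
        rho_path.append(rho)
        phi_path.append(phi)
    return rho_path, phi_path


def filter_direct(dy, sigma, T):
    """rho = N/D with Sigma and bar-Sigma updated from their integral definitions."""
    n = len(dy)
    h = T / n
    s = s_bar = 0.0
    rho_path = [0.0]
    for k in range(n):
        # Propagate the old integral by one step and add the new slice [t, t+h].
        s = math.exp((dy[k] - h / 2.0) / sigma ** 2) * (s + h)
        s_bar = math.exp((-dy[k] - h / 2.0) / sigma ** 2) * (s_bar + h)
        t = (k + 1) * h
        rho_path.append((s - s_bar) / (s + s_bar + 2.0 * (T - t)))
    return rho_path


if __name__ == "__main__":
    rng = random.Random(0)
    dy = simulate_increments(theta=1.0, tau=0.3, sigma=1.0, T=1.0, n=10000, rng=rng)
    rho_em, _ = filter_euler(dy, sigma=1.0, T=1.0)
    rho_dir = filter_direct(dy, sigma=1.0, T=1.0)
    print(max(abs(a - b) for a, b in zip(rho_em, rho_dir)))
```

The printed gap between the two paths should shrink as the step size decreases, since both schemes approximate the same conditional mean $\rho (t)$.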

### ACKNOWLEDGMENT

We are indebted to an anonymous referee for significant input, which has improved the paper considerably.

## Footnotes

This work was supported by grants from AFOSR, NSF, VR, SSF and the Göran Gustafsson Foundation. Recommended by Associate Editor H. Zhang.

T. T. Georgiou is with the Department of Electrical & Computer Engineering, University of Minnesota, Minneapolis, Minnesota 55455 USA (e-mail: tryphon@umn.edu).

A. Lindquist is with the Department of Automation, Shanghai Jiao Tong University, Shanghai, China, and the Center for Industrial and Applied Mathematics (CIAM) and the ACCESS Linnaeus Center, Royal Institute of Technology, 100 44 Stockholm, Sweden (e-mail: alq@kth.se).

1. However, the model is conditionally Gaussian given the filtration $\{{\cal Y}_{t}\}$; see Remark 6.

2. “continue à droite, limite à gauche” in French, alternatively RCLL (“right continuous with left limits”) in English.

3. More precisely, to be seen as a system, relay hysteresis needs to be preceded by a low-pass filter since its domain consists of continuous functions.

4. It is interesting to note, as was pointed out by a referee, that the proof of the lemma relies critically on the action of the operator $(1-g\pi H)^{-1}$ on a null set, as the probability ${\BBP}(z_{0}=H^{-R}y_{0})=0$ for any nontrivial model. This fact may be disturbing from a probabilistic point of view but does not invalidate the lemma.

5. This was kindly suggested by a referee.
