Complete Characterization of Gorbunov and Pinsker Nonanticipatory Epsilon Entropy of Multivariate Gaussian Sources

—This paper derives the optimal test channel distribution and the complete characterization of the classical Gorbunov and Pinsker (1973), Gorbunov and Pinsker (1974) nonanticipatory epsilon entropy of multivariate Gaussian Markov sources with square-error fidelity, a problem that remained open since 1974. The paper also formulates a state dependent nonanticipatory epsilon entropy, in which past reproductions are available to the decoder but not to the encoder, the test channel is specified with respect to an auxiliary (state) random process, and the reproduction process is a causal function of past reproductions and the auxiliary random process. This variation is analogous to the Wyner and Ziv (1976) and Wyner (1978) rate distortion function (RDF) of memoryless sources. It is shown that the operational rate of zero-delay codes, with past reproductions available to the decoder but not to the encoder, is bounded below by the state dependent nonanticipatory epsilon entropy rate. For multivariate Gaussian Markov sources with square-error fidelity, the optimal test channel distribution and the complete characterization of the state dependent nonanticipatory epsilon entropy are derived, and it is shown that the two nonanticipatory epsilon entropies coincide. The derivations are new; they are based on structural properties of the stochastic realizations of the reproduction process that induce the optimal test channel distributions. They make use of achievable lower bounds on information theoretic measures, properties of mean-square estimation theory, Hadamard's inequality, and canonical correlation coefficients of a tuple of multivariate jointly Gaussian random processes. Applications of the nonanticipatory epsilon entropy and its state dependent variation are discussed in the areas of control of memory channels, design of causal estimators for Gaussian Markov sources with a fidelity criterion, and computation of the rate loss of causal and zero-delay codes of Gaussian Markov sources with respect to noncausal codes.
Index Terms-Nonanticipatory epsilon entropy, multivariate Gaussian processes, square-error fidelity criterion.

I. INTRODUCTION, CONTRIBUTIONS, AND LITERATURE REVIEW

Gorbunov and Pinsker [2], [3] introduced the nonanticipatory epsilon entropy,1 and message generation rates of sources with respect to a fidelity criterion. These are variants of the classical rate distortion function2 (RDF) and its rate, the information theoretic definition of the infimum of all achievable rates of noncausal codes subject to a fidelity criterion [6]-[8], often called the optimal performance theoretically attainable (OPTA) by noncausal codes. Over the last 20 years, the nonanticipatory epsilon entropy of Gaussian Markov processes with symbol-by-symbol and block symbol square error distortion criteria has been investigated extensively in the literature, under the names sequential and nonanticipative RDF (see [9]-[13]), and applied in the following three main application areas:

1) Quantification of the rate loss of the OPTA by causal codes [14] and zero-delay codes [15]-[18], with respect to noncausal codes, for Gaussian Markov sources with letter-by-letter square error fidelity [10]. For scalar sources, it is shown in [10] that the OPTA by causal codes, when feedback is used from the decoder to the encoder, exceeds Gorbunov's and Pinsker's message generation rates by less than (1/2) log2(2πe/12) ≈ 0.254 bits/sample, while that of zero-delay codes exceeds Gorbunov's and Pinsker's message generation rates by less than (1/2) log2(2πe/12) + 1 ≈ 1.254 bits/sample. Further, causal and zero-delay codes, based on subtractive dither with uniform scalar quantization (SDUSQ) [19], [20] and lattice codes [21], are constructed in [22] by making use of the test channel. In this application area, the main problem is to determine tight bounds on causal and zero-delay codes, and to construct causal and zero-delay codes which operate with respect to such rate loss requirements.

2) Control of linear Gaussian Markov control systems over finite rate, noiseless or noisy, communication channels [9], [23], [24]. In this application area, the main problem is to determine necessary
and sufficient conditions to stabilize unstable control systems, and to design controllers, encoders, and decoders to control the system over a noisy or noiseless channel with finite rate constraints.

3) Synthesis of recursive, causal filters of Gaussian Markov processes subject to letter-by-letter and block letter square error fidelity criteria [11], [13], [25]. In this application, further clarified in Fig. 2, the main problem is to compute the optimal test channel distribution of Gorbunov's and Pinsker's nonanticipatory epsilon entropy and then realize it by designing an optimal encoder, noisy channel, and decoder, such that the Gaussian Markov process is reconstructed at the output of the decoder or estimator with the pre-specified end-to-end average distortion. An earlier formulation of the causal estimation problem was put forward by Bucy [26], where use is made of mutual information subject to a causality constraint on the estimator. In Bayesian estimation [27], [28] of Fig. 1, one is given a model that generates the unobserved process X^n ≜ {X_t : t = 0, ..., n}, and a model that generates the observed data O^n ≜ {O_0, O_1, ..., O_n}, i.e., the sensor model. At each time t, an estimate of the unobserved process X_t, which minimizes the posterior expected value of a loss function, denoted by X̂_t = g_t(O^{t-1}) for some nonanticipative measurable function g_t(·), is constructed causally from the observed data O^{t-1}, for t = 0, ..., n. Bayesian filtering theory presupposes that both models, which generate (X^n, O^n), are given a priori, while at each time t the estimator g_t(·) is often computed recursively, like the Kalman filter of mean-square estimation.
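As a concrete illustration of the recursive Bayesian estimator mentioned above, the following sketch implements a scalar Kalman filter; the model X_t = a X_{t-1} + W_t with observations O_t = X_t + N_t, and all parameter names, are illustrative assumptions, not the paper's construction.

```python
def kalman_scalar(a, q, r, observations, x0_mean=0.0, x0_var=1.0):
    """Causal (Bayesian) estimates E[X_t | O^t] for the illustrative model
    X_t = a*X_{t-1} + W_t, W_t ~ N(0, q), and O_t = X_t + N_t, N_t ~ N(0, r)."""
    mean_pred, var_pred = x0_mean, x0_var  # prior on X_0 before any observation
    estimates = []
    for o in observations:
        # Measurement update: incorporate O_t causally.
        gain = var_pred / (var_pred + r)
        mean = mean_pred + gain * (o - mean_pred)
        var = (1.0 - gain) * var_pred
        estimates.append(mean)
        # Time update: predict X_{t+1} using the Markov dynamics.
        mean_pred = a * mean
        var_pred = a * a * var + q
    return estimates
```

With nearly noiseless observations (small r), the causal estimate tracks the observations, as expected of a nonanticipative estimator.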
In information-based estimation, one is given a model that generates the unobserved process X^n and a fidelity criterion for reconstructing X^n by Y^n, and the objective is to determine the optimal nonanticipative test channel conditional distribution {P_{Y_t|Y^{t-1},X^t} : t = 0, ..., n} of the Gorbunov and Pinsker nonanticipatory epsilon entropy, denoted hereinafter by R^GP_n(D), and to realize this distribution by an {encoder, channel, decoder} so that the end-to-end average distortion criterion is met. Thus, in information-based estimation, the observation model is constructed by the cascade of the {encoder, channel}, which generates O^n = B^n according to Fig. 2 if there is feedback, and O^n = S^n according to Fig. 3 if there is no feedback, and the filter is the decoder, which satisfies the end-to-end average distortion.

Fig. 2. Block diagram of a feedback realization that consists of a sensor map and a decoder; past reproductions are available to both the sensor map and the decoder.

Fig. 3. Block diagram of a nonfeedback realization that consists of a sensor map and a decoder; past reproductions are available to the decoder but not to the sensor map.

Despite the broad range of applications of nonanticipatory epsilon entropy, such as 1)-3), two fundamental questions have not been answered to this date.
Question 1. The first question concerns the mathematical problem of the complete characterization of Gorbunov's and Pinsker's nonanticipatory epsilon entropy for multivariate Gaussian sources with letter-by-letter and block letter distortion criteria. That is, the specification of the optimal test channel distribution, and its properties, that achieves the nonanticipatory epsilon entropy, and the resulting optimization problem over a set of parameters, i.e., conditions that ensure such a reproduction process exists.
Application areas 1)-3) above are currently limited by the fact that, for multivariate Gaussian Markov sources with letter-by-letter or block letter square error distortion criteria, neither Gorbunov's and Pinsker's nonanticipatory epsilon entropy and message generation rate, nor the optimal test channel distributions that achieve these, are known. For example, in [22], suboptimal rate losses are computed and suboptimal codes are constructed, and similarly in [29].
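The numerical rate-loss bounds from [10] quoted under application area 1) can be checked directly; a quick sketch:

```python
import math

# Rate-loss upper bounds from [10] for scalar Gaussian Markov sources:
# causal codes (with decoder-to-encoder feedback) lose at most
# (1/2)*log2(2*pi*e/12) bits/sample over the message generation rate,
# and zero-delay codes lose at most one additional bit/sample.
causal_gap = 0.5 * math.log2(2 * math.pi * math.e / 12)
zero_delay_gap = causal_gap + 1.0
```

Evaluating the expressions recovers the approximate values 0.254 and 1.254 bits/sample cited in the text.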
Question 2. The second question concerns the practical aspects of the application areas of Gorbunov's and Pinsker's nonanticipatory epsilon entropy; more specifically, whether it is possible to achieve the nonanticipatory epsilon entropy using a state dependent variation, which uses a state dependent test channel, as shown in Fig. 3, where the encoder does not use feedback from past reproductions.3 Application areas 1)-3) often make use of Gorbunov's and Pinsker's nonanticipatory epsilon entropy, and hence they are limited by the fact that they apply encoders which use past reproduction symbols as side information. This requirement imposes a heavy burden on design specifications, due to the communication of past reproductions to the encoder, compared to encoders that do not use past reproductions as side information. For example, in application area 3), the synthesis of recursive, causal filters can be implemented using: (2.1) the feedback realization of Fig. 2, which uses past reproductions as side information to generate both the outputs of the sensor map and the decoder/estimator, or (2.2) the nonfeedback realization of Fig. 3, which uses past reproductions as side information to generate the outputs of the decoder/estimator, and does not use past reproductions to generate the outputs of the sensor map.
The fundamental question, however, is whether the nonfeedback synthesis of Fig. 3 achieves the value of the feedback synthesis of Fig. 2.
Prior literature did not address Questions 1 and 2, and this limits the full use of the nonanticipatory epsilon entropy and message generation rates in the specific areas discussed. Our main emphasis in this paper is to address Questions 1 and 2.
Regarding Question 1, this paper derives, for the first time, the complete characterization of the Gorbunov and Pinsker [3] nonanticipatory epsilon entropy of multivariate Gaussian Markov sources with letter-by-letter and block letter square-error fidelity criteria. The realizations of the reproduction processes that induce the optimal test channel distributions, which achieve the nonanticipatory epsilon entropy, are parametrized by a tuple of matrices that satisfy specific properties. These are feedback realizations, which are equivalent to innovations encoders with the past reproductions as side information, followed by Gaussian noise channels, followed by decoders with the past reproductions as side information. The feedback realizations fill the gap of the partial characterization of nonanticipatory epsilon entropy given by Gorbunov and Pinsker [3, Theorem 5]. For nonstationary, multivariate Gaussian Markov sources, it is shown that certain matrices that define the realizations of the reproduction processes and induce the optimal test channels admit spectral representations with respect to the same unitary matrices. This structural property leads to characterizations of the nonanticipatory epsilon entropy by sequential multivariate optimization problems in time, with respect to (w.r.t.) the canonical correlation coefficients of a tuple of multivariate jointly Gaussian random processes. Further, for the letter-by-letter square-error criterion, an upper bound on the characterization of the nonanticipatory epsilon entropy is derived that is easily computed sequentially in time, based on a water-filling calculation in the spatial dimensions, which is further shown to be analogous to the water-filling solution of the classical RDF of multivariate memoryless Gaussian sources with square-error fidelity.
To the best of the author's knowledge, current knowledge of the complete characterization of the nonanticipatory epsilon entropy is restricted to scalar Gaussian Markov sources, given by Gorbunov and Pinsker [3, Example 2] for the letter-by-letter square-error fidelity criterion, and in [30] for the block letter square-error criterion. For multivariate Gaussian Markov sources with a letter-by-letter square-error distortion criterion, a partial characterization of the nonanticipatory epsilon entropy is given by Gorbunov and Pinsker in [3, Theorem 5]; the term "partial" refers to the fact that the matrices which define the realization of the test channel distribution remain to this date unspecified. It appears from the literature that the only other research paper that attempted to characterize the nonanticipatory epsilon entropy for multivariate Gaussian Markov sources with letter-by-letter and block letter square-error fidelity criteria is [13]; see also the discussion in [31]. However, the analysis in [13] produces a water-filling solution, and realizations of the test channel distributions, which are suboptimal, and hence these correspond to upper bounds on the nonanticipatory epsilon entropy. Due to the lack of an efficient characterization of the nonanticipatory epsilon entropy for multivariate Gaussian Markov sources with a letter-by-letter distortion criterion, a numerical approach that makes use of semidefinite programming is developed in [32]. However, the semidefinite programming approach does not answer the question of existence of a realization that induces the optimal test channel distribution. In particular, in the application of the algorithm, the joint distribution of the source and its reproduction should be such that one of its marginal distributions is the source distribution, which is not easy to check.
Regarding Question 2, this paper formulates a state dependent variant of the Gorbunov and Pinsker [3] nonanticipatory epsilon entropy, which is analogous to the Wyner and Ziv [4] and Wyner [5] RDF of memoryless sources. The main feature of this new formulation is the use of a test channel distribution w.r.t. an auxiliary random process (i.e., a state process), subject to a causality constraint and a conditional independence condition, to ensure that the encoder does not use feedback from the past reproductions, unlike Gorbunov's and Pinsker's nonanticipatory epsilon entropy. Although the state dependent nonanticipatory epsilon entropy is larger than or equal to the Gorbunov and Pinsker nonanticipatory epsilon entropy, sufficient conditions are identified for the two to coincide. Moreover, it is shown that the operational rate of zero-delay codes is bounded below by the state dependent nonanticipatory epsilon entropy rate. It is also shown that, for multivariate Gaussian sources with a square-error fidelity criterion, the two nonanticipatory epsilon entropies coincide, and moreover a nonfeedback realization of the auxiliary random process and the reproduction process is constructed that induces the optimal test channel distribution of the state dependent Gorbunov and Pinsker nonanticipatory epsilon entropy.

{(X_n, B(X_n)) : n ∈ Z} denotes a measurable space, where the X_n are confined to complete separable metric spaces, or Polish spaces, and B(X_n) are the Borel σ-algebras of subsets of X_n. Points in the product space X^Z ≜ ×_{n∈Z} X_n are denoted by x^∞_{-∞} ≜ (..., x_{-1}, x_0, x_1, ...) ∈ X^Z, and their restrictions to finite coordinates, for any (m, n) ∈ Z × Z, m ≤ n, are denoted by x^n_m ≜ (x_m, ..., x_n).

A. Notation
Given a probability space (Ω, F, P) and a RV X : (Ω, F) → (X, B(X)), we denote by4 P{X ∈ dx} = P_X(dx) ≡ P(dx) the probability measure (i.e., probability distribution) induced by X on (X, B(X)). The conditional probability measure of a random variable (RV) Y given X is denoted by P_{Y|X}(dy|x), and for a measurable set G ∈ B(X × Y), G_x ≜ {y ∈ Y : (x, y) ∈ G} is the x-section of the set G. The symbol ⊗ denotes the compound probability operator in (I.2), and the notation P_{X,Y}(dx, dy) = P_{Y|X}(dy|x) ⊗ P_X(dx) is used.
Consider three random variables (RVs) X : Ω → X, Y : Ω → Y, and W : Ω → W, defined on some probability space (Ω, F, P). We say that the RVs (X, Y) are conditionally independent given the RV W if P_{X,Y|W} = P_{X|W} P_{Y|W}, P-a.s. (almost surely); the specification a.s. is often omitted. We often denote this conditional independence by the Markov Chain (MC) Y ↔ W ↔ X.
A time-invariant Gaussian Markov process X^n corresponds to the restriction (A_t, B_t, K_{W_t}) = (A, B, K_W), ∀t; i.e., X^n is not necessarily stationary or asymptotically stationary.
We evaluate the reproduction Y^n ≜ {Y_0, ..., Y_n} of the source X^n ≜ {X_0, ..., X_n} w.r.t. the average of a block symbol square error fidelity criterion, d_{0,n}(·, ·), defined by

d_{0,n}(x^n, y^n) ≜ (1/(n+1)) Σ_{t=0}^{n} ||x_t − y_t||^2. (I.4)

We also consider the average of a symbol-by-symbol square error fidelity criterion, defined by

d_t(x_t, y_t) ≜ ||x_t − y_t||^2, t = 0, ..., n. (I.5)

Given the fixed source probability distribution P_{X^n}, and a conditional probability distribution P_{Y^n|X^n}, define the joint distribution of (X^n, Y^n) by P_{X^n,Y^n} = P_{Y^n|X^n} ⊗ P_{X^n}, with marginal distributions P_{X^n}, P_{Y^n}. Let I(X^n; Y^n) denote the mutual information between X^n and its reproduction Y^n, defined by [7]

I(X^n; Y^n) ≜ ∫ log ( dP_{X^n,Y^n} / d(P_{X^n} × P_{Y^n}) ) dP_{X^n,Y^n}, (I.8)

where P_{Y^n} × P_{X^n} is the distribution defined by the product of the two distributions, and (I.8) follows from the existence of the Radon-Nikodym derivative.
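As a minimal sketch of the two fidelity criteria just defined (function names are illustrative, not from the paper):

```python
def block_square_error(x, y):
    """d_{0,n}(x^n, y^n): time-average of the squared Euclidean error,
    for equal-length sequences x, y of real vectors."""
    total = sum(sum((xi - yi) ** 2 for xi, yi in zip(xt, yt))
                for xt, yt in zip(x, y))
    return total / len(x)

def per_letter_square_errors(x, y):
    """Symbol-by-symbol squared errors ||x_t - y_t||^2, one value per time t;
    the letter-by-letter criterion constrains each expectation separately."""
    return [sum((xi - yi) ** 2 for xi, yi in zip(xt, yt))
            for xt, yt in zip(x, y)]
```

The block criterion constrains a single time-average, whereas the symbol-by-symbol criterion imposes one constraint per time instant.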

(P1) The nonanticipatory epsilon entropy w.r.t. the block symbol fidelity criterion [2] is defined by

R^GP_n(D) ≜ inf { I(X^n; Y^n) : E{d_{0,n}(X^n, Y^n)} ≤ D }. (I.9)

(P2) The nonanticipatory epsilon entropy w.r.t. the symbol-by-symbol fidelity criterion [2] is defined by

R^GP_n(D_0, ..., D_n) ≜ inf { I(X^n; Y^n) : E{||X_t − Y_t||^2} ≤ D_t, t = 0, ..., n }. (I.10)

In (P1) and (P2), the infimum is taken over all joint distributions P_{X^n,Y^n} of the joint process (X^n, Y^n) such that the following conditions hold:
(GP-1) ∫_{×_{t=0}^{n} Y_t} P_{X^n,Y^n}(dx^n, dy^n) = P_{X^n}(dx^n), i.e., the joint distribution has an X^n-marginal that corresponds to the fixed distribution of the source X^n, and
(GP-2) for each t ∈ {0, ..., n}, the RV Y_t is conditionally independent of X^n_{t+1} conditioned on X^t, that is, the MC

X^n_{t+1} ↔ X^t ↔ Y_t, t = 0, ..., n, (I.11)

holds. Conditional independence (I.11) is known as the "causality condition" of the reproduction distribution. The underlying assumption in [3] is that the infima in (I.9) and (I.10) exist and are finite.
In the literature, the nonanticipatory epsilon entropy is often called the sequential RDF, because the nonanticipative constraint described under (GP-2) above implies [11]

P_{Y_t|Y^{t-1},X^n} = P_{Y_t|Y^{t-1},X^t}, and P_{X_t|X^{t-1},Y^{t-1}} = P_{X_t|X^{t-1}}, t = 0, ..., n,

and hence R^GP_n(D) can be expressed sequentially, as a special case of directed information, as follows:

R^GP_n(D) = inf { Σ_{t=0}^{n} I(X^t; Y_t | Y^{t-1}) : E{d_{0,n}(X^n, Y^n)} ≤ D }.

The reproduction distribution is P_{Y_t|Y^{t-1},X^t}, t = 0, ..., n. Hence, at each time t, the probability distribution of Y_t depends on X^t and also on the past reproductions Y^{t-1}. For application area 3), this is described in Fig. 2.
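The sequential decomposition above can be evaluated in closed form for a simple illustrative case: a scalar Gauss-Markov source X_t = a X_{t-1} + W_t passed through an additive Gaussian test channel Y_t = X_t + V_t. This specific channel is an assumption made only for illustration (it is not claimed to be the optimal test channel); each term I(X^t; Y_t | Y^{t-1}) then reduces to 0.5*log(1 + P_t/r), where P_t is the Kalman one-step prediction error variance.

```python
import math

def directed_info_awgn(a, q, r, n, x0_var=1.0):
    """Sum_{t=0}^{n} I(X^t; Y_t | Y^{t-1}) in nats for the illustrative model
    X_t = a*X_{t-1} + W_t, W_t ~ N(0, q), with test channel Y_t = X_t + V_t,
    V_t ~ N(0, r).  Each term equals 0.5*log(1 + P_t/r), with
    P_t = Var(X_t | Y^{t-1}) given by the scalar Kalman recursion."""
    p = x0_var  # Var(X_0): no observations yet
    total = 0.0
    for _ in range(n + 1):
        total += 0.5 * math.log(1.0 + p / r)
        filt = p * r / (p + r)   # filtering error variance Var(X_t | Y^t)
        p = a * a * filt + q     # prediction error variance Var(X_{t+1} | Y^t)
    return total
```

The identity used here, I(X^t; Y_t | Y^{t-1}) = h(Y_t | Y^{t-1}) − h(Y_t | X^t, Y^{t-1}), holds because given X^t the channel output depends only on V_t.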
Next, we introduce the state dependent variant of the Gorbunov and Pinsker nonanticipatory epsilon entropy, which imposes the requirement that the test channel distribution P_{Y_t|Y^{t-1},X^t}, t = 0, ..., n, is induced by a state dependent reproduction process Y^n, as shown in Fig. 3. This is motivated by the Wyner and Ziv [4] and Wyner [5] RDF of memoryless sources with side information only at the decoder.
(P3) The nonanticipatory epsilon entropy with state dependent reproduction process and block symbol fidelity criterion. Consider a sequence of states, or auxiliary RVs, S^n ≜ {S_0, S_1, ..., S_n}, S_t : Ω → S_t ≜ R^m, t = 0, 1, ..., n, such that at each time t the reproduction process is state dependent, Y_t = f_t(S^t, Y^{t-1}) for some measurable functions f_t(·), t = 0, ..., n. Define the state dependent variant of the Gorbunov and Pinsker nonanticipatory epsilon entropy by the optimization problem6 (I.13), where the constraint set M^GPS_{[0,n]}(D) is specified as follows. In (P3), the optimization is over the sequence of auxiliary RVs S^n such that there exist measurable functions f_t(·), t = 0, ..., n, and the induced joint distribution P_{X^n,Y^n,S^n} of the joint process (X^n, Y^n, S^n) satisfies the following conditions:
(GPS-1) ∫_{×_{t=0}^{n} Y_t × S_t} P_{X^n,Y^n,S^n}(dx^n, dy^n, ds^n) = P_{X^n}(dx^n), i.e., the joint distribution has as X^n-marginal the fixed distribution of the source X^n, and
(GPS-2) for each t ∈ {0, 1, ..., n}, conditional independence conditions hold under which the joint distribution factorizes with test channel P_{S_t|S^{t-1},X^t}, as in Fig. 3.
The state dependent nonanticipatory epsilon entropy involves the state or auxiliary RVs; the reproduction process Y^n is analogous to the Wyner-Ziv [4], [5] RDF with side information Y^{t-1} at the decoder but not at the encoder. We show in subsequent parts of the paper that, in general, R^GPS_n(D) ≥ R^GP_n(D), and equality holds if at each time t, and for fixed (s^{t-1}, y^{t-1}), the map f_t(·, s^{t-1}, y^{t-1}) : S_t → Y_t, f_t(s_t, s^{t-1}, y^{t-1}) = y_t, uniquely defines s_t for all t = 0, ..., n; i.e., sequentially, y_0 uniquely defines s_0; y_1 and (s_0, y_0) uniquely define s_1, and so on. The above variation is motivated by the fundamental differences between the feedback and nonfeedback realizations, Fig. 2 and Fig. 3, described under application area 3).

C. Main Contributions
We analyze the Gorbunov and Pinsker nonanticipatory epsilon entropy, and its state dependent variant (I.13), and we introduce new methods to completely characterize these RDFs, for the sources of Definition 1 and the fidelity criteria defined by (I.4), (I.5). The highlights of the paper are the following:

1) Realizations of Y^n that induce the optimal reproduction distributions. We make use of properties of mean-square estimation theory, achievable lower bounds on mutual information, and the weak stochastic realization theory of Gaussian processes, to derive structural properties of the realizations of Y^n that induce the optimal reproduction distribution (or test channel). The specification of the parameters of the realization completes and simplifies the prior characterizations given in [3].
2) Spectral representations of the matrices of the realizations of the optimal reproduction distributions of R^GP_n(D) and R^GP_n(D_0, ..., D_n). We show a structural property, which states that certain matrices that parametrize the realizations of the reproduction process and produce the optimal test channel distribution admit spectral representations w.r.t. unitary matrices. This structural property is shown by making use of Hadamard's inequality to derive achievable lower bounds on the mutual information I(X^n; Y^n).
3) Complete sequential characterizations of R^GP_n(D) and R^GP_n(D_0, ..., D_n) in time, for Gaussian Markov processes, via sequential water-filling. The nonanticipatory epsilon entropy is characterized by a sequential multivariate optimization problem in time, expressed in terms of the canonical correlation coefficients of a tuple of multivariate jointly Gaussian random processes, which is equivalent to optimization problems that lead to water-filling. Further, an upper bound is proposed on R^GP_n(D_0, ..., D_n), which is easy to compute using pre-processing and post-processing of the source X^n and its reproduction Y^n by a unitary transformation, together with a sequential-in-time water-filling in the spatial dimensions, based on parallel additive Gaussian noise channels. Simulations show the upper bound is tight. 4)
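For reference, the classical reverse water-filling solution of the memoryless multivariate Gaussian RDF, to which the paper's sequential spatial water-filling is shown to be analogous, can be sketched as follows; this is the standard textbook construction under illustrative assumptions, not the paper's sequential algorithm.

```python
import math

def reverse_waterfilling(variances, D):
    """Classical RDF of a memoryless Gaussian vector with independent
    components: allocate D_i = min(lam, sigma_i^2) with sum(D_i) = D,
    giving R(D) = sum_i 0.5*log(sigma_i^2 / D_i) nats.  The water level
    lam is found by bisection; requires 0 < D <= sum(variances)."""
    lo, hi = 0.0, max(variances)
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if sum(min(lam, v) for v in variances) < D:
            lo = lam   # water level too low: total distortion below D
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    Ds = [min(lam, v) for v in variances]
    rate = sum(0.5 * math.log(v / d) for v, d in zip(variances, Ds))
    return rate, Ds
```

Components whose variance falls below the water level are reproduced at zero rate, exactly as in the memoryless case the text compares against.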

D. Organization
The paper is organized as follows. In Section II, we review an alternative, equivalent definition of the Gorbunov and Pinsker nonanticipatory epsilon entropy, often called the nonanticipative RDF, and use it to prove the main theorem on the state dependent nonanticipatory epsilon entropy. In Section III, we derive the main results on the complete characterization of the Gorbunov and Pinsker nonanticipatory epsilon entropy of multivariate Gaussian Markov sources with letter-by-letter and block letter square-error fidelity criteria. The realizations of the reproduction processes that induce the optimal test channel distributions, which achieve the nonanticipatory epsilon entropy, are parametrized by a tuple of matrices that satisfy specific properties. In Section IV, we derive the corresponding results for the state dependent Gorbunov and Pinsker nonanticipatory epsilon entropy. In Section V, we briefly discuss prior applications of the Gorbunov and Pinsker nonanticipatory epsilon entropy and message generation rates to the rate loss of zero-delay codes. In this section, we derive a new lower bound on the operational rate of zero-delay codes using the state dependent nonanticipatory epsilon entropy and message generation rates, and illustrate fundamental differences between encoders that require knowledge of past reproductions and those that do not. It is then apparent that the state dependent Gorbunov and Pinsker nonanticipatory epsilon entropy is more suitable for such applications, since no feedback to the encoder is required. A numerical example is given in Section VI. Finally, in Section VII, we draw conclusions and discuss open problems on this topic.

II. NONANTICIPATIVE RDF, NONANTICIPATORY EPSILON ENTROPY, AND STATE DEPENDENT NONANTICIPATORY EPSILON ENTROPY
In this section, we establish the connections between the nonanticipatory epsilon entropy, the nonanticipative RDF, and the state dependent nonanticipatory epsilon entropy. The main result of this section is Theorem 1, which establishes properties of the state dependent nonanticipatory epsilon entropy, including the conditions under which it coincides with the nonanticipative RDF.

A. Equivalence of Nonanticipatory Epsilon Entropy and Nonanticipative RDF

Definition 2 (Source and Reproduction Distributions): The source generates sequences from the set of distributions that satisfy conditional independence, i.e.,

M_{[0,n]} ≜ { P_{X_t|X^{t-1}} : t = 0, ..., n },

i.e., P_{X_t|X^{t-1},Y^{t-1}} = P_{X_t|X^{t-1}}, t = 0, ..., n. The reproduction distributions are the collection {P_{Y_t|Y^{t-1},X^t} : t = 0, ..., n}; that is, for each t, the information structure of the reproduction distribution is {y^{t-1}, x^t}. For each t = 0, 1, ..., we introduce the space G^t of admissible source and reproduction histories up to time t. Given the source distributions and the reproduction distributions {P_{Y_t|Y^{t-1},X^t} : t = 0, ..., n}, then by the Ionescu-Tulcea theorem there exists a unique probability measure P_{X^n,Y^n}, defined by the successive composition of these conditional distributions. The conditional distribution of Y_t given Y^{t-1} is then defined from this joint measure. Next, we define the nonanticipative RDF. Then, we invoke a lemma to deduce its equivalence to the Gorbunov and Pinsker nonanticipatory epsilon entropy, which is defined in the Introduction.
Definition 3 (Nonanticipative RDF): Consider the source and reproduction distributions of Definition 2. The information measure is

I(X^n → Y^n) ≜ Σ_{t=0}^{n} I(X^t; Y_t | Y^{t-1}),

where I(X^t; Y_t | Y^{t-1}) denotes the conditional mutual information between X^t and Y_t conditioned on Y^{t-1}, and the value +∞ is allowed (i.e., for RVs which are continuous-valued).
(a) The nonanticipative RDF of the source subject to a block letter distortion is defined by

R^na_n(D) ≜ inf { I(X^n → Y^n) : E{d_{0,n}(X^n, Y^n)} ≤ D }.

(b) The nonanticipative RDF of the source subject to a letter-by-letter distortion is defined by

R^na_n(D_0, ..., D_n) ≜ inf { I(X^n → Y^n) : E{||X_t − Y_t||^2} ≤ D_t, t = 0, ..., n }.

If the infimum does not exist, then the value +∞ is assigned. If two-sided processes are considered, then the definitions are modified accordingly. It should be mentioned that existence of the infimum is shown in [33], under appropriate conditions of weak compactness of probability measures.
Remark 1: Gorbunov and Pinsker [2] also analyzed a variation of the nonanticipatory epsilon entropy, in which the conditional independence (I.11) is replaced by an alternative Markov chain condition.

Proof: The equivalence of MC1-MC3 is shown in [11]. The equivalence of MC1-MC3 to MC4 follows similarly to [11].

B. State Dependent Nonanticipatory Epsilon Entropy
We now turn our attention to the state dependent nonanticipatory epsilon entropy, with state or auxiliary random process S^n, i.e., R^GPS_n(D) defined by (I.13), to prove the upper bound and to identify a fundamental property for the inequality to hold with equality.
Theorem 1 (Properties of State Dependent Nonanticipatory Epsilon Entropy): Consider R^GPS_n(D) defined by (I.13). Then, the following hold.
(a) The following identity holds.
(b) For each element of the set M^GPS_{[0,n]}(D), the inequality (II.14), R^GPS_n(D) ≥ R^na_n(D), holds.
(c) If the map f_t(·, s^{t-1}, y^{t-1}) : S_t → Y_t, f_t(s_t, s^{t-1}, y^{t-1}) = y_t, is invertible and measurable (i.e., for fixed (s^{t-1}, y^{t-1}), knowledge of y_t uniquely defines s_t) for all t = 0, ..., n, then the inequality in (II.14) holds with equality.
Proof: See Appendix A.

Remark 2: For the Gaussian Markov process X^n of Definition 1, and a square-error distortion criterion, we give in subsequent sections a realization of (S^n, Y^n) such that the equality R^GPS_n(D) = R^na_n(D) holds.

III. NONANTICIPATIVE RDF OF MULTIVARIATE GAUSSIAN MARKOV SOURCES WITH SQUARE ERROR FIDELITY
In this section, we derive the characterizations of the nonanticipative RDF for the time-varying multivariate Gaussian Markov source of Definition 1. The main results of this section are: 1) Theorem 4, which applies Proposition 1, Theorem 2, and a data processing inequality to identify sufficient conditions, in terms of mean-square estimation theory, under which a jointly Gaussian Markov distribution achieves the nonanticipative RDF. In subsequent sections, we make use of the following definitions of conditional means and error covariances from mean-square estimation theory.
The derivations and structural properties of the realization of the optimal reproduction distribution are new and have not appeared in the literature. They are also applicable to other RDF problems of multivariate Gaussian sources with square-error fidelity.

A. Preliminary Results on Mean-Square Estimation
The next well-known proposition on conditionally Gaussian RVs is used extensively in our derivations.
Proposition 1: Consider a pair of RVs X = (X_1, ..., X_k)^T : Ω → R^k and Y = (Y_1, ..., Y_l)^T : Ω → R^l, defined on some probability space (Ω, F, P). Let G ⊆ F be a sub-σ-algebra.8 Assume the conditional distribution of (X, Y) conditioned on G, i.e., P(dx, dy | G), is P-a.s. (almost surely) Gaussian, with conditional means

μ_{X|G} ≜ E[X | G], μ_{Y|G} ≜ E[Y | G],

and conditional covariances

Q_{X|G} ≜ cov(X, X | G), Q_{Y|G} ≜ cov(Y, Y | G), Q_{X,Y|G} ≜ cov(X, Y | G).

Then, the vectors of conditional expectations μ_{X|Y,G} ≜ E[X | Y, G] and the matrices of conditional covariances are given, P-a.s., by the following expressions9:

μ_{X|Y,G} = μ_{X|G} + Q_{X,Y|G} Q†_{Y|G} (Y − μ_{Y|G}), (III.2)
Q_{X|Y,G} = Q_{X|G} − Q_{X,Y|G} Q†_{Y|G} Q^T_{X,Y|G}, (III.3)

where the notation (·)† denotes the pseudoinverse of the matrix (·). If G is the trivial information, i.e., G = {Ω, ∅}, then G is removed from the above expressions, and (III.2), (III.3) degenerate to the well-known conditional mean of mean-square estimation theory.
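With trivial G, the conditioning formulas of Proposition 1 reduce to the standard Gaussian conditional mean and covariance, which the following sketch exercises numerically; the block sizes and the randomly generated covariance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T                          # joint covariance of (X, Y): X first 3, Y last 2
Qx, Qxy, Qy = K[:3, :3], K[:3, 3:], K[3:, 3:]

y = np.array([1.0, -2.0])
# Conditional mean of X given Y = y (zero means, trivial G),
# using the pseudoinverse as in the proposition.
mu_x_given_y = Qxy @ np.linalg.pinv(Qy) @ y
# Conditional covariance of X given Y (a Schur complement).
Q_x_given_y = Qx - Qxy @ np.linalg.pinv(Qy) @ Qxy.T
```

Note that the conditional covariance does not depend on the observed value y, and, being a Schur complement, it is symmetric positive semidefinite.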
Theorem 2 (Equivalent Statements): Consider the statement of Proposition 1 and assume the inverse of Q_{Y|G} exists. Any two of the following three conditions imply the third.

Proof: The statements are easily verified by substituting any two of the conditions into (III.2) and (III.3) in order to retrieve the third condition.

B. Characterization of R na n (D) of Multivariate Gaussian Markov Sources With Block Letter Square Error Distortion
We now prepare to derive the complete characterization of the nonanticipatory epsilon entropy, i.e., of R^GP_n(D) = R^na_n(D). Suppose (X^n, Y^n) is jointly Gaussian. By an application of (III.2), with X = X_t, Y = Y_t, G = Y^{t-1}, then (III.7), shown at the bottom of the page, is obtained.
By an application of (III.3), with X = X_t, Y = Y_t, G = Y^{t-1}, then (III.8) and (III.9), shown at the bottom of the page, are obtained.
Throughout the paper, we apply Proposition 1 and (III.7), (III.8), to identify the optimal realization matrices of Y^n and their structural properties characterizing the nonanticipative RDF.
We make use of the next theorem, which follows from [13].

Theorem 3: Consider the Gaussian Markov source of Definition 1 with the block letter square error distortion criterion.
(a) The minimum of the nonanticipative RDF R^na_n(D) is achieved by a Gaussian joint distribution, and this joint distribution is induced by the process X^n (of Definition 1) and the realization of the reproduction process parametrized by matrices (H_t, K_{V_t}), where V_t, with K_{V_t} ⪰ 0, t = 0, ..., n, is an independent Gaussian process, independent of W_t, t = 1, ..., n, and of X_0, with Σ^-_0 = K_{X_0}, and where the average distortion constraint is satisfied.
Moreover, X̂_{t|t} satisfies the Kalman-filter recursion of the realization of part (a), given by (III.16)-(III.18), shown at the bottom of the next page. Before we proceed, we digress to recall certain properties of the mutual information of a tuple of scalar-valued jointly Gaussian RVs, which are further elaborated in the multivariate case by Gelfand and Yaglom [34] (see eqn. (2.8') and Chapter II), citing Kolmogorov for their origin.
Remark 3 (On the Mutual Information of a Tuple of Jointly Gaussian RVs): Consider a tuple of scalar-valued jointly Gaussian RVs (X, Y), with zero mean, E(X) = E(Y) = 0, correlation coefficient ρ ≜ E(XY)/√(E(X^2) E(Y^2)), and covariance matrix

K = [ E(X^2), E(XY) ; E(XY), E(Y^2) ].

Then, by Gelfand and Yaglom [34], the mutual information, for any ρ ∈ [−1, 1], is given by

I(X; Y) = −(1/2) log(1 − ρ^2). (III.22)

In particular, (III.22) shows that for Gaussian RVs (X, Y), I(X; Y) = +∞ if and only if the correlation coefficient takes the values ρ = +1 or ρ = −1. In the case X = Y with probability one, then I(X; Y) = +∞. For a tuple of multivariate jointly Gaussian RVs, the analog of (III.22) is discussed in Gelfand and Yaglom [34] (see eqn. (2.8') and Chapter II). The full specification is also found in [35], [36].
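Formula (III.22) is immediate to implement; the following sketch also cross-checks it against the equivalent determinant form 0.5*log(E(X^2)E(Y^2)/det K) for the 2x2 covariance matrix (function names are illustrative):

```python
import math

def mi_gaussian_pair(rho):
    """Mutual information (nats) of zero-mean jointly Gaussian (X, Y)
    with correlation coefficient rho, per (III.22); infinite at rho = +/-1."""
    if abs(rho) >= 1.0:
        return math.inf
    return -0.5 * math.log(1.0 - rho * rho)

def mi_from_covariance(sx2, sy2, sxy):
    """Equivalent determinant form 0.5*log(sx2*sy2 / det K) for the
    covariance K = [[sx2, sxy], [sxy, sy2]]."""
    det = sx2 * sy2 - sxy * sxy
    return math.inf if det <= 0.0 else 0.5 * math.log(sx2 * sy2 / det)
```

The two forms agree because for unit variances det K = 1 − ρ^2, recovering (III.22).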
Remark 4: In view of Remark 3, without loss of generality we assume Σ_t^- ≻ 0, t = 0, ..., n. If this is not the case, then at least one of the eigenvalues of Σ_t^- is zero, and the corresponding component must be removed, because it gives an infinite contribution to the sum (1/2) Σ_{t=0}^n log(·). Indeed, any Gaussian RV X : Ω → R^{n_x}, n_x ∈ N, with covariance K_X ⪰ 0 which does not satisfy K_X ≻ 0 can be transformed, using a linear basis transformation, into a form in which, with respect to the new basis, the second component of the transformed X is a constant that can be disregarded because it has zero covariance. Then rank(K_{11}) = p_{11} implies that K_{11} ≻ 0.
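The basis transformation of Remark 4 can be illustrated on a rank-deficient 2×2 covariance; the matrices below are a hypothetical example, not taken from the paper.

```python
import math

# Rank-1 covariance of X = (X1, X1): K_X has eigenvalues 2 and 0,
# so K_X >= 0 but not K_X > 0.
K = [[1.0, 1.0], [1.0, 1.0]]
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]  # orthonormal eigenvectors of K as columns

def congruence(U, K):
    """Return U^T K U for 2x2 matrices (change of basis for a covariance)."""
    KU = [[sum(K[i][k] * U[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(U[k][i] * KU[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# In the new basis the covariance is diag(2, 0): the second transformed
# component has zero variance, i.e., it is a.s. constant and can be dropped.
K_new = congruence(U, K)
```

The surviving 1×1 block K_{11} = [2] is strictly positive definite, as the remark requires.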
We show how to treat the case Σ_t^- ⪰ 0 but not Σ_t^- ≻ 0 after we develop the main results.
Theorem 4 (below) establishes a further simplification of R_n^{na}(D). It states the following: if there exist matrix-valued parameters (H_t, K_{V_t}) ∈ R^{p×p} × S_+^{p×p}, t = 0, ..., n, in the set Q^{H,K_V}_{[0,n]}(D) such that X̂_{t|t} = Y_t a.s., t = 0, ..., n, then the reproduction distribution satisfies P_{Y_t|Y^{t-1},X^t} = P_{Y_t|Y_{t-1},X_t}, and hence the joint process (X^n, Y^n) that achieves R_n^{na}(D) is Markov, for some (H_t, K_{V_t}), t = 0, ..., n. (a) The realization of the reproduction distribution of Theorem 3.(a) satisfies this property, while the pay-off satisfies the corresponding identity. For any joint distribution P^{Q_1}(dx^n, dy^n) induced by the Gaussian Markov source X^n and the realization of Y^n given in Theorem 3.(a), the following inequality holds.
where Y^n is Markov and the test channel distribution satisfies (III.28). Moreover, the nonanticipative RDF is given by the expression below, where the joint distribution of (X^n, Y^n) is Markov and is induced by the stated representation. Further, the joint distribution of the process (X^n, Y^n) is given below. Proof: See Appendix B. Theorem 5 (below) establishes existence of the matrices (H_t, K_{V_t}), t = 0, ..., n, satisfying (III.37). Further, the representation of the reproduction process Y^n, with (H_t, K_{V_t}), t = 0, ..., n, defined below, satisfies the stated properties, where (H_t, K_{V_t}), t = 0, ..., n, are given explicitly, and achieves the nonanticipative RDF R^{na}_{0,n}(D) given by (III.29).
(c) The complete characterization of the nonanticipative RDF is given by (III.43), shown at the bottom of the page, with the constraint set given by (III.44). Proof: See Appendix C. Remark 5: We observe that the realization of Theorem 5 is such that the stated properties hold, and similarly for t = 0. In view of the above properties, we can apply pre-processing after the source generates X^n = x^n and post-processing prior to the reconstruction process Y^n = y^n, to give an alternative equivalent characterization of the R_n^{na}(D) of Theorem 5, in terms of translated RVs and a single-letter information-theoretic measure.
Corollary 1 (Equivalent Characterization of R_n^{na}(D) of Gaussian Markov Processes With Block Letter Square Error Distortion): Consider the Gaussian Markov processes with block letter square-error distortion of Theorem 5. Define the translated variables below. (a) The representation of Theorem 5.(a) is equivalently expressed as stated, where W_t is independent of Z^{t-1} for t = 0, ..., n, and (III.41a), (III.41b) hold. (b) The processes (X̃^n, Ỹ^n, Z^n) satisfy the following recursions.
where W_t is independent of Z^{t-1}, (H_t, K_{V_t}), t = 0, ..., n, are given by (III.41), and the remaining quantities are as defined above. Further, the pay-off and average block letter square-error distortion of the characterization of the nonanticipative RDF are equivalently expressed as stated, where (III.60) is due to the identity below. In view of the above identities, the equivalent characterization follows from Theorem 5.(b).
Remark 6 (On Corollary 1): It is evident from Corollary 1 that the translated realization in (III.54) corresponds to a memoryless realization, while the pay-off and average distortions (III.58) and (III.59) remain invariant w.r.t. the translation. In Fig. 4, we show a block diagram of the realization of Y^n of the nonanticipative RDF R_n^{na}(D) for multivariate Gaussian sources, which is equivalent to an innovations encoder in which the past reproductions are fed back through a noiseless channel and used for the encoding.
In the next theorem, we identify structural properties of the realization coefficients {H t , K Vt , Σ t , Σ − t }; these are further used to obtain the complete characterization of the nonanticipative RDF.
(a) For any (Σ_t^-, Σ_t) that belongs to the constraint set, where Λ_t is the diagonal matrix of the spectral representation of the symmetric positive definite matrix Σ_t^-, perform the spectral representation of Σ_t^- − Σ_t ⪰ 0 w.r.t. the unitary matrix V_t, for t = 0, ..., n; then we have (III.68) and (III.69), shown at the bottom of the page, where d_{t,i} ∈ (0, 1) are the canonical correlation coefficients of the p_{t,12}-correlated components X̃_t^c ∈ R^{p_{t,12}} and Ỹ_t^c ∈ R^{p_{t,12}} of the transformed vectors of X̃_t and Ỹ_t, obtained by two linear nonsingular transformations (T_1, T_2), X̃_t → T_1 X̃_t, Ỹ_t → T_2 Ỹ_t, into their canonical variable form (see [35], [36] for the full specification); i.e., the correlated components of T_1 X̃_t and T_2 Ỹ_t are X̃_t^c ∈ R^{p_{t,12}} and Ỹ_t^c ∈ R^{p_{t,12}}, respectively, and d_{t,i} = 1 (resp. d_{t,i} = 0) are the canonical correlation coefficients that correspond to the identical (resp. independent) components of the transformed vectors X̃_t ∈ R^p and Ỹ_t ∈ R^p in their canonical variable form (see [35], [36]). In view of this, we have the following main characterization.
Theorem 7 (Complete Characterization of R_n^{na}(D)): Consider the Gaussian Markov process X^n ≜ {X_0, ..., X_n} of Definition 1, and a block symbol square-error distortion. Then, the following statements hold.
(a) The characterization of the RDF R_n^{na}(D) is given by the solution of the optimization problem stated below. (b) Let the starred quantities, t = 0, ..., n (which depend on D), denote the optimal values of R_n^{na}(D) of part (a). Then, the stated realization holds, where (H_t^*, K_{V_t^*}), t = 0, ..., n, are given with U_t^* (U_t^*)^T = I, as in Theorem 6. Moreover, the optimal test channel satisfies the following properties.
The realization of the optimal Y^{*,n} that achieves R_n^{na}(D) is shown in Fig. 5. Proof: Follows from Theorem 6, while D_max is given in Lemma 4.
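The canonical correlation coefficients d_{t,i} entering Theorems 6-7 can be computed as the singular values of the whitened cross-covariance. The construction below, with an assumed diagonal cross-covariance D, is an illustrative sketch with a known answer; see [35], [36] for the full canonical variable form.

```python
import numpy as np

def canonical_correlations(K_x, K_y, K_xy):
    """Canonical correlation coefficients of a jointly Gaussian pair with
    covariances K_x, K_y and cross-covariance K_xy: the singular values
    of K_x^{-1/2} K_xy K_y^{-1/2}."""
    def inv_sqrt(K):
        lam, U = np.linalg.eigh(K)
        return U @ np.diag(1.0 / np.sqrt(lam)) @ U.T
    C = inv_sqrt(K_x) @ K_xy @ inv_sqrt(K_y)
    return np.linalg.svd(C, compute_uv=False)

# Construction with known answer: X ~ N(0, I), Y = D X + N, N ~ N(0, I - D^2),
# so that K_X = K_Y = I and K_XY = D; the canonical correlations are diag(D).
D = np.diag([0.9, 0.5])
d = canonical_correlations(np.eye(2), np.eye(2), D)
```

By construction the coefficients lie in [0, 1]; values equal to 1 (resp. 0) correspond to identical (resp. independent) components, as in the discussion preceding Theorem 7.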
Next, we show how to treat the case Σ_t^- ⪰ 0 but not Σ_t^- ≻ 0 for any t = 0, ..., n.
Lemma 2 (On the Structure of the Covariance Matrix of the Realization of Corollary 1): Consider Corollary 1 and the covariance matrix K_{(X̃_t, Ỹ_t)} of the vector (X̃_t^T, Ỹ_t^T)^T, for t = 0, ..., n, specified by (III.56), where Σ_t^- ⪰ 0 but not Σ_t^- ≻ 0 for any t = 0, ..., n. Then, the following statement holds. The pseudoinverse (Σ_t^-)^† of Σ_t^- has the same decomposition as Σ_t^-, with the difference that some of the eigenvalues are zero, i.e., λ_{t,i}^† ≡ λ_{t,i}^{-1} for i = p, p−1, ..., p_{t,λ}, t = 0, ..., n, where p_{t,λ} depends on D. Therefore, all the aforementioned results hold with the appropriate changes.
Proof: Follows from properties of pseudoinverse of positive semidefinite matrices.
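A sketch of the eigenvalue-based pseudoinverse used in Lemma 2: keep the eigenvectors of the matrix and invert only the strictly positive eigenvalues, zeroing the rest. The matrix K below is a hypothetical rank-1 example, not one of the paper's matrices.

```python
import numpy as np

def psd_pinv(K, tol=1e-10):
    """Moore-Penrose pseudoinverse of a symmetric PSD matrix: same
    eigenvectors as K, eigenvalues above tol inverted, the rest set to
    zero -- mirroring the decomposition of (Sigma_t^-)^dagger in Lemma 2."""
    lam, U = np.linalg.eigh(K)
    lam_dag = np.where(lam > tol, 1.0 / np.maximum(lam, tol), 0.0)
    return U @ np.diag(lam_dag) @ U.T

# rank-1 example: K = v v^T with v = (1, 1)^T, eigenvalues {2, 0}
v = np.array([[1.0], [1.0]])
K = v @ v.T
K_dag = psd_pinv(K)
```

For this example the pseudoinverse equals K/4, and the two Moore-Penrose identities K K^† K = K and K^† K K^† = K^† hold.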
The next lemma gives further, generally suboptimal, structural properties of the realization matrices Σ_t^-, Σ_t, H_t, K_{V_t}, t = 0, ..., n, of Theorem 5, which imply that Σ_t^-, Σ_t, H_t, K_{V_t} have a spectral decomposition w.r.t. the same unitary matrix U_t, U_t U_t^T = I, for t = 0, ..., n. Lemma 3 (Generally Suboptimal Structural Properties of the Realization of Theorem 5): Consider the optimal reproduction distribution and its realization given in Theorem 5.
Suppose Σ_t^-, defined by (III.42), commutes with Σ_t, with Σ_0^- = K_{X_0}. Then the stated properties hold, where the constraint set is as given below. (c) The analog of Theorem 7 also holds. Proof: Follows from Theorem 5 and Theorem 6. The next lemma is useful for the identification of the value of D_max, if such a value exists in [0, ∞), beyond which R_n^{na}(D) = 0. Lemma 4: Consider the characterization of the nonanticipative RDF given in Theorem 5, (III.43). Then, the following statements hold.
(b) For any element of the set defined by (III.44), and Σ_t^-, t = 0, ..., n, defined by (III.42), the following inequality holds, and, furthermore, the following hold: (III.89) is the solution of the Lyapunov recursion. Moreover, the following structural properties hold, and the optimal reproduction is realized by the nonfeedback scheme depicted in Fig. 6.
Proof: (a) The stated realization is an alternative representation of the realization of Theorem 7.(b). (b) This follows directly from the optimal realization of part (a), with state dependent S_t^* and reproduction Y_t^*. Remark 7: Theorem 9 addressed Question 2; it states that there is no loss of performance if side information of past reproductions is available to the decoder and not to the encoder, compared to the Gorbunov and Pinsker nonanticipatory epsilon entropy, which uses side information at both the encoder and the decoder.
The connection of Theorem 9 to recursive causal filters, implemented using Fig. 3, is now obvious.
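The Lyapunov covariance recursion mentioned in Lemma 4 can be sketched with generic matrices (A, B, K_W); the fixed-point iteration below is the standard one and only illustrative, with hypothetical numerical values, not the paper's specific recursion.

```python
import numpy as np

def lyapunov_recursion(A, B, K_W, K_X0, n):
    """Covariance recursion K_{t+1} = A K_t A^T + B K_W B^T of the
    Gauss-Markov source X_{t+1} = A X_t + B W_t."""
    K = K_X0
    for _ in range(n):
        K = A @ K @ A.T + B @ K_W @ B.T
    return K

A = np.array([[0.5, 0.1],
              [0.0, 0.4]])   # stable: spectral radius < 1
B = np.eye(2)
K = lyapunov_recursion(A, B, np.eye(2), np.zeros((2, 2)), 500)
```

Since A is stable, the iteration converges to the stationary covariance, i.e., the solution of the discrete Lyapunov equation K = A K A^T + B K_W B^T.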

V. ACHIEVABILITY OF NONANTICIPATORY EPSILON ENTROPY AND MESSAGE GENERATION RATES WITH NONCAUSAL, CAUSAL AND ZERO DELAY CODES SUBJECT TO RATE LOSS
In this section, we describe the connection of the characterizations of nonanticipatory epsilon entropy and message generation rates to the rate loss of causal and zero-delay codes w.r.t. the OPTA by noncausal codes, and the construction of encoders and decoders which achieve such rate loss.

A. Asymptotic Analysis
First, we recall that Gorbunov and Pinsker [2] also analyzed, under certain conditions, the nonanticipatory message generation rates (i.e., the existence of optimal test channels is assumed and the limits exist).
Theorem 10 is analogous to a theorem of Gorbunov and Pinsker [2].
Theorem 10 (Asymptotic Limit): Consider the nonanticipative RDF R_n^{na}(D) of Definition 3, and assume the infimum exists and R^{na}(D) ≜ lim_{n→∞} of the per-unit-time rate exists. Then inequality (V.2), shown at the bottom of the page, holds, where the infimum is taken over all test channels Q_{Y_t|Y^{t-1},X^t}, t = 0, 1, ..., such that the limits on the right-hand side exist and the infimum also exists. A similar inequality holds for R^{GP}(D_0, ..., D_∞) = lim_{n→∞} of the corresponding rate. Proof: The proof follows from [2, Theorem 1]. By definition, the inequality holds for all elements of Q_{[0,n]}(D). From (V.3), the bound follows, where the limit on the right-hand side is taken over all test channels Q_{Y_t|Y^{t-1},X^t}, t = 0, 1, ..., such that lim_{n→∞} of the average distortion is at most D and the limit on the right-hand side exists. Finally, taking the infimum over all test channels Q_{Y_t|Y^{t-1},X^t}, t = 0, 1, ..., such that the two limits on the right-hand side of (V.2) exist, we obtain the inequality.
Remark 8: Theorem 10 is not very useful, because it presupposes existence of the limits, which may not exist, especially if the sequence of test channels Q_{Y_t|Y^{t-1},X^t}, t = 0, 1, ..., is time-varying.

B. Noncausal Codes
5. The encoder-decoder at time n operates with average fidelity or distortion as stated. A non-negative pair (R, D) is said to be achievable if, for arbitrary ε > 0, there exists, for n sufficiently large, a code with the stated properties. Let R be the set of all achievable pairs (R, D). Then, for D ≥ 0, the infimum of all achievable rates is defined as stated. 1) Converse Coding Theorem: By the converse source coding theorem [6]-[8], the encoder maps X^n into a reproduction Y^n chosen from a reproduction codebook with the stated cardinality. 2) Direct Coding Theorem: If the converse and direct source coding theorems [6]-[8] hold for the source P_{X^n} with a fidelity criterion d_{0,n}(x^n, y^n), then the stated quantity is the optimal performance theoretically attainable (OPTA) by noncausal codes.
In general, the two rates differ, due to the additional nonanticipative or causality constraint (GP-2), i.e., (I.11), which is imposed on the optimal reproduction distribution P_{Y^n|X^n} of R_n^{GP}(D); see Theorem 1. Noncausal codes are often undesirable, because they experience significant delays, which makes them unsuitable for delay-sensitive applications.

C. Causal Codes
Causal codes are a subclass of noncausal codes, introduced by Neuhoff and Gilbert [14] to circumvent long coding and decoding delays. The encoder accepts x^∞ = {x_0, x_1, x_2, ...} and produces a sequence of bits b_0, b_1, b_2, .... The decoder accepts b_0, b_1, b_2, ... and generates the reproductions y^∞ = {y_0, y_1, y_2, ...} of x^∞. The cascade of the encoder and the decoder, called the reproduction coder, is a system that characterizes the operation of the source code from x^∞ to y^∞. The reproduction coder is specified by a sequence of maps {f_0, f_1, f_2, ...} such that the reproduction of x_n by y_n is determined by y_n = f_n(x^∞) for n = 0, 1, 2, ....
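A toy reproduction coder in the Neuhoff-Gilbert sense can make the formalism concrete: the map below is memoryless (hence trivially causal) and is only an illustration of y_n = f_n(·), not an optimal code; the step size delta is an arbitrary assumed choice.

```python
def causal_reproduction_coder(x_stream, delta=0.25):
    """A minimal causal reproduction coder: each y_n here depends only on
    the current sample x_n (a memoryless mid-tread uniform quantizer),
    so the causality requirement is satisfied trivially."""
    y = []
    for x in x_stream:
        y.append(delta * round(x / delta))  # nearest quantizer level
    return y

xs = [0.1, -0.33, 0.8, 1.49, -2.0]
ys = causal_reproduction_coder(xs)
```

Each reproduction is within delta/2 of its source sample, the familiar distortion guarantee of a uniform quantizer.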
2) Operational Causal Rates: Let l_k(X^∞) denote the total number of bits received by the decoder at the time it produces the output y_k (under the assumption that y^{k-1} has already been produced), given that b_0, b_1, b_2, ... is the binary sequence produced by the encoder in response to the source sequence x^∞. The operational rate of causal reproduction coders is defined in [14], provided the limit exists. Neuhoff and Gilbert showed that the OPTA by causal reproduction coders is as stated, where D > D_min and D_min is a given threshold. To date, the computation of the expressions r_c^{op,+}(D), r_c^+(D) remains a challenging problem, even for independent and identically distributed sources. In view of the difficulty of computing r_c^+(D), the following bounds, derived in [10] using the converse source coding theorem, are often applied to evaluate the performance of causal codes.

D. Zero-Delay Codes
The above bounds can be extended to include zero-delay codes as follows. Let r_zd^+(D) denote the OPTA by the class of zero-delay codes [15]-[18], which impose the additional constraint that for each n = 0, 1, ..., the encoder outputs are specified by the measurable maps (m^{n-1}, x_n) → m_n = e_n^{zd}(m^{n-1}, x_n). At each time n, the message m_n is transmitted through a noiseless channel to the decoder, which is specified by the measurable maps f_n^{zd}(·, m^{n-1}), such that m_n → y_n = f_n^{zd}(m_n, m^{n-1}), assuming y^{n-1} has already been generated. The encoder-decoder pair is defined w.r.t. instantaneous codes. It is further shown in [10] that the following bounds hold. In view of the difficulty of evaluating r_c^+(D) and r_zd^+(D), the emphasis is often focused on the computation of their lower bound R^{GP,+}(D).
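A classical example of a zero-delay code with one-bit messages is delta modulation; the sketch below follows the maps m_n = e_n^{zd}(m^{n-1}, x_n) and y_n = f_n^{zd}(m^n), emitting each reproduction with no look-ahead. It is purely illustrative; the step size and initial condition are arbitrary assumptions.

```python
def delta_mod_encode(x_stream, step=0.1, y0=0.0):
    """Zero-delay encoder: one bit per source sample, computed from the
    current sample and the decoder's (tracked) previous reproduction."""
    bits, y_prev = [], y0
    for x in x_stream:
        b = 1 if x >= y_prev else 0
        y_prev = y_prev + (step if b else -step)
        bits.append(b)
    return bits

def delta_mod_decode(bits, step=0.1, y0=0.0):
    """Zero-delay decoder: each y_n is produced from m^n immediately."""
    y, y_prev = [], y0
    for b in bits:
        y_prev = y_prev + (step if b else -step)
        y.append(y_prev)
    return y

xs = [0.05, 0.12, 0.18, 0.15, 0.10]  # slowly varying input
ys = delta_mod_decode(delta_mod_encode(xs))
```

For a slowly varying input the tracking error stays within a couple of step sizes, which is the usual granular-noise behavior of delta modulation.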
We now have the following proposition, which relates the operational rate of zero-delay codes to the state dependent Gorbunov and Pinsker nonanticipatory entropy R_n^{GPS}(D) and its rate, and gives a tighter bound than any previously known bound.
Proposition 3 (Operational Zero-Delay Rate and State Dependent Gorbunov and Pinsker Nonanticipatory Entropy): Consider the zero-delay code, m_n = e_n^{zd}(m^{n-1}, x_n) and y_n = h_n^{zd}(m^n, y^{n-1}), for n = 0, 1, ..., with block length distortion d_{0,n}(x^n, y^n), and let l_n(x^n) = Σ_{t=0}^n l_t(m_t) denote the accumulated number of bits received by the decoder at the time of reproducing y^n.
For the zero-delay code that satisfies the distortion constraint, the stated inequalities hold, provided the infima and limits exist.
Proof: First, it is easy to verify that the code e_t^{zd}(·), h_t^{zd}(·), t = 0, ..., n, satisfies the conditional independence conditions of the set M^{GPS}_{[0,n]}(D). (V.17)-(V.23) follow from standard properties of entropy and mutual information. Inequality (V.24) is due to I(Y^{t-1}; S_t|S^{t-1}) ≥ 0. Inequality (V.25) is due to the fact that the randomized strategies in the set M^{GPS}_{[0,n]}(D) include deterministic code strategies (these also belong to the set M^{GPS}_{[0,n]}(D)). Equality (V.26) is shown in Appendix A, i.e., in the proof of Theorem 1.(a). Taking the infimum of both sides of (V.25) or (V.26) over the distortion constraint and the code maps, we obtain the first inequality in (V.28); the second is due to Theorem 1.(c). Taking the per-unit-time limit on both sides of (V.25) or (V.26), followed by the infimum over the code strategies, gives the first inequality in (V.30); the second is due to Theorem 1.(c).
Remark 9: Clearly, the bound in (V.30) using R_n^{GPS}(D) is tighter than the bound using R_n^{GP}(D). However, Theorem 1 gives conditions for the two lower bounds to coincide.

E. Rate Loss of Causal and Zero Delay Codes
In recent years [10, Sections II, III], the information quantities R^{na,+}(D) and R^{na,+}(D_0, ..., D_∞) have been used, as explained above, to quantify the rate loss of causal [14] and zero-delay codes w.r.t. the OPTA by noncausal codes R(D). Therefore, it is necessary to know the expressions for the nonanticipative message generation rates R^{na,+}(D) and R^{na,+}(D_0, ..., D_∞) of sources with memory, which to date were unknown.
1) For high resolution (small distortion D), an optimal causal code for stationary sources with finite (differential) entropy and square-error distortion consists of uniform quantizers followed by a sequence of entropy coders (see [38]). The rate loss due to causality (compared to noncausal codes) is given by the so-called "space-filling loss" of quantizers, which is at most (1/2) log_2(2πe/12) ≈ 0.254 bits/sample. We conclude by stating that the expressions of nonanticipatory epsilon entropy of multivariate Gaussian sources with square-error fidelity obtained in this paper are directly applicable to calculating the rate loss of causal and zero-delay codes w.r.t. noncausal codes.
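The space-filling loss quoted here can be checked directly: it is half the base-2 log of the ratio between the normalized second moment of a uniform quantizer cell, 1/12, and that of a Gaussian, 1/(2πe).

```python
import math

# High-resolution space-filling loss of scalar uniform quantization with
# entropy coding: (1/2) * log2(2*pi*e / 12) bits per sample.
space_filling_loss = 0.5 * math.log2(2.0 * math.pi * math.e / 12.0)
```

The value is approximately 0.2546 bits/sample, matching the "approximately 0.254" figure cited from [10] and [38].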

VI. NUMERICAL EXAMPLE
We provide a numerical example in which we compute an upper bound on the nonanticipative RDF of a multivariate Gaussian process, based on sequential water-filling over the spatial dimensions of a multivariate Gaussian Markov source, as given in Proposition 2. Specifically, we choose p = 3, i.e., a 3-dimensional source X_t, and a time horizon n = 3. The matrices A_t, B_t are chosen randomly. Additionally, K_{W_t} and K_{X_0} are randomly chosen covariance matrices. The solution is shown in Fig. 8.
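Classical reverse water-filling over the eigenvalues of a Gaussian covariance gives the flavor of the spatial water-filling behind this example; the sketch below, with a bisection search for the water level θ, is generic and its details differ from the paper's sequential construction in Proposition 2.

```python
import math

def reverse_waterfill(eigvals, D):
    """Reverse water-filling: per-component distortion d_i = min(theta,
    lambda_i) with sum(d_i) = D; rate = (1/2) sum log(lambda_i/d_i) nats."""
    if D >= sum(eigvals):
        return 0.0, list(eigvals)          # zero rate: reproduce the mean
    lo, hi = 0.0, max(eigvals)
    theta = 0.5 * (lo + hi)
    for _ in range(100):                   # bisection for the water level
        theta = 0.5 * (lo + hi)
        if sum(min(theta, l) for l in eigvals) > D:
            hi = theta
        else:
            lo = theta
    d = [min(theta, l) for l in eigvals]
    rate = 0.5 * sum(math.log(l / di) for l, di in zip(eigvals, d))
    return rate, d

rate, d = reverse_waterfill([3.0, 1.0, 0.5], 1.0)
```

For these (hypothetical) eigenvalues the water level is θ = 1/3, so each component receives distortion 1/3 and the rate is (1/2) ln(9 · 3 · 1.5) nats.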
In Fig. 9, we plot our proposed upper bound on the nonanticipative RDF together with the solution given by [12] via semidefinite programming (SDP) (although, strictly speaking, no realization is provided in [12] which achieves it).

VII. CONCLUSION
The complete characterization of the Gorbunov and Pinsker nonanticipatory epsilon entropy for nonstationary, multivariate Gaussian Markov sources with block letter and letter-by-letter square-error distortion criteria is derived, based on the structural properties of the matrices of two stochastic realizations of the reproduction process that induce the optimal test channel reproduction distribution. One of the realizations is shown to be analogous to the Wyner-Ziv and Wyner [4], [5] RDF, with the nonanticipatory epsilon entropy defined with respect to an auxiliary random variable, with past reproductions available only at the decoder as causal side information. Additionally, an upper bound is provided by a simple sequential-in-time water-filling solution across the spatial dimensions. It is shown via simulations that the upper bound is tight.
For a matrix A ∈ R^{p×m}, we denote its transpose by A^T, and for m = p, we denote its trace by tr(A), and the matrix with diagonal entries A_{ii}, i = 1, ..., p, and zeros elsewhere by diag{A}. S_+^{p×p} denotes the set of symmetric positive semidefinite matrices A ∈ R^{p×p}, and S_{++}^{p×p} its subset of positive definite matrices. The statement A ⪰ A' (resp. A ≻ A') means that A − A' is symmetric positive semidefinite (resp. positive definite). For x ∈ R, we define {x}^+ = max{1, x}.
while equality holds if certain conditions hold. For the example of a Gaussian Markov source with square-error fidelity, we show the equality R_n^{GP}(D) = R_n^{GPS}(D) by constructing a realization of the RVs (S^n, Y^n) in the set M^{GPS}_{[0,n]}(D) defined by (I.14). Due to space limitations, the consequences of the state dependent Gorbunov and Pinsker nonanticipatory epsilon entropy R_n^{GPS}(D) are discussed with respect to application areas 1) and 3), but not 2).
and a.a. means almost all. (b) The reproduction sequences are generated from the set of distributions given below; if the conditioning σ-algebra is {∅, Ω}, i.e., the trivial σ-algebra, then (II.10) reduces to (I.11). It follows from the next lemma that R_n^{GP}(D) = R_n^{na}(D) and R_n^{GP}(D_0, ..., D_n) = R_n^{na}(D_0, ..., D_n). Lemma 1 (Conditional Independence Conditions): Consider the distributions of Definition 2. Then, for any n = 0, 1, ..., the following statements are equivalent. MC1: the stated Markov chain holds a.s. Proof: (a) By the representation of Theorem 5 and the definition of Z^n, the equivalence is obtained. (b) Recursions (III.51)-(III.55) follow from part (a), by a property of mutual information.

Fig. 4. Block diagram of the feedback realization of Y n of nonanticipative RDF R na n (D) for multivariate Gaussian sources.

Fig. 5. Block diagram of the feedback realization of Y n of R na n (D) for multivariate Gaussian sources based on parallel additive Gaussian noise channels.
(a) R_n^{na}(D_0, D_1, ..., D_n) is given by the solution of the optimization problem stated below. (b) The realization of Y^n which induces the optimal test channel of the RDF R_n^{na}(D_0, D_1, ..., D_n) is given in Theorem 5.(a), (b).

Fig. 6. Block diagram of the state dependent nonfeedback realization with decoder side information of the state dependent nonanticipative RDF R_n^{GPS}(D) for multivariate Gaussian sources.

2) For zero-mean stationary Gaussian sources with square-error distortion (and bounded entropy rate), the OPTA by causal codes r_c^+(D) exceeds R^{na,+}(D) by less than approximately 0.254 bits/sample [10], i.e., r_c^+(D) ≤ R^{na,+}(D) + 0.254 (when the limiting expressions are defined), but no expression is given for R^{na,+}(D), except for the scalar-valued Gaussian source with letter-by-letter square-error distortion, which is already computed in [3, Example 2]. 3) For zero-mean stationary Gaussian sources with square-error distortion (and bounded entropy rate), the OPTA by zero-delay codes exceeds R^{na,+}(D) by less than approximately

Fig. 8. Water-filling subject to letter-by-letter distortion in time-domain for n = 3 time units.

Fig. 9. Nonanticipative RDF for different values of letter-by-letter distortion, for the proposed upper bound and its numerical computation via SDP [12].
Theorem 3 (R_n^{na}(D) of Gaussian Markov Processes With Block Letter Square Error Distortion): Consider the Gaussian Markov process X^n ≜ {X_0, ..., X_n} of Definition 1, and a block letter square-error distortion criterion. R_n^{na}(D) is equivalently given by (III.19) and (III.20), shown at the bottom of the page. Structures (III.10), (III.11) are obtained in [13]. (III.12)-(III.15) follow from (III.7), (III.8) and the calculation of cov(X_t, X_t | Y^{t-1}) using the parametric realization. Part (b) follows from part (a) by simple calculations.
p_{t,12}, t = 0, ..., n, (III.65d) where p_{t,12} is the number of positive singular values, which depends also on D ∈ [0, ∞), in addition to the indicated t.
p_{t,12}, t = 0, ..., n. (III.70) H_t is also a symmetric positive semidefinite matrix, and moreover {Σ_t, Σ_t^-, H_t, K_{V_t}} have a spectral decomposition w.r.t. the same unitary matrix U_t, U_t U_t^T = I, U_t^T U_t = I, for t = 0, ..., n. Proof: See Appendix E. With the aid of Lemma 3, we can obtain an upper bound on the characterization of the nonanticipative RDF R_n^{na}(D), where the matrices {H_t, K_{V_t}, Σ_t, Σ_t^-} are diagonalized w.r.t. the same unitary matrix. Theorem 8 (Upper Bound on the Characterization of R_n^{na}(D)): Consider the Gaussian Markov process X^n ≜ {X_0, ..., X_n} of Definition 1, and a block symbol square-error distortion d_{0,n}(x^n, y^n) = Σ_{t=0}^n ||x_t − y_t||², and assume Σ_t^- ∈ S_{++}^{p×p}, t = 0, ..., n.
Now, we turn our attention to the optimization problem of R_n^{na}(D_0, ..., D_n) of Corollary 2, to give a simple upper bound based on the solution of a sequential water-filling over the spatial dimensions.
Proposition 2: Consider the complete characterization of R na n (D 0 , D 1 , . . ., D n ) of Corollary 2. An upper bound on R na n (D 0 , . . ., D n ) is determined as follows.
In Theorem 9, we construct the auxiliary RVs S^n and Y_t = f_t(S^t, Y^{t-1}), t = 0, ..., n, such that the equality R_n^{GP}(D) = R_n^{GPS}(D) holds, with diagonal entries in decreasing order, and for t = 1, ..., n, where the constants θ_t > 0, t = 0, ..., n, are determined by the distortion levels.

1. A source generates a sequence of symbols X^n = x^n at a rate of one symbol per second, from a pre-specified joint probability distribution P_{X^n}, n = 0, 1, ....

2. The encoder map e_n(·) at time n maps source sequences X^n = x^n ∈ X^n into the compressed messages, according to e_n(x^n) ∈ M_n ≜ {m_0, m_1, ..., m_{|M_n|}} (V.4), where m_k ∈ M_n is the binary sequence m_k = {b_0, b_1, ...}, b_k ∈ {0, 1}, in response to the source sequence X^n = x^n.

3. The channel connecting the encoder and the decoder is a noiseless channel with capacity C bits per second.

4. The decoder map at time n is a sequence of maps f_i(·), i = 0, ..., n, which observe a compressed representation or message M_k = m_k ∈ M_n of X^n = x^n, and reproduce x^n by Y_i = y_i, i = 0, ..., n.

Moreover, for any finite n, the OPTA by the zero-delay code satisfies the chain of inequalities (V.24)-(V.26), involving the terms Σ_{t=0}^n I(X^t; S_t|S^{t-1}) − Σ_{t=0}^n I(Y^{t-1}; S_t|S^{t-1}) and I(X^t; S_t|S^{t-1}, Y^{t-1}), for all e^{zd}(·), h^{zd}(·), where the infimum in the last inequality is taken w.r.t. an arbitrary S_t : Ω → S_t, t = 0, ..., n, which is an element of M^{GPS}_{[0,n]}(D), with Y_n = h^{zd}_n(M^n, Y^{n-1}), M_n = e^{zd}_n(M^{n-1}, X^n), n = 0, 1, ....