Observability Gramian for Bayesian Inference in Nonlinear Systems With Its Industrial Application

In this letter, we present a novel (empirical) observability Gramian for nonlinear stochastic systems in the light of Bayesian inference. First, we define our observability Gramian, which we refer to as the estimability Gramian, based on the relation to the so-called Bayesian Fisher Information Matrix for initial state estimation. Then, we study the fundamental properties of an empirical version of the estimability Gramian. The practical usefulness of the proposed framework is examined through its application to a parameter and initial state estimation in a natural gas engine cylinder.


I. INTRODUCTION
T HE GROWING interest in digital transformation, especially digital twin, has increased the importance of state estimation problems in complex dynamics. In general statistical inference, utilizing a priori information based on domain knowledge can reduce the amount of data required. Such a framework is called Bayesian inference, where a priori information is formulated as a prior distribution. On the other hand, in the modern control theory for linear deterministic systems, the difficulty of initial state estimation is quantified with a matrix called observability Gramian. It is known that data-driven indirect calculation is possible using the initial state and output trajectory, which is referred to as the empirical observability Gramian [1], [2], [3], [4], [5], [6], [7], [8]; See Section II-B.
Although the empirical observability Gramian is just conceived from the formal similarity to linear systems' observability Gramian, it can be viewed as the Fisher Information Matrix as shown in Proposition 1. The main contribution of this letter is to provide an observability Gramian-based framework to utilize the a priori information from a statistical inference point of view.
To be more precise, we newly define a matrix (estimability Gramian in Definition 1 below) that quantifies the difficulty of initial state estimation in terms of the Bayesian Fisher Information Matrix and clarify its relationship to the conventional observability Gramian. We also examine the practical usefulness of the proposed framework through its application to a parameter and initial state estimation in a natural gas engine cylinder.
Organization: The remainder of this letter is organized as follows. In Section II, we introduce the Bayesian Fisher Information Matrix and empirical observability Gramian as preliminary. In Section III, we define the estimability Gramian and confirm its consistency with the observability Gramian for the linear system case. In Section IV, we describe the data-driven method of calculating the estimability Gramian. In Section V, a numerical example illustrates the utility of the proposed method. Some concluding remarks are given in Section VI.
Notation: Let [a] denote {1, 2, . . . , a} for a natural number a. Let R n denote n-dimensional Euclidean space, and e i denote its ith regular basis. Let E[ · ] denote the expected value of a random variable. Let I denote the Identity matrix. Let A B denote when A and B are both symmetric and A − B is positive semidefinite. Let x 2 A denote the quadratic form x Ax. We represent a random variable in ordinary font and its realization in bold font. For example, the realization of the random variable x is x. When random variable x has a probability density function (p.d.f.), denote it as ϕ x (x), and let the joint p.d.f. of the random variable x, y be ϕ x,y (x, y), and the conditional p.d.f. ϕ x (x|y = y). When two random variables x and y are independent, we denote x ⊥ y. Let σ min (A) denote the smallest eigenvalue of matrix A.

II. PRELIMINARY
In this section, we describe the Bayesian Fisher Information Matrix and the empirical observability Gramian as preliminary knowledge of our proposed method.

A. Bayesian Fisher Information Matrix
Let (x, y) be a pair of multivariate random variables and consider the point estimation of x using the observation of y.
Note that this matrix can be related to the Hessian: Then, the Cramer-Rao Bound [9] characterizes a fundamental estimation performance limitation: for any unbiased estimator x (i.e., E[x(y)|x = x] = x), the error covariance matrix has the following lower bound: Note that I y (x = x) depends on x and not on the prior distribution of x.
On the other hand, Bayesian FIM (BFIM) is defined by For this matrix, we have Bayesian Cramer-Rao Bound [10]: holds for estimatorx(y) which satisfies weak unbiasedness condition [11]. In contrast to FIM I y (x = x), BFIM J x,y depends on the prior distribution of x and not on any realization x. Note that the left-hand side of (7) is the average (over realization of x) of error covariance matrices. Therefore, the Bayesian Cramer-Rao Bound implies that the smaller the minimum eigenvalue of BFIM J x,y is, the more difficult it is to estimate x.

B. Empirical Observability Gramian
Consider a linear discrete-time deterministic system When the matrix A is Schur stable, the observability Gramian is a well-known measure for the difficulty of the initial state estimation from an output sequence [4], [12]. By the fact that A is Schur stable, G o can be approximated by G ō k which is a finite sum up to a sufficiently largek in (9). However, this calculation can be done only after obtaining A, C by system identification. On the other hand, the empirical observability GramianĜ ō k , which is proposed in [1] and re-examined in [3], can be used to obtain G ō k using output trajectories without system identification: This empirical observability GramianĜ ō k coincides with G ō k for linear systems [3]. Even if the system is nonlinear, the empirical observability Gramian can be defined in the same way by setting y (ilm) k to be the observation at time step k with the initial state x (ilm) 0 . The empirical observability Gramian is used in model reduction [2], [3] for nonlinear systems, and sensor placement problem [13], [14], [15].

III. OBSERVABILITY GRAMIAN FOR NONLINEAR SYSTEMS
Let us consider the nonlinear discrete- For mathematical simplicity, we consider the finite terminal timek. Let x k ∈ R n (y k ∈ R q ) be the state (observation) at time step k, and let f : R n → R n and h : R n → R q denote a map representing state transitions and observation processes. Let y 0:k−1 denote the output trajectory from time step 0 tok − 1. The initial state x 0 follows a prior distribution Q and the observation noise w k at time step k follows a normal distribution N (0, w ) with w 0. The mean vectorx 0 and covariance matrix x of Q are assumed to be known and finite.
Recall that the larger an eigenvalue of the observability Gramian, the more significant the contribution to the observed energy of the initial state in the corresponding eigenvector direction. This fact implies that the state of that direction is easy to estimate [4]. Actually, the observability Gramian and the FIM has a close relationship as follows: Proposition 1: For the linear system given by the FIM I y 0:k−1 (x 0 = x 0 ) is given by independently of the initial state x 0 . In particular, I y 0:k−1 (x 0 = x 0 ) = G ō k for w = I. Proof: This result follows from a straightforward calculation.
Remark 1: Even if w = I, by normalizing the observation noise (i.e. by making the covariance matrix of w k equal to I) using the presence of a matrix M satisfying M w M = I because of w 0, it is possible to match I y 0:k−1 (x 0 = x 0 ) to the observability Gramian G ō k of the system where the observation is replaced to z k := My k from (12).
This fact motivates us to define the observability Gramian for nonlinear systems as the FIM. Note that I y 0:k−1 (x 0 = x 0 ) depends on x 0 for nonlinear system in general. Instead of considering this initial-state dependent observability Gramian, we attempt to introduce a novel observability Gramian for nonlinear systems with prior initial distribution: Definition 1: For nonlinear system in (11), the Estimability Gramian G o is defined by The following theorem is useful to understand the relation between the observability and estimability Gramians: the estimability Gramian is given by In particular, for w = I we have as −1 x → 0. Proof: Let Ok ∈ R qk×n be a matrix in which CA k (k ∈ [0,k − 1]) are arranged vertically, and w,k a block diagonal matrix in which w is arranged ink diagonal blocks. Then, the random vector [x 0 y 0:k−1 ] is given by ⎡ Since x 0 and w 0:k−1 follow an independent Gaussian distribution, holds. Then, the logarithm of joint p.d.f. of [x 0 y 0:k−1 ] can be written as This shows where H denotes n × n size left upper block matrix of −1 k . Since H does not depend on the value of x 0 , y 0:k−1 , we obtain J x 0 ,y 0:k−1 = H from (6). Next, which is equal to the right-hand-side of (16). Here, we used the generalized Schur complement [16] in (20) and the Woodbury matrix identity [17] in (21). Finally, (17) readily follows from (16).
Eq. (16) shows that G ō k is expressed as the sum of the FIM and the precision matrix of the prior distribution of the linear Gaussian system. Eq. (17) indicates that the estimability Gramian is equal to the observability Gramian when no a priori information is given.

IV. DATA-DRIVEN CALCULATION OF ESTIMABILITY GRAMIAN
In this section, we propose a data-driven calculation method of the estimability Gramian, which is similar to the conventional empirical observability Gramian.

A. Empirical Estimability Gramian
denote the eigenvalues and normalized eigenvectors of x , respectively. Let := diag(σ 1 , . . . , σ n ) be a diagonal matrix with σ i on its diagonal components, and U := [u 1 · · · u n ] be the orthogonal matrix with eigenvectors arranged horizontally. Then the singular value decomposition of x can be written as x = U 2 U . We are now ready to define an empirical version of G ō k : Definition 2: For nonlinear stochastic system in (11), the Empirical Estimability GramianG ō k is defined bỹ The following theorem ensures thatG ō k is an empirical version of G ō k : Theorem 2: For linear Gaussian system (15), holds. Proof: Introducing z := −1 U x 0 ∼ N (0, I),ỹ k,i in (23) can be rewritten as where we used e i = σ i e i and the equality Uni(B) denotes the uniform distribution on the unit ball in R n ; See [18,Th. 4.3]. We, therefore, haveỸ k = CA k U, which completes the proof via comparison between (16) and (23).
Similarly to the empirical observability Gramian, the empirical estimability Gramian is justifiable through its consistency for the linear case and is computable for nonlinear systems.

B. Sample Approximation
For the data-driven calculation, we employ sample approximation of the expectation in (23).
Remark 2: In (10), the initial states for calculating empirical observability Gramian are designated by c m and U l , whose valid choice were not discussed in any of existing result, to the best of our knowledge. This shows a clear contrast to (28) that the initial state samples should be generated according to the given prior distribution Q.
Thanks to the law of large numbersĜ ō k converges toG ō k in the large sample size N, M limit. Also, according to Theorem 2,Ĝ ō k in the linear Gaussian system is valid as a sample approximation of the estimability Gramian G ō k .

V. INDUSTRIAL APPLICATION
In this section, we verify the usefulness of our proposed method through its application to an experimental design of a natural gas engine cylinder depicted in Fig. 1.

A. System Dynamics and Prior Distribution
The dynamics of the mixed gas inside the cylinder during the compression stroke are described as where the physical meaning and value of each variable are summarized in Table I. Eq. (29) is derived from the first law of thermodynamics and the ideal gas law. The constant pressure specific heat C p is expressed by a function of temperature T that can be obtained by In this letter, we adopted the function in [19]: It is crucial to control the weight of air and fuel (i.e. m air and m fuel ) filled in the cylinder to improve combustion efficiency. However, we cannot measure it directly. This motivates us to estimate m air and m fuel by using the noisy measurement of the pressure P(t) and the a priori information below.
In our experiment setup, • the initial values of the crank angle θ and the cylinder volume V are available as θ(0) = − π 3 , V(0) = 0.000207, and  • we have an a priori information of P(0) and m fuel represented by the prior distribution P(0) ∼ N (10, 0.01) and m fuel ∼ Uni(0, 0.2), respectively. In order to indirectly know the weight of the air that cannot be physically measured, we can prepare the following three experimental conditions shown in Table II This may indicate that a good estimation would be obtained based on the experimental result for Case 3 because the variance of ϕ m air under Case 3 is smaller than the others. However, this is incorrect, as we will see below.

B. Proposed Method
The constant parameters m air , m fuel are treated as the initial condition forṁ air = 0,ṁ fuel = 0. The difficulty of initial state estimation is then evaluated by calculating the empirical estimability Gramians for each of the three prior  Table III shows the empirical estimability Gramians and their minimum eigenvalue. Recall that the smaller the minimum eigenvalue of BFIM, the more difficult it is to estimate the initial state; See (7). These observations counterintuitively suggest that Case 2, which has the largest minimum eigenvalue, will be the easiest to estimate P(0), m air and m fuel . This trend remained the same for all sample sizes and time horizons tested.

C. Markov Chain Monte Carlo Methods
The empirical Gramian based methods [1], [2], [3], [4], [5], [6], [7], [8] have theoretical underpinnings only for linear systems, but are widely used for nonlinear systems because of their practical usefulness. Since our theoretical results also cover linear systems only, we virtually observe what happens in a physical experiment and confirm that Case 2 is preferable for the estimation. To this end, we numerically generate one typical trajectory instead of a physical observation and then calculate the posterior distribution of P(0), m air and m fuel that is the (distributional) estimate under the obtained observation.
As a typical real parameter, we choose P * (0) = 10, m * fuel = 0.1 and let m * air be the median of the distribution in Fig. 2, which satisfies m * air ≈ 4.0 for all three Cases. The solution P * (t) to (29) under these parameters (with additive observation noise following N (0, 0.0025)) is used a typical pressure trajectory; See Fig. 3. Next, using Markov Chain Monte Carlo (MCMC) method [20] implemented by Stan [21], 10,000 posterior samples were calculated. Computation time for MCMC was about 2549s, 973s, 3899s for Case 1, Case 2, and Case 3, respectively.
The simulation resulted in posterior distributions of P(0) and m air with nearly the same variance for all three Cases. In contrast, there was a clear difference for m fuel as shown in    4. It can be observed that the posterior distribution in Case 1 and 3 are not so different from the prior distribution (i.e., Uni(0, 0.2)), whereas Case 2 has a sharp-shaped distribution. This means that we can obtain a better estimate of m fuel in Case 2 than in the other Cases, which is consistent with the result predicted by using the proposed empirical estimability Gramian.
Remark 3: Calculating the posterior distribution for each of all the trajectories used for the empirical estimability Gramian is overly time-consuming for our purpose.

VI. CONCLUSION
In this letter, we derived the concept of the estimability Gramian for initial state estimation with nonlinear stochastic systems, which is based on the relation to BFIM. For linear Gaussian systems, we proved that it agrees with the BFIM and the observability Gramian under a certain condition. The practical usefulness of the proposed framework is examined through its application to a parameter and initial state estimation in a natural gas engine cylinder.
We are currently applying the proposed method to a more realistic model of a hydrogen gas engine. From a theoretical point of view, the so-called observability function is revisited for a better understanding of the estimability Gramian; See [22, eq. (2)] for the controllability Gramian. Also, the relationship with existing works [23], [24], [25] needs to be clarified.