Bayesian Estimation of Multivariate Pure Moving Average Processes

Multivariate estimation problems arise when observations are available on several related variables of interest. Multivariate time series occur in many fields of application, such as economics, meteorology and utilities. The current study has three main objectives. The first is to develop a convenient approximate Bayesian methodology for estimating the parameters of multivariate moving average processes. The second is to investigate the numerical efficiency of the proposed technique in solving the estimation problems by conducting a wide simulation study. The third is to study the sensitivity of this numerical efficiency to the parameter values and the time series length. The results show that the proposed technique succeeds in estimating the parameters of multivariate moving average processes, and that the results are not sensitive to changes in the parameter values or in the time series length.


I. INTRODUCTION
Multivariate (vector) time series are found in many fields of application such as economics, business, meteorology, hydrology and utilities. In economics, for example, one may record the quarterly money supply y(t,1), real interest rate y(t,2) and gross national product y(t,3). These variables are modeled, estimated and forecast jointly using a multivariate model in order to gain insight into the dynamic interrelationships among them and to increase the precision of the estimates and forecasts (see [1]).
The Bayesian and non-Bayesian approaches to univariate and multivariate time series are usually based on a class of parametric models such as moving average (MA) models. Model estimation and forecasting are two main phases of time series analysis. Although they are closely connected, they are usually treated as two separate steps. The non-Bayesian literature on univariate and multivariate time series is vast and can be found in [1]-[6].
One may trace three different Bayesian approaches to analyzing univariate time series processes. The first approach is to use numerical integration. [7] presented this approach to implement the identification, estimation and forecasting phases of autoregressive moving average (ARMA) models with low orders. However, the use of numerical integration is difficult and time consuming, especially in multiparameter cases. The second approach is the use of analytical approximations in order to obtain standard posterior distributions. Several approximations have been developed to solve the estimation and forecasting problems, such as [8]-[10]. The last approximation has been extended to the case of seasonal models by [11], [12]. Finally, the third approach is to use sampling-based methods, which include the Gibbs sampler technique [13]-[17], the data augmentation algorithm [18], [19] and the importance sampling algorithm [20].
Regarding the multivariate version, the estimation and forecasting of vector moving average (VMA) processes from the non-Bayesian viewpoint have been extensively studied and investigated. [21] gave methods to estimate the parameters of pure autoregressive and pure moving average processes. [22] presented a practical iterative procedure to estimate the parameters of mixed VARMA processes. [23] extended the Box and Jenkins method [24] to derive the exact likelihood function of pure vector moving average processes. [25] proposed three different methods to compute the exact likelihood function of vector moving average processes. However, it seems that the non-Bayesian literature on the estimation problems of multivariate processes has traditionally focused on maximum likelihood methods because of their desirable properties. For more about these methods, the reader is referred to [1], [26]-[29]. There have also been extensive investigations aimed at easing the maximum likelihood routine and making it faster. For instance, [30] developed an efficient numerical expression for the likelihood function of stationary and partially non-stationary autoregressive moving average processes. Another development was made by [31], who proposed a new procedure for the exact maximum likelihood estimation of mixed VARMA models. [32] proposed a consistent and fast iterative procedure to estimate the parameters of VARMA processes. Many other investigations have been made to simplify the computation of VARMA models, such as [33], [34].
On the other hand, the Bayesian analysis of multivariate time series is still being developed, and the Bayesian literature devoted to the analysis of VARMA models is sparse. [35] introduced approximate techniques to estimate the parameters of VARMA processes and predict their future values. [36] used sampling-based inference to analyze VARMA models. [37] developed a Gibbs sampler for the basic VARMA models. However, for well understood reasons, one may say that an analytic Bayesian procedure to estimate the parameters of vector moving average processes and predict their future values has not been explored yet.
The main objective of this study is to develop a convenient approximate Bayesian methodology to estimate the parameters of multivariate moving average processes, denoted by VMA_k(q). Using n vectors of observations, the joint posterior probability density of the model parameters is developed in a convenient approximate form using a matrix normal-Wishart prior density (or Jeffreys' vague prior). One may then use this approximate joint posterior density to develop point estimates or to construct HPD (highest posterior density) regions for the parameters. A wide simulation study is conducted, using the specialized SCA package, in order to examine the numerical efficiency of the proposed Bayesian estimation procedure.
The study is organized as follows: Section 2 defines the multivariate (vector) pure moving average (VMA_k(q)) processes in scalar and matrix notation. Section 3 constructs a convenient approximation to the likelihood function of the VMA_k(q) processes and proposes appropriate informative and non-informative priors. Section 4 derives an approximate joint posterior probability density of the model parameters in a standard form, namely the matrix t distribution, and explains how to construct point and interval estimates for the parameters. Finally, Section 5 examines and assesses the numerical effectiveness of the proposed Bayesian estimation procedure in solving the estimation problems of some bivariate pure moving average processes.

II. MULTIVARIATE MOVING AVERAGE PROCESSES
Let {t} be a sequence of integers, q ∈ {1, 2, ...}, k ∈ {2, 3, ...}, let θ_i (i = 1, 2, ..., q) be k×k unknown matrices of real constants, let {y(t)} be a sequence of k×1 real observable random vectors, and let {ε(t)} be a sequence of k×1 independent and normally distributed unobservable random vectors with zero mean and a k×k unknown precision matrix T. Then the multivariate (vector) moving average process of order q, denoted by VMA_k(q), is defined as

y(t) = θ_q(B) ε(t),   θ_q(B) = I_k + θ_1 B + θ_2 B² + ... + θ_q B^q,   (2.1)

where I_k is the identity matrix of order k and B is the usual backward shift operator. The k×k matrix polynomial θ_q(B), of degree q in the backshift operator B, is known as the moving average operator of order q. The process y(t) is invertible if all the roots of the determinantal equation |θ_q(B)| = 0 lie outside the unit circle.

Consider the special case VMA_2(1), which is called the bivariate moving average process of order one, with moving average coefficients

θ_1 = [θ_11 θ_12; θ_21 θ_22].

Then the model (2.1) can be written as

y(t) = ε(t) + θ_1 ε(t−1).   (2.2)

Thus, one may write the observations of the VMA_2(1) process as

y(t,1) = ε(t,1) + θ_11 ε(t−1,1) + θ_12 ε(t−1,2),
y(t,2) = ε(t,2) + θ_21 ε(t−1,1) + θ_22 ε(t−1,2).

The model (2.2) can be written compactly for the n observations by stacking the vectors y(t) row by row into an n×2 matrix of responses and the lagged errors ε(t−1) into a corresponding n×2 matrix of regressors. Here we consider y(t,1) and y(t,2) as dependent variables, while ε(t−1,1) and ε(t−1,2) are considered as input or independent variables.
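To make the bivariate special case concrete, the following sketch (ours, not the paper's SCA code) simulates a VMA_2(1) series under the convention y(t) = ε(t) + θ_1 ε(t−1) used above; the coefficient matrix is the one later used in Simulation I, and the identity noise covariance is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2022)

theta1 = np.array([[0.9, -0.2],
                   [1.1, -0.9]])   # example coefficients (Simulation I)
V = np.eye(2)                      # noise covariance, assumed identity for this sketch

n = 300
eps = rng.multivariate_normal(np.zeros(2), V, size=n + 1)
eps[0] = 0.0                       # starting error set to its mean, zero

# y(t) = eps(t) + theta_1 eps(t-1): each row of y is one observation vector
y = eps[1:] + eps[:-1] @ theta1.T
```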
In general, one can write the VMA_k(q) process as

y(t) = ε(t) + θ_1 ε(t−1) + θ_2 ε(t−2) + ... + θ_q ε(t−q),   t = 1, 2, ..., n.   (2.5)

The model (2.5) can be written in the more compact expression

Y = XΘ + E,   (2.6)

where Y is a matrix of order n×k whose ij-th element equals y(i,j), i = 1, 2, ..., n; j = 1, 2, ..., k; that is, the t-th row of Y is y'(t). The matrix X of order n×kq is defined so that its t-th row is [ε'(t−1), ε'(t−2), ..., ε'(t−q)], the coefficient matrix Θ of order kq×k is obtained by stacking θ'_1, θ'_2, ..., θ'_q, and E is the n×k matrix whose t-th row is ε'(t).
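As a sketch of the stacked form (2.6), the helper below (a hypothetical function, not from the paper) builds Y and X from a series and its error sequence, placing the q lagged error vectors in each row of X and using zero pre-sample values.

```python
import numpy as np

def design_matrices(y, eps, q=1):
    """Return Y (n x k) and X (n x kq) for the regression form Y = X Theta + E;
    row t of X holds eps(t-1)', ..., eps(t-q)', with pre-sample errors set to zero."""
    n, k = y.shape
    X = np.zeros((n, k * q))
    for i in range(1, q + 1):
        X[i:, (i - 1) * k:i * k] = eps[:n - i]
    return y, X
```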

III. AN APPROXIMATE LIKELIHOOD FUNCTION OF MULTIVARIATE MOVING AVERAGE PROCESSES
The class of models (2.6) represents the general class of vector moving average models of order q, denoted by VMA_k(q). It is very useful in modeling time series data arising in many areas of application such as economics, medicine, ecology and chemistry. The likelihood function of the parameters Θ and T is

L(Θ, T | S_n) ∝ |T|^(n/2) exp{ −(1/2) Σ_{t=1}^{n} ε'(t) T ε(t) },   (3.1)

where Θ ∈ R^(kq×k), T > 0, and

ε(t) = y(t) − θ_1 ε(t−1) − θ_2 ε(t−2) − ... − θ_q ε(t−q).   (3.2)

The expression (3.2) is a recurrence relation for the residuals, and the m-th component of the residual ε(t) can be written as

ε(t,m) = y(t,m) − Σ_{i=1}^{q} Σ_{j=1}^{k} θ_{i.mj} ε(t−i, j),   m = 1, 2, ..., k.   (3.3)

The recurrence relation (3.3) causes the main problem in developing the exact analysis of multivariate VMA_k(q) processes. However, this recurrence may be used to evaluate the residuals recursively if the θ_i and the initial values of the residuals are known. The proposed Bayesian approximation is based on replacing the exact residuals by their least squares estimates and assuming that the initial residuals equal their means, namely zero. Thus, we estimate the residuals recursively by

ε̂(t,m) = y(t,m) − Σ_{i=1}^{q} Σ_{j=1}^{k} θ̂_{i.mj} ε̂(t−i, j),   (3.4)

where the θ̂_{i.mj} are the nonlinear least squares estimates of the parameters θ_{i.mj}. Using the estimates of the residuals, it is possible to write the likelihood function approximately as

L(Θ, T | S_n) ∝ |T|^(n/2) exp{ −(1/2) tr[ T (Y − X̂Θ)'(Y − X̂Θ) ] },   (3.5)

where X̂ is the same as X in (2.6) but with the estimated residuals in place of the exact ones. A convenient choice of the joint prior density of the parameters Θ and T is the following matrix normal-Wishart distribution:

ξ(Θ, T) ∝ |T|^((a+kq−k−1)/2) exp{ −(1/2) tr[ T ( (Θ − D)'W(Θ − D) + Λ ) ] },   (3.6)

where the hyperparameter a is a scalar, D ∈ R^(kq×k), W is a kq×kq positive definite matrix and Λ is a k×k positive definite matrix. If there is little prior information about the parameters, it is possible to use Jeffreys' vague prior

ξ(Θ, T) ∝ |T|^(−(k+1)/2).   (3.7)
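A minimal sketch of the residual approximation (3.4): given nonlinear least squares estimates of the θ_i (obtained, for example, from SCA or a generic optimizer; the fitting step itself is not shown), the residuals are rebuilt recursively from zero starting values and can then be passed to the design-matrix helper above to form X̂ for the approximate likelihood (3.5).

```python
import numpy as np

def estimate_residuals(y, theta_hats):
    """Recursion (3.4): eps_hat(t) = y(t) - sum_i theta_hat_i eps_hat(t-i),
    with the q pre-sample residuals set to their mean, zero.
    y: (n, k) array of observations; theta_hats: list of q (k, k) estimates."""
    n, k = y.shape
    q = len(theta_hats)
    eps_hat = np.zeros((n + q, k))          # first q rows are the zero starting values
    for t in range(n):
        e = y[t].copy()
        for i, th in enumerate(theta_hats, start=1):
            e -= th @ eps_hat[t + q - i]
        eps_hat[t + q] = e
    return eps_hat[q:]
```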

IV. THE POSTERIOR ANALYSIS OF THE MULTIVARIATE MOVING AVERAGE PROCESSES
In developing the proposed Bayesian estimation technique for VMA_k(q) models, we assume that the order q is known. The posterior density of the model parameters Θ and T is the Bayesian tool used to estimate the unknown parameters. The posterior density ξ(Θ, T | S_n) is the conditional density of the parameters given the time series data S_n.

Theorem 1: If the approximate conditional likelihood function (3.5) is combined with the matrix normal-Wishart prior (3.6), the marginal posterior distribution of Θ is a matrix t distribution with parameters (A, A^{-1}B, C − B'A^{-1}B, υ), where υ = n + a − k + 1. In addition, the marginal posterior distribution of T is a Wishart distribution with parameters (n + a, C − B'A^{-1}B).
Corollary 2: If the approximate conditional likelihood function (3.5) is combined with Jeffreys' vague prior (3.7), the marginal posterior distribution of Θ is again a matrix t distribution with parameters (A, A^{-1}B, C − B'A^{-1}B, υ). However, the quantities A, B, C and υ are modified by letting W → 0 (the kq×kq zero matrix), a → −kq and Λ → 0 (the k×k zero matrix).
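A minimal sketch of the posterior quantities in the Jeffreys case of Corollary 2, assuming the standard matrix normal-Wishart algebra in which A, B and C reduce to X̂'X̂, X̂'Y and Y'Y; these explicit expressions, and the function itself, are assumptions of the sketch rather than quotations from the theorem.

```python
import numpy as np

def posterior_summary(Y, X_hat):
    """Posterior mean A^{-1}B, scale matrix C - B'A^{-1}B and degrees of
    freedom upsilon = n - kq - k + 1 under Jeffreys' prior (a -> -kq)."""
    n, kq = X_hat.shape
    k = Y.shape[1]
    A = X_hat.T @ X_hat
    B = X_hat.T @ Y
    C = Y.T @ Y
    theta_mean = np.linalg.solve(A, B)   # posterior expectation of Theta
    S = C - B.T @ theta_mean             # C - B'A^{-1}B
    nu = n - kq - k + 1
    return A, theta_mean, S, nu
```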
Theorem 1 solves the estimation problem of vector moving average processes. To see this, let γ_i be the i-th column of the coefficient matrix written as Θ' = [γ_1 γ_2 ··· γ_kq]; i.e., stacking these columns gives a vector of dimension k²q. Partitioning A^{-1}B in the same way, the posterior expectation and variance-covariance matrix of the parameters are

E(Θ | S_n) = A^{-1}B,   Var(vec(Θ') | S_n) = [A^{-1} ⊗ (C − B'A^{-1}B)] / (υ − 2),

where ⊗ denotes the usual Kronecker direct product. The variance-covariance matrix is of dimension k²q × k²q, and the elements of its principal diagonal represent the variances of the elements of the coefficient matrix as partitioned above. The matrix t approximation can be used analytically to make exact statistical inference about a parameter point Θ_0 for the complete set of coefficients using the F distribution if k = 2; see [38] (p. 451). For k ≥ 3, approximate statistical inferences can be made using the χ² distribution; see [38] (p. 452).
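The element-wise posterior variances just described can be read off a Kronecker-product covariance; the sketch below assumes the parameterization Var(vec(Θ')) = [A^{-1} ⊗ (C − B'A^{-1}B)]/(υ − 2), valid for υ > 2.

```python
import numpy as np

def coefficient_covariance(A, S, nu):
    """Approximate posterior covariance of the stacked coefficients and the
    element-wise posterior standard deviations (square roots of its diagonal)."""
    cov = np.kron(np.linalg.inv(A), S) / (nu - 2.0)
    return cov, np.sqrt(np.diag(cov))
```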
With regard to a point estimate of Θ, one may use the posterior expectation A^{-1}B. In addition, the matrix t approximation can be used to construct an exact HPD region for a specific row or a specific column using the usual multivariate t distribution. Furthermore, an exact HPD interval for any single element of the matrix can be constructed using the univariate Student t distribution. Finally, approximate statistical inferences about parameters belonging to a certain block submatrix of Θ can be made using the χ² distribution; see [38].
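For a single element, the sketch below constructs the 95% HPD interval from its marginal Student t posterior, assuming the standard scaling sqrt([A^{-1}]_ii S_mm / υ) with υ degrees of freedom; because the t density is symmetric, the central interval coincides with the HPD interval.

```python
import numpy as np
from scipy import stats

def hpd_interval(theta_mean, A, S, nu, i, m, prob=0.95):
    """HPD interval for element (i, m) of the coefficient matrix Theta (kq x k)."""
    scale = np.sqrt(np.linalg.inv(A)[i, i] * S[m, m] / nu)   # assumed marginal t scale
    half = stats.t.ppf(0.5 + prob / 2.0, df=nu) * scale
    center = theta_mean[i, m]
    return center - half, center + half
```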

V. RESULTS
One of the main objectives of this study is to investigate the effectiveness of the proposed Bayesian estimation methodology. In order to achieve this objective, three simulation studies have been conducted. The proposed methodology is employed to estimate the parameters of VMA_2(1) processes with various parameter values. All computations were performed on a PC using the SCA package.
The simulation process has the following general design: first, a time series is generated from a given bivariate moving average model of order one with certain parameters. Second, the generated data are used to evaluate the posterior densities of the coefficients and the covariance matrix of the model. Third, performance criteria are calculated for the posterior density. Fourth, 500 replications of the above three steps are done. Finally, the output is summarized in tables.
Generally, the simulation process begins by generating 500 data sets of bivariate normal variates, each of size 500, to represent the noise ε(t), which is assumed to follow N_2(0, V). These data sets are then used recursively to generate 500 realizations, each of size 500, from a certain bivariate moving average process of order one with given parameters. The initial values of the errors ε(t) = [ε(t,1), ε(t,2)]' are set equal to their unconditional mean, namely zero. The first 200 pairs of observations are deleted in order to remove the initialization effect, and hence we obtain 500 bivariate time series, each of length 300. From these 300 observations, a bivariate time series of the desired length is used to estimate the posterior densities of the coefficient matrix θ and the covariance matrix of the noise term using the proposed Bayesian methodology. In our simulation studies, the time series lengths are taken to be 30, 50, 100, 150, 200 and 300. Each simulation study corresponds to specific coefficients, and for all of them the variance-covariance matrix of the noise is fixed at the matrix V given in (5.1). The values of the coefficients are selected to represent different positions in the invertibility domain of the VMA_2(1) model. It is important to emphasize here that Jeffreys' noninformative prior is used to conduct all the simulation studies.
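A sketch of the data-generation step just described, under the same model convention as the earlier simulation sketch; the covariance matrix V should be set to the matrix fixed in (5.1), and the function name is ours.

```python
import numpy as np

def generate_realizations(theta1, V, n_keep=300, burn_in=200, n_rep=500, seed=2022):
    """Generate n_rep bivariate VMA_2(1) series of length n_keep, starting the
    errors at zero and dropping the first burn_in pairs of observations."""
    rng = np.random.default_rng(seed)
    series = []
    for _ in range(n_rep):
        eps = rng.multivariate_normal(np.zeros(2), V, size=n_keep + burn_in + 1)
        eps[0] = 0.0                              # initial errors equal their mean
        y = eps[1:] + eps[:-1] @ theta1.T         # y(t) = eps(t) + theta_1 eps(t-1)
        series.append(y[burn_in:])                # keep the last n_keep observations
    return series
```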
Our main concern is to study the numerical efficiency of the proposed Bayesian estimation methodology by calculating three goodness criteria, namely P*, MAD and MAPE. The measure P* checks the goodness of interval estimates calculated from a specified posterior density of the model's coefficients. Defining the 95% highest posterior density (HPD) region as the shortest interval having probability 0.95 centered at the mean of the posterior density, the percentage P*_lm of time series for which the HPD region of the posterior density contains the true value of the coefficient is defined as

P*_lm = (n*_lm / 500) × 100,

where n*_lm is the number of time series for which the HPD region includes the true value of θ_lm. P*_lm is evaluated such that the higher the value of P*_lm, the better the performance of the posterior density in estimation. It should be noted that, according to P*_lm, a certain coefficient is only ruled as belonging to the HPD region or not; P* does not account for the distance of this coefficient from the center of the region or from its boundaries. The MAD stands for the mean absolute deviation of the true coefficient from the mean of its marginal posterior density and is defined by

MAD_lm = (1/500) Σ_{j=1}^{500} |θ_lm − E(θ_j.lm)|,

where θ_lm is the lm-th component of the coefficient matrix and E(θ_j.lm) is the mean of its marginal posterior density for the j-th simulated series. The MAPE stands for the mean absolute percentage deviation of the true coefficient from the mean of its marginal posterior density and is defined as

MAPE_lm = (1/500) Σ_{j=1}^{500} |θ_lm − E(θ_j.lm)| / |θ_lm| × 100,

where θ_lm and E(θ_j.lm) are defined as above.
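A sketch of the three criteria for one coefficient, assuming the posterior means and 95% HPD limits have been collected over the 500 replications; expressing the MAPE as a percentage is an assumption of the sketch.

```python
import numpy as np

def criteria(true_value, post_means, hpd_low, hpd_high):
    """Return (P*, MAD, MAPE) for one coefficient across the replications."""
    post_means = np.asarray(post_means)
    covered = (np.asarray(hpd_low) <= true_value) & (true_value <= np.asarray(hpd_high))
    p_star = 100.0 * covered.mean()                                   # coverage percentage
    mad = np.abs(post_means - true_value).mean()                      # mean absolute deviation
    mape = 100.0 * (np.abs(post_means - true_value) / abs(true_value)).mean()
    return p_star, mad, mape
```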
The numerical efficiency of the proposed estimation procedure will be examined with respect to the time series length (n) as well as the parameters of the selected model.
In order to study the numerical efficiency of the estimation of the covariance matrix (5.1), the MAD and the MAPE are computed for the elements of the covariance matrix using the analogous forms

MAD_lm = (1/500) Σ_{j=1}^{500} |v_lm − v̂_j.lm|,   MAPE_lm = (1/500) Σ_{j=1}^{500} |v_lm − v̂_j.lm| / |v_lm| × 100,

where v_lm is the lm-th element of the true covariance matrix (5.1) and v̂_j.lm is the corresponding element of the estimated covariance matrix of the j-th time series defined above.

Simulation I, as an illustration, begins with the generation of 500 data sets of bivariate normal variates, each with 500 observations, to represent ε(t,1) and ε(t,2) respectively. These data sets are then used to generate 500 bivariate realizations, each of size 300, from a VMA_2(1) process with

θ_1 = [0.9 −0.2; 1.1 −0.9].

Assuming the starting values of the errors are zeros and Jeffreys' prior is employed, the second step is to carry out all the computations required to estimate the posterior density of the coefficient matrix for each of the 500 realizations and to compute the P*, MAD and MAPE values. These computations are done for a specific time series length using the first n observations of each generated set. This second step is repeated for each chosen time series length. The results of Simulation I are summarized in tables (1) up to (8). Table (1) is devoted to the results of the coefficient θ_11. It consists of six rows corresponding to the selected time series lengths and ten columns corresponding to the computed measures. Tables (2) up to (4) display the results of θ_12, θ_21 and θ_22, respectively, and are designed similarly. Moreover, the results of the estimation of the covariance matrix are displayed similarly in tables (5) up to (8). Simulations II and III are designed in a similar manner but use different coefficient matrices.

A. RESULTS OF SIMULATION I
The results of this simulation study are shown in tables (1) up to (8). The first four tables present the results of the estimation of the elements of the coefficient matrix, whereas the last four tables present the results of the estimation of the elements of the covariance matrix.
Regarding table (1), one may make the following comments concerning the coefficient θ_11. First, the values of the percentage P* fluctuate around 95% (its theoretical value). This means that the marginal posterior distribution of θ_11 provides good interval estimation. Second, the values of both the MAD and the MAPE are small at all time series lengths and tend to decrease as the time series length increases. This indicates that the posterior means of θ_11 get closer to the true value as the time series length increases. Moreover, both the mean and the median of the posterior means tend to concentrate around the true value of the coefficient and get closer to it as the time series length increases. Finally, the standard deviation of the posterior means tends to decrease as the time series length increases.
The conclusions obtained from tables (2) to (4) are similar.
The following four tables display the results of the estimation of the covariance matrix. The conclusions obtained from tables (5) to (8) are similar to those obtained from table (1).
Regarding tables (5) to (8), one may notice the following comments concerning the elements of the covariance matrix: First, the values of both the MAD and the MAPE for all elements are small for all time series lengths and tend to decrease as the time series length increases. This indicates that the posterior mean of the covariance matrix gets closer to the true one as time series length increases. Second, the means and the medians of the posterior means tend to concentrate around the true values of the elements and get closer to them as the time series length increases. Finally, the standard deviations of the posterior means tend to decrease as the time series length increases.

B. RESULTS OF SIMULATION II
The results of this simulation study are shown in tables (9) up to (16). The tables are designed the same way as the tables in the previous subsection and the conclusions obtained are similar.

C. RESULTS OF SIMULATION III
The results of this simulation study are shown in tables (17) up to (24). The tables are designed the same way as the tables in the previous subsection and the conclusions obtained are similar.

VI. CONCLUSION
The results of the three simulation studies displayed in the previous section show that the suggested estimation technique succeeded in estimating the parameters of the bivariate moving average process of order one, and that the results are not sensitive to changes in the parameter values or in the time series length.