
Improved Shrinkage Estimators of Covariance Matrices With Toeplitz-Structured Targets in Small Sample Scenarios




Abstract:

Shrinkage regularization is an effective strategy for estimating the covariance matrix of a multivariate random vector in small sample scenarios. The purpose of this paper is to propose improved linear shrinkage estimators of the covariance matrix in which two types of Toeplitz-structured target matrices are respectively employed in the shrinkage procedure. Under Gaussian and non-Gaussian distributions, the corresponding shrinkage estimators are obtained in closed form by unbiasedly estimating the unknown scalar quantities that involve the true covariance matrix. Compared with existing estimators of the same type, the proposed covariance estimators show a significant improvement in mean squared error in numerical simulations. Moreover, example applications including portfolio risk estimation and classification of real data are provided to verify the performance of the proposed covariance estimators in small sample scenarios.
Published in: IEEE Access (Volume: 7)
Page(s): 116785 - 116798
Date of Publication: 20 August 2019
Electronic ISSN: 2169-3536

SECTION I.

Introduction

The problem of estimating a covariance matrix when the dimension is comparable to or even larger than the sample size has attracted considerable research interest, as high-dimensional data processing has become increasingly common in both statistics and a wide spectrum of applications such as finance and biomedicine [1]–[3].

We consider an independent and identically distributed (i.i.d.) sample \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n drawn from an unspecified p-variate distribution with mean zero and covariance matrix \boldsymbol{\Sigma}. The sample covariance matrix (SCM), one of the most widely adopted covariance matrix estimators, is \begin{equation*} \mathbf{S} = (s_{ij})_{p \times p} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^T. \tag{1}\end{equation*} The SCM enjoys numerous desirable properties such as unbiasedness and consistency [4]. It is also the maximum likelihood estimator under the Gaussian distribution when n > p. However, in scenarios where the sample size n is not much larger than the dimension p, the SCM is likely to deviate worryingly from the true covariance matrix, leading to serious consequences if the SCM is employed as the covariance matrix estimator in real applications [5]. Furthermore, when n < p, the SCM becomes singular and cannot be directly employed in scenarios where the covariance matrix estimator is required to be positive definite [6]–[8]. In these situations, shrinkage regularization is one of the most commonly used methodologies for improving the performance of the SCM [9]–[12]. On one hand, linear shrinkage estimation can effectively balance variance and bias and usually generates a well-defined covariance estimator [13]. On the other hand, non-linear shrinkage estimation, which modifies the eigenvalues of the SCM by random matrix theory but leaves the eigenvectors unchanged, can also lead to a well-defined covariance estimator [14]. From a Bayesian perspective, however, non-linear shrinkage estimation may underutilize the prior information in finite sample scenarios.
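For concreteness, the SCM (1) admits a one-line implementation. The following minimal NumPy sketch (the function name and the row-per-observation data layout are our own choices, not prescribed by the paper) serves as a building block for the later snippets.

import numpy as np

def sample_covariance(X):
    # SCM (1): S = (1/n) * sum_i x_i x_i^T, where the rows of X are the
    # zero-mean observations x_1, ..., x_n of dimension p.
    n = X.shape[0]
    return X.T @ X / n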

One of the key issues in linear shrinkage estimation is determining the optimal shrinkage intensity under a certain evaluation criterion such as the mean squared error (MSE) or quadratic loss [15]–[17]. When the MSE criterion is adopted, the resulting optimal shrinkage intensity is unavailable because the MSE generally involves the mathematical expectation operator and the true (unknown) covariance matrix [18]. Several approaches, such as the plug-in strategy and cross-validation, have been suggested to tackle this challenge [19], [20]. The plug-in strategy is one of the most intuitive and frequently used approaches. In this methodology, the optimal shrinkage intensity is usually obtained in closed form, and then estimates of the unknown scalar quantities which involve the true covariance matrix are constructed to make the optimal shrinkage intensity available. Note that the shrinkage estimator is a convex combination of the SCM and a specified target matrix, so the corresponding shrinkage intensity should lie in the unit interval. In the single-target scenario, the shrinkage estimator can be analytically expressed by simple clipping [20]. In contrast, when multiple target matrices are employed simultaneously, the corresponding multi-target shrinkage estimation can be formulated as a convex quadratic problem (CQP) with inequality constraints [21]. In this scenario, the optimal shrinkage intensity and the corresponding optimal multi-target shrinkage estimator can hardly be analytically expressed, so the plug-in strategy cannot be directly employed [22].

Putting the plug-in strategy in another perspective, we can find an estimate of the objective function in advance and then carry out the optimization procedure to obtain the shrinkage estimator [21]. Furthermore, the performance of the available solution completely depends on the choice of the target matrix \mathbf{T} and the estimates of the unknown scalar quantities in the MSE. For the spherical target matrix \mathbf{T}_s = \frac{{\text{tr}}(\mathbf{S})}{p} \mathbf{I}_p, where \mathbf{I}_p denotes the p \times p identity matrix, the involved unknown scalar quantities are consistently estimated under a distribution-free setting in [18], and then unbiasedly estimated under Gaussian distributions in [23], [24] to enhance the statistical performance of the shrinkage estimator. For the diagonal target matrix \mathbf{T}_d = {\text{diag}}(s_{11}, \dots, s_{pp}), the involved unknown scalar quantities are asymptotically unbiasedly estimated in [24] and then strictly unbiasedly estimated in [19]. Beyond the aforementioned, a class of sparse target matrices is studied in [25], and the corresponding unknown scalar quantities are unbiasedly and consistently estimated under Gaussian and non-Gaussian distributions respectively. For more target matrices, one can further refer to [21], [26], [27]. In many real applications, the true covariance matrix of a multivariate random vector is Toeplitz or Toeplitz-like [28], [29]. Hence a target matrix with Toeplitz structure, which represents the prior information of the true covariance matrix to some extent, should be employed in shrinkage estimation. The common covariance matrix and the Toeplitz-structured matrix are suggested as the target matrices in this scenario [21], [26]. For the Toeplitz-structured target matrix, a shrinkage estimator is developed by asymptotically unbiasedly estimating the unknown scalar quantities under the Gaussian distribution [30].

In this paper, for the common covariance target matrix and the Toeplitz-structured target matrix respectively, we construct exactly unbiased estimates of the unknown scalar quantities in shrinkage estimation under both Gaussian and non-Gaussian distributions. The main contributions of this paper are as follows:

  1. For the common covariance target matrix and the Toeplitz-structured target matrix, by respectively defining matrix-variate functions that effectively characterize the structure of the population covariance matrix, the corresponding optimal shrinkage intensities, in the sense of minimizing the MSEs of the shrinkage estimates, are obtained in closed form and proved to lie in the unit interval. In particular, under the Gaussian distribution, the shrinkage intensities are analytically expressed as functions of the true covariance matrix.

  2. Under Gaussian and non-Gaussian distributions respectively, all the unknown scalar quantities in the optimal shrinkage intensities are estimated unbiasedly rather than asymptotically unbiasedly, and then the estimates of the corresponding MSEs are obtained by the plug-in strategy.

  3. For the common covariance target matrix and the Toeplitz-structured target matrix, the shrinkage estimators are respectively proposed by minimizing the available versions of the MSEs.

  4. We provide numerical simulations and example applications, including portfolio risk estimation and classification of real data, to verify the performance of the proposed covariance estimators in small sample scenarios.

The remainder of this paper is organized as follows. Section II introduces two types of target matrices and analytically derives the optimal shrinkage intensities. The relationship between the optimal shrinkage intensity and the covariance matrix structure is also discussed. In Section III, the unbiased estimates of the related unknown scalar quantities in the optimal shrinkage intensities are obtained under Gaussian and non-Gaussian distributions respectively. Moreover, the optimization problems involving available MSEs are formulated and the optimal available shrinkage intensities are solved in closed form. Section IV provides some numerical simulations and example applications. Section V gives some conclusions. Related mathematical details are provided in the appendix.

A. Notations

The notation \mathbb{R}^m is the set of all m-dimensional real column vectors, \mathbb{R}^{m \times n} is the set of all m \times n real matrices, and \mathbb{S}^n is the set of all n \times n real symmetric matrices. The symbol \mathbb{E} denotes the mathematical expectation. The bold symbol \mathbf{1} denotes the column vector of all ones with appropriate dimension. For a matrix \mathbf{A}, \mathbf{A}^T, {\text{vec}}(\mathbf{A}) and \|\mathbf{A}\| denote its transpose, vectorization and Frobenius norm respectively. For a square matrix \mathbf{A}, {\text{tr}}(\mathbf{A}) denotes its trace. For two matrices \mathbf{A} and \mathbf{B}, \mathbf{A} \circ \mathbf{B} denotes their Hadamard (element-wise) product.

SECTION II.

Oracle Shrinkage Intensities

For an arbitrary pre-specified target matrix \mathbf{T} which represents the prior structural information of the covariance matrix, the corresponding linear shrinkage estimator is formulated as \begin{equation*} \hat{\boldsymbol{\Sigma}} = (1 - w) \mathbf{S} + w \mathbf{T}, \tag{2}\end{equation*} where the parameter w is generally referred to as the shrinkage intensity. The MSE of the linear shrinkage estimator \hat{\boldsymbol{\Sigma}} is \begin{equation*} \mathcal{M}_{\mathbf{T}}(\hat{\boldsymbol{\Sigma}}) = \mathbb{E}[\|\hat{\boldsymbol{\Sigma}} - \boldsymbol{\Sigma}\|^2] = \mathbb{E}[\|(1 - w) \mathbf{S} + w \mathbf{T} - \boldsymbol{\Sigma}\|^2]. \tag{3}\end{equation*} In order to determine the optimal weight w in (2), we regard the MSE (3) as a function of w for the given \boldsymbol{\Sigma}, and rewrite it as \begin{equation*} \mathcal{M}_{\mathbf{T}}(w \mid \boldsymbol{\Sigma}) = \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T})^2] w^2 - 2 \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T})(\mathbf{S} - \boldsymbol{\Sigma})] w + c, \tag{4}\end{equation*} where \begin{equation*} c = \mathbb{E}[{\text{tr}}(\mathbf{S} - \boldsymbol{\Sigma})^2] \tag{5}\end{equation*} is a constant unrelated to w and the target matrix \mathbf{T}.

In this paper, we consider two types of Toeplitz-structured target matrices. Let \mathbf{H}_p = \mathbf{1} \mathbf{1}^T - \mathbf{I}_p be the matrix with diagonal elements 0 and all other elements 1. The common covariance target matrix, obtained by respectively averaging the diagonal elements and the off-diagonal elements of the SCM, can then be represented as [26] \begin{equation*} \mathbf{T}_1 = \frac{{\text{tr}}(\mathbf{S})}{p} \mathbf{I}_p + \frac{{\text{tr}}(\mathbf{S} \mathbf{H}_p)}{p(p - 1)} \mathbf{H}_p. \tag{6}\end{equation*} The corresponding MSE becomes \begin{align*} \mathcal{M}_{\mathbf{T}_1}(w \mid \boldsymbol{\Sigma}) =& \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_1)^2] w^2 - 2 \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_1)(\mathbf{S} - \boldsymbol{\Sigma})] w + c \\ =& \mathbb{E}\left[ {\text{tr}}(\mathbf{S}^2) - \frac{1}{p} {\text{tr}}^2(\mathbf{S}) - \frac{1}{p(p - 1)} {\text{tr}}^2(\mathbf{S} \mathbf{H}_p) \right] w^2 \\ &- 2 \mathbb{E}\left[ {\text{tr}}(\mathbf{S}^2) - \frac{1}{p} {\text{tr}}^2(\mathbf{S}) - \frac{1}{p(p - 1)} {\text{tr}}^2(\mathbf{S} \mathbf{H}_p) \right] w \\ &+ 2 \left( {\text{tr}}(\boldsymbol{\Sigma}^2) - \frac{1}{p} {\text{tr}}^2(\boldsymbol{\Sigma}) - \frac{1}{p(p - 1)} {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{H}_p) \right) w + c, \tag{7}\end{align*} where c is given by (5). By defining the matrix-variate function \begin{equation*} d_1(\mathbf{A}) = {\text{tr}}(\mathbf{A}^2) - \frac{1}{p} {\text{tr}}^2(\mathbf{A}) - \frac{1}{p(p - 1)} {\text{tr}}^2(\mathbf{A} \mathbf{H}_p), \quad \mathbf{A} \in \mathbb{S}^p, \tag{8}\end{equation*} we can rewrite the corresponding MSE as \begin{align*} \mathcal{M}_{\mathbf{T}_1}(w \mid \boldsymbol{\Sigma}) =& \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_1)^2] w^2 - 2 \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_1)(\mathbf{S} - \boldsymbol{\Sigma})] w + c \\ =& \mathbb{E}[d_1(\mathbf{S})] w^2 - 2 (\mathbb{E}[d_1(\mathbf{S})] - d_1(\boldsymbol{\Sigma})) w + c. \tag{9}\end{align*}
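As an illustration, the target (6) and the function d_1 in (8) translate directly into code; the sketch below (function names are ours, continuing the NumPy snippet above) is one possible realization.

def target_T1(S):
    # Common covariance target (6): average the diagonal and the off-diagonal
    # entries of the SCM separately; H = 1 1^T - I_p.
    p = S.shape[0]
    H = np.ones((p, p)) - np.eye(p)
    return (np.trace(S) / p) * np.eye(p) + (np.trace(S @ H) / (p * (p - 1))) * H

def d1(A):
    # Matrix-variate function (8); by Proposition 1 below it is nonnegative
    # and vanishes exactly on common covariance matrices.
    p = A.shape[0]
    H = np.ones((p, p)) - np.eye(p)
    return np.trace(A @ A) - np.trace(A) ** 2 / p - np.trace(A @ H) ** 2 / (p * (p - 1))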

For q = -(p - 1), \dots, -1, 1, \dots, p - 1, denote by \mathbf{J}_q the p \times p matrix whose elements on the q-th diagonal above (for q > 0) or below (for q < 0) the main diagonal are equal to 1, with all other elements 0. The Toeplitz-structured target matrix, obtained by averaging all elements of each band of \mathbf{S}, can then be written as [30] \begin{equation*} \mathbf{T}_2 = \sum_{q = -(p - 1)}^{p - 1} \frac{{\text{tr}}(\mathbf{S} \mathbf{J}_q)}{p - |q|} \mathbf{J}_q^T, \tag{10}\end{equation*} where \mathbf{J}_0 is understood to be \mathbf{I}_p. Similarly, we define the matrix-variate function \begin{equation*} d_2(\mathbf{A}) = {\text{tr}}(\mathbf{A}^2) - \frac{1}{p} {\text{tr}}^2(\mathbf{A}) - \sum_{m = 1}^{p - 1} \frac{2}{p - m} {\text{tr}}^2(\mathbf{A} \mathbf{J}_m), \quad \mathbf{A} \in \mathbb{S}^p, \tag{11}\end{equation*} and the corresponding MSE becomes \begin{align*} \mathcal{M}_{\mathbf{T}_2}(w \mid \boldsymbol{\Sigma}) =& \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_2)^2] w^2 - 2 \mathbb{E}[{\text{tr}}(\mathbf{S} - \mathbf{T}_2)(\mathbf{S} - \boldsymbol{\Sigma})] w + c \\ =& \mathbb{E}[d_2(\mathbf{S})] w^2 - 2 (\mathbb{E}[d_2(\mathbf{S})] - d_2(\boldsymbol{\Sigma})) w + c. \tag{12}\end{align*}
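Similarly, a hedged sketch of the Toeplitz target (10) and of d_2 in (11), again with illustrative names:

def target_T2(S):
    # Toeplitz target (10): average each diagonal band of S; np.eye(p, k=q)
    # realizes J_q (ones on the q-th band), with J_0 the identity.
    p = S.shape[0]
    T = np.zeros_like(S)
    for q in range(-(p - 1), p):
        Jq = np.eye(p, k=q)
        T += (np.trace(S @ Jq) / (p - abs(q))) * Jq.T
    return T

def d2(A):
    # Matrix-variate function (11); vanishes exactly on symmetric Toeplitz matrices.
    p = A.shape[0]
    val = np.trace(A @ A) - np.trace(A) ** 2 / p
    for m in range(1, p):
        Jm = np.eye(p, k=m)
        val -= 2.0 / (p - m) * np.trace(A @ Jm) ** 2
    return val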

Proposition 1:

For two functions d_{1}(\cdot) and d_{2}(\cdot) given by (8) and (11) respectively, we have

  1. d_{1} and d_{2} are convex;

  2. d_i(\mathbf{A}) \geq 0 for any \mathbf{A} \in \mathbb{S}^p, and the equality holds if and only if \mathbf{A} is a common covariance matrix for i = 1 or a symmetric Toeplitz matrix for i = 2.

Proof:

See Appendix A.

From Proposition 1 and Jensen's inequality [31], we have \begin{equation*} 0 \leq d_i(\boldsymbol{\Sigma}) = d_i(\mathbb{E}[\mathbf{S}]) \leq \mathbb{E}[d_i(\mathbf{S})], \quad i = 1, 2. \tag{13}\end{equation*} Furthermore, considering that both MSEs (9) and (12) are quadratic functions of w, we immediately obtain the following theorem.

Theorem 2:

In the sense of minimizing the MSEs (9) and (12) with the targets \mathbf{T}_1 and \mathbf{T}_2 given by (6) and (10) respectively, the optimal shrinkage intensities are \begin{equation*} w_i^* = 1 - \frac{d_i(\boldsymbol{\Sigma})}{\mathbb{E}[d_i(\mathbf{S})]} \in [0, 1], \quad i = 1, 2. \tag{14}\end{equation*}

Theorem 2 reveals that the optimal oracle shrinkage intensity w_1^* (or w_2^*), so named because it involves the unknown covariance matrix \boldsymbol{\Sigma}, lies in the unit interval [0, 1], and hence the corresponding shrinkage estimator given by (2) is a convex combination of the SCM \mathbf{S} and the target \mathbf{T}_1 (or \mathbf{T}_2).

For m = 1, {\dots }, p - 1 , let \mathbf {S}_{p - m} and \boldsymbol{\Sigma }_{p - m} be the (p - m) \times (p - m) upper right submatrices of \mathbf {S} and \boldsymbol{\Sigma } respectively.

Corollary 3:

For i = 1 and 2, under the Gaussian distribution, the optimal shrinkage intensities w_i^* in (14) become \begin{equation*} w_i^* = 1 - \frac{d_i(\boldsymbol{\Sigma})}{d_i(\boldsymbol{\Sigma}) + \frac{1}{n} \left( \frac{p - 2}{p} {\text{tr}}(\boldsymbol{\Sigma}^2) + {\text{tr}}^2(\boldsymbol{\Sigma}) - g_i \right)}, \tag{15}\end{equation*} where \begin{align*} g_1 =& \frac{2}{p(p - 1)} \big( (\mathbf{1}^T \boldsymbol{\Sigma} \mathbf{1})^2 - 2 (\mathbf{1}^T \boldsymbol{\Sigma}^2 \mathbf{1}) + {\text{tr}}(\boldsymbol{\Sigma}^2) \big), \tag{16}\\ g_2 =& \sum_{m = 1}^{p - 1} \frac{2}{p - m} \left( {\text{tr}}(\boldsymbol{\Sigma} \mathbf{J}_m \boldsymbol{\Sigma} \mathbf{J}_m^T) + {\text{tr}}(\boldsymbol{\Sigma}_{p - m}^2) \right). \tag{17}\end{align*}

Proof:

See Appendix B.

Although they are not in data-driven form and are thus unavailable in real applications, the oracle shrinkage intensities given in Corollary 3 are the minimizers of the MSEs of the shrinkage estimators, and therefore provide important benchmarks for evaluating the available ones. Furthermore, they imply an important relationship between the target matrix \mathbf{T} and the true covariance matrix \boldsymbol{\Sigma}. In fact, we have the following result.

Corollary 4:

For the targets \mathbf{T}_1 and \mathbf{T}_2 respectively given by (6) and (10), the corresponding oracle shrinkage intensities equal 1 if and only if the true covariance matrix has the same structure as the corresponding target matrix.

Remark 1:

It is easy to verify that Corollary 4 also holds for the spherical target matrix \mathbf{T}_s = \frac{{\text{tr}}(\mathbf{S})}{p} \mathbf{I}_p and the diagonal target matrix \mathbf{T}_d = {\text{diag}}(s_{11}, \dots, s_{pp}), which are adopted and discussed in much of the literature (see, e.g., [18], [19], [23], [24], [26]). In detail, the optimal shrinkage intensity equals 1 when the true covariance matrix \boldsymbol{\Sigma} is spherical (or diagonal) and the spherical matrix \mathbf{T}_s (or the diagonal matrix \mathbf{T}_d) is employed as the target.

SECTION III.

Improved Shrinkage Estimators

To make the oracle shrinkage intensities available and improve the existing shrinkage estimators, in this section we unbiasedly estimate the unknown scalar quantities in the MSEs given by (9) and (12). Note that \mathbb{E}[d_1(\mathbf{S})] and \mathbb{E}[d_2(\mathbf{S})] can be unbiasedly estimated by their random versions d_1(\mathbf{S}) and d_2(\mathbf{S}) respectively in the distribution-free scenario. Therefore, we only need to unbiasedly estimate d_1(\boldsymbol{\Sigma}) and d_2(\boldsymbol{\Sigma}) under Gaussian and non-Gaussian distributions respectively. Furthermore, by the definitions of the two matrix-variate functions given by (8) and (11), we have \begin{align*} d_1(\boldsymbol{\Sigma}) =& {\text{tr}}(\boldsymbol{\Sigma}^2) - \frac{1}{p} {\text{tr}}^2(\boldsymbol{\Sigma}) - \frac{1}{p(p - 1)} {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{H}_p), \tag{18}\\ d_2(\boldsymbol{\Sigma}) =& {\text{tr}}(\boldsymbol{\Sigma}^2) - \frac{1}{p} {\text{tr}}^2(\boldsymbol{\Sigma}) - \sum_{m = 1}^{p - 1} \frac{2}{p - m} {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{J}_m). \tag{19}\end{align*} Therefore, it is sufficient to unbiasedly estimate each term in (18) and (19). In what follows, we always assume n \geq 2.

A. Unbiased Estimates Under Gaussian Distribution

Denote the set of p \times p matrices \begin{equation*} \mathcal{W} = \{ \mathbf{W} = (w_{ij}) \mid w_{ij} = 0 ~\mathrm{or}~ 1, ~ i, j = 1, \dots, p \}, \tag{20}\end{equation*} so that \mathbf{H}_p \in \mathcal{W} and \mathbf{J}_q \in \mathcal{W} for |q| = 1, \dots, p - 1. Let \begin{equation*} \mathbf{P} = \begin{bmatrix} 1 & 1/n & 1/n\\ 1/n & 1 & 1/n\\ 1/n & 1/n & 1 \end{bmatrix}. \tag{21}\end{equation*}

Define two functions of \mathbf{A} = (a_{ij}) \in \mathbb{S}^p for a given \mathbf{W} = (w_{ij}) \in \mathcal{W} as follows: \begin{align*} \phi_1(\mathbf{A} \mid \mathbf{W}) =& \sum_{i, j, k, l = 1}^{p} w_{ij} w_{kl} a_{ik} a_{jl}, \tag{22}\\ \phi_2(\mathbf{A} \mid \mathbf{W}) =& \sum_{i, j, k, l = 1}^{p} w_{ij} w_{kl} a_{il} a_{jk}. \tag{23}\end{align*}

Theorem 5:

Under the Gaussian distribution, if \mathbf{W} \in \mathcal{W}, then \begin{equation*} \mathbb{E}\left[ \mathbf{P}^{-1} \begin{bmatrix} {\text{tr}}^2(\mathbf{S} \mathbf{W}) \\ \phi_1(\mathbf{S} \mid \mathbf{W}) \\ \phi_2(\mathbf{S} \mid \mathbf{W}) \end{bmatrix} \right] = \begin{bmatrix} {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{W}) \\ \phi_1(\boldsymbol{\Sigma} \mid \mathbf{W}) \\ \phi_2(\boldsymbol{\Sigma} \mid \mathbf{W}) \end{bmatrix}. \tag{24}\end{equation*}

Proof:

See Appendix C.

Theorem 5 provides a wide class of estimates. For example, if \mathbf{W} = \mathbf{I}_p, we have \phi_1(\mathbf{S} \mid \mathbf{W}) = \phi_2(\mathbf{S} \mid \mathbf{W}) = {\text{tr}}(\mathbf{S}^2) and \phi_1(\boldsymbol{\Sigma} \mid \mathbf{W}) = \phi_2(\boldsymbol{\Sigma} \mid \mathbf{W}) = {\text{tr}}(\boldsymbol{\Sigma}^2). Equation (24) becomes \begin{equation*} \mathbb{E}\left[ \mathbf{P}^{-1} \begin{bmatrix} {\text{tr}}^2(\mathbf{S}) \\ {\text{tr}}(\mathbf{S}^2) \\ {\text{tr}}(\mathbf{S}^2) \end{bmatrix} \right] = \begin{bmatrix} {\text{tr}}^2(\boldsymbol{\Sigma}) \\ {\text{tr}}(\boldsymbol{\Sigma}^2) \\ {\text{tr}}(\boldsymbol{\Sigma}^2) \end{bmatrix}. \tag{25}\end{equation*} Noticing that the inverse of \mathbf{P} is \begin{equation*} \mathbf{P}^{-1} = \frac{n}{(n - 1)(n + 2)} \begin{bmatrix} n + 1 & -1 & -1\\ -1 & n + 1 & -1\\ -1 & -1 & n + 1 \end{bmatrix}, \tag{26}\end{equation*} we can obtain the unbiased estimators of {\text{tr}}^2(\boldsymbol{\Sigma}) and {\text{tr}}(\boldsymbol{\Sigma}^2) as \begin{align*} \alpha_g =& \frac{n}{(n - 1)(n + 2)} \left( (n + 1) {\text{tr}}^2(\mathbf{S}) - 2 {\text{tr}}(\mathbf{S}^2) \right), \tag{27}\\ \beta_g =& \frac{n}{(n - 1)(n + 2)} \left( n\, {\text{tr}}(\mathbf{S}^2) - {\text{tr}}^2(\mathbf{S}) \right), \tag{28}\end{align*} which are the same as the ones in [32].
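A direct transcription of (27) and (28), assuming the SCM S and sample size n as above:

def alpha_beta_gauss(S, n):
    # Unbiased estimates (27)-(28) of tr^2(Sigma) and tr(Sigma^2) under Gaussianity.
    t1, t2 = np.trace(S), np.trace(S @ S)
    c = n / ((n - 1.0) * (n + 2.0))
    return c * ((n + 1.0) * t1 ** 2 - 2.0 * t2), c * (n * t2 - t1 ** 2)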

Moreover, noticing that \mathbf{H}_p \in \mathcal{W} and \mathbf{J}_m \in \mathcal{W} for all m = 1, \dots, p - 1, we can also obtain the unbiased estimates of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{J}_m) and {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{H}_p) by Theorem 5. In fact, by some simple deductions, the unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{H}_p) can be given as \begin{equation*} \lambda_g = \frac{n}{(n - 1)(n + 2)} \left( (n + 1) {\text{tr}}^2(\mathbf{S} \mathbf{H}_p) - 2 c_{\lambda} \right) \tag{29}\end{equation*} with c_{\lambda} = (\mathbf{1}^T \mathbf{S} \mathbf{1})^2 - 2 (\mathbf{1}^T \mathbf{S}^2 \mathbf{1}) + {\text{tr}}(\mathbf{S}^2); and for any m = 1, \dots, p - 1, the unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{J}_m) can be given by \begin{equation*} \mu_{g, m} = \frac{n}{(n - 1)(n + 2)} \left( (n + 1) {\text{tr}}^2(\mathbf{S} \mathbf{J}_m) - \tau_m \right), \tag{30}\end{equation*} where \begin{equation*} \tau_m = {\text{tr}}(\mathbf{S}_{p - m}^2) + {\text{tr}}(\mathbf{S} \mathbf{J}_m \mathbf{S} \mathbf{J}_m^T). \tag{31}\end{equation*}

Therefore, under the Gaussian distribution, d_1(\boldsymbol{\Sigma}) and d_2(\boldsymbol{\Sigma}) given by (8) and (11) can be respectively unbiasedly estimated by \begin{align*} \widehat{d_1(\boldsymbol{\Sigma})} =& \beta_g - \frac{1}{p} \alpha_g - \frac{1}{p(p - 1)} \lambda_g, \tag{32}\\ \widehat{d_2(\boldsymbol{\Sigma})} =& \beta_g - \frac{1}{p} \alpha_g - \sum_{m = 1}^{p - 1} \frac{2}{p - m} \mu_{g, m}. \tag{33}\end{align*}
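Assembling (27)-(31) gives the estimates (32) and (33); the following sketch, built on the helpers above and with our own names, is one way to compute them.

def d_hats_gauss(S, n):
    # Unbiased estimates (32)-(33) of d1(Sigma) and d2(Sigma) under Gaussianity.
    p = S.shape[0]
    alpha_g, beta_g = alpha_beta_gauss(S, n)
    c = n / ((n - 1.0) * (n + 2.0))
    one = np.ones(p)
    H = np.ones((p, p)) - np.eye(p)
    c_lam = (one @ S @ one) ** 2 - 2.0 * (one @ S @ S @ one) + np.trace(S @ S)
    lam_g = c * ((n + 1.0) * np.trace(S @ H) ** 2 - 2.0 * c_lam)   # eq. (29)
    d1_hat = beta_g - alpha_g / p - lam_g / (p * (p - 1))          # eq. (32)
    d2_hat = beta_g - alpha_g / p
    for m in range(1, p):
        Jm = np.eye(p, k=m)
        Sm = S[:p - m, m:]                 # upper-right (p-m) x (p-m) submatrix
        tau_m = np.trace(Sm @ Sm) + np.trace(S @ Jm @ S @ Jm.T)    # eq. (31)
        mu_gm = c * ((n + 1.0) * np.trace(S @ Jm) ** 2 - tau_m)    # eq. (30)
        d2_hat -= 2.0 / (p - m) * mu_gm                            # eq. (33)
    return d1_hat, d2_hat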

B. Unbiased Estimates Under Non-Gaussian Distributions

In this subsection, we deal with the distribution-free case. In order to obtain the unbiasedness of the estimators for all unknown scalar quantities in (18) and (19), we assume some conditions on the first four moments of the population distribution.

Assumption 6:

Let \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n be i.i.d. random vectors satisfying \begin{equation*} \mathbf{x}_i = F \mathbf{z}_i, \quad i = 1, \dots, n, \tag{34}\end{equation*} where \mathbf{z}_i = [z_{i1}, \dots, z_{ip}]^T and F = \boldsymbol{\Sigma}^{1/2} = (f_{ij})_{p \times p}, such that the following conditions hold:

  1. For i = 1, {\dots }, n , \mathbb {E}[\mathbf {z}_{i}] = 0 and {\text {Cov}} (\mathbf {z}_{i}) = \mathbf {I}_{p} .

  2. For i = 1, \dots, n and any k positive integers l_1, \dots, l_k such that \sum_{t = 1}^{k} l_t = 4, the following equations hold: \begin{equation*} \mathbb{E}[z_{i j_1}^{l_1} z_{i j_2}^{l_2} \cdots z_{i j_k}^{l_k}] = \mathbb{E}[z_{i j_1}^{l_1}] \mathbb{E}[z_{i j_2}^{l_2}] \cdots \mathbb{E}[z_{i j_k}^{l_k}], \tag{35}\end{equation*} where j_1, j_2, \dots, j_k are distinct indices.

  3. For i = 1, {\dots }, n , j = 1, {\dots }, p , \mathbb {E}[z_{i j}^{4}] = \kappa + 3 < \infty .

Note that the parameter \kappa therein becomes 0 under Gaussian distributions. The moment conditions (35) are required only for \sum_{t = 1}^{k} l_t = 4, which is more general than 0 \leq \sum_{t = 1}^{k} l_t \leq 4 in [25] and \sum_{t = 1}^{k} l_t = 8 in [33]. Therefore Assumption 6 is easier to satisfy and fits many practical applications. Of course, stricter conditions need to be considered when the consistency of the estimators is studied.

In this subsection, we provide the exactly unbiased estimates of d_1(\boldsymbol{\Sigma}) and d_2(\boldsymbol{\Sigma}) under Assumption 6. Let \begin{equation*} \mathbf{Q} = \begin{bmatrix} 1 & 1/n & 1/n & 1/n\\ 1/n & 1 & 1/n & 1/n\\ 1/n & 1/n & 1 & 1/n\\ 1 & 1 & 1 & 1 \end{bmatrix}. \tag{36}\end{equation*}

Define two functions of \mathbf{W} = (w_{ij}) \in \mathcal{W} as follows: \begin{align*} \psi_1(\mathbf{W}) =& \kappa \sum_{i, j, k, l = 1}^{p} \sum_{m = 1}^{p} w_{ij} w_{kl} f_{mi} f_{mj} f_{mk} f_{ml}, \tag{37}\\ \psi_2(\mathbf{W}) =& \frac{1}{n} \sum_{i, j, k, l = 1}^{p} \sum_{m = 1}^{n} w_{ij} w_{kl} x_{mi} x_{mj} x_{mk} x_{ml}. \tag{38}\end{align*}

Theorem 7:

Under Assumption 6, for arbitrary \mathbf{W} \in \mathcal{W}, the following equation holds: \begin{equation*} \mathbb{E}\left[ \mathbf{Q}^{-1} \begin{bmatrix} {\text{tr}}^2(\mathbf{S} \mathbf{W}) \\ \phi_1(\mathbf{S} \mid \mathbf{W}) \\ \phi_2(\mathbf{S} \mid \mathbf{W}) \\ \psi_2(\mathbf{W}) \end{bmatrix} \right] = \begin{bmatrix} {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{W}) \\ \phi_1(\boldsymbol{\Sigma} \mid \mathbf{W}) \\ \phi_2(\boldsymbol{\Sigma} \mid \mathbf{W}) \\ \psi_1(\mathbf{W}) \end{bmatrix}. \tag{39}\end{equation*}

Proof:

See Appendix D.

Note that the inverse of \mathbf{Q} is \begin{equation*} \mathbf{Q}^{-1} = \frac{1}{n - 1} \begin{bmatrix} n & 0 & 0 & -1\\ 0 & n & 0 & -1\\ 0 & 0 & n & -1\\ -n & -n & -n & n + 2 \end{bmatrix}. \tag{40}\end{equation*} For any \mathbf{W} \in \mathcal{W}, by Theorem 7, \begin{equation*} \mathbb{E}\left[ \frac{1}{n - 1} \left( n\, {\text{tr}}^2(\mathbf{S} \mathbf{W}) - \psi_2(\mathbf{W}) \right) \right] = {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{W}). \tag{41}\end{equation*} The last equation yields an unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{W}) for non-Gaussian distributions under Assumption 6.

When \mathbf{W} = \mathbf{I}_p, we have \begin{equation*} \psi_2(\mathbf{I}_p) = \frac{1}{n} \sum_{i = 1}^{n} (\mathbf{x}_i^T \mathbf{x}_i)^2, \tag{42}\end{equation*} and then \begin{equation*} \alpha_f = \frac{1}{n - 1} \left( n\, {\text{tr}}^2(\mathbf{S}) - \frac{1}{n} \sum_{i = 1}^{n} (\mathbf{x}_i^T \mathbf{x}_i)^2 \right) \tag{43}\end{equation*} is an unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma}). Moreover, \phi_1(\mathbf{S} \mid \mathbf{I}_p) = {\text{tr}}(\mathbf{S}^2) and \phi_1(\boldsymbol{\Sigma} \mid \mathbf{I}_p) = {\text{tr}}(\boldsymbol{\Sigma}^2), so by Theorem 7, \begin{equation*} \beta_f = \frac{1}{n - 1} \left( n\, {\text{tr}}(\mathbf{S}^2) - \frac{1}{n} \sum_{i = 1}^{n} (\mathbf{x}_i^T \mathbf{x}_i)^2 \right) \tag{44}\end{equation*} is an unbiased estimate of {\text{tr}}(\boldsymbol{\Sigma}^2).

When \mathbf{W} = \mathbf{H}_p, we have \begin{equation*} \psi_2(\mathbf{H}_p) = \frac{1}{n} \sum_{i = 1}^{n} \left( (\mathbf{1}^T \mathbf{x}_i)^2 - \mathbf{x}_i^T \mathbf{x}_i \right)^2, \tag{45}\end{equation*} so \begin{equation*} \lambda_f = \frac{1}{n - 1} \left( n\, {\text{tr}}^2(\mathbf{S} \mathbf{H}_p) - \psi_2(\mathbf{H}_p) \right) \tag{46}\end{equation*} is an unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{H}_p). Because \begin{equation*} \psi_2(\mathbf{J}_m) = \frac{1}{n} \sum_{i = 1}^{n} (\mathbf{x}_i^T \mathbf{J}_m \mathbf{x}_i)^2 \tag{47}\end{equation*} for m = 1, \dots, p - 1, an unbiased estimate of {\text{tr}}^2(\boldsymbol{\Sigma} \mathbf{J}_m) can be given as \begin{equation*} \mu_{f, m} = \frac{1}{n - 1} \left( n\, {\text{tr}}^2(\mathbf{S} \mathbf{J}_m) - \frac{1}{n} \sum_{i = 1}^{n} (\mathbf{x}_i^T \mathbf{J}_m \mathbf{x}_i)^2 \right). \tag{48}\end{equation*}

Therefore, under Assumption 6, d_1(\boldsymbol{\Sigma}) and d_2(\boldsymbol{\Sigma}) can be respectively unbiasedly estimated as \begin{align*} \widehat{d_1(\boldsymbol{\Sigma})} =& \beta_f - \frac{1}{p} \alpha_f - \frac{1}{p(p - 1)} \lambda_f, \tag{49}\\ \widehat{d_2(\boldsymbol{\Sigma})} =& \beta_f - \frac{1}{p} \alpha_f - \sum_{m = 1}^{p - 1} \frac{2}{p - m} \mu_{f, m}. \tag{50}\end{align*}
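The distribution-free estimates (49)-(50) can be computed analogously; below is a sketch under the same conventions (rows of X are the observations, names are ours).

def d_hats_free(X):
    # Unbiased estimates (49)-(50) of d1(Sigma) and d2(Sigma) under Assumption 6.
    n, p = X.shape
    S = X.T @ X / n
    one = np.ones(p)
    H = np.ones((p, p)) - np.eye(p)
    psi_I = np.mean(np.sum(X * X, axis=1) ** 2)                      # eq. (42)
    alpha_f = (n * np.trace(S) ** 2 - psi_I) / (n - 1.0)             # eq. (43)
    beta_f = (n * np.trace(S @ S) - psi_I) / (n - 1.0)               # eq. (44)
    psi_H = np.mean(((X @ one) ** 2 - np.sum(X * X, axis=1)) ** 2)   # eq. (45)
    lam_f = (n * np.trace(S @ H) ** 2 - psi_H) / (n - 1.0)           # eq. (46)
    d1_hat = beta_f - alpha_f / p - lam_f / (p * (p - 1))            # eq. (49)
    d2_hat = beta_f - alpha_f / p
    for m in range(1, p):
        Jm = np.eye(p, k=m)
        psi_J = np.mean(np.einsum('ni,ij,nj->n', X, Jm, X) ** 2)     # eq. (47)
        mu_fm = (n * np.trace(S @ Jm) ** 2 - psi_J) / (n - 1.0)      # eq. (48)
        d2_hat -= 2.0 / (p - m) * mu_fm                              # eq. (50)
    return d1_hat, d2_hat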

C. Available Shrinkage Estimators

For the common covariance target \mathbf{T}_1, by plugging the estimates into (9), the MSE can be unbiasedly estimated by \begin{equation*} \hat{\mathcal{M}}_{\mathbf{T}_1}(w) = d_1(\mathbf{S}) w^2 - 2 \big( d_1(\mathbf{S}) - \widehat{d_1(\boldsymbol{\Sigma})} \big) w + \hat{c}, \tag{51}\end{equation*} where \hat{c} is an unbiased estimate of c given by (5), with \hat{c} = {\text{tr}}(\mathbf{S}^2) - \beta_g under Gaussianity or \hat{c} = {\text{tr}}(\mathbf{S}^2) - \beta_f under non-Gaussianity. Then the available optimal shrinkage intensity \hat{w}_1 can be obtained by solving the optimization problem \begin{align*}& \min \quad \hat{\mathcal{M}}_{\mathbf{T}_1}(w) \\ & \mathrm{s.t.} \quad~~ 0 \leq w \leq 1. \tag{52}\end{align*} Therefore, for the target matrix \mathbf{T}_1, the available optimal shrinkage intensities under Gaussian and non-Gaussian distributions are respectively \begin{align*} \hat{w}_{1g} =& 0 \vee \left( 1 - \frac{\beta_g - \frac{1}{p} \alpha_g - \frac{1}{p(p - 1)} \lambda_g}{{\text{tr}}(\mathbf{S} - \mathbf{T}_1)^2} \right) \wedge 1, \tag{53}\\ \hat{w}_{1f} =& 0 \vee \left( 1 - \frac{\beta_f - \frac{1}{p} \alpha_f - \frac{1}{p(p - 1)} \lambda_f}{{\text{tr}}(\mathbf{S} - \mathbf{T}_1)^2} \right) \wedge 1, \tag{54}\end{align*} where a \vee b = \max\{a, b\} and a \wedge b = \min\{a, b\}. Subsequently, the corresponding shrinkage estimators under Gaussian and non-Gaussian distributions are respectively expressed as \begin{align*} \hat{\boldsymbol{\Sigma}}_{1g} =& (1 - \hat{w}_{1g}) \mathbf{S} + \hat{w}_{1g} \mathbf{T}_1, \tag{55}\\ \hat{\boldsymbol{\Sigma}}_{1f} =& (1 - \hat{w}_{1f}) \mathbf{S} + \hat{w}_{1f} \mathbf{T}_1. \tag{56}\end{align*}

In the same manner, for the Toeplitz-structured target matrix \mathbf{T}_2, the unbiased estimate of the corresponding MSE is \begin{equation*} \hat{\mathcal{M}}_{\mathbf{T}_2}(w) = d_2(\mathbf{S}) w^2 - 2 \big( d_2(\mathbf{S}) - \widehat{d_2(\boldsymbol{\Sigma})} \big) w + \hat{c}. \tag{57}\end{equation*} Then the available optimal shrinkage intensity \hat{w}_2 can be obtained by solving the following optimization problem: \begin{align*}& \min \quad \hat{\mathcal{M}}_{\mathbf{T}_2}(w) \\ & \mathrm{s.t.} \quad~~ 0 \leq w \leq 1. \tag{58}\end{align*} Therefore, for the target matrix \mathbf{T}_2, the available optimal shrinkage intensities under Gaussian and non-Gaussian distributions are respectively \begin{align*} \hat{w}_{2g} =& 0 \vee \left( 1 - \frac{\beta_g - \frac{1}{p} \alpha_g - \sum_{m = 1}^{p - 1} \frac{2}{p - m} \mu_{g, m}}{{\text{tr}}(\mathbf{S} - \mathbf{T}_2)^2} \right) \wedge 1, \tag{59}\\ \hat{w}_{2f} =& 0 \vee \left( 1 - \frac{\beta_f - \frac{1}{p} \alpha_f - \sum_{m = 1}^{p - 1} \frac{2}{p - m} \mu_{f, m}}{{\text{tr}}(\mathbf{S} - \mathbf{T}_2)^2} \right) \wedge 1. \tag{60}\end{align*} As a result, the shrinkage estimators for \mathbf{T}_2 under Gaussian and non-Gaussian distributions are respectively expressed as \begin{align*} \hat{\boldsymbol{\Sigma}}_{2g} =& (1 - \hat{w}_{2g}) \mathbf{S} + \hat{w}_{2g} \mathbf{T}_2, \tag{61}\\ \hat{\boldsymbol{\Sigma}}_{2f} =& (1 - \hat{w}_{2f}) \mathbf{S} + \hat{w}_{2f} \mathbf{T}_2. \tag{62}\end{align*}
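Combining the pieces above, a compact sketch of the four available estimators (55)-(56) and (61)-(62) reads as follows; the clipping to [0, 1] realizes the \vee and \wedge operators in (53)-(54) and (59)-(60).

def shrinkage_estimator(X, target=1, gaussian=True):
    # Available shrinkage estimator for target T1 (target=1) or T2 (target=2),
    # under Gaussianity (gaussian=True) or Assumption 6 (gaussian=False).
    n, p = X.shape
    S = X.T @ X / n
    T = target_T1(S) if target == 1 else target_T2(S)
    d_hats = d_hats_gauss(S, n) if gaussian else d_hats_free(X)
    d_hat = d_hats[target - 1]
    w = 1.0 - d_hat / np.trace((S - T) @ (S - T))   # tr(S - T)^2 in (53)-(60)
    w = min(max(w, 0.0), 1.0)                       # clip to the unit interval
    return (1.0 - w) * S + w * T                    # eq. (2)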

It is worth noting that the common covariance target \mathbf{T}_1 is positive definite [21], while the Toeplitz-structured target \mathbf{T}_2 is not necessarily so [34]. Therefore, the estimators \hat{\boldsymbol{\Sigma}}_{1g} and \hat{\boldsymbol{\Sigma}}_{1f} are positive definite, whereas the positive definiteness of \hat{\boldsymbol{\Sigma}}_{2g} or \hat{\boldsymbol{\Sigma}}_{2f} should be checked in applications where their inverses are needed. Moreover, one can refer to [10], [35], [36] if the estimator fails to be positive definite.

SECTION IV.

Numerical Simulations and Applications

A. Numerical Simulations

In this subsection, we investigate the MSE performance of proposed shrinkage estimators. The following two types of population covariance matrices are considered:

  • Model 1: \boldsymbol{\Sigma }= \boldsymbol{\Sigma }_{c} + \epsilon \boldsymbol{\Sigma } _{r} where \boldsymbol{\Sigma }_{c} = (\sigma _{i j})_{p \times p} with \sigma _{i i} = 1 and \sigma _{i j} = \nu for i \neq j .

  • Model 2: \boldsymbol{\Sigma }= \boldsymbol{\Sigma }_{t} + \epsilon \boldsymbol{\Sigma } _{r} where \boldsymbol{\Sigma }_{t} = (\sigma _{i j})_{p \times p} with \sigma _{i j} = \frac {1}2 |i - j + 1|^{2 \rho } - |i - j|^{2 \rho } + \frac {1}2 |i - j - 1|^{2 \rho } .

Here, \boldsymbol{\Sigma}_r = \eta \eta^T is a random matrix with the column vector \eta being an i.i.d. sample from the uniform distribution on the interval [0, 1]. To keep the above two covariance models positive definite, the model parameters are further assumed to satisfy \nu \in \left( -\frac{1}{p - 1}, 1 \right), \rho \in (0.5, 1) and \epsilon \geq 0. We remark that both Model 1 and Model 2 are Toeplitz-like matrices with tuning parameter \epsilon, which makes them more flexible for practical situations. In particular, when \epsilon = 0, Model 1 and Model 2 degenerate into strict Toeplitz matrices. Let the data \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n be generated as \begin{equation*} \mathbf{x}_i = \boldsymbol{\Sigma}^{\frac{1}{2}} \mathbf{z}_i, \quad i = 1, \dots, n, \tag{63}\end{equation*} with \mathbf{z}_i = [z_{i1}, z_{i2}, \dots, z_{ip}]^T whose elements z_{ij}, j = 1, \dots, p, are mutually independently drawn from the standard Gaussian or a non-Gaussian distribution. In each numerical experiment, the dimension is p = 100, and the model parameters \nu in Model 1 and \rho in Model 2 are set to 0.2 and 0.9 to represent weak and strong correlations respectively.
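For reproducibility, a sketch of the two population models and the data generation (63); the random seed, helper names, and the Gaussian draw in sample_data are our own illustrative choices.

def model_cov(p, model=1, eps=1.0, nu=0.2, rho=0.9, seed=0):
    # Population covariance of Model 1 or Model 2 with perturbation eps * eta eta^T.
    rng = np.random.default_rng(seed)
    eta = rng.uniform(0.0, 1.0, size=(p, 1))
    if model == 1:
        base = np.full((p, p), nu)
        np.fill_diagonal(base, 1.0)
    else:
        d = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
        base = 0.5 * (d + 1) ** (2 * rho) - d ** (2 * rho) + 0.5 * np.abs(d - 1) ** (2 * rho)
    return base + eps * (eta @ eta.T)

def sample_data(Sigma, n, rng):
    # Draw (63): x_i = Sigma^{1/2} z_i with i.i.d. standardized entries z_ij
    # (standard Gaussian here; a standardized non-Gaussian law works the same way).
    p = Sigma.shape[0]
    vals, vecs = np.linalg.eigh(Sigma)
    root = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return rng.standard_normal((n, p)) @ root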

Denote the proposed shrinkage estimators corresponding to target matrices \mathbf {T}_{1} and \mathbf {T}_{2} under Gaussianity as T1g and T2g respectively, and the proposed two estimators under non-Gaussianity as T1f and T2f respectively. The existing counterpart covariance estimators are denoted as follows:

  1. RBLW: the shrinkage estimator corresponding to \mathbf {T}_{s} under Gaussianity in [23],

  2. FS: the shrinkage estimator corresponding to \mathbf {T}_{d} under Gaussianity in [24],

  3. LSZ: the shrinkage estimator corresponding to \mathbf {T}_{2} under Gaussianity in [30],

  4. IKS1: the shrinkage estimator corresponding to \mathbf {T}_{s} under non-Gaussianity in [19],

  5. IKS2: the shrinkage estimator corresponding to \mathbf {T}_{d} under non-Gaussianity in [19],

  6. THX1: the shrinkage estimator corresponding to \mathbf {T}_{1} under non-Gaussianity in [20],

  7. THX2: the shrinkage estimator corresponding to \mathbf {T}_{2} under non-Gaussianity in [20].

Firstly, we provide an insight into the performance of different strategies for estimating the oracle shrinkage intensities. Denote the oracle shrinkage intensities developed in Corollary 3 as Oracle-T1 and Oracle-T2 respectively. We emphasize that all the unknown scalar quantities involved in Oracle-T1 and Oracle-T2 are unbiasedly estimated in this paper. In [30], the unknown scalar quantities in Oracle-T2 are consistently estimated to produce the shrinkage estimator LSZ. In [20], the unknown scalar quantities in Oracle-T1 and Oracle-T2 are estimated via low-complexity cross-validation to produce the shrinkage estimators THX1 and THX2.

Figures 1–4 report the shrinkage intensities corresponding to \mathbf{T}_1 and \mathbf{T}_2 for Model 1 with different parameters \epsilon. Figures 1 and 2 reveal that the available shrinkage intensity in T1g is more accurate than the one in THX1 when the sample size is not very small or \epsilon is relatively large. In Figures 3 and 4, the available intensity in T2g is closer to the oracle than the others. When \epsilon = 1, the intensity estimates in THX1, THX2 and LSZ deviate sharply from the corresponding oracle intensities, and the ones in T1g and T2g significantly dominate the others in small sample scenarios.

FIGURE 1. The oracle and available intensities when \mathbf{T}_1 is employed as the target matrix and \epsilon = 0.1.

FIGURE 2. The oracle and available intensities when \mathbf{T}_1 is employed as the target matrix and \epsilon = 1.

FIGURE 3. The oracle and available intensities when \mathbf{T}_2 is employed as the target matrix and \epsilon = 0.1.

FIGURE 4. The oracle and available intensities when \mathbf{T}_2 is employed as the target matrix and \epsilon = 1.

Secondly, we compare the MSEs of the proposed estimators and other shrinkage estimators via the percentage relative improvement in average losses (PRIAL) over the SCM, defined by \begin{equation*} \mathrm{PRIAL} = \frac{\mathbb{E}[\|\mathbf{S} - \boldsymbol{\Sigma}\|^2] - \mathbb{E}[\|\hat{\boldsymbol{\Sigma}} - \boldsymbol{\Sigma}\|^2]}{\mathbb{E}[\|\mathbf{S} - \boldsymbol{\Sigma}\|^2]}, \tag{64}\end{equation*} where \hat{\boldsymbol{\Sigma}} is an arbitrary covariance matrix estimator [18]. Note that \mathrm{PRIAL} > 0 means that \hat{\boldsymbol{\Sigma}} outperforms the SCM in the MSE sense, and of two covariance matrix estimators, the one with the larger PRIAL enjoys the lower MSE. In the Monte Carlo simulations, the MSEs are approximated by averaging over 50000 repetitions.
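A Monte Carlo approximation of (64), reusing sample_data from the previous sketch; `estimator` is any map from a data matrix to a covariance estimate (e.g., the shrinkage_estimator sketch above).

def prial(Sigma, estimator, n, reps=1000, seed=0):
    # PRIAL (64) of `estimator` relative to the SCM, averaged over `reps` draws.
    rng = np.random.default_rng(seed)
    scm_loss = est_loss = 0.0
    for _ in range(reps):
        X = sample_data(Sigma, n, rng)
        S = X.T @ X / n
        scm_loss += np.linalg.norm(S - Sigma, 'fro') ** 2
        est_loss += np.linalg.norm(estimator(X) - Sigma, 'fro') ** 2
    return (scm_loss - est_loss) / scm_loss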

Figures 5 and 6 report the PRIALs of the shrinkage estimators corresponding to four types of target matrices for Model 1 with parameter \epsilon = 1. The sample z_{ij} follows the standard Gaussian distribution and the \chi^2(f) distribution with f = 3 respectively. The sample size ranges from 10 to 100. We have the following observations and analyses:

  1. As the sample size increases, the PRIAL of each estimator goes down, implying that the SCM plays a more important role in the shrinkage estimator.

  2. Under both the Gaussian and \chi^2(3) distributions, the shrinkage estimators with targets \mathbf{T}_1 or \mathbf{T}_2 significantly dominate the ones with targets \mathbf{T}_s or \mathbf{T}_d, because the target matrices \mathbf{T}_1 and \mathbf{T}_2 have structures similar to the true covariance matrix \boldsymbol{\Sigma} given in Model 1. Moreover, the differences in PRIAL narrow as the sample size grows.

  3. In Figure 5, the proposed estimator T2g greatly improves on the performance of LSZ with the same target matrix \mathbf{T}_2. In Figure 6, the proposed estimators T1f and T2f clearly dominate THX1 and THX2 respectively.

FIGURE 5. The PRIALs of shrinkage estimators versus the sample size under Gaussian distribution for Model 1.

FIGURE 6. The PRIALs of shrinkage estimators versus the sample size under \chi^2(3) distribution for Model 1.

Similar phenomena can be observed in Figures 7 and 8, which report the PRIALs of the shrinkage estimators for Model 2 with parameter \epsilon = 1; the sample z_{ij} again follows the standard Gaussian distribution and the \chi^2(3) distribution respectively. The estimators which employ the Toeplitz-structured targets \mathbf{T}_1 or \mathbf{T}_2 perform better than the ones which employ the spherical or diagonal target matrices, under the Gaussian and non-Gaussian distributions respectively. These phenomena reveal that the more accurate the target matrix is, the better the shrinkage estimator performs. Furthermore, the proposed estimators clearly improve on the existing shrinkage estimators which employ the same target matrix, under both Gaussianity and non-Gaussianity.

FIGURE 7. The PRIALs of shrinkage estimators versus the sample size under Gaussian distribution for Model 2.

FIGURE 8. The PRIALs of shrinkage estimators versus the sample size under \chi^2(3) distribution for Model 2.

The above simulations reveal that the performance of a shrinkage estimator is largely determined by the choice of target matrix. Moreover, better estimates of the unknown scalar quantities in the oracle shrinkage intensity can further improve the performance. Therefore, it is worthwhile to further explore different kinds of structured target matrices and to develop the corresponding estimates with better statistical properties [35]–[37].

B. Portfolio Risk Estimation

Portfolio selection is one of the central topics in financial investment, and the accompanying risk is a crucial metric for a specific portfolio. We consider a portfolio comprised of p risky assets. The accompanying portfolio risk is defined as \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}, where \boldsymbol{\Sigma} is the covariance matrix of the portfolio returns and \mathbf{w} is the allocation weight whose i-th element is the amount invested in the i-th asset [38]. In the global minimum variance portfolio (GMVP) framework, an optimal allocation weight \mathbf{w} can be obtained by minimizing the portfolio risk under an expected return constraint [5]. Specifically, the GMVP problem can be formulated as \begin{align*}& \min \quad \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \\ & \mathrm{s.t.} \quad~~ \mathbf{w}^T \mathbf{1} = 1. \tag{65}\end{align*} Then the optimal allocation weight of the GMVP is \begin{equation*} \mathbf{w} = \frac{\boldsymbol{\Sigma}^{-1} \mathbf{1}}{\mathbf{1}^T \boldsymbol{\Sigma}^{-1} \mathbf{1}}, \tag{66}\end{equation*} and the corresponding optimal investment risk is \begin{equation*} R = (\mathbf{1}^T \boldsymbol{\Sigma}^{-1} \mathbf{1})^{-1}. \tag{67}\end{equation*}
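In code, (66) and (67) reduce to a single linear solve; a minimal sketch assuming a positive definite covariance estimate:

def gmvp(Sigma_hat):
    # GMVP weight (66) and risk (67); Sigma_hat must be invertible.
    one = np.ones(Sigma_hat.shape[0])
    u = np.linalg.solve(Sigma_hat, one)   # Sigma_hat^{-1} 1
    return u / (one @ u), 1.0 / (one @ u)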

Note that the risk R involves the true covariance matrix \boldsymbol{\Sigma}. A natural and effective strategy is to replace \boldsymbol{\Sigma} with its estimator \hat{\boldsymbol{\Sigma}}, yielding a data-driven portfolio risk. Because stock data are non-stationary over a long period, the number of historical observations available to estimate the risk R is usually limited [39]. The standard SCM has been demonstrated to severely underestimate the true risk [40]. In a practical portfolio, either overestimation or underestimation of the real risk will lead to an unexpected loss of investment; therefore the covariance estimator which leads to a more precise risk estimate is more trustworthy. As in [41], [42], we evaluate the performance of the covariance estimators through numerical simulations, because the true risk is unknown in the real world. Furthermore, the covariance matrix of asset returns is suggested to be approximately Toeplitz in [28], [43]; therefore we again adopt Model 1 and Model 2 described in Subsection IV-A as the true covariance matrices, where the number of risky assets and the tuning parameter \epsilon are set to 30 and 1 respectively. For each covariance matrix estimator, the estimated risk is approximated by 200000 Monte Carlo runs.

Figures 9–12 report the true risk and the estimated ones based on different covariance estimators for Models 1 and 2 under the Gaussian distribution and the \chi^2(3) distribution respectively. The red solid line denotes the true portfolio risk computed by (67). Our observations and analyses are summarized as follows:

  1. As the sample size gets larger, each estimated risk becomes closer to the true risk. The SCM becomes available only when the sample size exceeds 30 (it is singular when n < p = 30), and it always results in a severe underestimate of the true risk.

  2. The bias can be effectively mitigated when a linear shrinkage estimator is employed. When \mathbf{T}_1 or \mathbf{T}_2 is utilized as the target matrix in the shrinkage estimation, the underestimation is largely avoided.

  3. Based on the same Toeplitz-structured target matrix \mathbf {T}_{1} , the proposed T1f outperforms THX1. Similarly, based on the same target matrix \mathbf {T}_{2} , the proposed T2g and T2f significantly outperform LSZ and THX2 respectively.

FIGURE 9. The portfolio risks corresponding to shrinkage estimators under Gaussian distribution for Model 1.

FIGURE 10. The portfolio risks corresponding to shrinkage estimators under \chi^2(3) distribution for Model 1.

FIGURE 11. The portfolio risks corresponding to shrinkage estimators under Gaussian distribution for Model 2.

FIGURE 12. The portfolio risks corresponding to shrinkage estimators under \chi^2(3) distribution for Model 2.

The above phenomena reveal that the proposed estimators perform well in portfolio risk estimation and enjoy a significant improvement over the selected competitors based on the same target matrices.

C. Classification of Parkinson’s Data

To further investigate the performance of the proposed covariance estimators, we employ the Parkinson's data created by Max Little of the University of Oxford, available at http://archive.ics.uci.edu/ml/index.php. The data are composed of 147 individuals with Parkinson's disease and 48 healthy individuals, with p = 22 biomedical voice attributes measured for each individual [44]. In addition, a label column is created, with 1 representing an individual with Parkinson's disease and 0 representing a healthy one. We randomly partition the data into a training set and a testing set, where the training set is comprised of n_1 individuals with Parkinson's disease and n_2 healthy individuals. Denote by \bar{\mathbf{x}}_1 and \bar{\mathbf{x}}_2 the sample means of the disease and healthy individuals in the training set. For each individual \mathbf{x} in the test set, the quadratic discriminant mechanism is expressed as \begin{equation*} y_k = (\mathbf{x} - \bar{\mathbf{x}}_k)^T \hat{\boldsymbol{\Sigma}}^{-1} (\mathbf{x} - \bar{\mathbf{x}}_k), \quad k = 1, 2, \tag{68}\end{equation*} where \hat{\boldsymbol{\Sigma}} is the pooled covariance matrix estimator computed from the training set. Each individual \mathbf{x} in the test set is classified as diseased if y_1 < y_2 and healthy otherwise. For n_1 = 10, 15, 20, 25, 30, 35, 40 and n_2 = 45, we compare the correct classification percentages of the aforementioned linear shrinkage estimators using 1000 Monte Carlo repetitions.
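A hedged sketch of the discriminant rule (68) for a single test individual, with illustrative argument names:

def classify(x, xbar1, xbar2, Sigma_hat):
    # Quadratic discriminant (68): return 1 (disease) if y_1 < y_2, else 0 (healthy).
    Sinv = np.linalg.inv(Sigma_hat)
    y1 = (x - xbar1) @ Sinv @ (x - xbar1)
    y2 = (x - xbar2) @ Sinv @ (x - xbar2)
    return 1 if y1 < y2 else 0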

Table 1 reports the correct classification percentages of Parkinson’s data by quadratic discriminant mechanisms with different covariance estimation strategies, where DS denotes the covariance estimator comprised of the diagonal elements of the SCM. We have the following observations:

  1. The proposed estimators T1g and T2g outperform the other estimators when n_1 < 35. Although IKS2 performs better than the others when n_1 = 40, it suffers from the smallest correct classification percentage when n_1 = 10.

  2. For the same target matrix, the proposed covariance estimators T1f and T2f enjoy a slight advantage over the existing shrinkage estimators THX1 and THX2 respectively.

  3. For the same target matrix \mathbf {T}_{2} , the proposed T2g enjoys a higher correct classification percentage than the existing estimator LSZ.

TABLE 1. The Correct Classification Percentages of Parkinson's Data by Quadratic Discriminant Analysis

In summary, the proposed covariance estimators outperform their competitors in small sample scenarios.

SECTION V.

Conclusions

In this paper, shrinkage estimators with two kinds of Toeplitz-structured target matrices have been studied under both Gaussian and non-Gaussian distributions. The relationship between the covariance matrix structure and the optimal shrinkage intensity has been discussed. All unknown scalar quantities involved in the shrinkage procedure are unbiasedly estimated under both Gaussianity and non-Gaussianity, and the plug-in strategy is employed to obtain the corresponding available shrinkage estimators. Numerical simulations illustrate that the proposed shrinkage estimators enjoy lower MSEs than existing shrinkage estimators in small sample scenarios, and several applications are provided to verify their performance.

The proposed shrinkage estimators of covariance matrices are believed to be useful in many applications. Further work on shrinkage estimation will be considered in the future. For example, we will develop shrinkage estimators with target matrices \mathbf{T}_{1} and \mathbf{T}_{2} in the complex domain and investigate the Cramér–Rao bound for linear shrinkage estimation. In addition, how to choose appropriate target matrices for various practical applications remains an important open issue.

Appendix

SECTION A.

Proof of Proposition 1

The Hessian matrices of the functions d_{1}(\cdot) and d_{2}(\cdot) are respectively \begin{equation*} \mathcal{H}_{1} = 2 \mathbf{I}_{p^{2}} - \frac{2}{p} {\text{vec}}(\mathbf{I}_{p}) {\text{vec}}(\mathbf{I}_{p})^{T} - \frac{2}{p(p-1)} {\text{vec}}(\mathbf{H}_{p}) {\text{vec}}(\mathbf{H}_{p})^{T} \tag{69}\end{equation*} and \begin{equation*} \mathcal{H}_{2} = 2 \mathbf{I}_{p^{2}} - \sum_{q = -(p-1)}^{p-1} \frac{2}{p - |q|} {\text{vec}}(\mathbf{J}_{q}) {\text{vec}}(\mathbf{J}_{q})^{T}.\tag{70}\end{equation*} For i = 1, 2 , it is easy to verify that \frac{1}{2} \mathcal{H}_{i} is an idempotent matrix; therefore \mathcal{H}_{i} is positive semi-definite and the function d_{i}(\cdot) is convex.
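As a numerical sanity check of the idempotency claim, the following sketch builds \frac{1}{2}\mathcal{H}_{1} from (69); it assumes \mathbf{H}_{p} = \mathbf{1}\mathbf{1}^{T} - \mathbf{I}_{p}, which is consistent with the identity used in (76) below.

```python
import numpy as np

p = 5
I = np.eye(p)
H = np.ones((p, p)) - I                        # H_p = 1 1^T - I_p, cf. (76)
vI = I.reshape(-1, 1, order='F')               # vec(I_p)
vH = H.reshape(-1, 1, order='F')               # vec(H_p)

# (1/2) * Hessian H_1 from (69)
M = np.eye(p * p) - (vI @ vI.T) / p - (vH @ vH.T) / (p * (p - 1))

assert np.allclose(M @ M, M)                   # idempotent => eigenvalues in {0, 1}
assert np.all(np.linalg.eigvalsh(M) > -1e-10)  # hence positive semi-definite
# an analogous check applies to (1/2)H_2 in (70) with the shift matrices J_q
```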

Denote \mathbf{A} = (a_{ij}) \in \mathbb{R}^{p \times p} . From \begin{equation*} {\text{tr}}^{2}(\mathbf{A}\mathbf{H}_{p}) = \Bigg( \sum_{\substack{i, j = 1 \\ i \neq j}}^{p} a_{ij} \Bigg)^{2},\tag{71}\end{equation*} we can easily obtain \begin{equation*} d_{1}(\mathbf{A}) = \left( \sum_{i=1}^{p} a_{ii}^{2} - \frac{1}{p} \Big( \sum_{i=1}^{p} a_{ii} \Big)^{2} \right) + \left( \sum_{\substack{i, j = 1 \\ i \neq j}}^{p} a_{ij}^{2} - \frac{1}{p(p-1)} \Bigg( \sum_{\substack{i, j = 1 \\ i \neq j}}^{p} a_{ij} \Bigg)^{2} \right).\tag{72}\end{equation*} Applying the Cauchy–Schwarz inequality to both terms in d_{1}(\mathbf{A}) , we obtain that d_{1}(\mathbf{A}) \geq 0 , with d_{1}(\mathbf{A}) = 0 if and only if the diagonal elements are all equal and the off-diagonal elements are all equal, i.e., \mathbf{A} has the common covariance structure. The proof of the second inequality and the corresponding equality condition is similar.
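A quick numerical illustration of the decomposition (72) and its equality condition (a sketch; d1 below evaluates (72) directly):

```python
import numpy as np

def d1(A):
    """Evaluate d_1(A) via the decomposition (72)."""
    p = A.shape[0]
    diag = np.diag(A)
    off = A[~np.eye(p, dtype=bool)]          # all off-diagonal entries
    return (np.sum(diag**2) - diag.sum()**2 / p
            + np.sum(off**2) - off.sum()**2 / (p * (p - 1)))

p = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((p, p))
assert d1(A) >= 0                            # Cauchy-Schwarz lower bound

# Common covariance structure: equal diagonal, equal off-diagonal entries
C = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)
assert np.isclose(d1(C), 0.0)                # the equality case
```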

SECTION B.

Proof of Corollary 3

By Theorem 2, we only need to compute \mathbb {E}[d_{1} (\mathbf {S})] and \mathbb {E}[d_{2} (\mathbf {S})] under Gaussian distribution.

By the definition of d_{1}(\mathbf{S}) given by (8), we have \begin{equation*} \mathbb{E}[d_{1}(\mathbf{S})] = \mathbb{E}[{\text{tr}}(\mathbf{S}^{2})] - \frac{1}{p} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S})] - \frac{1}{p(p-1)} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{H}_{p})].\tag{73}\end{equation*} For the first two terms in (73), from Gaussianity [45], we can obtain \begin{align*} \mathbb{E}[{\text{tr}}(\mathbf{S}^{2})] &= \frac{n+1}{n} {\text{tr}}(\boldsymbol{\Sigma}^{2}) + \frac{1}{n} {\text{tr}}^{2}(\boldsymbol{\Sigma}), \tag{74}\\ \mathbb{E}[{\text{tr}}^{2}(\mathbf{S})] &= {\text{tr}}^{2}(\boldsymbol{\Sigma}) + \frac{2}{n} {\text{tr}}(\boldsymbol{\Sigma}^{2}). \tag{75}\end{align*} For the third term in (73), since \mathbf{H}_{p} = \mathbf{1}\mathbf{1}^{T} - \mathbf{I}_{p} , we can see \begin{align*} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{H}_{p})] &= \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{1}\mathbf{1}^{T} - \mathbf{S})] \\ &= \mathbb{E}[(\mathbf{1}^{T}\mathbf{S}\mathbf{1})^{2}] - 2\,\mathbb{E}[(\mathbf{1}^{T}\mathbf{S}\mathbf{1}){\text{tr}}(\mathbf{S})] + \mathbb{E}[{\text{tr}}^{2}(\mathbf{S})]. \tag{76}\end{align*} It is easy to verify that, for arbitrary i, j, k, l = 1, \dots, p , \begin{equation*} \mathbb{E}[s_{ij} s_{kl}] = \sigma_{ij}\sigma_{kl} + \frac{1}{n}\sigma_{ik}\sigma_{jl} + \frac{1}{n}\sigma_{il}\sigma_{jk}.\tag{77}\end{equation*} Then \begin{equation*} \mathbb{E}[(\mathbf{1}^{T}\mathbf{S}\mathbf{1})^{2}] = \sum_{i, j, k, l = 1}^{p} \mathbb{E}[s_{ij} s_{kl}] = \frac{n+2}{n} (\mathbf{1}^{T}\boldsymbol{\Sigma}\mathbf{1})^{2},\tag{78}\end{equation*} and \begin{align*} \mathbb{E}[(\mathbf{1}^{T}\mathbf{S}\mathbf{1}){\text{tr}}(\mathbf{S})] &= \sum_{i, j, k = 1}^{p} \mathbb{E}[s_{ij} s_{kk}] \\ &= (\mathbf{1}^{T}\boldsymbol{\Sigma}\mathbf{1}){\text{tr}}(\boldsymbol{\Sigma}) + \frac{2}{n}(\mathbf{1}^{T}\boldsymbol{\Sigma}^{2}\mathbf{1}).\tag{79}\end{align*} Therefore, \begin{equation*} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{H}_{p})] = {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{H}_{p}) + \frac{2}{n}(\mathbf{1}^{T}\boldsymbol{\Sigma}\mathbf{1})^{2} - \frac{4}{n}(\mathbf{1}^{T}\boldsymbol{\Sigma}^{2}\mathbf{1}) + \frac{2}{n}{\text{tr}}(\boldsymbol{\Sigma}^{2}).\tag{80}\end{equation*} Substituting (74), (75) and (80) into (73) implies \begin{equation*} \mathbb{E}[d_{1}(\mathbf{S})] = d_{1}(\boldsymbol{\Sigma}) + \frac{1}{n}\left( \frac{p-2}{p}{\text{tr}}(\boldsymbol{\Sigma}^{2}) + {\text{tr}}^{2}(\boldsymbol{\Sigma}) - g_{1} \right).\tag{81}\end{equation*}
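The moment identity (77), which drives (78)–(80), can be confirmed by Monte Carlo simulation; the following sketch uses arbitrary choices of p , n , \boldsymbol{\Sigma} , and the number of replications.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 5, 100_000
A = rng.standard_normal((p, p))
Sigma = A @ A.T                              # an arbitrary positive definite Sigma
L = np.linalg.cholesky(Sigma)

acc = np.zeros((p, p, p, p))
for _ in range(reps):
    X = (L @ rng.standard_normal((p, n))).T  # n i.i.d. N(0, Sigma) rows
    S = X.T @ X / n                          # SCM as in (1)
    acc += np.einsum('ij,kl->ijkl', S, S)
emp = acc / reps                             # empirical E[s_ij s_kl]

# Right-hand side of (77)
thy = (np.einsum('ij,kl->ijkl', Sigma, Sigma)
       + np.einsum('ik,jl->ijkl', Sigma, Sigma) / n
       + np.einsum('il,jk->ijkl', Sigma, Sigma) / n)
print(np.max(np.abs(emp - thy)))             # small up to Monte Carlo error
```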

In the same manner, for the target \mathbf{T}_{2} , by the definition of d_{2}(\mathbf{S}) given by (11), we have \begin{equation*} \mathbb{E}[d_{2}(\mathbf{S})] = \mathbb{E}[{\text{tr}}(\mathbf{S}^{2})] - \frac{1}{p}\mathbb{E}[{\text{tr}}^{2}(\mathbf{S})] - 2\sum_{m=1}^{p-1} \frac{\mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{J}_{m})]}{p - m}.\tag{82}\end{equation*} Noting that, for each m = 1, \dots, p-1 , \begin{align*} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{J}_{m})] &= \sum_{i, j = 1}^{p-m} \mathbb{E}[s_{i, m+i}\, s_{j, m+j}] \\ &= {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{J}_{m}) + \frac{1}{n}{\text{tr}}(\boldsymbol{\Sigma}\mathbf{J}_{m}\boldsymbol{\Sigma}\mathbf{J}_{m}^{T}) + \frac{1}{n}{\text{tr}}(\boldsymbol{\Sigma}_{p-m}^{2}), \tag{83}\end{align*} and substituting (74), (75) and (83) into (82), we immediately obtain \begin{equation*} \mathbb{E}[d_{2}(\mathbf{S})] = d_{2}(\boldsymbol{\Sigma}) + \frac{1}{n}\left( \frac{p-2}{p}{\text{tr}}(\boldsymbol{\Sigma}^{2}) + {\text{tr}}^{2}(\boldsymbol{\Sigma}) - g_{2} \right).\tag{84}\end{equation*}

SECTION C.

Proof of Theorem 5

From (77), we have \begin{align*} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{W})] &= \sum_{i, j, k, l = 1}^{p} w_{ij} w_{kl} \mathbb{E}[s_{ij} s_{kl}] \\ &= \frac{1}{n}\phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}\phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}).\tag{85}\end{align*} Moreover, we can obtain the following two equations: \begin{align*} \mathbb{E}[\phi_{1}(\mathbf{S}|\mathbf{W})] &= \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}\phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}{\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}), \tag{86}\\ \mathbb{E}[\phi_{2}(\mathbf{S}|\mathbf{W})] &= \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}\phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}{\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}).\tag{87}\end{align*}

In summary, we have \begin{equation*} \mathbb{E}\begin{bmatrix} {\text{tr}}^{2}(\mathbf{S}\mathbf{W}) \\ \phi_{1}(\mathbf{S}|\mathbf{W}) \\ \phi_{2}(\mathbf{S}|\mathbf{W}) \end{bmatrix} = \mathbf{P} \begin{bmatrix} {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) \\ \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) \\ \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) \end{bmatrix}.\tag{88}\end{equation*} Equation (5) thus holds owing to the non-singularity of \mathbf{P} when n \geq 2 .
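By (85)–(87), \mathbf{P} has unit diagonal entries and off-diagonal entries 1/n , with eigenvalues 1 + 2/n and 1 - 1/n , so it is indeed non-singular for n \geq 2 . Unbiased estimates of the three population quantities then follow by applying \mathbf{P}^{-1} to the sample quantities, as in the following sketch (the function name and interface are ours).

```python
import numpy as np

def unbiased_moments(t2_S, phi1_S, phi2_S, n):
    """Solve the linear system (88) for unbiased estimates of
    tr^2(Sigma W), phi_1(Sigma|W), phi_2(Sigma|W), given the sample
    quantities tr^2(S W), phi_1(S|W), phi_2(S|W). By (85)-(87), P has
    unit diagonal and off-diagonal entries 1/n."""
    P = np.full((3, 3), 1.0 / n) + (1.0 - 1.0 / n) * np.eye(3)
    return np.linalg.solve(P, np.array([t2_S, phi1_S, phi2_S]))
```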

SECTION D.

Proof of Theorem 7

Denote \mathbf{x}_{i} = [x_{i1}, x_{i2}, \dots, x_{ip}]^{T}, i = 1, 2, \dots, n . For arbitrary a, b, c, d = 1, \dots, p , under Assumption 6, we have s_{ab} = \frac{1}{n}\sum_{i=1}^{n} x_{ia} x_{ib} with x_{st} = \sum_{k=1}^{p} f_{kt} z_{sk}, s = 1, \dots, n, t = 1, \dots, p . Then \begin{align*} \mathbb{E}[s_{ab} s_{cd}] &= \frac{1}{n^{2}} \sum_{i, j = 1}^{n} \sum_{k_{1}, k_{2}, k_{3}, k_{4} = 1}^{p} f_{k_{1}a} f_{k_{2}b} f_{k_{3}c} f_{k_{4}d}\, \mathbb{E}[z_{ik_{1}} z_{ik_{2}} z_{jk_{3}} z_{jk_{4}}] \\ &= \frac{\kappa+3}{n} \sum_{k=1}^{p} f_{ka} f_{kb} f_{kc} f_{kd} + \frac{1}{n} \sum_{k_{1} \neq k_{2}} f_{k_{1}a} f_{k_{1}b} f_{k_{2}c} f_{k_{2}d} \\ &\quad + \frac{1}{n} \sum_{k_{1} \neq k_{2}} f_{k_{1}a} f_{k_{1}c} f_{k_{2}b} f_{k_{2}d} + \frac{1}{n} \sum_{k_{1} \neq k_{2}} f_{k_{1}a} f_{k_{1}d} f_{k_{2}b} f_{k_{2}c} \\ &\quad + \frac{n-1}{n} \sum_{k=1}^{p} f_{ka} f_{kb} \sum_{k=1}^{p} f_{kc} f_{kd} \\ &= \frac{\kappa}{n} \sum_{k=1}^{p} f_{ka} f_{kb} f_{kc} f_{kd} + \sum_{k=1}^{p} f_{ka} f_{kb} \sum_{k=1}^{p} f_{kc} f_{kd} \\ &\quad + \frac{1}{n} \sum_{k=1}^{p} f_{ka} f_{kc} \sum_{k=1}^{p} f_{kb} f_{kd} + \frac{1}{n} \sum_{k=1}^{p} f_{ka} f_{kd} \sum_{k=1}^{p} f_{kb} f_{kc} \\ &= \frac{1}{n}\delta(a, b, c, d) + \sigma_{ab}\sigma_{cd} + \frac{1}{n}\sigma_{ac}\sigma_{bd} + \frac{1}{n}\sigma_{ad}\sigma_{bc},\tag{89}\end{align*} where \delta(a, b, c, d) = \kappa \sum_{k=1}^{p} f_{ka} f_{kb} f_{kc} f_{kd} , which is symmetric in a, b, c, d . Therefore \begin{align*} \mathbb{E}[{\text{tr}}^{2}(\mathbf{S}\mathbf{W})] &= \sum_{i, j, k, l = 1}^{p} w_{ij} w_{kl} \mathbb{E}[s_{ij} s_{kl}] \\ &= {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) + \frac{1}{n}\left( \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + \psi_{1}(\mathbf{W}) \right).\tag{90}\end{align*}
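A Monte Carlo sanity check of (89) (a sketch; it reads Assumption 6 as \mathbf{x}_{s} = \mathbf{F}^{T}\mathbf{z}_{s} with i.i.d. standardized entries z_{sk} of common excess kurtosis \kappa , and uses uniform entries, for which \kappa = -1.2 , together with an arbitrary mixing matrix \mathbf{F} ):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 3, 4, 100_000
F = rng.standard_normal((p, p))              # mixing matrix: x_s = F^T z_s
Sigma = F.T @ F
kappa = -1.2                                 # excess kurtosis of uniform z

emp = np.zeros((p, p, p, p))
for _ in range(reps):
    Z = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, p))  # unit variance
    X = Z @ F                                # rows x_s^T = z_s^T F
    S = X.T @ X / n
    emp += np.einsum('ab,cd->abcd', S, S)
emp /= reps                                  # empirical E[s_ab s_cd]

# Right-hand side of (89)
delta = kappa * np.einsum('ka,kb,kc,kd->abcd', F, F, F, F)
thy = (delta / n
       + np.einsum('ab,cd->abcd', Sigma, Sigma)
       + np.einsum('ac,bd->abcd', Sigma, Sigma) / n
       + np.einsum('ad,bc->abcd', Sigma, Sigma) / n)
print(np.max(np.abs(emp - thy)))             # small up to Monte Carlo error
```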

In the same way, we can obtain the following equations: \begin{equation*} \mathbb{E}[\phi_{1}(\mathbf{S}|\mathbf{W})] = \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}\left( {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) + \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + \psi_{1}(\mathbf{W}) \right),\tag{91}\end{equation*} and \begin{equation*} \mathbb{E}[\phi_{2}(\mathbf{S}|\mathbf{W})] = \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}) + \frac{1}{n}\left( {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) + \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \psi_{1}(\mathbf{W}) \right).\tag{92}\end{equation*}

To obtain the estimate of {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) by forming a system of equations, we construct the quantity \begin{equation*} \psi_{2}(\mathbf{W}) = \frac{1}{n} \sum_{i, j, k, l = 1}^{p} \sum_{m=1}^{n} w_{ij} w_{kl}\, x_{mi} x_{mj} x_{mk} x_{ml}.\tag{93}\end{equation*} From \begin{align*} \mathbb{E}\left[ \frac{1}{n} \sum_{m=1}^{n} x_{mi} x_{mj} x_{mk} x_{ml} \right] &= \sum_{k_{1}, k_{2}, k_{3}, k_{4} = 1}^{p} f_{k_{1}i} f_{k_{2}j} f_{k_{3}k} f_{k_{4}l}\, \mathbb{E}[z_{1k_{1}} z_{1k_{2}} z_{1k_{3}} z_{1k_{4}}] \\ &= (\kappa+3) \sum_{k_{1}=1}^{p} f_{k_{1}i} f_{k_{1}j} f_{k_{1}k} f_{k_{1}l} + \sum_{k_{1} \neq k_{3}} f_{k_{1}i} f_{k_{1}j} f_{k_{3}k} f_{k_{3}l} \\ &\quad + \sum_{k_{1} \neq k_{2}} f_{k_{1}i} f_{k_{1}k} f_{k_{2}j} f_{k_{2}l} + \sum_{k_{1} \neq k_{2}} f_{k_{1}i} f_{k_{1}l} f_{k_{2}j} f_{k_{2}k} \\ &= \delta(i, j, k, l) + \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk},\tag{94}\end{align*} we immediately obtain \begin{equation*} \mathbb{E}[\psi_{2}(\mathbf{W})] = \psi_{1}(\mathbf{W}) + {\text{tr}}^{2}(\boldsymbol{\Sigma}\mathbf{W}) + \phi_{1}(\boldsymbol{\Sigma}|\mathbf{W}) + \phi_{2}(\boldsymbol{\Sigma}|\mathbf{W}).\tag{95}\end{equation*}
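Note that the quadruple sum in (93) factors into a square, \psi_{2}(\mathbf{W}) = \frac{1}{n}\sum_{m=1}^{n} (\mathbf{x}_{m}^{T}\mathbf{W}\mathbf{x}_{m})^{2} , which can be evaluated in O(np^{2}) time rather than the naive O(np^{4}) ; a sketch:

```python
import numpy as np

def psi2(X, W):
    """Evaluate psi_2(W) in (93) as (1/n) * sum_m (x_m^T W x_m)^2,
    using the factorization of the quadruple sum into a square."""
    q = np.einsum('mi,ij,mj->m', X, W, X)    # x_m^T W x_m for each sample
    return np.mean(q ** 2)

def psi2_naive(X, W):
    """Direct quadruple sum in (93), feasible only for small p."""
    return np.mean(np.einsum('mi,ij,mj,mk,kl,ml->m', X, W, X, X, W, X))
```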

Equation (39) thus follows from (90), (91), (92) and (95), and the non-singularity of \mathbf {Q} when n \geq 2 .
