Wiener Filter Approximations Without Covariance Matrix Inversion

In this article, we address the problem of ill-conditioning of the Wiener filter, the optimal linear minimum mean square error estimator. Computing the Wiener filter involves the inverse of the observation covariance matrix. In practice, the observation covariance matrix has a large condition number, resulting in unreliable numerical computation of the Wiener filter. To address this issue, we develop four approximate Wiener filter formulas using a truncation technique based on the principal components of a composite covariance matrix. Our approximate filter formulas do not directly involve the inverse of the observation covariance matrix. As a result, our filters are well-conditioned and numerically reliable to compute. We also present an asymptotic analysis of our approximate filter formulas and show that they converge to the Wiener filter as certain approximating terms vanish. Using real data, we evaluate the performance of our filters in terms of accuracy and computation time against the Wiener filter. Our performance-computation tradeoff results show that, unlike the Wiener filter, our filters have stable performance without significantly more computation, even when the covariance matrix is ill-conditioned.


I. INTRODUCTION
Estimating a signal of interest based on an observable related signal is of crucial importance. An optimal estimator or filter (we use these terms interchangeably) is an algorithm that processes the observable signal to yield an estimate such that a certain error criterion between the estimated and desired signal is minimized. The Wiener filter is the optimal linear minimum mean square error (LMMSE) estimator. However, computing the Wiener filter involves inverting the observation covariance matrix. In practice, the dimension of the observation signal might be large, or the number of available observation samples might be small relative to the dimension of the observation signal. This results in the observation covariance matrix having a large condition number. Computing Wiener filters with ill-conditioned covariance matrices may be numerically unreliable.
We can circumvent the problem of ill-conditioning by approximating the Wiener filter. One approach is by reconditioning the covariance matrix using various well-known techniques: ridge regression [2], the minimum eigenvalue method [3], etc. Another approximation approach is to constrain the rank of the Wiener filter to reduce the computation complexity and enhance the performance [4], [5], [6], [7], [8], [9], [10], [11]. However, this does not address the issue of ill-conditioning [12]. More recently, [12] described an alternative approach that explicitly addresses the ill-conditioning instead of constraining the filter rank. The approach described in [12] involves "truncation" based on principal components of a composite covariance matrix, leading to two well-conditioned approximate Wiener filters.
To explore the possibility of a more computation-efficient filter without significantly sacrificing performance, in this article, we similarly use truncation to formulate our first two approximate Wiener filters. We approximate our filter formulas even further using the Neumann series expansion and eliminate matrix inversion altogether. The resulting approximate filters are well-conditioned and therefore numerically reliable to compute. The approximations developed here enable a tradeoff between computation and accuracy. We analyze the asymptotic performance of our approximate filters and show that they converge to the Wiener filter as certain approximating terms vanish. To demonstrate the efficacy of our approximation formulas and illustrate the possible computation-accuracy tradeoff, we evaluate the performance of our filters using an empirical dataset: historical daily closing values of the CBOE Amazon VIX Index [13].
The number of principal components of the composite covariance matrix plays a crucial part in truncation and effectively determines the quality of the approximation. To determine the optimal number of principal components used for approximating the filter, [12] uses a line-search procedure to minimize the mean square error. In this article, we use the method of [14], based on results from random matrix theory (RMT), to determine the number of principal components of the covariance matrix. We compare the results for the number of principal components computed using the RMT-based method against the method of [12]. It turns out that the number of principal components, and effectively the performance of our filters, computed using both methods are comparable, while the RMT-based method is more efficient in terms of the computation time than the method of [12].
Even though the practical limitations caused by ill-conditioning are not new to signal processing, and similar problems are faced in other fields, such as statistics and machine learning, only a few known solutions are available, and even fewer solutions that directly address the issue at hand. In signal processing, ill-conditioned covariance matrices can produce filters exhibiting significant errors [15]. In statistics, linear least squares (LLS) problems are often ill-conditioned. In [16], the authors study the stability and accuracy of least squares approximations. In [17], the authors investigate the efficacy of a self-adaptive iterative algorithm for solving severe ill-conditioned LLS problems. In machine learning, ill-conditioning problems occur when dealing with convolutional neural networks [18]. Because the solution to Wiener filters closely relates to the above-mentioned problems, our method of developing well-conditioned filters will be useful for dealing with a variety of ill-conditioned problems.
Our contributions are summarized as follows: 1) To address the problem of ill-conditioning of the Wiener filter, we describe a method based on which we introduce four new approximate Wiener filter formulas that do not directly involve inverting the observation covariance matrix (Section III). We also exploit the peculiar distribution of eigenvalues observed in large covariance matrices using the method of principal component analysis. The approximations developed in this article are justified whenever inverting the observation covariance matrix is ill-conditioned. 2) We prove that the approximate formulas converge to the Wiener filter as certain approximating terms vanish. We also characterize the asymptotic scaling laws (Section IV). Our asymptotic analysis is based on elementary tools accessible to anyone familiar with the application of linear algebra to statistical signal processing problems and will help the reader not accustomed to these types of analyses to develop an intuitive understanding of how to make calculations more tractable without using any esoteric or heavy machinery. 3) We describe two methods for determining the optimal number of principal components used in the approximate filter formulas (Sections V-D). Furthermore, we evaluate how the performance of our approximate filters varies with the number of principal components used for truncation and characterize the tradeoff between the numerical unreliability of filter computation and power loss due to truncation. 4) We evaluate the performance of our approximate Wiener filter formulas using daily closing values of the CBOE Amazon VIX Index (Sections V-E). Our quantitative results with the empirical data demonstrate that the performance of our filters remains stable as we increase the dimension of the associated covariance matrix, while the performance of the Wiener filter deteriorates significantly as expected.

II. THE WIENER FILTER
We wish to estimate a zero-mean random signal X ∈ C M based on a related zero-mean random observation Y ∈ C N , where M, N ∈ N.
We use E[·] as either the expectation or empirical mean from data, · as the standard complex 2-norm, and A as a matrix representing a linear filter or estimator. Then the LMMSE estimation problem can be stated as follows: The optimal solution, called the Wiener filter, is given as where C YY = E[YY ] ∈ C N×N is the covariance matrix of the observation Y , and C XY = E[XY ] ∈ C M×N is the crosscovariance matrix of the desired signal X and the observation Y . Throughout this article, we use the superscript prime to denote the Hermitian transpose. Even though the Wiener filter solves the LMMSE estimation problem, the numerical computation of A W is unreliable because it involves the inverse C −1 YY . If the condition number of C YY -the ratio of its largest eigenvalue to its smallest eigenvalue-is large, then the computation of C −1 YY is numerically unreliable. Notice that we do not need to compute the inverse explicitly. We can simply solve (for A W ) the normal equation A W C YY = C XY . Unfortunately, irrespective of the way we compute A W , the reliability of a numerical solution for A W depends on the condition number of C YY . A matrix is ill-conditioned if it has a large condition number; otherwise, it is well-conditioned. Our primary motivation is to present a solution to the LMMSE estimation problem that is reliable to compute regardless of the condition number of C YY . Now, there are two major factors that cause C YY to have a large condition number. In many practical problems in signal processing, finance, physics, etc., the dimensions of the observation vectors and their corresponding covariance matrices are very large. According to [19], [20], the mean condition number of a random covariance matrix increases with its size. The second reason, which is closely related to the first, is the ratio of the total number of observation samples to the dimension of the observation vector is small (e.g., because the number of observation samples is limited). In such cases, the few largest eigenvalues of the covariance matrix are significantly larger than the remaining eigenvalues [21], [22], [23], [24], [25], [26]. This results in the condition number of the covariance matrix being large as well. Empirical evidence shows that the condition numbers of covariance matrices can easily exceed 10 4 [12], [27]. In practice, these ill-conditioned covariance matrices can cause significant errors when used to solve linear equations.
To address this problem of ill-conditioning, in the next section, we describe approximate Wiener filter formulas that do not directly involve the inverse C −1 YY . These approximations are justified whenever inverting C YY is ill-conditioned. We introduce multiple filter formulas with varying computational requirements to allow trading off performance for computation. Following that, we show the convergence of these approximate filters to the Wiener filter in terms of their asymptotic scaling laws. Then, we describe two methods for choosing the best approximations to optimally trade off between computation and accuracy. Finally, we evaluate the performance of these filters using empirical data.

III. WIENER FILTER APPROXIMATIONS
To describe our method for approximating the Wiener filter, we start with the method of [12]. First, define Z ∈ C (M+N ) to be the concatenation of the vectors X and Y : Let C ZZ ∈ C (M+N )×(M+N ) be its covariance. Then, the signal X and the observation Y share the composite covariance matrix Define the eigendecomposition is the orthogonal eigenvector matrix composed of eigenvectors of C ZZ as its columns, and The matrices V and S depend jointly on C XX , C YY , and C XY . Next, partition V into V X (top M rows) and V Y (bottom N rows). The following identities are easy to verify: The eigendecomposition, or singular-value decomposition (SVD), here is different from the ones usually associated with Wiener filters. For example, in deriving a low-rank version of the Wiener filter, [4] uses the SVD of C XY C −1/2 YY ; see also [7], [8], and [10].

A. TRUNCATION APPROXIMATION
In many real-world problems, such as stock price prediction and direction-of-arrival estimation, we have to deal with large covariance matrices [21], [22], [23], [24], [25], [26], [27]. In these types of problems, the dimension N of Y can be as large as one thousand or even more. The corresponding covariance matrix C YY is of size N × N and can be transformed using various techniques. The technique of eigendecomposition for transforming a large covariance matrix is of particular interest because the covariance matrix is symmetric by definition, and for a large N, the eigenvalues of covariance matrices vary widely in magnitude. For example, in financial covariance matrices, the first few eigenvalues are often well-separated from the rest of the eigenvalues and account for the bulk of the information, while the remaining eigenvalues decay rapidly toward zero [21], [22], [23], [24], [25], [26], [27]. As the dimension N of Y increases, the proportion of the number of eigenvalues corresponding to this bulk of the information decreases. As the separation between the first and the last eigenvalue increases, so does the ill-conditioning of the corresponding covariance matrix C YY . In such situations, computing a numerical solution in a linear system involving the ill-conditioned C YY becomes unreliable. However, as only the first few eigenvalues carry most of the information, we can safely approximate the relatively small eigenvalues of C YY as 0. To elaborate, suppose the first L ≤ N eigenvalues account for the bulk of the eigenvalues of C YY ; we call these the principal eigenvalues. Next, partition V X , V Y , and S as where V X,L ∈ C M×L , V Y,L ∈ C N×L , S L ∈ C L×L , and O is the all-zero matrix (with dimensions shown in the subscript). Since these L principal eigenvalues constitute the bulk of all the eigenvalues of C YY , we can safely approximate S L by O (M+N−L)×(M+N−L) , called truncation (see, e.g., [7], [12], [27]). Equivalently, we approximate the decomposition by Consider the following transform of Z: Then, the inverse transform is Z = VZ, giving X = V XZ and Y = V YZ . We now apply the truncation approximation to get the approximate formulasZ and Combining (8) and (9), we get This suggests the following filter: Notice that C −1 YY or its eigenvalues do not appear in (11). Next, we combine the Neumann series expansion This leads to an alternative filter formula: Though (11) and (13) are equivalent formulas, their computational burden might differ. We investigate this difference later. The filters A 1 and A 2 are similar to the LSJPC filter in [12], which was derived using a similar truncation operation and is wellconditioned. The LSJPC filter was shown to have stable performance, even when the covariance matrix is ill-conditioned [12]. This indicates propitious performance for A 1 and A 2 .

B. INVERSE APPROXIMATION
Though we do not limit ourselves to any particular matrix norm, we assume that the matrix norm is submultiplicative. Many matrix norms, such as the 1-norm, the infinity-norm, and the Frobenius norm, have the submultiplicative property. However, there are some other matrix norms, such as the max norm, that do not possess the submultiplicative property. Furthermore, we assume that I M = 1 (otherwise we need to normalize by I M ). Considering V X,L V X,L + V X,L V X,L = I M , we see that V X,L V X,L < 1. This means that V X,L V X,L k can be made arbitrarily small with sufficiently large k. Hence, we can approximate the Neumann series expansion by a finite sum: Applying this approximation to (11), we get the filter Similarly, from (13), we get Finally, note that computing the estimateX = A 3 Y based on (16) can be done iteratively usinĝ Because V X,L V X,L < 1, (18) is a contractive fixed-point iteration, guaranteeing convergence ofX to (11) as K → ∞. Similarly, computing the estimate A 4 Y based on (17) can be done iteratively using We use the notation A 3 (K ) (or A 4 (K )) to indicate the Kth Neumann-approximation order for the filter A 3 (or A 4 , respectively). The special case of K = 0 gives the simple formula V X,L V Y,L , which is difficult to imagine simplifying any further.
In the cases where C YY is ill-conditioned and its eigenvalues decrease very quickly, it is possible to find an appropriate L for truncation. Similar truncations have been used in low-rank versions of the Wiener filter [4], [5], [7], [8], [10], [12], [28]. Naturally, in choosing L there is a tradeoff between the quality of the approximation and the computational burden. Therefore, there is no universally applicable value for L. Indeed, this tradeoff affords flexibility in implementing filters under different computational constraints.
It is not yet entirely clear how well our approximate filters perform in practice. Our derivations for the approximate filters described above are found in simple heuristics. To ensure the viability of these filters, we need to carefully analyze and evaluate them further. In Section IV, we show how these approximations converge to A W as the truncation and inverse approximations become exact. Later, in Section V, we evaluate these formulas using empirical data.
Our approach to addressing the problem of ill-conditioning involves dimension reduction. An added benefit of dimension reduction is that we can expect the computational burden of applying the filters to be reduced relative to the Wiener filter. Our empirical results in Sections V-E corroborate this expectation.

IV. ASYMPTOTIC PERFORMANCE
We now show how each of the approximation formulas derived in Section III converges to A W using the Bachmann-Landau notation O(·). To explain, consider a matrix M(ρ ) that depends on some parameter ρ → 0. If, for some c, M(ρ ) ≤ cρ for sufficiently small ρ, then we write M(ρ ) = O(ρ ). If C and D are bounded as ρ → 0, then the following algebraic rules help us to simplify the calculations: Whenever we use the truncation operation for some values of L and K, we essentially lose some of the power in the observed signal Y . We measure this truncation-power loss as ρ L . The amount of power lost due to truncation is small by design. The parameter ρ L is a measure of this amount of power being discarded. The exact expression for the power being discarded depends on the filter. For A 1

and
The truncation approximation relies on S L ≈ 0. Moreover, the inverse approximation relies on either V X,L V X,L ≈ 0 or V X,L V X,L ≈ 0. Thus, in our analysis, we consider S L and V X to be variables such that S L → 0, V X,L V X,L → 0, and V X,L V X,L → 0. We treat S L as fixed. Note that V X and V Y , and their submatrices V X,L and V Y,L , are bounded as V X,L V X,L → 0 and V X,L V X,L → 0. This is relevant for applying the Bachmann-Landau rules.
Theorem 1: Given a positive integer L ≤ N, Proof: We first write Therefore, using the Bachmann-Landau rules, which completes the proof. Corollary 1: Given a positive integer L ≤ N, Theorem 2: Given positive integers L ≤ N and K, Proof: Using the Neumann series expansion Combining this with (23), we get the desired result. Corollary 2: Given positive integers L ≤ N and K, The results above, stated in terms of the Bachmann-Landau notation, provide only an asymptotic characterization of the difference between each filter and A W , not an exact expression for it. However, inspecting the calculations above gives some idea of how impractical an exact analysis would be.

V. RESULTS AND DISCUSSION
We now compare, in terms of accuracy and computation time with real data, the performance of the different approximate filters developed in Section III relative to the Wiener filter. Our performance-complexity evaluation has two parts. First, we test the approximate filters as we vary the conditioning of the observation covariance matrix. Second, we quantify the price paid for well conditioning of A 1 , A 2 , A 3 , and A 4 in the preprocessing steps relative to the Wiener filter. We expect the performance of the Wiener filter to deteriorate as the conditioning of the associated observation covariance matrix worsens while the performance of the approximate filters does not deteriorate. Further, we expect the approximate filters to be computationally more efficient than the Wiener filter because of the reduced dimensions involved. We also describe and compare two methods for selecting the optimal filter orders. The results from the computational comparison elucidate how the complexities of the approximate filters scale with different dimensions of the associated covariance matrices and approximation orders. Although some of the specific numerical values are data-dependent, they illustrate the general qualitative features of our methods.

A. DATASET
To demonstrate the performance-complexity tradeoff of each of our approximate filters relative to the Wiener filter, we use the CBOE Amazon VIX index (VXAZN) daily closing values. The historical VXAZN data used in this article is freely available at [13]. To check for stationarity in time of the VXAZN data, following the method in [29], [30], we implemented the augmented Dickey-Fuller (ADF) test [31] for up to 40 lags. The ADF test checks for the null hypothesis that the process follows a unit root, i.e., it is non-stationary. Our ADF test results for the VXAZN data, produced using the Hypothe-sisTests package in Julia [32], strongly reject the null hypothesis with a maximum p-value of 0.0006. Therefore, we can safely assume that the VXAZN data sequence is stationary.

B. DATA PROCESSING
For the purpose of our estimation problem, we use a sequence of 2974 daily closing values of the VXAZN dataset, starting on January 7, 2011, and ending on November 4, 2022. Our estimation problem can be stated as follows: given the VXAZN values for N consecutive days, predict the values for M consecutive days immediately after. To do this, we consider vector-valued samples of M + N consecutive days. We get a total of T = 2974−(M +N )+1 such vector-valued samples, each one shifted by one lag from the previous one. We center the data by subtracting the empirical mean of the data matrix from each sample. We then construct a data matrix where W (i) represents the centered closing value of the VXAZN dataset for the day i. Because the dataset is stationary, the columns of W have a common (M + N )-variate mean and covariance. Then, we compute the sample covariance matrix C WW as We use 80% of the data samples for computing the sample covariance matrix and reserve the remaining 20% for conducting out-of-sample test experiments. We obtain the 80% above randomly from the data matrix.

C. ILL-CONDITIONING OF C YY
Our aim for the performance-complexity analysis is to test how the different approximate filters derived in Section III perform relative to the Wiener filter when the conditioning of the observation covariance matrix is varied. To test this, we fix M = 7 and vary N from 100 to 1300. Because the total number of data values is fixed (2974 in our case), as we increase the dimension of our vector-valued data sample, the total number of available data samples T decreases. We use γ = (M + N )/T to denote their ratio. As the dimension of the data sample approaches the total number of available data samples (i.e., γ → 1), the resulting covariance matrix becomes more ill-conditioned. Fig. 1 shows how the condition number of C YY changes as γ → 1. We can see that, as γ rises from 0.04 to 0.97, the resulting condition number of C YY increases by more than three orders of magnitude. This allows performance evaluation across many scenarios.
To keep the performance metric directly comparable as we vary N, we evaluate the performance according to the normalized rootmean-square error (nRMSE), calculated as is the M-subvector corresponding to X , and W is the average M-vector.
Better performance values are closer to 0.

D. BEST L
The performance of A 1 , A 2 , A 3 and A 4 varies with the number of principal components L used for truncation. To illustrate the effect of different values of L on the performance, Fig. 2 depicts the performance of A 3 (5) for different values of N as we vary L.
Here, the thickness of each column rung for different values of N indicates the performance in terms of nRMSE, with the maximum thickness of a column rung showing an nRMSE value of 0.28. We can see that the performance of A 3 (5) first improves as we increase L and then deteriorates as we keep increasing L. Clearly, there is some optimal value of L for which the filter performs best. This optimal value is governed by the tradeoff between the numerical unreliability of filter computation and the truncation-power loss as we increase L. The filter performance is more sensitive to this optimal value of L for smaller values of N, but for larger values of N, we can vary L by a relatively wide margin without affecting its performance by much. The performance of A 3 and A 4 also varies with the Neumann-approximation order. In our experiments, their performance is essentially constant beyond K = 5. Even when we change the dimension of M, K = 5 is the smallest Neumannapproximation order of A 3 and A 4 that matches the best performance of A 1 and A 2 , beyond which there is little to separate between their performance. Next, we describe two methods for computing the best L value to get the optimal filter performance.

1) LINE-SEARCH METHOD
Our objective function value, the mean square error, for any given A can also be expressed as Following the method in [12], we can find a suitable value of L for any filter A using a simple line-search (LS) procedure [28] together with the mean-square-error formula in (32). The LS method for choosing the best L is computationally costly because we need to compute A multiple times for different values of L.

2) MARCHENKO-PASTUR METHOD
Following the method in [14], we can find the best L from the data matrix without the explicit need for computing A. Recall that W ∈ R (M+N )×T is the mean-subtracted data matrix and C WW is its covariance matrix. Without loss of generality, we assume (M + N ) < T . If W is a random matrix, i.e., W has independent and identically distributed (i.i.d.) entries with mean 0 and variance σ 2 , we can view the eigenvalues λ 1 , . . . , λ M+N of C WW as random variables. Then, in agreement with an asymptotic universal law from RMT, the eigenvalues of C WW have the Marchenko-Pastur (MP) probability density function [33], which is fully characterized by the dimensions of the data matrix ((M + N ) and T ) and the variance (σ 2 ). According to [34], if C WW has a non-zero number of eigenvalues that are well-separated from the rest of the eigenvalues, then we can infer that the entries of W are not i.i.d. According to [14], [34], [35], [36], [37], [38], if a data matrix can be synthesized by L < (M + N ) principal components, then the MP distribution still applies to the remaining M + N − L eigenvalues of its covariance matrix. In many real-world problems, the largest L eigenvalues, corresponding to the principal components of the covariance matrix, are well-separated from the remaining eigenvalues. The latter are packed together within the support of the MP density [34], [38], [39], [40], [41], [42], [43], [44], [45], [46]. For such covariance matrices, we can jointly estimate L and σ 2 by determining the minimum value of L such that the λ L+1 to λ M+N eigenvalues are well described by the MP distribution. Following the method in [14], we can easily find the minimum L that satisfies the inequality This method works because if at least one of the eigenvalues corresponding to the principal components is part of the eigenvalues grouped together in the support of the MP density, then the range (left-hand side of (33)) will exceed the mean (right-hand side of (33)). Once we identify the minimum L for which the test inequality in (33) is satisfied, we can substitute λ L+1 , . . . , λ M+N with 0 and approximate the Wiener filter using the formulas derived in Section III. Fig. 3 shows the best L values for A 1 and A 3 (5) as we vary N, calculated using the LS and MP methods. Because the LS method depends on the individual filter, we need to calculate the best L values for A 1 and A 3 (5) separately. In contrast, the MP method is independent of the filter and only depends on the data matrix. From  Fig. 3, we can see that the best L values for both the LS and MP methods are relatively close to each other.
To better differentiate these methods, we analyze the computation time required for each in calculating the best L values. Table 1 shows the computation times for the LS and MP methods for calculating  the best L value. We can see that the MP method is approximately three orders of magnitude faster than the LS method. As seen earlier in Fig. 3, the optimal L values computed using both of these methods are comparable. Therefore, henceforth we will use the MP method for computing the optimal L value without compromising the filter performance. Fig. 4 compares the performance of A 1 , A 3 (5), and A W in terms of nRMSE as N increases and C YY becomes more ill-conditioned (see  Fig. 1). For A 1 and A 3 (5), we used the best L values computed according to the MP method. All filters have comparable performance for smaller values of N, or until the total number of samples is more than double the dimension of the data vector. After N = 700, when the ill-conditioning of C YY starts to increase rapidly, the computation of A W becomes more unreliable. This is evidenced by the deteriorating performance of A W after N = 700. But the same is untrue for filters A 1 and A 3 (5). The nRMSE values for A 1 and A 3 (5) never exceed 0.15. This demonstrates that our filters have consistently stable performance even when C YY is ill-conditioned.

E. PERFORMANCE-COMPLEXITY TRADEOFF
We also implemented the recursive least squares (RLS) algorithm with a growing window, which is a well-known adaptive filter and employs an iterative approach for filter computation. However, the resulting RLS filter converges to A W and has the same deteriorating performance as the Wiener filter after N = 700, as illustrated in Fig. 4.
Note that, although there are matrices that involve inverses in computing filters A 1 and A 2 (namely I M − V X,L V X,L and I L − V X,L V X,L ), these matrices are well-conditioned. Table 2 compares the condition numbers of these matrices in computing filters A W , A 1 , and A 2 .
We can see that, as we increase N from 100 to 1300, the resulting condition number of C YY increases by more than three orders of magnitude while the condition numbers of matrices I M − V X,L V X,L and I L − V X,L V X,L remain relatively small, which shows that these matrices are well-conditioned.
Using the approximated filters A 1 to A 4 requires some preprocessing, which includes computing the eigendecomposition of the covariance matrix and selecting the best L using the LS or MP methods. Indeed, this preprocessing involves some computational costs. Fortunately, the preprocessing has to be done upfront only once, after which we can repeatedly use the formulas for A 1 to A 4 with very little extra computation. i.e., once we obtain V X,L and V Y,L , computing the approximate filters A 1 to A 4 is more efficient than computing A W due to the reduced dimensions of V X,L and V Y,L . Table 3 shows the computation times for A 1 with and without the preprocessing. We can see that the computation times for A 1 with the preprocessing are greater than that of A W , while the computation times for A 1 without the preprocessing are two orders of magnitude smaller than A W . These results show that A 1 is computationally more expensive than A W when we need to use the filters just once, but their marginal computational burden quickly diminishes over multiple filtering operations. In our experiments, it never takes more than 7 applications of A 1 to break even with A W . As shown later, the computation times for A 2 to A 4 are comparable to A 1 . Because the best L values are approximately 10% of the corresponding N values, computing the approximate filters post-processing is more efficient compared to A W . It is worth emphasizing that our primary purpose is to address the ill-conditioning of C YY . The computation times in Table 3 along with the nRMSE performance of the filters in Fig. 4 demonstrate that our filters are numerically reliable to compute without significantly increasing the computational burden (and even decreasing it in some use cases).
Finally, Table 4 compares the computation times for all four approximate filters as we increase N. Table 4 does not include the times to compute eigendecompositions and the MP method to select L. Even though A 1 and A 3 (5) are algebraically equivalent to A 2 and A 4 (5), respectively, they differ in terms of their computational burden. We can attribute this added computation to the extra floating point operations (FLOPs) needed to compute A 2 and A 4 (5) compared to their respective algebraic counterparts. The number of FLOPs for A 1 and A 3 (5) increases with M, while the number of FLOPs for A 2 and A 4 (5) increases with only L. For smaller values of N, the values of L and M are comparable. This is demonstrated by their comparable computation times for small values of N. As N increases, so does L, whereas M remains constant throughout. Therefore, for large values of N, the computation times for A 1 and A 3 (4) are much lower than their respective counterparts. This ordering would reverse whenever L < M. The perceptive reader will have noticed that we can easily estimate how the computation cost for the approximate filters would change as M increases by comparing the computation costs for A 1 and A 3 with those of A 2 and A 4 . The difference between A 1 and A 3 (5) remains relatively small for all values of N, as illustrated more clearly in Fig. 5. With additional order of approximation (meaning additional computation), A 3 becomes more like A 1 . But recall that computing A 1 involves an inversion of a matrix, albeit of a smaller size, whereas computing A 3 does not involve inversion of any kind. This tradeoff is beneficial in many cases.

VI. CONCLUSION
In this article, we derive four approximate Wiener filter formulas using a truncation technique based on the principal components of a composite covariance matrix. Our approximate filters do not directly involve the inverse of the covariance matrix. This results in the stable performance of our filters even as the covariance matrix becomes increasingly ill-conditioned, as demonstrated by our results with real data. These results show deteriorating performance for the Wiener filter, demonstrating its numerical unreliability for covariance matrices with large condition numbers. We describe two methods with varying complexity for optimally trading off computation for accuracy using our filters. We also show that the approximate filter formulas converge to the Wiener filter in terms of asymptotic scaling laws. The numerical comparison of computation times demonstrates that the computation of our filters comes at a small extra cost compared to the Wiener filter and is even more efficient in some use cases.
Finally, it has not escaped our notice that our method of approximating Wiener filters in eigensubspaces spanned by the joint principal components (i.e., those of the composite covariance matrix) suggests a comparison with reduced-rank Wiener filters in Krylov subspaces composed by C YY and C YX [9], [10]. Their performance relative to our filters, as well as the two well-conditioned approximate filters derived using the joint principal components in [12], warrants further investigation.