Ensemble Principal Component Analysis

Efficient representations of data are essential for processing, exploration, and human understanding, and Principal Component Analysis (PCA) is one of the most common dimensionality reduction techniques used for the analysis of large, multivariate datasets today. Two well-known limitations of the method are sensitivity to outliers and noise, and the lack of a clear methodology for uncertainty quantification of the principal components or their associated explained variances. Whereas previous work has focused on each of these problems individually, we propose a scalable method called Ensemble PCA (EPCA) that addresses them simultaneously for data that has an inherently low-rank structure. EPCA combines bootstrapped PCA with $k$-means cluster analysis to handle challenges associated with sign ambiguity and the re-ordering of components in the PCA subsamples. EPCA provides a noise-resistant extension of PCA that lends itself naturally to uncertainty quantification. We test EPCA on data corrupted with white noise, sparse noise, and outliers against both classical PCA and Robust PCA (RPCA) and show that EPCA performs competitively across different noise scenarios, with a clear advantage on datasets containing outliers and an orders-of-magnitude reduction in computational cost compared to RPCA.


Introduction
Across the engineering, physical, biological, and social sciences, data science methods are a dominant paradigm for the processing, exploration, and understanding of emerging big data. Indeed, modern sensor technologies are revolutionizing the automated collection of large data sets, which must be processed in order to aid in human understanding and decision making. Critical for such interpretability is the generation of low-dimensional feature spaces, or dominant patterns, which can be extracted from high-dimensional and multivariate data streams [20,38]. One of the most successful methods for extracting dominant features in data is principal component analysis (PCA), which provides a characterization of dominant correlated activity using the underlying mathematical algorithm of the singular value decomposition (SVD). The SVD has been so prolific that it has been independently developed for dimensionality reduction and feature extraction across a number of fields, each with its own name, including proper orthogonal decomposition (POD) [3,4,25], empirical orthogonal functions (EOFs) [18,27], the Karhunen-Loève expansion [10,24], and the Hotelling transform [19]. Computational algorithms of the early 1970s [15] allowed for the robust and mature extraction of PCA modes from data, which has made PCA a classical data analysis technique. However, PCA is sensitive to noise and outliers, and the extraction of PCA modes does not come with any uncertainty quantification (UQ) metrics. We address both of these issues in our innovation of Ensemble PCA, which builds upon classic PCA in order to stabilize PCA models and provide clear UQ metrics for a given data set.
The motivations for dimensionality reduction are extensive. Often, the features of high-dimensional data exhibit partial redundancy and dependency. Beyond reducing redundancy, extracting the "most important" features of a data matrix that best summarize the information contained in a signal and removing irrelevant features helps practitioners both to work with limited computational resources and to understand the underlying data structure. Further, projecting high-dimensional data into 2D or 3D space can be extremely beneficial for human visualization. Too many features can complicate data analysis and visualization, and dimensionality reduction helps avoid these pitfalls. There are two main classes of dimensionality reduction techniques: linear and nonlinear. At their core, linear techniques use linear transformations to shift and stretch data. Examples include SVD, PCA, and Fisher's linear discriminant analysis (LDA). These methods are a cornerstone of analyzing data due to their simple geometric interpretations and typically attractive computational properties [7]. They are particularly useful when the data lies in a linear subspace and where the original variables are replaced by a smaller set of underlying variables. Nonlinear techniques involve more complicated data transformations and include t-distributed stochastic neighbor embedding (t-SNE) [36], Isomap [2], and autoencoders [16,37]. Nonlinear methods are generally more powerful than their linear counterparts, but can be slow to optimize, and many are non-deterministic, meaning they return different, locally optimal solutions on each run [32].
The focus of this work is linear dimensionality reduction, particularly PCA [21], as it is arguably the most common dimensionality reduction technique used for the analysis of large, multivariate datasets today. Innovations in randomized methods now allow PCA to work at massive scales [11,12,17]. Extensions of PCA include kernel PCA, which runs the algorithm on a feature space determined by the kernel [28]; probabilistic PCA [34], which is targeted at data with missing entries; and sparse PCA [41], which seeks to summarize data using combinations of only a few input variables.
Regardless of extension, PCA has two well-known shortcomings. The first is sensitivity to outliers, where entire rows of the data matrix may be contaminated, and sparse noise, where individual entries of the data matrix may be affected. The second drawback is that there is no clear methodology for UQ. Previous work has focused on each of these problems individually. Robust PCA extends PCA to the sparse noise domain [5], and bootstrapping has been explored as a method for estimating sampling variability [13]. To our knowledge, there has not been an extensive analysis of bootstrapping for the purpose of developing a noise-resistant PCA. Gabrys et al. [14] suggested that statistical resampling could be applied to PCA to recover the true components of datasets corrupted with large outliers, but did not test this concept on a sufficient number of datasets and did not address UQ. Drawing inspiration from [14], we propose Ensemble Principal Component Analysis (EPCA), which combines bootstrapping with k-means cluster analysis to create a noise-resistant approach. We test EPCA against RPCA and standard PCA on seven datasets corrupted with sparse noise, white noise, and outliers. We show that EPCA achieves the best performance on datasets with outliers and performs competitively on datasets with other types of noise. EPCA also naturally supports uncertainty quantification and provides an orders-of-magnitude reduction in computational cost compared to RPCA.

Related Work
PCA operates as follows. Let X ∈ R^{N×m} be a data matrix with N samples and m features. Define the mean-centered matrix as X̄ = X − µ_X, where each feature is centered by its mean. The sample covariance matrix C ∈ R^{m×m} of X is then defined as

C = (1 / (N − 1)) X̄ᵀ X̄.

Since C is symmetric, it can be diagonalized as

C = V Λ Vᵀ,

where V contains the eigenvectors of C and Λ is a diagonal matrix containing the eigenvalues λ_i on the diagonal, ordered by decreasing magnitude. The principal components of X are defined as X̄V. The principal component associated with the largest eigenvalue projects the data onto the direction of greatest variance, while the eigenvalue measures the amount of information that can be explained by that projection [35]. The d largest eigenvalues are referred to as the explained variances.
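As a concrete sketch, the covariance-eigendecomposition route described above takes only a few lines of NumPy. The synthetic data and the function name `pca_eig` are ours, for illustration:

```python
import numpy as np

def pca_eig(X, d):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)               # mean-center each feature
    C = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance, m x m
    lam, V = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1][:d]     # keep the d largest
    return Xc @ V[:, order], lam[order]   # principal components and explained variances

# Anisotropic synthetic data: feature 0 carries the most variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
scores, ev = pca_eig(X, 2)
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the sketch re-sorts them to take the d largest; the sample variance of the first column of `scores` equals the first explained variance.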
The results of PCA can be unreliable under data corruption. We identify three main classes of noise: white noise, sparse noise, and outliers. Under white noise, all entries of a data matrix X are slightly perturbed. Under sparse noise, a small subset of the elements of X are corrupted with some probability p, and with outliers, multiplicative noise is applied to a given percentage of rows of X. PCA will break down if even one entry of X is grossly corrupted [26].
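For illustration, the three corruption models can be sketched as follows. The function names are ours, and we take s as a fraction of rows rather than a percentage; the specific corruption values are arbitrary:

```python
import numpy as np

def add_white(X, v, rng, kind="gaussian"):
    """White noise: perturb every entry with mean-0, variance-v noise."""
    if kind == "gaussian":
        E = rng.normal(0.0, np.sqrt(v), size=X.shape)
    else:  # uniform on [-a, a] has variance a^2/3, so a = sqrt(3v)
        a = np.sqrt(3.0 * v)
        E = rng.uniform(-a, a, size=X.shape)
    return X + E

def add_sparse(X, p, c, rng):
    """Sparse noise: each entry is set to c with probability p."""
    Y = X.copy()
    Y[rng.random(X.shape) < p] = c
    return Y

def add_outliers(X, s, S, rng):
    """Outliers: multiply a random s-fraction of rows by the outlier scale S."""
    Y = X.copy()
    rows = rng.choice(X.shape[0], size=int(s * X.shape[0]), replace=False)
    Y[rows] *= S
    return Y
```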
The sensitivity of PCA has motivated the development of noise-resistant extensions. Directions of research include the use of robust estimators of scatter as opposed to sample covariance [35] and the development of Robust Principal Component Analysis (RPCA). RPCA sets the standard for dimensionality reduction under sparse noise, and is generally applied for foreground detection [5]. Mathematically, RPCA assumes we observe a data matrix X ∈ R^{N×m} of the form X = L_0 + S_0 + Z_0, where L_0 is low-rank, S_0 is sparse, and Z_0 is a noise term containing i.i.d. noise on each entry [6,39]. Generally, RPCA recovers L_0 and S_0 by solving the optimization problem

min_{L,S} ||L||_* + α ||S||_1,    (3)

subject to some constraint on the L and S matrices, such as X = L + S or ||X − L − S|| ≤ δ, where δ is a tuning parameter and the norm can be customized [5]. In Expression (3), ||·||_* denotes the nuclear norm, ||·||_1 denotes the sum of the absolute values of matrix entries, and α > 0 is a tuning parameter controlling the regularization of S. Expression (3) is referred to as the Principal Component Pursuit (PCP) problem, and a handful of methods have been suggested for solving PCP [5,26]. PCP can recover L and S exactly [6], but is limited to the case where the low-rank component is exactly low-rank and the sparse component is exactly sparse. In addition, existing algorithms for solving PCP are computationally expensive and require extensive parameter tuning [5]. Another set of extensions to PCA focuses on missing data, where entries of X are deleted at random. Most approaches specialized for dealing with missingness focus on data imputation [22,40]. However, missingness can be considered a sub-category of sparse noise, and methods such as RPCA continue to work in this context.
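For concreteness, a minimal sketch of one standard route to solving PCP is an augmented-Lagrange-multiplier iteration that alternates singular value thresholding on L with soft-thresholding on S. The default choices of α and µ below follow common conventions in the RPCA literature and are not taken from this paper; this is an illustrative sketch, not a production solver:

```python
import numpy as np

def shrink(M, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(X, alpha=None, n_iter=200, tol=1e-7):
    """Sketch of Principal Component Pursuit via an ALM-style iteration."""
    N, m = X.shape
    alpha = alpha or 1.0 / np.sqrt(max(N, m))      # common default in the literature
    mu = 0.25 * N * m / np.abs(X).sum()            # initial penalty parameter
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)          # update low-rank part
        S = shrink(X - L + Y / mu, alpha / mu)     # update sparse part
        Y = Y + mu * (X - L - S)                   # dual update on X = L + S
        mu = min(mu * 1.1, 1e7)                    # gradually tighten the penalty
        if np.linalg.norm(X - L - S) <= tol * np.linalg.norm(X):
            break
    return L, S
```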
Another drawback of PCA is that it offers no clear method for estimating the sampling variability of its descriptive outputs. Though some work has explored analytical, asymptotic confidence intervals (CIs) for principal components, the methods are often either computationally infeasible or require strong assumptions on the data [13]. An alternative approach is using bootstrap-based CIs. Bootstrapping is a way to assess the variability of a statistic of interest through random sampling with replacement. It does not assume any distribution for the estimates of the uncertainties, and can be applied to most statistics [1].
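A minimal percentile-method bootstrap, here applied to the sample mean for illustration, looks like this (the function name and parameter choices are ours):

```python
import numpy as np

def bootstrap_ci(data, stat, B=2000, level=0.95, seed=0):
    """Percentile-method bootstrap CI for an arbitrary statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # B resamples with replacement, each the same size as the data
    reps = np.array([stat(data[rng.integers(0, n, n)]) for _ in range(B)])
    q = 100 * (1 - level) / 2
    lo, hi = np.percentile(reps, [q, 100 - q])
    return lo, hi

x = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=500)
lo, hi = bootstrap_ci(x, np.mean)
```

Because no sampling distribution is assumed, the same routine applies unchanged to medians, variances, or any other statistic of the data.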
In the context of PCA, bootstrapping estimates the variability of PCA across different samples of the population [13]. From a theoretical standpoint, bootstrapping is generally not useful unless we are interested in inference concerning only a few large eigenvalues, which are well-separated from the bulk and of multiplicity one [23], i.e., data of inherently low rank. Though bootstrapping has been applied for the uncertainty quantification of PCA, very little work has been done using bootstrapping as a method for robustification.
Gabrys et al. [14] suggested using statistical resampling as a way to recover the principal components of a data matrix corrupted with outliers. Taking inspiration from their work, we propose EPCA, which combines bootstrapped PCA with k-means clustering to create a method for the dimensionality reduction and analysis of low-rank, noisy data that also lends itself to uncertainty quantification. By using k-means to aggregate the output of bootstrapped samples, we circumvent the challenges associated with component re-ordering and sign ambiguity. We test the performance of EPCA against classical PCA and Robust PCA, with respect to runtime and accuracy, on datasets corrupted with sparse noise, white noise, and outliers.

Ensemble Principal Component Analysis (EPCA)
Given a data matrix X ∈ R^{N×m}, EPCA samples B bags of size n at random with replacement. PCA is run on each of the B samples, the principal components are stored in a matrix P^(j), and the eigenvalues are stored in a matrix Λ^(j) for j ∈ [1, B]. The goal is to summarize the results of our B samples to output d dominant modes.
Two challenges are that there is rotational variability in the principal components found by PCA and that the identified components can be re-ordered in the subsamples [33]. Most bootstrapping PCA approaches tackle the first issue by using a Procrustean rotation to match the bootstrap PCs to the PCs obtained by running PCA on the entire dataset [29] or by rotating the PCs towards some pre-specified target matrix T [33]. Instead, we create the matrix

P = [P^(1); −P^(1); P^(2); −P^(2); …; P^(B); −P^(B)]

by stacking all of the principal components found in the bags and their reflections. In this way, every principal component is stored along with its reflection, regardless of initial orientation. We also stack the corresponding eigenvalues accordingly, creating

Λ = [Λ^(1); Λ^(1); Λ^(2); Λ^(2); …; Λ^(B); Λ^(B)].
The next step is to run k-means clustering on P to output 2d clusters. This approach automatically clusters components that are oriented in the same direction and avoids any challenges associated with re-ordering. We use the normalized cluster centers of our d rotationally unique clusters as our predicted principal components, and the averages of the eigenvalues associated with the members of each cluster as our predicted eigenvalues. We order our final predicted components according to the magnitude of the average predicted eigenvalues. EPCA is visualized in Figure 1 and explained in Algorithm 1.

Figure 1: Ensemble Principal Component Analysis (EPCA). Given a data matrix with inherently low-rank structure, we sample B bags of size n at random with replacement. We run PCA and store d principal components P^(j) and their corresponding eigenvalues Λ^(j) for each bag. We create P by stacking all components, along with their reflections, to account for rotational variability. We also stack all eigenvalues in Λ, in accordance with the order in P. Next, we run k-means clustering on P with 2d clusters.
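The whole EPCA pipeline can be sketched with NumPy and scikit-learn. The function name, the dot-product test used to discard a cluster's reflected twin, and the 0.9 threshold are our illustrative choices; the paper's implementation may differ in these details:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def epca(X, d, B=100, n=None, seed=0):
    """Sketch of EPCA: bootstrapped PCA aggregated via k-means over components and reflections."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    n = n or N
    comps, eigs = [], []
    for _ in range(B):
        bag = rng.integers(0, N, size=n)            # sample a bag with replacement
        fit = PCA(n_components=d).fit(X[bag])
        comps.append(fit.components_)               # d x m unit-norm directions
        eigs.append(fit.explained_variance_)        # d eigenvalues
    A = np.vstack(comps)
    lam = np.concatenate(eigs)
    P = np.vstack([A, -A])                          # stack components and their reflections
    lam2 = np.concatenate([lam, lam])
    km = KMeans(n_clusters=2 * d, n_init=10, random_state=seed).fit(P)
    means = [lam2[km.labels_ == k].mean() for k in range(2 * d)]
    out_pcs, out_eigs = [], []
    # visit clusters in descending mean-eigenvalue order; keep one of each +/- pair
    for k in np.argsort(means)[::-1]:
        c = km.cluster_centers_[k]
        c = c / np.linalg.norm(c)                   # normalized cluster center
        if any(abs(float(c @ u)) > 0.9 for u in out_pcs):
            continue                                # skip the reflected duplicate
        out_pcs.append(c)
        out_eigs.append(means[k])
    return np.array(out_pcs[:d]), np.array(out_eigs[:d])
```

Since each bag's PCA fit is independent of the others, the loop over bags is a natural target for parallelization.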

Uncertainty Quantification
Like all other bootstrapping PCA methods [1,13,29,33], EPCA lends itself naturally to uncertainty quantification. There are many approaches to estimate a confidence interval (CI) from the bootstrap distribution, but we focus on the percentile method [33]. Figures 2 and 3 show UQ for three runs of EPCA on a clean dataset and the same dataset containing 5% outliers of scale 10, respectively. We show the results of three runs, since the results of EPCA vary based on the random bootstrap samples. Figures 2a and 3a show the 95% CIs for the principal components. The tightness of our CIs correlates with the level of confidence in our output. We note that the CIs are significantly tighter for the clean data, and even though there is greater uncertainty on the corrupted data, EPCA is still able to closely identify the true components in two of the three pictured runs. Figures 2b and 3b show the distributions of the respective eigenvalues, which tell us the amount of variance explained by each of the principal components. On both the original and corrupt data, the interquartile ranges (IQRs) of the variances capture the true values, but on the corrupted data, the IQRs are skewed higher and slightly wider. Gross outliers have been removed from the boxplots in Figure 3b. Even though the explained variances are more difficult to capture on the outlier data, they at least tell us about the order of the principal components and their separation. In practice, the eigenvalues of the covariance matrix are not used in PCA's projection of data into a lower-dimensional space, so their explicit values are not as important as the principal components themselves.
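As a sketch, percentile CIs for a single component can be computed coordinate-wise from the bootstrap replicates once their signs are aligned to a reference direction; the alignment step and the function name are our illustrative choices:

```python
import numpy as np

def component_ci(boot_pcs, ref, level=0.95):
    """Percentile-method CIs for one principal component from bootstrap replicates."""
    # Resolve sign ambiguity: flip each replicate to align with the reference direction.
    aligned = boot_pcs * np.sign(boot_pcs @ ref)[:, None]
    q = 100 * (1 - level) / 2
    lo, hi = np.percentile(aligned, [q, 100 - q], axis=0)
    return lo, hi
```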

Experiments
We test EPCA against RPCA (solved using Augmented Lagrange Multipliers [26]) and classical PCA on seven datasets with four types of added noise: sparse, Gaussian white, uniform white, and outliers. We include three small, three medium, and one large dataset in our analysis. Our base datasets, in order of increasing size, are iris [31], wine [31], breast cancer Wisconsin (WBC) [31], a synthetic wave, the digits 0 and 1 from MNIST [8], and sea surface temperature (SST) [30].

Parameter Selection
The first type of noise is sparse noise, where each entry in the data matrix X is corrupted with probability p. The value of corrupted entries is set to c. As the number of features grows, the number of corrupted entries will also grow. The next two types of noise are white noise, where noise from either the uniform distribution with mean 0 and variance v or the Gaussian distribution with mean 0 and variance v is added to X. We ensure the variance is small with respect to the dominant singular value of X, as this is a level of corruption under which classical PCA should still perform well [9]. In practice, we set v = σ_1/f, where σ_1 is the dominant singular value of X and f is some variance divisor. Finally, we create outliers by multiplying a randomly selected s% of the rows of X by an outlier scale of S. We expect RPCA to perform best on sparse data, PCA to perform well even in the presence of low-variance white noise [9], and EPCA to perform best on outlier data.
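For illustration, tying the white-noise variance to the data scale as described, v = σ_1/f, might look like this (the data and divisor are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

# v = sigma_1 / f, where sigma_1 is the dominant singular value of X
sigma1 = np.linalg.svd(X, compute_uv=False)[0]
f = 1000.0
v = sigma1 / f

# Gaussian white noise with that variance
X_noisy = X + rng.normal(0.0, np.sqrt(v), size=X.shape)
```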
For all datasets, we perform EPCA with B = 100 bags, but vary the size n of the bags. We take n to be larger on datasets with white noise and sparse noise and smaller on datasets with outliers. Since white noise is applied to all entries of X and sparse noise will impact most entries as X becomes higher-dimensional, choosing larger bag sizes n helps "mute" the impact of the noise. Contrastingly, since outliers impact only s% of rows, choosing smaller bags helps prevent the selection of an outlier. In RPCA, the parameter controlling the extent of the regularization on the sparse part of X is set to α = 0.20. The remaining parameters for the Augmented Lagrange Multipliers algorithm used to solve RPCA are set to their default values.

Evaluation
Given an initial data matrix X, we calculate the true PCA modes using classical PCA. We then corrupt X, run PCA, RPCA, and EPCA on each of the corrupted datasets, and quantify the percent relative error for the components using Equation (6),

E = (||t − p|| / ||t||) × 100,    (6)

where t are the true components and p the predicted components. As we saw in Section 4, the eigenvalues of the covariance matrix in EPCA are skewed on noisy data. However, the eigenvalues are not involved in PCA's low-dimensional mapping, only in ordering the predicted components. Therefore, we consider only the error in the principal components in our experiments.
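A sketch of the error metric follows; resolving PCA's sign ambiguity by taking the smaller of the two orientations is our assumption, not something stated explicitly here:

```python
import numpy as np

def pct_rel_error(t, p):
    """Percent relative error between true and predicted components.
    A predicted component and its reflection are treated as equivalent."""
    err = min(np.linalg.norm(t - p), np.linalg.norm(t + p))
    return 100.0 * err / np.linalg.norm(t)
```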

Various Levels of Noise
Our first set of experiments addresses how PCA, EPCA, and RPCA respond to various levels of noise. For sparse noise, we vary both the probability p of an entry being corrupt and the scale c of the corruption. For both normal and uniform white noise, we vary the scale of the variance divisor f. Finally, for outliers, we vary both the percentage s of corrupt rows and the scale S. We take each of six datasets, excluding SST, and randomly add noise of a given level five times, resulting in 30 datasets per noise level. We do not include the SST dataset in this analysis, as RPCA is unable to produce a result within 120s of runtime.
Since the results of PCA and RPCA are deterministic, while the results of EPCA are stochastic due to the randomness in the bagging procedure, we run PCA and RPCA once and EPCA five times on each of the datasets. We average each method's respective relative errors together to get an average percent relative error for every level of corruption.

Fixed Levels of Noise
In the second set of experiments, we compare the performance of PCA, RPCA, and EPCA on datasets with fixed levels of corruption to test variability in performance. For sparse noise, we set the probability of an entry being corrupt to p = 0.01 and the scale of the corruption to c = 2; for both Gaussian and uniform white noise, we set the variance divisor to f = 1000, so that the variance is v = σ_1/1000; and for outliers, we corrupt s = 5 percent of rows with scale S = 5. For each of our seven datasets and each type of noise, we repeat random corruption and evaluation 100 times, resulting in 700 runs of each method per noise category. In our analysis, we consider only the output of a single run of EPCA. We return boxplots of percent relative error over the 100 trials. Once again, RPCA is unable to produce a result on the SST dataset before a timeout at 120s; therefore, RPCA's boxplots do not include performance metrics on SST. We also carry out a runtime comparison among the three methods.

Various Levels of Noise
Figure 4 compares the average performance of PCA, EPCA, and RPCA over datasets corrupted with various levels of sparse noise, determined by the probability p of an entry being corrupt and the scale c of the corruption. As we expect, RPCA performs best on sparse noise, most noticeably as the probability and scale of the noise increase. The exception is when the noise scale is set to 0, simulating missing data. However, it is possible that the RPCA regularization parameter α is simply not tuned optimally in this case. When p = 0.01, EPCA consistently performs second-best. When p = 0.05, EPCA performs second-best in terms of error in the first principal component and slightly worse than PCA on the second principal component. Finally, when p = 0.10, EPCA generally performs worst of the three methods. Ultimately, RPCA is the preferred choice on sparse data, but EPCA is a competitive choice when our noise probability is very small.

Figure 5 explores the performance of the three methods on data with added Gaussian and uniform white noise sampled from distributions with different variances v = σ_1/f. When the variance divisor f is very large, PCA performs best in terms of the average percent relative error in the first and second principal components. As the variance divisor becomes smaller, PCA, EPCA, and RPCA perform very similarly to one another, but PCA maintains its advantage.
Figure 6 displays the performance of PCA, EPCA, and RPCA on data corrupted with outliers at various percentages s and scales S. We observe that for both principal components, when the percentage of outliers is 15% or lower, EPCA consistently achieves the lowest average percent relative error, regardless of outlier scale. As the percentage of outliers increases, EPCA loses its advantage. We conclude that EPCA will generally outperform PCA and RPCA on outlier data, when the percentage of outliers remains small, regardless of the scale of those outliers.

Fixed Level of Noise
We investigate the performance of PCA, EPCA, and RPCA over datasets with fixed levels of corruption. Recall that all datasets are formed by adding one of four types of noise to seven base datasets 100 times. The first aspect of performance we consider is runtime. It is well known that the runtime of RPCA does not scale well with the size of the problem [5]. In contrast, since the runtime of EPCA is influenced by the number B and size n of bootstrapped samples, we can mitigate runtime challenges on larger datasets. As PCA is a subroutine of EPCA, we do not expect EPCA to outperform it.
For each of our seven corrupted datasets, we create boxplots of the spread of the runtime of each method, as seen in Figure 7. We summarize runtime for digits 0 and 1 in MNIST together. Across all datasets, PCA achieves the fastest runtime. On the smallest datasets, iris (X ∈ R^{140×4}), wine (X ∈ R^{178×13}), and WBC (X ∈ R^{569×30}), RPCA and EPCA run in time on the same order of magnitude, while PCA runs one to two orders of magnitude faster. On the medium-sized datasets wave (X ∈ R^{6000×200}) and MNIST 0 and 1 (X ∈ R^{5923×784}, X ∈ R^{6742×784}), RPCA runs two orders of magnitude slower than EPCA, but PCA and EPCA run on the same order of magnitude. Finally, on the largest SST data (X ∈ R^{1726×64800}), RPCA is unable to provide an output, timing out after 120s, while EPCA and PCA run on the same order of magnitude. We conclude that on small datasets, RPCA and EPCA have similar runtime. However, unlike RPCA, which is infeasible to run on larger datasets, our method EPCA scales no worse than classical PCA.
The second aspect of performance that we consider is percent relative error in the predicted first and second components. Figure 8 shows boxplots of error for each of the three methods for 100 runs of each type of data corruption over our seven datasets. Outliers in the boxplots have been removed for easier visualization. As expected, on datasets where sparse noise is added, RPCA is able to identify the true principal components with the lowest median percent relative errors and the least variability in its results, as evidenced by tighter interquartile ranges (IQRs). Recall that this performance comes at the cost of a much higher runtime. Though EPCA performs with significantly more error than RPCA, we note that EPCA achieves a similar median error to PCA, as well as a tighter IQR for error in the first PC and both a lower median error and slightly tighter IQR for the second PC.
For uniform white noise, classical PCA outperforms the other methods with both the smallest IQRs and lowest median errors for both principal components. EPCA performs second best in both categories. On data with normal white noise, PCA performs best and EPCA second-best in terms of median error in both PCs. RPCA has the tightest IQR for the first PC, and EPCA for the second.
Finally, on datasets containing outliers, EPCA outperforms the other methods, achieving a lower median percent relative error for both principal components. EPCA also has the tightest IQR for the first PC and the second-tightest for the second PC.

Conclusion
We propose EPCA as a scalable extension of PCA that operates well in the presence of various types of noise and lends itself naturally to uncertainty quantification. Our innovative ensembling of bootstrapping and k-means clustering allows us to automatically handle the challenges of principal component re-ordering and sign ambiguity in bootstrapping PCA. We test the performance of EPCA against RPCA, which is specialized for datasets corrupted with sparse noise, and classical PCA, which should operate well even in the presence of low-variance white noise. Further, we carry out a runtime analysis of all three methods on datasets of various sizes.

Figure 8: Summary of the performance of PCA, EPCA, and RPCA on noisy data following 100 trials of random corruption and evaluation on seven datasets. RPCA's boxplots do not include performance metrics from the SST dataset, due to timeout. (a) For sparse noise (p = 0.01, c = 2), RPCA outperforms the competitors with both lower and tighter IQRs for both principal components. EPCA performs second best with lower median error and tighter IQRs than PCA. (b) and (c) For data corrupted with uniform and Gaussian white noise (v = σ_1/1000), PCA performs best with tighter and lower IQRs. EPCA achieves the second lowest percent relative error on both PCs. (d) For outliers (s = 5, S = 5), EPCA outperforms the competitors with lower median errors for both principal components. EPCA also achieves the tightest IQR on the first PC and the second tightest on the second PC.
Overall, EPCA is unable to outperform RPCA on sparsely corrupted data or classical PCA on data with added white noise. However, EPCA performs second-best in both noise domains. On datasets containing gross outliers, EPCA significantly outperforms both PCA and RPCA and maintains this performance advantage, regardless of the scale of the outliers, as long as the percentage of outliers remains low. We conclude that although EPCA has a noise domain where it performs best, the method remains useful for all types of data corruption. An added bonus of using EPCA is that, unlike classical PCA or RPCA, confidence intervals for the components and the explained variance of those components can be computed naturally. Finally, as data size increases, EPCA is significantly faster than RPCA and scales no worse than PCA, making it particularly attractive for the analysis of larger datasets. As each PCA subroutine in EPCA is independent of the rest, the potential for the parallelization of EPCA should be explored as future work.

Algorithm 1: Ensemble Principal Component Analysis
(a) Mean center the data: X̄ = X − µ_X.
(b) Select B bags of size n with replacement to create X^(j) for j ∈ [1, B].
(c) Run PCA on each of the B bags to output d eigenvalues Λ^(j) and d principal components P^(j).
(d) Stack all P^(j) and their reflections to create P; stack the corresponding Λ^(j) to create Λ.
(e) Run k-means clustering on P with 2d clusters to group principal components oriented in the same direction.
(f) Output the d directionally unique average principal components, their corresponding average eigenvalues, and the variances of both.

Figure 2: Uncertainty quantification of three runs of EPCA on wave data. (a) The 95% confidence intervals (CIs) associated with the top two principal components. The CIs are tight, suggesting high confidence in the predicted components. (b) Boxplots of the distribution of the explained variance of each of the principal components. The true eigenvalues of the covariance matrix are contained within the interquartile ranges (IQRs) of all three trials.

Figure 5: Performance of PCA, EPCA, and RPCA on data with added Gaussian and uniform white noise sampled from distributions with different variances v = σ_1/f.

Figure 4: Average percent relative error of predicted principal components from PCA, EPCA, and RPCA for data corrupted with sparse noise of different probabilities and scales.

Figure 7: Runtime comparison of PCA, EPCA, and RPCA. For every corrupted dataset, we create boxplots of the spread of the runtime of 400 runs of each method. For MNIST, we summarize runtime for the digits 0 and 1 together. PCA consistently achieves the fastest runtime. On the smallest datasets, iris, wine, and WBC, RPCA and EPCA run in time on the same order of magnitude, while PCA runs one to two orders of magnitude faster. On the medium-sized datasets, wave and MNIST, RPCA runs two orders of magnitude slower than EPCA, but PCA and EPCA run on the same order of magnitude. Finally, on the largest SST data, RPCA is unable to provide an output, timing out after 120s, while EPCA and PCA run on the same order of magnitude.