Principal component analysis (PCA) is a dimensionality reduction technique widely used across science and engineering. It aims to find linear combinations of the input variables that maximize variance. A problem with PCA is that it typically assigns nonzero loadings to all the variables, which in high-dimensional problems can require a very large number of coefficients. In many applications, however, the aim is to obtain a massive reduction in the number of coefficients. There are two very different types of sparse PCA problems: sparse loadings PCA (slPCA), which zeros out individual loadings (while generally keeping all of the variables), and sparse variable PCA (svPCA), which zeros out whole variables (typically leaving fewer than half of them). In this paper, we propose a new svPCA method, which we call sparse variable noisy PCA (svnPCA). It is based on a statistical model, which gives access to a range of modeling and inferential tools. Estimation is based on optimizing a novel penalized log-likelihood that is able to zero out whole variables rather than just some loadings. The estimation algorithm is based on the geodesic steepest descent algorithm. Finally, we develop a novel form of the Bayesian information criterion (BIC) for tuning parameter selection. The svnPCA algorithm is applied to both simulated data and real functional magnetic resonance imaging (fMRI) data.
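The distinction between the two sparsity patterns can be illustrated with a minimal NumPy sketch. It is not the svnPCA estimator: simple hard thresholding stands in for the penalized log-likelihood, and the function names (`pca_loadings`, `threshold_entries`, `keep_top_rows`, `variables_used`), the threshold `0.1`, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def pca_loadings(X, k):
    """Top-k PCA loadings (p x k) from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def threshold_entries(W, tau):
    """slPCA-like pattern (illustrative): zero individual small loadings."""
    return np.where(np.abs(W) >= tau, W, 0.0)

def keep_top_rows(W, m):
    """svPCA-like pattern (illustrative): keep the m variables with the
    largest row norm and zero out every other row entirely."""
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-m:]
    out = np.zeros_like(W)
    out[keep] = W[keep]
    return out

def variables_used(W):
    """Number of variables (rows) with at least one nonzero loading."""
    return int(np.count_nonzero(np.any(W != 0, axis=1)))

# Hypothetical simulated data: only the first 4 of 10 variables carry signal.
rng = np.random.default_rng(0)
n, p, k = 200, 10, 2
Z = rng.normal(size=(n, k))
A = np.zeros((p, k))
A[:4] = 3.0 * rng.normal(size=(4, k))
X = Z @ A.T + rng.normal(size=(n, p))

W = pca_loadings(X, k)             # dense: every variable has nonzero loadings
W_sl = threshold_entries(W, 0.1)   # some entries zeroed, rows may survive
W_sv = keep_top_rows(W, 4)         # whole variables removed

print("variables used (PCA):     ", variables_used(W))
print("variables used (slPCA-ish):", variables_used(W_sl))
print("variables used (svPCA-ish):", variables_used(W_sv))
```

Entrywise thresholding can leave a variable in the model as long as one of its loadings survives, whereas row-wise selection removes the variable from every component at once, which is the kind of reduction svPCA methods target.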