Identifying relevant signals within high-dimensional observations is an important preprocessing step for efficient data analysis. However, many classical dimension reduction techniques, such as principal component analysis, do not take the often rich statistics of real-world data into account and therefore fail if, for example, the signal space has low power but is meaningful in terms of some other statistic. With “colored subspace analysis,” we propose a method for linear dimension reduction that evaluates the time structure of the multivariate observations. We differentiate the signal subspace from noise by searching for a subspace of nontrivially autocorrelated data. We prove that the resulting signal subspace is uniquely determined by the data, given that all white components have been removed. Algorithmically, we propose three efficient methods to perform this search: one based on joint diagonalization, one using a component clustering scheme, and one via joint low-rank approximation. In contrast to temporal mixture approaches from blind signal processing, we do not need a generative model, i.e., we do not require the existence of sources, so the model is applicable to any wide-sense stationary time series without restrictions. Moreover, since the method is based on second-order time structure, it can be efficiently implemented and applied even in large dimensions. Numerical examples, together with an application to dimension reduction of functional MRI recordings, demonstrate the usefulness of the proposed method. The implementation is publicly available as a Matlab package at http://cmb.helmholtz-muenchen.de/CSA.
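
As a rough illustration of separating autocorrelated ("colored") directions from white ones using only second-order time structure, the following Python/NumPy sketch whitens the data and eigendecomposes a single symmetrized lagged autocovariance matrix. It is a single-lag simplification chosen for brevity. It is not one of the three algorithms proposed here, and it is not the Matlab package. The function names, the choice of lag 1, and the synthetic AR(1) test data are all assumptions made for this example.

```python
import numpy as np


def lagged_autocov(z, lag):
    """Symmetrized autocovariance at one lag for zero-mean data z (dims x samples)."""
    n = z.shape[1] - lag
    c = z[:, :n] @ z[:, lag:].T / n
    return 0.5 * (c + c.T)


def colored_directions(x, k, lag=1):
    """Hypothetical helper: projection (k x dims) onto the k directions with the
    strongest autocorrelation at the given lag, plus all sorted magnitudes."""
    x = x - x.mean(axis=1, keepdims=True)
    # Whiten so that power differences no longer matter (assumes full-rank covariance).
    evals, evecs = np.linalg.eigh(np.cov(x))
    whitener = (evecs / np.sqrt(evals)).T
    z = whitener @ x
    # In whitened coordinates, white directions have lagged autocovariance near zero.
    w, v = np.linalg.eigh(lagged_autocov(z, lag))
    order = np.argsort(-np.abs(w))
    projection = v[:, order[:k]].T @ whitener
    return projection, np.abs(w)[order]


def demo_data(n_samples=5000, seed=0):
    """Synthetic data (assumption): one weak AR(1) signal mixed with two strong white noises."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n_samples)
    s = np.zeros(n_samples)
    for t in range(1, n_samples):
        s[t] = 0.9 * s[t - 1] + e[t]
    s *= 0.1 / s.std()                                  # low-power colored signal
    noise = 5.0 * rng.standard_normal((2, n_samples))   # high-power white noise
    mixing = rng.standard_normal((3, 3))
    return mixing @ np.vstack([s, noise]), s


x, s = demo_data()
projection, strengths = colored_directions(x, k=1)
y = projection @ (x - x.mean(axis=1, keepdims=True))
print("lag-1 autocorrelation magnitudes:", strengths)
print("|corr| with the hidden signal:", abs(np.corrcoef(y[0], s)[0, 1]))
```

In this toy setting the colored component carries far less variance than the white noise, so a variance-based criterion such as principal component analysis would favor the noise directions, whereas the lagged-autocovariance criterion picks out the autocorrelated one. A single lag can miss components whose autocorrelation happens to vanish at that lag, which is one reason to consider several lags jointly.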