The analysis of gene expression time series obtained from microarray experiments can be effectively exploited to understand a wide range of biological phenomena, from the homeostatic dynamics of cell cycle systems to the response of key genes to the onset of cancer or infectious disease. However, microarray data frequently contain a significant number of missing values, which makes it difficult to apply common multivariate analysis methods, all of which require complete expression matrices. To preserve the experimentally expensive non-missing data points in time series gene expression data, methods are needed that estimate the missing values in a way that preserves the latent interdependencies among time points within individual expression profiles. We therefore propose modeling gene expression profiles as simple linear and Gaussian dynamical systems and applying the Kalman filter to estimate missing values. Other current advanced estimation methods are either sensitive to parameters that have no theoretical means of selection or attempt to learn statically from inherently dynamical data; our approach is advantageous precisely because it makes minimal assumptions that are consistent with the biology. We demonstrate the efficiency of our approach by evaluating its performance in estimating artificially introduced missing values in two different time series data sets, and we compare it to a Bayesian approach that depends on the eigenvectors of the gene expression matrix as well as to a gene-wise average imputation of missing values.
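
To make the idea concrete, the following is a minimal sketch of Kalman-filter imputation for a single expression profile, set beside the gene-wise average baseline. It is not the authors' implementation: the scalar random-walk state model, the noise variances `q` and `r`, the initial variance, and the function names `kalman_impute` and `genewise_mean_impute` are all assumptions chosen for illustration. Missing values are represented as NaN.

```python
import math


def kalman_impute(profile, q=0.1, r=0.1, p0=1.0):
    """Fill NaN entries of one expression profile with Kalman filter predictions.

    Assumed model (illustrative, not taken from the source):
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (hidden expression level)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (measured expression level)
    """
    observed = [y for y in profile if not math.isnan(y)]
    if not observed:
        raise ValueError("profile has no observed values")

    x = observed[0]  # initial state estimate: first observed value (assumption)
    p = p0           # initial state variance (assumption)
    filled = []
    for y in profile:
        # Predict step: a random walk keeps the mean; the variance grows by q.
        p = p + q
        if math.isnan(y):
            # Missing measurement: skip the update and use the prediction.
            filled.append(x)
        else:
            # Update step: blend prediction and measurement via the Kalman gain.
            k = p / (p + r)
            x = x + k * (y - x)
            p = (1.0 - k) * p
            filled.append(y)  # observed values are left unchanged
    return filled


def genewise_mean_impute(profile):
    """Baseline: replace each NaN with the mean of the gene's observed values."""
    observed = [y for y in profile if not math.isnan(y)]
    if not observed:
        raise ValueError("profile has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(y) else y for y in profile]


if __name__ == "__main__":
    nan = float("nan")
    profile = [2.0, 4.0, nan, 5.0, nan]  # made-up values for illustration
    print(kalman_impute(profile))
    print(genewise_mean_impute(profile))
```

The filter carries information forward in time, so each estimate depends on the preceding time points of the same profile, whereas the average baseline ignores temporal order. In practice the model parameters would be estimated from the data rather than fixed by hand, and a smoother could additionally use later time points.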