Microarray experiments often produce data sets with multiple missing expression values, usually because of experimental problems. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. Therefore, effective missing value estimation methods are essential to minimize the effect of incomplete data sets on analysis and to increase the range of data sets to which these algorithms can be applied. In this regard, a new interpolation-based imputation method is proposed to predict missing values in microarray gene expression data. For each missing position, the proposed method selects a subset of similar genes and a subset of similar samples and then applies interpolation in a novel manner to predict the missing value. Its performance is compared, in terms of normalized root mean square error, with that of existing estimation techniques including K-nearest neighbor (KNN), sequential K-nearest neighbor (SKNN), and iterative K-nearest neighbor (IKNN). The effectiveness of the proposed method, along with a comparison with existing methods, is demonstrated on different microarray data sets.
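To make the evaluation setting concrete, the following is a minimal sketch of a plain KNN imputation baseline and a normalized root mean square error score. It is not the proposed interpolation method, whose details the abstract does not give. The function names `nrmse` and `knn_impute`, the inverse-distance weighting, and the choice to normalize by the standard deviation of the true values are all assumptions made for illustration; other conventions exist.

```python
import numpy as np


def nrmse(true_vals, est_vals):
    """RMS error divided by the standard deviation of the true values.

    Assumption: this is one common normalization; the abstract does not
    say which one the authors use.
    """
    true_vals = np.asarray(true_vals, dtype=float)
    est_vals = np.asarray(est_vals, dtype=float)
    rmse = np.sqrt(np.mean((est_vals - true_vals) ** 2))
    return rmse / np.std(true_vals)


def knn_impute(X, k=3):
    """Fill NaN entries of a genes x samples matrix from the k nearest genes.

    Illustrative baseline only: distance is the RMS difference over the
    samples observed in both genes, and neighbors are averaged with
    inverse-distance weights.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            dists = []
            for g in range(X.shape[0]):
                # A neighbor must be a different gene with a value in sample j.
                if g == i or missing[g, j]:
                    continue
                shared = ~missing[i] & ~missing[g]
                if shared.any():
                    d = np.sqrt(np.mean((X[i, shared] - X[g, shared]) ** 2))
                    dists.append((d, g))
            if not dists:
                continue  # no usable neighbor: leave the entry as NaN
            dists.sort()
            nearest = dists[:k]
            w = np.array([1.0 / (d + 1e-12) for d, _ in nearest])
            v = np.array([X[g, j] for _, g in nearest])
            filled[i, j] = np.sum(w * v) / np.sum(w)
    return filled


if __name__ == "__main__":
    # Synthetic data, used only to exercise the two functions.
    rng = np.random.default_rng(0)
    complete = rng.normal(size=(20, 6))
    masked = complete.copy()
    holes = [(0, 1), (3, 4), (7, 2), (12, 5)]  # artificially removed entries
    for r, c in holes:
        masked[r, c] = np.nan
    estimate = knn_impute(masked, k=5)
    truth = [complete[r, c] for r, c in holes]
    guess = [estimate[r, c] for r, c in holes]
    print("NRMSE on removed entries:", nrmse(truth, guess))
```

The usual protocol, as sketched in the usage block, is to remove known entries from a complete matrix, impute them, and score only the removed positions, so that methods can be compared on the same set of artificial gaps.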
Published in:
Communications, Devices and Intelligent Systems (CODIS), 2012 International Conference on
Date of Conference: 28-29 Dec. 2012