Skip to Main Content
Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called Â¿imputation treeÂ¿ (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called Â¿missing pattern treeÂ¿ (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness.