Clustering-based Missing Value Imputation for Data Preprocessing

5 Author(s)
Zhang, C. ; Fac. of Inf. Technol., Univ. of Technol. Sydney, Broadway, NSW ; Yongsong Qin ; Xiaofeng Zhu ; Jilian Zhang

Missing value imputation is a topical yet challenging problem in machine learning and data mining. It is a procedure that replaces the missing values in a dataset with plausible values, which are generally generated from the dataset using a deterministic or random method. In this paper we propose a new and efficient missing value imputation method based on data clustering, called CRI (clustering-based random imputation). In our approach, we fill in the missing values of an instance with plausible values generated from data similar to that instance, using a kernel-based random method. Specifically, we first divide the dataset (excluding instances with missing values) into clusters. Each instance with missing values is then assigned to the cluster most similar to it. Finally, the missing values of an instance A are filled in with plausible values generated by applying a kernel-based method to the instances in A's cluster. Our experiments (some of them with the decision tree induction system C5.0) demonstrate the effectiveness of the proposed method for the missing value imputation task.
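
The three steps described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering algorithm, the similarity measure, or the kernel, so the sketch assumes k-means with farthest-first initialisation, squared Euclidean distance over the observed attributes, and a Gaussian kernel whose weights drive a random donor draw. The function names (`sq_dist`, `kmeans`, `cluster_random_impute`) and the parameters `k`, `bandwidth`, and `seed` are hypothetical.

```python
import math
import random


def sq_dist(a, b, idx):
    """Squared Euclidean distance between rows a and b over feature indices idx."""
    return sum((a[j] - b[j]) ** 2 for j in idx)


def kmeans(rows, k, n_iter=20):
    """Plain Lloyd's k-means on complete rows; returns (centroids, labels).

    Assumption: the paper's clustering algorithm is not given in the abstract.
    """
    dims = range(len(rows[0]))
    # Farthest-first initialisation: deterministic and spread out.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c, dims) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c], dims))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_random_impute(data, k=2, bandwidth=1.0, seed=0):
    """Return a copy of data with None entries filled from a random donor.

    The donor is drawn, with Gaussian-kernel weights (an assumed kernel),
    from the complete rows of the cluster nearest to the incomplete row.
    """
    rng = random.Random(seed)
    # Step 1: cluster only the rows without missing values.
    complete = [r for r in data if None not in r]
    centroids, labels = kmeans(complete, k)
    nonempty = sorted(set(labels))
    filled = [list(r) for r in data]
    for row in filled:
        if None not in row:
            continue
        obs = [j for j, v in enumerate(row) if v is not None]
        # Step 2: nearest non-empty cluster, measured on observed features only.
        best = min(nonempty, key=lambda c: sq_dist(row, centroids[c], obs))
        donors = [r for r, lab in zip(complete, labels) if lab == best]
        # Step 3: kernel weights over the donors, then one random draw.
        d2 = [sq_dist(row, d, obs) for d in donors]
        d2_min = min(d2)  # shift for numerical stability
        weights = [math.exp(-(d - d2_min) / (2 * bandwidth ** 2)) for d in d2]
        donor = rng.choices(donors, weights=weights, k=1)[0]
        for j, v in enumerate(row):
            if v is None:
                row[j] = donor[j]
    return filled


# Two well-separated groups; the last two rows each miss their second value.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [10.0, 10.1], [10.2, 9.9], [9.9, 10.0],
        [0.1, None], [10.1, None]]
imputed = cluster_random_impute(data, k=2, seed=0)
```

In this sketch each missing value is copied from a single donor row in the same cluster, so an imputed value is always one that actually occurs in the data. A smaller `bandwidth` concentrates the draw on the donors closest to the incomplete instance, while a larger one approaches uniform sampling within the cluster.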

Published in:

Industrial Informatics, 2006 IEEE International Conference on

Date of Conference:

16-18 Aug. 2006