By Topic

Exploiting unlabeled data for improving accuracy of predictive data mining

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

5 Author(s)
Kang Peng ; Center for Inf. Sci. & Technol., Temple Univ., Philadelphia, PA, USA ; Vucetic, S. ; Han, B. ; Hongbo Xie
more authors

Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. We show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disordered protein database.

Published in:

Data Mining, 2003. ICDM 2003. Third IEEE International Conference on

Date of Conference:

19-22 Nov. 2003