A new method for mining regression classes in large data sets

3 Author(s)
Yee Leung (Dept. of Geogr., Chinese Univ. of Hong Kong, Shatin, China); Jiang-Hong Ma; Wen-Xiu Zhang

Extracting patterns and models of interest from large databases is attracting much attention in a variety of disciplines. Knowledge discovery in databases (KDD) and data mining (DM) are areas of common interest to researchers in machine learning, pattern recognition, statistics, artificial intelligence, and high-performance computing. An effective and robust method, the regression class mixture decomposition (RCMD) method, is proposed for mining regression classes in large data sets, especially those contaminated by noise. The concept of a “regression class”, defined as a subset of the data set that is subject to a regression model, is proposed as the basic building block of the mining process. A large data set is treated as a mixture population containing many such regression classes, together with other data not accounted for by the regression models. Iterative and genetic-based algorithms are also constructed for optimizing the objective function of the RCMD method. It is demonstrated that the RCMD method can resist a very large proportion of noisy data, identify each regression class, assign to each identified regression class an inlier set of supporting data points, and determine the a priori unknown number of statistically valid models in the data set. Although the models are extracted sequentially, the final result is almost independent of the extraction order, owing to a dynamic classification strategy for handling overlapping regression classes. The effectiveness and robustness of the RCMD method are substantiated by a set of simulation experiments and a real-life application, which show how it can be used to fit mixed data to linear regression classes and nonlinear structures in various situations.
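
The abstract does not state the RCMD objective function or its optimization algorithms, so the following is only a rough sketch of the general idea it describes: extract one regression class at a time from a noisy mixture, record the inlier set of each class, and stop when no remaining model has enough support. As a stand-in for the paper's method it uses a Gaussian-kernel residual score maximized by iteratively reweighted least squares from random two-point starts. All names (`fit_one_class`, `mine_classes`, `demo_data`) and parameters (`sigma`, `min_support`, the 3-sigma inlier band) are assumptions made for illustration, and the dynamic handling of overlapping classes is omitted.

```python
import numpy as np

def fit_one_class(x, y, sigma=0.3, n_starts=30, n_iter=50, seed=0):
    """Robustly fit one line y = b0 + b1*x (stand-in for an RCMD step)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    best_score, best_beta = -np.inf, None
    for _ in range(n_starts):
        # Start from the line through two randomly chosen points.
        idx = rng.choice(len(x), size=2, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_iter):
            # Gaussian-kernel weights: far-away points barely influence the fit.
            w = np.exp(-0.5 * ((y - X @ beta) / sigma) ** 2)
            sw = np.sqrt(w)
            new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            converged = np.allclose(new_beta, beta, atol=1e-8)
            beta = new_beta
            if converged:
                break
        # Score a candidate by the total kernel weight it attracts.
        score = np.exp(-0.5 * ((y - X @ beta) / sigma) ** 2).sum()
        if score > best_score:
            best_score, best_beta = score, beta
    return best_beta

def mine_classes(x, y, sigma=0.3, min_support=40, max_classes=5):
    """Sequentially extract lines until no candidate has enough inliers."""
    remaining = np.ones(len(x), dtype=bool)
    models = []
    while remaining.sum() >= min_support and len(models) < max_classes:
        beta = fit_one_class(x[remaining], y[remaining], sigma)
        resid = np.abs(y - (beta[0] + beta[1] * x))
        inliers = remaining & (resid < 3 * sigma)  # assumed inlier band
        if inliers.sum() < min_support:
            break  # leftover points are treated as noise
        models.append((beta, inliers))
        remaining &= ~inliers
    return models

def demo_data(seed=42):
    """Two synthetic linear classes (150 points each) plus 100 noise points."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, 400)
    y = rng.uniform(-5, 30, 400)  # uniform noise by default
    y[:150] = 1 + 2 * x[:150] + rng.normal(0, 0.2, 150)
    y[150:300] = 20 - x[150:300] + rng.normal(0, 0.2, 150)
    return x, y

x, y = demo_data()
for beta, inliers in mine_classes(x, y):
    print(f"intercept={beta[0]:.2f} slope={beta[1]:.2f} support={inliers.sum()}")
```

In this simplified version, points near the intersection of two lines go to whichever class is extracted first, which is exactly the order dependence that the dynamic classification strategy mentioned in the abstract is meant to reduce.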

Published in:

IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 23, Issue: 1)