The PerfSim Algorithm for Concept Drift Detection in Imbalanced Data

3 Author(s)
Antwi, D.K. ; Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada ; Viktor, H.L. ; Japkowicz, N.

There is currently a surge of interest in adaptive learning algorithms for applications ranging from ozone level peak prediction and learning stock market indicators to detecting smartphone usage patterns. In such scenarios, detecting change (or drift) in the concept being learned is important to ensure that correct, timely and relevant models are constructed. In addition, such data is often imbalanced and, to further complicate the issue, we are frequently interested in learning the minority class. It follows that ignoring these two aspects during learning may lead to unreliable, or even incorrect, models being built. In this research we discuss the interplay between concept drift detection and imbalanced datasets in order to ensure reliable results. We introduce a novel algorithm that, rather than considering a single performance evaluation measure such as accuracy for change detection, considers all the components of a confusion matrix and employs the cosine similarity coefficient. We evaluate our algorithm against a real-world mobile phone database, as well as benchmark datasets, and we compare it with two other state-of-the-art methods. The results show that our approach is particularly sensitive to concept drifts occurring in imbalanced datasets. Our evaluation indicates that our algorithm is able to detect concept drift reliably. Further, our method is shown to perform very well compared to the other techniques, especially when the drift occurs in the minority class of a class imbalance problem.
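The idea of comparing whole confusion matrices with cosine similarity, instead of tracking accuracy alone, can be illustrated with a small sketch. This is not the authors' PerfSim implementation, whose details the abstract does not give. The function names, the 0.95 threshold, and the per-class rate normalisation are all assumptions made for illustration. The normalisation is there because, on raw counts, the majority-class true negatives would dominate the similarity.

```python
import math

def confusion_vector(y_true, y_pred, positive=1):
    """Flatten a binary confusion matrix into [TPR, FNR, FPR, TNR].

    Assumption of this sketch: counts are normalised per true class so
    that the minority class is not swamped by the majority class.
    """
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    pos, neg = tp + fn, fp + tn
    pos_rates = [tp / pos, fn / pos] if pos else [0.0, 0.0]
    neg_rates = [fp / neg, tn / neg] if neg else [0.0, 0.0]
    return pos_rates + neg_rates

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def drift_suspected(reference, current, threshold=0.95):
    """Flag drift when two confusion vectors diverge (hypothetical threshold)."""
    return cosine_similarity(reference, current) < threshold

# Hypothetical batches: 5 minority and 95 majority examples each.
y_true = [1] * 5 + [0] * 95
pred_before = [1, 1, 1, 1, 0] + [1] + [0] * 94   # minority mostly recognised
pred_after = [0] * 5 + [1] + [0] * 94            # minority entirely missed

ref_vec = confusion_vector(y_true, pred_before)
cur_vec = confusion_vector(y_true, pred_after)
print(cosine_similarity(ref_vec, cur_vec), drift_suspected(ref_vec, cur_vec))
```

In this made-up example accuracy only moves from 98% to 94%, while the similarity between the two confusion vectors falls to about 0.65, so the change in the minority class stands out.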

Published in:

Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on

Date of Conference:

10 Dec. 2012