Research in data stream clustering based on Gaussian Mixture Model Genetic Algorithm

3 Author(s)
Ming-ming Gao (School of Control and Computer Engineering, North China Electric Power University, Beijing, China); Chang Tai-hua; Xiang-xiang Gao

Clustering is one of the important branches of data stream mining. Because data streams are dynamic and massive, traditional data mining algorithms cannot satisfy the requirements of online analysis or determine an appropriate number of clusters. Data stream techniques therefore focus on scanning the data set in a single pass while incrementally maintaining an in-memory data structure that is far smaller than the whole data set. This paper proposes a new feature mining method, the Gaussian Mixture Model with Genetic Algorithm (GMMGA), which extends the Gaussian mixture model. The method is a probability-density-based data stream clustering approach that requires only the newly arrived data, not the entire historical data. GMMGA determines the number of Gaussian clusters and the parameters of each Gaussian component through the random split and merge operations of a genetic algorithm. A threshold function is applied to the clusters to reduce the effect of bad clusters on the clustering result. The algorithm improves the robustness and accuracy of the estimated number of clusters while saving memory and run time. Experimental results show that the method is effective and achieves higher clustering precision than the conventional STREAM and CluStream algorithms.
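To make two of the ideas in the abstract concrete (updating a Gaussian mixture from newly arrived data only, and thresholding away bad clusters), here is a minimal one-dimensional sketch. It is not the GMMGA algorithm: the genetic split and merge operations are omitted, and the abstract does not specify the threshold function, so pruning by mixing weight is an assumption. The names `Component`, `update_with_chunk`, and `prune_weak` are hypothetical, as is the prior `count` of 1.0 given to each component.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float        # mixing proportion
    mean: float
    var: float
    count: float = 1.0   # effective number of points absorbed (assumed prior of 1)


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_with_chunk(components, chunk, min_var=1e-6):
    """Fold a chunk of new points into the running statistics.

    Only the new chunk is read; earlier data is summarised by each
    component's (count, mean, var), so memory stays constant.
    """
    for x in chunk:
        # Soft assignment of the point to each component.
        dens = [c.weight * gaussian_pdf(x, c.mean, c.var) for c in components]
        total = sum(dens)
        if total == 0.0:
            continue  # point is far from every component; ignored in this sketch
        for c, d in zip(components, dens):
            r = d / total
            c.count += r
            # Weighted incremental (Welford-style) mean and variance update.
            delta = x - c.mean
            c.mean += r * delta / c.count
            c.var = max(min_var, c.var + r * (delta * (x - c.mean) - c.var) / c.count)
    # Refresh mixing weights once per chunk from the effective counts.
    n = sum(c.count for c in components)
    for c in components:
        c.weight = c.count / n


def prune_weak(components, threshold):
    """Drop components whose weight is below threshold, then renormalise."""
    kept = [c for c in components if c.weight >= threshold]
    if not kept:  # always keep at least the strongest component
        kept = [max(components, key=lambda c: c.weight)]
    total = sum(c.weight for c in kept)
    for c in kept:
        c.weight /= total
    return kept


if __name__ == "__main__":
    model = [Component(1 / 3, 0.0, 1.0), Component(1 / 3, 10.0, 1.0),
             Component(1 / 3, 100.0, 1.0)]  # third component is spurious
    update_with_chunk(model, [0.1, -0.2, 0.3, 9.8, 10.1, 10.3, 0.0, 9.9])
    model = prune_weak(model, threshold=0.1)
    for c in model:
        print(c)
```

In this sketch the spurious component near 100 absorbs none of the arriving points, so its weight decays relative to the others and it is removed by the threshold. A full implementation along the lines of the abstract would additionally search over split and merge moves to choose the number of components.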

Published in:

The 2nd International Conference on Information Science and Engineering

Date of Conference:

4-6 Dec. 2010