Clustering is an important branch of data stream mining. Because data streams are dynamic and massive, traditional data mining algorithms cannot meet the requirements of online analysis or determine an appropriate number of clusters. Data stream techniques therefore focus on algorithms that scan the data set in a single pass and incrementally maintain an in-memory data structure that is far smaller than the whole data set. This paper proposes a new feature mining method, the Gaussian Mixture Model with Genetic Algorithm (GMMGA), which extends the Gaussian mixture model. The method performs probability-density-based data stream clustering that requires only the newly arrived data, not the entire historical data. GMMGA determines the number of Gaussian clusters and the parameters of each Gaussian component through the random split and merge operations of a genetic algorithm. A threshold function is applied to the clusters to reduce the effect of bad clusters on the clustering result. The algorithm improves the robustness and accuracy of the estimated number of clusters while saving memory and run time. Experimental results show that the method is effective and achieves higher clustering precision than the conventional STREAM and CluStream algorithms.
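
The ideas of updating a Gaussian component from newly arrived data only, merging components, and thresholding weak clusters can be illustrated with a minimal one-dimensional sketch. This is not the GMMGA algorithm itself: the genetic search and the split operation are not shown, and the names `Component`, `merge`, `prune`, and `min_weight` are hypothetical, introduced here for illustration. The sketch assumes each component is summarized by its point count, running mean, and running sum of squared deviations (Welford's update), so no historical data has to be stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One 1-D Gaussian summarized by sufficient statistics (illustrative)."""
    n: float = 0.0      # number of points absorbed so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # running sum of squared deviations from the mean

    @property
    def var(self) -> float:
        """Population variance of the points absorbed so far."""
        return self.m2 / self.n if self.n > 0 else 0.0

    def update(self, x: float) -> None:
        """Absorb one newly arrived point (Welford's update)."""
        self.n += 1.0
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def merge(a: Component, b: Component) -> Component:
    """Merge two non-empty components, preserving count, mean, and variance."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Component(n, mean, m2)


def prune(components: List[Component], min_weight: float) -> List[Component]:
    """Drop components whose share of all points is below min_weight."""
    total = sum(c.n for c in components)
    if total == 0:
        return []
    return [c for c in components if c.n / total >= min_weight]
```

Merging two components built from separate parts of a stream gives the same statistics as a single component fed every point, which is why only the summaries, and not the raw history, need to stay in memory.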