Data stream clustering faces serious challenges in memory usage and processing speed. Moreover, many data streams are high-dimensional by nature, and high-dimensional data are inherently harder to cluster. This paper proposes an effective clustering algorithm, referred to as HSWStream, for high-dimensional data streams over sliding windows. The algorithm handles high dimensionality with a projected clustering technique, deals with in-cluster evolution using an exponential histogram of cluster features (EHCF), and eliminates the influence of old points through fading temporal cluster features (FTCF). Meanwhile, through the exponential histogram mechanism, we retain more information about recent data and less about old data, which matches how data streams evolve. Projected clustering yields higher cluster quality and faster execution, while the sliding window yields higher quality and lower memory usage. In addition, to further improve efficiency, we use a fast computational method to maintain the EHCF. Its main idea is that a new data point need not be processed immediately; it is handled only when an FTCF must be deleted from the corresponding EHCF. The evolving data streams in the experiments are drawn from the KDD-CUP'98 and KDD-CUP'99 real data sets and from synthetic data sets. The experimental results show that the proposed method achieves higher clustering quality, lower memory usage, and faster processing than other algorithms.
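The abstract does not spell out the EHCF update rules. The sketch below shows one way an exponential histogram of cluster features could maintain a sliding-window summary, assuming classic exponential-histogram bucket merging (at most `max_per_size` buckets of each size, with the two oldest merged when that bound is exceeded) and a fading weight of 2^(-lam * age). The class names `FTCF` and `EHCF` follow the abstract, but their fields, the parameters `window`, `max_per_size` and `lam`, and the `summary` method are hypothetical illustrations, not the paper's definitions. The per-dimension variance is included only as an assumed example of the statistic a projected-clustering step might use to choose a cluster's dimensions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FTCF:
    """Hypothetical temporal cluster feature for one histogram bucket."""
    n: int            # number of raw points summarised by this bucket
    ls: List[float]   # per-dimension linear sum of the points
    ss: List[float]   # per-dimension sum of squares of the points
    t: int            # timestamp of the newest point in the bucket

    def merge(self, other: "FTCF") -> "FTCF":
        # Cluster features are additive, so merging buckets is element-wise addition.
        return FTCF(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
            max(self.t, other.t),
        )


class EHCF:
    """Hypothetical exponential histogram of FTCF buckets over a time window."""

    def __init__(self, window: int, max_per_size: int = 2, lam: float = 0.0):
        self.window = window              # keep points with t > now - window
        self.max_per_size = max_per_size  # assumed bound on buckets of each size
        self.lam = lam                    # assumed fading rate (0 disables fading)
        self.buckets: List[FTCF] = []     # ordered newest first

    def insert(self, x: Sequence[float], t: int) -> None:
        # Each new point starts as its own size-1 bucket at the newest end.
        self.buckets.insert(0, FTCF(1, list(x), [v * v for v in x], t))
        self._expire(t)
        self._cascade()

    def _expire(self, now: int) -> None:
        # Drop the oldest buckets once even their newest point has left the window.
        while self.buckets and self.buckets[-1].t <= now - self.window:
            self.buckets.pop()

    def _cascade(self) -> None:
        # Walk groups of equal-size buckets from newest to oldest; when a group is
        # too large, merge its two oldest members into one bucket of double size.
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i].n
            j = i
            while j < len(self.buckets) and self.buckets[j].n == size:
                j += 1
            if j - i > self.max_per_size:
                merged = self.buckets[j - 2].merge(self.buckets[j - 1])
                self.buckets[j - 2:j] = [merged]
                i = j - 2  # the merged bucket starts the next (double-size) group
            else:
                i = j

    def summary(self, now: int) -> Tuple[float, List[float], List[float]]:
        # Fade-weighted total weight, centroid and per-dimension variance;
        # each bucket is down-weighted by 2^(-lam * age of its newest point).
        if not self.buckets:
            return 0.0, [], []
        dim = len(self.buckets[0].ls)
        weight, ls, ss = 0.0, [0.0] * dim, [0.0] * dim
        for b in self.buckets:
            w = 2.0 ** (-self.lam * (now - b.t))
            weight += w * b.n
            for d in range(dim):
                ls[d] += w * b.ls[d]
                ss[d] += w * b.ss[d]
        mean = [v / weight for v in ls]
        var = [max(0.0, s / weight - m * m) for s, m in zip(ss, mean)]
        return weight, mean, var
```

Because merged buckets double in size, a window of W points is summarised by O(log W) buckets for a fixed `max_per_size`, so recent points are kept at fine granularity and old points only coarsely. This sketch processes every point immediately; it does not model the paper's lazy update, which defers handling new points until an FTCF must be deleted.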