Distinguishing data that may belong to a new cluster from outliers is a central problem in mining new patterns from evolving data streams. The problem is more pronounced in clustering algorithms inherited from the CluStream framework, because all of them learn from the data distribution through a sliding window. This paper proposes rDenStream, a three-step clustering algorithm based on DenStream that adds outlier retrospect learning. During rDenStream clustering, dropped micro-clusters are stored temporarily in external storage; when a new cluster is discovered, these micro-clusters are learned retrospectively to recover data that was previously discarded in error, which improves the accuracy of the new cluster. rDenStream is significant for applications that require high-accuracy clustering of evolving data. Considering the characteristics of data streams in network intrusion detection systems (NIDS), this paper models the arrival times of new-pattern data as a non-homogeneous Poisson process. Experiments on a standard data set show its advantage over other methods in the early phase of new pattern discovery.
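The retrospect step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the matching criterion or the merge rule, so the distance threshold `radius`, the weighted-centroid merge, and the names `MicroCluster`, `RetrospectBuffer`, `store`, `reclaim`, and `merge` are all assumptions made for illustration. An in-memory list stands in for the external storage.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Tuple


@dataclass
class MicroCluster:
    center: Tuple[float, ...]
    weight: float


@dataclass
class RetrospectBuffer:
    """Holds dropped micro-clusters (an in-memory stand-in for external storage)."""
    dropped: List[MicroCluster] = field(default_factory=list)

    def store(self, mc: MicroCluster) -> None:
        # Called when the online phase discards a micro-cluster as an outlier.
        self.dropped.append(mc)

    def reclaim(self, new_center: Tuple[float, ...], radius: float) -> List[MicroCluster]:
        # Assumed criterion: a dropped micro-cluster belongs to the new cluster
        # if its center lies within `radius` of the new cluster's center.
        hits = [mc for mc in self.dropped if dist(mc.center, new_center) <= radius]
        self.dropped = [mc for mc in self.dropped if dist(mc.center, new_center) > radius]
        return hits


def merge(new_center: Tuple[float, ...], new_weight: float,
          reclaimed: List[MicroCluster]) -> MicroCluster:
    # Assumed merge rule: weighted centroid of the new cluster and reclaimed data.
    total = new_weight + sum(mc.weight for mc in reclaimed)
    center = tuple(
        (new_center[i] * new_weight + sum(mc.center[i] * mc.weight for mc in reclaimed)) / total
        for i in range(len(new_center))
    )
    return MicroCluster(center, total)
```

In this sketch, `store` would be called whenever a micro-cluster is dropped, and `reclaim` followed by `merge` would run once a new cluster is reported, so that earlier members of the new pattern contribute to its weight and center instead of being lost.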
IEEE International Conference on Intelligent Computing and Intelligent Systems, 2009 (ICIS 2009), Volume 1
Date of Conference: 20-22 Nov. 2009