Distinguishing potential new-cluster data from outliers is a key problem in mining new patterns from evolving data streams. Moreover, because all clustering algorithms derived from the CluStream framework perform distribution-based learning over a sliding window, the problem becomes more pronounced. This paper proposes rDenStream, a three-step clustering algorithm based on DenStream that adds outlier retrospect learning. During rDenStream clustering, dropped micro-clusters are temporarily stored in external memory. When a new cluster is discovered, these micro-clusters are learned retrospectively to recover data that was formerly discarded by mistake, which improves the accuracy of the new cluster. rDenStream is valuable in applications that require high-accuracy clustering of evolving data. Considering the characteristics of data streams in NIDS (network intrusion detection systems), this paper models the arrival time of new-pattern data as a non-homogeneous Poisson distribution. Experiments on a standard data set show its advantage over other methods in the early phase of new-pattern discovery.
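The retrospect idea can be illustrated with a small Python sketch. It assumes DenStream's fading function 2^(-λΔt) for micro-cluster weights, a fixed pruning threshold (DenStream's actual pruning threshold varies with time), and 2-D points. The names `MicroCluster`, `RetrospectBuffer`, `prune`, and `retrospect`, along with all parameter values, are hypothetical and do not come from the paper. Low-weight outlier micro-clusters are moved into a buffer that stands in for the external memory. When a new cluster appears, buffered micro-clusters within a given radius of its center are recovered.

```python
import math
from dataclasses import dataclass, field

LAMBDA = 0.25  # hypothetical decay rate: weights halve every 1/LAMBDA time units


def decay(dt: float) -> float:
    """DenStream-style fading function f(dt) = 2^(-lambda * dt)."""
    return 2.0 ** (-LAMBDA * dt)


@dataclass
class MicroCluster:
    center: tuple[float, float]
    weight: float
    last_update: float

    def faded_weight(self, now: float) -> float:
        return self.weight * decay(now - self.last_update)


@dataclass
class RetrospectBuffer:
    """Hypothetical stand-in for the external memory of dropped micro-clusters."""
    dropped: list[MicroCluster] = field(default_factory=list)

    def store(self, mc: MicroCluster) -> None:
        self.dropped.append(mc)

    def retrospect(self, new_center: tuple[float, float], radius: float) -> list[MicroCluster]:
        """Remove and return dropped micro-clusters within `radius` of a new cluster."""
        recovered, remaining = [], []
        for mc in self.dropped:
            if math.dist(mc.center, new_center) <= radius:
                recovered.append(mc)
            else:
                remaining.append(mc)
        self.dropped = remaining
        return recovered


def prune(outliers: list[MicroCluster], now: float, threshold: float,
          buffer: RetrospectBuffer) -> list[MicroCluster]:
    """Drop low-weight outlier micro-clusters into the buffer instead of discarding them."""
    kept = []
    for mc in outliers:
        if mc.faded_weight(now) < threshold:
            buffer.store(mc)
        else:
            kept.append(mc)
    return kept


# Usage: three early outlier micro-clusters, two of which belong to a future cluster.
buffer = RetrospectBuffer()
outliers = [
    MicroCluster(center=(5.0, 5.0), weight=1.0, last_update=0.0),
    MicroCluster(center=(5.2, 4.9), weight=1.0, last_update=0.0),
    MicroCluster(center=(0.0, 9.0), weight=1.0, last_update=0.0),
]
# At t = 8 each weight has faded to 2^(-0.25 * 8) = 0.25, below the threshold 0.5.
outliers = prune(outliers, now=8.0, threshold=0.5, buffer=buffer)
# A new cluster is later discovered around (5, 5); recover the early data near it.
recovered = buffer.retrospect(new_center=(5.0, 5.0), radius=1.0)
print(len(outliers), len(recovered), len(buffer.dropped))  # 0 2 1
```

In a fuller implementation, the recovered micro-clusters would be merged into the new cluster. The sketch only returns them, so the effect of retrospection on early, formerly discarded data is easy to see.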
Published in: 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2009), Volume 1
Date of Conference: 20-22 Nov. 2009