In this paper, we propose a two-phase grid-based clustering algorithm for partitioning network traffic data. The first phase is a grid-based preclustering stage in which the domain space is divided into non-overlapping d-dimensional cells. The second phase is a novel partition-based clustering procedure that we refer to as k-hypercells; it takes the populated cells created by the first phase directly as its source data for clustering. The algorithm automatically determines the number of clusters and is designed specifically for high-dimensional categorical data records. Experimental results show that the algorithm is efficient and effective at compressing and partitioning large, high-dimensional data spaces.
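The first phase can be illustrated with a minimal sketch. It assumes that, for categorical data, each distinct combination of attribute values defines one cell; the abstract does not specify the cell layout, so this is an assumption. The function name `populated_cells` and the sample records are hypothetical, and the k-hypercells phase is not sketched because its details are not given here.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def populated_cells(records: Iterable[Sequence[Hashable]]) -> Counter:
    """Map each d-dimensional categorical record to a grid cell and count it.

    Assumption: a cell is identified by the tuple of attribute values,
    so only cells that actually contain records are ever stored.
    """
    cells: Counter = Counter()
    for record in records:
        cells[tuple(record)] += 1
    return cells


# Hypothetical traffic records: (protocol, service, flag).
records = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("udp", "dns", "SF"),
    ("tcp", "http", "SF"),
    ("udp", "dns", "SF"),
    ("icmp", "echo", "SF"),
]

cells = populated_cells(records)

# Six records collapse into three populated cells, each with a count.
for cell, count in cells.most_common():
    print(cell, count)
```

Under this assumption, a later clustering stage would operate on the populated cells and their counts rather than on the raw records, which is where the compression comes from.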