Identifying clusters of nearby hosts among Internet clients provides useful information for a number of Internet and peer-to-peer (P2P) applications. Examples include web applications, request routing in peer-to-peer overlay networks, and distributed computing applications. In this paper, we formulate the Internet host clustering problem. Leveraging previous work on Internet host distance measurement, we propose two hierarchical clustering techniques to solve it. The first is a marker-based hierarchical partitioning approach. The second is based on the well-known K-means clustering algorithm. We evaluated both approaches in simulation, using a representative Internet topology with over 1,000 hosts generated with the GT-ITM generator. Our simulation results demonstrate that both approaches effectively identify clusters with arbitrary diameters. We conclude that, by leveraging previous work on Internet host distance estimation, it is possible to cluster Internet hosts to benefit applications with varying requirements.
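
For readers unfamiliar with K-means, the following is a minimal sketch of the plain (non-hierarchical) algorithm applied to hosts. It is not the hierarchical variant proposed in this paper. It assumes that each host has already been mapped to 2-D coordinates by some distance estimation scheme, so that Euclidean distance approximates network distance. The function name `kmeans`, its parameters, and the sample coordinates are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means (Lloyd's algorithm) over assumed host coordinates."""
    rng = random.Random(seed)
    # Start from k distinct hosts chosen at random as initial centroids.
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each host to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No host changed cluster, so the result is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical coordinates: two groups of three mutually close hosts.
hosts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
         (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(hosts, k=2)
```

Here `labels[i]` is the cluster index assigned to `hosts[i]`, and `centers` holds the final centroid of each cluster. Plain K-means requires the number of clusters `k` in advance and does not bound cluster diameter by itself, which is one reason a hierarchical formulation is attractive when applications need clusters of a chosen diameter.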