Search engines are a common tool for retrieving information from the Web, but the quality of the returned results is still far from satisfactory. Users often have to scan a long result list to find the information they actually want. Much work has therefore focused on post-processing search results to make them easier to examine, and clustering is one of the most common post-processing approaches. Term-based clustering was the first approach applied to search results, but its quality is poor when the processed pages contain little text. Link-based clustering can overcome this problem, but the quality of its clusters depends heavily on the number of in-links and out-links that pages have in common. In this paper, we propose that the short text attached to an in-link is valuable information that helps achieve high clustering quality. To distinguish it from the general snippet, we call it the in-snippet. Based on the in-snippet, we propose a new clustering method that combines links and in-snippets. In our method, the similarity between pages consists of two parts: link similarity and term similarity. We design a corresponding algorithm to perform the clustering. To prevent bias from human judgments, the experimental datasets are collected from the Open Directory Project (DMOZ). Because DMOZ is a human-edited directory, datasets drawn from it have higher quality and larger scale. We use entropy and F-measure to evaluate the quality of the final clusters. Compared with the link-based and pure term-based algorithms, our method achieves better clustering quality.
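To illustrate the two-part similarity described above, the following is a minimal sketch rather than the algorithm used in the paper. It assumes that link similarity is the averaged Jaccard overlap of shared in-links and out-links, that term similarity is the cosine similarity of raw term counts taken from the in-snippets, and that the two parts are mixed linearly. The page dictionary layout, the function names, and the weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_similarity(page_a, page_b):
    # Assumption: average the overlap of shared in-links and shared out-links.
    return 0.5 * (jaccard(page_a["in_links"], page_b["in_links"])
                  + jaccard(page_a["out_links"], page_b["out_links"]))


def term_similarity(page_a, page_b):
    # Assumption: cosine similarity over raw term counts of the in-snippets.
    ta = Counter(" ".join(page_a["in_snippets"]).lower().split())
    tb = Counter(" ".join(page_b["in_snippets"]).lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(page_a, page_b, alpha=0.5):
    # alpha is a hypothetical mixing weight, not a value taken from the paper.
    return (alpha * link_similarity(page_a, page_b)
            + (1 - alpha) * term_similarity(page_a, page_b))


# Hypothetical pages: each has in-link sources, out-link targets, and in-snippets.
p1 = {"in_links": {"a", "b"}, "out_links": {"c"},
      "in_snippets": ["python tutorial"]}
p2 = {"in_links": {"a", "b"}, "out_links": {"c"},
      "in_snippets": ["Python tutorial"]}
p3 = {"in_links": {"x"}, "out_links": {"y"},
      "in_snippets": ["cooking recipes"]}

score_same = combined_similarity(p1, p2)
score_diff = combined_similarity(p1, p3)
```

A pairwise score of this kind could be fed to any standard clustering procedure. Because the in-snippet terms come from the linking pages, the term part still contributes when the target page itself contains little text.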