Search result clustering helps users quickly browse the documents returned by a search engine. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. Label-based clustering, which usually takes n-grams (typically bi-grams) as label candidates, is quite promising; however, meaningless n-grams are not removed from the candidates. In this paper, document frequency (DF), user logs, and query context are introduced as label ranking features, and an integrated model combines these three features for cluster label ranking. Furthermore, a novel graph-based clustering algorithm (GBCA) for hierarchical clustering is proposed. Experiments indicate that the cluster label extraction improves precision by about 8% over the baseline, and that GBCA outperforms STC and Snaket in F-measure.
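As a rough illustration of the label ranking idea, the sketch below extracts bi-gram candidates from result snippets and scores them with a weighted sum of three features. The abstract does not specify the integrated model, so the linear combination, the weights, the normalization, and all names (`bigrams`, `document_frequency`, `rank_labels`, `log_counts`, `context_scores`) are assumptions made for illustration only, not the authors' method.

```python
import re
from collections import Counter


def bigrams(text):
    """Return the word bi-grams of a snippet, lowercased."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def document_frequency(snippets):
    """Count, for each bi-gram, how many snippets contain it."""
    df = Counter()
    for snippet in snippets:
        df.update(set(bigrams(snippet)))  # set(): count once per snippet
    return df


def rank_labels(snippets, log_counts, context_scores, weights=(0.5, 0.3, 0.2)):
    """Rank bi-gram label candidates by a weighted sum of three features.

    Assumption: a simple linear combination stands in for the paper's
    integrated model, which the abstract does not describe.
    log_counts: hypothetical map of candidate -> frequency in a user log.
    context_scores: hypothetical map of candidate -> query context score in [0, 1].
    """
    df = document_frequency(snippets)
    n_docs = max(len(snippets), 1)
    max_log = max(log_counts.values(), default=0) or 1
    w_df, w_log, w_ctx = weights
    scores = {}
    for label, count in df.items():
        scores[label] = (
            w_df * count / n_docs
            + w_log * log_counts.get(label, 0) / max_log
            + w_ctx * context_scores.get(label, 0.0)
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


snippets = ["jaguar car dealer", "jaguar car price", "jaguar cat habitat"]
ranked = rank_labels(snippets, log_counts={"jaguar car": 10}, context_scores={})
```

In this toy input, a candidate that appears in more snippets and also occurs in the hypothetical user log rises to the top, while candidates seen only once with no log support score lower. A real system would still need a way to filter meaningless n-grams, which is the problem the paper's features are meant to address.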
Date of Conference: 5-12 Nov. 2007