Visually summarizing web pages is an attractive approach that gives users an effective, friendly interface for identifying desired content at first glance in search and re-finding tasks. Dominant images in web pages are generally reliable for this purpose, but many web pages have no dominant image. To solve this problem, we first propose a new approach that summarizes web pages lacking dominant images by retrieving relevant external images from the Internet. However, relevant external images are sometimes unreliable. To take advantage of both kinds of images, we further propose a clustering-based algorithm that selects the best summarization from among all internal and external images. This algorithm leverages the relevance and dominance of images as prior information. Experimental results on a human-labeled data set show that our approach achieves NDCG@1 gains of 0.098 and 0.082 over the relevant external image and the dominant image, respectively. Our user study also indicates that the images selected by our algorithm are useful as summarizations of web pages.
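
The reported gains are measured in NDCG at rank 1, which scores only the single image a method places first for each page. As a minimal sketch of how such a score could be computed, the Python below assumes the common exponential-gain form of DCG (gain 2^rel - 1, discount log2(rank + 1)); the abstract does not state which variant the authors used, and the function names and graded labels are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items.

    Assumption: exponential gain (2^rel - 1) with a log2 position
    discount. The source does not specify the exact formulation.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)  # position is 0-based
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant candidate exists, so the score is defined as 0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical human labels (0 = bad, 1 = fair, 2 = good) for the candidate
# images of one page, listed in the order a method ranked them.
ranked_labels = [1, 2, 0]
score = ndcg_at_k(ranked_labels, k=1)  # (2^1 - 1) / (2^2 - 1) = 1/3
```

Under this formulation, a method that places the best-labeled candidate first scores 1.0 on that page, and averaging the per-page scores over a labeled data set gives the kind of figure that two methods can be compared on.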