Document summarization is difficult to perform automatically, especially when the document is available only as raw pixel data. This paper presents a technique for representing a document as a selection of its most eye-catching pages. The algorithm looks for salient features such as illustrations, diagrams, large titles and headings that cause a page to stand out, and ranks each page's conspicuousness according to the colour, size and number of such elements. If desired, a filter function can also be applied to introduce some spread into the selection process, avoiding cases where the extracted pages are too close to each other. The algorithm is intended as part of a document catalogue system and user interface in which multiple page thumbnails are shown for each document. The aim is to broaden and enrich a document's visual profile beyond the traditional front cover icon and, more generally, to increase its appeal to potential readers as they browse.
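The selection step can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a per-page conspicuousness score has already been computed, and it stands in for the spread filter with a simple greedy rule that rejects any page lying within a fixed distance of a page already chosen. The function name `select_pages` and the parameters `k` and `min_gap` are hypothetical, introduced only for this example.

```python
from typing import List, Sequence


def select_pages(scores: Sequence[float], k: int, min_gap: int = 0) -> List[int]:
    """Pick up to k page indices, visiting pages from highest score to lowest.

    A page is skipped if it lies within min_gap pages of one already chosen.
    This greedy rule is an assumed stand-in for the spread filter; min_gap=0
    disables it and returns the plain top-k pages.
    """
    if k <= 0:
        return []
    # Rank page indices by descending score; ties go to the earlier page.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    chosen: List[int] = []
    for i in ranked:
        # Keep the page only if it is far enough from every chosen page.
        if all(abs(i - j) > min_gap for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    # Return the pages in reading order for display as thumbnails.
    return sorted(chosen)


# Illustrative scores for a six-page document (made-up values).
scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3]
top_pages = select_pages(scores, k=3)                # plain top-3 by score
spread_pages = select_pages(scores, k=3, min_gap=1)  # no two adjacent pages
```

With the spread rule enabled, a lower-scoring page can displace a higher-scoring neighbour of an already selected page, which trades some conspicuousness for better coverage of the document.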