Document image retrieval based on 2D density distributions of terms with pseudo relevance feedback

3 Author(s)
Kise, K. ; Dept. of Comput. & Syst. Sci., Osaka Prefecture Univ., Japan ; Yin Wuotang ; Matsumoto, K.

Document image retrieval is the task of retrieving document images relevant to a user's query. Most existing methods based on word-level indexing rely on the "bag of words" representation, which originated in the field of information retrieval. This paper presents a new document representation that additionally uses the locations of words within pages to improve retrieval performance. We consider a page relevant to a query if it contains the query's terms densely. The proposed method embodies this notion as density distributions of terms. Its performance is further improved with the help of "pseudo relevance feedback", i.e., a method of expanding a query by analyzing pages. Experimental results on English document images show that the proposed method is superior to conventional methods of electronic document retrieval at recall levels 0.0-0.6.
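
The idea of scoring a page by how densely it contains the query terms, followed by query expansion, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the window function, the grid resolution, the scoring rule, or how expansion terms are chosen, so the box-shaped smoothing window, the 8x8 grid, the peak-density score, and the frequency-based expansion below are all assumptions made for illustration. The function names `density_map`, `page_score`, and `expand_query` are hypothetical.

```python
from collections import Counter


def density_map(words, query_terms, grid=(8, 8), radius=1):
    """Smoothed 2D count of query-term occurrences on one page.

    words: list of (term, x, y) with x, y normalized to [0, 1).
    The grid size and box window are illustrative assumptions.
    """
    rows, cols = grid
    hits = [[0.0] * cols for _ in range(rows)]
    for term, x, y in words:
        if term in query_terms:
            r = min(int(y * rows), rows - 1)
            c = min(int(x * cols), cols - 1)
            hits[r][c] += 1.0
    # Box smoothing: each cell sums the hits within `radius` cells of it.
    dens = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    dens[r][c] += hits[rr][cc]
    return dens


def page_score(words, query_terms, grid=(8, 8), radius=1):
    """Score a page by the peak of its density map (an assumed rule)."""
    dens = density_map(words, query_terms, grid, radius)
    return max(max(row) for row in dens)


def expand_query(pages, query_terms, top_k=1, n_new=1):
    """Pseudo relevance feedback sketch: treat the top_k pages as relevant
    and add their most frequent non-query terms to the query."""
    ranked = sorted(pages, key=lambda p: page_score(p, query_terms), reverse=True)
    counts = Counter(
        term for page in ranked[:top_k] for term, _, _ in page
        if term not in query_terms
    )
    return set(query_terms) | {term for term, _ in counts.most_common(n_new)}


# Toy pages: the first clusters the query terms, the second scatters them.
dense_page = [("image", 0.10, 0.10), ("retrieval", 0.12, 0.12),
              ("image", 0.15, 0.10), ("density", 0.11, 0.13)]
sparse_page = [("image", 0.10, 0.10), ("retrieval", 0.90, 0.90),
               ("image", 0.50, 0.90)]
query = {"image", "retrieval"}

print(page_score(dense_page, query), page_score(sparse_page, query))
print(sorted(expand_query([dense_page, sparse_page], query)))
```

Both toy pages contain three query-term occurrences, so a bag-of-words count cannot separate them, while the peak density ranks the clustered page higher. A practical system would likely use a smoother window and term weighting, neither of which is described in the abstract.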

Published in:

Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on

Date of Conference:

3-6 Aug. 2003