Home  |   Login  |   Logout  |   Access Information  |   Alerts  |   Purchase History  |   Cart  |   Sitemap  |   Help   
 
Abstract
BROWSE SEARCH IEEE XPLORE GUIDE SUPPORT
arrow_leftView TOC
Email/Printer Friendly Format  
 

A fast and flexible statistical method for text extraction indocument pages
Parodi, P.   Piccioli, G.  
Dept. of Comput. Sci., Toronto Univ., Ont.;

This paper appears in: Computer Vision and Pattern Recognition, 1996. Proceedings CVPR '96, 1996 IEEE Computer Society Conference on
Publication Date: 18-20 Jun 1996
On page(s): 619-624
Meeting Date: 06/18/1996 - 06/20/1996
Location: San Francisco, CA, USA
ISBN: 0-8186-7259-5
References Cited: 8
INSPEC Accession Number: 5329331
Digital Object Identifier: 10.1109/CVPR.1996.517137
Current Version Published: 2002-08-06

Abstract
This paper describes a fast and flexible method for extracting text regions from a document page containing text, graphics, and pictures. Such regions can be given as an input to an OCR system. The user fixes two parameters, the minimum width w of the text to be detected, and the precision ε needed (both expressed as a percentage of the image width), according to the implementation needs. The method works by subdividing the page into overlapping columns whose width and inter-shift depend on w and ε, and by performing text lines extraction on each column separately. Successively, a statistical analysis of the text line elements found in each column is performed, and they are connected to form complete text lines. Finally, related pieces of text are merged into blocks so that a sensible reading order is provided for the OCR system. The algorithm is very fast, is able to work on low-resolution document pages and is robust against skew. The algorithm as also very flexible: no assumptions are made on the layout of the document, the shape of the text regions, and the font size and style; the main assumption is that the background is uniform and the text approximately horizontal. Despite the statistical nature of the method, a single line of text of a certain font size is generally sufficient to warrant detection. Experimental results are shown which demonstrate the effectiveness of the method on several different kinds of documents

Index Terms
Available to subscribers and IEEE members.

References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.
You are not logged in.
Guests may access Abstract records free of charge.
Login
Username
Password
» Forgot your password?
Please remember to log out when you have finished your session.
You must log in to access:
• Advanced or Author Search
• CrossRef Search
• AbstractPlus Records
• Full Text PDF
• Full Text HTML
Access this document
Full Text: PDF (512 KB)
» Buy this document now
»  Learn more about
»  Learn more about
    purchasing articles
    and standards

Rights and Permissions
» Learn More
Download this citation
Available to subscribers and IEEE members.
 
arrow_leftView TOC   |  Back to toparrow_up
Indexed by IEE Inspec
© Copyright 2009 IEEE – All Rights Reserved