By Topic

A prototype document image analysis system for technical journals

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Nagy, G. ; Dept. of Electr.-Comput.-Syst. Eng., Rensselaer Polytech. Inst., Troy, NY, USA ; Seth, S. ; Viswanathan, M.

Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.<>

Published in:

Computer  (Volume:25 ,  Issue: 7 )