A document segmentation, classification and recognition system

4 Author(s)
F. Y. Shih; Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA; S.-S. Chen; D. C. D. Hung; P. A. Ng

A discussion is given on a document segmentation, classification and recognition system for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, block segmentation employs a two-step run-length smoothing algorithm to decompose a document into single-mode blocks. Next, block classification uses clustering rules to assign each block to one of the following classes: text, horizontal or vertical lines, graphics, or pictures. Each text block is separated into isolated characters using projection profiles, and the characters are translated into ASCII codes by a font- and size-independent character recognition subsystem. Logo pictures are discriminated from half-tone pictures, identified, and converted into symbolic words. The experimental results show that the proposed system is capable of correctly reading different styles of mixed-mode printed documents.
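The abstract names two techniques without giving their details: run-length smoothing for block segmentation and projection profiles for character separation. The Python sketch below illustrates the general idea of each on a small binary image (1 = black, 0 = white). It is not the authors' implementation. The function names, the thresholds, and the choice to combine horizontal and vertical smoothing with a logical AND (as in the classic run-length smoothing formulation) are all assumptions made for illustration; the paper's own two-step variant may differ.

```python
def smooth_row(row, threshold):
    """Fill white runs (0s) of length <= threshold that lie between two
    black pixels (1s). Hypothetical helper, not taken from the paper."""
    out = list(row)
    last_black = None
    for i, px in enumerate(row):
        if px == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smooth every row and every column, then AND the two results.
    Assumption: this mirrors the classic formulation, not necessarily
    the two-step variant used in the paper."""
    horiz = [smooth_row(r, h_threshold) for r in image]
    # Transpose, smooth the columns as rows, then transpose back.
    cols = [list(c) for c in zip(*image)]
    vert_t = [smooth_row(c, v_threshold) for c in cols]
    vert = [list(r) for r in zip(*vert_t)]
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


def column_profile(image):
    """Vertical projection profile: number of black pixels per column."""
    return [sum(col) for col in zip(*image)]


def split_on_profile(profile):
    """Return half-open (start, end) spans of consecutive nonzero
    profile entries; each span is a candidate character."""
    spans = []
    start = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

For example, a two-row text line `[[1, 1, 0, 0, 1], [1, 0, 0, 0, 1]]` has the column profile `[2, 1, 0, 0, 2]`, and splitting at the empty columns yields the spans `(0, 2)` and `(4, 5)`, that is, two candidate characters. Real documents would additionally need handling for touching or broken characters, which this sketch does not attempt.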

Published in:

Systems Integration, 1992. ICSI '92., Proceedings of the Second International Conference on

Date of Conference:

15-18 Jun 1992