Language-free layout analysis

2 Author(s)
D. J. Ittner ; AT&T Bell Lab., Murray Hill, NJ, USA ; H. S. Baird

A system is described for isolating blocks, lines, words, and symbols within images of machine-printed textual documents that is, to a large extent, independent of language and writing system. This is achieved by exploiting a small number of nearly universal typesetting and layout conventions. The system does not require prior knowledge of page orientation (modulo 90°), and copes well with nonzero skew and shear angles (within 10°). It also locates blocks of text without relying on detailed a priori layout models, even when text-line orientations are unknown or mix horizontal and vertical. Within blocks, it infers text-line orientation and isolates lines without knowledge of the language, symbol set, text sizes, or number of text lines. Segmentation into words and symbols, and determination of reading order, normally require some knowledge of the language; this is held to a minimum by relying on shape-driven algorithms. The underlying algorithms are based on Fourier theory, digital signal processing, computational geometry, and statistical decision theory. Most of the computation occurs within algorithms that possess unambiguous semantics (that is, heuristics are kept to a minimum). The effectiveness of the method on English, Japanese, Hebrew, Thai, and Korean documents is discussed.
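To make the idea of inferring text-line orientation without language knowledge concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not specify beyond its theoretical basis. It assumes a common simplification: a block is a binary image (1 = ink), and the projection profile taken across the text lines alternates between ink and white gaps, so it varies more than the profile taken along them. The function names `projection_densities` and `guess_line_orientation` are hypothetical.

```python
from statistics import pvariance


def projection_densities(image):
    """Return (row_density, col_density) for a binary image given as a list of rows.

    Each density is the fraction of ink pixels in that row or column, so the
    two profiles stay comparable when the block is not square.
    """
    height = len(image)
    width = len(image[0])
    row_density = [sum(row) / width for row in image]
    col_density = [sum(col) / height for col in zip(*image)]
    return row_density, col_density


def guess_line_orientation(image):
    """Guess 'horizontal' or 'vertical' text lines from profile variance.

    Illustrative assumption: inter-line white gaps make the profile that cuts
    across the lines high-variance, while the other profile stays flat.
    """
    row_density, col_density = projection_densities(image)
    if pvariance(row_density) >= pvariance(col_density):
        return "horizontal"
    return "vertical"


# Synthetic block: three solid horizontal "text lines" separated by blank rows.
horizontal_block = [
    [1] * 8,
    [0] * 8,
    [1] * 8,
    [0] * 8,
    [1] * 8,
]
# Transposing the block turns the lines into vertical ones.
vertical_block = [list(col) for col in zip(*horizontal_block)]

print(guess_line_orientation(horizontal_block))
print(guess_line_orientation(vertical_block))
```

A real page would first need skew correction and block isolation, and a variance comparison alone is fragile on short or sparse blocks; the sketch only illustrates why a shape-driven cue can stand in for knowledge of the script.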

Published in:

Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on

Date of Conference:

20-22 Oct 1993