A survey of methods and strategies in character segmentation
Casey, R.G.
Lecolinet, E.
IBM Almaden Res. Center, San Jose, CA
This paper appears in: Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publication Date: Jul 1996
Volume: 18
Issue: 7
On page(s): 690-706
ISSN: 0162-8828
References Cited: 89
CODEN: ITPIDJ
INSPEC Accession Number: 5349782
Digital Object Identifier: 10.1109/34.506792
Current Version Published: 2002-08-06
Abstract
Character segmentation has long been a critical area of the OCR
process. Recognition rates are higher for isolated characters than for
words and connected character strings, which illustrates this point
well. A good part of recent progress in reading unconstrained printed
and written text may be ascribed to more insightful handling of
segmentation. This paper provides a review of these advances. The aim is
to provide an appreciation for the range of techniques that have been
developed, rather than to simply list sources. Segmentation methods are
listed under four main headings. What may be termed the
“classical” approach consists of methods that partition the
input image into subimages, which are then classified. The operation of
attempting to decompose the image into classifiable units is called
“dissection.” The second class of methods avoids dissection,
and segments the image either explicitly, by classification of
prespecified windows, or implicitly by classification of subsets of
spatial features collected from the image as a whole. The third strategy
is a hybrid of the first two, employing dissection together with
recombination rules to define potential segments, but using
classification to select from the range of admissible segmentation
possibilities offered by these subimages. Finally, holistic approaches
that avoid segmentation by recognizing entire character strings as units
are described.
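
As a rough illustration of the "classical" dissection idea described above, the following sketch splits a binary image wherever a fully blank column separates two runs of ink. It is an assumption-laden toy, not a method taken from the paper: the function name dissect_by_columns, the list-of-rows image format (1 = ink, 0 = background), and the blank-column cutting rule are all chosen here for illustration only.

```python
def dissect_by_columns(image):
    """Return (start, end) column ranges of ink runs; end is exclusive.

    `image` is a hypothetical binary image given as a list of equal-length
    rows, where 1 marks an ink pixel and 0 marks background.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None  # first column of the run currently being scanned
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                     # a run of ink columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))   # a blank column closes the run
            start = None
    if start is not None:
        segments.append((start, width))   # run extends to the right edge
    return segments


# Three small "characters" separated by blank columns.
image = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
segments = dissect_by_columns(image)
print(segments)
```

Each returned range would then be cropped and passed to a classifier. Touching characters leave no blank column between them, so this rule returns them as a single segment, which is the kind of failure that motivates the other three strategies the abstract lists.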