Document image binarization based on texture features
Ying Liu
Srihari, S.N.
Center of Excellence for Document Analysis and Recognition, State University of New York, Buffalo, NY
This paper appears in: Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publication Date: May 1997
Volume: 19
Issue: 5
On page(s): 540-544
ISSN: 0162-8828
References Cited: 16
CODEN: ITPIDJ
INSPEC Accession Number: 5606042
Digital Object Identifier: 10.1109/34.589217
Current Version Published: 2002-08-06
Abstract
Binarization is difficult for document images with poor contrast,
strong noise, complex patterns, or variable modalities in their
gray-scale histograms. We developed a texture-feature-based thresholding
algorithm to address this problem. Our algorithm consists of three
steps: 1) candidate thresholds are produced through iterative use of
Otsu's algorithm (1978); 2) texture features associated with each
candidate threshold are extracted from the run-length histogram of the
accordingly binarized image; 3) the optimal threshold is selected so
that desirable document texture features are preserved. Experiments with
9,000 machine-printed address blocks from an unconstrained US mail
stream demonstrated that over 99.6 percent of the images were
successfully binarized by the new thresholding method, an appreciably
higher success rate than that of typical existing thresholding techniques.
In addition, a system run on 500 troublesome mail address blocks showed
that our algorithm achieved an 8.1 percent higher character recognition
rate than Otsu's algorithm.
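
The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how Otsu's algorithm is iterated or which texture features are scored. The sketch assumes that each iteration re-applies Otsu's algorithm to the pixels at or below the previous threshold, and it uses a hypothetical stand-in score (the fraction of horizontal foreground runs whose length falls in an assumed stroke-width range). All function names and the min_run/max_run parameters are illustrative.

```python
import numpy as np


def otsu_threshold(values):
    """Otsu's threshold for 8-bit gray values: maximizes between-class variance."""
    hist = np.bincount(values.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                      # pixel count with value <= t
    w1 = hist.sum() - w0                      # pixel count with value > t
    sum0 = np.cumsum(hist * levels)
    mu0 = sum0 / np.maximum(w0, 1)            # mean of the darker class
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)  # mean of the brighter class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))


def candidate_thresholds(gray, max_candidates=3):
    """Step 1 (assumed scheme): re-apply Otsu to the pixels at or below each threshold."""
    candidates = []
    values = gray.ravel()
    for _ in range(max_candidates):
        if np.unique(values).size < 2:        # nothing left to split
            break
        t = otsu_threshold(values)
        candidates.append(t)
        values = values[values <= t]
    return candidates


def run_length_histogram(binary):
    """Step 2: histogram of horizontal foreground runs; index k counts runs of length k."""
    width = binary.shape[1]
    hist = np.zeros(width + 1, dtype=int)
    for row in binary:
        padded = np.concatenate(([0], row.astype(np.int8), [0]))
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)    # positions where a run begins
        ends = np.flatnonzero(diff == -1)     # positions just past where a run ends
        np.add.at(hist, ends - starts, 1)
    return hist


def select_threshold(gray, min_run=2, max_run=8):
    """Step 3 (hypothetical score): keep the candidate with the most stroke-like runs."""
    best_t, best_score = None, -1.0
    for t in candidate_thresholds(gray):
        hist = run_length_histogram(gray <= t)   # dark pixels are foreground
        total_runs = hist[1:].sum()
        score = hist[min_run:max_run + 1].sum() / total_runs if total_runs else 0.0
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Calling select_threshold on an 8-bit gray-scale array (dtype uint8, dark text on a light background) returns one of the candidate thresholds; the binarized image is then gray <= threshold. In this sketch, a wide mid-gray smudge produces long runs at a high threshold, so a lower candidate that keeps only stroke-width runs scores better.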