Multifont OCR Postprocessing System

Authors: W. S. Rosenbaum (Federal Systems Division, 18100 Frederick Pike, Gaithersburg, Maryland 20760, USA); J. J. Hilliard

A series of techniques is being developed to postprocess noisy, multifont, nonformatted OCR data on a word basis to 1) determine if a field is alphabetic or numeric; 2) verify that an alphabetic word is legitimate; 3) fetch from a dictionary a set of potential entries using a garbled word as a key; and 4) error-correct the garbled word by selecting the most likely dictionary word. Four algorithms were developed using a technique called vector processing (representing alphabetic words as numeric vectors) and also by applying Bayes maximum likelihood solutions to correct the OCR output. The result was the development of a software simulator which processed sequential fields generated by the Advanced Optical Character Reader (in use by the U.S. Postal Service in New York City), performed the four functions indicated above, and selected the correct alphabetic word from a dictionary of 62,000 entries.
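The abstract does not give the vector encoding or the error statistics, so the following Python sketch only illustrates the general shape of functions 3) and 4): fetching dictionary candidates with a numeric key derived from the garbled word, then picking the most likely candidate under a simple per-character error model. Every name and number here (`word_vector`, `fetch_candidates`, `correct`, the a=1..z=26 letter values, the `CONFUSION` table, the tolerance) is a hypothetical choice for illustration, not the method described in the article. It assumes lowercase alphabetic input and substitution errors only.

```python
import math

# Hypothetical misread probabilities P(observed char | true char).
# Real values would be estimated from OCR output; these are made up.
CONFUSION = {("c", "o"): 0.05, ("l", "i"): 0.04, ("h", "b"): 0.03}
P_CORRECT = 0.9    # assumed probability that a character is read correctly
P_OTHER = 0.001    # assumed probability of any unlisted misread


def word_vector(word):
    """Represent a word as a numeric vector: (length, sum of letter values).

    Letter values a=1..z=26 are an assumption made for this sketch.
    """
    values = [ord(c) - ord("a") + 1 for c in word.lower()]
    return (len(values), sum(values))


def fetch_candidates(garbled, dictionary, tolerance=15):
    """Return dictionary words whose vector lies near the garbled word's."""
    length, magnitude = word_vector(garbled)
    candidates = []
    for entry in dictionary:
        entry_length, entry_magnitude = word_vector(entry)
        if entry_length == length and abs(entry_magnitude - magnitude) <= tolerance:
            candidates.append(entry)
    return candidates


def log_likelihood(observed, candidate):
    """log P(observed | candidate), assuming independent character errors."""
    total = 0.0
    for seen, true in zip(observed, candidate):
        if seen == true:
            total += math.log(P_CORRECT)
        else:
            total += math.log(CONFUSION.get((seen, true), P_OTHER))
    return total


def correct(garbled, dictionary):
    """Pick the maximum likelihood candidate (uniform prior over words)."""
    candidates = fetch_candidates(garbled, dictionary)
    if not candidates:
        return None
    return max(candidates, key=lambda w: log_likelihood(garbled, w))


if __name__ == "__main__":
    words = ["street", "avenue", "boston", "albany", "newark"]
    print(correct("bcston", words))
```

With a uniform prior over dictionary words, the Bayes decision reduces to choosing the candidate with the highest likelihood, which is what `correct` does. A non-uniform prior (for example, word frequency) could be added as an extra log term in the `max` key.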

Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.  

Published in: IBM Journal of Research and Development (Volume 19, Issue 4)