Automated OCR Ground Truth Generation

3 Author(s)

Most optical character recognition (OCR) systems need to be trained and tested on the symbols they are to recognize, so ground truth data is needed. This data consists of character images together with their ASCII codes. Among the approaches for generating ground truth for real-world data, one promising technique is to use the electronic version of the scanned documents: an alignment method matches the character bounding boxes extracted from the electronic document to the scanned image. Current alignment methods are not robust to similarity transforms (rotation, uniform scaling, and translation). They also need a calibration step to deal with the non-linear local distortions introduced by the printing/scanning process. In this paper we present a significant improvement over existing methods that makes the calibration step unnecessary and yields a more accurate alignment under all similarity transforms. Our method finds a robust, pixel-accurate, scanner-independent alignment of the scanned image with the electronic document, allowing accurate ground truth character information to be extracted. The accuracy of the alignment is demonstrated using documents from the UW3 dataset. The results show that the mean distance between the estimated and the ground truth character bounding box positions is less than one pixel.
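The abstract evaluates alignment as the mean distance between estimated and ground-truth character box positions after accounting for a similarity transform. As a minimal illustration of those two ideas only, the Python sketch below fits a single global similarity transform (scale, rotation, translation) to matched bounding-box centers by ordinary least squares and then computes the mean distance. It is not the authors' method: the matched point pairs are assumed to be given, the fit is not robust to outliers, and it ignores the non-linear local distortions the paper addresses. The names `fit_similarity`, `apply_similarity`, and `mean_distance` are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(src: List[Point], dst: List[Point]) -> Tuple[complex, complex]:
    """Least-squares fit of dst ~ a * src + b, treating points as complex numbers.

    abs(a) is the scale factor, cmath.phase(a) the rotation angle (radians),
    and b the translation. Not robust: every pair contributes equally.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Center both point sets so the translation drops out of the fit.
    pc = [z - p_mean for z in p]
    qc = [z - q_mean for z in q]
    denom = sum(abs(z) ** 2 for z in pc)
    if denom == 0:
        raise ValueError("source points are all identical")
    # Closed-form minimizer of sum |a * pc_i - qc_i|^2 over complex a.
    a = sum(zp.conjugate() * zq for zp, zq in zip(pc, qc)) / denom
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a: complex, b: complex, pts: List[Point]) -> List[Point]:
    """Map each (x, y) through z -> a * z + b."""
    out = []
    for x, y in pts:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


def mean_distance(est: List[Point], truth: List[Point]) -> float:
    """Mean Euclidean distance between corresponding points, in pixels."""
    return sum(math.dist(e, t) for e, t in zip(est, truth)) / len(est)


if __name__ == "__main__":
    # Hypothetical box centers in the electronic document's coordinate frame.
    doc_centers = [(10.0, 20.0), (50.0, 20.0), (90.0, 25.0), (30.0, 80.0)]
    # Simulated scan: rotate 2 degrees, scale by 1.5, shift by (5, -3).
    true_a = 1.5 * cmath.exp(1j * math.radians(2.0))
    true_b = complex(5.0, -3.0)
    scan_centers = apply_similarity(true_a, true_b, doc_centers)

    a, b = fit_similarity(doc_centers, scan_centers)
    estimated = apply_similarity(a, b, doc_centers)
    print(f"scale={abs(a):.3f}, rotation={math.degrees(cmath.phase(a)):.2f} deg")
    print(f"mean distance={mean_distance(estimated, scan_centers):.6f} px")
```

Representing points as complex numbers turns the 2-D similarity fit into a one-variable linear least-squares problem with a closed-form solution. A robust variant would wrap this fit in an outlier-rejection loop such as RANSAC.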

Published in:

The Eighth IAPR International Workshop on Document Analysis Systems (DAS '08), 2008

Date of Conference:

16-19 Sept. 2008