While document-image systems for the management of collections of documents, such as forms, offer significant productivity improvements, the entry of information from documents remains a labor-intensive and costly task for most organizations. In this paper, we describe a software system for the machine reading of forms data from their scanned images. We describe its major components: form recognition and “dropout,” intelligent character recognition (ICR), and contextual checking. Finally, we describe applications for which our automated forms reader has been successfully used.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.