A robust technique for text extraction in mixed-type binary documents


2 Author(s)
Strouthopoulos, C. (Dept. of Inf. & Commun., Technol. Educ. Inst. of Serres, Serres); Nikolaidis, A.

Text extraction from mixed-type documents is a crucial preprocessing stage in applications such as OCR. Unlike most earlier approaches, the present work handles text of varying orientation and size. The technique first identifies marks using contour following, then applies principal component analysis (PCA) to determine the direction of the main axis of each mark. Next, a nearest-neighbor search finds the shortest distances between marks, and a feature vector formed from the calculated mark dimensions and distances is fed into a self-organizing feature map (SOFM), which defines homogeneous mark clusters. The resulting cluster weights and variances are used to form a set of fuzzy rules, and a fuzzy classification scheme labels each mark as a character or a non-character. The technique correctly and quickly extracts text areas in a variety of mixed-type documents.
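The PCA and nearest-neighbor steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact feature vector, so the choice of (length along the main axis, width across it, distance to the nearest mark) is an assumption, as are all function names. Marks are assumed to be already extracted as lists of (x, y) pixel coordinates, and distances are measured between mark centroids for simplicity. The SOFM clustering and fuzzy classification stages are not shown.

```python
import math

def main_axis_angle(points):
    """Direction (radians) of the principal axis of a set of 2-D points.

    Uses the closed-form orientation of the leading eigenvector of the
    2x2 covariance matrix, which is equivalent to PCA in two dimensions.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def extents(points, angle):
    """Extent of the points along the given axis and across it."""
    c, s = math.cos(angle), math.sin(angle)
    along = [x * c + y * s for x, y in points]
    across = [-x * s + y * c for x, y in points]
    return max(along) - min(along), max(across) - min(across)

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest_neighbor_distances(centroids):
    """Distance from each centroid to its closest other centroid.

    Brute force, O(n^2); requires at least two centroids.
    """
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(centroids) if j != i)
        for i, (xi, yi) in enumerate(centroids)
    ]

def mark_features(marks):
    """Hypothetical feature vector per mark: (length, width, nn_distance)."""
    centers = [centroid(m) for m in marks]
    nn = nearest_neighbor_distances(centers)
    features = []
    for points, d in zip(marks, nn):
        length, width = extents(points, main_axis_angle(points))
        features.append((length, width, d))
    return features
```

Because length and width are measured in each mark's own principal-axis frame, a rotated mark yields the same dimensions as an upright one, which is the property that makes such features usable for text of varying orientation.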

Published in:
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on

Date of Conference: 8-11 Dec. 2008
