Automated Text Extraction from Images using OCR System | IEEE Conference Publication | IEEE Xplore

Abstract:

Digital images are rapidly growing in popularity. Every day, large numbers of images are generated by groups such as students, engineers, and doctors, according to their varying needs. These images can be accessed based on their primitive features or their associated text, and text present in an image can carry meaningful information. We aim to retrieve this content and summarize the visual information in images automatically, which requires an optical character recognition (OCR) system that combines several algorithms. Tesseract, an open-source OCR engine originally developed at HP Labs and later developed by Google, is currently among the most accurate OCR engines available. In this paper, we extract text from images using text localization, segmentation, and binarization techniques. Text extraction proceeds in stages: text detection identifies the parts of an image that contain text, text localization finds the exact position of the text, text segmentation separates the text from its background, and binarization converts the coloured image into a binary image. Character recognition is then applied to the binary image to convert it into ASCII text. Applications of text extraction include creating e-books from scanned books and searching for images in a collection of visual data.
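The abstract does not say which binarization or localization algorithms the paper uses. As a rough illustration only, the sketch below assumes a grayscale image stored as nested Python lists with dark text on a light background, binarizes it with Otsu's global threshold (a standard method swapped in here, not necessarily the authors' choice), and then localizes text lines with a horizontal projection profile. The helper names `otsu_threshold`, `binarize`, and `text_line_bands` are hypothetical, and binarizing before localizing is a simplification of the stage order described in the abstract.

```python
from typing import List, Optional, Tuple

# Assumption: 8-bit grayscale pixels, 0 (black) .. 255 (white).
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Hypothetical helper: grey level t maximising between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # pixel count at or below the candidate threshold
    sum_b = 0  # sum of grey levels at or below the candidate threshold
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img: Image, t: int) -> List[List[int]]:
    """Map dark pixels (<= t) to 1 (assumed text) and the rest to 0 (background)."""
    return [[1 if p <= t else 0 for p in row] for row in img]


def text_line_bands(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Locate text lines as runs of rows containing any text pixel.

    Returns inclusive (first_row, last_row) pairs.
    """
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, row in enumerate(binary):
        has_text = any(row)
        if has_text and start is None:
            start = y
        elif not has_text and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(binary) - 1))
    return bands


# Tiny synthetic 6x5 "page": light background (220) with two dark strokes (30).
page = [
    [220, 220, 220, 220, 220],
    [220, 30, 30, 30, 220],
    [220, 220, 220, 220, 220],
    [220, 220, 220, 220, 220],
    [30, 30, 220, 30, 30],
    [220, 220, 220, 220, 220],
]
t = otsu_threshold(page)
print(t, text_line_bands(binarize(page, t)))  # prints: 30 [(1, 1), (4, 4)]
```

In a full pipeline, the binary image (or each located line band) would then be passed to an OCR engine such as Tesseract for character recognition; that step is not shown here. A single global threshold is the simplest choice; images with uneven lighting usually call for a local (adaptive) threshold instead.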
Date of Conference: 13-15 March 2019
Date Added to IEEE Xplore: 13 February 2020
Conference Location: New Delhi, India