Text to voice conversion of text embedded in images | IEEE Conference Publication | IEEE Xplore

Text to voice conversion of text embedded in images


Abstract:

Natural language processing (NLP) and image processing have advanced recently, with the goal of developing intelligent systems that improve quality of life. This research proposes a fast approach for text detection, extraction of information from photos, and speech synthesis of the extracted information. First, the input image is preprocessed by grayscale conversion. Text regions in the preprocessed image are then located using the Maximally Stable Extremal Regions (MSER) feature detector. Non-text MSERs are removed using geometrical features and the stroke width transform. Individual letters are then grouped to identify text sequences, which are subsequently divided into words. The words are converted to digital form using optical character recognition (OCR). In the final stage, the recognized text is fed to a text-to-speech (TTS) synthesizer, which converts it to voice. The proposed technique is tested on web-based photos: it extracts text from the images and converts the recognized text into speech in the user's chosen language, with the aim of improving the accuracy and robustness of the proposed work.
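The two middle stages of the pipeline described in the abstract, discarding non-text candidate regions by their geometry and grouping the surviving letters into lines and words, can be illustrated with a minimal sketch. This is not the authors' implementation. The `Box` type, the function names, and every threshold below are assumptions chosen for illustration. The sketch takes candidate bounding boxes as given, and it omits MSER detection, the stroke width transform, OCR, and TTS, which would normally come from external libraries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical candidate region: bounding box plus region pixel count."""
    x: int      # left edge
    y: int      # top edge
    w: int      # width
    h: int      # height
    area: int   # pixels in the region itself (at most w * h)


def looks_like_text(b: Box, min_aspect: float = 0.1, max_aspect: float = 3.0,
                    min_extent: float = 0.2, max_extent: float = 0.9) -> bool:
    """Geometric filter; all thresholds are illustrative assumptions."""
    if b.w <= 0 or b.h <= 0:
        return False
    aspect = b.w / b.h            # very wide or very thin regions are rarely letters
    extent = b.area / (b.w * b.h) # fraction of the bounding box that is filled
    return min_aspect <= aspect <= max_aspect and min_extent <= extent <= max_extent


def group_into_lines(boxes: List[Box]) -> List[List[Box]]:
    """Put boxes that overlap vertically into the same line, sorted left to right."""
    lines: List[List[Box]] = []
    for b in sorted(boxes, key=lambda box: box.y):
        for line in lines:
            ref = line[0]
            overlap = min(ref.y + ref.h, b.y + b.h) - max(ref.y, b.y)
            if overlap > 0.5 * min(ref.h, b.h):
                line.append(b)
                break
        else:
            lines.append([b])
    return [sorted(line, key=lambda box: box.x) for line in lines]


def split_into_words(line: List[Box], max_gap_ratio: float = 0.6) -> List[List[Box]]:
    """Start a new word when the horizontal gap is large relative to letter height."""
    if not line:
        return []
    words = [[line[0]]]
    for prev, cur in zip(line, line[1:]):
        gap = cur.x - (prev.x + prev.w)
        if gap > max_gap_ratio * max(prev.h, cur.h):
            words.append([cur])
        else:
            words[-1].append(cur)
    return words


# Made-up candidates: four letter-like boxes, one long rule, one solid block.
candidates = [
    Box(10, 10, 8, 12, 60), Box(20, 10, 8, 12, 60),
    Box(40, 11, 8, 12, 60), Box(50, 11, 8, 12, 60),
    Box(0, 40, 200, 4, 790),   # rejected: aspect ratio far too wide
    Box(10, 60, 8, 12, 96),    # rejected: completely filled box
]
letters = [b for b in candidates if looks_like_text(b)]
for line in group_into_lines(letters):
    print([len(word) for word in split_into_words(line)])
```

In a full system, each word-level group would be cropped from the image and passed to an OCR engine, and the recognized string would then be handed to a speech synthesizer.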
Date of Conference: 16-17 February 2024
Date Added to IEEE Xplore: 23 April 2024
Conference Location: Bangalore, India
