
Automated Image Annotation with Voice Synthesis Using Machine Learning


Abstract:

The goal of our project is to develop an automated image annotator with voice synthesis, built with a CNN (Convolutional Neural Network) and an LSTM architecture to generate audio and textual captions. The framework involves several steps, including image embedding, image feature extraction and retrieval using VGG16, data preprocessing, tokenization, and LSTM-based caption generation, to guarantee the production of accurate and rich captions. The VGG16 architecture allows efficient extraction of image features and the creation of descriptive, meaningful captions. The test results show that the proposed system produces accurate and varied captions and speech for the given images. This system holds potential across various applications, such as helping people with visual impairments understand visual details or enhancing multimedia content with more descriptive captions.
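As a rough illustration of the feature-extraction and tokenization steps listed above, the minimal sketch below uses Keras' pretrained VGG16 (with the final classification layer removed) to obtain a 4096-dimensional image embedding and fits a Tokenizer on the caption corpus. The file path, the placeholder caption list, and the helper names are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal sketch of the VGG16 feature-extraction and tokenization steps
# (paths, the caption corpus, and helper names are illustrative placeholders).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Model

# VGG16 without its softmax layer: the fc2 output is a 4096-d image embedding.
base = VGG16(weights="imagenet")
feature_extractor = Model(inputs=base.input, outputs=base.layers[-2].output)

def extract_features(image_path):
    """Load an image, resize it to VGG16's 224x224 input, and return its embedding."""
    image = load_img(image_path, target_size=(224, 224))
    array = img_to_array(image)
    array = preprocess_input(np.expand_dims(array, axis=0))
    return feature_extractor.predict(array, verbose=0)  # shape: (1, 4096)

# Tokenization: map caption words to integer indices for the LSTM decoder.
captions = ["startseq a dog runs on the beach endseq"]  # placeholder corpus
tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1
```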
Date of Conference: 12-14 July 2024
Date Added to IEEE Xplore: 04 October 2024
Conference Location: Raipur, India

I. Introduction

This project aims to create an automated image annotator with voice synthesis using CNN, LSTM, and Python, which can generate easily understandable descriptions of images. This technology has become increasingly popular in recent years. The model first extracts features from the given image using a CNN, and the LSTM architecture then analyzes those features to predict the correct textual description. Finally, it generates a text description of the image and its features, which is converted into speech via voice synthesis.
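The sketch below shows one common way such a CNN-LSTM captioning pipeline can be wired together with off-the-shelf voice synthesis: a dense projection of the VGG16 embedding is merged with an LSTM summary of the partial caption to predict the next word, and the finished caption is spoken aloud. The layer sizes, the merge-style architecture, and the use of gTTS for text-to-speech are assumptions for illustration, not details taken from the paper.

```python
# Illustrative CNN-LSTM caption decoder with speech output
# (layer sizes and the gTTS backend are assumptions, not the paper's exact setup).
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model
from gtts import gTTS

max_length = 34      # placeholder: longest caption length in the training set
vocab_size = 7579    # placeholder: tokenizer vocabulary size + 1

# Image branch: project the 4096-d VGG16 embedding into the decoder space.
image_input = Input(shape=(4096,))
img_features = Dense(256, activation="relu")(Dropout(0.5)(image_input))

# Text branch: embed the partial caption and summarize it with an LSTM.
text_input = Input(shape=(max_length,))
seq = Embedding(vocab_size, 256, mask_zero=True)(text_input)
seq = LSTM(256)(Dropout(0.5)(seq))

# Merge the two branches and predict the next word of the caption.
decoder = Dense(256, activation="relu")(add([img_features, seq]))
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[image_input, text_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")

def speak(caption, out_path="caption.mp3"):
    """Convert a generated caption to speech with gTTS (assumed TTS backend)."""
    gTTS(text=caption, lang="en").save(out_path)
```

At inference time the decoder would be run word by word, feeding each predicted token back into the text branch until an end token or the maximum length is reached, after which the assembled caption is passed to the speech step.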
