An Image Captioning Model Based on SE-ResNest and EMSA

Abstract:

In recent years, with the development of technologies such as deep learning and attention mechanisms, image captioning has made great progress. Traditional image captioning models suffer from insufficient feature extraction and inaccurate information expression during decoding. To address these problems, this paper builds a model on the encoder-decoder framework. In the encoder, it proposes an improved network architecture based on ResNest and adds a Squeeze-and-Excitation module to obtain image feature information. In the decoder, it proposes an improved two-layer long short-term memory (LSTM) caption generation model. Through more efficient multi-head attention, the model can more accurately understand the relationships between features and generate more accurate and specific text descriptions based on complete semantic information. Experiments are carried out on the Flickr8k and Flickr30k datasets. A comparative analysis of the evaluation metrics shows that the proposed model can effectively perform image captioning and improves the accuracy of the generated text descriptions.
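As an illustration of the Squeeze-and-Excitation step mentioned in the abstract, the following is a minimal NumPy sketch of a generic SE block applied to a single feature map. It is not the authors' implementation: the function name `se_block`, the reduction ratio `r`, the random weights, and the omission of biases are all assumptions made for this example, and how the module is wired into ResNest is not specified in the abstract.

```python
import numpy as np

def se_block(features, w1, w2):
    """Generic Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(1, 2))                 # shape (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates in (0, 1).
    hidden = np.maximum(w1 @ z, 0.0)               # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # shape (C,)
    # Scale: reweight each channel of the input by its gate.
    return features * gates[:, None, None], gates

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, gates = se_block(x, w1, w2)
```

The output `y` has the same shape as the input, with each channel scaled by a learned weight between 0 and 1, so informative channels can be emphasized before the features are passed to the decoder.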

Published in: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)

Date of Conference: 18-20 August 2023

Date Added to IEEE Xplore: 04 December 2023

DOI: 10.1109/PRAI59366.2023.10332008

Conference Location: Haikou, China

I. Introduction

Image captioning is a task that combines computer vision and natural language processing. Its purpose is to design an algorithm that enables a computer to understand the content of an image and translate it into descriptive text. Image captioning has a wide range of applications, including intelligent transportation, web image analysis, providing guidance to medical practitioners [1], and helping visually impaired people perceive their surroundings.
