An Image Captioning Model Based on SE-ResNest and EMSA | IEEE Conference Publication | IEEE Xplore

An Image Captioning Model Based on SE-ResNest and EMSA


Abstract:

In recent years, advances in deep learning and attention mechanisms have driven great progress in image captioning. Traditional image captioning models, however, suffer from insufficient feature extraction and from inaccurate expression of information during decoding. To address these problems, this paper builds a model on the encoder-decoder framework. In the encoder, it proposes an improvement based on the ResNest network architecture and adds a Squeeze-and-Excitation module to obtain image feature information. In the decoder, it proposes an improved two-layer long short-term memory (LSTM) caption generation model. Through more efficient multi-head attention, the model can more accurately capture the relationships between features and generate more accurate and specific text descriptions based on complete semantic information. Experiments are carried out on the Flickr8k and Flickr30k datasets. A comparative analysis of the evaluation metrics shows that the proposed model can effectively perform image captioning and improve the accuracy of the generated text descriptions.
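
For readers unfamiliar with the Squeeze-and-Excitation module named in the abstract, the following is a minimal sketch of the generic computation: global average pooling per channel (squeeze), two fully connected layers with ReLU and sigmoid (excitation), then channel-wise rescaling. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not say where the module sits inside ResNest or which reduction ratio is used, so the function name `se_block`, the weight shapes, the omission of biases, and the ratio `r = 2` are all hypothetical choices.

```python
import math
import random


def se_block(features, w1, w2):
    """Generic Squeeze-and-Excitation over a feature map (illustrative only).

    features: list of C channels, each an H x W nested list of floats.
    w1: (C // r) x C weight matrix, w2: C x (C // r) weight matrix.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels, sigmoid gives gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Hypothetical toy sizes: 4 channels of 2 x 2, reduction ratio r = 2.
random.seed(0)
C, H, W, r = 4, 2, 2, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
scaled, gates = se_block(features, w1, w2)
print(gates)  # one gate per channel
```

The output keeps the shape of the input feature map; only the relative weight of each channel changes, which is how such a module can emphasize informative channels before features are passed to a decoder.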
Date of Conference: 18-20 August 2023
Date Added to IEEE Xplore: 04 December 2023
Conference Location: Haikou, China
