Abstract:
In recent years, advances in deep learning and attention mechanisms have driven substantial progress in image captioning. Traditional image captioning models, however, suffer from insufficient feature extraction and inaccurate information expression during decoding. To address these problems, this paper builds a model on the encoder-decoder framework. In the encoder, it proposes an improvement based on the ResNeSt network architecture and adds a Squeeze-and-Excitation module to obtain image feature information. In the decoder, it proposes an improved two-layer long short-term memory (LSTM) caption generation model. Through more efficient multi-head attention, the model can more accurately capture the relationships between features and generate more accurate and specific descriptive sentences based on complete semantic information. Experiments are carried out on the Flickr8k and Flickr30k datasets. A comparative analysis of the evaluation metrics indicates that the proposed model performs image captioning effectively and improves the accuracy of the generated descriptions.
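For readers unfamiliar with the Squeeze-and-Excitation module mentioned in the abstract, the following is a minimal sketch of a generic Squeeze-and-Excitation block in plain Python. It is not the paper's encoder: the function names, the weight matrices `w1` and `w2`, the channel count, and the reduction ratio are all assumptions chosen for illustration, and biases are omitted for brevity.

```python
import math
import random


def squeeze(feature_map):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(weights, vector):
    """Matrix-vector product (a fully connected layer without bias)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def excite(pooled, w1, w2):
    """Excitation step: bottleneck layer with ReLU, then sigmoid gates."""
    hidden = [max(0.0, h) for h in dense(w1, pooled)]
    return [1.0 / (1.0 + math.exp(-g)) for g in dense(w2, hidden)]


def se_block(feature_map, w1, w2):
    """Rescale every channel of feature_map by its learned gate in (0, 1)."""
    gates = excite(squeeze(feature_map), w1, w2)
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]


def random_weights(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    channels, height, width, reduction = 4, 3, 3, 2  # illustrative sizes only
    fmap = [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]
    w1 = random_weights(channels // reduction, channels, rng)  # C -> C/r
    w2 = random_weights(channels, channels // reduction, rng)  # C/r -> C
    out = se_block(fmap, w1, w2)
    print(len(out), len(out[0]), len(out[0][0]))
```

The output keeps the input's shape; only the relative weighting of the channels changes, which is how such a block can emphasize informative feature channels before they reach a decoder.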
Published in: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)
Date of Conference: 18-20 August 2023
Date Added to IEEE Xplore: 04 December 2023