Attending to Transforms: A Survey on Transformer-based Image Captioning | IEEE Conference Publication | IEEE Xplore

Abstract:

Image captioning is a challenging task that lies at the intersection of Computer Vision and Natural Language Processing. A large body of work generates meaningful and realistic descriptions of images. Recently, with the advent of attention mechanisms and transformers, there has been a drastic shift in modelling both language and vision tasks. However, very few extensive studies review these approaches based on their progression, advantages and disadvantages. This paper presents a detailed summary of transformer-based models employed for tackling image captioning. In addition, we provide an overview of various pre-training tasks, datasets and metrics used for image captioning. Finally, the performance of all the reviewed approaches is compared on the COCO Captions dataset.
Date of Conference: 05-06 April 2023
Date Added to IEEE Xplore: 02 June 2023
Conference Location: Nagpur, India