Think as People: Context-Driven Multi-Image News Captioning with Adaptive Dual Attention | IEEE Conference Publication | IEEE Xplore

Abstract:

Automatic image captioning has been extensively studied; however, existing methods primarily focus on a single image. Meanwhile, demand for captioning multiple images together with their contextual information has been growing in diverse scenarios, e.g., composing news article headlines and electronic medical reports. In this paper, we propose a novel COntext-driven captioning approach for Multi-Image News, called COMIN, which employs a two-step attention mechanism, called adaptive dual attention, comprising global attention for grasping the overall context and local attention for finer image details. It is inspired by human observation and cognitive processes, in which global attention is responsible for understanding high-level features and local attention for detailing low-level features. Experimental results on our newly contributed Star-News dataset show that our proposed model outperforms state-of-the-art image captioning methods in multi-image captioning scenarios.
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of
