I. Introduction
Image captioning is a task at the intersection of computer vision and natural language processing. Its goal is to design an algorithm that enables a computer to understand the content of an image and translate it into descriptive text. Image captioning has a wide range of applications, including intelligent transportation, network image analysis, guidance for medical practitioners [1], and assistance for visually impaired people in perceiving their surroundings.