I. Introduction
Change captioning aims to generate a natural language sentence that describes the difference between a pair of similar images. Compared with conventional change detection [1], [2], [3], change captioning must not only accurately localize object changes, but also have the high-level linguistic ability to describe which object has changed. Hence, this task not only provides a deeper understanding of changes in a scene, but also has many practical applications, such as automatically generating reports about changes in monitored facilities and areas [4], as well as about pathological changes between medical images [5].