Abstract:
Remote sensing image change captioning (RSICC) is a novel task that aims to describe the differences between bitemporal images in natural language. Previous methods overlook a key property of the task: the difficulty of RSICC differs between unchanged and changed image pairs. They process unchanged and changed image pairs in a coupled way, which often causes confusion in change captioning. In this article, we decouple the task into two simpler questions: whether changes have occurred and what changes have occurred. An image-level classifier performs binary classification to address the first question. A feature-level encoder extracts discriminative features to help the caption generation module address the second question. For caption generation, we use prompt learning to introduce pretrained large language models (LLMs) into the RSICC task. We propose a multiprompt learning strategy that generates a set of unified prompts and a class-specific prompt conditioned on the image-level classifier’s result. This strategy informs the pretrained LLM whether changes exist and prompts it to generate captions. Finally, the multiple prompts and the visual features from the feature-level encoder are fed into a frozen LLM for language generation. Compared with previous methods, our method leverages the language abilities of the pretrained LLM to generate plausible captions while keeping the LLM frozen, so the LLM itself requires no training. Extensive experiments show that our method is effective and achieves state-of-the-art performance. An additional experiment also demonstrates that our decoupling paradigm is more promising than the previous coupled paradigm for the RSICC task. To facilitate future research, we will make our codebase publicly available at https://github.com/Chen-Yang-Liu/PromptCC.
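The decoupled flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every name, the embedding width, the threshold-based classifier, the difference-based visual features, and the order in which prompts and visual tokens are concatenated are assumptions made only for illustration. In practice the prompts would be learned and the resulting sequence would be passed to a frozen LLM.

```python
import random

EMBED_DIM = 8      # hypothetical embedding width
NUM_UNIFIED = 4    # hypothetical number of unified prompt tokens


def make_tokens(n, dim, seed):
    """Return n pseudo-random vectors; a stand-in for learned embeddings."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]


# Prompts would be learned in practice; fixed random values are used here.
UNIFIED_PROMPTS = make_tokens(NUM_UNIFIED, EMBED_DIM, seed=0)
CLASS_PROMPTS = {
    "unchanged": make_tokens(1, EMBED_DIM, seed=1),
    "changed": make_tokens(1, EMBED_DIM, seed=2),
}


def classify_change(feat_before, feat_after, threshold=0.5):
    """Stand-in image-level classifier (assumption): thresholds the mean
    absolute difference between the two feature maps."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(feat_before, feat_after)
             for a, b in zip(row_a, row_b)]
    return "changed" if sum(diffs) / len(diffs) > threshold else "unchanged"


def build_llm_input(feat_before, feat_after):
    """Assemble the token sequence that would be fed to a frozen LLM."""
    # Question 1: whether a change occurred (image-level decision).
    label = classify_change(feat_before, feat_after)
    # Question 2: what changed. Element-wise differences stand in for the
    # feature-level encoder's output (assumption).
    visual = [[b - a for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(feat_before, feat_after)]
    # Unified prompts, then the class-specific prompt chosen by the
    # classifier, then visual tokens (ordering is an assumption).
    sequence = UNIFIED_PROMPTS + CLASS_PROMPTS[label] + visual
    return label, sequence
```

Under these assumptions, the only part of the prompt that depends on the classifier is the single class-specific token, which is how the sequence signals to the language model whether a change is present.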
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 61)