Boosting Multimodal Remote Sensing Image Classification via Prompt-Driven Fusion


Abstract:

Prompt tuning has emerged as a powerful approach in the era of foundation models, enabling efficient use of pretrained knowledge while minimizing resource demands. We introduce prompt tuning to the field of remote sensing (RS) and propose a novel prompt-based multimodal fusion framework, called ProMF, for RS multimodal image classification. ProMF incorporates a small number of learnable parameters into the input space while keeping the parameters of the pretrained networks frozen during fine-tuning. These additional parameters are prepended to the input sequence of each Transformer layer and trained alongside the linear classification head. Furthermore, to enhance feature interaction and fusion, we hierarchically incorporate useful prompts through a novel prompt-embedded multihead self-attention (MSA) mechanism. This approach allows complementary representations to be learned from different modalities layer by layer, improving model performance while reducing the risk of overfitting. Experimental results on three commonly used datasets demonstrate that the proposed method outperforms state-of-the-art approaches. The demonstrated effectiveness of multimodal prompt tuning offers a new perspective on adapting pretrained models for RS applications. The code will be publicly available at: https://github.com/zhaolin6/ProMF
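The general idea of prepending learnable prompts to the input of every Transformer layer while the backbone stays frozen can be sketched as follows. This is a minimal single-stream illustration in PyTorch, not the authors' ProMF implementation: the class name `DeepPromptEncoder`, the layer sizes, the randomly initialized stand-in for a pretrained backbone, and the mean-pooling before the head are all assumptions made for the example, and the prompt-embedded MSA used for cross-modal fusion is not reproduced here.

```python
import torch
import torch.nn as nn


class DeepPromptEncoder(nn.Module):
    """Hypothetical sketch: frozen Transformer layers + per-layer learnable prompts."""

    def __init__(self, dim=64, depth=4, heads=4, num_prompts=8, num_classes=10):
        super().__init__()
        # Stand-in for a pretrained backbone (randomly initialized here).
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                dropout=0.0, batch_first=True,
            )
            for _ in range(depth)
        ])
        # Freeze every backbone parameter.
        for p in self.layers.parameters():
            p.requires_grad = False
        # One set of learnable prompt tokens per layer: (depth, num_prompts, dim).
        self.prompts = nn.Parameter(torch.randn(depth, num_prompts, dim) * 0.02)
        # Linear classification head, trained together with the prompts.
        self.head = nn.Linear(dim, num_classes)
        self.num_prompts = num_prompts

    def forward(self, tokens):
        # tokens: (batch, seq_len, dim), e.g. embedded image patches.
        batch = tokens.shape[0]
        x = tokens
        for i, layer in enumerate(self.layers):
            prompt = self.prompts[i].unsqueeze(0).expand(batch, -1, -1)
            # Prepend this layer's prompts to the input sequence.
            x = layer(torch.cat([prompt, x], dim=1))
            # Drop the prompt positions so the next layer gets fresh prompts.
            x = x[:, self.num_prompts:]
        # Mean-pool the remaining tokens and classify (an assumption of this sketch).
        return self.head(x.mean(dim=1))


model = DeepPromptEncoder()
logits = model(torch.randn(2, 16, 64))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(logits.shape, trainable, frozen)
```

Only `prompts` and `head` receive gradients in this sketch, so an optimizer built from `[p for p in model.parameters() if p.requires_grad]` updates a small fraction of the total parameter count.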
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 22)
Article Sequence Number: 5503505
Date of Publication: 26 March 2025
