Cross-Modal Prealigned Method With Global and Local Information for Remote Sensing Image and Text Retrieval


Abstract:

In recent years, remote sensing cross-modal text-image retrieval (RSCTIR) has attracted considerable attention owing to its convenience and information mining capabilities. However, two significant challenges persist: effectively integrating global and local information during feature extraction, given the substantial variations in remote sensing imagery, and the failure of existing methods to adequately prealign features before modal fusion, which results in complex modal interactions that degrade retrieval accuracy and efficiency. To address these challenges, we propose a cross-modal prealigned method with global and local information (CMPAGL) for remote sensing image-text retrieval. Specifically, we design a global-Swin (Gswin) Transformer block, which introduces a global information window on top of the local window attention mechanism, combining local window self-attention and global-local window cross-attention to effectively capture multiscale features of remote sensing images. In addition, our approach incorporates a prealignment mechanism to mitigate the training difficulty of modal fusion, thereby enhancing retrieval accuracy. Moreover, we propose a similarity matrix reweighting (SMR) reranking algorithm to more deeply exploit the information in the similarity matrix during retrieval: it combines forward and backward ranking, the extreme difference ratio, and other factors to reweight the similarity matrix, further improving retrieval accuracy. Finally, we optimize the triplet loss function by introducing an intraclass distance term for matched image-text pairs, not only enforcing the relative distance between matched and unmatched pairs but also minimizing the distance within matched pairs. Experiments on four public remote sensing text-image datasets, including RSICD, RSITMD, UCM-Captions, and Sydney-Captions, demonstrate the effectiveness of the proposed method and show improvements over state-of-the-art methods.
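
The Gswin block described above pairs local window self-attention with cross-attention between the local windows and a global information window. The following PyTorch sketch illustrates that structure; the window size, the mean-pooled global window, and the class name `GswinBlockSketch` are assumptions for illustration, not the paper's actual implementation (which the abstract does not detail).

```python
import torch
import torch.nn as nn

class GswinBlockSketch(nn.Module):
    """Sketch of a Gswin-style block: local window self-attention followed
    by global-local window cross-attention. Window shifting, positional
    bias, and MLP sublayers are omitted for brevity."""

    def __init__(self, dim=96, window=49, heads=4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, tokens, dim); tokens must be divisible by the window size.
        b, n, d = x.shape
        w = x.view(b * (n // self.window), self.window, d)
        # Local window self-attention, applied to each window independently.
        wn = self.norm1(w)
        w = w + self.local_attn(wn, wn, wn)[0]
        x = w.view(b, n, d)
        # Global information window: here, one mean-pooled token per local window.
        g = x.view(b, n // self.window, self.window, d).mean(dim=2)
        # Global-local cross-attention: local tokens attend to the global window.
        x = x + self.cross_attn(self.norm2(x), g, g)[0]
        return x

# Usage: 196 tokens (e.g., a 14x14 grid) split into four 49-token windows.
y = GswinBlockSketch()(torch.randn(2, 196, 96))
```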
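The prealignment mechanism aligns image and text features before they enter the fusion module, which eases the learning of cross-modal interactions. The abstract does not specify its exact form; the sketch below assumes an InfoNCE-style image-text contrastive objective, a common choice for prealigning embedding spaces, with an illustrative temperature value.

```python
import torch
import torch.nn.functional as F

def prealignment_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.
    Row i of img_emb corresponds to row i of txt_emb."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Cross-entropy in both retrieval directions: image-to-text and text-to-image.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```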
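SMR reranking reweights the raw similarity matrix using forward ranking, backward ranking, and the extreme difference ratio. The paper's exact formula is not given in the abstract, so the NumPy sketch below assumes a multiplicative combination of bidirectional ranks scaled by a top-1/top-2 separation ratio per query, purely as an illustration of the idea.

```python
import numpy as np

def smr_rerank_sketch(sim, beta=0.5):
    """Reweight a (num_images, num_texts) similarity matrix.

    Assumed interpretation: entries that rank highly in both the
    image-to-text (forward) and text-to-image (backward) directions are
    boosted, scaled by how separated each query's top match is."""
    # Forward ranking: for each image (row), 1-based rank of every text.
    fwd_rank = np.argsort(np.argsort(-sim, axis=1), axis=1) + 1
    # Backward ranking: for each text (column), 1-based rank of every image.
    bwd_rank = np.argsort(np.argsort(-sim, axis=0), axis=0) + 1
    # Extreme difference ratio per row: gap between the top two candidates.
    top2 = -np.sort(-sim, axis=1)[:, :2]
    edr = (top2[:, 0] - top2[:, 1]) / (np.abs(top2[:, 0]) + 1e-8)
    # Reweight: large when both directional ranks are small (i.e., near the top).
    weight = 1.0 / (fwd_rank * bwd_rank)
    return sim + beta * edr[:, None] * weight
```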
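The optimized triplet loss adds an intraclass distance term for matched pairs on top of the usual margin-based ranking term, so the loss both separates matched from unmatched pairs and pulls matched pairs together. A minimal sketch, assuming cosine distance and a hypothetical weighting hyperparameter `alpha`:

```python
import torch
import torch.nn.functional as F

def triplet_loss_with_intraclass(anchor, positive, negative, margin=0.2, alpha=0.1):
    """Triplet loss augmented with an intraclass distance term."""
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)  # distance within matched pair
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)  # distance to unmatched pair
    # Relative-distance term: matched pairs closer than unmatched by a margin.
    ranking = torch.clamp(d_pos - d_neg + margin, min=0.0)
    # Intraclass term: additionally minimize the matched-pair distance itself.
    return (ranking + alpha * d_pos).mean()
```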
Article Sequence Number: 4709118
Date of Publication: 01 November 2024
