Reperceive Global Vision of Transformer for Remote Sensing Images Weakly Supervised Object Localization

IEEE Journals & Magazine | IEEE Xplore


Abstract:

In recent decades, weakly supervised object localization (WSOL) has gained increasing attention in remote sensing. However, unlike optical images, remote sensing images (RSIs) often contain more complex scenes, which poses challenges for WSOL. Traditional convolutional neural network (CNN)-based WSOL methods are often limited by a small receptive field and yield unsatisfactory results. Transformer-based methods can obtain global perception, addressing the limited receptive fields of CNN-based methods, yet they may also introduce attention diffusion. To address these problems, this article proposes a novel WSOL method based on an interpretable vision transformer (ViT), termed RPGV. We introduce a feature fusion enhancement module to obtain a saliency map that captures global information. Simultaneously, we address the discrete attention of the traditional ViT and eliminate local distortion in the feature map by introducing a global semantic screening module. We conduct comprehensive experiments on the DIOR and HRRSD datasets, demonstrating the superior performance of our method compared to current state-of-the-art methods.
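The abstract describes fusing ViT attention into a global saliency map for weakly supervised localization. The paper's actual modules (feature fusion enhancement, global semantic screening) are not reproduced here; as a loosely related illustration, below is a minimal NumPy sketch of attention rollout, a standard way to aggregate per-layer ViT attention into a single CLS-to-patch saliency map. The function names and the patch-grid size are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def attention_rollout(attentions):
    """Fuse per-layer ViT attention maps into one global map.

    attentions: list of (num_heads, T, T) arrays, one per layer,
    where T = 1 (CLS token) + number of image patches.
    Returns a (T, T) matrix; row 0 is CLS-to-token saliency.
    """
    # Illustrative sketch only -- not the RPGV modules from the paper.
    rollout = np.eye(attentions[0].shape[-1])
    for attn in attentions:
        a = attn.mean(axis=0)                   # average over heads
        a = a + np.eye(a.shape[-1])             # account for the residual path
        a = a / a.sum(axis=-1, keepdims=True)   # re-normalize rows
        rollout = a @ rollout                   # accumulate across layers
    return rollout

def cls_saliency(attentions, grid=(14, 14)):
    """CLS-token saliency over patches, reshaped to the patch grid."""
    r = attention_rollout(attentions)
    sal = r[0, 1:]                              # CLS row, drop the CLS column
    sal = sal / (sal.max() + 1e-8)              # scale into [0, 1]
    return sal.reshape(grid)
```

A map produced this way can then be thresholded to extract a bounding box, which is the usual final step in attention-based WSOL pipelines.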
Page(s): 16902 - 16916
Date of Publication: 11 September 2024
