Abstract:
Existing change detection (CD) methods often fuse multi-level features from bi-temporal remote sensing images directly, without distinguishing the importance of individual pixels. Despite their demonstrated success, mixing features indiscriminately limits a model's ability to capture change targets, owing to the imbalance between the (typically small) change regions and the whole scene. To this end, this paper presents a glance, focus, and refinement network (GFRNet), which formulates CD as a continuous, step-by-step focusing process that mimics the human visual system. Specifically, GFRNet first employs a transformer encoder to extract global features from the bi-temporal images, so that each feature takes a glance at the whole scene. GFRNet then gradually attends to a cascade of salient regions and progressively refines its focus on the desired areas of change. Comprehensive evaluations on two widely used benchmark datasets, LEVIR-CD and WHU-CD, demonstrate the superiority of GFRNet over a variety of state-of-the-art methods.
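The glance, focus, and refinement idea can be illustrated with a toy, parameter-free sketch. This is not GFRNet: the abstract does not specify its attention or refinement operators, so every function below (`glance`, `saliency`, `focus_and_refine`, `glance_focus_refine`) is hypothetical. The "glance" is assumed to be a single self-attention layer with no learned projections plus a residual connection, the "focus" is a per-pixel change score turned into a soft mask, and the "refinement" swaps in an isodata-style iterative threshold whose temperature is halved at each step.

```python
# Toy illustration only; NOT the GFRNet implementation. All names are hypothetical.
import math
from typing import List

Vector = List[float]


def glance(x: List[Vector]) -> List[Vector]:
    """Hypothetical 'glance': one parameter-free self-attention layer with a
    residual connection, so each pixel feature mixes in context from the whole scene."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        ctx = [sum(wi * v[c] for wi, v in zip(w, x)) / z for c in range(d)]
        out.append([a + b for a, b in zip(q, ctx)])  # residual keeps local detail
    return out


def saliency(g1: List[Vector], g2: List[Vector]) -> List[float]:
    """Per-pixel change score: Euclidean distance between bi-temporal features."""
    return [math.dist(p, q) for p, q in zip(g1, g2)]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def focus_and_refine(s: List[float], steps: int = 3, tau0: float = 0.5) -> List[float]:
    """Hypothetical 'focus' + 'refine': a soft change mask whose threshold is
    re-estimated as the midpoint of the attended / unattended soft clusters
    (isodata-style) while the temperature tau is halved at every step."""
    thr = sum(s) / len(s)  # initial threshold: scene-wide mean change score
    tau = tau0
    mask = [sigmoid((v - thr) / tau) for v in s]
    for _ in range(steps):
        w_in = sum(mask)
        w_out = len(s) - w_in
        mean_in = sum(m * v for m, v in zip(mask, s)) / max(w_in, 1e-8)
        mean_out = sum((1.0 - m) * v for m, v in zip(mask, s)) / max(w_out, 1e-8)
        thr = 0.5 * (mean_in + mean_out)  # midpoint of the two soft clusters
        tau *= 0.5  # sharpen the focus at each refinement step
        mask = [sigmoid((v - thr) / tau) for v in s]
    return mask


def glance_focus_refine(img1: List[Vector], img2: List[Vector], steps: int = 3) -> List[float]:
    """End-to-end toy pipeline: glance at each image, score changes, focus and refine."""
    return focus_and_refine(saliency(glance(img1), glance(img2)), steps)
```

For example, with eight 2-D pixel features where only pixel 3 differs between the two dates, the scene-wide mean threshold first gives a loose mask, and each refinement step pushes the changed pixel toward 1 and the seven unchanged pixels toward 0. Re-estimating the threshold from both clusters, rather than from the attended pixels alone, is a deliberate choice so that the focus does not drift past genuine but weaker changes when change pixels are rare.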
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024