Abstract:
The ground object distribution in remote sensing images exhibits strong regularities. However, existing deep learning-based object detection models often focus solely on instance-level information within sample labels, neglecting the modeling of relationships between instances. Additionally, these models fail to effectively utilize the substantial background information present in remote sensing images. To address these issues, we propose a remote sensing object detection network named SCENE-YOLO, which builds upon the YOLOv8 architecture and introduces scene supervision. First, we introduce a transformer-based scene information gathering and distribute network (SGD) to inject high-level semantic information into the feature pyramid; a slice-and-distribute mechanism prevents information loss during feature fusion across layers. Second, the backbone network is redesigned, incorporating the attention mechanism of omni-dimensional dynamic convolution (ODConv) to dynamically redistribute weights for target features. Third, a scene label generation algorithm (SLGA) based on prototype learning is proposed to supervise the model with generated scene-level labels, modeling instance-to-instance relationships through the introduction of artificial knowledge and multilevel classification. Finally, a scene-assisted detection head (SADHead) is introduced to improve detection in complex backgrounds by leveraging scene features with global contextual information to assist the model in target classification. Experimental validation on the publicly available DOTA and DIOR datasets demonstrates the effectiveness and superiority of the proposed algorithm.
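The gather-and-distribute idea behind the SGD module can be illustrated with a minimal sketch. This is not the authors' implementation: the pooling, the softmax-similarity token mixing (a stand-in for transformer self-attention), and the broadcast-add distribution step are all simplifying assumptions made for illustration.

```python
import numpy as np

def gather_and_distribute(pyramid):
    """Illustrative sketch (not the paper's SGD): gather each feature
    pyramid level into a global token, mix tokens across levels, then
    distribute the mixed context back to every level."""
    # Gather: global-average-pool each (H, W, C) level into a C-dim token.
    tokens = np.stack([f.reshape(-1, f.shape[-1]).mean(axis=0) for f in pyramid])
    # Mix: scaled softmax similarity as a stand-in for self-attention.
    sim = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    mixed = attn @ tokens
    # Distribute: broadcast each level's slice of global context back,
    # leaving the spatial resolution of every level unchanged.
    return [f + mixed[i] for i, f in enumerate(pyramid)]
```

The key property the sketch preserves is that each pyramid level receives its own slice of the globally mixed context rather than a single shared vector, which is the motivation for the slice-and-distribute mechanism described above.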
Published in: IEEE Transactions on Geoscience and Remote Sensing ( Volume: 63)