Injecting Linguistic Into Visual Backbone: Query-Aware Multimodal Fusion Network for Remote Sensing Visual Grounding | IEEE Journals & Magazine | IEEE Xplore