Abstract:
Constructing a foundation model that extracts generalizable features from large volumes of multimodal data is a new challenge in the field of remote sensing. Compared with natural scene imagery, remote sensing involves complex application scenarios with multi-sensor acquisition, in which models suited to a specific task are difficult to generalize to new scenarios. In this paper, we propose a model architecture based on the concepts of multi-domain representation and cross-domain fusion. By extracting strongly generalizable features from massive multimodal data, a single foundation model can perform interpretation across multiple downstream tasks. Experimental results show that the proposed model performs well on multiple downstream tasks, which supports the feasibility of a cross-modal remote sensing foundation model for interpretation tasks.
Date of Conference: 16-21 July 2023
Date Added to IEEE Xplore: 20 October 2023