LIVEnet: Linguistic-Interact-With-Visual Engager Domain Generalization for Cross-Scene Hyperspectral Imagery Classification


Abstract:

Domain generalization (DG) has led to remarkable achievements in cross-scene hyperspectral image (HSI) classification. Inspired by contrastive language-image pretraining (CLIP), language-aware DG methods have been explored for cross-scene HSI classification with language prior knowledge. However, existing methods face two challenges: 1) a weak capacity to extract long-range contextual information and interclass correlation and 2) because pretraining dedicated to HSI data is inadequate, the spatial-spectral features of HSI and the linguistic features cannot be straightforwardly aligned. To address these challenges, a novel network built on a CLIP framework is proposed. It consists of an image encoder, based on an encoder-only transformer, that captures global contextual information and interclass correlation; a frozen text encoder; and a cross-attention mechanism, named linguistic-interact-with-visual engager (LIVE), that enhances the interaction between the two modalities. Extensive experiments demonstrate superior performance over state-of-the-art (SOTA) methods with a CLIP framework, with overall accuracy (OA) of 83.39% and 83.94% on the UH and Pavia datasets, respectively.
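The abstract does not specify how LIVE is implemented, so the following is only a minimal sketch of generic single-head cross-attention, in which visual tokens act as queries over class text embeddings. The function names, the toy vectors, and the omission of learned projection matrices are all assumptions made for illustration; this is not the authors' architecture.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual_tokens, text_embeddings):
    """Let each visual token (query) attend over text embeddings (keys and values).

    Hypothetical simplification: single head, no learned Q/K/V projections.
    Returns (attended_features, attention_weights).
    """
    dim = len(text_embeddings[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    attended, weights = [], []
    for query in visual_tokens:
        # Scaled dot product between this visual token and every text embedding.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in text_embeddings
        ]
        w = softmax(scores)
        # Weighted sum of the text embeddings, one value per feature dimension.
        out = [
            sum(w[j] * text_embeddings[j][d] for j in range(len(w)))
            for d in range(dim)
        ]
        weights.append(w)
        attended.append(out)
    return attended, weights


# Toy data (made up): two visual tokens, three class text embeddings, dimension 4.
visual_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
]
text_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

attended, weights = cross_attention(visual_tokens, text_embeddings)
```

In this toy setup, each visual token places its largest attention weight on the text embedding it is most similar to, which is the basic mechanism by which cross-attention lets one modality condition on the other.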
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 21)
Article Sequence Number: 5506805
Date of Publication: 03 June 2024
