I. Introduction
Hyperspectral images (HSIs) contain rich spatial and spectral information that provides discriminative features for ground objects, and they have been widely used in geological exploration, target detection, and military reconnaissance [1], [2], [3], [4]. HSI classification (HSIC) is a fundamental task that aims to assign a category label to each pixel [5], [6], [7]. Over the past decades, deep learning (DL) has shown a remarkable ability to extract effective hierarchical features for a variety of visual tasks. Models such as stacked autoencoders (SAEs) [8], deep belief networks (DBNs) [9], and convolutional neural networks (CNNs) [10], [11], [12], [13], [14] have demonstrated satisfactory performance on HSIC. However, the performance of DL-based methods relies heavily on a sufficient number of labeled samples for each class [15], [16], and labeled hyperspectral samples are expensive and difficult to obtain.