1. Introduction
In recent years, supervised deep learning methods have demonstrated excellent performance in hyperspectral image (HSI) classification. Typical examples include recurrent neural networks (RNNs), generative adversarial networks (GANs), and convolutional neural networks (CNNs), to name a few. Among these, CNNs are the most prevalent. Li et al. proposed an effective pixel pair feature-based CNN (PPF-CNN) [1] that combines the few available labeled samples, thereby augmenting the training data and improving the classification result. Subsequently, a diverse region CNN (DR-CNN) [2] was presented to exploit the abundant spectral-spatial information, which achieved improved performance. However, when labeled samples are limited, overfitting and performance degradation arise as networks grow deeper.