I. Introduction
In the last few decades, hyperspectral image classification has been a very active research topic with widespread applications [1]. However, classifying hyperspectral data remains challenging because of issues such as the high ratio of features (spectral bands) to instances (training samples) and the redundant information in the feature set [2], [3]. Over the past two decades, researchers have investigated a variety of approaches to alleviate these issues [4], [5].