I. Introduction
With the rapid development of remote sensing instruments in recent years [1], [2], very high-resolution (VHR) remote sensing images have become increasingly available, creating new research opportunities in military and civilian applications such as natural disaster detection [3], [4], land-cover/land-use classification [5], [6], geospatial object detection [7], [8], geographic image retrieval [9], [10], urban planning, and environmental monitoring. Interpreting VHR remote sensing images manually, relying on the knowledge of domain experts, incurs a high labor cost. Therefore, intelligent scene classification of remote sensing images [11]–[14], which categorizes scene images into different classes based on their semantic information, has drawn great attention in the remote sensing field. Nevertheless, because of the wide variety of scene classes and the complex spatial information in VHR remote sensing images, effectively describing and classifying scenes remains a pivotal and challenging task.