Classification of High-Spatial-Resolution Remote Sensing Scenes Method Using Transfer Learning and Deep Convolutional Neural Network

Abstract—The deep convolutional neural network (DeCNN) is considered one of the most promising techniques for classifying high-spatial-resolution remote sensing (HSRRS) scenes, owing to its powerful feature extraction capabilities. It is well known that large, high-quality labeled datasets are required to achieve good classification performance and to prevent overfitting when training a DeCNN model. However, the lack of high-quality datasets limits the applications of DeCNN. To solve this problem, in this article, we propose an HSRRS image scene classification method using transfer learning and DeCNN (TL-DeCNN) with few-shot HSRRS scene samples. Specifically, the convolutional-layer weights of three typical DeCNNs, VGG19, ResNet50, and InceptionV3, each trained on ImageNet2015, are transferred to the corresponding TL-DeCNN. The TL-DeCNN then only needs to fine-tune its classification module on the few-shot HSRRS scene samples for a few epochs. Experimental results indicate that the proposed TL-DeCNN method clearly outperforms VGG19, ResNet50, and InceptionV3 trained directly on the few-shot samples, without overfitting.


I. INTRODUCTION
With the development of satellite remote sensing and computer technology, the spatial resolution and texture information of remote sensing images have improved, and the corresponding processing approaches have been updated. High-spatial-resolution remote sensing (HSRRS) images, with their high spatial resolution and abundant texture details, have performed well in object identification, classification, and information extraction [1]-[3]. In recent years, many HSRRS images have been acquired, and significant efforts have been made toward land use and land cover (LULC) scene classification in the field of pattern recognition [4], [5]. These approaches first extract features from training data and then build a classification model to classify other data. Most of these recognition methods are based on deep learning.
Deep learning has been successfully applied to the extraction of abstract and semantic features [6]-[12], and it performs well in target identification, object detection, and classification. The convolutional neural network (CNN) is one of the typical deep learning algorithms, and many CNN-based architectures (e.g., ResNet, VGG, Inception) have been developed for computer vision, natural language processing, medical applications, and remote sensing image processing [13]. These applications indicate that network depth is vital: adding layers enables the network to extract more complex features. However, although deeper models tend to perform better, training a CNN model, and especially a deep CNN (DeCNN) model, often requires a large amount of labeled data. It is hard to obtain such a large amount of labeled data to train a DeCNN model for HSRRS scene classification, and labeling HSRRS data takes considerable manpower and resources. When the labeled dataset is not large enough, the trained DeCNN model easily overfits. Several studies have shown that transfer learning achieves good classification and recognition performance with small-scale training data [14].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this article, we propose a classification method based on transfer learning and DeCNN (TL-DeCNN) to reduce overfitting and improve classification accuracy with limited labeled samples. Specifically, three typical DeCNN models, i.e., VGG19, ResNet50, and InceptionV3, are each combined with transfer learning, and the combined models are called TLVGG19, TLResNet50, and TLInceptionV3. To assess the performance of TL-DeCNN for few-shot HSRRS scene classification, the training and testing accuracy and loss, the confusion matrix, the overall accuracy (OA), and the kappa coefficient (KC) are used. The main contributions of this article are threefold.
1) A DeCNN-based HSRRS image scene classification method is presented for few-shot samples. We train three CNN models, i.e., VGG19, ResNet50, and InceptionV3, on few-shot samples and evaluate their accuracy. Experimental results show that InceptionV3 is the best of the three models.
2) A TL-DeCNN-based HSRRS image scene classification method is proposed for the case of limited labeled samples. The proposed TL-DeCNN model is fine-tuned on a limited number of labeled HSRRS scene samples for a few epochs.
3) The DeCNN-based scene classification method trained with a large amount of labeled HSRRS images is also considered as a benchmark.
The rest of this article is organized as follows. An overview of CNN-based HSRRS image scene classification and transfer learning-based applications is presented in Section II. The proposed DeCNN- and TL-DeCNN-based architectures for HSRRS image scene classification with small and large amounts of labeled data are given in Section III. In Section IV, the HSRRS image preprocessing and the VGG19-, ResNet50-, and InceptionV3-based architectures for scene classification are described, together with the evaluation indexes of the classification models. The results of HSRRS scene classification with DeCNN and TL-DeCNN on few-shot samples, along with quantitative indicators, are described in Section V, where the results of DeCNN trained with a large amount of labeled HSRRS images are also compared with those of TL-DeCNN with few-shot samples. Finally, concluding remarks are drawn in Section VI.

II. RELATED WORK
HSRRS image scene classification assigns extracted subregions to different semantic classes. It is a fundamental task with significant applications in remote sensing, such as urban planning, object detection, and natural resource management. Many recent works have demonstrated that CNN is the most successful and widely applied deep learning method, and it has been applied to HSRRS image scene classification [16]-[19]. In particular, a DeCNN, with its many convolutional layers, performs well in semantic feature extraction when a large training dataset is available. However, it is difficult to train a DeCNN model with only a few samples.
Compared with coarse or medium spatial resolution remote sensing data, HSRRS images have higher spatial resolution but fewer spectral channels, which makes it more difficult to identify subtle differences among similar land cover types. Meanwhile, the phenomena of "same object, different spectra" and "same spectrum, different objects" in HSRRS images cause many classification tasks with high accuracy demands to fail. Tremendous efforts have been made to develop robust and automatic image classification methods. Machine learning approaches (e.g., support vector machine, random forest, k-nearest neighbor, and multilayer perceptron) have been widely used in HSRRS image classification with considerable success [20]-[22].
Recently, deep learning has represented the state of the art in a variety of domains, and CNN, as a typical deep learning method, has obtained excellent results in computer vision [23], wireless communications [24], [25], and remote sensing image processing [15], [16]. CNN-based HSRRS image scene classification has recently achieved excellent results. Penatti et al. evaluated the generalization power of CNN features from fully connected layers and obtained state-of-the-art results on a public HSRRS image dataset [26]. Feature fusion strategies that integrate multilayer CNN features have been proposed for HSRRS image scene classification [16], [27]-[30]. Gong [33]. These early works achieved excellent results in HSRRS image scene classification with fully trained CNN models. However, training a CNN model requires a considerable amount of labeled data, which is rather difficult to obtain for HSRRS images. Many efforts have been made to increase the number of training samples or improve the robustness of CNNs, including data augmentation, detecting adversarial perturbations [34], increasing the depth of the CNN, and transferring a pretrained CNN model or its knowledge to the scene classification task [35].
Transfer learning is an important solution for improving the robustness of CNN-based classification models. Zhang et al. searched for regions of interest based on the features of adjacent parallel lines and confirmed the final targets through transfer learning on AlexNet [37]. Li et al. proposed a best activation model in an end-to-end process for LULC image classification [4]. Nogueira et al. proposed a method that transfers parameters from a pretrained network and retrains the new network without parameter selection [38]. Zhao et al. combined a pretrained AlexNet with a multilayer perceptron structure for classification [39]. Huang et al. constructed a semitransfer DeCNN for image classification [40].

III. PROPOSED TL-DECNN-BASED METHOD
Deep learning-based HSRRS scene classification remains challenging because labeled images are limited. In this section, a robust classification method using TL-DeCNN is proposed. The architecture of the proposed TL-DeCNN-based HSRRS image scene classification method is shown in Fig. 1. It can be divided into the following three steps.
1) The first step trains a classification model on ImageNet2015 and transfers the learned knowledge to the target classification task.
2) The second step fine-tunes the model with limited labeled HSRRS images.
3) The third step evaluates the model and its results with quantitative indicators.
The goal of the architecture is to transfer deep knowledge from ImageNet2015 to scene classification of urban built-up areas with limited HSRRS training data, thereby improving classification accuracy.

A. Transfer Learning
Transfer learning is a popular training strategy for overcoming the shortage of labels: the training model is initialized with parameters or knowledge learned from other large datasets and then fine-tuned with a small amount of labeled data from the target task to obtain a better model. Section II has shown that CNNs perform well in semantic information extraction for HSRRS scene classification and object identification, and that a pretrained CNN model can be transferred to the current classification task. However, most research works focus on shallow networks with insufficient samples, whereas DeCNNs have mostly been applied to object identification or classification with a large number of labeled training samples. As the depth of the network increases, training an HSRRS image scene classification architecture with only a few labeled samples may become infeasible. To solve this problem, we propose TL-DeCNN-based HSRRS image scene classification methods for few-shot samples.
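The strategy described above can be illustrated with a minimal pure-Python sketch. It assumes a tiny, hypothetical "pretrained" feature extractor (`PRETRAINED_W`) that stays frozen while only a softmax classification head is trained on a few labeled samples with the cross-entropy loss, mirroring the idea of reusing transferred features and fine-tuning only the classifier. All names, weights, and toy data are illustrative assumptions, not the TL-DeCNN implementation.

```python
import math
import random

# Hypothetical "pretrained" feature extractor (stand-in for transferred
# convolutional layers). Its weights are never updated during training below.
PRETRAINED_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def extract_features(x):
    """Frozen feature extractor: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, f):
    return [sum(wi * fi for wi, fi in zip(W[c], f)) + b[c] for c in range(len(W))]


def train_head(samples, num_classes, lr=0.1, epochs=200, seed=0):
    """Train only the classification head (W, b) with CCE loss and plain SGD."""
    rng = random.Random(seed)
    dim = len(PRETRAINED_W)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, label in samples:
            f = extract_features(x)  # frozen: no update to PRETRAINED_W
            p = softmax(logits(W, b, f))
            for c in range(num_classes):
                grad = p[c] - (1.0 if c == label else 0.0)  # dCCE/dlogit
                for j in range(dim):
                    W[c][j] -= lr * grad * f[j]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x):
    scores = logits(W, b, extract_features(x))
    return scores.index(max(scores))


# Toy few-shot task: two labeled samples per class.
few_shot = [([2.0, 0.1], 0), ([1.5, 0.2], 0), ([0.1, 2.0], 1), ([0.2, 1.7], 1)]
W, b = train_head(few_shot, num_classes=2)
print([predict(W, b, x) for x, _ in few_shot])
```

Keeping the extractor frozen means only the small head is fitted to the few samples, which is the intuition for why transferred features reduce the number of parameters that can overfit.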

B. Knowledge Transfer From ImageNet2015 to HSRRS Scene Classification Task
This work is divided into the following three parts: model training on ImageNet2015, the feasibility of transfer learning between ImageNet2015 and the HSRRS scene classification task, and the method for knowledge transfer. First, the DeCNN architectures VGG19, ResNet50, and InceptionV3 are applied to extract features. Second, the applicable conditions of transfer learning are introduced. Finally, the extracted features are transferred to the HSRRS scene classification task.
1) DeCNN Training: A DeCNN contains multiple CNN layers to extract discriminative features for accurate classification. A DeCNN is usually constructed by stacking several convolutional and pooling layers to form a deep architecture [41]. CNN is a typical supervised learning method, which needs labeled data to learn and then makes predictions for unlabeled data. The labeled training data form the sample set $S$, where $x^{(i)}$ represents the $i$th feature of $x$, and the training data are formed by pairs of features $x_i$ and outputs $f_i(x)$. The CNN is trained in a similar way to the training function in (2). The goal of the CNN is to learn a mapping from input features to outputs, which is represented by a model in applications. The model is parameterized by $\theta$, the parameters trained by the CNN with samples $S$, and $\theta$ can be divided into two parts, $\theta = (\theta_F, \theta_{\mathrm{CCE}})$: the former performs feature extraction (learning), and the latter is the classifier trained with the classification cross-entropy (CCE) loss function for multicategory classification or prediction. Accordingly, the model can be written in terms of these two parts, where $\hat{T}$ is the approximation of $T$. The CCE loss is given as
$$L_{\mathrm{CCE}} = -\sum_{i=1}^{C} y_i \log f_i(x)$$
where $C$ is the number of categories, $y_i$ is the true label of the $i$th category, and $f_i(x)$ is the corresponding output of the model. Hence, features and classifiers are obtained by training the CNNs on ImageNet2015. To extract deeper semantic features, VGG19, ResNet50, and InceptionV3 are each used to train the classification model. Each trained model consists of two parts, a feature extractor and a classifier. Since only the features or knowledge are useful for the subsequent applications, the following introduction of the approaches mainly focuses on feature extraction.
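As a concrete check of the CCE formula, the following pure-Python sketch computes the loss for one sample with C = 3 categories and a one-hot label. The softmax used to produce the outputs f_i(x) and the example scores are assumptions for illustration, not values from this study.

```python
import math


def softmax(logits):
    """Turn raw scores into class probabilities f_i(x)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cce_loss(y_true, probs, eps=1e-12):
    """L_CCE = -sum_i y_i * log f_i(x) over the C categories."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, probs))


y = [0, 1, 0]  # one-hot label: the sample belongs to the 2nd of C = 3 categories
confident = cce_loss(y, [0.0, 1.0, 0.0])        # correct and certain: loss 0
uniform = cce_loss(y, [1 / 3, 1 / 3, 1 / 3])    # uninformed: loss log(3)
model = cce_loss(y, softmax([0.5, 2.0, -1.0]))  # a typical model output
print(confident, uniform, model)
```

Because only the true class has y_i = 1, the loss reduces to minus the log-probability the model assigns to the correct category.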
2) VGG19: VGG19, developed by Simonyan and Zisserman [42], is one of the most popular DeCNN models. It is an influential DeCNN model that increases network depth by stacking small convolution kernels instead of using larger ones, which keeps the number of parameters per layer relatively low. VGG19 has 16 convolutional layers and 3 fully connected layers, with a 3 × 3 convolution kernel size and a 2 × 2 max-pooling size. A series of convolutional layers, max pooling, and rectified linear unit (ReLU) functions constructs a convolutional block.
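The layer counts above can be checked with a short sketch. It assumes the standard VGG19 configuration of Simonyan and Zisserman (channel widths 64, 128, 256, 512, and 512 in five stages of 2, 2, 4, 4, and 4 convolutional layers), an RGB input, and one bias per filter; `VGG19_CFG` and the helper names are illustrative. It also shows why small kernels help: two stacked 3 × 3 layers cover a 5 × 5 receptive field with 18C² weights instead of 25C².

```python
# Standard VGG19 convolutional layout: integers are output channels of 3x3
# conv layers, "M" marks a 2x2 max-pooling layer (5 stages: 2, 2, 4, 4, 4 convs).
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def conv_params(in_ch, out_ch, k=3):
    """k*k*in_ch*out_ch weights plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def vgg_conv_base(cfg, in_ch=3):
    """Count conv layers and their parameters for an RGB input."""
    n_conv, total = 0, 0
    for item in cfg:
        if item == "M":
            continue  # pooling layers have no trainable parameters
        total += conv_params(in_ch, item)
        in_ch = item
        n_conv += 1
    return n_conv, total


n_conv, n_params = vgg_conv_base(VGG19_CFG)
print(n_conv, n_params)  # 16 20024384

# Two stacked 3x3 layers see a 5x5 region but need fewer weights than a
# single 5x5 layer (C input and output channels, biases ignored).
C = 512
two_3x3 = 2 * 3 * 3 * C * C
one_5x5 = 5 * 5 * C * C
print(two_3x3, one_5x5)
```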
3) ResNet50: ResNet50 is one of the most common DeCNNs for object detection and classification with a huge number of samples, and it effectively resolves the degradation problem caused by increasing the number of layers in the network. ResNet50 has been shown to outperform other CNN models in image classification on the ImageNet dataset [43]. The main idea of ResNet is to add a direct connection (shortcut) channel to the network, similar in spirit to the highway network, which allows the original input information to be passed directly to the next layer. The formula is given as
$$x_l = x_{l-1} + F(x_{l-1}, w_l)$$
where $x_{l-1}$ and $x_l$ are the input and output features of the $l$th residual block, respectively, $F(\cdot)$ is the residual mapping, and $w_l$ denotes the weights associated with the $l$th ResNet block. Each residual block consists of a series of convolutional, batch normalization, and ReLU layers, which helps alleviate the gradient degradation and overfitting problems.
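The residual formula can be sketched in pure Python. Here the residual mapping F is a toy ReLU(w_l x_{l-1}) on short vectors, an assumption far simpler than the convolution, batch normalization, and ReLU stack of a real ResNet50 block. With zero residual weights the block reduces to the identity mapping, which is the usual intuition for why shortcut connections allow extra layers to be added without degrading the signal.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Matrix-vector product; w is a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x_prev, w_l):
    """x_l = x_{l-1} + F(x_{l-1}, w_l), with a toy F = ReLU(w_l x_{l-1})."""
    f = relu(matvec(w_l, x_prev))
    return [a + b for a, b in zip(x_prev, f)]  # shortcut adds the input back


x0 = [1.0, -2.0, 0.5]
zero_w = [[0.0] * 3 for _ in range(3)]
identity_w = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With zero residual weights, 50 stacked blocks pass the input through unchanged.
x = x0
for _ in range(50):
    x = residual_block(x, zero_w)
print(x == x0)                         # True
print(residual_block(x0, identity_w))  # [2.0, -2.0, 1.0]
```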

4) InceptionV3:
InceptionNet was proposed to increase the depth and width of the network and thereby improve its performance. InceptionV3 is one of the most popular InceptionNet variants for classification [36]. It introduces the idea of factorizing convolutions into smaller ones and uses branches not only in the inception modules but also within the branches themselves, which promotes high-dimensional representations.
Fig. 2 shows the schematic diagram of transfer learning. Given a source domain $D_S$ with learning task $T_S$ and a target domain $D_T$ with learning task $T_T$, transfer learning aims to help improve the learning of the target predictive function $f_T(\cdot)$ in $D_T$ using the knowledge in $D_S$ and $T_S$, where $D_S \neq D_T$ or $T_S \neq T_T$. Note that each domain is a pair, $D_S = \{X_S, P(X_S)\}$ and $D_T = \{X_T, P(X_T)\}$, consisting of a feature space and a marginal probability distribution. Hence, the condition $D_S \neq D_T$ implies that either the feature spaces differ ($X_S \neq X_T$), or the feature spaces are the same but the marginal probability distributions differ ($P(X_S) \neq P(X_T)$). The same reasoning applies to the tasks. The definition also implies that when there is some relationship (explicit or implicit) between the feature spaces of the two domains, the source and target domains are considered related, and transfer learning can be carried out between them.
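The factorization idea behind InceptionV3 can be quantified with a little arithmetic. The sketch below counts the weights of a convolution with equal input and output channels (biases ignored; the channel count 192 is an arbitrary assumption) and compares a 5 × 5 kernel with two stacked 3 × 3 kernels, and a 3 × 3 kernel with a 1 × 3 followed by a 3 × 1 kernel. The resulting savings, about 28% and 33%, do not depend on the channel count.

```python
def conv_weights(kh, kw, channels):
    """Weights of a conv layer with equal input/output channels (no biases)."""
    return kh * kw * channels * channels


C = 192  # arbitrary channel count; the ratios below do not depend on it

full_5x5 = conv_weights(5, 5, C)
two_3x3 = 2 * conv_weights(3, 3, C)                   # same 5x5 receptive field
full_3x3 = conv_weights(3, 3, C)
asym = conv_weights(1, 3, C) + conv_weights(3, 1, C)  # 1x3 followed by 3x1

saving_5x5 = 1 - two_3x3 / full_5x5  # about 0.28
saving_asym = 1 - asym / full_3x3    # about 0.33
print(round(saving_5x5, 2), round(saving_asym, 2))  # 0.28 0.33
```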

5) Transfer Learning-Based Method:
There are three topics in transfer learning: what to transfer, how to transfer, and when to transfer. What to transfer concerns which part of the knowledge can be transferred across domains or tasks; how to transfer concerns developing algorithms to transfer that knowledge; and when to transfer asks in which situations transfer learning should be performed. In this article, we aim to achieve good performance in the target HSRRS scene classification task by transferring knowledge from the source ImageNet2015 task. Since labeled data are available in both the source and target domains, this belongs to the inductive transfer learning setting [44]. Moreover, because the preliminary model is a DeCNN trained on ImageNet2015, the approach is also a form of deep transfer learning. Compared with nondeep approaches, deep transfer learning automatically extracts more expressive features and meets the end-to-end requirement of practical applications [44].
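The setting discussed above can be made explicit with a small helper that encodes a simplified version of the taxonomy in Pan and Yang's transfer learning survey (inductive, transductive, and unsupervised transfer). The function name and its boolean inputs are hypothetical, and the rules are a simplification rather than a complete classification.

```python
def transfer_setting(same_task, source_labeled, target_labeled):
    """Simplified transfer learning taxonomy (after Pan and Yang's survey).

    - inductive: target task differs from the source task and some labeled
      target data are available;
    - transductive: same task, labeled data only in the source domain;
    - unsupervised: different tasks and no labeled data in either domain.
    """
    if not same_task and target_labeled:
        return "inductive"
    if same_task and source_labeled and not target_labeled:
        return "transductive"
    if not same_task and not source_labeled and not target_labeled:
        return "unsupervised"
    return "not covered by this simplified rule"


# ImageNet2015 object recognition -> 10-class HSRRS scene classification:
# different tasks, with labeled data in both domains.
print(transfer_setting(same_task=False, source_labeled=True, target_labeled=True))
```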

C. Fine-Tuning for HSRRS Image Scene Classification Task
Fine-tuning is the process of initializing the network for the HSRRS scene classification task with the knowledge transferred from ImageNet2015. The model is then further trained with the labeled HSRRS images, and the parameters are adjusted in the same way as when training from scratch. This requires the layers of the initialized network to be the same as those of the source network, including layer names, types, and parameter settings. Fine-tuning is a vital process for HSRRS scene classification: it not only makes the network converge as quickly as possible but also allows generic features to contribute to a specific task. The fine-tuning learning rate (0.001) is smaller than the learning rate used for model training on ImageNet2015 (0.005), and this setting can improve the accuracy of HSRRS scene classification.
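Fine-tuning as described can be sketched with plain dictionaries. The layer names (`conv1`, `conv2`, `fc_imagenet`, `fc_hsrrs`), toy weight values, and helper functions below are hypothetical. Weights are copied only for layers whose name, type, and size match the source network, the new 10-class head keeps its fresh initialization, and the update rule is the same as in training from scratch, only with the smaller learning rate (0.001 instead of 0.005) stated in the text.

```python
import copy

# Hypothetical source network trained on ImageNet2015: layer name -> layer.
source = {
    "conv1": {"type": "conv", "w": [0.2, -0.1, 0.4]},
    "conv2": {"type": "conv", "w": [0.5, 0.3, -0.2]},
    "fc_imagenet": {"type": "dense", "w": [0.01] * 1000},  # 1000-class head
}

# Hypothetical target network: identical conv layers (same names, types,
# sizes) and a freshly initialized 10-class head for the HSRRS scenes.
target = {
    "conv1": {"type": "conv", "w": [0.0, 0.0, 0.0]},
    "conv2": {"type": "conv", "w": [0.0, 0.0, 0.0]},
    "fc_hsrrs": {"type": "dense", "w": [0.0] * 10},
}


def transfer_weights(src, dst):
    """Copy weights into dst for layers whose name, type, and size match src."""
    copied = []
    for name, layer in dst.items():
        ref = src.get(name)
        if ref and ref["type"] == layer["type"] and len(ref["w"]) == len(layer["w"]):
            layer["w"] = copy.deepcopy(ref["w"])
            copied.append(name)
    return copied


def sgd_step(weights, grads, lr):
    """Same update rule as training from scratch; only the learning rate changes."""
    return [w - lr * g for w, g in zip(weights, grads)]


PRETRAIN_LR, FINETUNE_LR = 0.005, 0.001  # learning rates stated in the text

copied = transfer_weights(source, target)
print(copied)  # ['conv1', 'conv2']

# One fine-tuning step on a trainable layer with a made-up gradient.
target["conv1"]["w"] = sgd_step(target["conv1"]["w"], [1.0, 1.0, 1.0], FINETUNE_LR)
print(target["conv1"]["w"])
```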

D. Accuracy Verification
The evaluation metrics include the confusion matrix, OA, KC, and precision. The confusion matrix is the most commonly used tool for evaluating classification performance. The OA is the proportion of correctly classified samples. The KC, calculated from the confusion matrix, is used to check consistency and evaluate classification precision; it considers not only the OA but also the imbalance in the number of samples in each category. Precision measures the accuracy of each class: it is the proportion of samples classified into a given class that actually belong to that class.
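The metrics above can be computed from a confusion matrix with a few lines of Python. The sketch assumes rows are true classes and columns are predicted classes, uses the standard Cohen's kappa definition (observed agreement p_o versus chance agreement p_e estimated from row and column totals), and works through a made-up 3-class matrix rather than data from this study.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n


def kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), p_e from row/column totals."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    p_o = overall_accuracy(cm)
    rows = [sum(cm[i]) for i in range(k)]
    cols = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def precision(cm):
    """Per-class precision: correct predictions of class j / all predictions of j."""
    k = len(cm)
    cols = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    return [cm[j][j] / cols[j] if cols[j] else 0.0 for j in range(k)]


# Toy 3-class confusion matrix: rows = true class, columns = predicted class.
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
print(overall_accuracy(cm))                   # 0.8
print(round(kappa(cm), 4))                    # 0.7
print([round(p, 3) for p in precision(cm)])  # [0.8, 0.778, 0.818]
```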

IV. EXPERIMENTS
In this section, three experiments are conducted to check the performance of the proposed TL-DeCNN. The first is few-shot HSRRS image scene classification based on VGG19, ResNet50, and InceptionV3. The second is limited-label HSRRS image scene classification based on TL-DeCNN, in which the knowledge learned by VGG19, ResNet50, and InceptionV3 on ImageNet2015 is transferred to the target limited labeled HSRRS image dataset for classification. The third is scene classification with a large amount of labeled HSRRS images based on VGG19, ResNet50, and InceptionV3.

A. Data Description
The HSRRS images of urban built-up areas are extracted from the UC Merced Land Use dataset [45] and the remote sensing image classification benchmark dataset [46]. Ten categories of objects are classified in our experiments, and the training and testing sample sizes for the few, TL-DeCNN-few, and large amount of labeled samples cases are shown in Table I. The testing sample size is the same in all cases: 100 samples per category. The training samples for the few and TL-DeCNN-few cases are randomly selected from the large set of labeled samples. The training data for TL-DeCNN-few contain not only the few HSRRS image samples but also the knowledge transferred from ImageNet2015; the method therefore combines prior knowledge with the target data for identification. Note that data augmentation has been applied to all of the labeled samples to enlarge the number of training samples, increase their diversity, and enhance the generalization of the trained model [16].
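The random selection of few-shot training samples from the larger labeled set can be sketched as stratified sampling without replacement. The image IDs, the three category names, the pool size of 266 per class, and the value of 20 shots per class are placeholders; the actual sample sizes are those given in Table I. A fixed seed makes the draw reproducible.

```python
import random


def sample_few_shot(pool, shots_per_class, seed=0):
    """Stratified random draw of `shots_per_class` items per category, no replacement."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in pool:
        by_class.setdefault(label, []).append(item)
    subset = []
    for label in sorted(by_class):
        for item in rng.sample(by_class[label], shots_per_class):
            subset.append((item, label))
    return subset


# Hypothetical labeled pool: 266 image IDs for each of three categories.
pool = [(f"{c}_{i:03d}", c) for c in ("airport", "bridge", "road") for i in range(266)]
few = sample_few_shot(pool, shots_per_class=20)
print(len(few))  # 60
```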

B. HSRRS Image Scene Classification With a Few Shot Samples
In this experiment, VGG19, ResNet50, and InceptionV3 are each applied to HSRRS image scene classification in the few-shot case.
1) VGG19: There are 16 convolutional layers, mainly using 3 × 3 convolutional kernels, and 3 fully connected layers. The combination of convolutional, BN, and ReLU layers constructs a convolutional block. A max-pooling layer is applied after every two or four convolutional blocks. The convolutional blocks are followed by dense layers, which are set to 4096, 4096, and 10 units in our experiment. Finally, softmax is applied for classification. The accuracy and loss in the training and testing stages are shown in Fig. 3(a). It can be seen that the training accuracy is close to 100%, whereas the testing accuracy is lower than 40%. Meanwhile, the loss is close to 0 in the training stage and fluctuates around 8 in the testing stage, which means that the VGG19 model overfits in HSRRS image scene classification with limited labeled samples.
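One plausible contributor to the overfitting observed for VGG19 is where its parameters sit. The sketch below counts the weights and biases of the 4096-4096-10 dense head described above and compares them with the 16 convolutional layers (about 20.0 million parameters in the standard VGG19 configuration). It assumes a 224 × 224 input, so that the last pooling layer outputs 7 × 7 × 512 features; the text does not state the input size, and a different size would change the first dense layer's count.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumption: 224 x 224 input, so the last VGG19 pooling layer gives 7 x 7 x 512.
flat = 7 * 7 * 512
head = dense_params(flat, 4096) + dense_params(4096, 4096) + dense_params(4096, 10)

# Parameters of the 16 standard VGG19 conv layers (3x3 kernels, with biases).
conv_base = 20_024_384

share = head / (head + conv_base)
print(head)             # 119586826
print(round(share, 3))  # 0.857
```

Under this assumption roughly 86% of the trainable parameters are in the dense head, which a few-shot training set must constrain on its own.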
2) ResNet50: As illustrated in Fig. 1, the limited labeled HSRRS images are input into the ResNet50 model. The accuracy and loss in the training and testing phases are shown in Fig. 3(b). It can be seen that the training accuracy is close to 100%, and the testing accuracy is about 75% after the training and testing processes stabilize. Meanwhile, the training loss is close to 0, and the testing loss is larger than 2 when the model is stable. Compared with VGG19, ResNet50 obtains better accuracy and loss, reducing overfitting to some extent. However, ResNet50 applied to HSRRS scene classification with few-shot samples still shows some overfitting.
3) InceptionV3: To further address overfitting, InceptionV3 is applied to the limited labeled HSRRS scene classification task. As described in Section III, the key idea of InceptionV3 is factorization, which promotes high-dimensional representations. The accuracy and loss during the training and testing stages are shown in Fig. 3(c). After stabilization, the accuracy is 100% in training and 83.0% in testing, and the loss is 0 in training and about 1.8 in testing. Compared with VGG19 and ResNet50, InceptionV3 is better at mitigating overfitting. However, the testing result is still much worse than the training result, so the InceptionV3 model still overfits with few-shot samples.

C. TL-DeCNN-Based HSRRS Image Scene Classification Method
The TL-DeCNN is proposed to solve the overfitting problem with limited HSRRS training images. As in the few-shot experiments, the TL-DeCNN experiment is carried out with limited labeled HSRRS images, now combined with knowledge transferred from ImageNet2015. Three typical DeCNN models, VGG19, ResNet50, and InceptionV3, are considered in this experiment.
1) TLVGG19: The architecture of HSRRS scene classification based on transfer learning and VGG19 (TLVGG19) is shown in Fig. 1. The knowledge learned by VGG19 on ImageNet2015 is transferred to the limited labeled HSRRS scene classification task. The accuracy and loss during task training and testing are shown in Fig. 4(a). When the process stabilizes, the training accuracy is 100% and the testing accuracy is 90.0%, while the training loss is 0 and the testing loss is close to 0.25. Compared with the model without transferred knowledge, TLVGG19 performs better in both accuracy and loss: the testing accuracy increases from about 40% to 90%, and the testing loss decreases from about 8 to 0.25. This demonstrates that the proposed approach can greatly reduce overfitting with limited labeled HSRRS images.
2) TLResNet50: For the few-shot HSRRS image scene classification task, the architecture of transfer learning based on ResNet50 (TLResNet50) is also shown in Fig. 1. Similar to TLVGG19, the architecture transfers the knowledge learned on ImageNet2015 to the target HSRRS scene classification task. The accuracy and loss during task training and testing are shown in Fig. 4(b). When the processes are stable, the training accuracy is 100% and the testing accuracy is about 96.0%, and the training and testing losses are 0 and 0.18, respectively. Compared with the model without transferred knowledge, the testing accuracy increases by about 21.2 percentage points, and the testing loss decreases by about 91%. This result indicates that TLResNet50 effectively mitigates overfitting with limited labeled HSRRS images.
3) TLInceptionV3: The architecture of transfer learning combined with InceptionV3 (TLInceptionV3) for limited labeled HSRRS images is also shown in Fig. 1. The accuracy and loss in the training and testing processes are shown in Fig. 4(c). After the process stabilizes, the testing accuracy and loss are about 92.4% and 0.26, respectively. Compared with InceptionV3 without transferred knowledge, the testing accuracy increases by 9.4 percentage points, and the testing loss decreases from 1.8 to 0.26, which indicates that the proposed approach is effective in solving the overfitting problem with limited labeled HSRRS images.
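Because the gains above are reported in percentage points while the loss reductions are relative to the starting value, a small worked example helps keep the two apart. The values are the InceptionV3 and TLInceptionV3 testing results reported above; the helper names are illustrative.

```python
def point_change(before, after):
    """Absolute change, in percentage points when inputs are percentages."""
    return after - before


def relative_change(before, after):
    """Change as a fraction of the starting value."""
    return (after - before) / before


# InceptionV3 (few-shot) vs. TLInceptionV3 testing results reported above.
acc_before, acc_after = 83.0, 92.4   # testing accuracy, %
loss_before, loss_after = 1.8, 0.26  # testing loss

pp = point_change(acc_before, acc_after)
rel_acc = relative_change(acc_before, acc_after)
rel_loss = relative_change(loss_before, loss_after)
print(round(pp, 1))               # 9.4 percentage points
print(round(100 * rel_acc, 1))    # 11.3 (% relative gain in accuracy)
print(round(-100 * rel_loss, 1))  # 85.6 (% relative drop in loss)
```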

D. HSRRS Image Scene Classification in a Large Number of Labeled Samples
From the experiments in Sections IV-B and IV-C, it has been found that the TL-DeCNN architectures, including TLVGG19, TLResNet50, and TLInceptionV3, are efficient and effective in solving the overfitting problem. However, it remains to be examined whether the accuracy and loss of TL-DeCNN are comparable to those of DeCNN trained with a large number of labeled samples. This experiment is therefore carried out with augmented HSRRS images using VGG19, ResNet50, and InceptionV3.
1) VGG19: As described in Section IV-A, there are at least 1064 training samples in each category in the large labeled data experiment (the smallest category has 266 samples, and geometric transformations have been applied for data augmentation). The accuracy and loss in training and testing are shown in Fig. 5(a). The testing accuracy is about 90% and the testing loss is about 0.38, which is similar to the results of TLVGG19. This indicates that TLVGG19 with few-shot samples can obtain results similar to those of VGG19 trained with a large number of labeled samples, while reducing the effect of overfitting.
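The per-category count above is consistent with expanding each image into four geometric variants (266 × 4 = 1064). The text does not list the exact transformations, so the sketch below assumes rotations by 0, 90, 180, and 270 degrees, applied to a tiny 2 × 2 array standing in for an image; any four-variant scheme would give the same count.

```python
def rot90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Assumed scheme: original plus 90, 180, and 270 degree rotations."""
    variants = [img]
    for _ in range(3):
        variants.append(rot90(variants[-1]))
    return variants


img = [[1, 2],
       [3, 4]]  # tiny stand-in for an image
variants = augment(img)
print(variants[1])          # [[3, 1], [4, 2]]
print(266 * len(variants))  # 1064 samples for the smallest category
```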
2) ResNet50: ResNet50 is suitable for scene classification with a large number of labeled samples. The accuracy and loss in training and testing are shown in Fig. 5(b). After about ten epochs, the testing accuracy and loss are stable: the testing accuracy is close to 98% and the testing loss is close to 0. Compared with TLResNet50 with few-shot samples, ResNet50 with a large number of labeled samples performs better on the HSRRS scene classification task. This shows that, although transfer learning contributes to the classification task, TLResNet50 with limited labeled samples is still inferior to ResNet50 trained with a large amount of labeled samples.
3) InceptionV3: InceptionV3 is a typical DeCNN for extracting deep features, and it performs well when a large number of labeled samples is available. The accuracy and loss of InceptionV3 with a large amount of labeled HSRRS images in training and testing are shown in Fig. 5(c). After about 15 epochs, the testing accuracy and loss stabilize around 99.3% and 0.1, respectively, which is better than the results of TLInceptionV3.

V. RESULTS AND DISCUSSIONS
First, we present the confusion matrix of each DeCNN classifier. Fig. 6 … respectively. Fig. 8(a) shows the OA and KC of DeCNN and TL-DeCNN with fine-tuning. The figure shows that the transferred knowledge improves the OA and KC of the TL-DeCNN classification models. Transfer learning improves the OA of VGG19 most obviously (an increase of 53.1%) and has the least effect on the OA of InceptionV3 (an increase of 5.4%). Meanwhile, for few-shot learning, InceptionV3 obtains the best OA and KC, whereas after adding the transferred knowledge, TLResNet50 achieves the best OA and KC. The finding that InceptionV3 performs best with few-shot samples complements the argument in [47] that InceptionV3 outperforms VGG19 and ResNet50 with abundant labeled data. Fig. 8(b) shows the corresponding OA and KC of the three TL-DeCNN models without fine-tuning, which may indicate that fine-tuning is a key step for ensuring positive transfer.
Then, the precision of each category obtained by VGG19, ResNet50, InceptionV3, TLVGG19, TLResNet50, and TLInceptionV3 is listed in Table II. VGG19 obtains its lowest precision, 9.0%, for "road" identification, whereas ResNet50 and InceptionV3 obtain 99.0% and 91.0% precision, respectively, for the same class.
The same phenomenon appears in the "roadside tree" and "marina" classes, which may indicate that ResNet50 and InceptionV3 perform better at identifying these objects. When the transferred knowledge is considered, the precision of each category obtained by TLVGG19 is greatly improved, and the category with the greatest growth is "avenue," from 24.0% to 99.0%. Compared with VGG19, ResNet50 obtains better precision for all categories. Its lowest precision is 46.0% for "bridge," which increases to 96.0% after knowledge is transferred into the model. As with VGG19, the precision of all categories improves when the transferred knowledge is considered. For InceptionV3, the lowest precision is 41% for "bridge" identification, which increases to 83% after knowledge is transferred into the architecture. Most precisions are improved, but the precision of the "airport" category decreases from 98% to 89%. This may be because the knowledge transferred from ImageNet2015 contains intricate airport information whose features are not similar to those of the "airport" scenes in our task. In short, the transferred knowledge improves the precision of most categories in DeCNN scene classification tasks.
Finally, to evaluate the performance gap between TL-DeCNN with limited labeled samples and DeCNN with a large amount of labeled HSRRS images, VGG19, ResNet50, and InceptionV3 are each applied to HSRRS scene classification with a large amount of labeled samples. The OAs are 96.1%, 97.1%, and 99.4%, and the KCs are 0.956, 0.968, and 0.993 for VGG19, ResNet50, and InceptionV3, respectively. Both the OA and KC are clearly higher than those obtained by TL-DeCNN; InceptionV3 obtains the best result with a large number of labeled samples, whereas TLResNet50 is the best architecture for few-shot samples.
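Because every category has 100 test samples (Section IV-A), the chance agreement in the kappa coefficient is fixed: each row total is N/C, so p_e = Σ_j (N/C)·col_j / N² = 1/C, and KC = (OA − 1/C)/(1 − 1/C). The sketch below applies this identity with C = 10 to the reported OAs; the small differences from the reported KCs are consistent with the OAs being rounded to 0.1%.

```python
def kappa_balanced(oa, num_classes=10):
    """Kappa when every true class has the same number of test samples.

    Each row total is N / C, so p_e = sum_j (N / C) * col_j / N**2 = 1 / C,
    whatever the column (prediction) totals are.
    """
    p_e = 1.0 / num_classes
    return (oa - p_e) / (1.0 - p_e)


# Reported OA and KC with a large amount of labeled samples.
reported = {"VGG19": (0.961, 0.956), "ResNet50": (0.971, 0.968),
            "InceptionV3": (0.994, 0.993)}
for name, (oa, kc) in reported.items():
    print(name, round(kappa_balanced(oa), 3), kc)
```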

VI. CONCLUSION
In this article, three TL-DeCNN models, i.e., TLVGG19, TLResNet50, and TLInceptionV3, are proposed for HSRRS scene classification in urban built-up areas. The main contribution of our work is to address the overfitting and vanishing gradient problems with limited labeled HSRRS images. Three experiments have been carried out: DeCNN-based HSRRS scene classification with few-shot samples, TL-DeCNN-based scene classification with the same few-shot samples, and DeCNN-based HSRRS scene classification with a large number of labeled samples. Experimental results show that, for few-shot HSRRS scene classification, all three architectures, TLVGG19, TLResNet50, and TLInceptionV3, greatly improve performance compared with their counterparts without transferred knowledge. ResNet50 is more suitable for transfer learning applications than VGG19 and InceptionV3, whereas InceptionV3 reduces the overfitting and vanishing gradient problems to a certain degree and performs better with few-shot samples. Meanwhile, DeCNN-based HSRRS scene classification with a large amount of labeled HSRRS images performs better than TL-DeCNN with few-shot samples, which indicates that there is still room to improve the classification performance of TL-DeCNN with few-shot samples.