Geometry-Preserving Perceptual Feature Selection for Categorizing LR Aerial Photographs

Plenty of high- and low-altitude earth observation satellites asynchronously capture massive-scale aerial photographs every day. In practice, high-altitude satellites take low-resolution (LR) aerial pictures, each covering a considerably large area. Comparatively, low-altitude satellites capture high-resolution (HR) aerial photos, each depicting a relatively small area. Accurately mining the semantic clues of LR aerial images is a significant task in pattern recognition. However, it is also a challenging task due to: 1) the inefficiency of labeling adequate training samples, and 2) the difficulty of describing how humans perceive the world. To handle these problems, this work presents a geometry-preserving perceptual feature selector (GPFS) that optimally preserves samples' geometry, aiming at sufficiently discriminative perception-based representations for classifying LR aerial photos. Particularly, by simulating how humans sequentially perceive different salient regions, we design a low-rank algorithm to divide an LR aerial image into a succinct set of attractive regions as well as a rich set of non-attractive regions. The algorithm is able to: 1) generate a path that well captures human gaze allocation, and 2) engineer the deeply-learned visual descriptor for the above gaze shifting path (GSP). Subsequently, the GPFS is designed to obtain a set of high quality GSP features. GPFS is built upon a semi-supervised framework, wherein only a small proportion of LR aerial images is required to be labeled. Besides, the labeled/unlabeled sample distribution is optimally preserved during feature selection (FS). Employing these refined features, we learn to categorize LR aerial photos. Plenty of empirical results show the superiority of the proposed algorithm.


I. INTRODUCTION
Due to significant progress in space science, engineering, and remote communication, a considerable number of earth-observation satellites have been launched recently. Generally, these satellites can be conveniently categorized into two types: high-altitude and low-altitude satellites. In practice, a high-altitude satellite covers a remarkably larger region than a low-altitude one. Accordingly, accurately discovering the semantics of low-resolution (LR) aerial photographs is becoming a useful component in many artificial intelligence systems.
In the literature, dozens of visual categorization/annotation algorithms have been proposed for describing aerial imageries with various resolutions. Well-known models can be categorized as: 1) MIL (multiple instance learning)/CNN-guided region localization leveraging weak supervision [42], [43]; 2) semantically-aware graph models for parsing [5], [6]; and 3) elaborately-designed hierarchical models for annotating aerial photos [7], [8], [9]. However, to our best knowledge, current techniques fail to accurately represent aerial images with low resolutions because of the following factors:

• Practically, we notice that there exist many attractive objects or object parts in an aerial image, as exemplified in Fig. 1. To discover the semantic labels of each LR aerial image, a biologically-inspired algorithm is required to simulate how humans perceive the visually prominent regions. Building a deep learning algorithm that jointly obtains the visually prominent regions and refines the visual representation of those regions is difficult. Some possible challenges are: i) computing the path along which human beings sequentially allocate their gazes onto the attractive image patches (such as the GSPs presented in Fig. 1), ii) avoiding the inherent noisy labels in the massive-scale training samples, and iii) semantically encoding image-level labels into the various image patches of each LR aerial photograph;

• Different from aerial images with high resolutions, LR aerial images in practice may have a relatively low visual quality, because LR aerial photos are easily influenced by external factors such as uncontrollable weather. This leaves us with a few annotated low-resolution (LR) aerial images combined with a large number of annotated high-resolution (HR) counterparts. Herein, the objective is to build a feature selection algorithm that is trained using partially-annotated LR aerial images. Obviously, this is not easy. The difficulties include exploiting the inherent relationships between LR and HR aerial images in some high-order manifold space.

Herein, a new GPFS is proposed, which utilizes the visual perceptual knowledge deeply learned from HR aerial images to improve the categorization of LR aerial images. An overview of the pipeline is presented in Fig. 2. We leverage massive-scale HR and LR aerial images, part of which are unlabeled. The aerial image regions are mapped to the feature space. Then, to simulate how humans understand different aerial photos, a low-rank algorithm is formulated to decompose an aerial image into a rich set of attractive patches as well as the unattractive background patches; simultaneously, the deep GSP features are computed. To obtain a subset of high quality features across aerial images with high/low resolutions, we propose the GPFS to select discriminative features. The novelties of this work are: a) a low-rank algorithm that generates the GSP from an LR aerial image and calculates the hierarchical visual representation jointly; and b) the GPFS that selects high quality features across HR and LR aerial photos, so that the sample distribution can be nicely preserved.

II. PREVIOUS WORK RELATED TO OURS
Many computational visual models have been proposed for analyzing aerial photos. To semantically model the entire image, the authors of [44] proposed a topology-based visual representation for describing binary region-wise linkages in different aerial photographs; thereby, a kernel-guided feature representation is computed to globally capture each aerial image for the subsequent recognition. The authors of [46] presented a weakly-labeled training framework which annotates HR aerial photo semantics at the image level. The authors of [47] carefully combine the well-known random forest and an object-guided visual representation learner to classify remote sensing images. Sameen et al. [48] designed a multi-layer visual model for calculating multi-labels for different HR downtown aerial images. In [45], researchers deployed a pre-specified five-layer CNN for classifying high-definition remote sensing images; they proposed a novel domain-level strategy to carefully adjust the aforementioned deep model. In [31], the authors designed a multi-modal learning algorithm to simultaneously annotate HR aerial imagery. The authors of [10] designed a novel inter-attentional algorithm to calculate the weights of aerial photos' representations. In conclusion, the above image-level visual models are practically utilized for classifying multi-resolution aerial images. They cannot optimally handle LR aerial image modeling because of the unavoidably blurred tiny but discriminative objects. To precisely capture discriminative objects at multiple scales, we require an effective region-level modeling technique, so that those tiny/small objects inside each LR aerial photo can be precisely localized. In [55], the authors designed a group sparsity regularizer for robustly recognizing human faces; they proposed an upper-bounded function to upgrade the l1-norm for seeking sparsity, which optimally tackles the negative influences of bias and outliers. Further, in [56], the authors formulated incomplete multi-view clustering as an incomplete similarity graph updating and complete tensor representation learning task.
To characterize an aerial image regionally, researchers [4] designed a multi-layer deep learner for detecting attractive objects of different scales. In [1], researchers formulated a focal-loss-based deep model to accurately localize various cars within LR and HR aerial photographs. In [49], the authors designed a geographic object detection model to handle HR aerial images by intelligently extracting intersections as well as roads. In [13], the authors proposed to combine feature engineering and soft-label calculation to form an effective visual detector for modeling aerial images. Importantly, compared to the aforementioned techniques, our aerial image recognition method is biologically inspired and well reflects the human visual perceptual process. In summary, the above region-level image models well exploit representative regions of multiple sizes from each LR aerial photo. However, they still have the following shortcomings: 1) these methods are usually designed for a specific image set, wherein some pre-specified domain knowledge is incorporated; thereby, it might be difficult to adapt them to an unknown image set; 2) ideally, we want a perception-guided region-level image model, where visually/semantically salient regions are discovered for LR aerial photo representation, but the above models cannot explicitly discover these salient regions; and 3) the aforementioned models cannot select high quality features in a principled way; meanwhile, the geometry structure among samples is not explicitly encoded during feature engineering.

III. THE APPROACH
A. DEEP LOW-RANK FEATURE ENGINEERING
Practically, we notice the rich number of tiny/large-scale objects within an LR aerial photo. Psychological research [2] found that humans typically fixate on multiple salient regions when perceiving the world. In our scenario, for each LR aerial image, the human eye typically fixates on the salient ground regions, while the non-salient aerial image parts generally remain unnoticed. This observation on human gaze allocation is informative for recognizing LR aerial images. In our work, a deep low-rank method is designed to calculate those attractive image patches, from which the GSPs are built. Accordingly, we compute the corresponding deep GSP features.
The process of human visual understanding reflects that the non-salient background regions in an LR aerial image are closely related, whereas the attractive foreground regions are generally non-correlated. Based on this, we divide the aerial image's feature matrix Y ∈ R^{U×M} into a few salient regions coupled with the non-salient ones, that is,

Y = X + E, (1)

where M denotes the number of regions in each LR aerial image and U represents the feature dimension. X ∈ R^{U×M} contains the columns of the non-salient aerial image regions, while E ∈ R^{U×M} contains the columns of the salient aerial image regions.
To obtain a specific solution of (1), we impose two constraints on X and E respectively, based on two observations. On one hand, only a few aerial image regions are carefully perceived by the human cognitive module; mathematically, this means that matrix E is sparse. On the other hand, the non-salient regions are highly correlated, which means that matrix X is low-rank. In this way, the salient regions are calculated by incorporating a sparse and a low-rank constraint, that is,

min_{X, E, ϒ} ||X||_* + α·l_1(E) + β·l_2(X, f(ϒ, Y)) + η·Ω(ϒ), s.t. Y = X + E, (2)

where ||·||_* denotes the matrix nuclear norm, l_1(·) measures the sparsity of a matrix, f(ϒ, Y) denotes the predicted non-salient regions within an LR aerial image, and l_2(X, f(ϒ, Y)) calculates the cost of selecting the non-salient aerial image regions. Ω(·) denotes a pre-defined regularizer, and α, β, and η are non-negative parameters balancing the trade-off among the corresponding terms. In this work, l_1(·) is defined as in (3) so as to maximally enforce the sparsity of E. Practically, Y's entries are all nonnegative. Herein, we set l_2(u, v) = (u − v)^2/2, and (2) is accordingly updated into (4).

To identify the unattractive regions within an LR aerial image, we formulate a multi-layer semantic architecture f(ϒ, Y). Specifically, it is comprised of L deep layers performing pre-defined transformations. We denote the deep feature of the top layer as h(Y_i), where Y_i represents the column-level representation of an aerial image patch. The output of each layer is fed into the subsequent layer. Formally, this can be represented as

g_l(Y_i) = φ(Z_l·g_{l−1}(Y_i) + ξ_l), (5)

where φ(·) represents the activation function, g_l(·) denotes the l-th layer's output, and Z_l and ξ_l are respectively the transformation matrix and the bias of the l-th layer. The first layer's input is Y_i, i.e., g_1(Y_i) = φ(Z_1·Y_i + ξ_1), and thus the top-layer feature is h(Y_i) = g_L(Y_i).

The deeply-calculated feature h(Y_i) should be discriminative for identifying the unattractive aerial image regions. In practice, a linear transformation matrix is leveraged to refine the feature selection (Eq. (8)), where the parameter set ϒ collects the transformation matrices and biases of all layers. To alleviate overfitting, a regularization term Ω(ϒ) is introduced to control the complexity of our deep model (Eq. (9)). On the basis of the definitions in (3), (8), and (9), we upgrade the objective function (4) into (10).

The optimization of (10) is non-convex. In our implementation, we leverage the carefully-designed optimization in [3] for the solution. Afterward, denoting X* as the solution and E* = Y − X*, each image region's saliency can be quantized by the energy ||E*(:, i)||_2 of the corresponding column E*(:, i) of E* (Eq. (11)). For each LR aerial image, the top salient regions are sequentially connected to form the so-called GSP. Simultaneously, we obtain the deep GSP feature by row-wise concatenating the deep features calculated from its internal aerial image regions.

The solution of our low-rank algorithm is detailed in [3]. The time cost is largely determined by the number of iterations needed to meet the convergence criterion. Due to the non-convexity of (10), we cannot strictly derive the time complexity of the solution algorithm. Practically, we notice that the solution converges within 150 to 200 iterations, and the time cost is typically between 130 and 178 seconds. Noticeably, this iterative solution is conducted only during training; at testing time, based on the parameters learned during training, the low-rank algorithm runs very fast.
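For concreteness, the following is a minimal NumPy sketch of the low-rank-plus-sparse idea behind (1), (2), and (11). It uses a plain robust-PCA-style alternation (singular-value thresholding for the low-rank part, soft thresholding for the sparse part) instead of the deep sub-network f(ϒ, Y) and the solver of [3]; the weights lam and mu and the choice of five top regions are illustrative defaults, not the paper's settings.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entry-wise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_sparse_split(Y, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Split Y (features x regions) into X (low-rank, non-salient) + E (sparse, salient)
    with a basic inexact-ALM robust PCA loop."""
    U_dim, M_dim = Y.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(U_dim, M_dim))
    mu = mu if mu is not None else 0.25 * U_dim * M_dim / (np.abs(Y).sum() + 1e-12)
    X = np.zeros_like(Y); E = np.zeros_like(Y); Lag = np.zeros_like(Y)  # Lag: multipliers
    for _ in range(n_iter):
        X = svt(Y - E + Lag / mu, 1.0 / mu)      # low-rank (non-salient) part
        E = soft(Y - X + Lag / mu, lam / mu)     # sparse (salient) part
        R = Y - X - E
        Lag += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(Y):
            break
    return X, E

# Region saliency as the column-wise energy of the sparse part, cf. (11).
Y = np.random.rand(128, 40)              # toy feature matrix: 128-D features, 40 regions
X, E = lowrank_sparse_split(Y)
saliency = np.linalg.norm(E, axis=0)     # one score per region
gsp_order = np.argsort(-saliency)[:5]    # top regions, connected sequentially into a GSP
```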

B. GEOMETRY-PRESERVING FEATURE SELECTION (GPFS)
Herein, it is natural to hypothesize that all the LR aerial photos are unlabeled, while all the HR aerial images are labeled. We represent X = [x_1, · · · , x_N] ∈ R^{D×N} as the matrix of features, that is, each column is a D-dimensional feature vector of either an LR or HR aerial image at the training stage, and N represents the number of training samples. Meanwhile, we represent L = [l_1, · · · , l_M, l_{M+1}, · · · , l_N]^T ∈ {0, 1}^{N×C} as the matrix containing the labels of the entire multi-resolution training set, where C denotes the number of semantic classes. In this context, l_{uv} represents the v-th label of l_u (1 ≤ v ≤ C): l_{uv} = 1 if the u-th aerial image belongs to the v-th category, and l_{uv} = 0 otherwise. Simultaneously, when the u-th aerial image is unlabeled, l_u is set as an all-zero vector.
We denote Q ∈ R^{D×C} as the mapping matrix of our feature selector. A standard FS is then formulated by optimizing a regularized empirical error of the form L(X^T Q, L) + R(Q) (Eq. (12)), where L is a loss function and R represents the regularizer.
As shown in Fig. 3, the similarity graph is denoted by E, wherein each element E_ij measures the difference between h_i and h_j. In our work, we set E_ij = 1 if h_i and h_j are neighbors of each other, and E_ij = 0 otherwise. Also, matrix F is defined as a diagonal one, wherein each diagonal element is computed as F_ii = Σ_j E_ij. Thereafter, we calculate the corresponding Laplacian matrix as T = F − E.
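A minimal sketch of how the similarity graph E, the diagonal degree matrix F, and the Laplacian T = F − E can be assembled is given below; the number of neighbors k is an assumption, since the text does not specify how neighbors are determined.

```python
import numpy as np

def graph_laplacian(H, k=5):
    """Build the binary kNN similarity graph E, diagonal degree matrix F,
    and Laplacian T = F - E from row-wise sample features H (N x D)."""
    N = H.shape[0]
    sq = np.sum(H ** 2, axis=1, keepdims=True)
    D2 = sq + sq.T - 2.0 * H @ H.T            # pairwise squared Euclidean distances
    E = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(D2[i])[1:k + 1]       # k nearest neighbors, skipping the sample itself
        E[i, nn] = 1.0
    E = np.maximum(E, E.T)                    # symmetrize the neighborhood relation
    F = np.diag(E.sum(axis=1))                # F_ii = sum_j E_ij
    T = F - E
    return E, F, T
```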
In order to successfully exploit all the training aerial images, a prediction label matrix P ∈ R^{N×C} is defined over the entire training data based on transductive learning theory. Herein, p_i ∈ R^C represents the calculated label of sample x_i. Moreover, we make P best fit both the ground-truth labels and the aforementioned affinity graph. Formally, we calculate P through the following optimization task:

arg min_P tr(P^T T P) + tr((P − L)^T V (P − L)), (13)

where matrix V is diagonal, whose u-th diagonal element is a large constant if the u-th sample is labeled and 1 otherwise. Noticeably, V enforces the predicted category label matrix P to be maximally consistent with the ground-truth one L.
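Setting the derivative of (13) w.r.t. P to zero gives 2TP + 2V(P − L) = 0, i.e., the unconstrained minimizer P = (T + V)^{-1} V L. The sketch below illustrates this closed form; the large constant used for the labeled samples' diagonal entries of V is an assumption.

```python
import numpy as np

def propagate_labels(T, L, labeled_mask, big=1e6):
    """Minimize tr(P'TP) + tr((P-L)'V(P-L)) in closed form: P = (T+V)^{-1} V L.
    V is diagonal with a large constant (assumed) for labeled samples and 1 otherwise,
    so P stays close to the ground truth where labels exist and is smoothed by the
    graph Laplacian T elsewhere."""
    v = np.where(labeled_mask, big, 1.0)       # diagonal of V
    V = np.diag(v)
    P = np.linalg.solve(T + V, V @ L)
    return P
```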
Based on (13), the graph-Laplacian semi-supervised FS can be mathematically represented as

arg min_{P, Q} tr(P^T T P) + tr((P − L)^T V (P − L)) + σ||X^T Q − P||_F^2 + τ R(Q), (15)

where ||X^T Q − P||_F quantifies the prediction cost, R(Q) is a regularization term that penalizes Q toward the optimal feature selector, and σ ∈ [0, 1] and τ ∈ [0, 1] denote the importance of the corresponding terms, respectively.
Because of the desired sparsity as well as the non-convexity constraint, we apply the l_{2,p}-norm (p ∈ (0, 1]) to the regularization term R(Q) of our GPFS. Therefore, the regularization term is reformulated as

R(Q) = ||Q||_{2,p}^p = Σ_{d=1}^{D} ||q^d||_2^p,

where q^d denotes the d-th row of Q, and p is fixed to 1/2. Details of the above optimization are presented in the following. We first set the derivative of (15) w.r.t. P to zero, which yields the closed-form expression of P in (16). After some derivations, we can reorganize the objective function (15) as (17). Setting the derivative of (17) to zero then gives the update rule (18). Based on the above derivations, the solution of (17) is briefed in Algorithm 1.
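Since Algorithm 1 is not reproduced in this text, the following sketch shows the standard iteratively reweighted least squares treatment of an l_{2,p}-regularized selector of the form min_Q ||X^T Q − P||_F^2 + τ||Q||_{2,p}^p with p = 1/2. It omits the joint update of P and any additional terms of (17), so it illustrates the behavior of the regularizer rather than the paper's exact solver.

```python
import numpy as np

def l2p_feature_selector(X, P, tau=0.1, p=0.5, n_iter=50, eps=1e-8):
    """Iteratively reweighted least squares for
        min_Q ||X^T Q - P||_F^2 + tau * ||Q||_{2,p}^p,
    where X is D x N (features x samples) and P is N x C (soft labels).
    Rows of Q with small l2 norm are driven toward zero; the surviving
    rows index the selected features."""
    D = X.shape[0]
    W = np.eye(D)                                  # reweighting matrix
    for _ in range(n_iter):
        # closed-form Q for the current weights: (X X^T + tau W) Q = X P
        Q = np.linalg.solve(X @ X.T + tau * W, X @ P)
        row_norms = np.linalg.norm(Q, axis=1) + eps
        W = np.diag(p / (2.0 * row_norms ** (2.0 - p)))
    scores = np.linalg.norm(Q, axis=1)
    return Q, np.argsort(-scores)                  # feature ranking, most informative first
```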

C. KERNELIZED REPRESENTATION LEARNING
We observe that our refined deep GSP representations practically lie in a high-order kernelized feature space. In our work, we formulate a kernel-based quantization technique to learn the visual representation of each LR aerial image. Specifically, given an LR aerial photo, we first collect the internal image patches to build the GSP, which is subsequently transformed into the deep visual representation for FS. Next, we accumulate the selected features corresponding to the i-th sample (LR aerial photo) to form its kernelized vector v_i, each entry of which is computed based on d_J(·, ·), the Euclidean distance between pairwise refined deep representations.
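The exact quantization formula is not reproduced in this text; as one plausible instantiation, the sketch below maps a refined GSP feature to a kernel vector via a Gaussian function of the Euclidean distance d_J to a set of anchor samples, where both the Gaussian form and the bandwidth sigma are assumptions.

```python
import numpy as np

def kernelized_vector(h_i, anchors, sigma=1.0):
    """Map a selected GSP feature h_i (D,) into a kernel-quantized vector v_i whose
    j-th entry reflects similarity to the j-th anchor sample (anchors: J x D).
    The Gaussian form and sigma are assumptions; the text only states that d_J is
    the Euclidean distance between pairwise refined deep representations."""
    d = np.linalg.norm(anchors - h_i[None, :], axis=1)   # d_J(h_i, h_j) for all anchors j
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```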

IV. EMPIRICAL EVALUATION
Herein, we evaluate the performance of the proposed LR aerial image classification framework using three experiments. Our collected aerial image set is introduced first. It contains >3.7M LR and HR aerial images crawled from 100 well-known metropolises in different countries.
Based on this, our algorithm is compared with 17 carefully-designed visual recognition algorithms from multiple views: accuracy and stability. Afterward, we explain the performance advantage of our classification model using an ablation study. Lastly, we report the categorization accuracy of our method under different parameters; based on this, the optimal parameter settings are suggested.

A. THE LR&HR AERIAL PHOTO SET
Having compiled our Internet-scale LR and HR aerial images (with the statistics reported in Fig. 4), it is essential to annotate them at the semantic level. Herein, 82 volunteers manually label 14.7% of the LR aerial images for each large city in our data set. In this way, 47 different image-level labels are used in total. Thereafter, a multi-class classifier such as an SVM is learned, which is subsequently deployed to predict the image-level labels of the remaining samples.
Next, the aforementioned volunteers carefully check the labels predicted by the learned classifier. We notice that many image-level labels are associated with an ultra-small set of aerial images, so it is challenging to build an effective categorization algorithm for such labels. In this work, when the number of LR aerial photos corresponding to an image-level label is fewer than 220,000, that label is removed. Based on this, 18 types of labels are finally selected. Afterward, 99.973% of the aerial images with different resolutions carry fewer than four labels, while the remaining aerial images carry more image-level labels. The above aerial images typically contain many tiny patches (< 210 × 210) which might be noisy.
In our approach, these aerial images are simply removed. As the last step, all the above LR&HR aerial images are ordered.
Given an image-level label, we use the first half of the samples for training and the rest for testing.
In retrospect, one key advantage of our method is to robustly learn a categorization model from low quality labels. Some examples are shown in Fig. 5. To acquire the noisy labels for experimentation, for each category we randomly use 60% of the aerial images to construct a training set. Based on this, we learn a multi-label categorization model, which is further leveraged to predict the labels of the entire LR&HR aerial photo set. In total, 11.3% of the LR&HR aerial photos are mislabeled; they are combined with the correctly labeled ones to constitute our data set.
We observe that in our aerial image set, each sample typically takes up 200MB of storage space. Therefore, our 2.3 million LR&HR aerial photos require a total of 460TB of storage. To optimally store such million-scale LR aerial photos with a fast I/O interface, we employ the Supermicro server solutions. More specifically, we adopt the 4U double-sided super storage platform. The platform is installed with 36 Toshiba HDD drives, each of which has a 20TB capacity; in total, the storage space of our platform is 720TB and it works in RAID 0 mode. Based on this, the average sequential data reading and writing speeds are respectively 1467MB/s and 862MB/s on our storage platform. That means, on average, it takes 0.137s and 0.232s to load and update each LR&HR aerial photo, respectively.
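The storage and I/O figures above can be checked with quick back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope check of the storage and I/O figures quoted above.
n_images = 2_300_000
mb_per_image = 200
total_tb = n_images * mb_per_image / 1_000_000      # 460.0 TB in total

read_mb_s, write_mb_s = 1467, 862                   # measured sequential throughput
load_s = mb_per_image / read_mb_s                   # ~0.136 s to read one photo
update_s = mb_per_image / write_mb_s                # ~0.232 s to write one photo
print(f"{total_tb:.0f} TB, load {load_s:.3f} s, update {update_s:.3f} s")
```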
In our implementation, we first pre-train our deep model on the well-known ImageNet. Then we fine-tune the pre-trained deep model with batch size 64 in an end-to-end mode on our compiled aerial image set. We evaluate the single-image inference latency and analyze the latency-accuracy trade-off. Herein, the fine-tuning is implemented using TensorFlow Hub.
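As a rough illustration of this setup, the sketch below fine-tunes an ImageNet-pretrained backbone from TensorFlow Hub with batch size 64; the hub handle, input resolution, optimizer, and the train_ds dataset are illustrative assumptions rather than the exact configuration used in the paper.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 18          # image-level labels kept after filtering (Sec. IV-A)
IMAGE_SIZE = (224, 224)   # illustrative input resolution

# ImageNet-pretrained feature extractor from TF Hub; the handle is an example choice.
backbone = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5",
    trainable=True)  # trainable=True enables end-to-end fine-tuning

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),  # multi-label output
])
model.build([None, IMAGE_SIZE[0], IMAGE_SIZE[1], 3])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# train_ds is assumed to be a tf.data.Dataset of (image, multi-hot label) pairs:
# model.fit(train_ds.batch(64), epochs=10)
```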

1) COMPARATIVE ACCURACIES
In this experiment, we test our categorization framework by comparing it with a rich set of baseline recognition algorithms. We first compare our algorithm with deep aerial image classification models, and thereafter we employ multiple recent deep generic visual categorization algorithms for comparison.
First of all, we compare our method with seven deep visual classification algorithms [17], [18], [19], [20], [21], [22], [23] that optimally encode the domain experiences of multiple categories of aerial photos. Herein, the codes of [17], [18], [21], and [22] are publicly available; we conduct the comparative study with their inherent settings unchanged. Since the codes of [19], [20], and [23] are not provided, we re-implemented these classification algorithms ourselves and tried to make their performances similar to those reported in the original publications.
Meanwhile, our algorithm is also compared with multiple generic recognition models. Moreover, since LR aerial image classification can be considered a sub-problem of scene categorization, we further conduct a comparative study between our method and three recently published scene classification models [16], [30], [32]. For the self-implemented recognition algorithms, the empirical settings are summarized as follows. For [19], we utilize ResDep-128 [24] as the backbone, which is further updated into a multi-label variant: apart from the fully-connected layer (whose number of units is fixed to 19), the remaining deep layers follow the above ResDep-128 [40]. We deploy ResNet-108 [24] as the backbone; the learning rate and the decay are respectively fixed to 0.001 and 0.05, and the loss of the entire network is the mean squared error. For [16], the well-known object bank [38] is adopted based on the carefully selected 18 LR aerial image classes. Herein, we use the average-pooling scheme, utilize liblinear as the solver for the linear classifier, and apply 10-fold cross-validation.
For the aforementioned 18 baseline visual recognition algorithms, we test each algorithm multiple times and present the average accuracies in Table 1, together with the corresponding standard errors. It is noticeable that our per-class standard errors are much smaller than those of the competitors, which shows the high stability of our algorithm. Overall, we make the following observations:

• As shown in Table 1, the proposed method performs significantly better than the other aerial image classification algorithms for the following reasons. First, the compared methods typically characterize low/medium resolution aerial photos. To accelerate deep model learning, they generally resize each photo to a significantly smaller one (e.g., 224 × 224) for the subsequent deep modeling. Such an operation is harmful to learning an effective LR aerial photo categorization model, since the small but representative parts might be lost. Second, except for the proposed pipeline, none of the competitors can implicitly correct the noisy image-level labels, which inevitably hurts the categorization model training. Third, only our method uses graphlets to explicitly capture the complicated spatial layouts of each LR aerial photo; they are further incorporated by a deep hashing algorithm for calculating the discriminative image kernel. Comparatively, the seven counterparts only globally/locally describe an LR aerial image, and the discriminative spatial features are neglected.
• Moreover, our method is overwhelmingly competitive against the seven generic object recognizers, which perform less effectively than ours for three reasons. First, those approaches typically handle mid-sized photos usually containing under ten million pixels; they cannot detect the tiny but discriminative parts among the hundreds of object components inside an aerial photo with over 100 million pixels. This situation becomes even worse when the image-level labels are contaminated. Second, the proposed algorithm can conveniently integrate domain experience of the LR aerial photo set, e.g., the maximum graphlet size and the category-specific object patches. Contrastively, the seven generic object recognition models cannot encode the domain knowledge of LR aerial photos. Third, by leveraging our noise-tolerant hashing algorithm, only our method allows a fast and accurate comparison of many discriminative object parts between LR aerial photos. The seven generic object recognition models simply convert each LR aerial photo into a long feature vector for deep classification; they cannot achieve such precise region-to-region comparison.
In this experiment, we also compare our designed GPFS with a set of feature selectors for aerial photo classification: information-theoretic feature selection (ITFS) [50], CNN feature reduction (CNNFR) [51], feature selection for land cover classification (FSLC) [52], PCA feature reduction (PCAFR) [53], and CNN-based dimensionality reduction (CNNDR) [54]. We present the comparative average categorization accuracies in Table 3. As shown, our method performs the best. This is because only GPFS can optimally exploit the samples' underlying relationships on the manifold on which the high-dimensional deep features might be distributed.
2) COMPARATIVE COMPUTATIONAL COST

Practically, the computational time at both the training and testing stages is a key criterion reflecting the performance of a categorization algorithm. As shown by the comparative time costs in Table 2, at the training stage, two categorization algorithms perform better than our method; this is because the architectures of [33] and [39] are overwhelmingly simple and effective. Simultaneously, it is observable that the per-class performances of [33] and [39] are about 4% lower than ours. Meanwhile, during testing, our proposed method runs much faster than its counterparts. It is worth emphasizing that, since the training stage is performed offline, an excellent testing response is much more valuable in practical AI systems. In retrospect, our LR aerial image classification framework includes three important components: 1) the deep low-rank model for GSP generation, 2) our proposed GPFS, and 3) kernel SVM classification for category labels. During training, the time consumed by each module is: 11h21m (module 1), 3h24m (module 2), and 7h32m (module 3). At the testing stage, the computational time of each component is: 212ms (module 1), 324ms (module 2), and 73ms (module 3). Noticeably, module 1 consumes most of the training time. In practice, by leveraging the Nvidia GPU acceleration technique, module 1 can be accelerated by 100× by parallelizing the extraction and vectorization of the hundreds of graphlets.

B. ABLATION STUDY
First of all, we test our key theoretical contribution, the proposed DMCMF, by analyzing the four functional components formulated in (7). The label noise refinement component is first abandoned (S11); mathematically, the term ν||M − U||_1 is removed and we update L into T. Afterward, the data graph updating term β_2||N − N_0||_F^2 is abandoned while the remaining components are kept intact (S12). Then, the binary hash code constraint is removed and the rest of the terms remain unchanged (S13). Last but not least, the hierarchical feature engineering term is reduced to a flat one by setting F = 1 (S14). The results in Table 4 show that the label noise refinement and hierarchical feature learning modules play the most important roles, since removing either causes a >6.4% classification accuracy drop. Moreover, abandoning the binary code constraint brings a 4.522% accuracy decrement; even worse, the time consumed at the test stage increases by over seven times. This clearly shows the effectiveness and efficiency of adopting binary codes to characterize LR aerial photos.
Lastly, to demonstrate the usefulness of the kernel-based quantized vector calculated from each LR aerial photo, the following experimental setups are applied. We first use an aggregation-based deep network that accumulates the predicted category labels of all the graphlets within an LR aerial photo; these labels are subsequently combined into the final image-level category label (S31). Thereafter, we replace our adopted linear kernel with a polynomial kernel (S32) and a Gaussian radial basis function (RBF) kernel (S33), respectively. As shown in Table 4, aggregating the graphlet-level category labels severely hurts the categorization accuracy. This is because calculating the category label at the graphlet level is sometimes obscure and misleading: each graphlet occupies very few regions within an LR aerial photo, and some regions correspond to background areas irrelevant to a particular category. Besides, both the polynomial and RBF kernels perform worse than our linear kernel. This observation demonstrates that projecting the quantized vectors onto a linear space can better separate LR aerial photos of different categories.

C. PERFORMANCE BY PARAMETER ADJUSTMENT
There are two categories of adjustable parameters in our method. The first category contains the weights balancing different cues in the deep low-rank model. The second contains the parameters influencing deep feature engineering, i.e., the number of selected features and the number of deep layers F. In this experiment, we test the LR aerial image classification performance by varying these parameters.
For the first parameter set, the default values of α, β, and η are fixed to 0.3, 0.1, and 0.15, respectively. These default values are decided by 5-fold cross validation on an aerial image set containing 12,000 samples. As shown by the three curves in Fig. 6, the accuracy first rises continuously to a high level as each parameter increases, and thereafter drops steadily.
Next, we evaluate the LR aerial photo categorization by changing the number of selected features K and the number of deep layers F. As elaborated at the bottom of Fig. 6, when increasing K, the accuracy goes up sharply for K ∈ [1, 5] and then remains stable for K > 5. Meanwhile, we notice that when K goes up, the time and storage costs increase dramatically since more graphlets are generated. Toward an efficient and effective LR aerial photo categorization system, we set K = 5. Moreover, we observe the best recognition performance with four deep layers. In our observation, too few deep layers make the deeply-learned binary hash codes insufficiently discriminative, while too many deep layers increase the number of deep model parameters, which inevitably causes overfitting.
Herein, we report the value of the objective function (17) while varying the iteration number in Algorithm 1. As shown in Fig. 7, the objective function converges stably as the iteration number increases. This shows the effectiveness of our designed GPFS optimization.
Finally, we tune the percentage of labeled aerial photos from 10% to 100% with a step of 10%; the labeled aerial photos are randomly selected. We repeat the experiment 20 times and report the average categorization accuracies. As shown in Fig. 8, our method handles aerial photo categorization well when no fewer than 40% of the aerial photos are labeled. That means our method can support at most 60% unlabeled LR aerial photos. In our view, such an attribute is very useful for real-world LR aerial photo categorization.

V. CONCLUSION
Identifying the category labels of LR aerial images is a useful task in many deep-learning-based systems [25], [26], [27], [28], [29]. This work introduces a new LR aerial image recognition framework, wherein deep GSP-based visual representations are calculated and subsequently refined using HR aerial photos. The proposed categorization pipeline contains two main parts: 1) a multi-layer low-rank paradigm learning deep representations from multi-resolution aerial photographs, and 2) a new GPFS for effectively obtaining qualified features. Extensive experimental results show our method's effectiveness.

FIGURE 1. Top: the salient regions perceived by observers in an LR aerial image (as shown in the path), as well as the playground that is blurred. Bottom: multiple high-resolution aerial images depicting the internal regions of the above LR one.

FIGURE 2. The flowchart of the proposed LR aerial image categorization pipeline. Given a collection of labeled/mislabeled aerial photos, we first map the internal patches onto a manifold. Then, a deep low-rank model extracts the attractive regions from each LR aerial image, from which the GSP is constructed; the deep GSP features are calculated simultaneously. Then, we use our GPFS to obtain sufficiently representative and low-redundancy features from the original deep GSP features. The selected features are finally fed into a multi-class classifier for visual categorization.

FIGURE 3. The flowchart of classifying LR aerial images by leveraging the proposed GPFS, which optimally preserves the geometry structure among samples during feature selection.

FIGURE 4. Statistics of HR&LR aerial photos from our data set.

FIGURE 5. Examples of insufficient-quality aerial photos with heavy fog (left) and blurred areas (right).

FIGURE 6. Categorization accuracy by changing the three parameters (top) and K (bottom).

FIGURE 7. The objective function value by varying the iteration number.

FIGURE 8. Categorization accuracy by varying the percentage of labeled aerial photos.

Computational time of the compared recognition algorithms (best results are bolded).

Comparative average categorization accuracies among the six FS algorithms.

Performance enhancement and decrement by adjusting each module.