Aircraft Classification Based on PCA and Feature Fusion Techniques in Convolutional Neural Network

The characterization of aircraft in remote sensing satellite imagery has many military and civil applications. For civil purposes, such as disaster and emergency aircraft search, airport surveillance and aircraft identification from satellite images are very important. This study presents an automated methodology based on handcrafted and deep convolutional neural network (DCNN) features. The presented aircraft classification technique consists of the following steps. The handcrafted features obtained from a local binary pattern (LBP) and the DCNN features are fused by feature fusion techniques. The DCNN features are extracted from AlexNet and Inception V3. We then adopt a feature selection technique based on principal component analysis (PCA). PCA removes redundant and irrelevant information and improves classification performance. Well-known supervised methods then classify the selected features. We choose the best classifier based on its accuracy. The proposed technique is evaluated on the multi-type aircraft remote sensing images (MTARSI) dataset, and the highest overall accuracy achieved by our proposed method is 96.8%, obtained with the linear support vector machine (SVM) classifier.


I. INTRODUCTION
In civil and military applications, recognizing aircraft type from remotely sensed imagery is of great importance. Modern technologies and equipment now provide remote sensing images with high spatial resolution. With the progress in remote sensing technologies, the detailed attributes of a target can be obtained due to the enhanced resolution. The characterization and identification of aircraft have attracted considerable research attention. They are of great significance in aerospace applications, intelligence gathering, and many other fields [1]. For civil purposes, emergency aircraft search, aircraft identification, and airport surveillance are extremely important [2,3].
In the early stages of research, handcrafted features such as SIFT [4,5] and HOG [6] were used to recognize objects such as aircraft, boats, and houses in remote sensing images. Numerous methodologies are based on shape matching [7,8], such as the combination of an edge potential function with the artificial bee colony (ABC) algorithm in [8] and the coarse-to-fine approach in [7], which employs a parametric shape representation. These techniques played a key part in improving the performance of aircraft recognition. With the advancement of hardware efficiency, deep neural networks have revolutionized the analysis of remote sensing satellite images. The deep convolutional neural network (DCNN) plays an important role and has been broadly applied in different tasks such as segmentation [9,10], identification or detection [11,12], and classification [13,14].
Feature extraction is also a vital stage of all computerized systems. Feature fusion and selection procedures have received much attention in computer vision (CV) over the last couple of years, and various associated techniques have been presented that increase recognition accuracy [15][16][17]. The fusion of several features gives improved performance compared to a single feature type. The noteworthy advantage of feature fusion is that it combines the information of multiple descriptors, which enhances overall system efficiency. The drawback of feature fusion is that it increases recognition time due to the accumulation of redundant and irrelevant information. These issues are addressed by the feature selection phase, which eliminates the redundant and irrelevant information and keeps only the top features [1,18]. Currently, deep learning (DL) shows great success in the machine learning (ML) and CV research fields, particularly in surveillance, classification [19], biometrics [20], satellite imagery [21], and medical imaging [22]. In a convolutional neural network (CNN), the extracted features contain both local and global information.
Image recognition is the process of identifying an object in digital images or videos. Object recognition in digital imagery typically starts with image pre-processing, for example, image enhancement and noise removal, followed by feature extraction to find segments, lines, and candidate regions with specific appearances. Besides having a composite structure, different aircraft vary in shape, shading, size, and color, even within one part of the airplane. Intensity and texture are typically dissimilar in different situations. Furthermore, recognition frequently suffers from several disturbances, for example, altered contrast, clutter, and inconsistent intensity. Consequently, robustness and resistance to disturbance are highly required of the methodology. Several approaches have been applied to different datasets under different experimental conditions. However, the datasets used are often not publicly available, which makes it very challenging to reproduce the work for comparison. A dataset called multi-type aircraft remote sensing images (MTARSI) [14] is now publicly available to solve this problem. It contains 9,385 images of 20 airplane types with complex backgrounds and different spatial resolutions. In our investigation, after applying some pre-processing to the aircraft images, we apply handcrafted and convolutional neural network algorithms to the MTARSI dataset to improve classification accuracy. The global average pooling layer (APL) is used to compute the average of every feature map from the preceding layer. We reduce the redundant information and irrelevant features by using the principal component analysis (PCA) technique [23,24]. After that, we perform feature fusion. This technique combines the information of multiple descriptors and enhances overall system efficiency.
The classification methods that we applied include the support vector machine (SVM) (linear and quadratic) [25], least squares SVM (LSSVM) [26], and k-nearest neighbors (KNN) [27].
There are various challenges for object classification in aerial images that degrade a system's accuracy. These challenges include the similarity between multiple objects, illumination effects, the resolution of aerial images, and complex or transparent backgrounds. Several methodologies are presented in the literature, but there is still scope to handle these challenges. The size of the dataset is also a challenge for training aircraft classification models. Our work uses the MTARSI dataset, which contains 20 classes of aircraft with roughly 230 to 800 images per class. We present a methodology for aircraft classification using deep learning techniques. The major contributions of our work are listed below:
• Local binary pattern (LBP) features are computed for the texture information of aircraft.
• Features are extracted by CNNs along with LBP.
• Features are selected by using PCA.
• After selecting the features, different classifiers are applied, and the best result is compared with existing techniques.

II. RELATED WORK
In this part of our work, we discuss several datasets commonly used for aircraft recognition. In civil and military applications, recognizing aircraft from remotely sensed images is of great significance. However, the datasets used are often not publicly available, which makes it very challenging to reproduce the work for comparison. At present, there are five popular datasets for aircraft identification: University of California Merced land use (UCMerced_LandUse) [10], Pattern-Net [28], Northwestern Polytechnical University remote sensing image scene classification (NWPU-RESISC-45) [9], fine-grained visual classification of aircraft (FGVC-Aircraft) [29], and MTARSI [14].
The UCMerced_LandUse dataset [10] was established at the University of California, Merced. It is widely used in the area of satellite images, especially for object classification. The images were selected from the US Geological Survey and then randomly cropped to 256 x 256 pixels. The spatial resolution is about 0.3 m. The dataset consists of 21 different classes, e.g., airplane, baseball, runway, residential, storage tanks, chaparral, and forest. The dataset contains 2,100 images, 100 per class.
In 2017, Northwestern Polytechnical University created the NWPU-RESISC-45 dataset [9], which is freely available as a benchmark for remote sensing image scene classification. The RESISC-45 ... [29] might be used to train airplane recognition, identification, and segmentation algorithms. However, in these datasets, airplanes appear only as a sub-category. A dataset called MTARSI, compiled by Wu et al. [14], is now publicly available and consists of a wide variety of aircraft. It contains 9,385 images of 20 aircraft types with complex backgrounds and different spatial resolutions. Different methodologies have been applied to MTARSI [14] to identify the aircraft type. The details of the previously presented remote sensing satellite datasets are shown in Table I.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Much other research has been done on aerial images of aircraft. Multiple class activation mapping (MultiCAM) was used to extract the distinctive parts of aircraft of several types [1]. Aircraft identification based on corner clustering and a CNN was proposed in [30]. Detecting small objects such as aircraft in remotely sensed imagery using YOLOv3 achieved noteworthy detection performance with a small processing overhead [31]. Boats and airplanes were recognized in remote sensing images by combining deep features obtained from a CNN [32]. Li et al. implemented an efficient aircraft identification framework based on a reinforcement learning and CNN (RL-CNN) model in [33]. This method was used to locate airplanes in remote sensing images correctly and quickly.

III. PROPOSED METHODOLOGY
Our proposed methodology evaluates the overall identification procedure and neural network approaches for airplane type identification on the MTARSI dataset. It consists of the following three-step procedure: classical and CNN feature extraction, feature selection, and fusion of the best selected features. We pass the aerial images of aircraft to the pre-trained DCNN models and apply PCA to the extracted features, which improves classification performance. After that, we perform feature fusion. We conduct a series of experiments with CNNs, i.e., VGG16, AlexNet, ResNet, and Inception. The flow diagram of our proposed model is shown in Figure 1. The diagram indicates that the CNN and classical features are extracted from the dataset in parallel and that the best features are chosen before the fusion stage. Lastly, we apply classifiers to obtain the labeled images of aircraft.
As shown in Figure 1, the input images are pre-processed and passed to the handcrafted LBP feature extractor. Simultaneously, the dataset is given to CNNs such as AlexNet and ResNet for feature extraction. Afterward, we select the most robust of these extracted features by using PCA. We then perform feature fusion on the resulting best feature subsets. Finally, we apply different classifiers to the fused features.

A. FEATURE EXTRACTION
Feature extraction is one of the main procedures in computer vision for representing an object in an image. The performance of any automated technique depends on the number of extracted features. Robust and relevant features give improved accuracy, but noisy or redundant features degrade the system's results. We use LBP as the classical feature, whereas the pre-trained models AlexNet and Inception V3 are used for the CNN features. A detailed description of these features is given below.

1) LBP
We extract LBP features from the greyscale images to handle the complications of illumination changes and to reduce the complexity of the extracted LBP features. Figure 2 depicts the working of LBP. It labels the pixels of an image by thresholding the neighborhood of each pixel and treats the result as a binary number. Each neighboring pixel is compared with the central pixel and assigned a binary 1 or 0: if the value of the neighbor is less than that of the central pixel, the neighbor is assigned binary 1; otherwise, 0.
The binary code is read starting from the top-left cell of the neighborhood in the figure and moving to the right. Moreover, a change in lighting changes the pixel values of the image, but it does not change the binary pattern of a texture, as shown in Figure 3.
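The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names, the 3 x 3 neighborhood, the clockwise reading order after the top row, and the 256-bin histogram descriptor are all assumptions. The comparison follows the rule as stated in the text (a neighbor smaller than the center gives 1); the widely used convention assigns 1 when the neighbor is greater than or equal to the center, which only requires flipping the comparison.

```python
def lbp_code(patch):
    """Return (bits, code) for a 3x3 grayscale patch given as a list of 3 rows."""
    center = patch[1][1]
    # Assumed reading order: start at the top-left cell, move right, then clockwise.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    # Rule as stated in the text: a neighbor smaller than the center gives 1.
    bits = [1 if patch[r][c] < center else 0 for r, c in order]
    code = 0
    for bit in bits:
        code = (code << 1) | bit  # first bit read becomes the most significant bit
    return bits, code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [image[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            hist[lbp_code(patch)[1]] += 1
    return hist
```

Adding a constant to every pixel of a patch leaves its code unchanged, which is the illumination robustness described above.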

2) CNN FEATURE
The leading model in DL is the CNN, which is applied in an extensive range of image processing tasks, such as image classification [39], super-resolution restoration [40], and object detection [41]. We use the pre-trained CNN models Inception V3 and AlexNet. Figure 1 depicts these two pre-trained models. AlexNet consists of 5 convolutional layers and 3 fully connected (FC) layers, whereas Inception V3 has 316 layers and 350 connections.
In these models, several filters are applied on the same layer for deep feature extraction. A CNN contains three key elements: the convolution layer, the pooling layer, and the FC layers. Each part performs a different task. The working procedure of a CNN is shown in Figure 4.
Both of these models (AlexNet and Inception V3) were initially trained on the ImageNet database [42]. We therefore use their complete architectures, applying transfer learning and training on the MTARSI dataset. We divide the dataset in a 70:30 ratio for training and testing, and then train AlexNet and Inception V3 on the MTARSI dataset by transfer learning. Our proposed model, based on feature reduction and the fusion of traditional and DCNN features, is shown in Figure 1.
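As an illustration of the 70:30 division, the following Python sketch splits sample indices per class. It is only a sketch under stated assumptions: the text does not say whether the split is stratified or how it is randomized, so the per-class (stratified) splitting, the fixed seed, and the function name `stratified_split` are hypothetical choices.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) with about train_fraction of each class in train."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize within the class
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting within each class keeps the class proportions similar in both parts, which matters here because the number of images per class is unbalanced.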
In a CNN, the convolution layer is a very suitable way to automatically extract highly related features [43]. It is parameterized by D, the number of convolutions, K, the kernel size, and t, the threshold value. Afterward, the ReLU activation layer [44] is applied, which outputs max(0, x) for an input x. After that, one more layer, called pooling, is applied to reduce the dimensionality of the features extracted by the preceding layers. Three sorts of pooling layers are commonly used, i.e., maximum, minimum, and average [45]. The benefit of the pooling procedure is that it yields comparable features. Figure 5 depicts the pooling procedure.
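The activation and pooling steps can be illustrated with a small pure-Python sketch. It assumes a single 2-D feature map stored as a list of rows and non-overlapping square windows (stride equal to the window size); the function names and these layout choices are illustrative assumptions, not details given in this work.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def pool2d(feature_map, size=2, mode="max"):
    """Pool non-overlapping size x size windows with max, min, or average."""
    reducers = {"max": max, "min": min, "avg": lambda xs: sum(xs) / len(xs)}
    reducer = reducers[mode]
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height - size + 1, size):
        row = []
        for c in range(0, width - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled
```

With a 2 x 2 window, a 4 x 4 feature map is reduced to 2 x 2, which is the dimensionality reduction described above.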
Lastly, the final layer of the CNN architecture is the fully connected (FC) layer; its mathematical representation is as follows.
$L_{out} = \mathrm{ReLU}(L_{in})$
where $L_{in}$ stands for the input layer and $L_{out}$ stands for the output layer. The input layer $L_{in}$ is passed to the ReLU activation function, and the resultant layer is represented as $L_{out}$.
The generalized equation of the FC layer is as follows:
$L^{i}_{in} = L^{i-1}_{out}, \quad L^{i}_{out} = F_{i}(L^{i}_{in})$ (9)
where $F_{i}$ denotes the activation function at layer $i$.
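The layer-to-layer recursion of the FC equations can be sketched as follows. The equations in the text only name the activation function at each layer; the affine map (weights and bias) applied before the activation in this sketch is the usual form of an FC layer and is an assumption here, as are the function names and the list-based data layout.

```python
def relu_vec(values):
    """ReLU applied to a vector."""
    return [max(0.0, v) for v in values]


def fc_forward(x, layers):
    """Propagate x through FC layers given as (weights, bias, activation) tuples."""
    out = x
    for weights, bias, activation in layers:
        layer_in = out  # input of layer i is the output of layer i-1
        # Assumed affine map: one weight row and one bias per output unit.
        pre_activation = [
            sum(w * a for w, a in zip(row, layer_in)) + b
            for row, b in zip(weights, bias)
        ]
        out = activation(pre_activation)  # output of layer i
    return out
```

For example, `fc_forward([1.0, 2.0], layers)` with two small layers chains the outputs exactly as in the recursion: the first layer's output vector becomes the second layer's input.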

B. FEATURE SELECTION
In AI and ML, feature selection is the process of obtaining the minimum number of robust features from an original set with the least loss of information. Researchers have sought various methods to reduce massive amounts of data to smaller portions. Higher-dimensional features increase the computational cost as well as the memory and storage requirements of an algorithm. Therefore, an algorithm is required that is effective enough to remove the redundant information and that can also handle irrelevant features. In our work, we use a selection method that removes irrelevant features and reduces unnecessary information. Figure 5 shows the comprehensive feature extraction and selection process. The notations F1, F2, and F3 denote the features extracted from AlexNet, Inception V3, and LBP. The notation N represents the total number of images used for testing and training. The features extracted from LBP, AlexNet, and Inception V3 are passed to the PCA-based selection method. The selected features are then fused and passed on for classification. The detailed working of PCA is described below.
PCA is a numerical procedure for achieving this reduction. The technique generates a new set of variables called principal components (PCs). All PCs are orthogonal to each other, so they carry no redundant information. PCA takes a numeric dataset and applies an orthogonal transformation that maps the observations onto this new set of variables. PCA is particularly beneficial for noisy data, because most of the variance is concentrated in the first few components, which therefore have a higher signal-to-noise ratio. This concentration of the signal in the first few components is what allows PCA to reduce dimensionality. The later PCs may be dominated by noise and can consequently be discarded without great loss. In this way, the method reduces the dimension of the dataset while retaining most of its variance. Feature selection shrinks the merged feature vector (FV) and chooses the most favorable features for better recognition. The principal components as a whole provide an orthogonal basis for the data space [23]. As a result, we use an unsupervised feature selection algorithm based on PCA eigenvector analysis to identify the original features.
The eigenvalue decomposition of the data covariance or correlation matrix, or the singular value decomposition of the data matrix, is used to determine the PCs, usually after each attribute's data has been mean-centered. When the variances of the variables are high relative to their correlations, the covariance matrix is preferred. When the variables are of different kinds, the correlation matrix is preferable.
PCA consists of four key steps: (a) computing the mean of the fused feature vector; (b) subtracting the mean from every feature; (c) computing the covariance matrix; (d) computing the eigenvalues and eigenvectors of the covariance matrix. PCA returns the principal components as well as the scores. The algorithm for these PCA steps is given below:

Algorithm: PCA
Input: Dataset matrix X
Output: Reduced feature set
Step 1: Generate the N x d dataset matrix X (one row vector x_n per data point).
Step 2: Subtract the mean from every row vector x_n in X.
Step 3: Compute the covariance matrix of X.
Step 4: Find the eigenvalues and eigenvectors of the covariance matrix.
Step 5: Take as PCs the eigenvectors with the greatest eigenvalues.
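The five steps can be sketched in plain Python as follows. This is an illustrative sketch, not the MATLAB routine used in the experiments: to stay free of external libraries it finds the leading eigenvectors by power iteration with deflation instead of a full eigendecomposition, and the function name `pca`, the iteration count, and the starting vector are assumptions.

```python
import math


def pca(X, k, iters=500):
    """Return (components, eigenvalues, scores) for the top-k principal components."""
    n, d = len(X), len(X[0])
    # Steps 1-2: treat X as an N x d matrix and subtract the column means.
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Step 3: sample covariance matrix (d x d).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]
    # Steps 4-5: power iteration finds the eigenvector with the largest
    # eigenvalue; deflation removes it so the next one can be found.
    components, eigenvalues = [], []
    for _component in range(k):
        v = [1.0 / (j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        eigenvalues.append(lam)
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Scores: projection of the centered data onto the selected components.
    scores = [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
              for r in Xc]
    return components, eigenvalues, scores
```

Keeping only the first k columns of the scores gives the reduced feature set; the eigenvalues indicate how much variance each retained component carries.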
After selecting the best subset of features, with minimum error rate and best accuracy, we pass these selected features on for fusion. The details of the feature fusion process are given below.

C. FEATURE FUSION
Feature fusion is an active research area that aims at better accuracy than individual feature sets achieve [46,47]. There are two kinds of feature fusion: i) early fusion and ii) late fusion. Early fusion combines information at the feature level, while late fusion is applied at the classification step. After sampling, shallow-layer and deep-layer features are brought to the same scale and combined, which addresses the low resolution of the deep layers and the weak representation of small objects. The dimensions of the candidate regions are adjusted to fit the dimensions of the actual aircraft in the aerial images.
In our presented methodology, we apply the late feature fusion technique. We provide the dataset to the CNN models AlexNet and Inception V3 and to the handcrafted LBP feature extractor. After extracting the features from these models, we pass them to the PCA-based selection technique. After obtaining the best subset of features from PCA, we fuse these features. Finally, we provide the fused features to various classifiers such as quadratic SVM (QSVM), linear SVM (LSVM), KNN, and ensemble subspace discriminant (ESD). The best classifier is selected based on the highest accuracy. The experimental results are described in Section IV.
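One simple way to fuse the selected feature sets is serial concatenation, sketched below. The text does not specify the fusion operator, so concatenation is an assumption made for illustration, as are the function name `fuse_features` and the list-of-vectors layout (one feature vector per image in each set).

```python
def fuse_features(*feature_sets):
    """Concatenate per-image feature vectors from several extractors."""
    n_images = len(feature_sets[0])
    if any(len(fs) != n_images for fs in feature_sets):
        raise ValueError("every feature set must cover the same images")
    # For image i, append the vectors from each extractor in the given order.
    return [[value for fs in feature_sets for value in fs[i]]
            for i in range(n_images)]
```

With F1, F2, and F3 denoting the reduced AlexNet, Inception V3, and LBP features, `fuse_features(F1, F2, F3)` returns one fused vector per image whose length is the sum of the three reduced dimensions.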

D. DATASET DESCRIPTION
The proposed work is evaluated on the MTARSI dataset [14], which comprises several types of aircraft images and makes the identification of airplane types from remotely sensed imagery more feasible. The dataset has 9,385 images from 36 different airports, covering 20 types of aircraft, acquired from Google Earth and manually expanded. It is composed of the following 20 airplane types: A-10, A-26, B-29, B-52, B-1, B-2, Boeing, C-17, C-130, KC-10, C-5, C-135, C-21, F-22, F-16, E-3, P-63, U-2, T-43, and T-6. Experts in aerial image analysis carefully labeled every sample image. Each image contains exactly one complete airplane. Each airplane type in the MTARSI dataset is shown in Figure 6. The number of sample images differs per category, ranging from 230 to 846. The classes and the number of images per class are detailed in Table II.

1) DATASET AUGMENTATION
The MTARSI dataset contains samples that differ in background, pose, resolution, lighting, color, and aircraft model. Some aircraft, such as the KC-10 tanker and the B-2 bomber, are very rare and hard to capture with satellite sensors. This slows down the process of collecting and building the dataset.
To reduce this issue, Wu et al. [14] artificially enlarged the dataset by simulating images of aircraft that were hard to observe. A new background is randomly selected from related satellite imagery (i.e., areas that do not contain any airplanes). The extracted airplane image is then reflected, rotated, and merged with the selected background to obtain the final image. The detailed procedure is shown in Figure 7.
The dataset has many variations in the aircraft images, such as the same type of airplane in different colors, poses, viewpoints, backgrounds, and resolutions. The images were captured at different times, such as during the day or evening, and in different weather conditions. Sample images are shown in Figure 8.
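The reflect, rotate, and merge steps can be illustrated on small grayscale arrays. This sketch is not the procedure of [14]: it assumes a binary mask marking the aircraft pixels, rotation in 90-degree steps only, and a paste position given by the caller, and all function names are hypothetical.

```python
def reflect(image):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate a 2-D image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def composite(background, aircraft, mask, top, left):
    """Paste aircraft pixels (where mask is 1) onto a copy of the background."""
    out = [row[:] for row in background]
    for r, row in enumerate(aircraft):
        for c, value in enumerate(row):
            if mask[r][c]:
                out[top + r][left + c] = value
    return out


def augment(background, aircraft, mask, top, left):
    """Reflect and rotate the aircraft and its mask, then merge with the background."""
    new_aircraft = rotate90(reflect(aircraft))
    new_mask = rotate90(reflect(mask))  # the mask must follow the same transforms
    return composite(background, new_aircraft, new_mask, top, left)
```

Applying the same transforms to the mask keeps the aircraft silhouette aligned, so only aircraft pixels overwrite the new background.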

IV. EXPERIMENTAL SETUP AND RESULT
The presented CNN technique is evaluated on a publicly available dataset named MTARSI. The dataset has 9,385 images from 36 different airports, covering 20 types of aircraft. The experiments are performed in MATLAB 2018b on a desktop computer with an 8th-generation Core i7 processor and 16 GB of RAM.
The extracted features are classified by several different classifiers, and the classifier with the highest accuracy is selected. The classifiers applied in our work are SVM, KNN, and ensemble approaches. All results are computed through 5-10-fold cross-validation, and the 70:30 split is used. Performance is then reported using the following measures: accuracy, recall, precision, and F1-score.
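The four measures can be computed from a multi-class confusion matrix as in the following sketch. The text does not state how per-class values are averaged over the 20 classes, so the macro-averaging used here is an assumption, as are the function name and the convention that rows are true classes and columns are predicted classes.

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

Macro-averaging weights every aircraft class equally, which is one reasonable choice when the number of images per class differs.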

A. RESULT AND ANALYSIS
In this part of the article, we present the results of the proposed model in tabular form. Features are extracted from the FC layers of the models used in this work. After fusion, 10-fold cross-validation (10FCV) is used for training and testing the models. The testing results are presented in Table III and Table IV. Finally, the resulting confusion matrices and graphs are shown in Figure 9 and Figure 10.
No other dataset offers this type of variety. Most methodologies were evaluated on different datasets under different experimental conditions. The MTARSI dataset is used to evaluate and review the performance of airplane type identification methodologies on natural images. It also benefits the development of computer vision, image processing, and target identification methods for aerial imagery. We then ran several representative aircraft type identification methodologies under multiple experimental protocols on this dataset. We also observe that the dataset clearly differentiates the performance of the various methods. Researchers using the MTARSI dataset will therefore have a sound baseline of results for comparison. Despite the success of the feature fusion strategy for aircraft image classification, the study was limited to satellite aircraft images. It did not investigate the influence of multi-resolution methods on aerial images, and no information exists about the specific contribution of each resolution level to the image classification task. In future work, we will use this dataset to develop better aircraft identification techniques. In addition, we will collect more data based on these aerial datasets and consider other object categories in remotely sensed images.