An Integrated Parallel Inner Deep Learning Models Information Fusion With Bayesian Optimization for Land Scene Classification in Satellite Images

Classification of remote scenes in satellite imagery has many applications, such as surveillance and earth observation. Classifying high-resolution remote sensing images with machine learning remains a major challenge. Several automated techniques based on machine learning and deep learning have been introduced in the literature; however, these techniques fail to perform well on images with complex textures, complex backgrounds, and small objects. In this work, we propose a new automated technique based on the inner fusion of two deep learning models and feature selection. A new network is designed at the initial phase based on the inner-level fusion of two networks and combined weights. After that, the hyperparameters are initialized using Bayesian optimization (BO). Usually, hyperparameters are initialized manually, but that is not an efficient way of selection. The designed model is then trained, and deep features are extracted from a deeper layer. In the last step, a poor-rich controlled entropy-based feature selection technique is developed for the best feature selection. The selected features are finally classified using machine learning classifiers. We performed the experimental process of the proposed architecture on three publicly available datasets: Aerial image dataset (AID), UC-Merced, and WHU-RS19. On these datasets, we obtained accuracies of 96.3%, 95.6%, and 97.8%, respectively. A comparison with state-of-the-art techniques shows improved accuracy.


Ameer Hamza, Muhammad Attique Khan, Member, IEEE, Shams ur Rehman, Hussain Mobarak Albarakati, Roobaea Alroobaea, Abdullah M. Baqasah, Majed Alhaisoni, and Anum Masood

I. INTRODUCTION
In the widest definition, remote sensing is a data-gathering technique that does not require the investigator to have direct physical contact with the object, substance, or phenomenon being studied. The whole procedure starts with the detection of radiation using sensor technologies, followed by the measurement of that radiation at different wavelengths. This radiation is emitted or reflected by distant objects and materials [1]. Because remote sensing can provide observations on a local, regional, and even global scale, it is useful for a variety of applications, including monitoring land cover and land use for agricultural purposes [2], supervising forest management [3], conducting geomorphological surveys [4], and determining the dynamics of water quality [5], among others. The availability of aerial images, which allows for a more in-depth analysis of the planet's surface, has led to a significant surge in interest in earth observation [6]. In the classification of aerial scenes [7], [8], a core task in the field of remote sensing, each aerial image is assigned a meaningful semantic label [9]. Aerial scenes are often quite intricate, and there are not many visual variations across classes [10]. For instance, common land-cover types are seen throughout several different scene classes. The classification of aerial images may be challenging because several diverse spatial and structural patterns are present [11].
A scene representation of the aerial imagery must be created before the semantic labels used in aerial scene classification can be determined. Creating a reliable scene representation has received much attention recently, and several different aerial scene classification methods have been proposed [12]. These methods may be generally divided into two groups: those that address low-level scene features and those that address mid-level scene features. The common low-level approaches include the Scale-Invariant Feature Transform, the Local Binary Pattern, the color histogram, and GIST [13], [14], [15], [16]. The scene representations created by mid-level methods are built on the low-level local feature descriptors. The methods for mid-level coding include Bag of Visual Words, Spatial Pyramid Matching, Locality-Constrained Linear Coding, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Improved Fisher Kernel, and Vector of Locally Aggregated Descriptors [17], [18], [19]. Deep convolutional neural networks (DCNNs) [20] currently dominate the classification of most aerial images. The strength of their learned feature representations motivated the adoption of these CNNs. Because CNNs can provide strong feature representations to characterize the aerial image, classification performance, particularly for high-level approaches, has greatly improved. This is especially true for sophisticated operations. Unlike standard low-level methods, which depend on manually created features, high-level methods extract rich representations from aerial scenes. In this respect, high-level approaches contrast with traditional low-level ones [21].
Researchers have recently presented several computer-vision-based methods for classifying objects using satellite images. Some worked on nonhistorical buildings using airborne and satellite imagery [22]. Others developed an automatic ship detection approach and a DL method for satellite images [23]. For example, Duarte et al. [24] suggested a deep learning approach for satellite images. In this method, they implemented a DCNN for the image classification of building damage and obtained 94% accuracy. The main drawback of this framework was that one multiresolution network did not improve the classification accuracy compared with the benchmark used. Pritt et al. [25] presented a method for satellite image classification using deep learning. In this methodology, they performed object and facility recognition using high-resolution and multispectral satellite images and obtained 95% accuracy. The drawback of this method was that state-of-the-art object detection methods do not perform well on satellite images. Gao et al. [26] suggested a region-based deep learning approach to segment satellite images. In this method, they performed rooftop detection by using the segmentation approach and obtained 92% accuracy. This method could not avoid the speckle-like errors sometimes found in the segmentation model. Rostami et al. [27] demonstrated deep learning techniques for fire detection with Landsat-8 satellite imagery. In this method, they used CNN-based multiscale detection for active fire detection (AFD) on the Landsat-8 dataset and achieved 95% accuracy. The limitation of this method was detecting fires of varying sizes and shapes in challenging test cases. Yosmaoglu et al. [28] presented road network generation using satellite images. The presented method evaluates and compares the ResNet and U-Net generation models. As a result, they achieved 99% accuracy. Lim et al. [29] presented dead pine tree detection using a deep learning method. In this method, they used aerial vehicle imagery and deep learning-based object detection to solve the problem. As a result, they achieved 99% accuracy. Ch et al. [30] presented an ECDSA-based method for water bodies in satellite images. They employed the U-Net model and achieved data integrity by using the elliptic curve digital signature algorithm as a security feature. They obtained 94% accuracy. The main limitation of this technique was extending the model to video input. Najar et al. [31] demonstrated an approach for coastal bathymetry using deep learning. In this method, they used Sentinel-2 satellite imagery and multiple bathymetries to train the deep learning model. As a result, they achieved 50% accuracy. One limitation of this approach was the selection of data based on certain dates and the need to train on application sites. Kaur et al. [32] introduced a transfer learning-based approach for automatically detecting and tracking hurricanes using satellite imagery. In this method, they utilized a transfer learning-based model and gained 95% accuracy. The limitation of this method was that it could be made more generalizable by including images of other hurricanes. Zhuang et al. [33] presented a semantic guidance transfer-based method using satellite images. In this method, they used a UAV-based geo-localization dataset. As a result, they achieved an 8% improvement in accuracy. The limitation of this method was that a lot of information would be lost when using this model. Zhang et al. [34] presented building height extraction using satellite images. In this method, the researchers used a stereo-matching technique coupled with a DSM-based approach for predicting bottom elevation. As a result, they improved the accuracy compared with other methods. Ul Ain Tahir et al. [35] presented a method for wildfire detection using deep learning. In this method, they utilized a YOLOv5-based deep learning model. As a result, they achieved a 94% F1-score.
Hasan et al. [36] presented a novel resource allocation technique for 5G heterogeneous networks. In this work, the authors designed a new biogeography-based dynamic subcarrier allocation algorithm for minimizing the cross-tier subcarrier snooping problems in MeNB and HeNB. They achieved 88.1% outage and 83.6% spectral efficiency, which was higher than the existing techniques. Ariffin et al. [37] demonstrated a modeling approach based on frequency-modulated continuous-wave radar for detecting landslides in Malaysia. The authors designed a radar for detecting slow-moving landslide movements in this work. They achieved radar performance at a speed of 20 m/s for detecting landslide occurrences. El Asri et al. [38] presented a modular U-Net-based system using satellite images. In this method, they utilized a CNN-based deep learning model and obtained 70% accuracy. The main limitation noted for this framework concerned data augmentation, which would improve the result.
In summary, the authors of the related works used deep learning and U-Net generation models for the classification of land scenes using satellite images. Few of the authors focused on the detection of multiple objects from satellite images. Remotely sensed images play a critical role in several applications such as environmental monitoring, disaster assistance, and geological surveys. The increasing need for satellite-derived imagery has led to a substantial flow of data being acquired on a daily basis. Consequently, databases have expanded to include a much larger quantity of remote-sensing images. However, accurately and efficiently retrieving and classifying images from an unstructured database is a significant challenge. Cloud cover and atmospheric conditions may conceal certain areas of the image. Therefore, obtaining clear and consistent data for classification can be difficult. Landscape features are complex, and there may be spatial heterogeneity within a single image. Therefore, correct classification is another challenge in landscape classification using satellite images. In this work, we designed a deep learning-based approach with internally fused models for classifying land scenes using satellite images.
The major contributions of the proposed framework are as follows.
1) A substitution-based approach is employed for the contrast enhancement of the satellite images.
2) A novel fused model based on the EfficientNet-B0 and MobileNetV2 architectures is proposed. The hyperparameters of the proposed model are optimized using Bayesian optimization (BO), and the model is trained using deep transfer learning.
3) An improved poor and rich controlled-entropy optimization is proposed for best feature selection, and a t-test analysis is conducted to measure the significance of different classifiers.
The rest of this article is organized as follows. Section II describes the datasets and normalization techniques, the proposed fused architecture, and the improved controlled poor and rich optimization (PRO) approach for best feature selection. The findings are explained in Section III, while Section IV presents the conclusion of the proposed method.

II. PROPOSED WORK
This section explains the proposed landscape classification framework using a novel fused model. The proposed model was trained using BO and employed improved poor and rich controlled-entropy optimization for best feature selection, as shown in Fig. 1. This figure illustrates that publicly available datasets of satellite images were used for landscape classification. In the initial step, contrast enhancement is performed by using a substitution-based approach. Following that, two pretrained models, EfficientNet-B0 and MobileNetV2, are internally fused for training purposes. In addition, the proposed model is fine-tuned by using deep transfer learning. Then, BO is utilized to select the optimized hyperparameters for the proposed model. The features are extracted from the trained model using the activation of the newly added depth-wise layer. Furthermore, improved poor and rich controlled-entropy optimization is employed to select the best features. The optimized features are fed to neural network classifiers for the final classification. In the last phase, a t-test analysis is conducted for a statistical comparison of the performance of the neural network classifiers.

A. Dataset and Contrast Enhancement
In this article, we used three publicly available land-use datasets for the experimental process. The selected datasets are the aerial image dataset (AID) (https://captain-whu.github.io/BED4RS/), UC-Merced land use (http://weegee.vision.ucmerced.edu/datasets/landuse.html), and WHU-RS19 (https://captain-whu.github.io/BED4RS/). The AID dataset is one of the largest aerial scene datasets and contains 30 aerial scene classes, including forest, airport, farmland, bridge, beach, mountain, river, church, desert, dense residential, baseball field, industrial area, playground, pond, park, and meadow, to name a few. The total number of samples in this dataset is 10 000. Each aerial image has a fixed resolution of 600 × 600 pixels to offer as much information as possible about a location. The UC-Merced land-use dataset consists of 21 land-use classes, each with 100 samples. Each image is 256 × 256 pixels and was manually extracted from the USGS National Map Urban Area Imagery collection for urban sites around the United States. The WHU-RS19 dataset contains 19 classes: airport, beach, bridge, commercial area, desert, farmland, football field, forest, industrial area, meadow, mountain, park, parking lot, pond, port, railway station, residential area, river, and viaduct. The images in this dataset have dimensions of 600 × 600 pixels, with nearly 50 images per class. Fig. 2 presents a few images of each class of this dataset.
The images in these datasets were dark and low in contrast. These problems may lead to misclassification. Therefore, we created a substitution-based approach for contrast enhancement by utilizing different filters. First, an adjustment filter with stretch limits is applied, and the resultant images are passed to a sharpening filter. The sharpening filter heightens the intensity values of the images at the edges where different colors converge. The mathematical formulation is defined as follows.
Consider that the satellite database has $k$ images, $S \in \mathbb{R}^{k}$, where each image is represented by $f_k(v_0, h_0)$ and $(v_0, h_0) \in \mathbb{R}$. Assume that $S_L$ and $S_U$ are the specified lower and upper limits on the image's intensity values before normalization, and that $E_L$ and $E_H$ are the current minimum and maximum pixel values. Each pixel is rescaled using these limits, and $g_{adj}(v_0, h_0)$ denotes the resultant image. This image is then passed to a sharpening filter based on the unsharp-masking approach, which is used to increase the contrast along the edges. In the unsharp-mask formulation, $\alpha$ is the scaling coefficient that determines the degree of sharpness, and the output is the image sharpened by the unsharp-mask filter. The final contrast-enhanced image is denoted by $I_{out}(v_0, h_0)$ and is presented in Fig. 3.
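The two enhancement steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a grayscale image stored as nested lists with values in [0, 1], uses the image minimum and maximum as $E_L$ and $E_H$, uses a 3 × 3 mean filter as the smoothing step of the unsharp mask, and adds $\alpha$ times the difference between the image and its smoothed copy. The function names `stretch`, `box_blur`, `unsharp_mask`, and `enhance` are hypothetical.

```python
def stretch(image, s_low=0.0, s_high=1.0):
    """Map pixel values from [E_L, E_H] (current min/max) to [s_low, s_high]."""
    e_low = min(min(row) for row in image)
    e_high = max(max(row) for row in image)
    span = (e_high - e_low) or 1.0  # flat image: avoid division by zero
    return [[(p - e_low) / span * (s_high - s_low) + s_low for p in row]
            for row in image]


def box_blur(image):
    """3x3 mean filter with edge replication (assumed smoothing step)."""
    h, w = len(image), len(image[0])
    out = []
    for v in range(h):
        row = []
        for u in range(w):
            total = 0.0
            for dv in (-1, 0, 1):
                for du in (-1, 0, 1):
                    vv = min(max(v + dv, 0), h - 1)
                    uu = min(max(u + du, 0), w - 1)
                    total += image[vv][uu]
            row.append(total / 9.0)
        out.append(row)
    return out


def unsharp_mask(image, alpha=1.0):
    """Add alpha times the detail (image minus blur), then clip to [0, 1]."""
    blurred = box_blur(image)
    return [[min(1.0, max(0.0, p + alpha * (p - b)))
             for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]


def enhance(image, alpha=1.0):
    """Substitution: the stretched image is the input of the sharpening step."""
    return unsharp_mask(stretch(image), alpha)


# A dark, low-contrast 4x4 patch with one vertical edge.
patch = [[0.2, 0.2, 0.4, 0.4] for _ in range(4)]
enhanced = enhance(patch, alpha=1.0)
```

On this patch the stretch step spreads the two intensity levels over the full range, and the sharpening step pushes the pixels on either side of the edge further apart before clipping.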

B. Proposed Fused CNN Model
The DCNN architectures EfficientNet-B0 and MobileNetV2 are both utilized for image classification tasks. EfficientNet-B0 is a variant of the EfficientNet architecture, which was introduced by Google in 2019 [39] and is known for its efficient use of computation and network capacity. EfficientNet-B0 is the smallest version of the EfficientNet family, differing from the larger variants by requiring less computing capacity and having fewer parameters. To obtain high accuracy in image classification tasks, the architecture of the network is based on a compound scaling technique that jointly scales the network's dimensions (depth, width, and resolution) [40]. Moreover, MobileNetV2 is a CNN architecture that was designed specifically to meet the needs of mobile and embedded devices [41]. In MobileNetV2, the expressive power of the model is increased by integrating inverted residuals into its design. Due to the design's primary focus on memory and computational efficiency, it is well suited to deployment on devices with limited computing capacity, such as smartphones and tablets [42]. EfficientNet-B0 and MobileNetV2 have gained recognition for their computational efficiency and reduced model size while maintaining a satisfactory level of performance. In this research, both models were selected for their balance between computational efficiency and accuracy. The EfficientNet-B0 and MobileNetV2 architectures are fused into a single network to leverage both models' strengths. EfficientNet-B0 is utilized as the backbone network, and MobileNetV2 is added as a lightweight feature extractor. This process increased the accuracy and reduced the computation and memory usage. The fused model accepts input images of size 224 × 224 × 3. The fully connected, softmax, and classification layers were removed from EfficientNet-B0 and MobileNetV2 in order to add a new depth-wise concatenation layer that combines the features of the global average pooling layers of both models. Following that, a new fully connected layer, a new softmax layer, and a new classification layer are added. The new FC layer is sized according to the selected dataset. We trained the fused model by utilizing BO in order to obtain the optimized hyperparameters. A brief explanation of BO is provided in Section II-C. After training, deep features are extracted from the depth-wise concatenation activation. The dimensions of the extracted features are N × 2560. MobileNetV2 has 3.5 M parameters and EfficientNet-B0 has 5.3 M parameters. After the depth-wise fusion process, the resultant architecture has 6.3 M parameters instead of 8.8 M parameters. The fusion process of the EfficientNet-B0 and MobileNetV2 architectures is presented in Fig. 4.
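The feature dimension and the parameter count quoted above can be checked with a short calculation. The sketch below rests on assumptions that are not stated in the text: each backbone is taken to end in a 1280-dimensional global-average-pooled vector (consistent with the N × 2560 fused feature), each removed head is taken to be a 1000-class ImageNet fully connected layer, EfficientNet-B0 is taken at the commonly reported 5.3 M parameters (which is what the 8.8 M total implies), and the new FC layer is sized for the 30 AID classes. The helper names are hypothetical.

```python
def concat_features(feat_a, feat_b):
    """Depth-wise concatenation of two per-image feature vectors."""
    return [a + b for a, b in zip(feat_a, feat_b)]


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Placeholder pooled features for N = 4 images (values are dummies).
N = 4
eff_feats = [[0.0] * 1280 for _ in range(N)]  # assumed EfficientNet-B0 GAP output
mob_feats = [[1.0] * 1280 for _ in range(N)]  # assumed MobileNetV2 GAP output
fused = concat_features(eff_feats, mob_feats)
shape = (len(fused), len(fused[0]))  # N x 2560

# Rough parameter accounting, in millions.
removed = 2 * dense_params(1280, 1000)  # the two original classification heads
added = dense_params(2560, 30)          # new FC layer for 30 classes (AID)
approx_total_m = 5.3 + 3.5 - removed / 1e6 + added / 1e6
```

Under these assumptions the estimate rounds to 6.3 M, which suggests that most of the reduction from 8.8 M comes from dropping the two original classification heads.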

C. Bayesian Optimization
Hyperparameter tuning is a crucial step in the training of DCNNs to achieve optimal performance. However, the search space of hyperparameters is often large and complex, and the evaluation of different hyperparameter configurations can be computationally expensive. Traditional methods, such as grid search and random search, are not well suited for this task due to their inefficiency and their inability to handle constraints and noise. BO is a powerful technique that can be used to solve this problem. It models the unknown performance of the DCNN as a function of the hyperparameters with a surrogate model, typically a Gaussian process (GP). The GP is used to model the distribution of the performance over the hyperparameter space, and the optimization algorithm is based on this distribution [43]. In each iteration, the BO algorithm chooses the next set of hyperparameters to evaluate based on the current state of the GP and an acquisition function that balances exploration and exploitation [44], [45].
A GP is a type of stochastic process in which the distribution of any finite subset of its random variables is multivariate Gaussian. This process operates under the assumption that similar inputs produce similar outputs, and it therefore uses a statistical model to represent the function. Similar to a Gaussian distribution, which is characterized by its mean and covariance, a GP is defined by its mean function $\mu : d \to \mathbb{R}$ and its covariance function cov. For any given $d$, the GP returns a distribution over $f(d)$ instead of a scalar. For simplicity, the mean function of the GP can be assumed to be $\mu(d) = 0$. For the covariance function cov, the exponential function given in (6) is selected, where $d_i$ and $d_j$ denote the $i$th and $j$th samples, respectively. The closer $d_i$ and $d_j$ are to each other, the closer the covariance value is to 1. Conversely, as the separation between $d_i$ and $d_j$ increases, the covariance value tends to 0. This relationship captures the correlation and mutual influence between the samples, which intensifies as the samples move closer together and weakens as they move further apart.
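The behavior described for the covariance function can be illustrated with a squared-exponential kernel. The exact form used in the paper is not legible in this text, so the expression below, $\exp(-\tfrac{1}{2}\lVert d_i - d_j\rVert^2)$ with unit length scale, is an assumption chosen because it matches the stated properties: it equals 1 for identical samples and decays toward 0 as they separate. The function names are hypothetical.

```python
import math


def sq_exp_kernel(d_i, d_j):
    """Assumed covariance: exp(-0.5 * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(d_i, d_j))
    return math.exp(-0.5 * sq_dist)


def covariance_matrix(samples):
    """Pairwise covariances between all hyperparameter samples."""
    return [[sq_exp_kernel(a, b) for b in samples] for a in samples]


# Three one-dimensional hyperparameter samples (illustrative values).
samples = [[0.0], [1.0], [3.0]]
cov = covariance_matrix(samples)
# cov[0][1] (distance 1) is larger than cov[0][2] (distance 3),
# and every diagonal entry is exactly 1.
```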
The procedure for ascertaining the posterior distribution of f (d) is as follows.

Initially, $s$ observations are sampled as the training set. Assume that the function values $f$ are drawn from a multivariate normal distribution, $f \sim N(0, \tau)$, where $\tau$ is the covariance matrix given in (7). Every entry of $\tau$ is determined by using (6). The degree of similarity between two samples is measured by the covariance function, and, when noise is not taken into account, the diagonal entries of $\tau$ equal 1. The function value at a new sample point $d_{s+1}$ is $f_{s+1} = f(d_{s+1})$. Based on the GP assumption, the function values $f_{1:s}$ in the training set together with $f_{s+1}$ follow a normal distribution with $s + 1$ dimensions. In addition, $f_{s+1}$ follows a one-dimensional normal distribution whose mean and variance are obtained from the properties of a joint Gaussian distribution. Once the posterior distribution of the objective function is established, BO employs an acquisition function ($\phi$) to find the maximum of the function $f$. Typically, a high value of the acquisition function is assumed to correspond to a high value of the objective function $f$. As a result, maximizing the acquisition function is treated as equivalent to maximizing the function $f$. Hence, the next sample point is chosen as the maximizer of the acquisition function.
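The posterior step can be sketched with the standard zero-mean GP conditioning formulas, $\mu_{s+1} = k^{\top}\tau^{-1} f_{1:s}$ and $\sigma^2_{s+1} = \mathrm{cov}(d_{s+1}, d_{s+1}) - k^{\top}\tau^{-1}k$, where $k$ holds the covariances between the new point and the training points. These are the textbook expressions and are assumed here to match the equations lost from this text. The squared-exponential kernel, the small diagonal `jitter`, and all function names are likewise assumptions made for the illustration.

```python
import math


def kernel(a, b):
    """Assumed squared-exponential covariance with unit length scale."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_posterior(train_d, train_f, new_d, jitter=1e-9):
    """Posterior mean and variance of f at new_d under a zero-mean GP."""
    s = len(train_d)
    # tau: covariance matrix of the s training samples (jitter for stability).
    tau = [[kernel(train_d[i], train_d[j]) + (jitter if i == j else 0.0)
            for j in range(s)] for i in range(s)]
    # k: covariances between the new point and each training sample.
    k = [kernel(new_d, d) for d in train_d]
    mean = sum(ki * wi for ki, wi in zip(k, solve(tau, train_f)))
    var = kernel(new_d, new_d) - sum(ki * vi for ki, vi in zip(k, solve(tau, k)))
    return mean, max(var, 0.0)


# Three evaluated hyperparameter values and their observed scores (dummies).
train_d = [[0.0], [1.0], [2.0]]
train_f = [0.5, 0.8, 0.3]
mean_near, var_near = gp_posterior(train_d, train_f, [1.0])
mean_far, var_far = gp_posterior(train_d, train_f, [50.0])
```

At an already evaluated point the posterior reproduces the observation with almost no variance, while far from the data it falls back to the zero prior mean with variance close to 1; this gap in uncertainty is what the acquisition function exploits.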
The employed acquisition function is expected improvement (EI). The EI method calculates the expected improvement that can be achieved by exploring the region around the current best value. If the actual improvement of the function value during execution is less than the expected value, the current best value may be a local optimum, and the algorithm continues to search for the best value in other regions of the domain. The improvement ($I$) is computed as the difference between the function value at the sample point and the current optimal value, and it is regarded as 0 if the function value at the sample point is less than the current optimal value. In accordance with the EI optimization strategy, the objective is to maximize EI with respect to the current optimal value. The EI function, defined in (14), is the expectation of the improvement $I$. The employed stopping condition of BO has two criteria. The first is MaxTime: the BO procedure has a time limit of 50 400 s, which is equivalent to 14 h, and the optimization ends upon reaching this limit regardless of whether it has converged to the optimal solution. The second is the evaluation budget: the procedure stops once it has completed its initial 30 function evaluations. In this research, we employed the BO to
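The EI criterion and the two stopping conditions can be sketched as follows. The closed form used here, $(\mu - f^{+})\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (\mu - f^{+})/\sigma$, is the standard expression for a Gaussian posterior in a maximization setting and is assumed to correspond to the definition referenced in the text. The function names and the `should_stop` helper are hypothetical; only the limits of 50 400 s and 30 evaluations come from the text.

```python
import math


def expected_improvement(mean, std, best):
    """EI of a candidate with posterior N(mean, std^2) over the current best."""
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic and floored at 0.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def should_stop(elapsed_s, n_evals, max_time_s=50400, max_evals=30):
    """Stop when either the time limit or the evaluation budget is reached."""
    return elapsed_s >= max_time_s or n_evals >= max_evals


# A candidate whose posterior mean equals the current best still has positive
# EI as long as the posterior is uncertain.
ei_uncertain = expected_improvement(mean=0.5, std=1.0, best=0.5)
ei_certain = expected_improvement(mean=0.5, std=0.0, best=0.5)
```

The contrast between the two values shows why EI keeps exploring uncertain regions even when their predicted mean is no better than the incumbent.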

D. Improved Feature Selection Technique
Feature selection in deep learning is a challenging task that requires specific techniques to handle the high dimensionality and nonlinearity of deep neural networks. Regularization, autoencoder-based methods, and metaheuristic optimization methods are effective strategies that can improve the accuracy and effectiveness of deep learning algorithms. Selecting a suitable feature selection approach is crucial for obtaining good results, and the choice should be based on the characteristics of the given problem. In feature selection, the aim is to identify a subset of relevant features from a large number of input features that can contribute to the prediction accuracy of the model [46]. The feature vector obtained from the proposed model was high in dimension, which can lead to increased computational cost and longer training time. Therefore, we employed PRO with controlled entropy for feature selection. The proposed technique can reduce the computational cost and improve the training efficiency while still maintaining high accuracy. The original PRO algorithm is based on a real-world social phenomenon that can provide a viable solution for complex optimization problems [47]. Wealth is a widely used concept in various fields, particularly in economics. Its definition varies with the context in which it is used. It is a measure of the economic status of individuals, and its quality and quantity are defined within economic categories. The aspiration to become wealthy is a universal human desire, and people are naturally driven by financial pursuits to satisfy their needs and desires. Although there are numerous ways to acquire wealth, seeking insights from the experience and knowledge of the wealthiest individuals seems to be the most effective approach. Sociologically, people in a society are classified into two financial classes: the rich, whose wealth level exceeds the average, and the poor, whose wealth level is below the average. Members of both classes strive to improve their economic conditions through diverse means. However, they share a common tendency to observe each other's behavior and attempt to enhance their position by emulating or influencing the other. Therefore, the PRO algorithm's fundamental concept is to apply two strategies.
1) The poor population endeavors to improve their status and reduce the class gap by learning from the rich.
2) The rich population aims to widen the class gap by observing and gaining wealth from the poor.
The PRO algorithm generates an initial population randomly using a uniform distribution, which selects values within specified upper and lower bounds for each parameter. The initial population is then evaluated with the objective function and arranged in ascending order of the outcomes. The PRO algorithm primarily consists of two distinct subpopulations that represent the rich and the poor, respectively. The main population is the combination of these two subpopulations, where $P^{f}_{main}$, $P^{f}_{rich}$, and $P^{f}_{poor}$ denote the main, rich, and poor population sizes of features $f$, respectively. Following that, the main population is sorted in ascending order. The better-positioned members are considered the rich population, and the remaining members are considered the poor population of features. In the sorted population, $f_1, f_2, f_3, f_4, \ldots, f_r$ represent the rich population and $f_{r+1}, f_{r+2}, \ldots, f_N$ denote the poor population. At each iteration of the algorithm, a defined mechanism must be employed to alter the position of every member of both subpopulations. The position of each feature of the rich population is changed by an update equation in which $\vec{V}^{new}_{r,k}$ denotes the new $k$th position value of the rich population, $\vec{V}^{old}_{r,k}$ represents the present $k$th position value of the rich population, $\alpha$ is the parameter that represents the class gap, and $\vec{V}^{old}_{p,best}$ denotes the present position of the best member of the poor population. The value of $V$ is considered as a vector of all variables. In effect, each member of the rich population widens the gap with every member of the poor population. Because $\vec{V}^{old}_{p,best}$ is the best member of the poor population, when the distance of a rich member from $\vec{V}^{old}_{p,best}$ increases, its distance from all the members of the poor population increases. The poor population thus gets poorer as the distance between the poor and the rich grows. The distance that each member of the rich population should maintain from the poor population is determined by a random value, $\alpha$, which falls between 0 and 1. The arbitrary nature of $\alpha$ creates an internal competition within the rich population.
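The population split and the rich-population move can be sketched as follows. The update form $\vec{V}^{new}_{r,k} = \vec{V}^{old}_{r,k} + \alpha\,(\vec{V}^{old}_{r,k} - \vec{V}^{old}_{p,best})$ is an assumption based on the original PRO formulation and on the symbols defined above, since the equation itself is not legible in this text. Drawing one $\alpha$ per member, and the names `split_population` and `update_rich`, are also assumptions.

```python
import random


def split_population(members, costs, n_rich):
    """Sort by ascending cost; the first n_rich members are rich, the rest poor."""
    order = sorted(range(len(members)), key=lambda i: costs[i])
    ranked = [members[i] for i in order]
    return ranked[:n_rich], ranked[n_rich:]


def update_rich(rich, poor_best, rng=random):
    """Move each rich member away from the best poor member (assumed form)."""
    new_rich = []
    for member in rich:
        gap = rng.random()  # class-gap parameter alpha, drawn from [0, 1)
        new_rich.append([v + gap * (v - pb) for v, pb in zip(member, poor_best)])
    return new_rich


# Four candidate positions with illustrative cost values (lower is better).
members = [[3.0, 3.0], [1.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
costs = [0.4, 0.1, 0.9, 0.2]
rich, poor = split_population(members, costs, n_rich=2)
poor_best = poor[0]  # best member of the poor subpopulation after sorting
new_rich = update_rich(rich, poor_best, random.Random(0))
```

Because the step is a non-negative multiple of the current offset from the best poor member, every coordinate of a rich member ends up at least as far from that member as before, which is the widening of the class gap described above.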
In every iteration of PRO, the position of each feature of the poor population is changed by an update equation in which $\vec{V}^{new}_{p,k}$ represents the new $k$th position of the poor population, $\vec{V}^{old}_{p,k}$ denotes the current value of the $k$th position of the poor population, $\alpha$ is a random parameter, and the pattern term represents the pattern of improvement, that is, of getting rich. The pattern value is formulated from three members of the rich population: $\vec{V}^{old}_{r,best}$ represents the position of the best member of the rich population, $\vec{V}^{old}_{r,avg}$ represents the average position of the rich population, and $\vec{V}^{old}_{r,worst}$ denotes the position of the worst member of the rich population.
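A possible form of the poor-population move is sketched below. Following the original PRO formulation, the pattern is assumed to be the mean of the best, average, and worst rich positions, and the update is assumed to be $\vec{V}^{new}_{p,k} = \vec{V}^{old}_{p,k} + (\alpha \cdot \text{pattern} - \vec{V}^{old}_{p,k})$. Neither equation is legible in this text, so both are assumptions, as are the function names and the choice of one random draw per member.

```python
import random


def pattern(rich):
    """Mean of the best, average, and worst rich positions (assumed form).

    `rich` is expected to be sorted from best to worst.
    """
    best, worst = rich[0], rich[-1]
    avg = [sum(col) / len(rich) for col in zip(*rich)]
    return [(b + a + w) / 3.0 for b, a, w in zip(best, avg, worst)]


def update_poor(poor, pat, rng=random):
    """Pull each poor member toward a random fraction of the rich pattern."""
    new_poor = []
    for member in poor:
        r = rng.random()  # random parameter in [0, 1)
        new_poor.append([v + (r * p - v) for v, p in zip(member, pat)])
    return new_poor


# Rich members sorted best to worst, and one poor member (illustrative values).
rich = [[3.0, 3.0], [2.0, 2.0], [1.0, 1.0]]
pat = pattern(rich)
new_poor = update_poor([[0.5, 0.5]], pat, random.Random(1))
```

Under this assumed form the old position cancels out algebraically, so each poor member lands on a random fraction of the pattern; this is how the poor "learn" from the rich rather than refining their own position.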
In economics, certain factors can positively or negatively affect the overall economic climate. Examples include sudden fluctuations in the price of gold, oil, or petrochemicals, as well as significant changes in exchange rates, stock interest rates, or banking interest. Such factors can abruptly alter the situation of certain individuals within a society. Because these factors are difficult and sometimes impossible to predict, they are modeled as a form of mutation in the algorithm. In this algorithm, we employ a Gaussian mutation process, in which a small random value drawn from a Gaussian (normal) distribution with zero mean is added to each variable of an individual's solution vector. The Gaussian distribution is symmetric about its mean, with most values close to the mean and progressively fewer values farther away; the scale and shrink parameters determine its standard deviation. At the first generation, the standard deviation is determined by the scale parameter: with the initial population range given as a matrix $V$ whose $i$th row holds the lower and upper bounds of coordinate $i$, the standard deviation for coordinate $i$ of the parent vector is $\text{scale} \times (V(i,2) - V(i,1))$. The shrink parameter determines how the standard deviation is reduced as the generations progress. The standard deviation for coordinate $i$ of the parent vector at the $k$th generation, denoted $\sigma_{i,k}$, is obtained from a recursive formula in which $Sh$ denotes the shrink parameter; shrink and scale are kept at their default values. This mutation is applied when generating the new population after every iteration.
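A minimal sketch of the mutation step is given below. It assumes the recursion documented for Gaussian mutation in MATLAB's genetic algorithm options, $\sigma_{i,k} = \sigma_{i,k-1}\,(1 - Sh \cdot k / G)$, where $G$ is the total number of generations; the function names and the default values of 1.0 for scale and shrink are assumptions, since the section does not state them.

```python
import random


def initial_sigma(bounds, scale=1.0):
    """Per-coordinate standard deviation at the first generation.

    bounds is a list of (lower, upper) pairs, one per coordinate.
    scale=1.0 is an assumed default, not a value taken from the text.
    """
    return [scale * (upper - lower) for lower, upper in bounds]


def shrink_sigma(sigma_prev, k, generations, shrink=1.0):
    """Apply the assumed recursion sigma_k = sigma_{k-1} * (1 - shrink * k / generations)."""
    factor = 1.0 - shrink * k / generations
    return [s * factor for s in sigma_prev]


def gaussian_mutate(vector, sigma, rng):
    """Add zero-mean Gaussian noise to every coordinate of a solution vector."""
    return [v + rng.gauss(0.0, s) for v, s in zip(vector, sigma)]


rng = random.Random(1)
sigma = initial_sigma([(0.0, 1.0), (-2.0, 2.0)])
for generation in range(1, 4):
    sigma = shrink_sigma(sigma, generation, generations=10)
    child = gaussian_mutate([0.5, 0.0], sigma, rng)
```

With a shrink value of 1, the standard deviation reaches zero at the final generation, so the mutation gradually turns from exploration into fine adjustment.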
After each iteration of the PRO algorithm, fitness is calculated by employing KNN. The fitness function returns a cost value; in the KNN cost function, the default values of $\alpha$ and $\beta$ are 0.99 and 0.01, respectively. Four distinct populations exist at this point: the original rich and poor populations and the updated rich and poor populations. The objective function is used to assess each of these four populations, which are then merged into a composite population in ascending order of their values. Prior to the creation of this composite population, the poor and rich subpopulations are separated by a predefined number. The purpose of merging the poor and rich populations at the end of each iteration is to account for the possibility that a member of the poor population has gained enough wealth to replace a member of the rich population, and vice versa. The top-performing member is always the first one in the rich population. Based on this, the original PRO selects feature vectors $S_i$ of dimension $i \in \{N \times 1267, N \times 773, N \times 1220\}$ for the three selected satellite datasets. After that, an improved version is designed based on the entropy calculation after each iteration.
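The fitness evaluation can be sketched as follows. The exact cost equation is not reproduced in this section, so the sketch assumes a common wrapper-style form, $\text{cost} = \alpha \cdot \text{error} + \beta \cdot (\text{selected features} / \text{total features})$, with $\alpha = 0.99$ and $\beta = 0.01$ as stated in the text. The function names, the value of `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_error(train_x, train_y, test_x, test_y, mask, k=3):
    """Classification error of a plain KNN restricted to the features kept by mask."""
    kept = [i for i, keep in enumerate(mask) if keep]
    if not kept:
        return 1.0  # no feature selected: treat as the worst possible error
    wrong = 0
    for x, y in zip(test_x, test_y):
        neighbours = sorted(
            (math.dist([x[i] for i in kept], [t[i] for i in kept]), label)
            for t, label in zip(train_x, train_y)
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] != y:
            wrong += 1
    return wrong / len(test_y)


def cost(error, mask, alpha=0.99, beta=0.01):
    """Assumed cost: weighted sum of the error and the fraction of selected features."""
    return alpha * error + beta * sum(mask) / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_x = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [0.9, -5.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 0.0], [0.95, 0.0]]
test_y = [0, 1]

cost_one = cost(knn_error(train_x, train_y, test_x, test_y, [1, 0], k=1), [1, 0])
cost_both = cost(knn_error(train_x, train_y, test_x, test_y, [1, 1], k=1), [1, 1])
```

Under this assumed form, two masks with the same error are ranked by how few features they keep, which is why the small $\beta$ term still matters.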
Entropy-controlled Selection: Consider (21); the entropy is computed after each iteration to remove the uncertainty among the selected features. Based on the entropy value, (21) is updated, and the values (features) returned by the updated equation are passed to the fitness function, which checks the fitness after each iteration. In addition, the cost is computed after each iteration. In the end, final feature vectors of dimensions N × 1060, N × 642, and N × 1004 are obtained for the three datasets, respectively. The selected features are fed into neural network classifiers for the final classification.
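Because the entropy equation and the update rule are not reproduced in this section, the following is only an illustrative sketch of one way an entropy-controlled filter could operate. It assumes Shannon entropy computed on a histogram of each selected feature and keeps the features whose entropy is at least the mean entropy; the binning, the threshold, and the function names are all assumptions rather than the method of this work.

```python
import math


def shannon_entropy(values, bins=8):
    """Shannon entropy (in bits) of a histogram of one feature column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def entropy_filter(columns, bins=8):
    """Return the indices of the columns whose entropy reaches the mean entropy.

    Assumption: the mean entropy is used as the cut-off.
    """
    scores = [shannon_entropy(column, bins) for column in columns]
    threshold = sum(scores) / len(scores)
    return [i for i, score in enumerate(scores) if score >= threshold]


columns = [
    [1.0, 1.0, 1.0, 1.0],  # constant feature
    [0.0, 1.0, 2.0, 3.0],  # evenly spread feature
]
kept = entropy_filter(columns, bins=4)
```

Applied after each PRO iteration, such a filter would shrink the selected set before it is passed to the fitness function, in line with the reduction from, for example, N × 1267 to N × 1060 reported above.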

III. RESULTS AND ANALYSIS
In this section, detailed experimental results of the proposed framework are described. The experiments are conducted on three datasets, and a complete description of each dataset is given under the Dataset and Contrast Enhancement section. Each dataset is divided in a 50:50 ratio; that is, 50% of the images are used to train the proposed model and the other 50% are used for testing. All the experiments were conducted using 10-fold cross-validation, which is widely favored because it strikes an appropriate balance between the variance of the performance estimate, which pertains to how well the estimate generalizes, and the computational cost. In our case, with N × 2560 features, smaller values of k did not perform well, and beyond k = 10 the performance of the models remained consistent. The static hyperparameters used during training of the proposed model are the number of epochs, the minibatch size, and the optimizer, which are set to 300, 18, and stochastic gradient descent with momentum, respectively. Furthermore, the initial learning rate, section depth, momentum, L2Regularization, dropout, and activation type are defined with their ranges and optimized using BO. Multiple neural network classifiers and KNN are employed for the classification task: narrow neural network, medium neural network, wide neural network, bi-layered neural network, and weighted KNN. The performance evaluation measures are precision, recall, accuracy, error, false negative rate (FNR), F1-score, and time. All the experiments were conducted in MATLAB R2023a on an MSI Leopard series machine with an Intel Core i7 processor, 16 GB of RAM, a 512 GB SSD with a 1 TB HDD, and a 4 GB NVIDIA RTX graphics card.
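The evaluation measures listed above can be derived from a confusion matrix, as in the following minimal sketch. It assumes macro-averaged precision and recall, error defined as 1 − accuracy, FNR defined as 1 − recall, and an F1-score computed from the averaged precision and recall; these definitions appear consistent with the values reported in the following subsections (for example, an accuracy of 95.7% with an error of 4.3%), but they are assumptions. The function name and the example matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Compute accuracy, error, precision, recall, FNR, and F1 from a confusion matrix.

    confusion[r][c] counts the samples of true class r predicted as class c.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls = [], []
    for c in range(n):
        true_positive = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])  # row sum
        precisions.append(true_positive / predicted if predicted else 0.0)
        recalls.append(true_positive / actual if actual else 0.0)
    precision = sum(precisions) / n
    recall = sum(recalls) / n
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "fnr": 1.0 - recall,
        "f1": f1,
    }


example = macro_metrics([[8, 2], [1, 9]])
```

Multiplying each value by 100 gives the percentages used in the result tables.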

A. AID Dataset Results
In this section, the results on the AID dataset are provided. In the first step, deep features are extracted from the proposed fused architecture, which was trained on the enhanced dataset using BO and deep transfer learning. Table II shows the classification results of this model, for which the wide neural network classifier obtained the best accuracy of 95.7%. Its precision, recall, error, FNR, and F1-score are 95.58%, 95.53%, 4.3%, 4.47%, and 95.55%, respectively. The classification time of each classifier is also recorded in this experiment: the medium neural network has the shortest execution time of 32.21 s, whereas the longest execution time is 106.23 s. In the next phase, the best features are selected using PRO, and the selected features are passed to the classifiers, as reported in Table III. The wide neural network classifier achieved the maximum accuracy of 95.6%, with a recall of 95.37%, a precision of 95.45%, an error rate of 4.4%, an FNR of 4.63%, and an F1-score of 95.54%. The processing time of each classifier is also recorded.
The results of the third step, which applies controlled entropy, are shown in Table IV. The wide neural network classifier has an accuracy of 96.3%, higher than in the previous two steps (see Tables II and III). Furthermore, its recall and precision are 96.13% and 96.0%, respectively. The confusion matrix shown in Fig. 5 can be used to verify the performance of the wide neural network. The controlled entropy approach improves the accuracy over the previous two experiments on this dataset, from 95.7% and 95.6% to 96.3%. It is also noticed that the computation time decreases after the entropy phase.

B. UC-Merced Land-Use Results
This subsection describes the results on the UC-Merced Land-use dataset. In the initial phase, deep features are extracted from the proposed fused architecture, which was trained on the enhanced dataset using BO and deep transfer learning. Table V presents the classification results on the UC-Merced Land-use dataset. The wide neural network achieved the highest accuracy among all the classifiers listed in Table V, 96.4%, with precision, recall, error, FNR, and F1-score values of 96.4%, 96.3%, 3.6%, 3.7%, and 96.3%, respectively. Furthermore, the computation time is recorded for all the classifiers: the shortest execution time is 15.96 s, and the longest, 23.09 s, was recorded for the bi-layered neural network classifier.
In the next phase, the best features are selected using PRO, and the optimized features are passed to the neural network classifiers for classification. Table VI presents the improved PRO results on the UC-Merced Land-use dataset. The wide neural network outperformed the rest of the classifiers with an accuracy of 96.5%. Its precision is 96.5%, recall is 96.4%, error rate is 3.5%, FNR is 3.6%, and F1-score is 96.4%.

The same measures are also calculated for the other classifiers. The computation time is noted for all the classifiers: the medium neural network classifier required the least time, 9.37 s, whereas weighted KNN takes the longest time, 17.35 s.
The final step applies the controlled entropy approach to the best features. Table VII shows the controlled entropy results on the UC-Merced Land-use dataset, in which the wide neural network gained the highest accuracy of 95.6%. The precision, recall, error, FNR, and F1-score values are 90.9%, 95.6%, 4.4%, 4.4%, and 93.7%, respectively. The confusion matrix presented in Fig. 6 can be used to verify the performance of the wide neural network classifier. This experiment shows that the computation time is reduced relative to the first and second experiments, described in Tables V and VI.

TABLE II PROPOSED FUSED ARCHITECTURE OF EFFICIENTNETB0 AND MOBILENETV2 MODEL FUSION RESULTS ON THE AID DATASET
Bold entries present the highest values in the table.

C. WHU-RS19 Results
In this experiment, the results on the WHU-RS19 dataset are presented. In the first step, deep features were extracted from the proposed fused architecture of the EfficientNet B0 and MobileNetV2 models, which was trained through BO and deep transfer learning. Table VIII illustrates the classification results of this model. The wide neural network classifier gained the highest accuracy among all the classifiers in this table, 92.8%. Its precision, recall, error, FNR, and F1-score values are 93.2%, 93.5%, 7.2%, 6.5%, and 93.3%, respectively. These statistics are also calculated for all the other classifiers. The medium neural network classifier executes faster than the other listed classifiers, with an execution time of 12.7 s, whereas the longest execution time is 17.5 s. In the next step, the extracted features are optimized by employing improved PRO, and the optimized features are passed to the neural network classifiers. The results of improved poor and rich feature selection are presented in Table IX. This table shows that the wide neural network outperformed all the other neural network classifiers, achieving an accuracy of 93.0%. Its precision is 93.2%, recall is 92.9%, error rate is 7.0%, FNR is 7.1%, and F1-score is 93.0%. The computation time is recorded for all the listed classifiers; the medium neural network takes the least time, 11.57 s.
Table X shows the controlled entropy-based results on the WHU-RS19 dataset in the final step. The maximum accuracy of 97.8% is obtained by the medium neural network classifier, which takes 2.80 s to execute, the shortest time among all the listed classifiers; the maximum execution time of 127.7 s was recorded for the bi-layered neural network classifier. The precision, recall, error, FNR, and F1-score values are 97.7%, 97.8%, 2.2%, 2.2%, and 97.7%, respectively. This numerical analysis is also conducted for all the other neural network classifiers. After applying entropy, the best accuracy improved from 92.8% and 93.0% in the previous two experiments to 97.8%. Moreover, the computation time is reduced relative to the previous experiments, shown in Tables VIII and IX. Fig. 7 presents the confusion matrix of the medium neural network classifier, which further verifies the computed values.

1) T-Test-Based Analysis:
The t-test is a statistical test that helps determine whether the means of two groups or samples differ significantly, so it can be used to compare the performance of two classifiers. In this work, we performed a t-test on the selected datasets. For each dataset, two classifiers were selected based on the highest and second-highest accuracies, as shown in Table XI. Initially, we set up two hypotheses, the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$): $H_0$ supposes that there is no significant difference in the classifiers' performance, whereas $H_1$ assumes that there is a significant difference. In the first step, we calculate the difference between the classifier accuracies for each process using (26), where $N = 3$ denotes the number of processes in the proposed framework. After that, we compute the mean ($\mu$) of the differences ($\partial$) for all the selected datasets using (27), where $C_1$ and $C_2$ represent the wide neural network classifier (highest-accuracy classifier) and the medium neural network classifier (second-highest classifier), respectively, and $N$ denotes the total number of processes in the framework. The values of $\mu$ for the selected datasets are 0.86, 1.26, and 0.6. In the next phase, we calculate the standard deviation $\sigma$, whose values are 0.351, 0.92, and 0.52, respectively. The t-test values are then calculated using the t-selection formula; the values of $T_{sel}$ are 4. … $H_0$ is accepted, which indicates no significant difference between the classifiers' performance.

TABLE X PROPOSED CONTROLLED ENTROPY BASED RESULTS ON WHU-RS19 DATASET
Bold entries present the highest values in the table.
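The steps above can be traced with a short worked sketch. It assumes the standard paired form of the statistic: differences $\partial_i = C_{1,i} - C_{2,i}$, mean $\mu = \frac{1}{N}\sum_i \partial_i$, sample standard deviation $\sigma$ with $N - 1$ in the denominator, and $T_{sel} = \mu / (\sigma / \sqrt{N})$. The 5% two-tailed significance level and the corresponding critical value of 4.303 for $N - 1 = 2$ degrees of freedom are also assumptions, as are the function names; only the $\mu$ and $\sigma$ values are taken from the text.

```python
import math


def diff_stats(acc_first, acc_second):
    """Mean and sample standard deviation of per-process accuracy differences."""
    diffs = [a - b for a, b in zip(acc_first, acc_second)]
    n = len(diffs)
    mean = sum(diffs) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, std


def paired_t(mean_diff, std_diff, n):
    """Assumed statistic: mean difference divided by its standard error."""
    return mean_diff / (std_diff / math.sqrt(n))


# (mu, sigma) pairs reported in the text for the three datasets, with N = 3.
reported = [(0.86, 0.351), (1.26, 0.92), (0.6, 0.52)]
t_values = [paired_t(mu, sigma, 3) for mu, sigma in reported]

# Assumption: two-tailed test at the 5% level with 2 degrees of freedom.
CRITICAL_T = 4.303
null_kept = [abs(t) < CRITICAL_T for t in t_values]
```

Under these assumptions every statistic stays below the critical value, which agrees with the acceptance of $H_0$ stated above.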
2) Graphical Results: Fig. 8 illustrates the error rate on the selected datasets for each method. The graphs show that the WHU-RS19 dataset has the lowest error rate of 2.2% when controlled entropy is applied, compared with 7.2% and 7.0% when the proposed fused architecture and PRO are employed, respectively. For the AID dataset, the smallest error rate, 3.7%, is achieved by employing entropy, whereas the maximum error rate is noted when improved PRO (IRPO) is applied. For the UC-Merced dataset, a 3.6% error rate is noted when improved PRO is utilized, which is the lowest error rate among the methods. Overall, the graph shows that the error is gradually reduced when a controlled entropy process is employed, which is a strength of this experiment.
3) Comparison With SOTA: A comprehensive comparison with existing techniques is presented in Table XII. It can be observed that the proposed method outperforms the rest of the listed advanced methods. The highest accuracy achieved on the AID dataset is 89.58% by Vinaykumar et al. [48], whereas the proposed methodology achieved 96.3%. Similarly, on the UC Merced and WHU-RS19 datasets, the proposed methodology achieved accuracies of 95.6% and 97.8%, higher than those of the methods in [49], [50], [51], and [52].

IV. CONCLUSION
This article proposes a new architecture based on deep learning models, inner information fusion, and optimal feature selection to classify land scene images. The proposed architecture includes contrast enhancement, model creation, hyperparameter optimization, feature selection, and classification. Contrast enhancement is performed initially, and a deep learning model is designed. The purpose of enhancement is to increase the quality of low-contrast images so that the designed model learns better. After that, the hyperparameters are initialized based on BO instead of manual assignment, which is inefficient and sometimes reduces the learning performance. Features are then selected based on the poor-rich controlled entropy technique and classified using machine learning classifiers. Three publicly available datasets have been employed for the experimental process, on which the obtained accuracies are 96.3%, 95.6%, and 97.8%, respectively. Comparison with recent techniques shows an overall improvement in accuracy and less computational time. Overall, we conclude with the following points.
1) Fusion of inner layers based on deep learning models improved accuracy and lessened overall parameters.
2) Initialization of hyperparameters using BO improved the accuracy and learning performance.
3) Selection of best features using the poor-rich controlled entropy technique reduced the computational time and maintained the accuracy.
The limitation of this work is that the training time increased after the internal fusion of the EfficientNet B0 and MobileNetV2 architectures. Moreover, the designed architecture has a large number of pooling activations due to the fusion of both models, which reduces the useful information from the data. These limitations will be considered in future work.
Ameer Hamza is working toward the Ph.D. degree in computer vision with HITEC University, Taxila, Pakistan.
His major interests include object detection and recognition, video surveillance, and medical and agriculture applications using deep learning and machine learning. He has published four impact factor papers to date.
Muhammad Attique Khan (Member IEEE) received the master's and Ph.D. degrees in human activity recognition for application of video surveillance and skin lesion classification using deep learning from COMSATS University Islamabad, Islamabad, Pakistan, in 2018 and 2021, respectively.
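He is currently an Assistant Professor with the Department of Computer Science, HITEC University, Taxila, Pakistan. He has more than 280 publications with more than 10,000 citations, a cumulative impact factor of 850+, an h-index of 61, and an i-Index of 165. His primary research interests include medical imaging, COVID-19, MRI analysis, video surveillance, human gait recognition, and agriculture plants using deep learning.
Hussain Mobarak Albarakati is with the Department of Computer Engineering, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia.
He is a Senior Professor at the university, where he teaches courses related to AI and embedded systems. In addition, he is a senior AI researcher in remote sensing and medical applications. He has published more than 50 research articles and is also a reviewer for several reputed journals.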
Roobaea Alroobaea received the bachelor's degree (Hons.) in computer science from the King Abdul-Aziz University, Saudi Arabia, in 2008, and the master's degree in information systems and the Ph.D. degree in computer science from the University of East Anglia, Norwich, U.K., in 2012 and 2016, respectively.
He is currently an Associate Professor with the College of Computers and Information Technology, Taif University, Ta'if, Saudi Arabia.His research interests include human-computer interaction, software engineering, cloud computing, the Internet of Things, artificial intelligence, and machine learning.
Abdullah M. Baqasah is with the College of Computers and Information Technology, Taif University, Ta'if, Saudi Arabia. He has published more than 40 papers in several reputed journals related to knowledge management and AI. He has also been a senior contributor to AI at the same university for the last two years.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

Fig. 1. Proposed methodology for the classification of land scenes using satellite images.

Fig. 3. Some samples of contrast-enhanced images from the satellite datasets.

Fig. 4. Proposed fused model using depth concatenation for classification using satellite images.
of follows a normal distribution with mean $\mu(d)$ and variance $\sigma^2(d)$. Consequently, the random variable $I$ is also normally distributed, with mean $\mu(d) - f(d^{+})$ and variance $\sigma^2(d)$. The probability density function of $I$ is

Fig. 5. Confusion matrix of the wide neural network for the controlled entropy process on the AID dataset.

Fig. 6. Confusion matrix of the controlled entropy technique on the medium neural network classifier using the UC-Merced land-use dataset.

TABLE I SELECTED HYPERPARAMETERS AND THEIR RANGES FOR OPTIMIZATION USING BAYESIAN OPTIMIZATION
fine-tune the proposed model hyperparameters. The considered hyperparameters for tuning are listed in Table I.

TABLE III PROPOSED IMPROVED POOR AND RICH OPTIMIZATION RESULTS ON AID DATASET
Bold entries present the highest values in the table.

TABLE IV PROPOSED CONTROLLED ENTROPY RESULTS ON AID DATASET
Bold entries present the highest values in the table.

TABLE V PROPOSED FUSED ARCHITECTURE MODEL RESULTS ON UC-MERCED LAND-USE DATASET
Bold entries present the highest values in the table.

TABLE VI PROPOSED IMPROVED POOR AND RICH OPTIMIZATION RESULTS ON UC-MERCED LAND-USE DATASET
Bold entries present the highest values in the table.

TABLE VII PROPOSED CONTROLLED ENTROPY BASED RESULTS ON UC-MERCED LAND-USE DATASET
Bold entries present the highest values in the table.

TABLE VIII PROPOSED FUSED ARCHITECTURE MODEL RESULTS ON WHU-RS19 DATASET
Bold entries present the highest values in the table.
Table VIII illustrates the classification results of this model. The wide neural network classifier gained the highest accuracy among all the classifiers in this table.

TABLE IX PROPOSED IMPROVED POOR AND RICH OPTIMIZATION RESULTS ON WHU-RS19 DATASET
Bold entries present the highest values in the table.

TABLE XI COMPREHENSIVE COMPARISON WITH EXISTING TECHNIQUES

Fig. 8. Error rate graph measured on satellite datasets.

TABLE XII SELECTED CLASSIFIERS FROM ALL THE DATASETS FOR T-TEST ANALYSIS

controlled entropy process is employed, which is a strength of this experiment.
3) Comparison With SOTA: A comprehensive comparison with existing techniques has been presented in Table XII. It can be observed that the proposed method outclasses the rest of the listed advanced methods. The highest accuracy achieved on the AID dataset is 89.58% by Vinaykumar et al.
Dr. Khan's primary research interests also include COVID-19, MRI analysis, video surveillance, human gait recognition, and agriculture plants using deep learning. Dr. Khan is a reviewer for several reputed journals, such as IEEE TRANSACTION ON INDUSTRIAL INFORMATICS, IEEE TRANSACTION OF NEURAL NETWORKS, Pattern Recognition Letters, Multimedia Tools and Application, Computers and Electronics in Agriculture, IET Image Processing, Biomedical Signal processing Control, IET Computer Vision, Eurasipe Journal of Image and Video Processing, IEEE ACCESS, MDPI Sensors, MDPI Electronics, MDPI Applied Sciences, MDPI Diagnostics, and MDPI Cancers.
Shams ur Rehman received the master's degree in computer science in 2023 from HITEC University, Taxila, Pakistan, where he is currently a research associate. He has published one paper in MDPI Diagnostics and has currently submitted several papers. His research interests include medical imaging, remote sensing, and action recognition.
Hussain Mobarak Albarakati is with the Department of Computer Engineering, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia.