Variable Length Black Hole for Optimization and Feature Selection

In high-dimensional spaces, feature selection (FS) can be regarded as a combinatorial optimization problem of high complexity because of the huge number of candidate features. In this article, a novel meta-heuristic search based on a variable-length solution space is proposed to address the high dimensionality of FS and to obtain better results. The proposed algorithm uses the original black hole optimization as its baseline. Black hole optimization assumes a fixed-length solution space, which decreases its efficiency when the number of features is high. Furthermore, the algorithm is prone to stagnation because it relies on a single exemplar, the black hole. Hence, our novel variable length black hole algorithm modifies the original black hole algorithm in several respects. It decomposes the solution space into subsets of dimensions and searches within each dimension separately, selecting an exemplar for each dimension that acts as the black hole of that dimension. In addition, it changes the length of the solutions based on a stagnation criterion. Furthermore, it introduces a new concept of black hole energy, which models the exponential decrease in the effectiveness of a black hole over time and is used to replace the black hole when it is no longer effective. The proposed algorithm, designated variable length black hole optimization (VLBHO), is compared with variable length particle swarm optimization. The approach increased the accuracy from 50% to 67% for the forest cover dataset and from 38% to 80% for the wine dataset.


I. INTRODUCTION
In the era of artificial intelligence and machine learning, data plays an important role in generating knowledge among learners. Typically, data is not used directly but is pre-processed before being presented for training or testing [1]. One of the essential stages conducted with the data before using it for learning is feature selection [2]. This means choosing a subset of the data to reduce the length of the record used for training, as well as to provide more discriminative and less redundant information [3]. Feature selection can be carried out in different ways: offline or online, fixed or variable length, low or high dimension, and supervised or unsupervised [4]. The majority of the approaches used for feature selection adopt certain criteria to indicate the importance of the attributes, depending on whether the class information exists or not. Some of these are entropy, information gain, symmetric uncertainty and the Bayesian error rate [5]. In addition, many approaches use various searching algorithms to identify the most important features to be selected and the less important or redundant features that are to be de-selected. The searching is called combinatorial searching because of the large number of candidate choices and their combinatorial nature.
The associate editor coordinating the review of this manuscript and approving it for publication was Cong Pu.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Most feature selection approaches can be classified into two main categories: filter and wrapper [6]. Filter methods evaluate features based on the rankings of their importance to categorise classes using statistical methods (e.g., Student's t-test and the Chi-square test), information theory-based methods (e.g., entropy, Kullback-Leibler divergence and the information gain measure), or other search techniques (e.g., the correlation-based feature selection algorithm) [7]. However, filter methods may remove features that are particularly relevant to some classes with fewer instances [8]. Unlike filter methods, wrapper methods use the classifier as an evaluation function for feature subsets, which requires classification accuracy as a form of feedback to evaluate the classifier. Thus, wrapper methods are usually modelled as optimisation problems from which to select the optimal feature subset. Because of this, wrapper methods require more computational time than filter methods, as the latter only use a separate metric to evaluate and select features. However, wrapper methods usually have better accuracy than filter methods. Examples include a large margin hybrid algorithm for feature selection [9] and hybrid meta-heuristic approaches.
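To make the wrapper idea concrete, the following is a minimal sketch of a wrapper-style cost: a candidate feature subset is scored by the error of a classifier trained only on those features. The 1-nearest-neighbour classifier, the function name wrapper_error, the toy data and the choice to give an empty subset the worst cost are all illustrative assumptions and are not taken from the article.

```python
import numpy as np

def wrapper_error(mask, X_train, y_train, X_test, y_test):
    """Error of a 1-nearest-neighbour classifier restricted to the
    features switched on in `mask` (hypothetical wrapper cost)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # assumption: an empty subset receives the worst cost
    A, B = X_train[:, mask], X_test[:, mask]
    # Squared Euclidean distance between every test and training sample.
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d2.argmin(axis=1)]
    return float(np.mean(pred != y_test))

# Toy data: feature 0 separates the classes, feature 1 is misleading.
X_train = np.array([[0.0, 0.0], [1.0, 3.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 3.0], [0.9, 0.0]])
y_test = np.array([0, 1])

err_informative = wrapper_error([1, 0], X_train, y_train, X_test, y_test)
err_all = wrapper_error([1, 1], X_train, y_train, X_test, y_test)
print(err_informative, err_all)
```

In this toy case the subset containing only the informative feature has a lower cost than the full feature set, which is the kind of feedback a wrapper search relies on.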
Meta-heuristic searching is a family of random search algorithms that use various heuristics while searching for a way to optimise a certain type of objective function [10]. The elements of meta-heuristic searching are the solution space, which carries the solution representation; the objective function, which provides the way of evaluating a candidate solution; and the approach of changing the set of solutions while searching. Two main classes of meta-heuristic searching algorithm exist: swarm, where a set of moving solutions generates new positions from existing ones; and evolutionary, where a pool of solutions mates to generate offspring from existing ones. An example of a swarm searching algorithm is particle swarm optimisation, while an example of an evolutionary meta-heuristic algorithm is the genetic algorithm.
Swarm algorithms, as a subset of meta-heuristic algorithms, are optimisation algorithms in which solutions are represented by swarms or sets that interact with each other and evolve based on the interaction until they reach maturity and convergence. The length of the solution space can be determined in advance, in which case it is called a fixed-length solution space. Alternatively, it can change according to the solution, in which case it is called variable length (VL). An example of a variable length meta-heuristic algorithm of the swarm type is given in the work of [11]. In that study, particle swarm optimisation searches over solutions of different lengths in order to solve the problem of feature selection in high dimensional space.
One recently developed swarm-based meta-heuristic searching approach is Black Hole Optimisation (BHO) [12], which imitates in its representation the state of the stars, their interaction with each other or with a black hole and their mobility in the universe. This algorithm has recently been used for the purpose of clustering; however, it has not been developed for feature selection. The aim of this article is to propose a novel development of the BHO searching algorithm for feature selection on supervised data, using a Variable Length (VL) representation of the selected features while searching. Contrary to the work of [11], the VLBHO developed in this article solves the issue of stagnation that is caused when solutions do not improve based on certain exemplars.
The use of feature selection for improving classification accuracy is a recent topic in the literature. Researchers have indicated the importance of feature selection in improving model performance [13]. In [11], a novel variable length optimisation algorithm based on particle swarm optimisation was proposed. The algorithm is distinguished by its capability to provide different lengths of particles. It starts by sorting the features in descending order of importance and evaluating them based on an objective function that combines both inter- and intra-class distances. Additionally, the algorithm contains a solution changing mechanism to enable efficient and effective solution finding. In [14], an integrated particle swarm optimisation with k-means was proposed for feature selection. In [15], a wrapper-based feature selection based on a heuristic was proposed. The heuristic uses Ant Colony Optimisation (ACO) and Particle Swarm Optimisation (PSO) to integrate both face and fingerprint features. In [16], two prevailing sets of features were combined for use in iris recognition: first-order and second-order statistical measures as textural feature descriptors. A hybrid statistical dependency-based feature selection algorithm is also applied to the extracted feature descriptors to remove noisy and redundant features, thus reducing the size of the feature vector. A back propagation neural network using the Levenberg-Marquardt training algorithm is used for the recognition task. In [17], a whale optimisation algorithm was used to build two hybridisation models for feature selection. Simulated Annealing was embedded in the whale optimisation algorithm to enhance the exploration. In [18], an improved meta-heuristic selection was proposed based on the Chaotic dragonfly algorithm. Their framework consisted of multiple stages, starting from pre-processing, which used an over-sampling technique to handle an imbalanced dataset. Next, a feature selection phase was used for selecting discriminant features using the Chaotic dragonfly algorithm. For classification, they used a support vector machine. In [19], a new feature selection algorithm was presented for two stages of a forecasting engine, consisting of both a recursive neural network and the Elman neural network. In [20], a feature selection method was proposed for iris spoofing detection using top-k feature selection based on the Friedman test and fused through score-level fusion. In addition, the work includes score-level fusion of handcrafted and data-driven features. Hence, the feature selection approach worked on both of them. In [21], a feature selection method was proposed based on hybrid Particle Swarm Optimisation (PSO) and a genetic algorithm. It is based on the Gabor filter and the Local Tetra Pattern. The feature selection approach is based on a multi-model biometric framework that includes the face, iris and fingerprint. In [22], feature selection was proposed based on a genetic algorithm with cross-generational elitist selection, heterogeneous recombination and cataclysmic mutation. In [11], the first variable-length PSO representation for feature selection was proposed, enabling particles to have different and shorter lengths, which defines a smaller search space and, therefore, improves the performance of PSO. By rearranging features in descending order of their relevance, the authors make it easier for particles with shorter lengths to achieve better classification performance. Furthermore, using the proposed length changing mechanism, PSO can jump out of local optima, further narrowing the search space and focusing its search on smaller and more fruitful areas. However, this approach lacks stagnation handling; stagnation happens when the exemplar does not improve its associated solutions for a long period of time. In other words, the algorithm might fall into premature convergence.
There are two issues with the existing BHO [12]: fixed-length searching, which causes inefficiency when the length of the solution space increases, and reliance on a single black hole, which might cause stagnation or trapping in local minima. In this article, we propose a novel swarm-based meta-heuristic searching algorithm developed under the framework of black hole optimisation and incorporating two novel aspects, namely the variable length of VLPSO and energy-based stagnation handling. The role of the first aspect is to enable efficient searching in high dimensional data for feature selection, and the role of the latter is to avoid premature convergence.
The remainder of the article is organised as follows. In Section II, the proposed method is presented. The results and discussion are provided in Section III. Lastly, the conclusion is given in Section IV.

II. THE PROPOSED METHODS
This section presents the developed methodology for VLBHO optimisation. Sub-section B gives an overview of traditional black hole optimisation. Sub-section C presents the general algorithm of variable length black hole optimisation. The transfer functions are described in sub-section D, and the evaluation metrics and datasets are given in sub-sections E and F.
Here, $f_j$ denotes an objective function considered in the feature selection, $j = 1, 2, \ldots, o$ is the index of the objective function, and $o$ denotes the number of objective functions. Many researchers consider only one objective function, such as the inverse of accuracy (because the problem is posed as a minimisation) [23]. However, some researchers have formulated the issue of feature selection as a bi-objective problem with two objectives: the number of neurons and the classification error; one such study is the work of [24]. In this article, we are only concerned with the classification error rate or accuracy as an objective for optimisation.

B. AN OVERVIEW OF BLACK HOLE OPTIMIZATION
This sub-section presents the original BHO before adding our subsequent modification. The algorithm uses two equations: the first one is Equation (2), which is used to move stars or solutions toward the black hole. The second one is Equation (3), which is used to calculate the event horizon of the black hole [12].
where $x_i$ denotes the position of the solution, $x_{BH}$ denotes the black hole position, $f_{BH}$ denotes the fitness value of the black hole, $f_i$ denotes the fitness value of solution $i$, and $t$ denotes the iteration. The difference between the mobility equations provided in Equations (2) and (3) and that of particle swarm optimisation is the use of the concept of the event horizon, in which solutions close enough to the black hole are removed. The radius is computed based on the fitness value of the black hole and the fitness value of the solution.
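The two steps described above can be sketched as follows. This is a minimal illustration that assumes the commonly cited form of BHO, in which each star moves a random fraction of the way towards the black hole and the event-horizon radius is the fitness of the black hole divided by the summed fitness of all stars. The names bho_step, low and high, the sphere test function and the uniform re-initialisation of swallowed stars are assumptions for illustration, not details taken from the article.

```python
import numpy as np

def bho_step(stars, fitness, rng, low, high):
    """One BHO iteration for a minimisation problem (illustrative sketch)."""
    cost = np.array([fitness(s) for s in stars])
    bh = stars[cost.argmin()].copy()            # best star acts as the black hole
    # Move every star a random fraction of the way towards the black hole.
    stars = stars + rng.random((len(stars), 1)) * (bh - stars)
    cost = np.array([fitness(s) for s in stars])
    best = int(cost.argmin())                   # a moved star may now be better
    bh, f_bh = stars[best].copy(), float(cost[best])
    # Event-horizon radius: black-hole fitness over the summed star fitness.
    total = cost.sum()
    radius = f_bh / total if total > 0 else 0.0
    # Stars inside the horizon are swallowed and replaced by random new stars.
    swallowed = np.linalg.norm(stars - bh, axis=1) < radius
    swallowed[best] = False                     # the black hole itself is kept
    stars[swallowed] = rng.uniform(low, high, size=(int(swallowed.sum()), stars.shape[1]))
    return stars, bh, f_bh

def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
stars = rng.uniform(-5.0, 5.0, size=(20, 3))
history = []
for _ in range(30):
    stars, bh, f_bh = bho_step(stars, sphere, rng, -5.0, 5.0)
    history.append(f_bh)
print(history[0], history[-1])
```

Because the black hole never moves away from itself and is never swallowed, the best cost recorded in history cannot increase from one iteration to the next in this sketch.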

C. GENERAL ALGORITHM
Variable length black hole optimisation (VLBHO) is presented in Algorithm 1. The inputs of the algorithm are: Max_iteration: the number of iterations that will be executed by the algorithm; numOfStars: the number of stars that will be initialised in the population; and rangeOfDimension: the range of dimension related to the searching in the solution space. The output of the algorithm is bho, which represents a black hole object that includes the global best gBest and other information.
The algorithm starts by initialising the black hole object BHO using Max_iteration, numOfStars and rangeOfDimension. It returns an original black hole object bho. In this initialisation process, the initial population is created. Next, the algorithm iterates until Max_iteration and loops over the stars one by one to do the following:
1-it updates the position of the star using the function updatePosition()
2-it updates the fitness of the star using updateFitness()
3-it updates the global best using best()
4-it updates the energy using UpdateEnergy
Next, the algorithm performs an inner loop over each dimension and over each exemplar in the corresponding dimension in order to check the energy of the exemplar, disabling its role as exemplar if the energy is lower than bho.Emin. In order to disable its role as an exemplar, the algorithm finds its follower stars and assigns each of them a new exemplar according to the dimension of the star. In addition, the algorithm is responsible for replacing the black hole when stagnation happens, that is, when no improvement is made over a certain period of time.
Observing the procedure of the algorithm, it can be noted that it introduces two concepts: the first is a black hole, which represents the global best, and the second is an exemplar, which represents a solution with the same dimension as its follower. The exemplar loses its role when its energy is below a certain level, while the black hole loses its role when stagnation happens. The exemplar assignment is presented in Algorithm 2 as the ExemplarAssignment() function. It takes an existing bho object, a current star, an iteration and a logical variable, perspective, as the inputs and it provides an updated version of the bho object with its updated exemplar as the output.
The algorithm works as follows: it scans all the stars and decides whether they lie within the radius of the position circle or the fitness circle, according to the perspective. Next, it adds all the stars within the radius as candidate exemplars. The exemplar is selected probabilistically based on a roulette wheel model.
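The exemplar assignment described above can be sketched as follows. The sketch assumes that stars of the same dimension are stored as rows of an array, that costs are non-negative, and that the roulette wheel gives lower-cost candidates a larger share; the article does not state the weighting, so that choice, together with the names assign_exemplar, POSITION and FITNESS, is hypothetical.

```python
import numpy as np

POSITION, FITNESS = "position", "fitness"

def assign_exemplar(curr, stars, costs, radius, perspective, rng):
    """Pick an exemplar index for star `curr` among stars of the same length.

    Returns None when no other star lies inside the radius.
    """
    stars = np.asarray(stars, dtype=float)
    costs = np.asarray(costs, dtype=float)
    if perspective == POSITION:
        gap = np.linalg.norm(stars - stars[curr], axis=1)   # position circle
    else:
        gap = np.abs(costs - costs[curr])                   # fitness circle
    candidates = [i for i in range(len(stars)) if i != curr and gap[i] <= radius]
    if not candidates:
        return None
    # Assumed roulette-wheel weighting: lower cost gets a larger slice.
    weights = 1.0 / (1e-12 + costs[candidates])
    probs = weights / weights.sum()
    return int(rng.choice(candidates, p=probs))

rng = np.random.default_rng(1)
stars = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
costs = [1.0, 0.5, 0.1, 2.0]
by_position = assign_exemplar(0, stars, costs, 0.15, POSITION, rng)
by_fitness = assign_exemplar(0, stars, costs, 0.6, FITNESS, rng)
print(by_position, by_fitness)
```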
Algorithm 2 (ExemplarAssignment), final lines:
... AND (perspective == FITNESS)
11-add star to candidate exemplars
12-end
13-end
14-Exemplar for curr_star in this dimension ← Roulette_Wheel_Selection (bho, candidate exemplars)
15-End
The algorithm for updating the energy of the star is presented in Algorithm 3. The algorithm takes the star and the iteration as the input along with the bho object. Next, it iterates over the stars in the same dimension and adds the ones that use the subject star as an exemplar. Next, the energy is updated according to the improvement percentage of the exemplar over its associated stars, as presented in line numbers 4 and 6, multiplied by an inversely exponential factor based on the iteration. The energy update is given according to Equation (4).
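The energy update described above can be illustrated with a small sketch. It assumes that the energy is the fraction of follower stars that improved, multiplied by exp(-decay * iteration). The decay constant, the zero energy for an exemplar without followers and the names update_energy and is_retired are assumptions for illustration and are not the article's Equation (4).

```python
import math

def update_energy(followers_improved, iteration, decay=0.05):
    """Energy of an exemplar from the improvement of its followers.

    followers_improved: one boolean per star that follows this exemplar.
    decay: hypothetical rate of the exponential loss of energy over time.
    """
    if not followers_improved:
        return 0.0  # assumption: an exemplar without followers has no energy
    improved_ratio = sum(followers_improved) / len(followers_improved)
    return improved_ratio * math.exp(-decay * iteration)

def is_retired(energy, e_min):
    """An exemplar loses its role when its energy drops below e_min."""
    return energy < e_min

early = update_energy([True, True, False, True], iteration=1)
late = update_energy([True, True, False, True], iteration=60)
print(early, late, is_retired(late, e_min=0.1))
```

With the same improvement ratio, the energy at a late iteration is lower than at an early one, so an exemplar has to keep improving its followers to stay above the threshold.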
It is important to track the improvement of any star in order to change its energy and thus to decide whether to keep it as an exemplar or not and whether to kill it or not. The algorithm for updating the improvement of the star is given in Algorithm 4. It accepts the black hole object BHO, the star and the iteration number, and its output is an updated BHO object. The algorithm compares the current cost of the star with its personal best cost and sets the improvement flag if the former is lower. Otherwise, the improvement flag is set to false. The improvement flag is used to count how many times the star has not been improved. When the star has not been improved a certain number of times, it is killed and a new star is generated. The flowchart of the developed variable length black hole optimisation is presented in Figure 1.
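The improvement tracking described above can be sketched as follows. The Star fields, the max_stale threshold and the choice to reset the counter whenever the star improves are assumptions for illustration; the article does not state whether the count is consecutive.

```python
from dataclasses import dataclass

@dataclass
class Star:
    cost: float
    best_cost: float = float("inf")
    improved: bool = False
    stale_count: int = 0   # how many updates passed without improvement

def update_improvement(star, max_stale):
    """Update the improvement flag; return True when the star should be
    killed and replaced by a newly generated star."""
    if star.cost < star.best_cost:
        star.best_cost = star.cost
        star.improved = True
        star.stale_count = 0          # assumption: the count is consecutive
    else:
        star.improved = False
        star.stale_count += 1
    return star.stale_count >= max_stale

star = Star(cost=5.0)
first = update_improvement(star, max_stale=2)    # improves on the initial best
star.cost = 6.0
second = update_improvement(star, max_stale=2)   # no improvement, count = 1
third = update_improvement(star, max_stale=2)    # no improvement, count = 2
print(first, second, third)
```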

D. TRANSFER FUNCTIONS
When dealing with binary space, such as with the problem of feature selection, Equation (5) is modified to Equation (6).
where T denotes the transfer function, which might be an S-shaped or a V-shaped transfer function. The two types of transfer functions, S and V, are given in Equations (7) and (8) and presented in Figure 2.
where a and b are constants.
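As an illustration of how transfer functions map a continuous position to a binary feature mask, the sketch below uses the widely used sigmoid form for the S-shaped function and the absolute hyperbolic tangent for the V-shaped function. Treating a and b as slope constants, and the two binarisation rules, are assumptions for illustration; they are not necessarily the article's Equations (6) to (8).

```python
import numpy as np

def s_shaped(x, a=1.0):
    """S-shaped transfer: probability that a bit is set to 1."""
    return 1.0 / (1.0 + np.exp(-a * x))

def v_shaped(x, b=1.0):
    """V-shaped transfer: probability that a bit is flipped."""
    return np.abs(np.tanh(b * x))

def binarise_s(x, rng, a=1.0):
    # A feature is selected when a uniform draw falls below S(x).
    return (rng.random(x.shape) < s_shaped(x, a)).astype(int)

def binarise_v(x, bits, rng, b=1.0):
    # A feature keeps its current state unless a uniform draw falls below V(x).
    flip = rng.random(x.shape) < v_shaped(x, b)
    return np.where(flip, 1 - bits, bits)

rng = np.random.default_rng(0)
position = np.array([-2.0, 0.0, 0.5, 3.0])
current_bits = np.array([1, 0, 1, 0])
print(binarise_s(position, rng))
print(binarise_v(position, current_bits, rng))
```

The S-shaped rule decides the value of each bit directly, whereas the V-shaped rule decides whether to flip the current bit, so a position near zero leaves the mask unchanged.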

E. EVALUATION METRICS
After the evaluation, there were four types of records, namely true positive (TP), false positive (FP), true negative (TN) and false negative (FN). Afterwards, the metrics were calculated as shown in Equations (9) to (14).
1-Precision, or positive predictive value, is given in Equation (9). It is the proportion of predicted positives that are truly positive, where FDR denotes the false discovery rate.
2-Sensitivity, or recall (true positive rate), is the proportion of actual positives that are correctly classified.
3-F-measure is the harmonic mean of precision and recall.
4-G-mean is the geometric mean of precision and recall.
5-Accuracy is defined as the proportion of true results among the total number of cases examined.
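The metrics listed above can be computed from the four counts as in the following sketch. It uses the standard definitions for a single positive class and follows the text in taking the G-mean over precision and recall. The function name and the example counts are illustrative, and non-zero denominators are assumed.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion counts of one positive class.
    Assumes every denominator is non-zero."""
    precision = tp / (tp + fp)                 # equals 1 - FDR
    recall = tp / (tp + fn)                    # true positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(precision * recall)     # as defined in the text
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
        "accuracy": accuracy,
    }

metrics = classification_metrics(tp=9, fp=3, tn=6, fn=1)
print(metrics)
```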
Another way of calculating the accuracy is

$$\mathrm{acc} = \frac{\text{number of true predictions}}{\text{total number of predictions}} \tag{14}$$

F. DATASETS
Two datasets were used, as presented below:
1-Forest Cartographic: this dataset provides various cartographic variables used to classify forest categories that range from 1 to 7. It was downloaded from Kaggle and comprises 15,120 samples with 54 features [25].
2-Wine Dataset: these data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines [26].

III. RESULTS AND DISCUSSION
For evaluation, we ran our developed VLBHO with its four variants, namely S-shaped position, V-shaped position, S-shaped fitness and V-shaped fitness. Firstly, we tested their performance on benchmarking mathematical functions to investigate their optimisation performance. Secondly, we tested their performance on the different datasets.

A. EVALUATION ON MATHEMATICAL FUNCTIONS
The first stage of evaluation compared the performance of the two variants of VLBHO with particle swarm optimisation (PSO). The parameters used to operate VLBHO are given in Table 1. In addition, the benchmarking mathematical functions are provided in Table 2. These functions were selected because they are considered challenging benchmarking optimisation functions with non-convexity, non-linearity and multi-modal optimisation surfaces.
Observing Figure 3, we found that VLBHO provided the lowest fitness value for Ackley, Rastrigin, Sphere and Shubert. A lower fitness value indicates better performance, as the functions need to be minimised. For Michalewicz, the fitness value is similar to that of PSO.
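For reference, three of the benchmark functions named above can be written as follows. These are the standard textbook definitions, each with a global minimum of 0 at the origin; the exact forms, dimensions and bounds used in Table 2 are not reproduced in this text, so the sketch should be read as an assumption about them.

```python
import numpy as np

def sphere(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def ackley(x):
    x = np.asarray(x, dtype=float)
    term1 = -20 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(2 * np.pi * x)))
    return float(term1 + term2 + 20 + np.e)

origin = np.zeros(5)
print(sphere(origin), rastrigin(origin), ackley(origin))
```

Rastrigin and Ackley are multi-modal, with many local minima around the global one, which is why they are commonly used to test whether a search method avoids premature convergence.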

B. EVALUATION ON BENCHMARKING DATASET
The evaluation is composed of two parts: the first is the evaluation based on the optimisation error, presented in 1). The second is the evaluation based on the testing accuracy on the benchmarking datasets, which is presented in 2).

1) OPTIMIZATION ERROR
Each of the algorithms was run ten times and the performance is reported in Figures 4 and 5 for the forest cover dataset, in which the minimised cost of the feature selection optimisation follows different statistical behaviour according to the algorithm. However, each of the VL-BHO variants achieved lower cost values and a narrower distribution than the benchmark VL-PSO. Among the variants, the S-shaped position variant of VL-BHO provided the lowest cost. In addition, the algorithms were run on the wine dataset for feature selection and generated the fitness values depicted in Figure 4 for our developed VL-BHO with its four variants and for the benchmark VL-PSO. Again, the algorithm achieved lower cost values than the benchmark. The best performance was observed for the S-shaped function-based VL-BHO. This can be interpreted from the dimension decomposition perspective of the algorithm, which deals with solutions or stars according to their dimension, where each dimension has its own exemplar. In addition, the superiority can be attributed to the effectiveness of the concepts of energy and the number of improvements, which assist in deciding when to kill a certain solution or replace it with another.
FIGURE 6. Accuracy comparison between our developed VLBHO variants, the benchmark VLPSO and using all the features for (a) the forest cover dataset and (b) the wine dataset.

2) TESTING ACCURACY
Our developed VLBHO with its four variants, FV, FS, PV and PS, was evaluated and compared with the benchmark VLPSO [11] and with the inclusion of all features. Figure 6 shows that the highest testing accuracy, 67%, was achieved by VLBHO-PV, which was close to the accuracy of classification with all the features. The lowest performance was observed with the benchmark and VLBHO-FS, with an accuracy of around 50%.
The testing accuracy for the wine dataset is presented in Figure 7, where the best accuracy of nearly 80% resulted from VLBHO-FV and VLBHO-PS. The lowest performance resulted from VLPSO, with an accuracy lower than 40%. For further elaboration of the performance, the other classification metrics for the two datasets are shown in Figure 7. We present the numerical results of the algorithm in Tables 4 and 5 for the Forest and Wine datasets, respectively.
For further elaboration, the confusion matrices are presented in Figure 8 for the forest cover dataset and in Figure 9 for the wine dataset. The numbers show that each algorithm generated a different number of misclassifications. However, VLBHO-PV achieved the best performance for the forest cover dataset, as its diagonal values are higher than those of the other algorithms.
The results of applying our developed VLBHO on the wine dataset are presented in Figure 9. The confusion matrices indicate that the best prediction results were obtained by VLBHO-PS and VLBHO-FV, with diagonal values of 35, 58 and 38, which is consistent with the accuracy results presented in Figure 6.
In summary, the best accuracy achieved by VLBHO and its comparison with the benchmark VLPSO are presented in Table 3. The results show that VLBHO reached an accuracy of 67%, compared with an accuracy of only 50% for VLPSO, for the forest cover dataset. In addition, the accuracy reached for the wine dataset was 80% for VLBHO, compared with only 38% for VLPSO.

IV. CONCLUSION
This article has presented a variable length meta-heuristic algorithm for feature selection. The algorithm uses the metaphor of the original black hole optimisation as the baseline for development. It was modified in several respects: it decomposes the solution space into subsets of dimensions and searches within each dimension separately by selecting an exemplar for each dimension, which represents the black hole of the corresponding dimension. In addition, it enables length changing of the solutions based on the stagnation criterion. Furthermore, it proposes a new concept of energy for the black hole, which is used to replace the black hole when it is no longer effective. The proposed algorithm, designated variable length black hole optimisation (VLBHO), was compared with the benchmark of variable length particle swarm optimisation. The evaluation showed its superiority in terms of testing accuracy and other classification metrics. The accuracy of VLBHO reached 67% for the forest cover dataset and 80% for the wine dataset, compared with VLPSO, the accuracy of which was only 50% and 38%, respectively. Future work should apply VLBHO to optimisation problems in object identification and biometric authentication. In these problems, feature selection and optimisation are carried out for high dimensional data, which requires variable length searching.