A Comparative Performance Study of Hybrid Firefly Algorithms for Automatic Data Clustering

In cluster analysis, the goal has always been to devise the best possible means of automatically determining the number of clusters. However, because of the lack of prior domain knowledge and the uncertainty associated with the characteristics of data objects, it is challenging to choose an appropriate number of clusters, especially when dealing with high-dimensional data objects of varying size and density. In the last few decades, researchers have proposed and developed several nature-inspired metaheuristic algorithms to solve data clustering problems. Many studies have shown that the firefly algorithm is a robust, efficient, and effective nature-inspired swarm intelligence global search technique, which has been successfully applied to diverse NP-hard optimization problems. However, the diversification search process employed by the firefly algorithm can reduce its speed and convergence rate on large-scale optimization problems. Thus, this study investigates the application of four hybrid firefly algorithms to the automatic clustering of high-density, large-scale unlabelled datasets. In contrast to most existing classical heuristic-based data clustering techniques, the proposed hybrid algorithms do not require any prior knowledge of the data objects to be classified. Instead, the hybrid methods determine the optimal number of clusters automatically and empirically during program execution. Two well-known clustering validity indices, namely the Compact-Separated and Davies-Bouldin indices, are employed to evaluate the performance of the implemented firefly hybrid algorithms. Furthermore, twelve standard ground-truth clustering datasets from the UCI Machine Learning Repository are used to evaluate the robustness and effectiveness of the algorithms against those of classical swarm optimization algorithms and other related clustering results from the literature.
The experimental results show that the new clustering methods outperform existing standalone and other hybrid metaheuristic techniques in terms of clustering validity measures.

Clustering is regarded as an unsupervised classification of data, and the results of the analysis depend greatly on the quality and effectiveness of the clustering algorithms or methods employed.
In the past few decades, several heuristic-based algorithms have been proposed to solve clustering problems. These algorithms fall into the two main classes of clustering methods, namely hierarchical and partitioning clustering algorithms [10], [11]. Hierarchical clustering algorithms generate a tree-like structure that represents a nested grouping of data points; the most popular of these are the single-link and complete-link algorithms [12]. On the other hand, partitioning clustering algorithms distribute data points into non-overlapping clusters such that each data point belongs to exactly one cluster. In other words, partitioning clustering algorithms produce a single data partition instead of constructing a tree-like structure, as is the case for hierarchical clustering algorithms [13]. One major challenge with these algorithms is selecting an appropriate number of output clusters. The k-means algorithm appears to be the most popular among them. However, the success of these algorithms in solving clustering problems relies heavily on predetermined information about the data objects and on the initial solution, which in most cases can easily cause the algorithms to become trapped in local optima [8]. These serious drawbacks have led data mining researchers to devise other effective means of overcoming them, including the use of evolutionary and swarm intelligence algorithms to deal with more complex and high-dimensional data clustering problems.
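To make the dependence on a predetermined number of clusters and on the initial solution concrete, the following is a minimal sketch of Lloyd's k-means in Python with NumPy. The function name `kmeans` and the six-point toy dataset are illustrative assumptions, not part of this study. The caller must fix k and the initial centroids, and two different starts on the same data converge to very different partitions.

```python
import numpy as np

def kmeans(X, k, init_idx, n_iter=100):
    """Lloyd's k-means: k and the initial centroids must be supplied in advance."""
    centroids = X[list(init_idx)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (hard partition).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    sse = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, sse

X = np.array([[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]])
print(kmeans(X, 3, [0, 1, 2])[2])  # 101.0: poor start, stuck in a local optimum
print(kmeans(X, 3, [0, 2, 4])[2])  # 1.5: well-spread start
```

With the first start, two centroids stay on the left pair and the third absorbs the four right-hand points; the well-spread start recovers the three natural pairs.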
Evolutionary algorithms that have been employed to handle data clustering problems include the genetic algorithm (GA) and differential evolution (DE), while several swarm intelligence techniques, such as particle swarm optimization (PSO), ant colony optimization (ACO), the firefly algorithm (FA), invasive weed optimization (IWO), artificial bee colony optimization (ABC), and teaching-learning-based optimization (TLBO), have also been effectively applied to clustering problems [14], [15]. For example, Zabihi and Nasiri [16] proposed a history-driven artificial bee colony algorithm for data clustering, in which a memory mechanism based on binary space partitioning was incorporated into the ABC algorithm to improve its clustering performance. Van der Merwe and Engelbrecht were the first to propose the use of PSO for clustering problems [17]. Similarly, Zhao et al. [18] improved the performance of the k-means algorithm by hybridizing it with PSO, so that the algorithm's performance is not directly affected by the initial cluster centers. Liu et al. [15] developed a genetic algorithm-based automatic clustering method that was able to find good-quality clustering solutions when the number of clusters is unknown. Niknam et al. [19] proposed an efficient hybrid evolutionary algorithm that combines ACO and simulated annealing (SA) to solve the clustering analysis problem; the simulation results showed that the ACO-SA hybrid outperformed the basic SA, ACO, and k-means algorithms on the partitional clustering problem. Satapathy and Naik [20] developed a TLBO algorithm to find the centroids of a user-specified number of clusters. In a related study, Sahoo and Kumar [21] proposed two modifications to the TLBO method to enhance its performance in the clustering domain, in which random initialization was replaced with a predefined method previously used to select initial cluster centers.
Zhao and Zhou [22] proposed an improved kernel possibilistic fuzzy c-means algorithm based on the IWO algorithm for clustering analysis, while Liu et al. [23] employed a multi-objective IWO algorithm to solve the clustering problem. Wang et al. [24] proposed a flower pollination algorithm (FPA) with bee pollinators to solve clustering problems, while Agarwal and Mehta [25] studied the application of an enhanced flower pollination algorithm to data clustering. In recent times, different authors have also considered the FA for data clustering. Senthilnath et al. [26] conducted a performance evaluation of the standard FA on clustering problems and compared its results with those of PSO, ABC, and other classical clustering algorithms from the literature. A similar performance analysis of the firefly algorithm for data clustering was carried out by Banati and Bajaj [27]. In 2012, Abshouri and Bakhtiary [28] proposed a new hybrid algorithm that combines FA and the K-Harmonic Means algorithm to solve the data clustering problem.
However, in most of the clustering problems on which the algorithms mentioned above have been tested and shown to yield superior-quality solutions, the algorithms had to be supplied with specific prior knowledge of the characteristics and features of the data objects, for example, the number of clusters and other related dataset attributes. Unfortunately, in many real-life datasets, the number of clusters is not known a priori, especially for large data objects. Moreover, automatically determining the exact number of clusters that would provide an appropriate clustering under this condition can be extremely challenging [15]. Therefore, the specific objective of this paper is to develop an improved FA-based clustering method that automatically provides the proper clustering partition without any prior knowledge of the characteristics of the dataset. The study also proposes four hybrid FA algorithms to solve a wide range of clustering problems automatically. The newly developed hybrid algorithms are the firefly algorithm particle swarm optimization (FAPSO), firefly algorithm artificial bee colony (FAABC), firefly algorithm invasive weed optimization (FAIWO), and firefly algorithm teaching-learning-based optimization (FATLBO). For the improved FA, a mutation selection operator is incorporated into the standard FA to maintain the balance between the selection pressure and the population diversity of the algorithm. Two cluster validity indices, namely Davies-Bouldin (DB) [35] and Compact-Separated (CS) [36], are employed to measure the validity of the clustering solutions. Experimental results on real-life datasets are presented to validate the superior performance of the proposed improved and hybrid FA algorithms over other existing clustering methods.
The outline of this paper is as follows. Section II presents a more detailed literature review of state-of-the-art clustering algorithms. Section III describes the FA algorithmic design concept and the proposed FA-based hybrid algorithms for solving data clustering problems, followed by some preliminary mathematical concepts relating to clustering analysis. Section IV describes a series of numerical and comparison experiments. Finally, concluding remarks and future research directions are provided in Section V.

II. RELATED WORK
Owing to its robustness, efficiency, versatility, and ability to handle problems in different fields, including NP-hard problems, the firefly algorithm has been successfully applied in various domains. In 2014, Fister et al. [41] presented a comprehensive review of the diverse areas in which FA has been applied to a broad spectrum of real-world applications with satisfactory results, and they went on to suggest future research directions for the algorithm. Although FA has a good track record in diverse domains, its application to data clustering, and to automatic data clustering in particular, remains limited. Very few works have applied the firefly algorithm to data clustering, and previous studies on its application to automatic data clustering are even harder to find.
A performance study of the firefly algorithm (FA) for data clustering was carried out by Senthilnath et al. [26]. They acknowledged the strengths of FA and used the classification error percentage (CEP) in generating optimal cluster centroids. The standard FA was implemented for data clustering with a focus on attractiveness, light absorption, population size, and distance, and CEP was used to assess the method that generates the optimal number of clusters. FA was further compared with ABC, PSO, and nine other clustering methods. The results showed that FA achieved superior classification efficiency compared with the other methods, along with better reliability, global performance, and robustness.
Hassanzadeh and Meybodi presented a hybrid approach based on FA and k-means for data clustering [42]. In the proposed model, called K-FA, FA was used to find cluster centroids for a user-specified number of clusters and was then extended with the k-means algorithm, which refined the cluster centroids detected by FA. Global optima were also used to improve the standard FA. Experimental results showed that K-FA outperformed three other clustering algorithms, with better efficiency and a decrease in intra-cluster distances, since the FA stage gave the k-means method a proper initialization.
Banati and Bajaj conducted a performance analysis of the viability of FA for data clustering in [27]. The proposed centroid-based method, called FClust, combined the flashing behaviour of fireflies with the objective function of the clustering problem to obtain the optimal solution. The performance of FClust was evaluated using two statistical criteria, namely the trace within criteria (TWR) and the variance ratio criteria (VRC) [43]. A comparison of FClust with standard PSO and DE showed that FClust achieved the best mean fitness and standard deviation values on the VRC measure. The quality of the solutions obtained by FClust was further evaluated in terms of the number of function evaluations using the run length distribution (RLD) approach [44], which showed that FClust achieved the best function evaluation value and a faster convergence rate.
In 2015, Kaushik and Arora integrated FA with an improved genetic algorithm [45] in a model called FGA. FGA selects its initial population from a pool generated by the firefly algorithm; that is, the initial population is formed from the global best solutions of FA. FGA operates in two steps. First, the classical FA is applied to a randomly selected initial population, which generates a set of chromosomes; second, these chromosomes are placed in the mating pool, where they undergo the mutation and crossover operations of the genetic algorithm. Using FA at the initialization stage of FGA promotes global optimization, which prevents the solutions from becoming trapped in local optima. Compared with the basic genetic algorithm and the firefly algorithm, FGA achieved better inter-cluster and intra-cluster distances and more satisfactory results.
VOLUME 8, 2020
Nayak et al. [47] implemented an improved FA with the fuzzy c-means algorithm, called FAFCM and improved FAFCM, for real-world clustering datasets. The improved FA addressed two shortfalls of the fuzzy c-means method: entrapment in local optima and high sensitivity to initialization. FAFCM operates in two stages: first, a standard firefly algorithm combined with fuzzy c-means clustering, and second, an improved firefly algorithm combined with fuzzy c-means clustering. The first stage handled the limitations of the fuzzy c-means algorithm by minimizing the objective function, whereas the second stage refined the cluster centers identified in the first stage and further minimized the objective function. FAFCM was compared with three other clustering algorithms, and the results showed that FAFCM produced consistent results over the test datasets, converged faster, and achieved a lower objective function value. However, the number of clusters had to be predefined before centroid assignment by FAFCM.
An efficient hybrid method for data clustering based on a modified FA and a dynamic k-means algorithm was developed by Sundararajan and Karthikeyan in [48]; it is called the hybrid modified firefly and dynamic k-means algorithm. The dynamic k-means algorithm was incorporated so that the method can find the optimal number of clusters during execution, as well as improve cluster quality and optimality. Because the base model works for a predefined number of clusters, it determines new centroids by incrementing the cluster counter by one in each iteration until the required cluster quality is attained. Experimental results showed that the proposed model found better-quality clusters in less time and with increased optimality compared with the benchmark algorithm.
Ezugwu [40] presented an extensive survey of the major nature-inspired metaheuristic algorithms that have been applied to automatic data clustering problems. The author also carried out a comparative study of several modified, well-known global metaheuristic algorithms for automatic clustering, in which three hybrid swarm intelligence and evolutionary algorithms, namely the particle swarm differential evolution algorithm, the firefly differential evolution algorithm, and the invasive weed optimization differential evolution algorithm, were employed for the task of automatic clustering. The experimental results revealed that the firefly algorithm was more appropriate for clustering both low- and high-dimensional data objects than the other state-of-the-art algorithms.
The literature and comparative results reviewed above indicate that FA is a very efficient and robust metaheuristic for solving real-world problems. Moreover, the findings of Ezugwu [40] and Agbaje et al. [49] on the promising performance of FA for automatic clustering motivated us to investigate further the performance of both the improved mutation-based firefly algorithm and its hybrid variants for automatic data clustering.
Based on this analysis, we compiled the clustering methods, application areas, and clustering validity index types for the identified automatic metaheuristic techniques, as presented in Table 1 above.

III. THE FIREFLY ALGORITHM
The Firefly Algorithm is a nature-inspired optimization algorithm developed by Xin-She Yang in late 2007 and early 2008 [29], [30]. The FA design concept was inspired by the flashing light of fireflies, which are commonly found in most tropical and temperate regions. There are approximately 2000 species of fireflies, many of which produce short, rhythmic flashes at regular intervals. These flashes act as signals that attract mating partners and potential prey, and they may also serve as a protective warning mechanism [31]. As a novel population-based swarm intelligence metaheuristic, FA has been used to solve different nonlinear engineering design optimization problems, as reported in [32]. Furthermore, studies have shown that FA is very promising for solving difficult NP-hard numerical optimization problems in both continuous and discrete spaces [33]. The mathematical model of the standard FA is given in equations (1) to (5). In equation (1), the light intensity I of a firefly's flash is inversely proportional to the square of the distance r. This implies that the light intensity of an individual firefly diminishes as the distance increases. In addition, light is absorbed by the air as it travels, which weakens it further as the distance grows [33].
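Because equations (1) to (5) are not reproduced in this text, the following sketch assumes the standard forms given by Yang [29], [30]: I(r) = I_0 exp(−γr²), β(r) = β_0 exp(−γr²), the Euclidean distance r_ij, and the move x_i ← x_i + β(r_ij)(x_j − x_i) + α ε_i with ε_i = rand − 0.5. The function names and default parameter values are hypothetical, and the exact forms used in this paper may differ (for example, in the exponent of r).

```python
import numpy as np

# Hypothetical helpers assuming Yang's standard FA forms; the paper's own
# equations (1)-(5) may differ in detail.

def light_intensity(I0, gamma, r):
    # Inverse-square decay combined with absorption: I(r) = I0 * exp(-gamma * r^2).
    return I0 * np.exp(-gamma * r ** 2)

def attractiveness(beta0, gamma, r):
    # beta(r) = beta0 * exp(-gamma * r^2): attraction fades with distance.
    return beta0 * np.exp(-gamma * r ** 2)

def distance(xi, xj):
    # Euclidean distance r_ij over the d problem dimensions.
    return float(np.sqrt(np.sum((xi - xj) ** 2)))

def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """Move firefly xi toward a brighter firefly xj."""
    rng = np.random.default_rng() if rng is None else rng
    r = distance(xi, xj)
    eps = rng.random(xi.shape) - 0.5  # 'rand - 0.5' in place of a Gaussian draw
    # Attraction term plus the random perturbation alpha * eps_i.
    return xi + attractiveness(beta0, gamma, r) * (xj - xi) + alpha * eps

xi, xj = np.array([0.0, 0.0]), np.array([1.0, 0.0])
step = move_towards(xi, xj, alpha=0.0)  # alpha = 0 removes the random term
```

With α = 0 the move is deterministic: since r = 1 and β_0 = γ = 1, firefly xi travels a fraction exp(−1) ≈ 0.368 of the way toward xj.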
Aligning the problem landscape with the FA design, the optimization model can be formulated so that the brightness of a firefly's flash is proportional to the value of the fitness function to be optimized. The basic FA was formulated using the following design principles [31]: all fireflies are assumed to be unisex; the attractiveness of a firefly is proportional to the brightness of the light it produces; and the brightness of a firefly is determined by the landscape of the fitness function to be optimized. In the FA design, light intensity and attractiveness play a vital role in the implementation and performance of the algorithm. For maximization problems, the light intensity at a specified point y is proportional to the fitness value, that is, I(y) ∝ F(y). As shown in eq. (2), the light intensity varies with the distance r and with the absorption of light by the medium.
where I_0 denotes the initial light intensity at r = 0, γ is the light absorption coefficient, and r is the distance. In eq. (2), combining the effects of the inverse square law and absorption avoids the singularity at r = 0 in the expression 1/r^2 [30], [33]. According to eq. (3), the attractiveness β of a firefly is proportional to its light intensity.
where β_0 is the attractiveness at r = 0. The distance r_ij between any two fireflies x_i and x_j is the Euclidean distance given in eq. (4), where d is the problem dimension. The movement of a firefly i toward a brighter firefly j is given by eq. (5), where α ∈ [0, 1] and γ ∈ [0, ∞). The parameter ε_i is a random number drawn from a Gaussian distribution; it can be replaced with rand − 0.5, where rand is uniformly distributed in [0, 1]. In eq. (5), the second term models the attraction of firefly i toward firefly j, while the third term, α ε_i, adds a random perturbation to the movement.
In this paper, to improve the exploration and exploitation capability of FA so that it can handle high-dimensional clustering tasks more efficiently, a mutation strategy is introduced into the FA search process. The modified FA mutation strategy explores and exploits the search space by transferring desirable features from attractive fireflies to less bright fireflies, thereby enhancing their attractiveness. The extent of the modification required for a firefly with weak brightness is determined by its mutation probability (MP). Fireflies with excellent brightness are therefore expected to have a lower MP, while fireflies with low light intensity have a higher MP. The idea is that low-quality solutions have a high probability of being improved, while good-quality solutions have a low likelihood of being degraded. The mutation probability used to introduce additional diversity into the firefly swarm is computed as follows.
where f(x_new) is the fitness of the new (mutated) firefly and f(x_old) is the fitness of the original firefly. The main steps of the mutated FA are summarised in Algorithm 1.
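The MP formula itself is not reproduced in this text, so the sketch below substitutes a hypothetical rank-based rule to illustrate the stated principle: under minimization (the CS and DB indices are minimized), the brightest firefly receives MP = 1/N and the dimmest receives MP = 1. Each dimension of a firefly is copied from a randomly chosen brighter firefly with probability MP_i, and the mutated firefly is kept only if f(x_new) ≤ f(x_old). The names `mutation_probabilities` and `mutate_and_select` and the sphere test function are illustrative assumptions, not the authors' operator.

```python
import numpy as np

def mutation_probabilities(fitness):
    """Hypothetical rank-based MP (minimization): brightest -> 1/N, dimmest -> 1."""
    n = len(fitness)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(fitness)] = np.arange(n)  # rank 0 = lowest (best) fitness
    return (ranks + 1) / n

def mutate_and_select(pop, fitness, objective, rng):
    """Copy dimensions from a brighter firefly with probability MP_i; keep improvements only."""
    mp = mutation_probabilities(fitness)
    order = np.argsort(fitness)  # indices from brightest to dimmest
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(len(pop)):
        rank = int(np.flatnonzero(order == i)[0])
        if rank == 0:
            continue  # the brightest firefly has no brighter donor
        donor = pop[order[rng.integers(0, rank)]]  # random brighter firefly
        mask = rng.random(pop.shape[1]) < mp[i]    # dimensions to take over
        candidate = np.where(mask, donor, pop[i])
        f_new = objective(candidate)
        if f_new <= fitness[i]:  # greedy selection: f(x_new) versus f(x_old)
            new_pop[i], new_fit[i] = candidate, f_new
    return new_pop, new_fit

def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(6, 3))
fit = np.array([sphere(x) for x in pop])
new_pop, new_fit = mutate_and_select(pop, fit, sphere, rng)
```

Because the dimmest firefly has MP = 1, it is replaced wholesale by a brighter donor, while the brightest firefly is never mutated, which matches the "high probability of improving poor solutions, low likelihood of degrading good ones" idea.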

A. FIREFLY-BASED HYBRIDS AND CLUSTERING PROBLEM DESCRIPTION
The hybridization methods described in this paper exploit the respective advantages of FA and other representative algorithms, namely PSO, ABC, IWO, and TLBO, all of which work well on a wide range of global optimization problems. In this study, we propose a set of new hybrid firefly-based algorithms that combine the advantages of these individual algorithms. The hybrid algorithms combine the attraction mechanism of FA with the effective cooperative search capabilities of PSO, ABC, IWO, and TLBO to maintain a good balance between exploration and exploitation of the problem search space. The combination is also designed to increase solution accuracy, convergence speed, and population diversity. We implemented four hybrid algorithms, namely FAPSO, FAABC, FAIWO, and FATLBO, to solve data clustering problems. Note that the improved FA and the other four metaheuristics are executed in parallel to promote information sharing among the swarm population and thus enhance search efficiency [37].
Algorithm 1 (mutated FA), final steps recovered from the listing: calculate the attractiveness variation with distance r using exp(−γr); calculate new fitness values for all fireflies; accept the new solution with the best fitness; end; update firefly light intensity L_i; update iteration counter t = t + 1; reduce α by a factor; end.
The four hybrid algorithms begin their search with FA as the global optimizer, owing to its strong exploration ability, and then introduce each of the other four algorithms separately as a local search optimizer to enhance the intensification capability of the hybrid methods. The local search mechanism is particularly important in the design of the hybrid algorithms: when the search process approaches local optimal solutions, it helps prevent the algorithms from becoming trapped in local minima. This advantage is leveraged to improve both the exploitation and exploration abilities of the proposed FA-based hybrid algorithms. Furthermore, one of the main benefits of this hybridization and regrouping mechanism is that the search for candidate solutions is concentrated on the promising regions of the search space, which keeps the proposed method from searching for candidate solutions in less promising regions. A similar technique was implemented in [37], where FA was combined with the differential evolution algorithm.
The effectiveness and efficiency of the proposed FA-based hybrid methods are evaluated using the CS and DB validity indices discussed in Section III of this paper. These two validity indices also help to determine the optimal number of clusters and to find the best partitioning for the detected clusters. In the first phase of the hybrid implementation, the FA-based hybrid algorithms start their search by generating an initial population of fireflies. The fitness of each candidate solution found by FA is then computed using the two clustering validity measures, and the solutions with the best fitness values are iteratively updated using the FA operators. In the second optimization phase, the same process is repeated iteratively using the operators of PSO, ABC, IWO, and TLBO, respectively, to re-optimize the solutions obtained in the first phase. These two phases form one evaluation cycle of FAPSO, FAABC, FAIWO, and FATLBO. Note that the four FA-based hybrids use the best solutions produced by FA in the first phase as their initial search population. During evaluation, the previous local best and global best within the new population are compared, and the candidate solution with the best fitness value is updated accordingly. As stated earlier, the four methods use the CS and DB indices to compute the final fitness of each solution, which determines the best candidate solution and the necessary updates. Finally, the best solution is the one with the smallest CS-index or DB-index value. The entire process is repeated until the termination criteria are met.
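A minimal sketch of the two-phase cycle, using FAPSO as the example, is shown below: an FA phase with greedy acceptance, then a PSO phase that re-optimizes the FA output, with personal bests compared after each phase and the final answer taken as the smallest fitness value. As assumptions for illustration only, it minimizes a toy sphere function instead of the CS or DB index, runs the phases sequentially, and uses hypothetical parameter values (β_0 = 1, γ = 1, α = 0.1, w = 0.7, c_1 = c_2 = 1.5), population size, and function names.

```python
import numpy as np

def fa_phase(pop, fit, f, rng, beta0=1.0, gamma=1.0, alpha=0.1):
    """Phase 1: each firefly moves toward every brighter one (minimization)."""
    for i in range(len(pop)):
        for j in range(len(pop)):
            if fit[j] < fit[i]:
                r2 = np.sum((pop[i] - pop[j]) ** 2)
                cand = (pop[i] + beta0 * np.exp(-gamma * r2) * (pop[j] - pop[i])
                        + alpha * (rng.random(pop.shape[1]) - 0.5))
                fc = f(cand)
                if fc < fit[i]:  # accept only improving moves
                    pop[i], fit[i] = cand, fc
    return pop, fit

def pso_phase(pop, vel, pbest, pbest_fit, f, rng, w=0.7, c1=1.5, c2=1.5):
    """Phase 2: PSO operators re-optimize the population produced by phase 1."""
    g = pbest[np.argmin(pbest_fit)]  # global best so far
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    vel[:] = w * vel + c1 * r1 * (pbest - pop) + c2 * r2 * (g - pop)
    pop = pop + vel
    fit = np.array([f(x) for x in pop])
    better = fit < pbest_fit
    pbest[better], pbest_fit[better] = pop[better], fit[better]
    return pop, fit

def fa_pso(f, dim=2, n=10, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(n, dim))
    fit = np.array([f(x) for x in pop])
    vel = np.zeros_like(pop)
    pbest, pbest_fit = pop.copy(), fit.copy()
    for _ in range(iters):
        pop, fit = fa_phase(pop, fit, f, rng)
        # Compare the FA output with the previous local bests before phase 2.
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pop[better], fit[better]
        pop, fit = pso_phase(pop, vel, pbest, pbest_fit, f, rng)
    best = np.argmin(pbest_fit)  # the smallest fitness (index) value wins
    return pbest[best], pbest_fit[best]

def sphere(x):
    return float(np.sum(x ** 2))

x_best, f_best = fa_pso(sphere)
```

Swapping `pso_phase` for an ABC, IWO, or TLBO update would give the other three hybrids under the same assumed two-phase structure.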
Algorithm 2 shows the steps described above for the FA-based hybrid algorithms. Figure 1 illustrates the compartmentalized flowchart of the proposed method, while Figure 2 illustrates the implementation flowchart of the generalized hybrid methods, which also represents the clustering processes of the four hybrid algorithms. Part of the main contribution of this paper is a critical performance study and evaluation of several hybrid firefly algorithms for the task of automatic clustering; to the best of our knowledge, no similar study existed in the literature at the time of writing.
As mentioned earlier, the hybrid algorithm implementation comprises two stages. The first stage runs the modified FA by randomly generating an initial swarm, where the number of fireflies equals the number of clusters and the swarm population is uniformly distributed across the dimensions of the dataset, which in this case form the clustering problem search space. After swarm initialization, the best swarm is evaluated according to the fitness function given by the DB and CS validity indices [40]. Note that the best swarm position represents the data point that achieves the minimum distance to the swarm over its previous searches. The PSO, ABC, IWO, and TLBO operators then act on the new set of solutions generated by the FA updating equation (5). The parameters of the respective algorithms determine the next movement patterns of their optimization strategies, as explained earlier. The positions in the new population are updated iteratively until a satisfactory termination condition is met, at which point the simulation terminates.
Algorithm 2 (FA-based hybrids), steps recovered from the listing: … (see [40]) on the current newpopulation(i); apply ABC updating formula (see [50]) on the current newpopulation(i); apply IWO updating formula (see [40]) on the current newpopulation(i); apply TLBO updating formula (see [51]) on the current newpopulation(i); update the global best solution in the whole population; evaluate the fitness value of each candidate solution; update the new value as the global best; End For; End For; End While; End.

B. CLUSTERING PROBLEM DESCRIPTION
In this performance study, we propose a series of hybrid firefly algorithms to solve automatic data clustering problems. To implement the variants of the hybrid firefly algorithm, we adopt the approach to automatic data clustering described in [34]. Let a dataset F = {f_1, f_2, ..., f_n} be divided into non-overlapping clusters G = {g_1, g_2, ..., g_C}, where each data vector f_i (i = 1, 2, ..., n) has dimension p. Each cluster g_i has a centroid d_i (i = 1, 2, ..., C). For p-dimensional data vectors, the following conditions must hold: g_i ≠ ∅ for every i; g_i ∩ g_j = ∅ for i ≠ j; and g_1 ∪ g_2 ∪ ... ∪ g_C = F. At the initialization phase of each hybrid algorithm, the population (swarm) of size K is defined as W = (W_1, W_2, ..., W_K). Each member W_i of the population is a Q × p-dimensional vector over the n × p data matrix F, defined as W_i = (w*_1, w*_2, ..., w*_Q) = ((w_11, w_12, ..., w_1p), (w_21, w_22, ..., w_2p), ..., (w_Q1, w_Q2, ..., w_Qp)). The optimization goal of the four proposed firefly hybrids is minimization: the two common and widely used cluster validity indices, CS and DB, are employed to minimize the sum of the distances between the data points f_i (i = 1, 2, ..., n) and the centers d_i (i = 1, 2, ..., C). The lower and upper bounds of the candidate centroids in the population are defined as Var_min, with k*_j = min{F_1, F_2, ..., F_p}, and Var_max, with m*_j = max{F_1, F_2, ..., F_p}, respectively. In general, the lower bound of the solution space is k = (k*_1, k*_2, ..., k*_C) and the upper bound is m = (m*_1, m*_2, ..., m*_C).
To solve the automatic clustering problem, the ith particle W_i is initialized as follows: where rand(1, Q × p) is a vector of Q × p random numbers uniformly distributed between 0 and 1.
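The initialization equation is not reproduced here. A common form consistent with the definitions above, assumed in this sketch, is W_i = Var_min + rand(1, Q × p) ⊙ (Var_max − Var_min), with the per-feature bounds of F repeated for each of the Q candidate centroids. Decoding and nearest-centroid assignment are included to show how one firefly maps to a hard partition. All function names are hypothetical, and counting the non-empty clusters is only one illustrative way to read off a candidate's cluster count.

```python
import numpy as np

def init_firefly(F, Q, rng):
    """Hypothetical initialization of one firefly W_i as a flat Q*p vector of centroids."""
    var_min = np.tile(F.min(axis=0), Q)  # per-feature lower bounds, repeated Q times
    var_max = np.tile(F.max(axis=0), Q)  # per-feature upper bounds, repeated Q times
    return var_min + rng.random(Q * F.shape[1]) * (var_max - var_min)

def decode(W, Q, p):
    # Reshape the flat vector into Q candidate centroids of dimension p.
    return W.reshape(Q, p)

def assign(F, centroids):
    # Hard assignment: each data vector belongs to exactly one (nearest) centroid.
    d = np.linalg.norm(F[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
F = rng.normal(size=(50, 4))          # toy data: n = 50, p = 4
W = init_firefly(F, Q=5, rng=rng)     # one firefly with Q = 5 candidate centroids
labels = assign(F, decode(W, 5, 4))
n_active = len(np.unique(labels))     # non-empty clusters in this candidate
```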

C. CLUSTERING VALIDITY INDEX
In this section, we discuss the two validity indices used throughout the study to measure and analyze the effectiveness of the four proposed firefly hybrids, as well as the quality of the clustering solutions obtained. Generally, a good cluster validity index serves two purposes: first, it helps to determine the number of clusters, and second, it identifies the best (optimal) partition [35]. A good cluster validity index is also expected to account for two key aspects of partitioning, namely cohesion and separation. Cohesion means that the objects or data points in a cluster should be as compact and similar as possible; a low variance among the objects in a cluster indicates good compactness. Separation, in contrast, means that clusters should be distinct from each other, which is reflected in the distances among cluster centers. Davies and Bouldin [36] further stated that a clustering validity index should exhibit the following properties:
1. Minimal or no human interference or parameter specification during its operation.
2. Computational scalability for large datasets.
3. Accurate results for datasets with arbitrary dimensions.
For crisp (hard) clustering, some of the most widely used validity indices are the CS index [35] and the DB index [36], which are also used in this study. Most validity indices are defined as minimization or maximization criteria, and their optimal values indicate a good clustering partition. Because of this formulation, clustering validity indices are well suited as fitness functions for optimization algorithms such as PSO, DE, and GA.
In this study, we define the cluster validity index as a function J such that, for a given clustering B and a similarity (distance) measure V, the index is J(B, V). The function J(B, V) returns a real number that indicates the validity, or fitness, of the clustering B. The two validity indices adopted in this study are discussed in the following subsections.

1) COMPACT-SEPARATED INDEX
This cluster validity measure estimates the ratio of the sum of within-cluster scatter to the between-cluster separation, similar to how the DB index operates. Studies have shown that the CS index handles clusters of different dimensions, densities or sizes more effectively. Although it is computationally more intensive than the DB index in terms of execution time, it tends to produce better-quality solutions. A large CS value indicates weak compactness or separation, while a smaller value indicates better clustering. Let the within-cluster scatter be denoted as Y_i and the between-cluster separation as Y_j, such that the distance measure between them is V(Y_i, Y_j). Hence, the CS index for a clustering B is computed as given in equation 11.
where K is the number of clusters in B.
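Equation 11 is not reproduced in this text. As an illustration only, the sketch below computes the CS index in the form commonly attributed to Chou et al.: the mean, over clusters, of each cluster's average largest intra-cluster distance, divided by the mean distance from each centroid to its nearest other centroid. We assume, but cannot confirm from this section, that equation 11 follows this form. The `dist` argument plays the role of V in J(B, V), Euclidean distance is an assumed default, and the function names are hypothetical rather than the authors' MATLAB code.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as the distance measure V."""
    return math.dist(a, b)


def centroid(points: Sequence[Point]) -> list[float]:
    """Component-wise mean of the points in one cluster."""
    if not points:
        raise ValueError("empty clusters are not allowed")
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def cs_index(clusters: Sequence[Sequence[Point]], dist: Distance = euclidean) -> float:
    """CS index of a partition given as K non-empty clusters; lower is better."""
    k = len(clusters)
    if k < 2:
        raise ValueError("CS index needs at least two clusters")
    # Numerator: for each cluster, average over its points of the largest
    # distance from that point to any point in the same cluster.
    scatter = 0.0
    for pts in clusters:
        scatter += sum(max(dist(p, q) for q in pts) for p in pts) / len(pts)
    # Denominator: for each centroid, distance to the nearest other centroid.
    cents = [centroid(pts) for pts in clusters]
    separation = 0.0
    for i, ci in enumerate(cents):
        separation += min(dist(ci, cj) for j, cj in enumerate(cents) if j != i)
    return (scatter / k) / (separation / k)


good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
```

Here the compact, well-separated partition `good` scores lower than `bad`, which groups distant points together; in a metaheuristic search, each candidate's decoded partition would be scored this way and the lower value treated as fitter.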

2) DAVIES-BOULDIN INDEX
The DB index estimates the quality of a clustering by comparing intra-cluster distances (the average distance of the data points in a cluster from its centroid) with inter-cluster distances (the distance between two centroids). As with the CS index, a smaller DB value indicates better compactness and separation, while a larger value indicates poorer clustering. Let W_i be the average distance of all the data points in a cluster B_i to its centroid x_i. The average distance is calculated as:

$$W_i = \left( \frac{1}{|B_i|} \sum_{R \in B_i} V(R, x_i)^{t} \right)^{1/t}$$

where V(R, x_i) is the distance between a data point R in B_i and its centroid x_i, and t ≥ 1 is an integer that can be selected independently. Taking H_ij to represent the inter-cluster distance between two centroids x_i and x_j, we have that:

$$H_{ij} = V(x_i, x_j)$$

Let V_i be defined as:

$$V_i = \max_{j \neq i} \frac{W_i + W_j}{H_{ij}}$$

Thus, the DB index is expressed as:

$$DB = \frac{1}{K} \sum_{i=1}^{K} V_i$$

where K is the number of clusters.

In summary, the data clustering problem described in this paper is modelled as an optimization problem. For example, given a set of data points with x attributes and a predetermined number of clusters g, the objective function seeks an optimal cluster setting that minimizes the sum of squared Euclidean distances between each data object and the center of the cluster it belongs to. In this formulation, each data point must belong to exactly one cluster, and no cluster may be left empty.
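A minimal sketch of the DB index following the definitions of W_i, H_ij and V_i given above, together with the sum-of-squared-error objective from the last paragraph. It assumes Euclidean distance for both V(R, x_i) and H_ij and a default t = 2; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> list[float]:
    """Component-wise mean of the points in one cluster (x_i in the text)."""
    if not points:
        raise ValueError("empty clusters are not allowed")
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def db_index(clusters: Sequence[Sequence[Point]], t: int = 2) -> float:
    """Davies-Bouldin index with Euclidean distance; lower is better."""
    k = len(clusters)
    if k < 2:
        raise ValueError("DB index needs at least two clusters")
    cents = [centroid(pts) for pts in clusters]
    # W_i: order-t mean of the point-to-centroid distances in cluster i.
    w = [
        (sum(math.dist(p, c) ** t for p in pts) / len(pts)) ** (1.0 / t)
        for pts, c in zip(clusters, cents)
    ]
    total = 0.0
    for i in range(k):
        # V_i: largest ratio of combined scatter to centroid distance H_ij.
        total += max(
            (w[i] + w[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


def sse(clusters: Sequence[Sequence[Point]]) -> float:
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


clusters = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
```

Raising an error on an empty cluster mirrors the constraint that no cluster may be left empty; a search algorithm would normally repair or penalize such candidates instead.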

IV. SIMULATION EXPERIMENTS
Experiments were carried out on a 3.60 GHz Intel(R) Core(TM) i7-7700 processor with 16 GB of memory running Windows 10. All algorithms were implemented in MATLAB R2018b, and the statistical analysis was conducted using IBM SPSS Statistics version 25.

A. PARAMETER SETTING
In this section, we present the control parameter settings for the four firefly-based algorithms studied in this paper, which are described in Table 2(a). For each algorithm, we use the same population size, number of iterations and number of replications; the latter is 40 independent runs for all our experiments. The FA, being the control algorithm, has the following parameter settings: the population size is 25, the maximum number of iterations MaxIt is 200, the light absorption coefficient γ is 1, the attraction coefficient β is 2, the mutation coefficient m is 2, and the mutation coefficient damping ratio α is 1. The parameter configurations of ABC, IWO, PSO and TLBO are further detailed in Table 2(a).
VOLUME 8, 2020
Parameter key terms: a is the acceleration coefficient upper bound; S_min and S_max are the minimum and maximum number of seeds; E is the variance reduction exponent; sigma_initial and sigma_final are the initial and final standard deviations; c_1 and c_2 are the personal and global learning coefficients; wdamp is the inertia weight damping ratio; and w is the inertia weight, defined as w = w_max − (w_max − w_min) × t / MaxIt, where t denotes the current iteration. The value of w therefore decreases linearly from w_max to w_min as the iterations progress, which helps prevent the hybrid FAPSO from converging prematurely.
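The inertia weight schedule defined above can be sketched as follows. The values w_max = 0.9 and w_min = 0.4 are common defaults from the PSO literature and are assumptions here, since the actual values are listed in Table 2(a); MaxIt = 200 matches the FA setting stated in the text, and `inertia_weight` is a hypothetical name.

```python
def inertia_weight(t: int, max_it: int, w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Linearly decreasing inertia weight: w = w_max - (w_max - w_min) * t / MaxIt.

    w_max and w_min defaults are assumed, not taken from the paper.
    """
    if max_it <= 0 or not 0 <= t <= max_it:
        raise ValueError("require max_it > 0 and 0 <= t <= max_it")
    return w_max - (w_max - w_min) * t / max_it


# Weight for every iteration t = 0..MaxIt with MaxIt = 200.
schedule = [inertia_weight(t, 200) for t in range(201)]
```

A large early weight favours exploration and the smaller late weight favours exploitation, which is the mechanism the text relies on to reduce premature convergence.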

B. DATASETS DESCRIPTION
The twelve datasets used are well-known benchmark datasets from the UCI Machine Learning Repository. Brief descriptions of some of the datasets are presented as follows:
• Glass dataset: … and Magnesium (Mg), were used as a standard for identifying a glass, which belongs to one of six types of glass. It consists of 214 data points with ten attributes.
• Iris dataset: this dataset consists of three variants of the iris flower, namely Iris Setosa, Iris Versicolor and Iris Virginica. It comprises 150 instances with four attributes.
• Statlog (Heart) dataset: this dataset concerns the diagnosis of heart disease and is drawn from four different databases. It consists of 250 instances and 13 attributes.
• Wine dataset: this dataset results from a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine. It contains 178 patterns with 13 attributes.
• Yeast dataset: this dataset is used to predict the localization sites of proteins in cells. It contains 1484 patterns and 8 attributes.
Details of the remaining datasets, namely Jain, Pathbased, Spiral and Thyroid, can be found in [38] for Jain, Chang and Yeung [39] for both Pathbased and Spiral, and [40] for Thyroid. The configurations of the twelve datasets are summarized in Table 2(b).

C. RESULTS AND DISCUSSION
In this section, we present and discuss the average numerical results obtained by the standard FA and the four FA-based hybrid algorithms. The algorithms were compared based on their average CS and DB index values. In Table 3, the values in bold indicate the algorithm that obtained the best solution among the competing algorithms. All results in this study are reported to four decimal places, and we focus mainly on the quality of the solution produced by each algorithm, as well as the execution time each algorithm takes to find near-optimal solutions. For the CS measure, FA performed well on the Breast and Flame datasets, as did FAABC on the Flame dataset. FAPSO recorded the best performance on nine of the twelve datasets, namely Compound, Iris, Jain, Pathbased, Spiral, Statlog, Thyroid, Wine and Yeast. For the DB index, in contrast, the best performance is seen with FATLBO, which obtained the best results on five datasets, namely Flame, Iris, Pathbased, Spiral and Yeast. It is closely followed by FAPSO, which achieved the best performance on four datasets: Glass, Jain, Thyroid and Wine. The standard FA did reasonably well on the remaining three datasets, namely Breast, Compound and Statlog. Neither FAABC nor FAIWO obtained a best result on the DB index. Notably, FA outperformed all the other algorithms on the Breast dataset under both cluster validity measures, whereas FAIWO did not outperform the other approaches on any dataset for either the CS or the DB measure.
In general, the comparison between the standard FA and its hybrid variants shows that the fitness values achieved by FAPSO on the CS index are lower, which signifies better performance. Moreover, FAPSO achieved the best performance on more datasets than any other algorithm, making it the strongest algorithm on this measure. For the DB index, however, FATLBO emerged as the best-performing algorithm with the lowest average clustering results, closely followed by FAPSO and then the standard FA. Since FAPSO performed well under both validity measures, we conclude that FAPSO is an efficient and effective automatic clustering algorithm.
Next, we present and discuss the results of the four proposed hybrid firefly algorithms using the following descriptive statistics: the best solution, the worst solution, the average solution and the standard deviation. Values in bold indicate where an algorithm outperformed the other compared algorithms or tied with them. In the CS index column, FAABC, FAIWO and FATLBO achieved the same results on the Breast dataset, as did FAABC and FATLBO on the Compound dataset. FAPSO had the best performance on the Flame, Iris, Jain, Thyroid, Wine and Yeast datasets. FAPSO and FATLBO achieved identical best values on the Pathbased dataset, while FAABC obtained the best solution on the Spiral dataset. The results on the Glass and Statlog datasets are consistent and stable across all four hybrid methods. Hence, FAPSO clearly shows superior performance over the other hybrid algorithms on the CS index.
For the DB measure, FAABC, FAIWO and FATLBO obtained identical best results on the Breast dataset. This mirrors the CS index results, meaning that FAABC, FAIWO and FATLBO perform identically on the Breast dataset under both cluster validity measures. FAPSO achieved the best scores on the Compound, Glass, Statlog and Thyroid datasets. As on the Breast dataset, FAABC, FAIWO and FATLBO also had identical results on the Flame and Yeast datasets. FAABC and FATLBO obtained identical results on the Iris and Wine datasets. FAABC outperformed the other algorithms on the Spiral dataset, and the values obtained by FATLBO on the Jain and Pathbased datasets are superior to those of the other algorithms. Although two or more algorithms obtained similar results on some datasets, this does not alter the evidence that FAPSO outperformed the other algorithms on four datasets, namely Compound, Glass, Statlog and Thyroid. Although the values obtained by FAABC, FAIWO and FATLBO on the Statlog dataset are identical to those obtained with the CS index, FAPSO obtained the overall best clustering solution. Based on these evaluations, averaged over the four algorithms and the twelve datasets, the CS index appears to be a more effective validity measure for these clustering solutions than the DB index.
Figures 3 and 4 show the average computational time consumed by each algorithm under the two validity indices to complete its search for optimal solutions. In both time graphs, FAABC is represented by yellow bars, FAIWO by purple bars, FAPSO by red bars, and FATLBO by blue bars, and the average time consumed is plotted against the corresponding algorithms and datasets. For the CS index in Figure 3, FAPSO has the highest (worst) execution time across the twelve datasets, followed by FAIWO and then FATLBO, while FAABC has the lowest (best) run time on all twelve test datasets. As discussed earlier, although FAPSO achieved the best CS solutions among all the methods, it consumed considerable time in its search process on each dataset. Similarly, for the DB index in Figure 4, FAABC has the best execution time across all the datasets, followed by FATLBO and then FAPSO, while FAIWO has the highest (worst) run time among the four algorithms.
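The best, worst, average and standard deviation values discussed above summarize each algorithm's final fitness over the 40 independent runs on a dataset. A minimal sketch, assuming the per-run final fitness values are available as a list; because both CS and DB are minimized, "best" is the minimum. The text does not state whether the standard deviation uses the sample (n − 1) or population (n) denominator, so the sketch assumes the sample version, and `describe_runs` is a hypothetical name.

```python
import statistics
from typing import Sequence


def describe_runs(fitness: Sequence[float]) -> dict[str, float]:
    """Summarise a minimised validity index (CS or DB) over independent runs."""
    if len(fitness) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    return {
        "best": min(fitness),      # lower CS/DB values are better
        "worst": max(fitness),
        "average": statistics.fmean(fitness),
        "std": statistics.stdev(fitness),  # sample (n - 1) standard deviation
    }


summary = describe_runs([1.0, 2.0, 3.0])
```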

D. STATISTICAL ANALYSIS TEST
For further comparison, we performed a non-parametric statistical test, the Friedman rank-sum test, which can identify significant differences between the behaviour of two or more algorithms. As presented in Table 5, for the CS index, FAPSO has the best rank on seven of the twelve datasets, namely Iris, Jain, Pathbased, Spiral, Thyroid, Wine and Yeast. All four algorithms share an identical rank on the Glass and Statlog datasets, consistent with the numerical results presented in Table 4. FATLBO is ranked next to FAPSO, with the best rank on three datasets: Breast, Compound and Flame. This supports the conclusion that FAPSO is the more efficient hybrid firefly algorithm for solving the automatic data clustering problem. For the DB index, FAPSO and FATLBO each obtain the best rank on the same number of datasets: Compound, Jain, Statlog, Thyroid and Yeast for FAPSO, and Breast, Flame, Iris, Pathbased and Spiral for FATLBO. Finally, FAABC, FAIWO and FATLBO have an identical mean rank on the Glass dataset.
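The Friedman test ranks the algorithms within each block (for example, each run on one dataset), averages the ranks per algorithm, and compares those mean ranks with what would be expected if all algorithms behaved alike. A minimal pure-Python sketch, assuming lower fitness is better and using the textbook statistic without tie correction; the authors used SPSS, so this is an illustration, not their procedure. SciPy's `scipy.stats.friedmanchisquare` is a library alternative for the statistic and p-value.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values in ascending order (1 = lowest = best); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based positions
        i = j + 1
    return ranks


def friedman(results: Sequence[Sequence[float]]) -> tuple[list[float], float]:
    """results[b][a] is the fitness of algorithm a in block b.

    Returns the mean rank of each algorithm and the Friedman chi-square
    statistic 12N / (k(k+1)) * (sum of squared mean ranks - k(k+1)^2 / 4).
    """
    n = len(results)
    k = len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


mean_ranks, chi2 = friedman([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]])
```

Identical fitness values, as reported for Glass and Statlog, receive the same averaged rank, which is how several algorithms can share a mean rank on one dataset.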
To further examine the mean ranks obtained by the Friedman test in Table 5, we performed an additional Wilcoxon post-hoc test to ascertain where statistically significant differences exist among the compared algorithms. Tables 6 and 7 report the p-values produced by this post-hoc analysis for the pairwise comparisons FAPSO vs FAABC, FAPSO vs FAIWO, FAPSO vs FATLBO, FAABC vs FAIWO, FAABC vs FATLBO and FAIWO vs FATLBO, for the CS and DB validity indices, respectively. Almost all the values are below our adjusted significance threshold (p ≤ 0.0083). The pairwise comparisons involving FAPSO produced more statistically significant values than the other pairs for both the CS and DB indices. This further supports the superiority of FAPSO over the other methods and indicates that it is a robust and efficient hybrid firefly algorithm for automatic data clustering.
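The threshold p ≤ 0.0083 matches a Bonferroni correction of α = 0.05 over the six pairwise comparisons (0.05 / 6 ≈ 0.0083); the text does not name the correction, so this reading is an assumption. The sketch enumerates the six pairs in the order listed above and computes the Wilcoxon signed-rank statistic W for one pair of paired run results. Turning W into a p-value requires its null distribution, which a library routine such as `scipy.stats.wilcoxon` (or SPSS, as used by the authors) provides; the names below are hypothetical.

```python
from itertools import combinations
from typing import Sequence

ALGORITHMS = ["FAPSO", "FAABC", "FAIWO", "FATLBO"]


def average_ranks(values: Sequence[float]) -> list[float]:
    """Ascending ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Zero differences are dropped; tied |differences| share their mean rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)


# Six pairwise comparisons among the four hybrids.
PAIRS = list(combinations(ALGORITHMS, 2))
# Bonferroni-adjusted significance threshold (assumed correction).
ALPHA_ADJ = 0.05 / len(PAIRS)
```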

E. CLUSTERING PROCESS
The clustering results on selected datasets for the four hybrid algorithms under the CS and DB indices are presented in Figures 5-12. In Figure 5 (FAABC based on the CS index), the Compound dataset forms three clean clusters, while the Flame, Pathbased and Yeast datasets each form one cluster, with a barely noticeable string of blue outliers. Figure 6 (FAABC based on the DB index) shows good clustering, but with a string of red outliers on the Glass and Jain datasets and in the Spiral cluster. Figures 7 and 8 show the clustering results for FAIWO using the CS and DB measures, respectively. In Figure 7, the Compound dataset has three exact clusters, while Statlog has one, and a string of blue outliers is visible in the Flame and Iris datasets. In Figure 8, the Compound dataset is well clustered into three groups, Pathbased into one group with a string of red outliers, and the Thyroid dataset into three classes (blue, green and red). In contrast, the Wine dataset was grouped into one class, with a separate blue exception class.
In Figure 9, on the Compound dataset, small parts of the magenta and yellow classes are mixed with the blue class, although the dataset is divided into six classes. Some outliers in the green class were also not properly grouped. For the Jain dataset, we obtained three distinct clusters with a few green outliers attached to the magenta class. The Pathbased and Spiral datasets were divided into five and six clearly separated classes, respectively. For the DB index (Figure 10), all the selected datasets were clustered into well-separated groups, except the Yeast dataset, which had a few outliers.
Figures 11 and 12 show good clustering results for FATLBO. In Figure 11, FATLBO produced one distinct cluster on each of the selected datasets, with a few barely noticeable red, blue and green outliers. In Figure 12, the Compound dataset has three definite clusters, and the Flame, Spiral and Yeast datasets have one cluster each, although with green, red and blue outliers.

F. ALGORITHM CONVERGENCE CURVES
The convergence curves of the four hybrid algorithms are presented in Figures 13 and 14. Under both the CS and DB measures, FAPSO converges better than FAABC, FAIWO and FATLBO. FATLBO comes next, converging better than FAABC and FAIWO, while FAIWO shows the poorest convergence under both validity measures.

G. HIGH-DIMENSIONAL DATASET AND PARAMETER FINE-TUNING
In this section, an additional experiment was carried out to determine the scaling behaviour of the two best-performing algorithms proposed in this paper, FAPSO and FATLBO, on seven relatively high-dimensional datasets. The performance of the two algorithms is further validated by tuning one of their standard control parameters, namely the population size; population sizes of 50 and 100 were selected for the tuning task. The tuning helps to evaluate the impact of the control parameters on each algorithm, which may affect its performance positively or negatively in terms of solution quality or computational cost. The results of the tuning experiment are shown in Tables 8 and 9, respectively. Note that the results of FAPSO and FATLBO are compared with those of three hybrid algorithms from the literature, namely particle swarm optimization differential evolution (PSODE) [40], firefly algorithm differential evolution (FADE) [40], and invasive weed optimization differential evolution (IWODE) [40]. Each of these algorithms was implemented and executed under the same experimental conditions, which makes it reasonable to compare their clustering results and computational costs.
For both algorithms, noticeable improvements in solution quality were observed compared with the results of the hybrid methods from the literature [40]. However, these improvements came at the expense of computational time, which increased significantly, as shown in Tables 8 and 9. FAPSO obtained the lowest average solutions, 0.5411 and 0.5700, followed by FATLBO with 0.5719 and 0.6096, for population sizes of 50 and 100, respectively, compared with the literature results. However, increasing the population size did not significantly improve the clustering solution quality obtained by the hybrid FATLBO.
The computational times of the FAPSO and FATLBO implementations are presented alongside the obtained clustering solutions in Tables 8 and 9. One significant drawback of the parameter tuning is that the running time of each algorithm grows considerably. For example, although FAPSO produced the best clustering solution in terms of cohesion and compactness, its computational cost increased steeply with population size. Similar behaviour was displayed in the computational costs of the other hybrid algorithms from the literature. This is expected, because the hybrid implementation incorporates additional subroutine processing overhead, which invariably increases the execution time of the combined algorithms; hence the high computational cost recorded by both FAPSO and FATLBO.

H. ALGORITHM COMPLEXITY
In determining the complexity of any metaheuristic algorithm, there is no one-size-fits-all approach, and the detailed computational complexity may depend on the structure of the algorithm's design and implementation [29]. However, for the five algorithms used in this paper, the complexities can be easily estimated. For the improved FA algorithm, the time complexity is $O(n^2 t)$, where n denotes the population size (here n = 25) and t the number of iterations. Note also that, for simplicity of implementation, all five algorithms, namely FA and the four hybrids FAPSO, FAABC, FATLBO and FAIWO, have two inner loops that go through the entire population n. Therefore, for the four hybrid algorithms, the time complexity is $O\left(\frac{n^2 t}{4} + \frac{n^2 t}{2}\right)$, because each of the constituent algorithms uses only half of the population size. Since the values of n and t used in the experiments reported in this paper are small (typically n = 25, t = 200), the computational cost is relatively inexpensive, as the complexity is linear in t. Note also that the main computational cost lies in evaluating the clustering objective function.
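To make the $O(n^2 t)$ estimate for FA concrete, the hypothetical counter below tallies the inner-loop steps of an FA whose main loop compares every firefly i with every firefly j in each iteration. It counts loop steps only, not the cost of each objective evaluation, and it does not model the hybrids' split population.

```python
def count_fa_comparisons(n: int, t: int) -> int:
    """Count brightness comparisons in an FA with two nested loops over n fireflies."""
    count = 0
    for _ in range(t):          # outer loop: iterations
        for i in range(n):      # firefly i
            for j in range(n):  # compared against every firefly j
                count += 1
    return count


baseline = count_fa_comparisons(25, 200)  # settings reported in the text
```

With n = 25 and t = 200 this gives 25² × 200 = 125,000 comparisons; doubling t doubles the count, while doubling n quadruples it, which is the linear-in-t, quadratic-in-n behaviour described above.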
Further, like some other metaheuristic algorithms, the FA, which is the core algorithm of the proposed hybrid techniques, has the following limitations [29]: its optimal performance depends highly on adequate parameter tuning; its diversification can reduce computational speed and convergence rate; and it is not well suited to complex problems, because it can become trapped in local optima while searching for candidate solutions. Because each of the hybrid methods depends on the FA, their performance can also be restricted, particularly by the parameter tuning effects and the over-diversification (exploration) mechanism of the base FA. These limitations were observed when the hybrid algorithms were applied to clustering tasks involving high-dimensional datasets.

V. CONCLUSION
In this study, four new FA-based hybrid algorithms were implemented and successfully used to solve automatic data clustering problems, and a performance study of the proposed algorithms was carried out. The simulation results obtained from multiple experiments revealed that FAPSO outperformed the other hybrid algorithms, namely FAABC, FAIWO and FATLBO, in terms of solution quality and convergence speed. FATLBO also performed relatively well and ranked next to FAPSO, as it yielded high-quality clustering solutions with better computational speed. FAIWO, however, appeared to be the weakest method in terms of clustering quality and convergence speed. In future research, we intend to apply the proposed FA-based hybrid algorithms to other complex optimization problems with similar settings and possibly to variants of the clustering problem considered in this paper. It would also be interesting to develop a higher-level extension of the proposed hybrid clustering algorithms that enables the individual algorithms to dynamically determine their optimal parameter configurations for maximum performance improvement.
Finally, combining FA algorithms with recent deep learning clustering methods, such as deep embedding clustering [83], deep clustering network [84], pairwise constraints clustering [85], deep embedding network [86], joint unsupervised learning of deep representation for images [84], deep learning with non-parametric clustering [87], convolutional neural network clustering [88] and deep clustering with convolutional autoencoder embedding [90], could be investigated to solve real-world data clustering problems, specifically those with high dimensionality and complex features.