An Optimized Continuous Dragonfly Algorithm Using Hill Climbing Local Search to Tackle the Low Exploitation Problem

Optimization problems are usually solved using heuristic algorithms such as swarm intelligence algorithms owing to their ability to provide near-optimal solutions in a feasible amount of time. An example of an optimization problem is the training of artificial neural networks to obtain the optimal connection weights. The Artificial Neural Network (ANN), one of the most prominent machine learning models, has a multitude of applications in a myriad of areas. Recently, the use of ANNs has risen sharply owing to their ability to draw conclusions from given inputs. This ability is primarily acquired during the training phase of the ANN, which is a vital process before the ANN can be used. Gradient descent-based algorithms, which are usually used for the training process, often encounter the problem of local optima and are thus unable to obtain the optimal connection weights of the ANN. Metaheuristic algorithms, including swarm intelligence algorithms, have been found to be a better alternative for training ANNs. The Dragonfly Algorithm (DA) is a swarm intelligence algorithm that has been found to be more effective than multiple other swarm intelligence algorithms. However, despite its good performance, it still suffers from low exploitation. In this paper, we propose to further improve the performance of DA by using hill climbing as a local search technique so as to enhance its exploitation. The optimized DA algorithm is then used for training artificial neural networks which are employed for classification problems. Based on the experimental results, the optimized DA algorithm has higher effectiveness than the original DA and some other swarm intelligence algorithms, as the ANNs trained by the optimized DA have a lower root mean squared error and a higher classification accuracy.

The Dragonfly Algorithm (DA) [27] is a swarm intelligence algorithm inspired by the behaviour of dragonflies, specifically their hunting and migrating behaviours. These two behaviours aptly represent the two crucial phases of optimization algorithms: exploration and exploitation. Hence, DA makes use of these behaviours and a population of artificial dragonflies to find the optimal solution in a search space. DA has been found to perform better than multiple swarm intelligence algorithms in various applications, as we have seen from our work in [28]. However, it still has some limitations, such as a weak exploitation phase [29]. This results in low solution accuracy, susceptibility to local optima, and a low convergence rate.

The Hill Climbing algorithm is a local search optimization algorithm with high exploitation. This is because it always selects a solution that is better than the current solution, that is, one that improves the cost of the objective function, until a local optimum is reached [30]. Hill climbing has been successfully used to increase the effectiveness of various swarm intelligence algorithms, in particular by enhancing their exploitation phase. It has also been used to increase the convergence rate of some swarm intelligence algorithms by accelerating the search process. However, it has not been used in any existing hybrid of DA to enhance the low exploitation of the original dragonfly algorithm.
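To make the mechanism concrete, the following minimal Python sketch of stochastic hill climbing (the step size, the stopping rule, and the sphere objective are illustrative assumptions, not taken from this paper) shows how only improving neighbours are accepted:

```python
import random

def stochastic_hill_climbing(objective, solution, bounds, step=0.1, max_no_improve=20):
    """Minimise `objective` by accepting only improving random neighbours."""
    best_cost = objective(solution)
    stalls = 0
    while stalls < max_no_improve:
        # Perturb one randomly chosen dimension within its bounds.
        neighbour = solution[:]
        d = random.randrange(len(solution))
        neighbour[d] = min(max(neighbour[d] + random.uniform(-step, step), bounds[d][0]), bounds[d][1])
        cost = objective(neighbour)
        if cost < best_cost:          # accept only strictly better moves
            solution, best_cost = neighbour, cost
            stalls = 0
        else:
            stalls += 1
    return solution, best_cost

# Example: minimise the sphere function in two dimensions.
if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(stochastic_hill_climbing(sphere, [3.0, -2.0], [(-5, 5), (-5, 5)]))
```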

In [31], we have proposed the idea of improving the low exploitation of the continuous DA by using the hill climbing algorithm as a local search technique. The continuous version of DA is used for solving continuous optimization problems such as the training of ANNs. A continuous optimization problem means that the state space consists of real values within a specified range, that is, a potential solution can be any real number within a specific range. However, the algorithm was not implemented or applied to any optimization problem in [31]. Since the original dragonfly algorithm has been found to have a higher performance than other swarm intelligence algorithms in various applications [28], it is worth improving its low exploitation phase and applying it to optimization problems.

In this paper, we implement and propose an improved continuous dragonfly algorithm with a better exploitation phase. The exploitation of DA is improved by using the stochastic hill climbing algorithm as a local search technique. Stochastic hill climbing is one of the main types of hill climbing algorithms.

In one related work, swarm intelligence algorithms are used to obtain the connection weights for an ANN so as to increase its prediction accuracy, using a dataset from the UCI Machine Learning Repository [32]. Specifically, the model is employed to identify and predict whether the classes in the object-oriented system are faulty.

To determine the performance of the ANNs trained using the swarm intelligence algorithms, their prediction accuracy and time taken per run are compared to those of an ANN trained by gradient descent. The results show that all four neural networks trained by the swarm intelligence algorithms have a higher prediction accuracy than the ANN trained by gradient descent. The ANNs trained by the firefly algorithm, ACO, ABC, and PSO show average improvements of 18.559%, 28.606%, 38.852%, and 50.75% respectively in fault prediction over the ANN trained by gradient descent. The average time taken by the ANN trained by gradient descent is 21.038 seconds per run, and those trained by firefly, ACO, ABC, and PSO take 22.796 seconds, 5.568 seconds, 56.339 seconds, and 5.235 seconds respectively. Hence, it can be seen that the ANNs trained by ACO and PSO are more efficient than that trained by gradient descent, while those trained by firefly and ABC are less efficient than the ANN trained by gradient descent.

In [34], PSO is used for training an ANN which is then used for the prediction of the load-slip behaviour of channel connectors embedded in normal and high-strength concrete. The ANN consists of an input layer with five neurons, one hidden layer with 10 neurons, and an output layer with one neuron. 70% of the data is used for the training of the ANN, and the other 30% is used for testing the ANN. To assess the performance of the ANN trained by PSO, the root mean squared error (RMSE), Pearson correlation coefficient (r), and coefficient of determination (R²) of the resultant ANN are generated. Its performance is also compared to another ANN of similar architecture which is trained by the Levenberg-Marquardt backpropagation algorithm. In both the training and testing phases, the ANN trained by PSO has higher r and R² values and lower RMSE as compared to the ANN trained by backpropagation, which indicates that the ANN trained by PSO has superior prediction accuracy. In the testing phase, the RMSE value of the ANN trained by PSO is 2.069 while the RMSE value of the ANN trained by backpropagation is 2.569. Hence, the RMSE value of the ANN trained by PSO is 19.5% lower than the RMSE value of the ANN trained by backpropagation.

In [35], the Grasshopper Optimization Algorithm (GOA) and Gray Wolf Optimization (GWO) are used for the training of a neural network so as to increase its accuracy in the estimation of the heating load of residential buildings. The data, which is obtained from the analysis of 768 residential buildings, is randomly split for the training and testing of the ANNs: 80% of the data is utilized for the training phase and 20% is used for the testing phase. Three artificial neural networks are considered: one trained by backpropagation, one trained by GOA (GOA-MLP), and one trained by GWO (GWO-MLP).

In another work, the error of the MLP trained by backpropagation is found to be 0.3230, that of the HHO-MLP is found to be 0.3200, which is a 0.93% decrease, and that of the DA-MLP is found to be 0.2904, which is a 10.09% decrease.

The MLP is then used for three benchmark classification problems, and the MSE values obtained include 0.030228 for the XOR problem; 5.48e-16, 9.38e-15, 0.000585, 5.08e-24, 0.019055, and 2.49e-05 for the balloon classification problem; and 0.114351, 0.122600, 0.188568, 0.188568, 0.192473, and 0.154096 for the heart classification problem respectively. Hence, it can be seen that the MLP trained by INMDA achieves the lowest MSE for the XOR problem, the second-lowest MSE for the balloon classification problem, and the lowest MSE for the heart classification problem.

In [38], the original DA algorithm is used to train an ANN which is used for the classification of brain Magnetic Resonance Images (MRI). The DA algorithm is used to avoid the local optimum problem usually faced by the backpropagation algorithm while training an ANN and to increase the speed of the training process. The neural network consists of seven inputs that represent seven feature vectors, one output that can indicate either a normal or an abnormal brain, and one hidden layer with four nodes. DA is used to optimize the weights of the ANN, and the sensitivity, specificity, and accuracy of the resultant neural network are calculated. The performance of the DA-based ANN is compared to that of a GA-based ANN, a PSO-based ANN, and a backpropagation (BP)-based ANN. The sensitivity of the DA-based ANN, PSO-based ANN, GA-based ANN, and BP-based ANN is found to be 89%, 83%, 82%, and 77% respectively, the specificity is found to be 83%, 72%, 70%, and 68% respectively, and the accuracy is found to be 85%, 80.5%, 80%, and 75% respectively. Since the values for the sensitivity, specificity, and accuracy of the DA-based ANN are higher than those of the other models, this indicates that the DA-based ANN has a better performance.

The dragonfly algorithm makes use of the static and dynamic swarming behaviours of dragonflies during hunting and migration respectively [27]. While hunting, the population of dragonflies divides into small groups that fly back and forth over a small area to hunt prey, which corresponds to the exploration phase. While migrating, the dragonflies form one larger swarm and fly in a single direction over a long distance, which corresponds to the exploitation phase.

During both the exploration and exploitation phases, five factors are used to control the movement of the dragonflies in the search space, namely separation, alignment, cohesion, attraction to food, and distraction from enemy. Each of these factors has a corresponding weight.

Separation is used to avoid static collisions between a dragonfly and its neighbours, and is calculated as follows:

S_i = -\sum_{j=1}^{N} (X - X_j)    (1)

where S_i is the separation of the i-th dragonfly, X is the position of the current dragonfly, X_j is the position of the j-th neighbour, and N is the number of neighbours.

Alignment is used to match the velocity of a dragonfly to that of its neighbours, and is calculated as follows:

A_i = \frac{\sum_{j=1}^{N} V_j}{N}    (2)

where A_i is the alignment of the i-th dragonfly, and V_j is the velocity of the j-th neighbour.

Cohesion is the tendency of a dragonfly towards the centre of mass of the neighbourhood, and is calculated as follows:

C_i = \frac{\sum_{j=1}^{N} X_j}{N} - X    (3)

where C_i is the cohesion of the i-th dragonfly, and X_j is the position of the j-th neighbour.

Attraction to food is used to attract a dragonfly towards the food source, which is taken as the best position obtained by the population, and is calculated as follows:

F_i = X^+ - X    (4)

where F_i is the food attraction of the i-th dragonfly and X^+ represents the position of the food source.

Distraction from enemy is used to distract a dragonfly away from the enemy, which is taken as the worst position obtained by the population, and is calculated as follows:

E_i = X^- + X    (5)

where E_i is the enemy distraction of the i-th dragonfly and X^- represents the position of the enemy.
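Assuming positions and velocities are stored as NumPy arrays, a minimal sketch of the five factors of equations (1)-(5) could look like the following; the function and variable names are ours, not from [27]:

```python
import numpy as np

def behaviour_factors(X, neighbours_X, neighbours_V, food, enemy):
    """Separation, alignment, cohesion, food attraction and enemy distraction,
    eqs. (1)-(5), for a dragonfly at position X (neighbour arrays are non-empty)."""
    S = -np.sum(X - neighbours_X, axis=0)   # eq. (1): avoid collisions with neighbours
    A = np.mean(neighbours_V, axis=0)       # eq. (2): match the neighbours' mean velocity
    C = np.mean(neighbours_X, axis=0) - X   # eq. (3): move towards the neighbourhood centre
    F = food - X                            # eq. (4): attraction towards the food source
    E = enemy + X                           # eq. (5): distraction away from the enemy
    return S, A, C, F, E
```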

In order for the dragonflies to move in the search space, a step vector (\Delta X) is used, which combines the five factors and is calculated as follows:

\Delta X_{t+1} = (s S_i + a A_i + c C_i + f F_i + e E_i) + w \Delta X_t    (6)

where s, a, c, f, and e are the weights of the separation, alignment, cohesion, food, and enemy factors respectively, w is the inertia weight, and t is the iteration counter.

The position vector allows the dragonflies to move in the search space by updating their positions using:

X_{t+1} = X_t + \Delta X_{t+1}    (7)

In DA, the neighbourhood of the dragonflies is an important aspect. This is considered by assuming a radius around each dragonfly. The radius is incremented proportionally to the iteration counter to enable the transition from the exploration to the exploitation phase, thereby changing the static swarms of the early iterations into dynamic swarms. During the final iterations, the whole population forms one dynamic swarm and converges to the global optimal solution. Another way by which the algorithm transitions from exploration to exploitation is by adaptively tuning the weights of the different factors.
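A brief sketch of the step-vector and position updates of equations (6) and (7), together with a neighbourhood radius that grows with the iteration counter; the specific growth schedule shown is an illustrative assumption:

```python
import numpy as np

def da_move(X, dX_prev, factors, weights):
    """One DA position update from the five factors, eqs. (6) and (7)."""
    S, A, C, F, E = factors
    s, a, c, f, e, w = weights                                  # factor weights and inertia weight
    dX = s * S + a * A + c * C + f * F + e * E + w * dX_prev    # eq. (6): step vector
    return X + dX, dX                                           # eq. (7): new position

def neighbourhood_radius(search_range, iteration, max_iter):
    """Radius that grows with the iteration counter, turning static swarms into dynamic ones."""
    return search_range / 4 + 2 * search_range * (iteration / max_iter)
```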

If a dragonfly has no neighbours at some point, its position is updated using the Lévy flight mechanism. This is a random walk that increases the randomness of the algorithm. The position vector used is:

X_{t+1} = X_t + \text{Lévy}(d) \times X_t    (8)

where t and d represent the current iteration number and the dimension of the position vectors respectively.

The Lévy flight mechanism is calculated using (9):

\text{Lévy}(x) = 0.01 \times \frac{r_1 \times \sigma}{|r_2|^{1/\beta}}    (9)

where r_1 and r_2 are random numbers between 0 and 1, and \beta is a constant which is taken as 1.5 [27]. \sigma is calculated using (10):

\sigma = \left(\frac{\Gamma(1+\beta) \times \sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{(\beta-1)/2}}\right)^{1/\beta}    (10)

where \Gamma(x) = (x - 1)!. The pseudocode of DA is given in Fig. 2.

Specifically, lines one to ten of Algorithm 2 are integrated after line 11 of Algorithm 1 so as to further update the position found by equation (7). The position found by equation (7) is taken as the initial solution for the hill climbing algorithm, that is, it is taken as the 'current position' in line one of Algorithm 2. The final 'current position' obtained from Algorithm 2 is then taken as the new position of the dragonfly in the optimized DA algorithm.

Hill climbing is not applied after the position of the dragonflies is updated using equation (8), since that equation is used for exploration of the search space rather than exploitation. Moreover, equation (8) makes use of the Lévy flight mechanism to update the position of dragonflies which have no neighbours. This is a random walk which updates the position of the dragonfly in a stochastic manner. Hence, the dragonfly may not be in a good region of the search space, and there is no need to exploit that region using the hill climbing algorithm.

Employing the hill climbing algorithm as a local search technique after updating the positions of the dragonflies using equation (7) allows the dragonflies to move to better positions within the area that has been reached by DA. This is because the hill climbing algorithm starts at the position obtained by DA and only updates it to a better one, until no further improvement can be found. Hence, the exploitation phase of DA is improved, which increases the effectiveness of the dragonfly algorithm, that is, better solutions are obtained compared to the original dragonfly algorithm. The pseudocode of the proposed algorithm is given in Fig. 4.

The optimization loop is repeated until the end criterion is met, that is, when the maximum number of iterations is reached.
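To illustrate the integration described above, the sketch below refines the position produced by equation (7) with stochastic hill climbing before it is assigned back to the dragonfly; the neighbour-generation scheme and parameter values are illustrative assumptions rather than the exact steps of Algorithm 2:

```python
import numpy as np

def hill_climb(objective, position, lb, ub, step=0.05, max_no_improve=10):
    """Stochastic hill climbing started from the position produced by eq. (7)."""
    best, best_cost = position.copy(), objective(position)
    stalls = 0
    while stalls < max_no_improve:
        neighbour = np.clip(best + np.random.uniform(-step, step, best.shape), lb, ub)
        cost = objective(neighbour)
        if cost < best_cost:                      # accept only improving neighbours
            best, best_cost, stalls = neighbour, cost, 0
        else:
            stalls += 1
    return best

def optimized_da_update(X, dX, objective, lb, ub):
    """Optimized DA: the position from eq. (7) is refined by local search before it is kept."""
    candidate = np.clip(X + dX, lb, ub)           # eq. (7): position found by DA
    return hill_climb(objective, candidate, lb, ub)
```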

In this section, we provide a description of how the optimized DA algorithm is used for the training of feedforward neural networks which are applied to classification problems using the iris, balloon, glass, and breast cancer datasets from the UCI Machine Learning Repository [32].

Firstly, the data consisting of the inputs and the targets is loaded and split: 70% of the data is used for the training of the ANN and 30% is used for testing the ANN.
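As an illustration of this step for the iris dataset, a minimal sketch using scikit-learn; the use of scikit-learn and the fixed random seed are our assumptions, while the 70/30 split follows the paper:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the inputs and targets, then hold out 30% of the instances for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```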

Secondly, the feedforward neural network is constructed. The architecture of the ANNs used for the different datasets is described in Section VII-A. An example of the architecture used for the iris dataset is shown in Fig. 5.

After constructing the network, the total number of parameters, that is, the total number of weights and biases to be optimized during the training process, is determined using formula (11). This number is used as the dimension of the optimized DA algorithm, since the set of all the connection weights and biases needs to be optimized. One set of weights and biases represents the position of one dragonfly in the optimized DA.

(i \times n) + (n \times o) + n + o    (11)
where i, n, and o represent the input size, the number of hidden neurons, and the output size respectively.
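For example, the 4-3-3 network used for the iris dataset has (4 × 3) + (3 × 3) + 3 + 3 = 27 parameters. A small helper reflecting formula (11); the function name is ours:

```python
def num_parameters(i, n, o):
    """Total number of weights and biases of a single-hidden-layer MLP, formula (11)."""
    return (i * n) + (n * o) + n + o

# Iris architecture used later (4 inputs, 3 hidden neurons, 3 outputs): 27 parameters,
# so each dragonfly's position is a 27-dimensional real vector.
assert num_parameters(4, 3, 3) == 27
```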

The optimized DA algorithm is then employed to obtain the optimal set of connection weights and biases for the neural network. This step is the training stage of the ANN.
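A possible sketch of the fitness evaluation performed during this stage, in which a dragonfly's position vector is decoded into the network's weights and biases and scored by the training RMSE; the decoding order, the sigmoid activation, and the function names are our assumptions:

```python
import numpy as np

def decode(position, i, n, o):
    """Split one dragonfly's position vector into the MLP weight matrices and bias vectors."""
    p = np.asarray(position, dtype=float)
    W1 = p[:i * n].reshape(i, n)                       # input -> hidden weights
    W2 = p[i * n:i * n + n * o].reshape(n, o)          # hidden -> output weights
    b1 = p[i * n + n * o:i * n + n * o + n]            # hidden biases
    b2 = p[i * n + n * o + n:]                         # output biases
    return W1, b1, W2, b2

def fitness(position, X, y_onehot, i, n, o):
    """Training RMSE of the MLP encoded by `position`; the quantity DA minimises."""
    W1, b1, W2, b2 = decode(position, i, n, o)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return float(np.sqrt(np.mean((y_onehot - output) ** 2)))
```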

After training the ANN, it is tested using the testing dataset.

The RMSE of the resultant neural network is calculated using (12) and its accuracy is calculated using (13).

The attributes of the glass are used to identify the type of glass, specifically whether the glass is float processed or not.

The breast cancer dataset consists of 569 instances, 10 attributes, and two classes. The attributes consist of features of a cell nucleus, and the aim is to classify whether it is malignant or benign.

The optimized DA algorithm is employed for the training of feedforward neural networks as described in Section VI, using the iris dataset, the balloon dataset, the glass dataset, and the breast cancer dataset.

For all four datasets, 70% of the data is used for the training of the ANN and 30% is used for testing the ANN.

For the iris dataset, the architecture of the neural network used is one input layer with four neurons, one hidden layer with three neurons, and one output layer with three neurons. To determine the number of neurons in the hidden layer, different ANNs are trained using the gradient-descent algorithm by changing the number of neurons in the hidden layer from one to five. The number of neurons which results in the lowest average RMSE and the highest average accuracy is then chosen; a code sketch of this sweep is given below. The average RMSE and accuracy obtained when the different ANNs are trained using gradient descent are shown in Table 2. Since the lowest average RMSE and highest average accuracy are obtained when the number of neurons in the hidden layer is three, this architecture is chosen for the ANN.

For the balloon dataset, the architecture of the neural network used is one input layer with four neurons, one hidden layer with three neurons, and one output layer with one neuron. For the glass dataset, the architecture is one input layer with nine neurons, one hidden layer with three neurons, and one output layer with one neuron, and for the breast cancer dataset, the architecture is one input layer with nine neurons, one hidden layer with three neurons, and one output layer with one neuron.
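A sketch of the hidden-neuron sweep for the iris dataset, using scikit-learn's MLPClassifier as a stand-in for the gradient-descent-trained ANN; the solver, the number of repetitions, and the averaging are our assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Sweep one to five hidden neurons and report the mean test accuracy over a few runs.
for h in range(1, 6):
    accs = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000, random_state=r)
            .fit(X_tr, y_tr).score(X_te, y_te) for r in range(5)]
    print(h, round(float(np.mean(accs)), 3))
```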

For all four datasets, the performance of the optimized DA algorithm in training an ANN is compared to that of the original DA algorithm. This is done by training similar ANNs, with the same number of layers and neurons, using the original DA algorithm. The two neural networks trained by the optimized DA and the original DA are then compared in terms of the final RMSE of the neural network during the training phase, the RMSE of the resultant neural network during the testing phase, the accuracy of the resultant neural network, and the time taken by the algorithms.

In order to have a fair comparison, the optimized DA is also used to train ANNs with the same architecture as in [37], [47], and [48]. Hence, the number of neurons in the hidden layer is changed to nine for the iris dataset and the balloon dataset, and to 19 for the glass and breast cancer datasets. More experiments are then conducted by using the optimized DA to train ANNs with the mentioned architectures, and the results are recorded and compared to those obtained by other swarm intelligence algorithms.

Tables 3, 4, 5, and 6 show the results obtained when the optimized DA and the original DA are used for training ANNs using the iris dataset, the balloon dataset, the glass dataset, and the breast cancer dataset.

For the iris dataset, the following architecture is used for the neural network: one input layer with four neurons, one hidden layer with three neurons, and one output layer with three neurons. For the balloon dataset, the architecture used for the neural network is: one input layer with four neurons, one hidden layer with three neurons, and one output layer with one neuron. For the glass dataset, the following architecture is used: one input layer with nine neurons, one hidden layer with three neurons, and one output layer with one neuron. For the breast cancer dataset, the architecture used is: one input layer with nine neurons, one hidden layer with three neurons, and one output layer with one neuron.

The numbers of search agents used are five and 10, and the maximum numbers of iterations used are 10 and 20. The results are shown in Tables 3, 4, 5, and 6, and in Fig. 6, 7, 8, and 9.

From Tables 3, 4, 5, and 6, and from Fig. 6, 7, 8, and 9, it can be deduced that the optimized DA algorithm has a better performance as compared to the original DA in terms of effectiveness, that is, it is able to better optimize the connection weights and biases of the neural networks during the training phase, which leads to better accuracy of the resultant neural network.

From Tables 3, 4, 5, and 6, it can be seen that the RMSE obtained when the ANN is trained using the optimized DA is lower than that obtained when the ANN is trained using the original DA.

From Fig. 6, it can be seen that the optimized DA algorithm converges to the optimal solution at around iteration 15 while the original DA converges at around iteration 18. Moreover, it can be seen that the optimized DA converges to much better solutions than the original DA, as the value of the objective function, that is, the RMSE of the neural network, is much lower.

In order to have a fair comparison, the optimized DA algorithm is used for training ANNs with the same architecture as the other works in [37], [47], and [48] for the four datasets. The results are compared in terms of the Mean Square Error (MSE) obtained during the training phase and the accuracy obtained during the testing phase.

From Tables 7, 8, 9, and 10, it can be deduced that the proposed optimized DA algorithm has a higher effectiveness as compared to multiple other swarm intelligence algorithms in training artificial neural networks. For the iris, glass, and breast cancer datasets, the ANN trained by the proposed optimized DA achieves a higher accuracy than all other swarm intelligence algorithms used to train ANNs in [37], [47], and [48]. For the balloon dataset, the accuracy obtained by

The training of artificial neural networks is a crucial process, as it is a requisite step in order to be able to use neural networks. This process primarily allows the neural network to learn how to generate the correct output based on the inputs provided, thus enabling the neural network to be used for various tasks such as classification and regression. Conventional algorithms used for training ANNs, such as the backpropagation algorithm, have some limitations, such as being trapped in local optima, and hence they are unable to find the optimal connection weights for the neural network. Recently, the use of swarm intelligence algorithms to train ANNs has been increasing owing to their high exploration and exploitation capabilities.

In this paper, an optimized dragonfly algorithm is proposed and used as a training algorithm for feedforward neural networks which are applied to benchmark classification problems, namely the iris, balloon, glass, and breast cancer classification problems. The performance of the dragonfly algorithm is improved by overcoming its low exploitation phase. This is achieved by using the stochastic hill climbing algorithm as a local search technique.

From the experimental results, it can be deduced that the optimized DA algorithm has a better performance in training ANNs as compared to the original DA and multiple other swarm intelligence algorithms. The RMSE of the ANNs trained by the optimized DA is found to be lower than the RMSE of the ANNs trained by the original DA for both the training and testing phases. The classification accuracy of the ANN trained by the optimized DA is also higher than that of the ANN trained by the original DA. Moreover, the ANNs trained by the proposed algorithm have higher accuracy as compared to those trained by multiple other swarm intelligence algorithms.

For future work, the ANN trained by the optimized DA algorithm can be applied to more classification datasets and also to regression datasets, so as to use it for regression problems in addition to classification problems. Moreover, it can be used for real-world applications with real-world datasets instead of benchmark datasets. For example, the ANN trained by the optimized DA can be used as a prediction system in smart cities, and for channel estimation in optical systems [49].