Adaptive Ant Colony Optimization With Node Clustering for the Multidepot Vehicle Routing Problem

This article deals with the novel metaheuristic algorithm based on the ant colony optimization (ACO) principle. It implements several novel mechanisms that improve its overall performance, lower the optimization time, and reduce the negative behavior which is typically connected with ACO-based algorithms (such as prematurely falling into local optima, or the impact of setting of control parameters on the convergence for different problem configurations). The most significant novel techniques, implemented for the first time to solve the multidepot vehicle routing problem (MDVRP), are as follows: 1) node clustering where transition vertices are organized into a set of candidate lists called clusters and 2) adaptive pheromone evaporation which is adapted during optimization according to the diversity of the population of ant solutions (measured by information entropy). Moreover, a new termination condition, based also on the population diversity, is formulated. The effectiveness of the proposed algorithm for the MDVRP is evaluated via a set of experiments on 23 well-known benchmark instances. Performance is compared with several state-of-the-art metaheuristic methods; the results show that the proposed algorithm outperforms these methods in most cases. Furthermore, the novel mechanisms are analyzed and discussed from points of view of performance, optimization time, and convergence. The findings achieved in this article bring new contributions to the very popular ACO-based algorithms; they can be applied to solve not only the MDVRP, but also, if adapted, to related complex NP-hard problems.



Petr Stodola and Jan Nohel

I. INTRODUCTION
The multidepot vehicle routing problem (MDVRP) is a well-known NP-hard combinatorial optimization problem related to the famous traveling salesman problem (TSP), formulated at Princeton University in the 1930s [1]. The problem has been applied in many domains of human activity such as logistics, transportation, distribution, navigation, and the military [2].

The authors are with the Department of Intelligence Support, University of Defence, 66210 Brno, Czech Republic (e-mail: petr.stodola@unob.cz; jan.nohel@unob.cz).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TEVC.2022.3230042.

A. Problem Formulation
The problem is formulated as follows. Let D = {d_1, d_2, ..., d_m} be a set of vehicles (depots) of size m, and C = {c_1, c_2, ..., c_n} be a set of customers of size n. Let G be a graph whose vertices (nodes) comprise all the customers and vehicles: V = D ∪ C = {d_1, ..., d_m, c_1, ..., c_n} = {v_1, v_2, ..., v_{n+m}}. The aim of the problem is to serve all the customers by the vehicles (each customer must be served exactly once by exactly one vehicle) so that the total distance traveled is as small as possible.
Route R_k for vehicle d_k ∈ D represents a sequence of customers to be served in the exact order: R_k = (r^k_1, r^k_2, ..., r^k_{n_k}), where r^k_j ∈ V, and n_k is the number of vertices to be visited by vehicle d_k including its depot (r^k_1 = r^k_{n_k} = d_k). The solution to the problem is a sequence of routes: R = (R_1, R_2, ..., R_m). The objective function is

  minimize |R| = Σ_{k=1}^{m} Σ_{j=1}^{n_k − 1} dist(r^k_j, r^k_{j+1}).    (1)

Several constraints must be satisfied. Constraint (2) ensures that all customers are served, constraint (3) forces each vehicle to start and end its route at its depot, and constraint (4) ensures that each customer is served exactly once:

  {r^k_j | r^k_j ∈ C, 1 ≤ k ≤ m, 1 ≤ j ≤ n_k} = C    (2)
  r^k_1 = r^k_{n_k} = d_k  for all d_k ∈ D    (3)
  r^k_j ≠ r^{k'}_{j'}  for all (k, j) ≠ (k', j') with r^k_j, r^{k'}_{j'} ∈ C.    (4)

Moreover, each customer demands a certain amount of goods [marked r^k_j(demand)], and vehicles in depots have a maximum capacity [marked d^(capacity)_k] which cannot be exceeded. This means that a vehicle must return to its depot (thus emptying its load) if its capacity would be exceeded by visiting the next customer on its route; after that, the vehicle can continue to serve the remaining customers on the route. Route R_k may thus be composed of more than one consecutive subsequence R^i_k ⊆ C comprised of customers only (i.e., any part of the route without returning to the depot). Then, capacity constraint (5) must be satisfied:

  Σ_{r ∈ R^i_k} r^(demand) ≤ d^(capacity)_k  for every subsequence R^i_k of every route R_k.    (5)
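As a concrete illustration of the objective and capacity constraint (5), the following sketch evaluates the length of one vehicle's route, with a forced return to the depot whenever the next customer's demand would exceed the remaining capacity. The function and variable names are illustrative, not taken from the paper:

```python
def route_length(depot, customers, demands, capacity, dist):
    """Length of one vehicle's route: visit customers in order, returning
    to the depot to empty the load whenever the next demand would exceed
    the remaining capacity; the route starts and ends at the depot."""
    length, load, pos = 0.0, 0.0, depot
    for c in customers:
        if load + demands[c] > capacity:      # forced return to empty the load
            length += dist(pos, depot)
            pos, load = depot, 0.0
        length += dist(pos, c)
        pos, load = c, load + demands[c]
    return length + dist(pos, depot)          # final return (r_1 = r_{n_k} = d_k)
```

The total objective |R| is then simply the sum of `route_length` over all vehicles.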

B. Contributions
In this article, a novel metaheuristic algorithm based on ant colony optimization (ACO) theory is proposed for the MDVRP. This algorithm implements several unique techniques which improve its performance and reduce the negative behavior typically connected with ACO-based methods (particularly the impact of the setting of control parameters on performance in connection with the graph configuration, and the tendency to fall prematurely into a local optimum).
The algorithm presented in this article extends the original ACO algorithm proposed by Stodola in [3]. Some of the key techniques implemented in the original are as follows.

1) Stochastic selection of depots before the node transitions based on the pheromone potential.
2) Hybridization with the simulated annealing principle; this is used when determining a solution from the population to update the pheromone matrix.
3) Modified k-Opt local search optimization applied not only to individual routes in the solution, but also to a pair of routes (based on a mutual exchange of nodes).

This algorithm has been extended by three novel techniques which further improve its performance. These techniques, originally developed by the authors for the TSP in [4], have been modified and adapted for the MDVRP, and integrated into the algorithm. They are as follows.
1) Node Clustering Principle: For every vertex, candidate lists of transition vertices (called clusters) are created. The vertices in clusters are organized based on the probability that they will be part of the optimal solution; the higher the probability, the more significant the cluster. The probability is influenced not only by the distance between vertices; the graph configuration is also considered using the sectorization principle. Node clustering does not restrict the pheromone attraction principle to a small set of vertices in a single candidate list, but gives access to the complete set of transition vertices.

2) Adaptive Pheromone Evaporation: Adaptive pheromone evaporation has been proposed because the right speed of evaporation is one of the most critical factors influencing the performance of the algorithm. The speed of evaporation reflects the current optimization progress, measured by the information entropy expressing the diversity of solutions in the population.

3) Entropy-Based Termination Condition: A new termination condition has been formulated based on the information entropy. The main idea is that when the entropy is low (i.e., solutions in the population are the same or similar to one another), the probability of improving the best solution is also low, and thus the optimization can be terminated.

Based on these new techniques, the proposed algorithm is called the adaptive ACO with node clustering (AACO-NC).

C. Motivation
The authors of this article chose ACO-based methods because these have proved to be very successful in solving discrete optimization problems [5], [6]. Several novel techniques have been proposed and verified. The practical use of the AACO-NC in a real software application (see details below) is based on the transformation of various real-world problems into the MDVRP.
The AACO-NC has been integrated into the tactical decision support system (TDSS) as a key optimization method. This system is being developed at the University of Defence, Czech Republic, as part of long-term research. The aim is to support military commanders of the Czech Armed Forces on the battlefield in their decision making [7]. The TDSS is composed of a set of models of military tactics. Commanders can use this system to plan their operations when one of the models is compatible with the task at hand. The system plans variants of how to solve the task along with their second-order effects, and presents them to the commanders, who then decide the next course of action (they can choose one of the variants, which is immediately transmitted to subordinate soldiers or robotic systems for execution). More information about this topic can be found in [8], [9], [10], [11], [12], [13], [14], and [15]. Some of the models of military tactics in which the proposed algorithm is implemented are as follows.
1) Cooperative reconnaissance of the area of intelligence responsibility via a swarm of unmanned aerial systems (UASs). The algorithm is used to plan trajectories of individual cooperating UASs so that the reconnaissance is as fast and complete as possible.
2) Cooperative reconnaissance of the area of intelligence responsibility using a group of ground elements (soldiers or unmanned ground systems).
3) Persistent surveillance of an area or object of interest via a swarm of UASs.
4) Logistics to units on the battlefield via a group of supply vehicles.

D. Article Organization
This article is further organized as follows. Section II reviews the literature, particularly with respect to the novel improvements. In Section III, the proposed algorithm is presented with a focus on the novel techniques. Section IV shows the experimental results on a set of benchmark instances and compares the performance of the AACO-NC with other state-of-the-art methods. The analysis and discussion of the impact of the novel techniques on the performance and behavior of the algorithm are in Section V. Finally, Section VI concludes this article and suggests future work for the authors.

II. LITERATURE REVIEW
The MDVRP is a well-known NP-hard problem that attracts the full attention of researchers. Since it was formulated in 1959 [16], many different methods have been proposed, both exact and stochastic. The most recent survey of the MDVRP [2] discusses mathematical models, state-of-the-art solution methods, and real-life applications.
This section is organized as follows. First, some examples of metaheuristic algorithms used for the MDVRP (or related problems) are introduced in Section II-A. The bioinspired algorithms follow in Section II-B as a strong class of stochastic methods. Then, ACO-based methods are presented in Section II-C with a focus on modern techniques and mechanisms integrated to enhance the overall performance. Finally, Sections II-D and II-E aim at the current state of research in the areas related to the key contributions of this article: node transition and adaptive parameter control. No section of this size can provide a complete overview of the scientific literature; therefore, only state-of-the-art studies and/or key studies related in some aspect to this work are included.

A. Metaheuristic Algorithms
Metaheuristic algorithms have proved to be very successful for constrained combinatorial problems. Genetic algorithms have often been used for the MDVRP; Singh et al. [17] proposed an improved genetic algorithm with various crossover operators and preassignment of customers to their nearest depots. A genetic algorithm combined with the enhanced ordered distance vector (ODV) for population initialization is introduced in [18]. Another popular approach is the variable neighborhood search (VNS); the approach in [19] integrates biased-randomization strategies within a VNS framework in order to better guide the search process. The variable tabu neighborhood search (VTNS) algorithm is proposed in [20]; it combines the VNS with a tabu shaking mechanism in the diversification phase of the search. The imperialist competitive algorithm (ICA) has also often been used for combinatorial problems. Dzalbs et al. [21] proposed the ICA with independence and constrained assimilation (ICA-ICA), which combines the classical ICA assimilation and revolution operators while maintaining population diversity.

B. Bio-Inspired Algorithms
In recent years, bio-inspired methods have become very popular and widely used for combinatorial problems. Rapanaki et al. [22] proposed a variant of the Artificial Bee Colony algorithm for the Multiobjective Energy Reduction MDVRP. The Enhanced Firefly Algorithm, which improves the solution quality by using the Clarke and Wright savings algorithm and a local search technique for the Capacitated Vehicle Routing Problem, is put forth in [23]. A hybrid nature-inspired heuristic algorithm integrating the Clarke and Wright savings algorithm, the Sweep Algorithm, and Multiobjective Particle Swarm Optimization to solve the Multidepot Green VRP is proposed in [24].

C. ACO Algorithms With Various Improvement Techniques
ACO is one of the most popular metaheuristic optimization techniques, inspired by the behavior of ants when searching for food. It has been applied to a wide range of combinatorial optimization problems. These algorithms often combine the basic ACO principle with particular techniques and mechanisms to enhance performance and to reduce the negative effects typically connected with ACO algorithms. Some of the most recent research on ACO-based algorithms applied to the MDVRP or related problems includes [25] (ACO with an improved approach to updating the pheromone matrix), [26] (clustering nodes into a desired number of groups using the k-means clustering algorithm, then using ACO to generate routes for each cluster), [27] (ACO with a scanning strategy and crossover operation), [28] (ACO with two distinct types of ants, the first for assigning customers to depots, the second for generating routes), and [29] (ACO based on a polygonal circumcenter used for assigning customers to depots). Many ACO algorithms are also complemented by some local search optimization technique, mostly k-Opt local search, to enhance the overall performance, e.g., [30], [31], and [32].

D. ACO Algorithms-Node Transition Techniques
The key phase of ACO-based algorithms is the sequential transition of nodes into ant solutions. In the original approach, any node that is not yet part of the solution can be selected and inserted. The probability of selecting a node typically depends on the heuristic information (inverse distance between the last inserted node and this node) and the strength of the pheromone trail. These probabilities must be calculated for all the free nodes, which leads to a quadratic dependence on the graph size. The idea of restricting the set of nodes for selection is therefore logical, not only to reduce the computational complexity, but also to increase the overall performance and decrease the memory requirements [33]. The restricted set of nodes for selection is typically called a candidate set or a candidate list.
In the MDVRP and related graph theory problems, a single candidate list typically has a fixed size, and the nodes are selected using the nearest neighbor principle (the general assumption is that local transitions lead to good solutions), i.e., the nodes closest to the last customer in the solution are inserted into the candidate list [34]. An example of this approach is in [35], where the authors developed a hybrid ACO algorithm to solve the open vehicle routing problem. In the transition phase, the customers are selected from candidate lists comprised of the closest customers; the size of the candidate list is limited to some portion of the total number of customers. Another example is in [36]; the authors proposed the vectorized candidate set selection technique (VCSS) for a parallel ACO algorithm, where the candidate list is generated from a set of the nearest free neighbors using the roulette wheel principle. A different approach is used in [37]; instead of creating a list of candidate nodes, the pheromone matrix is restricted to a constant number of nearest neighbors for each node. All these approaches need some strategy to select a customer when there is no free node in the candidate list. The easiest way is to select the nearest free customer. However, a more advanced procedure is proposed in [37]: a pheromone value is mapped to every edge which is part of the best route; the node with the highest pheromone value assigned to the edge adjacent to the last node in the route is then inserted.
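The classical fixed-size candidate list described above can be sketched in a few lines (a generic illustration, not code from any of the cited works):

```python
def nearest_candidates(v, nodes, dist, size):
    """Fixed-size candidate list: the `size` nodes nearest to v, following
    the nearest neighbor assumption that local transitions lead to good
    solutions. Nodes are returned in ascending order of distance."""
    return sorted((u for u in nodes if u != v), key=lambda u: dist(v, u))[:size]
```

An ant restricted to such a list still needs a fallback strategy (e.g., the nearest free customer) once every candidate has been visited, which is exactly the limitation discussed above.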
The node clustering principle introduced in this article brings novelty in two ways. First, there is not a single candidate list but a set of candidate lists comprising the complete number of vertices. The selection of a vertex for transition is performed in two phases: first, a candidate list from the set is determined; then, a vertex from this list is selected. In both phases, the probability of selecting either a list or a vertex depends on the heuristic information and the intensity of the pheromone trails. Thus, the principal strength of the ACO is preserved for the full set of vertices. Moreover, this approach does not need a special strategy if there is no free vertex inside a single candidate list. The second enhancement is that the vertices are not organized in clusters strictly using the nearest neighbor approach; the graph configuration is also considered using the sectorization principle.

E. ACO Algorithms-Adaptive Parameters Control
Some research has recently been conducted in the area of adaptivity of ACO-based algorithms in order to reduce negative effects, such as falling into local optima or the influence of the setting of control parameters on performance for different graph configurations. A nucleolus game learning strategy in a set of ant colonies was proposed in [38] to improve the population diversity; the idea is to share the pheromone distribution among these colonies when the algorithm stalls. Meng et al. [39] proposed an adaptive stagnation avoidance strategy in which a cooperative game model based on the population diversity determines the selection of the appropriate pheromone matrix for different colonies.
The adaptive approach to evolving control parameters has been addressed in several research articles. The alpha and beta control parameters are adjusted in [40] using a set of rules based on exploring the parameter and fitness landscape during the search. Li et al. [41] used a greedy strategy for the dynamic change of the alpha and beta control parameters to accelerate the convergence speed linearly with iterations. The linear change of the alpha and beta control parameters and, in addition, the evaporation coefficient is introduced in [42].
The diversity of the population of ants is measured in some studies by information entropy [39], [43]. However, to the best knowledge of the authors of this article, information entropy has not yet been used for the adaptive evolution of control parameters (particularly the evaporation coefficient) as it is used in this article. With this principle, the convergence speed is controlled via the speed of the pheromone evaporation according to the current state of the population diversity (greater diversity enables faster convergence, and vice versa).

III. ALGORITHM
This section is aimed at the key principles of the proposed AACO-NC algorithm. The algorithm is inspired by the behavior of ants in nature when searching for food, and it implements several techniques which improve its performance and behavior for the MDVRP.
Fig. 1 presents the key phases of the AACO-NC in pseudocode. The input of the algorithm is a set of graph vertices V (customers and depots) and the following control parameters.

1) n_ants: number of ants in colonies.
2) n_freq: frequency of the local optimization.
3) n_prim: number of primary clusters.
4) n_size: number of vertices in clusters.

The algorithm works with a number of ant colonies equal to the number of depots (one colony is located at each depot); ants starting from these colonies represent vehicles, and their movements represent routes in a solution. There is the same number of ants in each colony, controlled by parameter n_ants. A pheromone matrix is created for each colony. In the initial phase of the algorithm (point 3), the pheromone matrices are initialized using (6). Also, clusters for every vertex of the graph are created (points 4 and 5); see Section III-A for details.

The algorithm runs in iterations (points 6 to 19). In each iteration, solution R_a is found (point 10) for every set of ants (one ant from each colony belongs to this set) using the novel node clustering principle (see Sections III-A and III-B for details). The best solution found in an iteration is labeled R_best (points 11 and 12). Then, the local optimization process may be applied to solution R_best; it is applied with the frequency given by control parameter n_freq (points 13 and 14). The local optimization uses the modified k-Opt local search algorithm for individual routes in the solution, but also for the mutual exchange of vertices between any pair of routes. More details about the local optimization can be found in [3].

At the end of each iteration, the pheromone matrices are updated (point 17) and then evaporated (point 19); see Section III-C for details. The speed of the evaporation is controlled via the pheromone evaporation coefficient ρ, which is set adaptively in the range (ρ_min, ρ_max) in each iteration (point 18); see Section III-D. The best solution found so far, R, is stored (points 14 and 15) in each iteration and returned (point 20) when the algorithm is terminated (see Section III-E for termination conditions).

A. Node Clustering Principle
Node clustering is the key principle improving the performance of the algorithm. This principle is used in the phase of creating a solution R_a, when searching for the next node to be inserted into one of the routes of the vehicles (ants).
A cluster is a set of vertices of a defined size n_size. For each vertex v_i, a set of clusters K(v_i) = {K_1(v_i), K_2(v_i), ...} is created independently so that together they contain all the customers in the graph except vertex v_i itself. Vertex v_i, for which clusters K(v_i) are created, is then referred to as the cluster vertex for these clusters.
Customers are inserted into clusters based on the probability of being the neighboring vertices of cluster vertex v_i on the optimal route. Customers closer to the cluster vertex are more probable, so they are placed into clusters with lower indexes [the closest vertices are placed into the first cluster K_1(v_i)]. However, based on the topology of the problem at hand, even customers relatively distant from the cluster vertex may be the neighboring vertices on the optimal route. This may occur when the problem is composed of several separate groups of customers. For this reason, a simple algorithm for creating clusters was proposed which takes this possibility into account. This algorithm can be used when the positions of vertices in their space representation are known (Euclidean or geographical).
The algorithm for creating clusters is in Fig. 2. The principle lies in dividing the space into a number of sectors n_sect (1 ≤ n_sect ≤ n_size). In the first phase (points 4 to 7), the closest vertex in each sector is inserted into the first cluster. In the next phase (points 8 to 14), the first cluster is completed with other vertices closest to v_i (up to n_size if n_size > n_sect), and the other clusters are gradually filled with the remaining vertices; the ones closer to v_i lie in clusters with lower indexes.
The first n_prim clusters of K(v_i) for each cluster vertex v_i are called primary clusters. The vertices sequentially inserted into routes of vehicles are preferentially selected from the primary clusters. See more details in the next section.
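A minimal sketch of the sectorization idea follows (one possible reading of Fig. 2, which is not reproduced here; the tie-breaking details and names are assumptions):

```python
import math

def build_clusters(vi, nodes, coords, n_size, n_sect):
    """Sector-based cluster creation (sketch): the first cluster takes the
    nearest node from each angular sector around v_i, then is topped up
    with the globally nearest remaining nodes; further clusters are filled
    in distance order, n_size nodes each."""
    others = [u for u in nodes if u != vi]
    d = lambda u: math.dist(coords[vi], coords[u])
    def sector(u):
        ang = math.atan2(coords[u][1] - coords[vi][1],
                         coords[u][0] - coords[vi][0])
        return int((ang + math.pi) / (2 * math.pi) * n_sect) % n_sect
    first = []
    for s in range(n_sect):                  # closest vertex in each sector
        in_s = [u for u in others if sector(u) == s]
        if in_s:
            first.append(min(in_s, key=d))
    rest = sorted((u for u in others if u not in first), key=d)
    while len(first) < n_size and rest:      # complete the first cluster
        first.append(rest.pop(0))
    clusters = [first]
    while rest:                              # remaining clusters, closer first
        clusters.append(rest[:n_size])
        rest = rest[n_size:]
    return clusters
```

With a distant but isolated group of customers, the sector pass guarantees that its nearest member still lands in the first cluster, which is the point of the sectorization principle.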

B. Ant Solution Using the Node Clustering Principle
Fig. 3 presents the algorithm for generating a solution for a set of ants using the node clustering principle. Routes R_d for individual vehicles at depots d ∈ D are sequentially generated in a loop (points 6 to 17). Each vehicle starts from its depot (points 2 and 3). Set V_free (point 1) contains all the customers not visited by any vehicle so far. The last vertex (customer or depot) inserted into each route is tracked continuously. After a depot is selected, one of the clusters of the last vertex on its route is chosen (point 8); a vertex to be inserted into the route R_d will be chosen only from free (as yet unvisited) vertices in this cluster; set V_candidates contains those vertices (see the intersection of sets K and V_free in point 9). The algorithm for the cluster selection is elaborated in Fig. 5.
The final step is the selection of the vertex v ∈ V_candidates to be inserted into the route R_d (point 10). This algorithm is elaborated in Fig. 6. The selected vertex v is then inserted into the route R_d (point 14); however, the vehicle will return to its depot before visiting the vertex (thus emptying its load) if its capacity would otherwise be exceeded (points 11 to 13). The loop then continues until all the customers are visited (i.e., V_free is empty). All the vehicles then return to their depots (points 18 and 19), and the ant solution R is returned (point 20).
Fig. 4 shows the algorithm for selecting a depot, used in point 7 of the AACO-NC algorithm in Fig. 3. For each depot, the probability of its selection is calculated (points 1 to 7), and one of the depots is selected from this probability distribution using a simple roulette wheel principle (point 8). The probabilities are proportional to the pheromone potential of each depot; this pheromone potential is represented by the sum of the pheromone trails from the last vertex v_d on the route to all available candidate nodes in the primary clusters.
When the depot d ∈ D is determined, its route R_d will be expanded by a vertex from one of the clusters (primary clusters are preferred to the others). This cluster is selected using the algorithm in Fig. 5. For each of the primary clusters, the average heuristic information η_k (inverse distance) and the average pheromone trail between the cluster node v_d and all available (as yet unvisited) vertices are calculated (points 1 to 5). If there is not a single available vertex in the primary clusters (this is, however, a very rare situation), the first cluster behind the primary clusters with at least one available vertex is selected and returned (points 8 to 11). Otherwise, the probability of selecting an individual primary cluster is proportional to the product of the calculated average heuristic information and the pheromone trail (points 12 to 13). The influence of the heuristic information and the pheromone trail on the probability is controlled by parameters α and β. Then, the cluster is selected based on this probability distribution using the simple roulette wheel principle (point 14).
When the cluster K is determined, one of the as yet unvisited customers in this cluster is selected using the algorithm in Fig. 6. The probability of selecting a vertex (points 1 to 2) is calculated using the standard ACO principles: it is proportional to the product of the heuristic information and the pheromone trail between the vertex and the last vertex on the route. The vertex is then selected using the simple roulette wheel principle (point 3).
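The transition rule of Fig. 6 follows the standard ACO form; a generic roulette wheel sketch is below (the pheromone storage and the assignment of α to the pheromone term and β to the heuristic term are assumptions):

```python
import random

def select_vertex(last, candidates, dist, pheromone, alpha, beta, rng=random):
    """Roulette wheel selection: the probability of vertex v is proportional
    to tau(last, v)**alpha * eta(last, v)**beta, with eta the inverse
    distance between the last vertex on the route and v."""
    weights = [pheromone[(last, v)] ** alpha * (1.0 / dist(last, v)) ** beta
               for v in candidates]
    r = rng.random() * sum(weights)
    for v, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return v
    return candidates[-1]                    # numerical safety net
```

The same wheel serves for all three stochastic choices in the algorithm (depot, cluster, vertex); only the weights differ.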

C. Pheromone Matrix Update and Evaporation
The update of the pheromone matrix (see point 17 of the algorithm in Fig. 1) uses the principles of the simulated annealing optimization method. This method uses the Metropolis criterion in each iteration to decide whether to accept the newly transformed solution or preserve the original solution. If the transformed solution is better than the original, the transformed solution is always accepted; however, even a worse solution can be accepted. This idea extends the search in the search space, thus preventing the solution from falling into some local optimum.
The same idea is applied in the AACO-NC when determining which solution is used to update the pheromone matrix: 1) the best solution found in an iteration, R_best, or 2) the best solution found so far, R. In general, using the former maintains the diversity of the population but also slows the convergence down; moreover, a promising solution can sometimes be lost. Therefore, it is advantageous to use the best solution found so far for the update from time to time instead. This results in faster convergence as well as higher performance.
The probability of using either R_best or R is determined by the Metropolis criterion using (7). If R_best is better than R (i.e., a new best solution is found in an iteration), then R_best is always used. Otherwise, R_best is used with a probability which depends on the percentage difference between R_best and R, and on the current value of the temperature T_update (the higher the temperature, the higher the probability of using the worse solution to update the pheromone matrix). The temperature, as one of the key control parameters of the AACO-NC, may change from iteration to iteration using the cooling coefficient α_update according to (8). The solution R_update for updating the pheromone matrix is selected based on the calculated probabilities. The update itself is then conducted using (9); the pheromone trails lying on the routes are increased in proportion to the pheromone updating coefficient δ and the quality of the updating route (the ratio of |R| to |R_update|). After the update, the pheromone matrix is evaporated (point 19 in the algorithm in Fig. 1) using (10). The speed of the evaporation is controlled by the pheromone evaporation coefficient ρ.
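The Metropolis-style decision can be sketched as follows. The exact form of criterion (7) is not reproduced here, so the acceptance probability below (exponential in the relative gap over the temperature, as in classical simulated annealing) is an assumption:

```python
import math, random

def choose_update_solution(len_iter_best, len_global_best, temperature, rng=random):
    """Decide whether R_best (iteration best) or R (global best) updates the
    pheromone matrix: an improving R_best is always used; a worse one is used
    with a probability that shrinks with the relative gap and grows with the
    temperature T_update (assumed exponential form)."""
    if len_iter_best <= len_global_best:
        return 'iteration_best'
    gap = (len_iter_best - len_global_best) / len_global_best
    if rng.random() < math.exp(-gap / temperature):
        return 'iteration_best'
    return 'global_best'
```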

D. Adaptive Pheromone Evaporation
One of the most critical control parameters for ACO-based algorithms is the pheromone evaporation coefficient ρ. It affects the convergence speed: higher values mean faster convergence but also a quick fall into some local optimum, which is often too far from the global optimum; lower values mean slower convergence (or even no convergence at all). Moreover, the setting of this control parameter is problem dependent; the best value can differ significantly for different graph topologies.
To overcome this problem, an adaptive setting of the pheromone evaporation coefficient is proposed in this article. It is controlled in each iteration by the diversity of the population of ant solutions. The main idea behind this principle is that the value of the pheromone evaporation coefficient reflects the current progress of the optimization. At the beginning of the optimization, the population diversity is high, and therefore the convergence can be accelerated via a larger value of the evaporation coefficient. Later in the optimization, when the solution converges toward some local optimum (documented by decreasing diversity in the population), the evaporation of the pheromone trails is lowered, with the effect of slowing down the convergence and thus extending the search space. This principle works automatically during the whole optimization.
The population diversity is measured by the Shannon entropy using (11). The probability p_ij is the probability that the edge between vertices v_i and v_j (i ≠ j) is part of any solution in the ant population. This probability is calculated statistically from the number of edge occurrences in the solutions using (12), where n_ij is the number of occurrences of the edge between vertices v_i and v_j in the entire population, and the denominator is the sum of n_ij over all edges used in all the solutions. The minimum and maximum limits of entropy in the population are given by (13) and (14), respectively. The pheromone evaporation coefficient is then controlled within the predefined limits using (15), where ρ_min is the minimum value of the pheromone evaporation coefficient and ρ_max is the maximum value. The setting of these limits is not as sensitive and dependent on the graph configuration as the setting of a single evaporation parameter. The value adapts to the actual situation based on the current diversity in the population (linear dependence on entropy).
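The diversity measurement and the entropy-to-ρ mapping can be sketched as follows. Equations (11)-(15) are not reproduced in this excerpt, so the undirected treatment of edges, the clamping to the limits, and the exact linear form are assumptions consistent with the description above.

```python
import math
from collections import Counter

def population_entropy(solutions):
    """Shannon entropy of edge usage across the ant population
    (sketch of eqs. (11)-(12)). `solutions` is a list of routes, each a
    list of vertex ids; consecutive vertices form an undirected edge."""
    counts = Counter()
    for route in solutions:
        for a, b in zip(route, route[1:]):
            counts[frozenset((a, b))] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def adaptive_rho(h, h_min, h_max, rho_min, rho_max):
    """Linear mapping of the current entropy onto [rho_min, rho_max]
    (assumed form of eq. (15)): high diversity -> fast evaporation,
    low diversity -> slow evaporation."""
    t = (h - h_min) / (h_max - h_min)
    t = min(max(t, 0.0), 1.0)  # clamp to the predefined limits
    return rho_min + t * (rho_max - rho_min)
```

For example, a population of identical solutions yields the minimal entropy of its edge set, which drives ρ toward ρ_min and slows the evaporation down.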

E. Entropy-Based Termination
The algorithm is terminated when at least one of the following conditions is satisfied.
1) Maximum number of iterations is exceeded.
2) Maximum optimization time is exceeded.
3) Maximum number of iterations without improvement in the solution is exceeded.
4) Diversity of solutions in the population is too low.
The newly proposed last condition is based on the fact that when the diversity of solutions in the population is too low, the probability of finding a better solution is also low. This is because the individual ant solutions are very similar to one another (there are many common edges in the solutions) and, therefore, the discovery of new information is unlikely.
The diversity in the population is measured by the Shannon entropy mentioned in the previous section. The algorithm is terminated when the relative difference between the current entropy H and the lower limit H_min is equal to or lower than the relative distance expressed by coefficient ω.
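The four termination conditions can be combined as in the following sketch. Since the equation for the entropy condition is not reproduced in this excerpt, the relative-difference form (H − H_min)/H_min ≤ ω is an assumption, and the default limit values are placeholders, not the authors' settings (except ω = 0.1, which is used in Section V).

```python
def should_stop(iteration, elapsed_s, stale_iters, h, h_min,
                max_iter=100_000, max_time_s=3600.0,
                max_stale=5_000, omega=0.1):
    """Termination test of the AACO-NC: stop when any of the four
    conditions of this section holds. The entropy condition assumes the
    relative-difference form (H - H_min) / H_min <= omega."""
    return (iteration >= max_iter             # 1) iteration budget exceeded
            or elapsed_s >= max_time_s        # 2) time budget exceeded
            or stale_iters >= max_stale       # 3) no improvement for too long
            or (h - h_min) / h_min <= omega)  # 4) population diversity too low
```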

F. Computational Complexity
The computational complexity of the original algorithm for finding a single ant solution (i.e., without the node clustering principle) is given by (17); it is quadratically dependent on the number of vertices n (the quadratic dependence is caused by the calculation of the transition probabilities for all the free nodes in the phase of inserting vertices into the solution). The term m represents the selection of a depot before the vertex transition. The node clustering principle significantly decreases the computational complexity of this algorithm; see (18). The selection of a node to be inserted into the solution now proceeds in two phases: 1) selection of a cluster (there are n_prim primary clusters to select from) and 2) selection of a vertex from the cluster (there are n_size vertices in the selected cluster). The increase in the optimization speed is apparent as, principally, parameters n_prim and n_size are substantially lower than the total number of vertices (n_prim ≪ n, n_size ≪ n). Moreover, the node clustering principle preserves the key feature of the original algorithm: nodes inserted successively into the solution are always selected from a substantial set of free nodes (n_prim · n_size) using pheromone-attracting principles (the selection is not limited to a small set of closest vertices).
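The two-phase selection behind the reduced complexity of (18) can be sketched as a nested roulette-wheel draw. This is an illustrative version only: the `attractiveness` mapping, which stands for the combined pheromone and heuristic weight of a vertex, is a hypothetical interface, not the paper's exact formula.

```python
import random

def roulette(weights, rng):
    """Fitness-proportional (roulette-wheel) selection; returns an index."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def select_next_vertex(primary_clusters, attractiveness, rng):
    """Two-phase node selection of the AACO-NC transition step:
    1) choose one of the n_prim primary clusters (a cluster's weight is
       the summed attractiveness of its free vertices),
    2) choose one of the (at most) n_size vertices inside it.
    Cost is O(n_prim + n_size) per transition instead of O(n) for a
    flat selection over all free vertices."""
    weights = [sum(attractiveness[v] for v in c) for c in primary_clusters]
    cluster = primary_clusters[roulette(weights, rng)]
    return cluster[roulette([attractiveness[v] for v in cluster], rng)]
```

Because both draws are fitness proportional, a vertex in any primary cluster can still be reached, preserving the pheromone-attracting behavior noted above.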

IV. EXPERIMENTS AND RESULTS
This section presents the experimental results on a set of benchmark instances, and compares them with several state-of-the-art stochastic methods.

A. Benchmark Instances
As benchmarks, 23 of the well-known Cordeau MDVRP instances [44] are used to verify the performance of the proposed AACO-NC algorithm. Table I records their dimensions (numbers of vertices and depots) and constraints (maximum vehicle capacity and maximum route length). The layout is either random (the vertices are distributed randomly) or regular (the vertices follow, at least to some extent, a regular pattern).

B. Experimental Results
The algorithm was implemented in the C++ programming language using the Visual Studio IDE. The optimizations were conducted on a personal computer with the following parameters: Intel Core i9-10940X CPU at 3.30 GHz and 32 GB of RAM.
All the results were achieved using the control parameters set as follows: n_ants = 192, n_freq = 10, n_prim = 4, n_size = 24, n_sect = 16, T_update = 0.1, α_update = 1, ρ_min = 0.001, ρ_max = 0.1, δ = 3, α = 1, and β = 1. The setting of the control parameters is based either on the experiments conducted in previous research [3], [4] (parameters n_ants, n_freq, T_update, α_update, δ, α, β) or, in the case of the new parameters connected with node clustering (n_prim, n_size, n_sect) and adaptive pheromone evaporation (ρ_min, ρ_max), on a general analysis of the impact of these parameters on the behavior and performance of the algorithm. Due to the limited number of pages, there is not enough space to present the complete results here; a shortened version is presented in Section V.
For each benchmark instance, 100 optimization trials were performed. The complete results (including the individual solutions for all the optimization trials) are available for download at https://zenodo.org/record/7341076; this enables future comparison of the AACO-NC with other algorithms, including paired statistical tests.
Table II shows the results. The best-known solutions (BKSs) were taken from [44]. These solutions are, however, outdated in some cases; better solutions for several instances can be found in the literature. However, these solutions are often only reported and not available for download (i.e., they cannot be verified). Therefore, [44] is used for the BKS in this article as it provides complete reference solutions for the benchmark instances. The BKS values in bold are proven optima [45]. Table II records the best and the worst solutions found by the AACO-NC in the optimization trials, the average values along with the standard deviations, and the optimization times in seconds.
The AACO-NC managed to find better or equal solutions for 18 instances compared to the BKS (see the gray cells in Table II). In the remaining five instances, the difference between the best solutions and the BKS is below 1%. In four cases, the BKS was improved (p07, p09, p10, and p11). The algorithm provides very good results regardless of the graph configuration (random or regular), particularly for the larger instances: in the case of the three instances with 249 nodes and a random graph configuration (p09, p10, and p11), the BKS were significantly improved; in the case of the two instances with 360 nodes, nine depots, and a regular graph configuration (p21 and p22), the same solutions as the BKS were achieved.
The last row of Table II shows the average values of the individual columns over all the instances. The difference between the best solutions found and the BKS is minimal (0.01%). The differences between the average and the worst solutions and the BKS are 1.27% and 3.30%, respectively.
The optimization time sometimes differs substantially. This is caused, for one, by the graph configurations, which may influence the convergence speed radically. The local optimization process also has a big impact on the optimization speed, especially the mutual exchange of vertices between vehicle routes (see [3] for details). This step can be accelerated when the maximum route length is limited (the exchange of vertices between routes too far from one another is pointless). The effect of this acceleration can be seen in instances p21, p22, and p23: instance p21 does not have a limited maximum route length, and thus its optimization time is considerably longer than that of instances p22 and p23, even though all these instances have the same graph complexity (360 vertices and nine depots).
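The route-length-based pruning described above can be sketched as a feasibility pre-check. The exact rule used by the authors is not given in this excerpt, so the check below, which skips the exchange when even the closest pair of vertices of two routes is farther apart than the maximum route length, is only one plausible realization; the function name and interface are hypothetical.

```python
import math

def euclid(p, q):
    """Plain Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def routes_worth_exchanging(route_a, route_b, coords, max_route_len):
    """Pruning pre-check for the inter-route vertex-exchange local search.
    If the closest pair of vertices from the two routes is farther apart
    than the maximum allowed route length, moving a vertex between the
    routes could never yield a feasible route, so the whole exchange
    step between these routes is skipped."""
    if max_route_len is None:  # no route-length constraint: cannot prune
        return True
    closest = min(euclid(coords[a], coords[b])
                  for a in route_a for b in route_b)
    return closest <= max_route_len
```

This matches the observation on p21 versus p22/p23: without a route-length limit (`max_route_len is None`), no pair of routes can be pruned and all exchanges must be evaluated.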

C. Comparison With Other Methods
The performance of the AACO-NC on the benchmark instances was compared to that of five state-of-the-art stochastic methods. These methods comprise three recent ACO-based algorithms as well as two different metaheuristic principles.
When choosing methods for comparison, those based on ACO principles were preferred. They were supplemented by two methods based on different metaheuristic principles. All the selected methods meet the following conditions: 1) they are state-of-the-art methods with high performance; 2) they have been published in renowned scientific journals in recent years; and 3) sufficiently detailed results for the benchmark instances are available in the publications. The last condition results from the unavailability of the source codes or the difficulty of making them operational; in some cases, however, the published results are not complete.
The methods for comparison are as follows.
1) IACO: improved ACO with a scanning strategy [27].
2) TSACS.
3) BPC-HACO.
4) ICA-ICA.

5) VND-TSH: the hybrid algorithm combining variable neighborhood descent and tabu search heuristic [46].
Table III compares the best solutions found. The best values for each instance are emphasized by a gray background. The AACO-NC provided better or equal solutions for 17 instances. The last row of Table III records the average solutions over all the instances. The AACO-NC outperformed the rival algorithms in this regard, followed by the ICA-ICA (gap 0.23%), IACO (gap 0.4%), BPC-HACO (gap 0.48%), TSACS (gap 0.7%), and VND-TSH (gap 0.84%). The gap is calculated using (19), where Avg is the average of the best solutions from Table III found by one of the algorithms (see the last row of Table III).
Table IV compares the average solutions (the results for the TSACS and VND-TSH are missing as they are not available in the literature). The AACO-NC, BPC-HACO, and ICA-ICA provide similar results overall: the AACO-NC is followed by the BPC-HACO (gap 0.21%), ICA-ICA (gap 0.48%), and finally IACO (gap 3.44%). The gap is calculated using (19), where Avg is the average of the solutions from Table IV found by one of the algorithms (see the last row of Table IV). The BPC-HACO, although it has slightly lower overall performance than the AACO-NC, provides better average solutions for more instances (eight cases) than the AACO-NC (six cases). Despite this, the AACO-NC has more cases of best solutions found (17 cases) than the BPC-HACO (11 cases). This corresponds to the AACO-NC having a larger standard deviation (16.3 on average) than the BPC-HACO (9.06 on average).
Table V presents the optimization times (in seconds) of the algorithms taken from the literature (they are not available for the TSACS and BPC-HACO). This is for purposes of illustration only because the optimizations were conducted on different hardware configurations and, therefore, a deeper analysis would not be appropriate. However, some basic conclusions can be drawn. The AACO-NC is noticeably faster than the other algorithms on instances with a regular distribution of nodes (p12, p13, p14, p15, p16, and p17); this also applies to some more complex instances (p19, p20, and p22). On the other hand, the AACO-NC is visibly slower on complex instances with a random distribution of nodes (p08, p09, p10, and p11); the exception is the ICA-ICA, which is slower in all cases. Wilcoxon signed-rank tests are performed to further compare the AACO-NC with the IACO, BPC-HACO, and ICA-ICA (insufficient data is available for the TSACS and VND-TSH to perform these tests). The Wilcoxon tests were chosen because the AACO-NC results differ significantly from a normal distribution (confirmed by Shapiro-Wilk tests). The hypotheses are as follows (H_method denotes H_IACO, H_BPC−HACO, or H_ICA−ICA according to the method benchmarked with the AACO-NC in the pairwise tests).
1) Null hypothesis H_0: μ_AACO−NC = μ_method; there is no significant difference between the AACO-NC and the benchmarked method.
2) Alternative hypothesis H_AACO−NC: μ_AACO−NC < μ_method; there is a significant difference between the AACO-NC and the benchmarked method (the AACO-NC provides better results).
3) Alternative hypothesis H_method: μ_AACO−NC > μ_method; there is a significant difference between the AACO-NC and the benchmarked method (the rival method provides better results).
When the test statistic Z is below the lower critical value, H_AACO−NC is accepted; when it is above the higher value, H_method is accepted (the benchmarked method outperforms the AACO-NC).
The Wilcoxon tests show that the AACO-NC outperforms the IACO and ICA-ICA in most cases (hypothesis H_AACO−NC is accepted in 15 and 11 cases, respectively, typically for the more complex problems), while these methods outperform the AACO-NC in six cases each (hypotheses H_IACO and H_ICA−ICA are mostly accepted for the simpler problems). The BPC-HACO and the AACO-NC are comparable; hypothesis H_AACO−NC is accepted in ten cases, H_BPC−HACO in 12 cases, and there is not enough evidence to reject the null hypothesis in one case (p14). The BPC-HACO has higher performance mostly on instances with a random distribution of nodes, while the AACO-NC mostly on instances with a regular distribution. Although the BPC-HACO provides higher performance in more cases (12 compared to 10), the higher standard deviation of the AACO-NC ensures noticeably more best solutions found in the set of performed experiments (11 compared to 2).

V. ANALYSIS AND DISCUSSION
This section analyzes the behavior of the proposed algorithm from the point of view of the new enhancements.
A. Node Clustering
Fig. 7 shows the impact of node clustering on the performance of the AACO-NC on eight selected benchmark instances. This selection was based on the problem complexity (from simple to the most complex) and the distribution of nodes (random or regular). One representative is included in each category (n = 50, 75, 100, 249 for the random distribution, and n = 80, 160, 240, 360 for the regular distribution). The dark blue color shows the average error (the relative difference between the average solution and the BKS) of the algorithm without node clustering (named AACO), i.e., the full set of nodes is available for selection in the transition phase. The light blue color shows the average error of the AACO-NC, i.e., with node clustering. In both cases, the local search optimization was switched off to emphasize the influence of the node clustering principle. However, the adaptive pheromone evaporation was preserved. In all cases, the performance improved; the more complex the problem, the bigger the improvement. For example, in instance p22 (360 nodes and 9 depots), the error is over 8% when not using node clustering, and it improves to about 5% when using this mechanism. Moreover, for illustration, the violet color shows the average error when the complete AACO-NC with the local search optimization is used (see the results in Table II). The improvement is substantial for both random and regular graphs.
An even bigger enhancement is achieved in the optimization time; see Fig. 8. Note that the axis scale is logarithmic because of the significant differences over the instances. The reduction of the optimization time depends on the problem complexity; e.g., for instance p11 (249 nodes and 5 depots), the AACO-NC without the local search (light blue color) is almost 5 times faster than the AACO (dark blue color), while for instance p22 (360 nodes and 9 depots), it is more than 25 times faster. The local search (violet color) has a very interesting impact. In most cases (the exception is instance p11), it accelerates the optimization; the effect is especially extensive in the case of the regular graphs (p13, p16, p19, and p22). This is due to much faster convergence (the local search is very efficient for the regular graphs).
The following conclusions can be drawn from the analysis conducted.
1) The node clustering increases the performance of the algorithm.
2) The more complex the problem, the higher the performance improvement (the performance of the general ACO algorithm without other supporting techniques deteriorates with increasing problem complexity; the node clustering reduces this deterioration).
3) It is important to integrate the general ACO algorithm with the local search optimization to further reduce the deterioration.
4) The optimization time is reduced significantly when using node clustering.
5) The more complex the problem, the higher the time acceleration [due to the significantly lower computational complexity; see (18)].
6) The local search optimization usually shortens the optimization time even though it is computationally demanding (because it accelerates convergence).

B. Adaptive Pheromone Evaporation
The impact of the pheromone evaporation coefficient on the convergence speed is shown in Fig. 9. It presents the dependence of the average error on the number of performed iterations for instance p10. This instance was chosen as one of the most complex representatives of problems with a random distribution of nodes. Two of the curves (green and blue) indicate the optimization progress when a constant value of the pheromone evaporation coefficient was used (ρ = 0.001 and ρ = 0.1). The violet curve, on the other hand, shows the optimization progress with adaptive pheromone evaporation (ρ_min = 0.001, ρ_max = 0.1). Adaptive evaporation ensures almost the same performance at the end of the optimization (average error 1.08%) as the constant evaporation with ρ = 0.001 (average error 0.89%). However, the convergence was much faster (with the adaptive approach, the 2% error is reached at about iteration 10 000; without it, achieving the same error takes almost twice as long). When the larger constant pheromone evaporation coefficient is used (ρ = 0.1), the convergence is fast, but at the expense of the overall performance (average error 1.98%). Fig. 10 shows the progress of the diversity of the population of ants for the same optimization as in Fig. 9. The information entropy curve corresponds to the average error curve in Fig. 9. When the large value of the pheromone evaporation coefficient is used (ρ = 0.1), the diversity quickly becomes very low (blue curve). On the other hand, when ρ = 0.001, high diversity is maintained significantly longer (green curve). Adaptive evaporation keeps the population diversity between these two extremes (violet curve); this enables high-quality results to be achieved faster.
The same analysis is presented in Figs. 11 and 12, this time for instance p22 as a representative of the most complex instances with a regular distribution of nodes. The faster convergence, compared to the case of a random distribution of nodes, is noticeable in Fig. 11; the best solution is usually found before iteration 10 000 when using adaptive evaporation. Otherwise, the optimization progress (Fig. 11) as well as the entropy progress (Fig. 12) show the same development as in the case of the problem with a random distribution of nodes (Figs. 9 and 10).
Several conclusions about the differences between the algorithms with adaptive and constant evaporation (AACO versus ACO) can be formulated.
1) AACO ensures faster convergence to a local or the global optimum than ACO when ρ min = ρ and ρ max > ρ.
2) The overall performance of AACO is very close to the performance of ACO when ρ_min = ρ.
3) Performance comparable to that of ACO, however achieved faster, can be obtained via AACO when ρ_min < ρ.
4) Solutions of similar quality are found noticeably faster via AACO when ρ_min = ρ.

C. Entropy-Based Termination Condition
The impact of the new condition on the termination of the algorithm is shown in this section. The optimizations were executed with the other termination conditions switched off, i.e., only the entropy-based condition terminated the algorithm (ω = 0.1). The total number of iterations performed as well as the iteration in which the best solution was found (both values averaged over 30 experiments for each instance) were recorded; the ratio between these two values is calculated and presented in Fig. 13. This ratio represents the "wasted" optimization time, i.e., the iterations that were performed at the end of the optimization but did not improve the solution. The results show that for the simple problems (n < 100), the ratio is relatively high. The reason for this is that a high-quality local or the global optimum is found quickly, long before the diversity in the population decreases to the termination threshold. However, the ratio for the more complex problems (n > 100) is mostly below 2 (or slightly exceeds 2 in two cases); the only exception is instance p16 (ratio above 3). For the most complex problems (n = 249 for the random topology of nodes and n = 360 for the regular topology), the ratio is even lower (below 1.3, i.e., no more than 30% of the iterations are wasted). Also, the termination condition works without significant difference for both random and regular distributions of nodes. For the simpler problems, it is recommended to complement the entropy-based termination condition with another one, preferably a limit on the number of iterations without improvement in the solution.

VI. CONCLUSION
The proposed algorithm with the novel techniques (node clustering, adaptive pheromone evaporation, and entropy-based termination condition) improves the overall performance when applied to the MDVRP. This was verified on a set of benchmark instances of various complexity, both with random and regular distributions of vertices. The comparison with other state-of-the-art metaheuristic methods showed the high potential and competitiveness of the proposed algorithm for complex combinatorial optimization problems.
The novelty of this article lies in the adaptation of the three supporting optimization mechanisms to the MDVRP and their integration into the ACO-based algorithm. Moreover, the findings achieved in this study could inspire further modifications of the proposed techniques for other combinatorial problems.
The future work of the authors will be focused on the organization of vertices into clusters.The goal is to increase the probability that the vertex forming the optimal route is included in the primary clusters; at the same time, the size

Manuscript received 5 May 2022; revised 7 September 2022 and 21 November 2022; accepted 7 December 2022. Date of publication 19 December 2022; date of current version 1 December 2023. This work was supported by the Ministry of the Interior of the Czech Republic through the Project An Artificial Intelligence-Controlled Robotic System for Intelligence and Reconnaissance Operations under Grant VJ02010036. (Corresponding author: Petr Stodola.)

Fig. 8. Impact of node clustering on the optimization time.

Fig. 10. Information entropy progress for instance p10 using constant and adaptive pheromone evaporation.

Fig. 12. Information entropy progress for instance p22 using constant and adaptive pheromone evaporation.

Fig. 13. Ratio of the total number of performed iterations to the iteration with the last improvement in solution.

TABLE I BENCHMARK INSTANCES
Table VI presents the results of the tests. To fail to reject the null hypothesis, the test statistic Z must lie within the critical interval (−1.96; 1.96) for the level of significance α = 0.05. When Z is below the lower critical value, H_AACO−NC is accepted (the AACO-NC outperforms the benchmarked rival method); when it is above the upper value, H_method is accepted.