A Hybrid Swarm Intelligence Algorithm for Vehicle Routing Problem With Time Windows



I. INTRODUCTION
In recent years, logistics has played an important role in many areas, such as the economy, industry, and the environment. The Vehicle Routing Problem (VRP) is a logistics problem that has drawn considerable attention in recent decades. VRP has many real-world applications in industry: seeking optimal solutions can make logistics more efficient, reduce transportation cost, and better satisfy customer requests. According to the 2019 third-party logistics study,¹ reducing transportation cost is still the top challenge.
VRP is a combinatorial optimization problem that seeks the optimal set of routes for a fleet of vehicles serving a given set of customers. In fact, VRP is a generic name given to a whole class of problems. The basic VRP makes assumptions such as a single depot, a homogeneous fleet, and one route per vehicle. Researchers relax these assumptions by treating them as constraints, which results in many variations of the traditional VRP, such as the Capacitated Vehicle Routing Problem (CVRP) [1], the Vehicle Routing Problem with Time Windows (VRPTW) [2], the Dynamic Vehicle Routing Problem (DVRP) [3], and the Vehicle Routing Problem with Pickup and Delivery (VRPPD) [4]. In this paper, we address VRPTW, aiming to minimize the number of vehicles (NV) first, and then the total distance (TD). A solution for VRPTW is feasible if the set of routes satisfies the constraints, i.e., no vehicle capacity is exceeded and all customers are served within their given time windows. A typical example of VRPTW is shown in Fig. 1, in which each customer node has its location and a certain service time window, and three vehicles depart from the depot to service customer requests.

The associate editor coordinating the review of this manuscript and approving it for publication was Gustavo Olague.

¹ https://www.supplychain247.com/paper/2019_third_party_logistics_study_the_state_of_logistics_outsourcing
Determining the optimal solution to VRP is NP-hard [5]. Current VRP algorithms can be divided into two main categories: exact algorithms and heuristic algorithms [6]. Exact algorithms [7], [8] usually solve small-scale VRP instances; as the problem size increases, their computational time grows exponentially. Golden et al. [9] point out that exact algorithms do not work well on VRP instances with more than 50 customers.
Many comparison studies [44]-[46] have analyzed the impact of different heuristics and metaheuristics for VRP. These studies concluded that no single heuristic or metaheuristic outperforms the others in all cases, and that certain cases require dynamic heuristic analysis to determine which heuristics to use according to their features. The heuristic analysis also showed that hybridization can enhance the strengths and compensate for the weaknesses of two or more methods, with the aim of generating better solutions by combining the key elements of competing methodologies. In this paper, we further explore and implement multiple heuristics, including ACS, BSO, 2-opt and λ-interchange, to achieve near-optimal solutions for VRPTW. The outer framework of the algorithm is ACS: after the initial information is set, ants construct routes and update pheromones locally. When all the ants have constructed their solutions, the best solution found is sent to BSO for further optimization. In the modified BSO procedure, the 2-opt heuristic is performed for intra-route improvement if one route is selected, and λ-interchange is performed for inter-route improvement if two routes are selected. This further optimization based on BSO and the improvement heuristics not only enhances the search in the solution space, but also helps ACS avoid local optima.
It is worth mentioning that Wu et al. [41] proposed a brainstorming-based ant colony optimization algorithm named IBSO-ACO to solve VRP with soft time windows. In the IBSO-ACO method, an improved BSO was designed and combined with the ACO algorithm. The main differences between their work and ours are: 1) A penalty cost is added to their objective function if a time window constraint is violated, while we focus on VRP with hard time windows, i.e., time window constraints must be satisfied by all vehicles; 2) Our algorithm hybridizes ACS instead of the classic ACO, because the state transition rule in ACS [47] balances exploration and exploitation better; 3) The global pheromone update in ACS also makes the search more directed; 4) In the BSO procedure, we apply a different clustering scheme, which clusters the population according to the geographical coordinates of customers in different routes, while IBSO-ACO clusters the population according to the cost; 5) IBSO-ACO operates at the solution level, i.e., it maintains a population of solutions and generates new solutions randomly, which is very time consuming and will probably lead to infeasible solutions. In contrast, the proposed algorithm applies heuristics such as 2-opt and λ-interchange to generate new solutions, which is more effective and more efficient.
The main contributions of this paper are:
• A hybrid ACS-BSO algorithm is proposed, in which BSO is used to further optimize the solution and to avoid the local optima that classic ACS can fall into.
• Both intra-route and inter-route improvement are considered in the BSO procedure.
• 56 instances of VRPTW with 100 customers are evaluated to demonstrate the effectiveness of the proposed algorithm.
The rest of this paper is organized as follows. Section II describes the definition and mathematical model of VRPTW. Section III first introduces the classic ACS, BSO, 2-opt and λ-interchange algorithms, and then proposes the hybrid ACS-BSO algorithm. Section IV evaluates ACS and the proposed algorithm. Section V concludes the paper.

II. PROBLEM DEFINITION AND MODELING
The VRPTW can be defined on a directed complete graph with vertices $\{v_0, v_1, \ldots, v_n\}$. Normally, $v_0$ is set as the depot, and $\{v_1, v_2, \ldots, v_n\}$ are the $N$ customers. A set of $|K|$ homogeneous vehicles with the same capacity $Q$ depart from depot $v_0$. Each customer $v_i$ has a demand $q_i$ and a service time window $[e_i, l_i]$, where $e_i$ is the earliest time at which service for customer $v_i$ may start, and $l_i$ is the latest time at which service may start. Thus, a vehicle must wait if it arrives at customer $v_i$ before $e_i$, and it must arrive before $l_i$. Each customer request also has a service time $s_i$, and each customer in the network must be serviced exactly once by exactly one vehicle. The travel cost $c_{ij}$ between vertices $i$ and $j$ is proportional to the Euclidean distance between them. In the 100-customer instances of Solomon's VRPTW benchmark, the vehicle speed is set to one unit, i.e., the time cost $t_{ij}$ is equal to $c_{ij}$. The mathematical model of VRPTW is defined as follows [46].
Parameters description:
• $K$: the set of all vehicles
• $V$: the set of all customers
• $N$: total number of customers
• $Q$: maximum vehicle capacity

In the model, TD in Eq. (1) is the total distance of all vehicles. Eqs. (3)-(4) state that at most $|K|$ vehicles are used to serve customers. Eqs. (5)-(6) ensure that each customer is serviced exactly once by exactly one vehicle. The vehicle capacity constraint is specified by Eq. (7). The time window constraints are defined by Eqs. (8)-(10). A solution is feasible if all the constraints are satisfied. A route can be represented as a concatenation of customers, and a solution is represented as a list of routes. A typical solution for a VRP with 10 customers is shown in Fig. 2, which has two routes: 0-5-2-1-6-3-0; 0-4-8-7-10-9-0.
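The capacity and time window constraints described in this section can be checked route by route. The following Python sketch is only an illustration under stated assumptions: the function name, the dictionary-based data layout, and the departure from the depot at time 0 are hypothetical choices, not taken from the paper. It uses unit vehicle speed, so travel time equals Euclidean distance, and it applies the waiting rule for early arrivals.

```python
import math


def route_is_feasible(route, coords, demand, window, service, capacity):
    """Check one route (e.g. [0, 5, 2, 1, 6, 3, 0]) against capacity and hard time windows.

    Assumed layout (hypothetical): coords[v] = (x, y), demand[v] = q_v,
    window[v] = (e_v, l_v), service[v] = s_v, with vertex 0 as the depot.
    """
    load = 0.0
    time = 0.0  # assumption: the vehicle leaves the depot at time 0
    for prev, cur in zip(route, route[1:]):
        time += math.dist(coords[prev], coords[cur])  # unit speed: t_ij = c_ij
        earliest, latest = window[cur]
        if time > latest:           # arrived after the window closed
            return False
        time = max(time, earliest)  # wait if the vehicle arrives early
        time += service[cur]
        load += demand[cur]
        if load > capacity:         # vehicle capacity exceeded
            return False
    return True


coords = {0: (0, 0), 1: (3, 4), 2: (6, 8)}
demand = {0: 0, 1: 10, 2: 10}
window = {0: (0, 100), 1: (10, 20), 2: (0, 30)}
service = {0: 0, 1: 5, 2: 5}
print(route_is_feasible([0, 1, 2, 0], coords, demand, window, service, capacity=20))
```

A whole solution would then be feasible if every route passes this check and every customer appears in exactly one route.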

III. PROPOSED HYBRID ACS-BSO ALGORITHM
In this section, we first introduce ACS, the improvement heuristics 2-opt and λ-interchange (local search), and the BSO algorithm. Then, the hybrid ACS-BSO algorithm is proposed. Breadth of search is ensured by the swarm intelligence algorithms owing to their population-based nature, while depth of search is achieved by the local search heuristics. Therefore, hybridizing swarm intelligence with local search leads to both breadth and depth in the search for solutions.

A. ANT COLONY SYSTEM
Ant Colony Optimization (ACO) [48] was first proposed by Dorigo and is inspired by the way ants leave pheromones to direct each other to food while exploring the environment. When a colony of ants has different routes to reach the food, those that travel the shorter route go back and forth more frequently and leave more pheromones. Ants choose a route according to the density of pheromones, i.e., the more pheromones left on a route, the more likely ants are to choose it. At the same time, pheromones also evaporate over time.
In ACO, the state transition rule, i.e., the probability that ant $k$ moves from customer $i$ to customer $j$, is defined in Eq. (11), where $\tau_{ij}$ is the pheromone deposited for the transition from customer $i$ to $j$, $\eta_{ij}$ is the desirability of the state transition (normally set as $1/d_{ij}$, where $d_{ij}$ is the distance between customers $i$ and $j$), $\beta \geq 1$ is a parameter that controls the relative influence of $\eta_{ij}$, and $J_k(i)$ is the feasible set of customers that remain to be visited by ant $k$. When a solution is found by the ant colony, pheromones along the edges are updated according to Eq. (12), where $0 < \rho < 1$ is the pheromone evaporation coefficient and $\tau_{ij}^{k}$ is the pheromone deposited by ant $k$, which is defined by Eq. (13), where $L_k$ is the length of the route traveled by ant $k$.

ACS [47] is a variation of the ACO algorithm that differs from ACO in three main aspects:
1) a probability parameter is added to the state transition rule to balance exploration and exploitation;
2) a local pheromone update rule is applied when ants are constructing routes;
3) a global pheromone update is applied only to the edges in the best route.

In ACS, the state transition rule is defined by Eq. (14), where $0 \leq q \leq 1$ is a uniformly distributed random number and $0 \leq q_0 \leq 1$ is a probability parameter: ants focus more on exploitation when $q \leq q_0$ and more on exploration otherwise. $S$ is a customer selected according to the state transition probability of Eq. (11). The local pheromone update is performed by Eq. (15), and the global pheromone update by Eq. (16), where $\alpha$ is the pheromone evaporation rate and $L_{gbest}$ is the length of the global best route. The global pheromone update is performed when all the ants have completed their tours, and only the ant that constructed the shortest tour is allowed to deposit pheromone, which makes the search more directed. The pseudocode of the ACS algorithm is shown in Alg. 1.
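To make the three ACS ingredients concrete, the sketch below shows one possible Python rendering of the state transition rule and the two pheromone updates. It is an illustration rather than the authors' implementation. The update formulas follow the commonly published ACS forms, which appear to match the descriptions of Eqs. (14)-(16). The function names, the nested-list pheromone matrix, and the initial pheromone level `tau0` are assumptions introduced here.

```python
import random


def choose_next(i, feasible, tau, eta, beta=2.0, q0=0.1, rng=random):
    """ACS state transition: exploit the best edge if q <= q0, otherwise sample."""
    weights = {j: tau[i][j] * eta[i][j] ** beta for j in feasible}
    if rng.random() <= q0:
        return max(weights, key=weights.get)  # exploitation
    # Exploration: roulette-wheel sampling proportional to the ACO weights.
    threshold = rng.random() * sum(weights.values())
    running = 0.0
    for j, w in weights.items():
        running += w
        if running >= threshold:
            return j
    return j  # guard against floating-point round-off


def local_update(tau, i, j, rho=0.1, tau0=1e-3):
    """Local update on an edge just traversed (tau0 is an assumed initial level)."""
    tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0


def global_update(tau, best_route, best_length, alpha=0.1):
    """Global update: only edges of the global best route receive pheromone."""
    for i, j in zip(best_route, best_route[1:]):
        tau[i][j] = (1 - alpha) * tau[i][j] + alpha / best_length
```

In a VRPTW setting, `feasible` would contain only the unvisited customers that can still be reached without violating capacity or time windows.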
In optimization, 2-opt [16] is a widely used local search algorithm, first proposed by Croes for solving the TSP. The main idea of the 2-opt algorithm is to reverse a segment of a route, as shown in Fig. 3, in which 2-opt is applied to modify a single route: the original route is 0-4-2-1-3-5-6-7-0, and after performing 2-opt the order of the sub-route 3-5 is reversed, giving the new route 0-4-2-1-5-3-6-7-0.
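The segment reversal can be written in a few lines. The Python sketch below reproduces the example of Fig. 3; the helper names and the `dist[a][b]` cost lookup are illustrative assumptions. For VRPTW, each candidate would additionally need a time window check, which is omitted here.

```python
def two_opt_move(route, i, j):
    """Return a copy of `route` with the segment route[i..j] (inclusive) reversed."""
    return route[:i] + route[i:j + 1][::-1] + route[j + 1:]


def route_length(route, dist):
    """Sum the edge costs along the route; dist[a][b] is the cost of edge (a, b)."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def two_opt_neighbors(route):
    """Yield every route obtained by reversing one interior segment.

    The first and last entries (the depot) are never moved.
    """
    for i in range(1, len(route) - 2):
        for j in range(i + 1, len(route) - 1):
            yield two_opt_move(route, i, j)


fig3_route = [0, 4, 2, 1, 3, 5, 6, 7, 0]
print(two_opt_move(fig3_route, 4, 5))  # [0, 4, 2, 1, 5, 3, 6, 7, 0]
```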
There are two strategies for selecting a candidate solution $S'$ from $N_\lambda(S)$, the neighborhood of the current solution $S$.

1) The Best-Improve (BI) strategy goes over all solutions $S'$ in $N_\lambda(S)$ and selects the one that results in the maximum decrease in cost.

2) The First-Improve (FI) strategy accepts the first solution $S'$ in $N_\lambda(S)$ that results in a decrease in cost.

Since the BI strategy usually takes much more computational time than the FI strategy, in this paper 2-interchange with the FI strategy is implemented to make the algorithm more efficient, i.e., the algorithm accepts the first improved solution and proceeds to the next iteration.
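The difference between the two selection strategies is easiest to see on a generic neighborhood. In the sketch below, `neighbors` and `cost` are hypothetical callables standing in for the λ-interchange neighborhood and the route cost. The toy example uses integers only to show which candidate each strategy returns.

```python
def first_improve(current, neighbors, cost):
    """Accept the first neighbor that is cheaper than the current solution."""
    base = cost(current)
    for candidate in neighbors(current):
        if cost(candidate) < base:
            return candidate  # stop scanning as soon as an improvement is found
    return current


def best_improve(current, neighbors, cost):
    """Scan the whole neighborhood and return the cheapest improving neighbor."""
    best = min(neighbors(current), key=cost, default=current)
    return best if cost(best) < cost(current) else current


# Toy neighborhood: the cost is the distance to zero.
toy_neighbors = lambda x: [x - 1, x - 3, x + 2]
print(first_improve(5, toy_neighbors, abs))  # 4
print(best_improve(5, toy_neighbors, abs))   # 2
```

FI evaluates only as many neighbors as needed, which is why it is cheaper per iteration, while BI always pays for the full scan.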

C. BRAIN STORM OPTIMIZATION
Brain Storm Optimization (BSO) [49], [50] was first introduced in 2011. It is inspired by the human brainstorming process and has been widely and successfully used to solve many optimization problems [51], [52]. The procedure of the classic BSO algorithm is described as follows.
1) Randomly generate $N$ individuals (solutions) and initialize the parameters $p_1$, $p_2$, $p_3$, $p_4$;
2) Clustering: cluster the $N$ solutions into $M$ clusters, and mark the best solution in each cluster as the cluster center;
3) Evaluate the $N$ solutions according to the fitness function;
4) Replacing: generate a random number $r_1 \in (0, 1)$; if $r_1 < p_1$, randomly select a cluster center and randomly generate a solution to replace it;
5) Generating: generate a random number $r_2 \in (0, 1)$. If $r_2 < p_2$, randomly select a cluster and generate a random number $r_3 \in (0, 1)$; if $r_3 < p_3$, generate a new solution by adding random values to the selected cluster center, otherwise generate a new solution by adding random values to a random solution in the selected cluster. If $r_2 \geq p_2$, randomly select two clusters and generate a random number $r_4 \in (0, 1)$; if $r_4 < p_4$, combine the two cluster centers and add random values to generate a new solution, otherwise combine two random solutions from the selected clusters and add random values to generate a new solution;
6) Selecting: evaluate the newly generated solution and compare it to the existing solution with the same index; the better one is kept and recorded;
7) If $N$ new solutions have been generated, go to step 8; otherwise, go to step 5;
8) Terminate the procedure if the maximum number of iterations is reached; otherwise, go to step 2.
If the classic BSO is applied to VRPTW, the new solution generation operation in step 5) is performed at the solution level, which is very time consuming and will probably lead to infeasible solutions. To make the process more efficient, we modify the classic BSO algorithm to optimize VRPTW solutions at the route level.
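Step 5 of the classic procedure can be sketched for real-valued solutions as follows. This is an illustration under assumptions that are not in the text: "adding random values" is modeled as Gaussian noise with standard deviation `sigma`, and "combine" is modeled as a random convex combination. The function name and argument layout are also hypothetical.

```python
import random


def generate_new_solution(clusters, centers, p2, p3, p4, sigma=0.1, rng=random):
    """One application of the classic BSO generating step (step 5).

    clusters[c] is a list of solutions (lists of floats); centers[c] is the
    best solution of cluster c.
    """
    if rng.random() < p2:
        # One cluster: start from its center or from a random member.
        c = rng.randrange(len(clusters))
        base = centers[c] if rng.random() < p3 else rng.choice(clusters[c])
    else:
        # Two clusters: combine the two centers or two random members.
        a, b = rng.sample(range(len(clusters)), 2)
        if rng.random() < p4:
            x, y = centers[a], centers[b]
        else:
            x, y = rng.choice(clusters[a]), rng.choice(clusters[b])
        w = rng.random()  # assumed combination: random convex weight
        base = [w * xi + (1 - w) * yi for xi, yi in zip(x, y)]
    # Assumed perturbation: independent Gaussian noise on every coordinate.
    return [value + rng.gauss(0.0, sigma) for value in base]
```

For a routing problem, such arithmetic on whole solutions rarely yields a valid set of routes, which is the motivation given above for moving the operators to the route level.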
First, we divide the routes into two clusters, A and B, according to their coordinates, i.e., the geographical coordinates of the customers in the routes. Other clustering strategies can also be applied, since clustering in BSO only simulates the problem owners picking the solutions they believe are better during the brainstorming process.
After clustering, the randomization of BSO is applied to enhance solution diversity and to avoid the local optima of ACS. The algorithm has four different ways of generating new solutions: 1) perform 2-opt on a cluster center; 2) perform 2-opt on a random route in the cluster; 3) perform 2-interchange on two cluster centers; 4) perform 2-interchange on two random routes in two different clusters. More details are shown in Alg. 2.

Algorithm 2 Modified BSO for VRPTW
Input: solution S (i.e., NV, routes) as initial solution
Output: new_solution S
for i := 1 to NV do
    compute cost for routes in S
end for
while not termination do
    perform route clustering on S
    find centers for each cluster
    for i := 1 to NV do
        if rand(0, 1) < p_1 then
            randomly pick a cluster C_j
            if rand(0, 1) < p_2 then
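The four ways of generating new solutions at the route level can be sketched as a small selection routine. Only the first two tests (`p_1` picks a single cluster, then `p_2` is drawn) are visible in the listing above. The remaining branching, including the use of `p_3` for the two-cluster case, is an assumption made here to match the four modes described in the text. The function name and return format are hypothetical, and the default values are the ones reported in the experiment setup.

```python
import random


def pick_move(clusters, centers, p1=0.3, p2=0.4, p3=0.5, rng=random):
    """Choose an operator and the route(s) it is applied to.

    clusters[c] is a list of route identifiers; centers[c] is the center
    route of cluster c. Returns (operator_name, tuple_of_routes).
    """
    if rng.random() < p1:
        # One cluster selected: intra-route improvement with 2-opt.
        c = rng.randrange(len(clusters))
        route = centers[c] if rng.random() < p2 else rng.choice(clusters[c])
        return "2-opt", (route,)
    # Two clusters selected: inter-route improvement with 2-interchange.
    a, b = rng.sample(range(len(clusters)), 2)
    if rng.random() < p3:  # assumed role of p3
        return "2-interchange", (centers[a], centers[b])
    return "2-interchange", (rng.choice(clusters[a]), rng.choice(clusters[b]))
```

The caller would then apply the returned operator with the first-improve strategy and keep the result only if it is feasible and cheaper.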

D. PROPOSED HYBRID ACS-BSO ALGORITHM
The proposed hybrid ACS-BSO algorithm combines a population-based method with local search. The overall procedure is described in Alg. 3.
The first step is to initialize the parameters for the ACS and BSO algorithms. After initialization, the outer loop of ACS starts. In the inner loop of the proposed algorithm, the initial solution is constructed in one of two ways: either picking the nearest neighbor as the next customer or picking a new customer randomly. This ensures the diversity of solutions, and different initial solutions also help to avoid local optima. Once the initial solution is constructed, the ant actions described in Section III-A are performed. Since the local pheromone update cannot ensure the quality of the solution, the current best solution is then sent to BSO for further optimization. Besides, further intra-route and inter-route optimization can also improve the diversity of solutions. Global pheromones are updated after BSO. The proposed algorithm outputs the best solution found when the termination condition is satisfied, i.e., either the maximum number of iterations is reached or the solution has not improved for a certain number of iterations.

Algorithm 3 Hybrid ACS-BSO Algorithm
Initialize parameters
while not termination do (each loop is an iteration)
    set ants' initial positions
    while not termination do (each loop is a step)
        construct solution
        perform local pheromone update
        send current best solution to BSO
        further optimize by BSO with local search
    end while
    perform global pheromone update
end while
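The control flow of the hybrid procedure can be outlined in Python as below. This is a structural sketch only: the five callables are hypothetical placeholders for the components described in Section III, `patience` is an assumed name for the "certain number of iterations" without improvement, and BSO is invoked once per iteration on the best ant solution, following the description given in the introduction. The cost may be a tuple such as (NV, TD) so that vehicles are minimized first.

```python
def hybrid_acs_bso(construct, local_update, bso_improve, global_update, cost,
                   n_ants=30, max_iter=500, patience=20):
    """Outer loop of a hybrid ACS-BSO search (structural sketch)."""
    best, stall = None, 0
    for _ in range(max_iter):
        iteration_best = None
        for _ in range(n_ants):
            solution = construct()       # ant builds a feasible solution
            local_update(solution)       # local pheromone update
            if iteration_best is None or cost(solution) < cost(iteration_best):
                iteration_best = solution
        # Best ant solution is handed to BSO with local search.
        iteration_best = bso_improve(iteration_best)
        if best is None or cost(iteration_best) < cost(best):
            best, stall = iteration_best, 0
        else:
            stall += 1
        global_update(best)              # only the best solution deposits pheromone
        if stall >= patience:            # no improvement for `patience` iterations
            break
    return best
```

Passing the components as functions keeps the sketch independent of any particular solution representation.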

IV. EXPERIMENTS AND DISCUSSIONS
For our experiments, we choose the 56 instances of Solomon's benchmark with 100 customers, which is the most widely used benchmark for evaluation. The benchmark has six sets of problems: C1, C2, R1, R2, RC1, and RC2. ''C'' stands for clustered, which means that the geographical coordinates of customers are clustered in problem sets C1 and C2. ''R'' stands for random, which means the benchmark data are randomly generated (uniformly distributed) in problem sets R1 and R2. ''RC'' means a mix of random and clustered. In problem sets 1 (i.e., R1, C1, and RC1), the vehicle capacity is small and the time windows are narrow, so more vehicles are required to service the customers, and fewer customers are serviced by the same vehicle. The number of vehicles required is normally larger than 10 for problem sets 1. In contrast, problem sets R2, C2 and RC2 have wide time windows and permit more customers per route, so far fewer vehicles are required.

A. EXPERIMENT SETUP
The parameters for ACS are set as follows: M = 30, α = 1, β = 2, ρ = 0.1, q 0 = 0.1, max_iter = 500. All the parameters were tuned to balance the quality of solutions and the computational cost. Although the maximum number of iterations of ACS is set to 500, the proposed algorithm terminates before reaching this maximum except for a few complicated instances.
BSO is used to further improve the solutions obtained by ACS. The probability parameters for BSO are set as: p 1 = 0.3, p 2 = 0.4, p 3 = 0.5. Since BSO is nested in the loop of ACS, its maximum number of iterations is set to 15, i.e., for each current best solution obtained by ACS, BSO further optimizes the solution by using either the 2-opt or the 2-interchange method with the first-improve strategy.
The proposed algorithm was programmed in Python, and all our experiments were conducted on an Intel Xeon E5-2650 CPU@2.30GHz PC with 16GB RAM.

B. RESULT ANALYSIS
The best results obtained by classic ACS and the proposed algorithm are shown in Table 1, as well as the Best Known Solutions (BKS) found by other researchers so far. In Table 1, ''NV'' represents the number of vehicles, ''TD'' means total distance, and ''BNV'' and ''BTD'' stand for best number of vehicles and best total distance, respectively. Although the objective is to minimize the total distance, researchers also focus on minimizing the number of vehicles used. Tan et al. point out that all the instances in problem sets C1 and C2 have positively correlated objectives, while many instances in problem sets R1, R2, RC1, and RC2 have conflicting objectives [53], i.e., they form a multiobjective optimization problem. We therefore list two columns of the best results found, one with minimum NV and the other with minimum TD. To see how much hybridization of different algorithms can improve the solutions, we computed the cost reduction between our solutions and the solutions obtained by classic ACS. For comparison with the BKS, the solutions that have fewer NV or smaller TD are highlighted in bold, and the gap (i.e., percentage deviation) between the BTD and the BKS is also computed. The cost reduction [41] and the gap [54] are computed according to Eq. (18) and Eq. (19).

$$\text{Cost Reduction} = \frac{TD_{ours} - TD_{ACS}}{TD_{ours}} \tag{18}$$

$$\text{Gap} = \frac{TD_{ours} - TD_{BKS}}{TD_{ours}} \tag{19}$$

It can be observed from Table 1 that for problem sets C1 and C2, all the best known solutions are found by the proposed algorithm. Classic ACS fails to find the optimal solutions for two instances, C104 and C204. For problem sets R and RC, more vehicles are used in R1 and RC1 because they have tight time windows. In these two sets, 5/12 and 2/8 solutions with fewer NV or smaller TD are found by the proposed algorithm, and the total distance is further reduced by BSO in every instance, with cost reductions ranging from −7.14% to −2.15% for problem set R1 and from −13.89% to −2.47% for problem set RC1. For problem sets R2 and RC2, almost all solutions with fewer vehicles (10/11 and 8/8, respectively) are found by the proposed algorithm, and the total distance of all instances is also reduced by the BSO algorithm. For all instances in problem sets R and RC, the solutions obtained by the proposed algorithm are better than those of classic ACS. The proposed algorithm found competitive solutions for 42 out of 56 instances, including all instances of type C and 18 out of 19 instances of problem sets R2 and RC2.
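As a worked illustration of Eqs. (18) and (19), the Python snippet below evaluates the cost reduction for instance RC208 using the two total distances quoted in the case study (862.36 for classic ACS and 782.15 for the proposed algorithm). The function names are illustrative, and the values are fractions that would be multiplied by 100 to express them as percentages.

```python
def cost_reduction(td_ours, td_acs):
    """Eq. (18): relative change of our total distance with respect to classic ACS."""
    return (td_ours - td_acs) / td_ours


def gap(td_ours, td_bks):
    """Eq. (19): relative deviation of our total distance from the best known solution."""
    return (td_ours - td_bks) / td_ours


# RC208, distances taken from the case study.
reduction = cost_reduction(782.15, 862.36)
print(f"{100 * reduction:.2f}%")
```

A negative cost reduction means that the proposed algorithm found a shorter total distance than classic ACS, and a gap of zero means the best known total distance was matched.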
To get an overview of the different problem sets, the average cost reduction and gap of each problem set are also computed. For all 56 benchmark instances, each experiment was run 10 times, and the average total distance of the BKS, ACS and the proposed hybrid ACS-BSO algorithm, as well as the average cost reduction and gap of each problem set, are shown in Table 2. In Table 2, for problem sets C1 and C2, the proposed algorithm shows a slight improvement over classic ACS, and there is no gap since all the optimal solutions are obtained. For problem sets R1, R2, RC1 and RC2, the average cost reductions are much larger than for the C1 and C2 problem sets: −4.03%, −5.68%, −5.71% and −4.37%, respectively. Besides, the average gaps are relatively small: 1.08% and 0.96% for problem sets R1 and RC1, and 2.06% and 2.02% for problem sets R2 and RC2. The average gaps are higher for the R2 and RC2 problem sets because most of the solutions obtained use fewer vehicles to service the customers, which results in a larger total distance.
For problem sets C1 and C2, the proposed algorithm can find optimal solutions in less than 30 seconds. Thus, in terms of convergence speed, the proposed algorithm is very efficient.

C. CASE STUDY
Two heuristics were used in the proposed algorithm: 2-opt and 2-interchange. Theoretically, a sequence of 2-opt operations could get stuck in a local optimum, while a sequence of 2-interchange operations can move the current solution anywhere in the solution space [46]. However, the 2-interchange operation is very time consuming, while 2-opt is simple and effective. Thus, both heuristics were used to balance the computational cost of the algorithm and the quality of the solutions.
For further analysis of the heuristics used, two instances were chosen, which are C104 and RC208. The solutions to instance C104 are very similar to each other, while the solutions to instance RC208 are quite different. The solutions are shown in Fig. 5 and Fig. 6, and the detailed routes are shown in Table 3 and Table 4.
For instance RC208, it can be observed from Fig. 6 that the number of vehicles is reduced from 7 to 4 by the proposed algorithm, and the total distance is reduced from 862.36 to 782.15. The detailed routes are shown in Table 4. There are many differences between the two solutions, but there also exist many short fragments common to both. Firstly, the intra-route improvement by 2-opt changes the position of the customer nodes in the routes, but the 2-opt operation is not able to reduce the number of vehicles. Secondly, in Table 4, many routes in the ACS solution are short, i.e., they contain few customer nodes. In addition, the inter-route improvement by 2-interchange with operator (0, x) or (x, 0) is able to reduce the number of vehicles. Thus, the number of vehicles is reduced from 7 to 4. However, transforming the ACS solution into the solution obtained by the proposed algorithm requires a large number of 2-interchange operations. Therefore, any single heuristic has its limitations, and it is essential to apply multiple heuristics.

V. CONCLUSIONS AND FUTURE WORK
In this paper, we proposed a hybrid swarm intelligence algorithm for solving the VRPTW. The ACS, BSO, as well as the 2-opt and λ-interchange local search heuristics, were described. The proposed algorithm uses ACS to optimize the solution first, and then performs BSO with local search for further optimization. Experiments on Solomon's benchmark with 100 customers showed that the proposed algorithm can achieve competitive results compared with the best known solutions obtained by many other methods. In addition, although the classic ACS method can achieve good quality solutions for VRPTW, hybridization of different algorithms can substantially improve the solutions achieved by a single classical method. We think a successful strategy must consider two aspects: i) ''breadth'' via a population-based method, and ii) ''depth'' via local search. In our experiments, the proposed algorithm obtained very competitive solutions with regard to the best known solutions: competitive or better solutions were found for 42 out of 56 instances (18 best and 24 competitive solutions).
Many successful VRP metaheuristics use either local search or large neighborhood search (LNS). The main idea of LNS is ''destroy and repair'', and LNS usually requires a very large computational cost to explore the search space well. In this paper, the ACS and the BSO algorithms were used to explore the search space, which is more efficient owing to their population-based properties. In addition, the 2-opt and λ-interchange local search methods were applied, which are simple and effective.
The new solution generation operations of BSO in the proposed algorithm are performed at the route level. According to our observation of the near-optimal solutions, most fragments (i.e., edges in routes) are the same as in the best known solutions. In this case, crossover at the solution level to inherit good fragments is another possible way to improve the convergence speed and produce high quality solutions. Further research could also focus on solving other variants of the VRP and related problems.