Memory-Based Ant Colony System Approach for Multi-Source Data Associated Dynamic Electric Vehicle Dispatch Optimization

The developments of electric vehicle (EV) technology and mobile internet technology have made the EV-oriented ride-hailing service a trend in smart cities. In the service scenario, a high-quality order allocation approach is in great need to quickly process a series of customer request orders, so as to reduce total customer waiting time and transportation cost. To simulate real-world customer-EV allocation scenarios, in this paper, a dynamic EV dispatch (DEVD) model is established by considering multi-source data association from five sources, including customer, vehicle, charging, station, and service. To solve the proposed multi-source data associated DEVD model, a memory-based ant colony optimization (MACO) approach is developed. MACO maintains a memory archive to store the historically good solutions, which not only can be used to update pheromone to guide the search, but also can be used to help the reactions to environmental changes. In response to dynamic changes, a partial reassignment strategy is also proposed to re-optimize some of the assigned customer-EV pairs in the historically best solution. Moreover, an exchange or replace local search procedure is designed to enhance the performance. The MACO algorithm is applied to a set of dynamic test cases with different customer request and EV sizes. Experimental results show that MACO generally outperforms the first-come-first-served approach and some state-of-the-art ACO-based dynamic optimization algorithms.


I. INTRODUCTION
WITH the development of mobile internet technology, online car-hailing services (e.g., Didi and Uber) have become a popular way to travel in smart cities [1]. Moreover, electric vehicles (EVs) are gradually being promoted as an alternative to fuel vehicles in smart cities due to the increasing green energy requirements in society [2] and the low energy consumption and environmental protection of EVs [3]. For example, Didi Chuxing, China's largest online ride-hailing platform, launched its first customized car, an EV called D1, in 2020 through cooperation with the BYD company [4]. Also, a recent study based on Uber shows that the use of EVs in online ride-hailing services is greatly beneficial for reducing emissions and shows no statistical difference in service when compared with using fuel vehicles [5]. Therefore, EVs have gradually become an important part of online ride-hailing services. The significance of online ride-hailing services and the growing ubiquity of EVs raise an urgent need for research into EV operations.
Compared to the research on fuel vehicles [6], [7], the research into EV operations mainly includes energy management [8], [9], charging station allocation [10], power and charging systems [11], [12], and EV route planning [13]-[15]. Traditional fuel vehicles use diesel or petrol as energy, while EVs rely on electricity [16]. For traditional fuel vehicles, because of the large fuel tank and the quick refueling speed, the influence of tank capacity and refueling is usually treated as negligible in research. EVs, however, have relatively small battery capacity and slow recharging speed, which cannot be ignored in practical applications. These differences between fuel vehicles and EVs make it more complicated to dispatch EVs in the vehicle dispatch problems of online ride-hailing services, because multi-source data, such as the EV battery status and the charging station information, must be considered.
Many studies have addressed the vehicle dispatch problem. Some of them focus on traditional fuel vehicle dispatch, and others on EV dispatch. A simple way to solve the vehicle dispatch problem is to dispatch the nearest vehicle to the customer who makes a request, based on the first-come-first-served (FCFS) approach [17], [18]. However, this approach only focuses on individual customer satisfaction and cannot provide a satisfactory solution at the global level. As described in [19], during peak demand periods, Didi Chuxing [20] needs to match over a hundred thousand orders every second in China. Therefore, for the ride-hailing service platform, a global dispatch scheme is in great need. To this aim, some studies consider the dispatch of vehicles to customers from a global view. Seow et al. [21] proposed a multiagent system called NTuCab to assign taxis to all customer requests made in a given time window. The agents on behalf of drivers cooperatively negotiate the assignment of customer requests in a distributed fashion. Zhang et al. [19] modeled the taxi order dispatch as a combinatorial optimization problem (COP). They predicted the probability of a customer request being accepted by a driver based on various factors and used a hill-climbing method to maximize the global success rate. Miao et al. [22] presented a dynamic taxi dispatch problem based on real-time sensing data. A receding horizon control approach is applied to allocate vacant taxis to different regions for matching the passenger demands. Hu and Dong [23] proposed an optimization-based dispatch model that considered both the taxi system efficiency and customer equity. Moreover, an artificial-neural-network-based model is proposed and trained using the optimization model's dispatch solutions to learn the optimal dispatch strategies.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Different from the traditional fuel vehicle dispatch problem, which only needs to consider the customer data (e.g., location and destination) and the vehicle data (e.g., location, velocity, and service status), the EV dispatch (EVD) problem needs to consider more data from other sources, such as the charging data (e.g., remaining battery capacity and charging information) and the station data (e.g., location and available status). For example, Shi et al. [24] regarded the scheduling of an EV with insufficient battery to complete the service as an infeasible action and developed a reinforcement learning based algorithm to solve a community-owned EVD problem for providing ride-hailing services to local residents. In [25], we studied the EVD problem in a static environment. However, in the real-world real-time service environment of EVD, dynamic changes may occur at any time, such as the arrival of new customer requests, the cancellation of old customer requests, or the entry and exit of EVs. In order to make our EVD model more practical for real-world application, a dynamic EVD (DEVD) model is proposed in this paper, so that the real-time dynamic information about new and cancelled customer requests can be considered. Therefore, the DEVD model is a practical dispatch model associated with multi-source (i.e., five sources) data from the customer, vehicle, charging, station, and service.
In order to efficiently solve the multi-source data associated DEVD problem, a swarm intelligence algorithm named ant colony optimization (ACO) [26]- [28] is adopted in this paper because swarm intelligence algorithms have shown promising performance in many kinds of optimization and scheduling problems [29]- [32]. The ACO is a meta-heuristic search algorithm inspired by the foraging behavior of ants in nature. Real ants cooperate to find the shortest path from their nest to the food via pheromone. In recent years, ACO and its ant colony system (ACS) variant [33] have been successfully applied to many COPs, such as cloud resource scheduling [34], [35], scheduling problems [36], [37], personalized trip recommendation [38], disassembly planning problems [39], vehicle routing problem [40]- [42], and taxi dispatch problems [43]. Since the DEVD problem studied in this paper is a COP extended from taxi order dispatch, ACO can be a promising solver. In fact, in our previous study [25], an ACS-based approach has shown its effectiveness and efficiency in the static EVD problem, showing the potential of ACO in dealing with the DEVD problem.
The DEVD is also a dynamic optimization problem (DOP) and is challenging for traditional ACO/ACS algorithms. In the literature, the research into ACO-based approaches for solving DOP and dynamic COP has also raised great attention [44]- [47]. The simplest way is to restart the algorithm once a dynamic change occurs [48]. However, the re-optimization process is time-consuming for the re-convergence, resulting in poor efficiency for most DOPs that are with smooth/slight changes [49]. Therefore, some studies have been made to better solve dynamic COP. Guntsch and Middendorf [50] proposed a population-based ACO (P-ACO) to solve dynamic traveling salesman problem (TSP) and also extended the P-ACO to solve dynamic quadratic assignment problem [51]. Inspired by [51], Montemanni et al. [52] further designed a pheromone conservation parameter to manage the information transfer from the previous environment to the new environment, so as to solve dynamic vehicle routing problem (DVRP), resulting in the ACS-DVRP algorithm. Mavrovouniotis and Yang [53] studied the dynamic TSP with traffic factors and proposed an ACO framework with three immigrant schemes to increase the diversity, including random immigrant, elitism-based immigrant, and memory-based immigrant. Their experimental results show that the elitism-based immigrant ACO (EIACO) performs best in random dynamic environments and the memory-based immigrant ACO (MIACO) performs best in cyclic dynamic environments.
In this paper, we propose a novel and more realistic memory-based ACO (MACO) approach for efficiently solving the DEVD problem. The MACO uses a memory archive to record the solutions that perform well in previous environments. These well-performing solutions can be utilized to help quickly locate the new global optimal solution in the new environment. Moreover, two special strategies are designed to further help MACO better solve the DEVD problem. Firstly, in response to dynamic changes, we design a partial reassignment (PR) strategy to re-optimize some of the customer requests instead of re-optimizing all valid customer requests in the new environment. Secondly, a local search procedure called the exchange or replace (EoR) strategy is designed to enhance the performance. We conduct experiments on a set of dynamic test cases with different customer and EV sizes and compare the MACO algorithm with not only the FCFS approach, but also some state-of-the-art and recent ACO-based dynamic optimization algorithms. Experimental results show that MACO has generally better performance than the compared algorithms on the DEVD problem. Therefore, the main contributions of this paper are summarized as follows:
(1) Firstly, we establish a dynamic EV dispatch model for simulating real-world dynamic EV dispatch application scenarios, by associating multi-source data from five sources, including customer, vehicle, charging, station, and service.
(2) Secondly, we propose a memory-based ACO approach for efficiently solving the DEVD problem by enhancing the adaptability in dynamic environments via pheromone transfer through archived solutions.
(3) Thirdly, we propose a partial reassignment strategy to optimize partial requests in the new environment to better respond to the dynamic environment, and a new local search procedure to enhance the performance. The proposed strategies help the MACO algorithm obtain a better balance between the performance and execution time, being more suitable for the DEVD real-world application.
The rest of this paper is organized as follows. Section II introduces the multi-source data associated DEVD model. Section III describes the MACO algorithm in detail. Section IV presents the experimental results, comparisons, and analysis. Finally, Section V concludes this paper.

II. ELECTRIC VEHICLE DISPATCH PROBLEM

A. Static EV Dispatch Problem
In a dispatch scenario, a number of EVs and charging stations are distributed within a certain geographical region. The dispatch center monitors the activity of EVs and the status of charging stations (whether available or not) in real time via GPS and wireless communication network. In a small time window, multiple customers send out service requests. The dispatch center arranges suitable EVs for these customers simultaneously, so as to maximize total service quality. The basic notations for the EVD problem that is modeled by multi-source data are listed in Table I.
The service process of EVD is illustrated in Fig.1, which is divided into two stages: 1) EV goes to the customer location; and 2) EV delivers the customer to the destination. When the dispatch center selects a candidate EV for a customer, it needs to consider both the locations of the customer and the EV. Furthermore, the remaining power of the EV battery should also be taken into account. That is, the EV's remaining battery capacity should be 'sufficient' to deliver the customer to the destination. To ensure the availability of EV after reaching the destination, 'sufficient' is defined as that the remaining battery capacity can support the EV to reach at least one charging station after completing a customer's service request. If the remaining battery capacity of an EV is not sufficient, this EV needs to be recharged during the service process.
In an actual EV service scenario, there are three types of service routes for EVs, as shown in Fig.1. The selection of route type depends on the remaining battery capacity of the EV: 1) if the remaining battery capacity of an EV is sufficient, then the route for this EV is $R_1 + R_2$; 2) if the remaining battery capacity is not sufficient for this EV to reach the customer location, or to reach any charging station after reaching the customer location, the EV needs to be recharged in the first stage of the service process, so the route is $x_1 + x_2 + R_2$; 3) if the remaining battery capacity is not sufficient but allows this EV to reach at least one charging station after reaching the customer location, the EV can be recharged in the first stage or the second stage, so it is necessary to judge in which stage the transportation cost is lower and to select route $x_1 + x_2 + R_2$ or route $R_1 + x_3 + x_4$ for this customer-EV pair accordingly. It is assumed that if an EV needs to be recharged, it is recharged only once during the service. Moreover, to avoid an overly long charging time, the charging process stops once the remaining battery capacity is sufficient for the EV to complete the service, without fully recharging the battery. It should be noted that when a low-power EV completes a service during which it needed to recharge, it stops accepting requests, which is controlled by the dispatch center. The EV is then recharged at the charging station until it is fully recharged or the charging process is terminated by the owner, after which it can continue to serve. This avoids the situation where the EV is always in low-power status and has to be recharged again and again. Also note that how EV charging is managed may vary with the policies of different platforms. The policy considered herein is just an example.
Fig.2 illustrates how to select the charging station when the remaining battery capacity of the EV is not sufficient, taking recharging in the first stage as an example. First, the range that the EV can reach with its remaining battery capacity is calculated and marked by the dashed circle in Fig.2. Charging stations $s_2$ and $s_3$, located in this range, are candidate charging stations. Then, based on the minimal increase in transportation cost, $s_3$ in the candidate set is selected as the best charging station. In this case, although $s_2$ is closer to the EV $v$, $s_3$ is selected because the total driving distance from $v$ to the customer $c$ via $s_3$ (i.e., $d(v, s_3) + d(s_3, c)$) is smaller than the total driving distance from $v$ to $c$ via $s_2$ (i.e., $d(v, s_2) + d(s_2, c)$). If the charging station selection occurs in the second stage, the selection mechanism is the same as that in the first stage. That is, among all the charging stations reachable by the EV, the one with the minimal increase in transportation cost is selected.
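The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dist` and `select_station`, the use of Euclidean distance for d(·,·), and the coordinates in the example are hypothetical and not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points (an assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def select_station(ev, target, stations, reach_km):
    """Pick the reachable station with the smallest detour ev -> s -> target.

    Returns None when no station lies within reach_km of the EV.
    """
    candidates = [s for s in stations if dist(ev, s) <= reach_km]
    if not candidates:
        return None
    return min(candidates, key=lambda s: dist(ev, s) + dist(s, target))

# Hypothetical layout: the station at (-2, 0) is closer to the EV,
# but the one at (4, 0) adds less total driving distance.
ev, customer = (0.0, 0.0), (10.0, 0.0)
stations = [(0.0, 8.0), (-2.0, 0.0), (4.0, 0.0)]
chosen = select_station(ev, customer, stations, reach_km=5.0)
```

In this example the station at (0, 8) is outside the reachable range, and of the two candidates the one at (4, 0) is chosen even though (-2, 0) is nearer to the EV, mirroring the choice of the farther station in Fig.2.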
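Therefore, if an EV $v_j$ is assigned to a customer $c_i$ to drive to the destination $g_i$, its driving distance $l(c_i, v_j)$ is calculated as
$$
l(c_i, v_j) =
\begin{cases}
d(v_j, c_i) + d(c_i, g_i), & \text{if } q_j / r \ge d(v_j, c_i) + d(c_i, g_i) + d(g_i, ns_{g_i}) \\
d(v_j, s_1) + d(s_1, c_i) + d(c_i, g_i), & \text{if } d(v_j, ns_{v_j}) \le q_j / r < d(v_j, c_i) + d(c_i, ns_{c_i}) \\
\min\{ d(v_j, s_1) + d(s_1, c_i) + d(c_i, g_i),\; d(v_j, c_i) + d(c_i, s_2) + d(s_2, g_i) \}, & \text{if } d(v_j, c_i) + d(c_i, ns_{c_i}) \le q_j / r < d(v_j, c_i) + d(c_i, g_i) + d(g_i, ns_{g_i}) \\
\text{NA}, & \text{if } q_j / r < d(v_j, ns_{v_j})
\end{cases}
\tag{1}
$$
where $ns_{g_i}$, $ns_{c_i}$, and $ns_{v_j}$ represent the nearest charging stations to $g_i$, $c_i$, and $v_j$, respectively; $q_j$ is the remaining battery capacity of $v_j$; $r$ is the electricity consumption per kilometer; $s_1$ and $s_2$ represent the charging stations selected in the first stage and the second stage, respectively. It should be noted that $s_1$ and $s_2$ are not necessarily $ns_{v_j}$ and $ns_{c_i}$. "NA" means that the remaining battery capacity of an EV is not enough to reach any charging station, so the EV is not available and not considered in the dispatch process. In the description below, we only discuss EVs that can reach at least one charging station according to their remaining battery capacity.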
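As a hedged illustration of how the three route types and the "NA" case fit together, the following Python sketch computes a driving distance from the remaining range q/r. The function name `driving_distance`, the Euclidean metric, and the exact boundary conditions between the cases are assumptions inferred from the route descriptions in this section, not the authors' implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points (an assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def driving_distance(v, c, g, stations, q, r):
    """Driving distance for EV at v serving customer c with destination g.

    q is the remaining battery capacity and r the consumption per km,
    so q / r is the remaining range. Returns None for the "NA" case.
    """
    reach = q / r
    nearest = lambda p: min(dist(p, s) for s in stations)
    direct = dist(v, c) + dist(c, g)
    if reach < nearest(v):
        return None                      # cannot reach any station: "NA"
    if reach >= direct + nearest(g):
        return direct                    # sufficient battery: R1 + R2
    # Recharge in the first stage: v -> s1 -> c -> g.
    first = min(dist(v, s) + dist(s, c)
                for s in stations if dist(v, s) <= reach) + dist(c, g)
    if reach < dist(v, c) + nearest(c):
        return first                     # only first-stage charging works
    # Recharge in the second stage: v -> c -> s2 -> g.
    left = reach - dist(v, c)
    second = dist(v, c) + min(dist(c, s) + dist(s, g)
                              for s in stations if dist(c, s) <= left)
    return min(first, second)            # cheaper of the two options
```

For instance, with a single hypothetical station at (0, 3), an EV at (0, 0), a customer at (4, 3), and a destination at (9, 15), a remaining range of 35 km gives the direct distance of 18 km, while a range of 5 km forces first-stage charging and a distance of 20 km.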
To improve the service quality, the time cost of a service process should be considered, including driving time and charging time. The driving time $dt(c_i, v_j)$ is calculated as
$$ dt(c_i, v_j) = \frac{l(c_i, v_j)}{vel} \tag{2} $$
where $vel$ is the EV velocity. The charging time $ct(c_i, v_j)$ is calculated by (3), where $P$ is the EV charging power. Therefore, the time cost of one service process between $c_i$ and $v_j$ is
$$ t(c_i, v_j) = dt(c_i, v_j) + ct(c_i, v_j) \tag{4} $$
For a customer $c_i$, the distance $d(c_i, g_i)$ between the starting position and the destination is fixed, regardless of which EV is assigned. It has no influence on the evaluation of dispatch results but may have a negative impact on the optimization process because of its relatively large value. Therefore, $d(c_i, g_i)$ is not considered in the proposed model. In addition, to avoid excessive charging time and improve the customer experience, a penalty for the charging time is set. That is, if the charging time exceeds a threshold $T$, the extra charging time is penalized. Therefore, the time cost in (4) is updated as (5), where $T$ is the charging time threshold and $f$ is the penalty factor. Equation (5) indicates that for a customer $c_i$, an EV that is closer to the customer and has more remaining battery capacity has a lower time cost. Thus, it is more likely to be assigned to this customer. It should be noted that our model is generic in that it covers both the situation that allows recharging during the service and the situation that does not. That is, we can simply set $T = 0$ and $f = \infty$ so that low-power EVs are never assigned to customers. The objective of the EV dispatch problem is to minimize the time cost, formulated as
$$ \min \sum_{i=1}^{N} \sum_{j=1}^{M} t(c_i, v_j)\, x_{ij} \tag{6} $$
subject to
$$ x_{ij} = \begin{cases} 1, & \text{if } v_j \text{ is assigned to } c_i \\ 0, & \text{otherwise} \end{cases} \tag{7} $$
$$ \sum_{j=1}^{M} x_{ij} = 1, \quad i = 1, \ldots, N \tag{8} $$
$$ \sum_{i=1}^{N} x_{ij} \le 1, \quad j = 1, \ldots, M \tag{9} $$
where $x_{ij}$ is an indicator variable. Constraint (8) guarantees that every customer is assigned an EV, and constraint (9) ensures that an EV is assigned to at most one customer. In this paper, it is supposed that the number of EVs to be assigned is not smaller than the number of customers, so that all the requests can be satisfied.
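To make the objective and its two assignment constraints concrete, here is a minimal Python sketch that evaluates a candidate assignment. It assumes the pairwise time costs have already been computed into a matrix; the name `total_time_cost` and the example numbers are hypothetical.

```python
def total_time_cost(t, x):
    """Sum the time cost of an assignment after checking feasibility.

    t[i][j] is the (precomputed) time cost of serving customer i with EV j,
    and x[i][j] is 1 when EV j is assigned to customer i, else 0.
    """
    n, m = len(t), len(t[0])
    # Every customer must receive exactly one EV.
    if any(sum(x[i]) != 1 for i in range(n)):
        raise ValueError("each customer needs exactly one EV")
    # Every EV may serve at most one customer.
    if any(sum(x[i][j] for i in range(n)) > 1 for j in range(m)):
        raise ValueError("an EV is assigned to more than one customer")
    return sum(t[i][j] * x[i][j] for i in range(n) for j in range(m))

# Hypothetical costs for two customers and three EVs.
t = [[3, 5, 9],
     [4, 2, 7]]
x = [[1, 0, 0],
     [0, 1, 0]]
cost = total_time_cost(t, x)
```

Here customer 0 takes EV 0 and customer 1 takes EV 1, so the total is 3 + 2 = 5; assigning both customers to the same EV would raise an error instead of returning a cost.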

B. Dynamic EV Dispatch Problem
During the working time of the dispatch center, the dispatch algorithm is executed repeatedly. Each execution of the dispatch algorithm is regarded as a segment, which is a complete dispatch process that includes two phases: the request acquisition phase and the optimization phase. The customers in each segment can obtain the dispatch results only after the current segment is completed.
In our previous work that uses ACS to solve the EVD problem [25], the algorithm is carried out at the optimization phase after the customer requests are obtained in the request acquisition phase. However, in a real-time service environment, dynamic changes can occur during the optimization phase. Therefore, in this paper, we build the DEVD model to consider the dynamic information that occurs during the optimization phase, including the new customer requests coming and the old requests cancellation. Specifically, we divide the optimization phase into several cycles and execute them one after one, so that the new coming requests and the cancelled requests during one optimization cycle can be considered in the next optimization cycle. Fig.3 illustrates the dispatch process of the DEVD model, in which optimization cycles of the optimization phase are carried out after the request acquisition phase. In the request acquisition phase, the dispatch center receives customers' requests. In the optimization phase, these requests (customers) are dispatched by an optimization algorithm to assign appropriate EVs. However, as new requests may come and old requests may be cancelled during the optimization phase, the DEVD model treats the dispatch as a DOP and divides the optimization phase into several cycles. In the first cycle, the optimization algorithm only considers the requests acquired in the request acquisition phase. Then, new requests and cancelled requests during the first cycle are put into the buffer pool and will be considered in the second cycle. Note that the first cycle is not aware of these new and cancelled requests. Moreover, the second cycle carries out the optimization algorithm by considering all these requests, that is, the requests acquired in the request acquisition phase and the requests acquired and cancelled in the first cycle. 
Similarly, the third cycle considers all the requests dispatched in the second cycle and those new/cancelled requests appeared during the second cycle. This way, after the optimization of the last cycle, the algorithm obtains the dispatch solution, which considered the dynamic changes in the optimization phase, and will send the results to both the customers and EVs. Because the requests to be optimized in each cycle are different from each other, each cycle can be regarded as being in a different environment. These different environments also reflect the dynamicity of DEVD problem.
III. MACO FOR SOLVING THE DEVD PROBLEM
As mentioned above, to efficiently solve the DEVD problem, the proposed MACO approach is executed in the optimization phase of a DEVD dispatch process, and the optimization phase is divided into several optimization cycles so that each cycle can consider the dynamic service information that occurred in the previous cycle. Therefore, without loss of generality, the optimization process described below is based on the $t$th optimization cycle in a DEVD dispatch process, named the environment $E_t$ in this paper.

A. Encoding of MACO for DEVD
Each ant in MACO is encoded as a matrix, as shown in
$$
\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N_t,1} & x_{N_t,2} & \cdots & x_{N_t,M}
\end{bmatrix}
\tag{10}
$$
where the numbers of rows and columns of the matrix are the number of valid requests in the current environment $E_t$, denoted by $N_t$, and the number of available EVs, denoted by $M$, respectively. Each element $x_{i,j}$ in the matrix indicates whether customer $i$ is assigned EV $j$. A value of 1 means that customer $i$ has been assigned EV $j$, while a value of 0 means not assigned, as shown in (7).

B. Initialization State Configurations
In ACO-based algorithms, pheromone records the accumulated experience of the colony, which affects the path construction of ants. For the DEVD problem, the pheromone value $\tau(i, j)$ is set between customers and EVs to indicate the preference that an EV $v_j$ is assigned to a customer $c_i$, and its initial value $\tau_0$ is set as
$$ \tau_0 = (N_t \cdot T_{nn})^{-1} \tag{11} $$
where $N_t$ is the number of customers to be assigned in the current environment $E_t$ and $T_{nn}$ is the time cost (i.e., the fitness value calculated by (6)) of the solution obtained by the FCFS approach. The FCFS approach works as follows: when a customer request is received, the EV with the lowest time cost among the currently available EVs is assigned to it.
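The FCFS baseline and the resulting initial pheromone value can be sketched as follows. This is an illustration only: it assumes the conventional ACS initialization, i.e., the reciprocal of the number of customers times the FCFS time cost, and it takes a precomputed cost matrix as input; the function names are hypothetical.

```python
def fcfs(t):
    """Assign customers in arrival order to the cheapest still-idle EV.

    t[i][j] is the time cost of serving customer i with EV j; the result
    lists the EV index chosen for each customer.
    """
    free = set(range(len(t[0])))
    assignment = []
    for row in t:
        # sorted() makes tie-breaking deterministic (lowest EV index wins).
        j = min(sorted(free), key=lambda k: row[k])
        free.remove(j)
        assignment.append(j)
    return assignment

def initial_pheromone(t):
    """Assumed ACS-style initial value: 1 / (N_t * T_nn)."""
    assignment = fcfs(t)
    t_nn = sum(t[i][j] for i, j in enumerate(assignment))
    return 1.0 / (len(t) * t_nn)

# Hypothetical costs: customer 0 takes EV 0 (cost 3), then
# customer 1 takes the cheapest remaining EV, EV 1 (cost 4).
t = [[3, 5, 9],
     [2, 4, 7]]
tau0 = initial_pheromone(t)
```

The example also shows why FCFS is only locally greedy: customer 1 would prefer EV 0, but it has already been taken by the earlier request.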

C. Solution Construction
The solution construction process in MACO is the same as that in the traditional ACS algorithm, where ants iteratively construct solutions by using a state transition rule. To increase the diversity of solutions, the order of customers to be assigned is shuffled randomly before construction. Each ant searches for feasible solutions by assigning EVs to customers one by one, according to the order in which customers are shuffled. The search behavior is influenced by pheromone (swarm knowledge) and heuristic information (individual knowledge). Similar to TSP [33], heuristic information in the DEVD problem is given by
$$ \eta(i, j) = \frac{1}{t(c_i, v_j)} \tag{12} $$
where $\eta(i, j)$ represents the heuristic information between customer $i$ and EV $j$. Based on the pheromone and heuristic information, the probability that an unassigned EV $j$ is selected for customer $i$ is calculated as
$$
p(i, j) =
\begin{cases}
\dfrac{\tau(i, j) \cdot \eta(i, j)^{\beta}}{\sum_{u \in J_i} \tau(i, u) \cdot \eta(i, u)^{\beta}}, & \text{if } j \in J_i \\
0, & \text{otherwise}
\end{cases}
\tag{13}
$$
where $J_i$ is the set of EVs that have not been assigned and $\beta$ ($\beta > 0$) is a predetermined parameter that determines the relative importance of heuristic information.
The state transition rule is as follows: for customer $i$, an EV $j$ is dispatched by applying the rule given by
$$
j =
\begin{cases}
\arg\max_{u \in J_i} \{ \tau(i, u) \cdot \eta(i, u)^{\beta} \}, & \text{if } q \le q_0 \\
J, & \text{otherwise}
\end{cases}
\tag{14}
$$
where $q$ is a random variable uniformly distributed in [0,1], $J$ is an EV selected by roulette wheel selection according to the probability calculated in (13), and $q_0$ ($0 \le q_0 \le 1$) is a parameter to control the exploitation and exploration behaviors of ants. For customer $i$, if $q \le q_0$, then the ant greedily chooses the EV with the maximal combined pheromone and heuristic information, measured by $\tau(i, u) \cdot \eta(i, u)^{\beta}$. Otherwise, the EV is determined as $J$.
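A compact way to see the construction step is the following Python sketch of the pseudo-random proportional rule as it is commonly used in ACS. It is a sketch under assumptions: the pheromone and heuristic values are passed in as plain nested lists, and the names `choose_ev` and `construct_solution` are hypothetical.

```python
import random

def choose_ev(i, free, tau, eta, beta, q0, rng):
    """Pick an idle EV for customer i: greedy with probability q0,
    otherwise roulette wheel selection on tau * eta**beta."""
    evs = sorted(free)
    weights = [tau[i][j] * eta[i][j] ** beta for j in evs]
    if rng.random() <= q0:
        return evs[weights.index(max(weights))]       # exploitation
    return rng.choices(evs, weights=weights, k=1)[0]  # exploration

def construct_solution(tau, eta, beta, q0, rng):
    """Build one ant's assignment as a dict customer -> EV."""
    n, m = len(tau), len(tau[0])
    order = list(range(n))
    rng.shuffle(order)            # random customer order for diversity
    free = set(range(m))
    solution = {}
    for i in order:
        j = choose_ev(i, free, tau, eta, beta, q0, rng)
        solution[i] = j
        free.remove(j)            # each EV serves at most one customer
    return solution

# Hypothetical values: uniform pheromone, heuristic = 1 / time cost.
tau = [[0.1, 0.1, 0.1] for _ in range(2)]
eta = [[1.0, 0.5, 0.25], [0.2, 0.4, 0.1]]
ant = construct_solution(tau, eta, beta=2, q0=0.9, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps runs reproducible; a larger `q0` makes ants follow the current best-looking EV more often, while a smaller one spreads choices according to the selection probabilities.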

D. Memory-Based Pheromone Updating Rule
In MACO, the best solution of the current iteration and the historically best solution (i.e., the global optimal solution from the beginning of the current environment) are denoted as $S_b$ and $S_{gb}$, respectively. In every iteration, $S_b$ is added into the memory archive. Furthermore, if $S_b$ is better than $S_{gb}$, then $S_{gb}$ is replaced by $S_b$. With the help of the memory archive, the pheromone can be updated when a solution enters or leaves the memory.
The memory archive has size $K$ and works in a first-in-first-out fashion to keep the information of the latest best solutions. For the first $K$ iterations, solutions are stored in the memory archive one by one. Therefore, when the iteration index $g \le K$, the pheromone is positively updated according to the stored solution at the end of the $g$th iteration as
$$ \tau(i, j) \leftarrow \tau(i, j) + \Delta\tau(i, j), \quad \forall (i, j) \in S_b \tag{15} $$
where $\Delta\tau(i, j) = (\tau_{max} - \tau_0)/K$ and $\tau_{max}$ denotes the maximum pheromone value, which is a predetermined parameter. Note that the $\tau_0$ obtained in (11) is re-calculated at every environmental change, as described later in Section III-E-2). From the $(K + 1)$th iteration, the oldest (i.e., the first-in) solution in the memory archive is removed (i.e., first-out) before the new solution is added, and the pheromone information is negatively updated according to the removed solution as
$$ \tau(i, j) \leftarrow \tau(i, j) - \Delta\tau(i, j), \quad \forall (i, j) \in S_{oldest} \tag{16} $$
where $S_{oldest}$ is the oldest solution in the memory archive (i.e., the removed solution) and $\Delta\tau$ is the same as defined in (15). After (16), the new solution enters the memory archive and the pheromone is positively updated according to this newly entered solution via (15). The pheromone values are maintained between $\tau_0$ and $\tau_{max}$ during the optimization process.
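The first-in-first-out archive and its paired positive and negative updates can be sketched as below. This is a minimal illustration assuming a fixed increment of (τ_max − τ_0)/K per archived solution; the class name `PheromoneMemory` and the representation of a solution as a list of (customer, EV) pairs are hypothetical choices.

```python
from collections import deque

class PheromoneMemory:
    """FIFO archive of K solutions that drives the pheromone matrix."""

    def __init__(self, n, m, tau0, tau_max, K):
        self.tau = [[tau0] * m for _ in range(n)]
        self.delta = (tau_max - tau0) / K   # fixed increment per solution
        self.archive = deque()
        self.K = K

    def add(self, solution):
        """Insert a solution given as a list of (customer, EV) pairs."""
        if len(self.archive) == self.K:
            # Archive full: evict the oldest solution and undo its deposit.
            for i, j in self.archive.popleft():
                self.tau[i][j] -= self.delta
        self.archive.append(solution)
        for i, j in solution:
            self.tau[i][j] += self.delta    # deposit for the new solution

# Hypothetical run with K = 2, tau0 = 0.1, tau_max = 0.5 (increment 0.2).
mem = PheromoneMemory(n=1, m=2, tau0=0.1, tau_max=0.5, K=2)
mem.add([(0, 0)])
mem.add([(0, 0)])   # pair (0, 0) now sits at tau_max
mem.add([(0, 1)])   # evicts the oldest entry, lowering pair (0, 0)
```

Because every pair can receive at most K deposits at any time, each pheromone value stays between τ_0 and τ_max without explicit clamping.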

E. Strategies Reacting to Dynamic Change Based on Memory
During the optimization process, some old customers may cancel requests and some new customers may send requests. The solutions obtained in the previous environment (i.e., the previous optimization cycle) may no longer be feasible in the current environment, and the requests to be optimized may be different from those in the previous environment. In this paper, partial reassignment (PR) strategy and pheromone transfer operation are performed after the environment changes (i.e., when entering the next optimization cycle). The PR strategy is used to update the requests to be optimized in the new environment. The pheromone transfer operation is used to repair the solutions in the memory archive and update the pheromone.
1) Partial Reassignment: When the environment changes, newly arrived customer requests and cancelled requests are considered. The first thing to do is to invalidate the cancelled requests. Then, for all currently valid customer requests, a simple method is to re-assign EVs for all requests, no matter whether they have been assigned before or not [49]. However, such a restart method discards the matching information from previous environments and is not suitable for slightly changing environments. Another method is to optimize the new customer requests independently once a change occurs, based on the incremental (INC) optimization method [54], which can reduce response time but may easily fall into local optima. In the customer-EV dispatch service, the environment often changes only slightly because the optimization time is short. Therefore, the restart strategy may be computationally expensive because it considers all the valid customer requests, while the INC strategy may become trapped in local optima because it only considers the new/cancelled customer requests. Thus, this paper proposes a new PR strategy, different from the restart strategy and the INC strategy, to respond to the environmental change. The PR strategy considers both the new/cancelled customer requests and previously assigned requests. However, only some of the previously assigned requests are considered, not all of them. Therefore, the number of customer requests to be assigned in the new environment is between that of INC and restart, which makes a trade-off between reducing computational time and improving solution quality.
To get the requests to be optimized in the new environment, the PR strategy is implemented. The PR strategy releases some matched customer-EV pairs to obtain customers and EVs that need to be re-dispatched, so as to avoid local optima to some extent. The specific operation of the PR strategy is as follows. It should be noted that all operations of PR are performed on the historically best solution $S_{gb}$. First, for the cancelled requests, their assigned EVs are released. Second, for each new customer (i.e., request), its nearest $R \cdot N_{init} / N_{new}$ customers (measured by Euclidean distance) are found and those customer-EV pairs are released. Here, $R$ is a predefined parameter, $N_{init}$ represents the number of customer requests received in the request acquisition phase, and $N_{new}$ is the number of new requests in the new environment. The more the initial requests outnumber the new requests, the more customer-EV pairs are released for optimization, so as to effectively avoid local optima. Fig. 4 shows an example of the two steps of the PR process. After the PR process, we obtain the requests that need to be optimized in the new environment, that is, all new requests and released requests. The task of DEVD in the new environment (i.e., the next optimization cycle in Fig.3) is to assign currently idle EVs to these new and released customer requests. Note that the fitness value of a solution in the new environment is calculated over both the newly dispatched and the previously retained customer-EV pairs.
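The two PR steps can be sketched in Python as follows. This is a hedged illustration: the function name, the dictionary representation of the historically best solution, and in particular the choice to truncate R·N_init/N_new to an integer are assumptions, since the text does not state how the count is rounded.

```python
import math

def partial_reassignment(best, positions, cancelled, new_customers, R, n_init):
    """Release pairs from the best solution after an environmental change.

    best maps customer -> EV; positions maps customer -> (x, y).
    Returns (kept pairs, customers to optimize, idle EVs that were freed).
    """
    # Step 1: drop cancelled requests and free their EVs.
    kept = {c: v for c, v in best.items() if c not in cancelled}
    freed = [v for c, v in best.items() if c in cancelled]
    # Step 2: release the k nearest assigned customers of each new request.
    # Truncating to an integer is an assumption of this sketch.
    k = int(R * n_init / len(new_customers)) if new_customers else 0
    released = set()
    for nc in new_customers:
        nearest = sorted(
            kept, key=lambda c: math.dist(positions[c], positions[nc]))
        released.update(nearest[:k])
    for c in released:
        freed.append(kept.pop(c))
    return kept, sorted(released) + list(new_customers), freed

# Hypothetical case: four initial requests, customer 3 cancels, customer 4 is new.
best = {0: 10, 1: 11, 2: 12, 3: 13}
positions = {0: (0, 0), 1: (1, 0), 2: (5, 0), 3: (9, 0), 4: (1.2, 0)}
kept, todo, freed = partial_reassignment(
    best, positions, cancelled={3}, new_customers=[4], R=0.5, n_init=4)
```

In this example k = 2, so the two assigned customers closest to the new request (customers 1 and 0) are released together with their EVs, while the pair of customer 2 is kept unchanged.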
2) Pheromone Transfer: In this section, we discuss how to transfer pheromone from the previous environment to the new environment. In MACO, the pheromone updating operation is performed based on the solutions in the memory archive as described in Section III-D. However, when an environmental change occurs, these solutions may be infeasible, because some customers may cancel requests. Hence, solutions in the memory archive need to be repaired and re-evaluated. As suggested in [51], a principle called KeepElite [55] is adopted to repair solutions after a change. That is, for each solution in the memory, the cancelled customer-EV pairs are released and the new customer-EV pairs are added, where each new customer will be assigned an EV that minimizes After repairing and re-evaluating all the solutions in the memory, conduct the pheromone transfer as follows. Firstly, re-initialize pheromone between all customers and EVs by (11). Then, update pheromone positively based on all these repaired solutions in the memory archive according to (15).

F. EoR Local Search Procedure
To improve solution quality, the EoR local search is carried out on the historically best solution S gb before updating the memory archive and pheromone in every iteration. To limit the extra computational time caused by local search while improving the solution quality as much as possible, the EoR is conducted only on the customer-EV pair with the maximal time cost in S gb , denoted by π max . The EoR includes two components: "exchange" and "replace". An example of the EoR local search procedure is shown in Fig. 5.

1) Exchange Operation:
The exchange operation swaps EVs between π max and other customer-EV pairs in S gb . For each customer-EV pair in S gb , if the total time cost is reduced after swapping the EV with π max , put this pair into an exchange list exl.
Here, c max and v max denote the customer and the EV of π max . The pair in exl whose swap yields the largest reduction in total time cost is selected for the exchange.

2) Replace Operation:
The replace operation replaces v max with an idle EV so that the total time cost is reduced. The idle EV that reduces the total time cost the most is selected as the replacement.
The EoR local search chooses the better operation between "exchange" and "replace" to perform. That is, the "exchange" or the "replace" operation that reduces more total time cost is performed. It should be noted that if there is no idle EV (i.e., the number of available EVs is not larger than the number of currently valid customer requests), only the exchange operation is performed.
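One EoR step can be sketched in Python as follows. This is a simplified illustration rather than the authors' code: the name `eor_step`, the dictionary layout, the hypothetical `cost(c, v)` function, and treating the total time cost as the sum of per-pair costs are all assumptions.

```python
def eor_step(assign, idle_evs, cost):
    """Apply one exchange-or-replace step to the pair with maximal time cost.

    assign:   dict customer -> EV (modified in place)
    idle_evs: set of unassigned EVs (modified in place)
    cost:     hypothetical function cost(c, v) -> time cost of the pair
    Returns "exchange", "replace", or None if no operation reduces the cost.
    """
    c_max = max(assign, key=lambda c: cost(c, assign[c]))
    v_max = assign[c_max]
    base = cost(c_max, v_max)

    # Exchange: find the pair whose EV swap with v_max saves the most time.
    best_ex, gain_ex = None, 0.0
    for c, v in assign.items():
        if c == c_max:
            continue
        gain = base + cost(c, v) - cost(c_max, v) - cost(c, v_max)
        if gain > gain_ex:
            best_ex, gain_ex = c, gain

    # Replace: find the idle EV that saves the most time when serving c_max.
    best_re, gain_re = None, 0.0
    for v in idle_evs:
        gain = base - cost(c_max, v)
        if gain > gain_re:
            best_re, gain_re = v, gain

    if best_ex is None and best_re is None:
        return None
    if gain_ex >= gain_re:
        assign[c_max], assign[best_ex] = assign[best_ex], v_max
        return "exchange"
    idle_evs.discard(best_re)
    idle_evs.add(v_max)
    assign[c_max] = best_re
    return "replace"
```

When `idle_evs` is empty, the replace branch finds no candidate and only the exchange operation can be applied, which matches the rule stated above.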

G. Complete MACO Algorithm and Complexity Analysis
The flowchart and the pseudocode of the whole MACO algorithm are shown in Fig. 6 and Algorithm 1, respectively. Moreover, the complexity analysis of MACO is given as follows.
Herein, we denote the total number of customer requests and the number of available EVs as N and M, respectively. … O(N G × N), as obtained by lines 9-14 in Algorithm 1. When a dynamic change occurs, the PR … Table II.

Algorithm 1 MACO
Input: customer requests to be matched, available EVs
Output: historically best matching solution S gb
Begin
1: Initialize pheromone according to (11); g = 1;
2: While g <= N G Do
3:   For each ant a Do
4:     Ant a constructs the customer-EV assignment by (12)-(14);
5:     Evaluate the fitness of ant a;
6:   End For
7:   Update the historically best solution S gb ;
8:   Perform the EoR local search;
9:   If g > K Do
10:     Remove the oldest solution in the memory;
11:     Update pheromone based on the removed solution by (16);
12:   End If
13:   Put the best solution of current iteration S b into the memory;
14:   Update pheromone based on the inserted solution by (15);
15:   If dynamic change occurs Do
16:     Get the requests to be optimized in the new environment by PR;
17:     Repair and re-evaluate solutions in the memory;
18:     Re-initialize pheromone by (11);
19:     Update pheromone based on the solutions in memory by (15)

IV. EXPERIMENTS
In this section, experimental tests are conducted to investigate the performance of MACO on DEVD. All the algorithms are implemented in C++ and run on a PC with a quad-core Core i7 CPU and 8.0 GB RAM.

A. Experimental Settings
We compare MACO with the FCFS approach and five dynamic optimization algorithms, including restart ACS (RSACS), incremental (INC) method-based [54] ACS (INCACS), ACS-DVRP [52], P-ACO [50], and EIACO [53].
1) FCFS: Assign EVs with the minimum time cost to all valid customer requests in the order in which they arrive at the dispatch center.
2) RSACS: Re-initialize the pheromone and re-optimize all customer requests when a dynamic change occurs. The optimizer is ACS [33], and the algorithm is termed restart ACS (RSACS).
3) INCACS: Once a dynamic change occurs, it only focuses on the newly arriving and cancelled customer requests and assigns suitable EVs to the newly arrived requests.
4) ACS-DVRP [52]: ACS-DVRP introduces a parameter to transfer pheromone from the previous environment to the new environment for solving DVRP. When ACS-DVRP is adopted to solve the DEVD problem, the optimization process is similar to that of RSACS, except that when a dynamic change occurs, the migration of pheromone is carried out and controlled by this parameter.
5) P-ACO [50]: P-ACO, proposed for the dynamic TSP, stores the iteration-best solution S b found by the ant colony in a population list of size K in every iteration. When a change occurs in the DEVD problem, P-ACO repairs the solutions in the population list by releasing the cancelled customer-EV pairs and re-assigning EVs to newly arriving customer requests, and then re-optimizes all currently valid customer requests.
6) EIACO [53]: EIACO is an improved variant of P-ACO for the dynamic TSP, which introduces elitism-based immigrants to replace the worst ants in the population list in every iteration. The immigrants are generated from the best solution of the previous iteration using the inver-over operator [56].
Some parameter settings of EVs and DEVD have been given in Table I, where the EV technical parameters are based on the BYD e6 product parameters [57], [58]. The parameter settings of the six compared algorithms are listed in Table III. The EV dispatch is conducted in a physical area covering 100 km × 100 km. The positions of all objects, including EVs, customers (requests), destinations, and charging stations, are generated uniformly at random. The remaining battery capacity of each EV is randomly generated within [1 kWh, 60 kWh]. The requests to be cancelled are randomly selected from all customer requests. The specific data of the test cases can be downloaded from https://zhanapollo.github.io/zhanzhh/resources.htm.
For the DEVD model, we design various test cases (i.e., A1 to A8) by considering different EV sizes and customer request sizes, ranging from 100 to 500 and from 50 to 500, respectively, as shown in Table IV. W is the number of charging stations. N is the total number of customer requests (including valid and cancelled requests), and M is the number of available EVs that can reach at least one charging station with their remaining battery capacity. Initial requests in the table refer to the requests acquired in the request acquisition phase. After the request acquisition phase, the optimization phase starts from the 1st iteration. For dynamic changes, it is assumed that new customer requests and cancelled requests are processed every 25 iterations. Thus, the environment changes are considered after the 25th, 50th, 75th, 100th, and 125th iterations. Changes that occur between the 125th and 150th iterations are not considered in the current dispatch process and are postponed to the next dispatch process. In each environmental change, new requests are taken from the buffer pool and γ × N t customer requests are randomly cancelled, where N t is the number of valid requests in the current environment E t and γ is set to 0.01 herein. The numbers of new requests and cancelled requests for each change are listed in the "Number of new requests" column and the "Number of cancelled requests" column, respectively. For example, on the test case A8, the numbers of charging stations, available EVs, and initial customer requests in the initial iteration are 50, 500, and 400, respectively. After the 25th iteration, 4 old requests are cancelled and 20 new requests arrive, so the number of valid customer requests after the first dynamic change is 400 − 4 + 20 = 416. Changes of the same kind also occur after the 50th, 75th, 100th, and 125th iterations, according to Table IV.
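The bookkeeping of the first change on A8 can be reproduced with a few lines of Python. The helper name `apply_change` is hypothetical, and rounding γ × N t down to an integer is an assumption; the text only fixes the first change (4 cancellations and 20 arrivals), so later changes should be read from Table IV rather than from this sketch.

```python
import math


def apply_change(n_valid, n_new, gamma=0.01):
    """Return (valid requests after the change, number of cancelled requests)."""
    n_cancelled = math.floor(gamma * n_valid)  # assumption: round down
    return n_valid - n_cancelled + n_new, n_cancelled


# First dynamic change on A8: 400 valid requests, 20 new arrivals.
valid, cancelled = apply_change(400, 20)
print(valid, cancelled)
```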
In the experiments, all the stochastic algorithms perform 30 independent runs on each case, and their mean values are compared. The best results are marked in boldface. Moreover, Wilcoxon's rank-sum test is conducted at a 0.05 significance level. The results marked with "+", "≈", and "−" indicate that MACO is significantly better than, similar to, and significantly worse than the compared algorithm, respectively.
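The marking scheme can be illustrated with a small pure-Python sketch of the two-sided rank-sum test. It uses the normal approximation with average ranks for ties and no tie correction of the variance, which is a simplification; the function names `rank_sum_test` and `mark` are hypothetical, and the paper does not state which implementation was used. Because the fitness is a time cost, a smaller mean is treated as better.

```python
import math
from statistics import mean


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))


def mark(maco_runs, other_runs, alpha=0.05):
    """Return '+', '≈', or '-' from MACO's point of view (minimization)."""
    _, p = rank_sum_test(maco_runs, other_runs)
    if p >= alpha:
        return "≈"
    return "+" if mean(maco_runs) < mean(other_runs) else "-"
```

In practice, a library routine such as `scipy.stats.ranksums` would normally be preferred over a hand-written test.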

B. Experimental Results
Table V lists the fitness values (i.e., the average time cost in minutes required for completing all valid customer requests, calculated by (6)) obtained by MACO, FCFS, and the other five dynamic optimization algorithms, and the histogram is compared in Fig. 7(a). It can be observed that:
1) FCFS performs the worst, because it solves the problem from a local perspective and cannot guarantee the solution quality at the global level.
2) INCACS performs better than FCFS, but performs the worst among the six dynamic optimization algorithms. This may be because INCACS only optimizes the new requests when an environmental change occurs. It therefore focuses only on the new information and ignores the previous information, which makes it prone to falling into local optima.
3) RSACS and ACS-DVRP have a similar performance, and both perform worse than MACO. This means that simply transferring pheromone by a parameter cannot improve the solution quality very efficiently.
4) EIACO, which introduces elite immigrant ants, shows no significant improvement over the original P-ACO. This means that the immigrants generated by the inver-over operator designed for TSP have no significant effect on the DEVD problem.
5) In the cases where the number of EVs is larger than the number of customer requests (i.e., A1, A2, A4, A5, and A7 with rich EV resources), MACO obtains results that are similar to the best values obtained by the other six algorithms, although slightly better or worse in some cases. However, MACO performs significantly better than the other six algorithms when the number of requests is the same as the number of EVs (i.e., A3, A6, and A8 with limited EV resources).
Moreover, to compare the computational efficiency of the six dynamic optimization algorithms, the average CPU runtime (in seconds) of each algorithm over 30 independent runs is also presented in Table V and the histogram is compared in Fig. 7(b). Note that the runtime of FCFS is not presented because it is a greedy algorithm that consumes very little CPU time.
From the results, we can see that INCACS runs fastest because it only optimizes new customer requests. The runtime of RSACS and ACS-DVRP is similar, which is reasonable because ACS-DVRP runs the same mechanism as RSACS except for transferring pheromone from the previous environment by a parameter. The runtime of P-ACO is similar to that of RSACS and ACS-DVRP, indicating that the operation of the embedded population-list does not cause excessive computational time consumption. Compared to P-ACO, EIACO runs slightly longer because of the extra time spent by its elitism-based immigrant replacement strategy.
The MACO algorithm runs faster than other algorithms except INCACS. This means when new customer requests arrive, releasing some of the assigned customer requests around them can significantly reduce the runtime compared to the approaches that re-optimize all requests (e.g., the RSACS). Taking both fitness values and the runtime in Table V into consideration, it can be concluded that MACO can effectively make a trade-off between improving the solution quality and reducing computational time.

C. Necessity of Power Awareness and Recharging
In this paper, we consider that there are low-power EVs and they can be recharged during the service. To verify the necessity of vehicle recharging in the service, we design a comparative experiment. The experiment is designed based on two variants of the DEVD model: not considering the remaining power of the EV battery and not allowing the EV recharging. In the first model variant, the algorithm variant is denoted as MACO-full-power, where the remaining power of the EV battery is always regarded as sufficient no matter how much it remains. In the second model variant, the algorithm variant is denoted as MACO-w/o-recharging, where the recharging is not allowed for EVs during the service, that is, if the remaining power of an EV battery is not sufficient to complete a customer request, this EV will not be assigned to this customer. We compare MACO, MACO-full-power, and MACO-w/o-recharging in terms of fitness values, SatPercen, and UnsatPercen. The SatPercen (i.e., satisfied percentage) is the percentage of requests that can be served by EVs. However, in these customer-EV pairs, some pairs obtained by the MACO-full-power variant may be invalid. For example, for a low-power EV, the remaining power of its battery is regarded as sufficient in the MACO-full-power variant, but in the practical application, its remaining battery capacity is not sufficient to let it serve its assigned customer to the destination. In this case, the assignment is invalid and unsatisfied, and therefore the UnsatPercen (i.e., unsatisfied percentage) is the percentage of unsatisfied customer requests in all assigned customer requests. The results are given in Table VI.
It can be observed that MACO-full-power obtains better fitness values than MACO on all test cases, but has an unsatisfied percentage of at least 25%. This means that not all assigned EVs have enough power to serve the assigned customer requests. Therefore, when we dispatch EVs to customer requests, we need to consider the remaining power of the EV battery to avoid invalid assignments. MACO-w/o-recharging obtains a worse global service quality (i.e., considering both the fitness values and the satisfied percentage) than MACO does. On A1, A2, A4, A5, and A7, MACO-w/o-recharging can satisfy all customer requests, but has worse fitness values than MACO. On A3, A6, and A8, the fitness values of MACO-w/o-recharging are better than those of MACO, but not all customer requests can be assigned an EV. This means that if the EVs are not allowed to recharge during the service, some customer requests may not be served or may take more time to complete. Therefore, it is necessary to consider both the remaining power of the EV battery and recharging during the service.

D. Effectiveness of EoR Local Search and PR Strategy
To validate the effectiveness of the EoR local search, we integrate the EoR into the five compared dynamic optimization algorithms (i.e., all except FCFS), termed RSACS-EoR, INCACS-EoR, ACS-DVRP-EoR, PACO-EoR, and EIACO-EoR, respectively. The fitness values obtained by these five algorithms are given in Table VII. To show the effect more directly, the fitness values of the EoR-enhanced algorithms are plotted as color bars in Fig. 8, and the improvement brought by EoR is plotted as the shaded bar. It can be observed that the EoR local search improves the performance of all five dynamic optimization algorithms to varying degrees. Nevertheless, the results in Table VII show that MACO is still the best algorithm among all the EoR algorithm variants.
To further investigate the effectiveness of the EoR and PR strategies, we compare MACO with the MACO variants without the PR or EoR. MACO-w/o-PR uses EoR but not PR, MACO-w/o-EoR uses PR but not EoR, and MACO-w/o-PR-EoR uses neither. Table VIII lists the mean fitness values and runtime of the four algorithms over 30 independent runs. In terms of solution quality, MACO obtains results similar to those of MACO-w/o-PR, while outperforming MACO-w/o-EoR and MACO-w/o-PR-EoR. This means that the EoR local search helps MACO improve the solution quality. In terms of runtime, MACO runs slightly slower than MACO-w/o-EoR, which indicates that the EoR local search consumes only a little computational time. Moreover, MACO runs much faster than MACO-w/o-PR and MACO-w/o-PR-EoR, which means that the PR strategy improves computational efficiency. Therefore, EoR and PR together help the MACO algorithm find promising solutions in a short time. Based on the above, both EoR and PR contribute to the promising performance of MACO, and removing either of them will have a negative influence on its performance.

E. Convergence Analysis
To conduct the convergence analysis, we plot the convergence curves of MACO, RSACS, INCACS, and P-ACO during the optimization process on A8 in Fig. 9. In the figure, each sharp peak indicates that a dynamic change has occurred. In the first 25 iterations, which form the initial environment, the results obtained by the four algorithms are similar. In the following iterations, MACO finds new promising solutions more quickly than the other three algorithms when an environmental change occurs. This may be due to two advantages brought by the memory archive information in MACO. First, after being repaired, the solutions in the memory archive have good fitness values (i.e., low average time cost) in the new environment. Since the environments before and after a dynamic change are related in the DEVD application, such re-use of past solutions can help the algorithm converge faster in the new environment. Second, by transferring pheromone from the previous environment via the solutions in the memory, MACO can guide ants to search promising areas of the new environment quickly. The fastest convergence speed and the best fitness value obtained by MACO show that the proposed algorithm makes good use of the memory archive information.

F. Parameter Analysis of MACO
In this section, we take A3 to A8 as examples to study the influences of parameters on MACO. The parameters q 0 and β are set according to standard ACO, and then we investigate the other parameters K , R, and τ max . Note that when performing analysis of a parameter, the other parameters are consistent with the settings in Table III.
K represents the size of the memory, that is, the number of stored solutions that perform well in the previous iterations. The mean fitness values obtained by MACO with different K values (i.e., K = 1, 5, 10, 15, 20, and 25) are plotted in Fig. 10(a). It is shown that the MACO performs slightly worse on A3 and A6 when K = 1. This indicates that too little information stored in the memory archive will lead to inefficient pheromone update. In addition, the performance of MACO is similar when K is 5 to 25. Therefore, we set K = 10 in this paper.
Then the parameter R is investigated, which determines the number of customer-EV pairs released in the historically best solution when the environment changes. We set R from 0 to 0.7 with an interval of 0.1 and the mean fitness values are shown in Fig. 10(b). The results when R = 0 are similar to those obtained by INCACS-EoR in Table VII. This is reasonable because they adopt the same mechanism to deal with dynamic changes, that is, only optimizing new requests. The fitness values generally decrease as R increases, which indicates that releasing some assigned customer-EV pairs can help escape from local optima. The performance improvement is not significant when R is greater than 0.5. Therefore, to avoid the waste of computational time caused by releasing too many customer-EV pairs, R is set to 0.5 in this paper.
Finally, the maximum pheromone value τ max is tested. We set τ max from 1.0 to 5.0 with a step of 1.0. The tendency of the curves in Fig. 10(c) shows that the τ max value has little effect on the performance of MACO. In this paper, τ max is set to 1.0.

G. Transportation Costs
From the perspective of the enterprises, it is important to reduce the transportation costs of EVs during service. In this section, we compare the transportation cost obtained by each algorithm. Table IX lists the total driving distance (km) of all the EVs, including the distance to pick up the customer and the distance to deliver the customer to the destination, calculated according to the dispatch results on all the eight test cases A1 to A8.
It can be observed from Table IX that MACO can obtain the minimal transportation cost or the approximate minimal cost among all the seven algorithms. In particular, MACO performs significantly better than other algorithms in A6 and A8, which is consistent with the results in Table V. Therefore, it can be concluded that MACO can effectively solve the DEVD problem. It can reduce customers' waiting time and improve the quality of service. Meanwhile, it can reduce the driving distance of EVs during the service process, so as to reduce operating costs.

H. Comparison on Real-World Dataset
To evaluate the performance of MACO on a real-world application, a real dataset from the Didi Chuxing GAIA Initiative [59] is adopted. The real-world data include all the customer requests of a whole day in a Chinese city. Herein, we select the first 500 customer requests (i.e., the same number of customers as in our test case A8) from the dataset to construct a real-world case, termed DidiTest. However, the data only contain the original locations and destinations of the customers, which are located in a 55 km × 30 km physical area. Therefore, in order to complete the DidiTest case, we further set the numbers of EVs and charging stations to 500 and 50, respectively, the same as those in A8. Then, the locations of the EVs and the charging stations are generated randomly within the coordinate range of the customer locations, while the charging data is generated randomly within the battery capacity range. The results are listed in Table X. It can be observed that MACO obtains the best fitness values and the minimal transportation costs (i.e., the total driving distances of all the EVs, including the distances to pick up the customers and the distances to deliver the customers to the destinations) among all the compared algorithms on the real-world test case. Thus, it can be concluded that MACO is practical and effective in solving the real DEVD problem.

V. CONCLUSION
EV dispatch is a challenging issue due to the charging characteristics and dynamic scenarios. To simulate real-world dynamic EV dispatch application scenarios, a dynamic EV dispatch model is established by considering multi-source data association from five sources, including customer, vehicle, charging, station, and service. To solve the DEVD problem, we propose a memory-based ACO approach MACO as the optimizer to enhance the adaptability in dynamic environments. Furthermore, the PR strategy and the EoR local search procedure are incorporated into MACO to obtain a better balance between the performance and execution time. The PR strategy gets the requests to be optimized in the new environment to better respond to the dynamic environment. The EoR local search procedure integrates empirical knowledge into the search process to enhance the performance of MACO.
Experimental results show that MACO outperforms the traditional FCFS approach and some state-of-the-art ACO-based dynamic optimization algorithms. It has effectively achieved the objectives of minimizing customer waiting time and transportation costs at the global level. The experimental results on the real-world dataset also show the practicability of MACO.
In future work, we can consider the personalized preferences of both passengers and drivers regarding EV recharging to make the problem model more practical. Other promising research directions include scheduling over larger areas by dividing the dispatch area and scheduling with more objectives. For these goals, some recent large-scale optimization algorithms [60] and multi-objective optimization algorithms [61] are worth studying.