Faster Capacitated Arc Routing: A Sequence-to-Sequence Approach

The Capacitated Arc Routing Problem (CARP) is an NP-hard optimization problem that has been investigated for decades. Heuristic search methods are commonly used to solve it. However, given a CARP instance, most heuristic search algorithms require substantial time to iteratively search for a solution from scratch, and hence may be impractical for emerging applications that need a solution within a very short time. In this work, a novel approach to efficiently solve CARP is presented. The proposed approach replaces the heuristic search process with the inference phase of a trained Deep Neural Network (DNN), which takes a CARP instance as input and outputs a solution to that instance. In this way, CARP is solved by a direct mapping rather than by iterative search, and hence can be more efficient and more easily accelerated by GPUs. An empirical study shows that the DNN-based solver achieves significant speed-ups with minor loss in solution quality, reaching up to hundreds of times acceleration in extreme cases.


I. INTRODUCTION
The Capacitated Arc Routing Problem (CARP), with numerous practical applications such as garbage collection and postal delivery [1], is an important NP-hard combinatorial optimization problem that has attracted researchers for decades [2]-[5]. Recently, emerging applications, such as autonomous driving systems, require almost real-time (online) solving of CARP instances. However, despite their excellent solution quality, the state-of-the-art CARP solvers are not computationally efficient enough. The main reason is that these approaches are mostly heuristic search algorithms that typically start from scratch to tackle a CARP instance through a trial-and-error procedure [6], [7]. Thus, they can neither be easily parallelized nor leverage past experience to rapidly reach a sufficiently good solution.
Motivated by the fast development of deep learning techniques [8]- [11], a new potential paradigm for solving
CARP is to build models with machine learning techniques, rather than through manually designing heuristic search algorithms. Specifically, for a given problem class, one can utilize a set of problem instances, for which a target solution (e.g., the optimal solution or a solution that is good enough) is known, to train models that act as solvers to an optimization problem. When a new problem instance arrives, the model can directly generate a solution to the instance.
The above paradigm offers advantages over heuristic search algorithms in two respects. First, solving a CARP instance is reformulated as the inference procedure of a learning model, which can be much more easily accelerated with new computing facilities [12], [13], e.g., GPUs. Second, the training process of the learning model incorporates information from other instances, whose solutions can first be labeled (in an offline manner) by state-of-the-art search algorithms. Hence, the training process can be viewed as transferring the knowledge inside the heuristic solver into the learned model. In this way, good solutions can be generated for a new problem instance even faster, since the model leverages previously solved instances rather than starting from scratch. Besides, an algorithm under this paradigm can essentially be considered a meta-heuristic that provides an alternative to legacy heuristic approaches. In terms of solution quality, such a machine learning approach might not perform as well as state-of-the-art search algorithms, since its quality depends heavily on the training data and training algorithm. However, it may offer a better trade-off between solution quality and computational efficiency due to its high inference speed.

In this paper, a deep learning approach to CARP is proposed following the above idea. Given an input CARP instance, the proposed method, namely the Deep Arc Routing Solver (DARS), has four steps. The first is the pre-sorting step: task edges in the instance are sorted into a sequence in which adjacent edges are close to each other. The second step produces a numerical representation for each edge in the sorted sequence, utilizing a graph embedding technique, namely node2vec [14].
The task edges are now represented as a sequence of numerical edge vectors. In the third step, this edge sequence is fed into a Recurrent Neural Network (RNN) based sequence-to-sequence model [15], called the Pointer Network (Ptr-Net), to generate the output solution for the input instance. It should be noted that a masking scheme is utilized during generation. Finally, post-sorting is applied to further improve solution quality.
To summarize, this work contributes to the literature in two aspects. First, a deep neural network-based solver, DARS, is proposed for CARP, and it is significantly faster than heuristic search methods with the help of GPUs. Second, DARS is a new approach that formulates a combinatorial optimization problem as a sequence-to-sequence prediction problem. Empirical results on CARP indicate that this formulation achieves better solution quality than other attempts at applying machine learning to combinatorial optimization.
The rest of the paper is organized as follows. Section II presents the related work. Section III introduces the proposed method. Section IV presents the empirical studies. Section V concludes the paper.

II. RELATED WORK AND NOTATIONS
A. RELATED WORK
In the literature, the state-of-the-art solvers for CARP are mostly heuristic search algorithms [6], [7], [16]. In general, these methods all follow a trial-and-error paradigm, i.e., candidate solutions are iteratively generated and tested, until a predefined termination condition is met. The best solution found so far is adopted. Studies along this line, e.g., the Memetic Algorithm with Extended Neighborhood Search (MAENS) [7] and [16], mainly focus on designing search operators to generate high-quality solutions more effectively, and have been reported to achieve the best performance on many benchmark test instances.
More recently, inspired by the achievements of deep learning, several attempts have been made to tackle NP-hard combinatorial optimization problems with deep learning. These meta-heuristics may provide a sufficiently good solution to such problems with less computational effort than iterative methods or simple heuristics. The first such work is the Pointer Network (Ptr-Net) [15], which mainly follows the encoder-decoder architecture [17] with a customized pointer attention mechanism and is trained in a supervised manner. With the same problem formulation proposed by Ptr-Net, various deep neural solvers with different architectures or training modes have arisen in recent years.
Some solvers are trained by reinforcement learning. The work of [18] adopts a modified version of Ptr-Net trained by the policy gradient algorithm. The work of [19] replaces the encoder of Ptr-Net with a set of graph embeddings and is applied to the Vehicle Routing Problem (VRP). The work of [20] proposes a new encoder-decoder architecture inspired by the Transformer [21], with an attention mechanism different from Ptr-Net's, and uses REINFORCE [22] to train the solver. The work of [23] constructs a solution by adding nodes incrementally; it utilizes a graph embedding model, called structure2vec [24], to parameterize the Q-values of nodes and is trained by the Q-learning algorithm. Finally, [25] presents a neural-network-based solver, optimized by reinforcement learning, that combines a modified graph convolutional network with two encoder-decoder models.
Other works use supervised training. The work of [26] employs recurrent neural networks to solve the quadratic assignment problem. The model incorporates a so-called objective-based learning, which only back-propagates the gradients if the predicted solution yields a worse objective value than the approximate solution. The work of [27] uses graph neural networks for the decision variant of the Traveling Salesman Problem (TSP) and is trained with approximate solutions generated by the Concorde TSP solver.
Although this work is inspired by the above deep learning approaches, it is distinct from previous studies in three aspects. First, concerning the specific target problem, no previous study addresses CARP. Although CARP is similar to TSP and VRP, it is in essence a more challenging problem: CARP can be viewed as a TSP with an additional capacity constraint, and a CARP instance with k edges can, in theory, be converted into a capacitated VRP with 3k + 1 nodes [28]. Second, CARP is a specific permutation-based optimization problem [29]. Previous deep learning approaches formulate this type of problem as a set-to-sequence prediction problem, i.e., the ordering of the input elements fed to Deep Neural Networks (DNNs) is not considered in their formulation, whereas this work formulates it as a sequence-to-sequence prediction problem. Finally, previous studies place more emphasis on the solution quality of the trained model, whereas the focus of this work is the potential speed-up of DNNs over state-of-the-art heuristic search algorithms.

B. NOTATIONS OF CARP
This work considers CARP on a graph G(V, E, A), where V is the node set, E is the edge set and A is the arc set. Each edge e ∈ E is associated with a demand d(e), and the edges with d(e) > 0 are task edges that must be served by a fleet of vehicles, each with capacity Q. It should be noted that all vehicles are initially located at a special node v_0 ∈ V, called the depot. A view of the graph is shown in the top-left corner of Fig. 1.
Taking E as input, a solution of CARP is a set of routes S = {R_1, R_2, · · · , R_m}, where each route R_i = (a_i^1, a_i^2, · · · , a_i^{|R_i|}) contains a sequence of arcs belonging to task edges; a_i^j is the j-th task to be served in R_i and |R_i| is the number of arcs in R_i. A valid S must satisfy three constraints: 1) Each route should start and end at the depot v_0.
2) Each task is served in only one route (but it can be traversed without serving an unlimited number of times).
3) The total demand of a route should not be larger than the vehicle's capacity Q.
Note that S can be equivalently represented as a single sequence of arcs by concatenating its routes, i.e.,

S̄ = (a_1^1, · · · , a_1^{|R_1|}, a_2^1, · · · , a_m^{|R_m|}).   (1)

The cost of a route R_i can be calculated as

C(R_i) = Σ_{j=1}^{|R_i|} sc(a_i^j) + Σ_{j=0}^{|R_i|} dc(head(a_i^j), tail(a_i^{j+1})),   (2)

where sc(a) is the serving cost of arc a, head(a) and tail(a) are its end and start nodes, a_i^0 and a_i^{|R_i|+1} are dummy arcs representing the depot (head(a_i^0) = tail(a_i^{|R_i|+1}) = v_0), and dc(a, b) for nodes a ∈ V and b ∈ V denotes the total deadheading cost of the shortest path between a and b, found by Dijkstra's algorithm [30]. The objective of CARP is to find a solution S* that minimizes the total cost Σ_{r ∈ S*} C(r) while satisfying the aforementioned constraints.
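For illustration, the route cost computation described above can be sketched as follows. This is a minimal Python sketch, not the paper's implementation: the dictionary-based sc and dc lookups and the function names are illustrative assumptions, with dc assumed to be precomputed by Dijkstra's algorithm.

```python
import heapq

def dijkstra(adj, src):
    """Shortest deadheading costs from src to every reachable node.
    adj: {node: [(neighbor, edge_cost), ...]}"""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def route_cost(route, sc, dc, depot):
    """Cost of one route: serving costs of its arcs plus deadheading
    costs linking the depot, consecutive arcs, and back to the depot.
    route: [(tail, head), ...]; sc: {(tail, head): serving cost};
    dc: {(a, b): shortest-path deadheading cost between nodes a and b}."""
    cost, pos = 0, depot
    for tail, head in route:
        cost += dc[(pos, tail)] + sc[(tail, head)]
        pos = head
    cost += dc[(pos, depot)]  # deadhead back to the depot
    return cost
```

Running dijkstra from each relevant node yields the dc table consumed by route_cost; summing route_cost over all routes gives the total solution cost to be minimized.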

III. DARS FOR CARP
In this section, we first model CARP solving as a prediction task that maps an input sequence to an output sequence. Then, based on this formulation, the overall structure of the proposed solver is illustrated. The notation of Section II-B is used throughout this section.

A. FORMULATION
For a set of n CARP instances E = {E_1, E_2, · · · , E_n}, a heuristic algorithm is applied to generate a corresponding set of output solutions {S̄_1, S̄_2, · · · , S̄_n}, from which a data set D = {(E_1, S̄_1), (E_2, S̄_2), · · · , (E_n, S̄_n)} is constructed. Given a pair (E_k, S̄_k) ∈ D, the process of CARP solving can be reformulated as a sequence-to-sequence prediction task.
First, the edges in E_k are sorted so that the distance between adjacent elements in the resulting sequence Ē_k is small; this greedy step is called pre-sorting. The problem then becomes predicting the sequence S̄_k from the sequence Ē_k. This is a sequence-to-sequence prediction task, and the conditional probability p(S̄_k | Ē_k; θ) can be estimated using the probability chain rule with respect to (1), i.e.,

p(S̄_k | Ē_k; θ) = Π_{i=1}^{|S̄_k|} p_θ(a_i | a_1, · · · , a_{i−1}, Ē_k; θ).   (3)

Here, p_θ(a_i | a_1, · · · , a_{i−1}, Ē_k; θ) is computed by an RNN model with parameters θ, and the best θ* can be learned by maximum likelihood estimation over the training set D, i.e.,

θ* = argmax_θ Σ_{(E_k, S̄_k) ∈ D} log p(S̄_k | Ē_k; θ),   (4)

using optimization methods such as stochastic gradient descent in a supervised training mode.
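As an illustration of the chain-rule factorization, the log-likelihood of a target sequence can be computed from per-step distributions as in the following minimal Python sketch; the distributions here are hypothetical stand-ins for the RNN's outputs, and the function name is an assumption.

```python
import math

def sequence_log_likelihood(step_probs, target_seq):
    """log p(S | E) via the chain rule: sum over decoding steps of
    log p(a_i | a_1 .. a_{i-1}, E).  step_probs[i] is the model's
    distribution over candidate outputs at step i (already conditioned
    on the prefix); target_seq[i] is the label chosen at step i."""
    return sum(math.log(step_probs[i][a]) for i, a in enumerate(target_seq))
```

Maximum likelihood training then maximizes the sum of these log-likelihoods over the data set D, which in practice is done by minimizing the per-step cross-entropy with stochastic gradient descent.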

B. THE PROPOSED SOLVER
Based on (3) and (4), a new solver consisting of edge pre-sorting, graph embedding, a DNN with masking, and post-sorting is proposed. Given a graph G(V, E, A), the input is the set of task edges, denoted as E = {e_1, e_2, · · · , e_T}, where d(e_i) > 0 and T is the number of tasks. The solver generates a sequence of the input elements, denoted as S̄. The whole pipeline is illustrated in Fig. 1. In the following, the details of the method are explained.

1) PRE-SORTING IN A GREEDY MANNER
E is a set of edges and thus carries no information about the distances among edges. However, for CARP, if two task edges are geographically close to each other, they are likely to appear in the same route of the output solution. This is because placing two nearby task edges adjacently in a route R_i leads to a lower cost C(R_i) in (2) and makes R_i more likely to appear in the final solutions.
To address this drawback, a sequence Ē is generated by sorting the elements of E so as to minimize the distance between adjacent elements. Here, the distance dhc(e_1, e_2) between two edges e_1 = (u_1, v_1) and e_2 = (u_2, v_2) in E is defined as the average deadheading cost between endpoints on different edges, i.e.,

dhc(e_1, e_2) = [dc(u_1, u_2) + dc(u_1, v_2) + dc(v_1, u_2) + dc(v_1, v_2)] / 4,   (5)

and the distance dhc(v_0, e) between the depot v_0 and an edge e = (u, v) is defined as

dhc(v_0, e) = [dc(v_0, u) + dc(v_0, v)] / 2.   (6)

Elements in E are sorted in a greedy manner, as shown in Algorithm 1. Specifically, starting from the depot v_0 (line 2), the nearest task edge is selected as the first element of Ē. Then, the next nearest element is iteratively selected by measuring the distances between the unselected edges and the currently selected one (line 4). Sorting finishes once all elements of E have been selected. Finally, the sequence Ē = (ē_1, ē_2, · · · , ē_T) is derived.
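The greedy selection of Algorithm 1 can be sketched as follows; this is a simplified Python illustration, and the function name and the distance lookup are assumptions rather than the paper's implementation.

```python
def greedy_presort(edges, dhc, depot):
    """Greedy pre-sorting sketch: start from the depot and repeatedly
    append the unselected task edge nearest to the last selected element.
    dhc(x, e) is the (average-deadheading) distance between x, which is
    the depot or an edge, and edge e, as in (5) and (6)."""
    remaining = list(edges)
    order, current = [], depot
    while remaining:
        nearest = min(remaining, key=lambda e: dhc(current, e))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest  # next distances are measured from this edge
    return order
```

The sketch runs in O(T^2) distance evaluations for T task edges, which matches the iterative nearest-neighbor selection described above.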
2) GRAPH REPRESENTATION USING EMBEDDING
DNNs accept numerical vectors as input and cannot process a graph directly, so a graph embedding method, node2vec [14], is employed. Node2vec computes an l-dimensional embedding emb(v) ∈ R^l for each node v ∈ V, with l ≪ |V|. The adjacency information of the graph is preserved by the embeddings of all nodes.
Moreover, a CARP instance contains not only the adjacency information of the graph structure but also other key information, such as the depot, the demands on the edges, and the vehicle capacity. These necessary features cannot be represented by emb(v) alone. Thus, the numerical representation µ_e used in this paper for each edge e = (u, v) ∈ Ē concatenates the endpoint embeddings with instance features, i.e.,

µ_e = (emb(u), emb(v), f(u), f(v), · · · , d(e)/Q) ∈ R^{2l+4},   (7)

where, for a node v ∈ V, f(v) = 1 if v = v_0 and f(v) = 0 otherwise. Note that either arc direction of e can be used to generate µ_e, because edges, unlike arcs, are undirected. Therefore, the representation of Ē becomes

Ē_emb = (µ_ē1, µ_ē2, · · · , µ_ēT).   (8)

3) POINTER-NETWORK
In this work, a Ptr-Net with trainable parameters θ is employed to estimate p_θ(a_i^j | · · · , Ē_emb; θ) and p(S̄ | Ē_emb; θ) with respect to (3). Ptr-Net is an encoder-decoder model capable of processing sequential inputs for combinatorial optimization problems, whereas existing applications [15], [18], [19] only use it to process non-sequential data. As indicated in Section III-A, this work transforms CARP solving into a sequence-to-sequence prediction task, for which Ptr-Net is well suited. Additionally, it is worth noting that the purpose of this work is not to propose a new sequence-to-sequence learning model, but to use one to assist CARP solving and to verify that sequence-to-sequence learning can help speed up the solving. Many well-established models are available for the resulting task [32]-[34], and it is believed that the performance may be further improved with more advanced methods.
As illustrated in Fig. 1, for a CARP instance, the encoder network reads the elements of Ē_emb one by one and produces a hidden state sequence h = (h_ē1, h_ē2, · · · , h_ēT). For 1 < i ≤ T, h_ēi is the encoder hidden state corresponding to µ_ēi and is computed from µ_ēi and h_ē(i−1); for i = 1, h_ē0 is an initial hidden state. After all input elements are processed, the decoder begins generating outputs. At each decoding step t, the decoder reads the input element selected by its output at step t − 1 and produces a hidden state h_t.
A pointer attention mechanism receives h_t and h and generates a distribution over the elements of the input sequence Ē. The output of the decoder at step t is the index of an element, determined by this distribution.
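For reference, the pointer attention of the original Ptr-Net [15] scores each encoder state against the current decoder state and normalizes the scores with a softmax. Below is a minimal NumPy sketch; the weight names and shapes are illustrative.

```python
import numpy as np

def pointer_attention(enc_states, dec_state, W1, W2, v):
    """Ptr-Net attention sketch: score each encoder hidden state h_i
    against the decoder state h_t with u_i = v . tanh(W1 h_i + W2 h_t),
    then softmax the scores into a distribution over input positions."""
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ dec_state)
                       for h in enc_states])
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()
```

The resulting distribution "points" at input positions directly, which is what lets the decoder output indices of input elements rather than tokens from a fixed vocabulary.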

4) MASKING SCHEME AND POST-SORTING IN TESTING
As presented above, Ptr-Net selects an output element according to a distribution over the elements of Ē at each decoding step. That means there are on the order of O(T^T) output sequences to explore for the best result. To deal with this, a masking scheme similar to that of [19] is adopted at test time. Note that masking is only used in testing, because the best output element at each decoding step is known during training.
In each decoding step, two types of edges are masked: edges that have already been served, and edges whose demands exceed the vehicle's remaining capacity. Masked edges are not considered in this step, and the unmasked edge with the highest probability is output.
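A single decoding step with this masking scheme can be sketched as follows; this is a minimal Python illustration, and the helper name and data layout are assumptions.

```python
def masked_decode_step(probs, demands, served, remaining_capacity):
    """One decoding step with masking: edges already served and edges
    whose demand exceeds the vehicle's remaining capacity are masked;
    the unmasked edge with the highest probability is output.
    probs[i] is the Ptr-Net probability of edge i at this step."""
    candidates = [i for i in range(len(probs))
                  if i not in served and demands[i] <= remaining_capacity]
    if not candidates:
        return None  # no feasible edge: the vehicle must return to the depot
    return max(candidates, key=lambda i: probs[i])
```

Returning None when every edge is masked corresponds to closing the current route and starting a new one with full capacity.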
Post-sorting is adopted to further improve the quality of the solution. This is another application of the idea used in pre-sorting. Specifically, for each route in the solution, the order of tasks is first reorganized by the pre-sorting method described in Algorithm 1. Then, for each arc in the final solution, its direction is tuned by fixing its starting vertex as the endpoint closer to the previous edge in the sequence.

5) SUMMARY
Given a graph G(V, E, A), the proposed solver first extracts the task edges as the input E. Then the sequence Ē is generated by applying Algorithm 1 to sort the elements of E (step 1 in Fig. 1). As Ē cannot be processed by a DNN directly, graph embedding is employed to generate the numerical representation Ē_emb for Ē (step 2 in Fig. 1). Ē_emb is subsequently fed into the Ptr-Net, which generates the elements of the output sequence step by step (step 3 in Fig. 1). During generation, the masking scheme is used to filter out invalid choices (step 4 in Fig. 1). Finally, the output sequence is obtained by fine-tuning the order and direction of the arcs in the solution via post-sorting (step 5 in Fig. 1).

IV. EXPERIMENTS
In the following experiments, MAENS [7], a well-known CARP solver, is used as the heuristic method that DARS learns from. VRP-RL [19] is a neural solver for VRP trained by reinforcement learning, and is modified here to solve CARP. The modification simply uses (8) as the data representation and determines the directions of the output arcs with the method used in post-sorting.
The official C++ implementation of MAENS is used in the experiments, and the other algorithms are implemented in PyTorch, a machine learning library for Python that can utilize GPUs to accelerate DNNs. All experiments are run on a workstation with two Intel E5-2678 v3 CPUs @ 2.50 GHz and one NVIDIA GTX 1070 GPU.

A. DATASETS AND CONFIGURATIONS
The data sets used in previous CARP research are too small for training DNNs. Thus, similar to the work of [25], we synthesize four new CARP data sets, named BJ-30, BJ-50, BJ-100 and BJ-150, with around 30, 50, 100 and 150 edges, respectively. All four data sets are generated from the roadmap of Beijing, China (https://www.openstreetmap.org). They provide the sets of E for D defined in Section III-A.
The roadmap includes 409,650 nodes and 778,304 undirected edges. Each dataset is first generated by sampling a number of instances from this roadmap and then divided into two disjoint subsets with specified proportions: one is used as the training set and the other as the test set. More specifically, each instance is randomly sampled from the roadmap using the same sampling settings. That is, for each instance, the graph is extracted from the roadmap randomly, the depot is determined randomly, and the edges are randomly selected as tasks. Other features, such as the vehicle capacity and task demands, are also generated randomly within specific ranges. An illustration of sampling an instance is shown in Fig. 2. Note that instances that cannot be solved by MAENS are regenerated until each instance in the data set has a corresponding MAENS solution. In this work, each dataset is divided into a training set with 1 million instances and a test set with 10,000 instances. The information of the datasets is shown in Table 1.
The embedding size l for emb(·) is 4. The Ptr-Net in DARS has two GRU layers with 128 hidden units each. The Ptr-Net is trained with the Adam optimization algorithm [35] with an initial learning rate of 5 × 10^−4. The batch size is 256 and training lasts for 200 epochs. The weights of the Ptr-Net are initialized with orthogonal initialization [36], and the L2 norm of the gradients is clipped to 2.0 to promote generalization.
For convenience, the solutions generated by MAENS are called the reference solutions, denoted as S*, and the solutions generated by the proposed method are denoted as S′. For a solution S_i ∈ S′, S*_i ∈ S* is its corresponding reference solution. All the feasible solutions in S′ (i.e., those satisfying the constraints in Section II-B) form a set Ŝ. To evaluate the performance of different methods, the commonly used Mean Percentage Error (MPE) metric is employed, i.e.,

MPE = (1/|Ŝ|) Σ_{S_i ∈ Ŝ} [C(S_i) − C(S*_i)] / C(S*_i) × 100%,

where C(S) denotes the total cost of solution S. Note that the MPE defined here is meaningless for MAENS, since there are no reference solutions for MAENS. The results are shown in Table 2.
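The MPE computation over the feasible set can be sketched as follows; this is a minimal Python illustration, the helper name is hypothetical, and costs are per-solution totals.

```python
def mean_percentage_error(costs, ref_costs):
    """MPE over feasible solutions: the average relative gap (in %)
    between each solution's cost and its reference solution's cost.
    costs[i] and ref_costs[i] refer to the same instance."""
    gaps = [(c - r) / r for c, r in zip(costs, ref_costs)]
    return 100.0 * sum(gaps) / len(gaps)
```

A lower MPE means the solver's solutions are closer in cost to the MAENS reference solutions; 0% would mean it matches them exactly on every feasible instance.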

B. RESULTS AND DISCUSSIONS
1) SOLUTION QUALITY OF DARS
According to the results in Table 2, VRP-RL performs worse than DARS. Moreover, as it is trained on solutions from MAENS, DARS shows the best approximation to MAENS in terms of MPE. For DARS, the Ptr-Net is capable of learning the mapping from CARP instances to solutions, as shown by the good performance of DARS1 in Table 2. However, it may fail to find feasible solutions as the number of tasks increases, as also reported in [15]. The masking scheme is proposed to address this drawback and performs well.
The pre-sorting and post-sorting improve the MPE, i.e., yield better solutions. For pre-sorting, this can be verified by comparing DARS2 against DARS4, and DARS3 against DARS. For post-sorting, the comparisons between DARS2 and DARS3, and between DARS4 and DARS in Table 2 imply its effectiveness.
Therefore, some conclusions can be derived from the above discussion: the masking scheme is effective at filtering out infeasible solutions; the Ptr-Net is capable of learning the mapping between CARP instances and their corresponding solutions; and both pre-sorting and post-sorting improve the solutions. These conclusions are consistent with the motivations described in Section III. Additionally, the results imply the generalization ability of DARS, as it performs well on test sets whose CARP instances are unseen during training.

2) SUPERVISED AND REINFORCEMENT LEARNING
According to Table 2, except for DARS1, which may fail to find feasible solutions, all the other variants of DARS show better solution quality than VRP-RL. This is intuitive, as heuristic methods are the state-of-the-art inexact methods. Thus, DARS, which makes use of the experience of heuristic methods (i.e., the solutions generated by MAENS in the training set), is more likely to perform better than VRP-RL, which learns from scratch. Nevertheless, reinforcement learning has the potential to outperform heuristic methods, provided it can explore the solution space more effectively given the characteristics of combinatorial optimization tasks.

3) SPEED COMPARISON
The average computational time is shown in Table 3, which clearly shows the advantage of neural solvers over MAENS, especially on GPU platforms. On CPUs, VRP-RL is close to DARS in running time but worse in solution quality according to Table 2. On GPUs, both VRP-RL and DARS show significant speedups. Because pre-sorting and post-sorting are difficult to implement on GPUs, it is believed that the performance of DARS on GPUs can be further improved; faster sorting methods could be a direction for future work. It should be noted that the standard deviations of MAENS are extremely large in Table 3. This is because MAENS conducts a randomized iterative search whose running time varies across instances, while neural solvers directly map instances to solutions and thus have stable running times. This indicates that the time cost of heuristic methods on particular instances is hard to predict, while that of neural solvers is not. Specifically, we show the speedups of DARS over MAENS in Fig. 3. The speedup is calculated, for each instance, as the running time of MAENS divided by the running time of DARS, and averaged over the test set. The standard deviations of the speedups are large due to the unstable running time of MAENS, as discussed above.

TABLE 1. Information about datasets. |E| is the number of edges, T is the number of tasks, Q is the capacity of each vehicle, and demand is the demand of each task edge. A value 'a − b' indicates that the column value for each CARP instance is randomly determined within the range from a to b.
The largest speedups for each dataset (i.e., the extreme cases) on GPUs are 378 (BJ-30), 131 (BJ-50), 194 (BJ-100) and 357 (BJ-150). According to Table 3 and Fig. 3, DARS achieves the best approximation to MAENS and is significantly faster on GPUs. Note that some parts of DARS are implemented in Python, which is much slower than the C++ used by MAENS.

4) COMPARED WITH MAENS WITH DIFFERENT SETTINGS
The running time of MAENS is mainly controlled by two hyper-parameters, (ubtrial, max_iteration), each of which governs an independent while loop. The setting of (ubtrial, max_iteration) used in the previous experiments is (20, 50). MAENS variants with different running times are obtained by varying (ubtrial, max_iteration). Another setting is examined in this experiment, and the results are shown in Table 4, together with the performance of DARS. This experiment is run on the BJ-100 dataset.
As shown in Table 4, the solution quality of DARS is better than that of MAENS with unsuitable configurations. As hyper-parameter tuning for heuristic algorithms is challenging, the results in Table 4 indicate that DARS achieves a good trade-off between speed and solution quality.

V. CONCLUSION AND FUTURE WORK
This paper proposes a novel approach called DARS to speed up CARP solving in a deep learning style. By utilizing solutions generated by a heuristic method as supervision to train a DNN, the proposed approach has three benefits. First, it recasts CARP solving as a direct functional mapping instead of an iterative search process.
Second, the supervised training can effectively utilize the experience of heuristic methods that have been developed for decades. Third, the new CARP solver can be easily accelerated, since the acceleration of DNNs has developed rapidly in recent years.
Compared with the heuristic algorithm, the proposed method approximates its solutions well and has a great advantage in speed, especially with the help of specific hardware such as GPUs; in extreme cases, DARS is up to 357 times faster than the compared heuristic method. Overall, DARS presents competitive results. Nevertheless, DARS also has limitations. One is that using the experience of existing heuristics achieves a speedup in problem solving but also limits the solution quality to some extent; incorporating methods to automatically update the experience may help further improve performance. The other is that the four steps adopted by the algorithm provide the flexibility to combine existing state-of-the-art methods, but also leave room for improvement by integrating them more synthetically and comprehensively. Besides, two directions are worthy of future study to further improve performance: one is to incorporate more advanced learning [37]-[39] and optimization mechanisms [40]-[42] into DARS, and the other is to investigate GPU implementations [43] to further improve its efficiency.