Learned upper bounds for the Time-Dependent Travelling Salesman Problem

Given a graph whose arc traversal times vary over time, the Time-Dependent Travelling Salesman Problem consists in finding a Hamiltonian tour of least total duration covering the vertices of the graph. The main goal of this work is to define tight upper bounds for this problem by reusing the information gained when solving instances with similar features. This is customary in distribution management, where vehicle routes have to be generated over and over again with similar input data. To this aim, we devise an upper bounding technique based on the solution of a classical (and simpler) time-independent Asymmetric Travelling Salesman Problem, where the constant arc costs are suitably defined by the combined use of a Linear Program and a mix of unsupervised and supervised Machine Learning techniques. The effectiveness of this approach has been assessed through a computational campaign on the real travel time functions of two European cities: Paris and London. The overall average gap between our heuristic and the best-known solutions is about 0.001%. For 31 instances, new best solutions have been obtained.


Introduction
The purpose of this article is to present a Machine Learning (ML) enhanced upper bound for the Time-Dependent Travelling Salesman Problem (TDTSP), defined as follows. Let G := (V ∪ {0}, A, τ) denote a time-dependent directed complete graph, where V = {1, . . ., n} is the set of customers, vertex 0 is the depot and A := {(i, j) : i, j ∈ V, i ≠ j} ∪ {(0, i) : i ∈ V} ∪ {(i, 0) : i ∈ V} is the set of arcs. With each arc (i, j) ∈ A is associated a travel time function τ_ij(t), representing the travel time of (i, j) if the vehicle leaves node i at time t. The TDTSP amounts to determining a least duration tour visiting each customer once, with the vehicle leaving the depot at time 0.
In recent years there has been a flourishing of scholarly works in time-dependent routing. See [12] for a review of the field. The contribution [22] was the first to address the TDTSP and proposed a Mixed Integer Programming (MIP) formulation. An approximate dynamic programming algorithm was devised in [23], while two heuristics were developed in [20]. A simulated annealing heuristic was proposed in [29] and some metaheuristics were proposed in [17]. Cordeau et al. [10] derived some properties of the TDTSP as well as lower and upper bounding procedures. They also formulated the TDTSP as a MIP model for which they developed some families of valid inequalities. These inequalities were then used in a branch-and-cut algorithm that solved instances with up to 40 vertices. Arigliano et al. [6] exploited some properties of the problem and developed a branch-and-bound algorithm which outperformed the Cordeau et al. [10] branch-and-cut procedure. In [24] a new global constraint was presented and used in a Constraint Programming approach. This algorithm was able to solve instances with up to 30 customers. Recently, Adamo et al. [2] proposed a parameterized family of lower bounds, where the parameters are chosen by fitting the traffic data. When embedded into a branch-and-bound procedure, their lower bounding mechanism allows solving to optimality a larger number of instances than Arigliano et al. [6]. Variants of the TDTSP have been examined in [5], [7], [27] and [33] (TDTSP with Time Windows), in [18] (Moving-Target TSP) and in [26] (Robust TSP with Interval Data). Finally, it is worth noting that a scheduling problem, other than the above defined TDTSP, is also known as the Time-Dependent TSP. It amounts to sequencing a set of jobs on a single machine in which the processing times depend on the position of the jobs within the schedule ([28], [11], [16], [32], [25], [30], [14], [1]).
In this paper, we propose an upper bounding technique inspired by the new findings of the recent paper [3], where the authors studied a property of time-dependent graphs, dubbed path ranking invariance.
A time-dependent graph is path ranking invariant if the ordering of its paths (w.r.t. travel time) is independent of the start time. The authors showed that, if a graph is path ranking invariant, the solution of a large class of time-dependent vehicle routing problems, including the TDTSP, can be determined by solving suitably defined (and simpler) time-independent routing problems. They also demonstrated that the path ranking invariance property can be checked by solving a (large) Linear Programming (LP) problem. If the path ranking invariance check fails, they proved that a tight lower bound can be derived from the obtained LP solution.
In this paper, we show how the findings of [3] can be further generalized to determine tight upper bounds for the TDTSP. The main idea is to determine a heuristic solution by solving the TDTSP on an auxiliary time-dependent graph which satisfies the path ranking invariance property. The travel time functions of the auxiliary graph are determined by generalizing the LP-based approach proposed in [3]. In order to obtain a fast computation of the auxiliary travel time functions, we take advantage of the predictive capabilities of a supervised ML technique. Indeed, the ultimate goal is the fast computation of tight upper bounds in those settings in which instances with similar features have to be solved over and over again, as is customary in distribution management. As stated in [8], a company does not care about solving all possible TSPs, but only theirs. Therefore, instead of starting every time from scratch in the definition of the auxiliary graph, we insert a learning mechanism in such a way that the upper bounding procedure can benefit from previous runs on other instances with similar features. To this aim, we boost our LP-based approach with a mix of supervised and unsupervised techniques in the spirit of [13]. To the best of our knowledge, contribution [13] is the first attempt to use ML to solve a time-dependent routing problem. For a comparative analysis of machine learning heuristics for solving the classical (time-invariant) Travelling Salesman Problem, see [31].
The paper is organized as follows. In Section 2 we provide a problem definition and some background information on the study area. In Section 3 we introduce a parameterized family of upper bounds computed by solving the TDTSP on suitably defined auxiliary time-dependent graphs. Such a family of upper bounds gives rise to an optimization problem aiming to determine the parameter providing the best (minimum) upper bounds. In Section 4 we propose a ML-based heuristic approach for solving such an optimization problem. In Section 5 we describe computational experiments on the graphs of two European cities (London and Paris). Finally, we draw some conclusions in Section 6.

Problem definition and backgrounds
Let [0, T] denote the time interval associated to a single working day. Without loss of generality, we suppose that the travel time functions are constant in the long run, that is, τ_ij(t) := τ_ij(T) for t ≥ T.
For the sake of notational convenience, we also use τ(i, j, t) to designate τ_ij(t). We suppose that each traversal time τ_ij(t) satisfies the first-in-first-out (FIFO) property, i.e., a later departure from vertex i cannot result in an earlier arrival at vertex j.
For any given path p_k := (i_0, i_1, . . ., i_k), the corresponding duration z(p_k, t) can be computed recursively as:

z(p_k, t) = z(p_{k−1}, t) + τ(i_{k−1}, i_k, t + z(p_{k−1}, t)),

with the initialization z(p_0, t) := 0. Therefore, a compact formulation of the TDTSP is: min_{p∈P} z(p, 0),
where P denotes the set of Hamiltonian tours on the time-dependent graph G := (V ∪ {0}, A, τ). Algorithms developed for the classical time-invariant TSP are not able to handle time-varying travel times without essential structural modifications. Nevertheless, we observe that, in the absence of time constraints, time-varying travel times have an impact on the ranking of the solutions of the TDTSP, but they do not pose any difficulty for the feasibility check of solutions. A quite natural way of defining a heuristic solution approach is then to determine the optimal solution of a classical Asymmetric TSP (ATSP), defined on a time-invariant graph G_c = (V ∪ {0}, A, c), where c : A → R+ is a time-invariant (dummy) cost function. The main issue in this approach is how to determine a time-invariant (dummy) cost function that mimics in an effective manner the solution ranking of the original TDTSP. In this respect, it can be proved that there always exists a time-invariant (dummy) cost function such that a least duration route of the TDTSP is also a least cost solution of the ATSP defined on the time-invariant graph G_c, which motivates Definition 1 below. In particular, if the time-dependent graph G is path ranking invariant, then a valid cost function is c(i, j) = τ_ij(T).
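As an illustration, the recursive duration computation above can be sketched in Python (a sketch under the assumption that the travel time function τ is available as a callable; all names are illustrative):

```python
def tour_duration(tour, tau, t0=0.0):
    """Duration z(p, t0) of the path `tour` = (i_0, i_1, ..., i_k) departing at t0,
    where tau(i, j, t) returns the travel time of arc (i, j) when leaving i at time t."""
    t = t0
    for i, j in zip(tour, tour[1:]):
        t += tau(i, j, t)  # arrival time at j becomes the next departure time
    return t - t0
```

For a Hamiltonian tour p = (0, i_1, . . ., i_n, 0) and t0 = 0, the returned value is z(p, 0), the quantity minimized by the TDTSP.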

The auxiliary graph
The proposed heuristic algorithm is based on the definition of an auxiliary path ranking invariant graph Ḡ = (V ∪ {0}, A, τ̄), where each τ̄_ij(t) is an approximation of τ_ij(t), with (i, j) ∈ A. Each continuous piecewise linear function τ̄_ij(t) is generated by the travel time model proposed in [19] (IGP model for short), in which each arc (i, j) ∈ A is characterized by a constant stepwise speed function v_ij(t) and a length L_ij. We suppose that the horizon is partitioned into H subintervals [T_h, T_{h+1}] (h = 0, . . ., H − 1), with T_0 = 0 and T_H = T. We assume that all arcs of the auxiliary graph Ḡ share a common speed function, that is, v_ij(t) := v(t) for each (i, j) ∈ A and t ∈ [0, T]. According to the IGP model, given a start time t, the travel time value τ̄_ij(t) is computed by an iterative procedure that advances across the time periods crossed while traversing the arc.
In the IGP model the speed of a vehicle is not constant over the entire length of arc (i, j) ∈ A, but it changes when the boundary between two consecutive time periods is crossed. Since the travel speed is a constant stepwise function, the relationship between the input parameters and the output value of the IGP model can be expressed in a compact fashion as follows:

L_ij = ∫_t^{t+τ̄_ij(t)} v(μ) dμ.   (2)

We denote with z̄(p_k, t) the traversal time of a path p_k at time instant t on the time-dependent graph Ḡ, that is

z̄(p_k, t) = z̄(p_{k−1}, t) + τ̄(i_{k−1}, i_k, t + z̄(p_{k−1}, t)),

with the initialization z̄(p_0, t) = 0.
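The iterative IGP computation can be sketched as follows (assuming the common speed profile is given by breakpoints T = [T_0, . . ., T_H] and speeds v = [v_0, . . ., v_{H−1}], with the speed held constant beyond T_H; names are illustrative):

```python
def igp_travel_time(L, t, T, v):
    """Travel time of an arc of length L departing at time t under the IGP model:
    piecewise constant speed v[h] on [T[h], T[h+1]), constant beyond the horizon."""
    H = len(v)
    h = 0
    while h < H - 1 and t >= T[h + 1]:
        h += 1  # locate the period containing the departure time
    arrival, remaining = t, L
    while remaining > 0:
        end = T[h + 1] if h < H - 1 else float("inf")  # last period extends forever
        if arrival + remaining / v[h] <= end:
            arrival += remaining / v[h]  # the arc is completed within this period
            remaining = 0.0
        else:
            remaining -= v[h] * (end - arrival)  # drive up to the period boundary
            arrival = end
            h += 1
    return arrival - t
```

For example, with L_ij = 10, breakpoints [0, 5, 10] and speeds [1, 2], a departure at t = 0 covers 5 length units in the first period and the remaining 5 at speed 2, for a travel time of 7.5.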

Proposition 1. (Adamo et al. [3]) The time-dependent graph Ḡ is path ranking invariant.
Proof. We observe that from (2) it follows that, given a path p, we have:

∫_t^{t+z̄(p,t)} v(μ) dμ = Σ_{(i,j)∈p} L_ij,

where the notation (i, j) ∈ p means that the arc (i, j) ∈ A is traversed by the path p. This implies that if a path p′ is shorter than a path p″, then p′ is also quicker than p″ for any start time t ∈ [0, T]:

Σ_{(i,j)∈p′} L_ij ≤ Σ_{(i,j)∈p″} L_ij  ⟹  z̄(p′, t) ≤ z̄(p″, t),  t ∈ [0, T],

which proves the thesis.
The main implication of Proposition 1 is that an upper bound on the TDTSP defined on the original graph G can be obtained by solving a classical time-invariant ATSP with cost coefficients c(i, j) = τ̄_ij(T).
Clearly, the quality of the obtained upper bound is correlated with the fitting deviation between the original travel time function τ and its approximation τ̄. Minimizing such fitting deviation is the main idea underlying the family of parameterized upper bounds presented in the following section.

A family of parameterized upper bounds
In this section we define a family of parameterized upper bounds z_Ω, where the parameter Ω is an ordered set of time instants. Given the set Ω, the upper bound z_Ω is determined by solving the TDTSP on an auxiliary path ranking invariant graph Ḡ_Ω = (V ∪ {0}, A, τ̄_Ω). The travel time function τ̄_Ω is an approximation of the original travel time function τ. In particular, τ̄_Ω is generated by the IGP model and satisfies relationship (2). We recall that the IGP parameters are: the set of speed breakpoints, the speed values and the lengths of the arcs. We make use of the given upper bound parameter Ω to model the set of IGP speed breakpoints, i.e. Ω = {T_0, . . ., T_H}, with H = |Ω| − 1. Then the speed values and the arc lengths are prescribed by a linear program, which aims to minimize the fitting deviation between the original τ and its parameterized approximation τ̄_Ω. The main idea underlying the linear program is that equalities (2) imply that the travel time functions τ and τ̄_Ω are a perfect fit if the following relationship holds for each arc (i, j) ∈ A and time instant t ∈ [0, T]:

L_ij = ∫_t^{t+τ(i,j,t)} v(μ) dμ.   (4)

The objective function aims to minimize a fitting deviation given by the violations of equality constraints (4). Due to the continuous time nature of (4), we define a surrogate of the fitting deviation by evaluating (4) only for time instants belonging to a set Ω_ij, that is:

L_ij = ∫_{t_k}^{t_k+τ(i,j,t_k)} v(μ) dμ,  t_k ∈ Ω_ij,   (5)

with k = 0, . . ., |Ω_ij| − 1 and (i, j) ∈ A. The set Ω is defined as the union of the sets Ω_ij, with (i, j) ∈ A, i.e. Ω = ∪_{(i,j)∈A} Ω_ij.
Let us define the coefficient a_ijkh as the length of the intersection between the traversal interval [t_k, t_k + τ(i, j, t_k)] and the h-th period [T_h, T_{h+1}]. Since v(t) is constant stepwise, relationship (5) can be expressed by the following linear equality:

Σ_{h=0}^{H−1} a_ijkh y_h = x_ijk,   (6)

where the free-sign variable s_ijk := x_ijk − L_ij models the violation of the right-hand-side of (5) with respect to L_ij. The proposed linear program determines a speed function v(t) (through the variables y_h) and the corresponding right-hand-sides of (6), which we denote with x_ijk: since it represents a length, we require that x_ijk ≥ 0, with (i, j) ∈ A and k = 0, . . ., |Ω_ij| − 1. We model the maximum fitting deviation between the original travel time function τ(i, j, t) and τ̄_Ω(i, j, t) through the quantity ζ_ij := x̄_ij − x_ij, where x_ij and x̄_ij denote the minimum and maximum value of the variables x_ijk. Then ζ_Ω := Σ_{(i,j)∈A} ζ_ij represents an approximated measure of the total fitting deviation associated to the auxiliary graph Ḡ_Ω. We determine the auxiliary graph Ḡ_Ω so that the corresponding travel time function τ̄_Ω minimizes the value of ζ_Ω. To this aim, we formulate the linear program (7)–(14), where the continuous variable y_h represents the value of y(t) during the h-th time interval, that is y(t) = y_h for t ∈ [T_h, T_{h+1}), with h = 0, . . ., H − 1.
Objective function (7) aims to determine a step function y*(t) that minimizes the total maximum fitting deviation between the original travel time function τ and its approximation τ̄_Ω. Constraints (8) state the relationship between y(t) and the x variables. Constraints (9) and (10) model the relationship between x_ij, x̄_ij and the continuous variables x_ijk. Constraints (11), (12), (13) and (14) describe the non-negativity conditions on the decision variables. In particular, in order to cut off the trivial (pointless) solution y(t) = 0 for t ≥ 0, constraints (14) state that the constant stepwise function y(t) has to be greater than or equal to the input parameter ρ > 0.
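A sketch of the coefficient computation, under the assumption (ours, for illustration) that a_ijkh is the length of the intersection between the traversal interval [t_k, t_k + τ(i, j, t_k)] and the h-th period:

```python
def overlap_coefficients(t_k, tau_k, T):
    """a[h] = |[t_k, t_k + tau_k] ∩ [T[h], T[h+1]]| for h = 0, ..., H-1, so that
    sum(a[h] * y[h]) measures the distance covered during [t_k, t_k + tau_k]."""
    a = []
    for h in range(len(T) - 1):
        lo, hi = max(t_k, T[h]), min(t_k + tau_k, T[h + 1])
        a.append(max(0.0, hi - lo))  # empty intersection contributes 0
    return a
```

Note that the coefficients of one row of constraints (8) sum to τ(i, j, t_k), the length of the traversal interval, whenever that interval lies within the horizon.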
Let y*(t) and x* denote, respectively, the step function and the x values associated with the optimal solution of the linear program (7)–(14). Moreover, we denote with x̄*_ij the average of the x values associated to arc (i, j) ∈ A in the optimal solution, that is:

x̄*_ij := (1/|Ω_ij|) Σ_{k=0}^{|Ω_ij|−1} x*_ijk.

We observe that the linear program does not directly prescribe the IGP parameter L_ij, with (i, j) ∈ A.
Indeed, according to (6), we have that x*_ijk = L_ij + s_ijk, where, we recall, s_ijk quantifies the violation of equality (5), with (i, j) ∈ A and k = 0, . . ., |Ω_ij| − 1. Since L_ij denotes the IGP length associated with τ̄_Ω, from (6) we have that the lower the absolute value of the violation of equality (5) (i.e. |s_ijk|), the lower the absolute error made by approximating τ(i, j, t_k) with τ̄_Ω(i, j, t_k), with t_k ∈ Ω_ij and (i, j) ∈ A. Since x̄*_ij minimizes the mean squared violation of equality (5), i.e.

x̄*_ij = arg min_x Σ_{k=0}^{|Ω_ij|−1} (x*_ijk − x)²,

we (heuristically) minimize such travel time approximation errors by generating the travel time function τ̄_Ω(i, j, t) with the following IGP input parameters: speed function v(t) := y*(t) and arc lengths L_ij := x̄*_ij, with (i, j) ∈ A. Finally, we recall that the travel time function τ̄_Ω(i, j, t) satisfies relationship (2) and, therefore, the auxiliary graph is path ranking invariant. Summing up, given a set of time instants Ω = ∪_{(i,j)∈A} Ω_ij and a time-dependent graph G, the proposed upper bounding procedure consists of three main steps.
• STEP 1. Determine the travel time function τ̄_Ω (i.e. the auxiliary graph Ḡ_Ω) by solving the linear program (7)–(14).
• STEP 2. Determine solution p*_Ω as the least cost solution of the time-independent ATSP with arc costs τ̄_Ω(i, j, T), that is p*_Ω = arg min_{p∈P} Σ_{(i,j)∈p} τ̄_Ω(i, j, T).
• STEP 3. Compute upper bound z_Ω by evaluating p*_Ω w.r.t. the original travel time function τ, that is z_Ω := z(p*_Ω, 0).
We finally observe that, in order to find the least upper bound, the following optimization problem has to be solved: min_Ω z_Ω.
Remark 1. This optimization problem admits an optimal solution Ω* such that, if a time instant t belongs to Ω*, then there exists on G a feasible tour p ∈ P with t corresponding to the arrival time at a node i ∈ V. That such a set Ω* exists is based on the observation that there is a finite number of feasible tours.
The main limit of the sufficient optimality condition stated in Remark 1 is that determining the entire set Ω* is computationally challenging. To overcome this drawback, we take advantage of the predictive capabilities of supervised ML techniques, in order to determine a set Ω such that the arrival times associated to optimal solutions have a good chance of being included in Ω. We denote with f_i a prediction (obtained through a supervised ML method) of the expected time of arrival (ETA) at customer i in an optimal solution. We observe that the ranking among arcs might change deeply during the planning horizon on the original graph G. On the other hand, the path ranking invariance of the auxiliary graph Ḡ_Ω also implies an arc ranking invariance. The intuition is that, by taking a snapshot around the optimal arrival times (of similar instances previously solved), we have a good chance of embedding in the auxiliary graph Ḡ_Ω the arc ranking associated to the set of quickest tours of the original graph. For this purpose, we require that the maximum fitting deviation between the original travel time function τ(i, j, t) and τ̄_Ω(i, j, t) is minimized, for each arc (i, j) ∈ A, in the time interval [f_i − ǫ_i, f_i + ǫ_i], where ǫ_i > 0 represents the mean absolute error associated to f_i, with i ∈ V.
In particular, we first define a discretization D of the time horizon. Then, for each node i, we select the subset S_i of D as follows:

S_i := {t ∈ D : f_i − ǫ_i ≤ t ≤ f_i + ǫ_i}.

In the definition of the approximation τ̄_Ω, all arcs (i, j) ∈ A outgoing from node i ∈ V share a common set Ω_ij corresponding to the set S_i, i.e. Ω_ij = S_i. Therefore, in the MLPL-HTSP heuristic, the travel time function τ̄_Ω is determined by solving the linear program (7)–(14), where the role of Ω_ij is played by the subset S_i in constraints (8)–(11), with i = 1, . . ., n.
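The construction of the sets S_i can be sketched as follows (f and eps hold the ETA predictions f_i and the mean absolute errors ǫ_i; names are illustrative):

```python
def select_snapshots(D, f, eps):
    """S_i = instants of the discretization D within [f_i - eps_i, f_i + eps_i]."""
    return {i: [t for t in D if f[i] - eps[i] <= t <= f[i] + eps[i]] for i in f}
```

With a 5-minute discretization D (the time unit used in Section 5), each customer contributes a short window of candidate breakpoints centered around its predicted arrival time.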

ETA estimation
In order to estimate the ETA of a customer i in an optimal solution, an artificial neural network (ANN) is used in conjunction with an exact algorithm for the TDTSP [6]. The chosen ANN is a Multilayer Perceptron Regressor (MPR) [4], consisting of at least three layers of nodes: an input layer, one or more hidden layers and an output layer. Except for the input nodes, each node uses a nonlinear activation function. Firstly, customers are aggregated and the service territory is divided into a number K of zones.
The customer aggregation is an unsupervised learning technique that aims to partition the customers of the training set into K clusters of equal variance, where the sum of intra-cluster Euclidean distances is minimized. In our experimentation, we used the K-means algorithm [21]. For each training instance, an average zone ETA, named ZETA_k, is determined, with k = 1, . . ., K. In particular, the arrival times at the customers are computed by the exact algorithm of [6]. The neural network has K inputs and K outputs: the inputs are the numbers n_k of customers in each of the K zones (i.e. the customer distribution over the territory); the outputs are the K ZETA_k estimates (k = 1, . . ., K). It is worth noting that, if K is large, the predictions are expected to be more accurate, but the training phase would require a huge number of instances. On the other hand, a small value of K implies a large variability of the ETA inside a zone, which has a disadvantageous effect on the ETA estimation of individual customers. The optimal number of zones K was determined in a preliminary experimentation.
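The aggregation-and-regression pipeline can be sketched with the scikit-learn classes mentioned in Section 5. All data below are synthetic and the hyperparameters are illustrative, not the tuned settings of the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
K = 4  # number of zones (tuned per city in the paper; 4 is illustrative)

# Unsupervised step: partition the training customers into K zones via K-means.
coords = rng.uniform(0.0, 10.0, size=(500, 2))
zones = KMeans(n_clusters=K, n_init=10, random_state=0).fit(coords)

# Supervised step: map per-zone customer counts n_k to average zone ETAs ZETA_k.
# Targets here are synthetic; in the paper they come from exactly solved instances.
counts = rng.integers(1, 20, size=(200, K)).astype(float)
zeta = counts @ rng.uniform(1.0, 3.0, size=(K, K)) + rng.normal(0.0, 0.1, (200, K))

eta_model = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                         solver="lbfgs", learning_rate="constant",
                         max_iter=5000, random_state=0).fit(counts, zeta)

pred = eta_model.predict(counts[:1])  # K ZETA_k estimates for one instance
```

At prediction time, a customer inherits the ZETA_k estimate of the zone it is assigned to, which then plays the role of f_i in the definition of S_i.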

Computational Experiments
The quality of the proposed upper bounding procedure was empirically assessed through a computational campaign. The machine learning component of the MLPL-HTSP algorithm was implemented in Python (version 3.6). The Multilayer Perceptron Regressor implementation was taken from the sklearn neural_network library (class MLPRegressor), while the K-means implementation came from the sklearn cluster library (class KMeans). The training instances were solved to optimality (or near-optimality) using a Java implementation of the branch-and-bound scheme proposed in [6], enhanced with the lower bound proposed in [3]. A time limit of one hour was imposed. The linear program (7)–(14) was solved with IBM ILOG CPLEX 12.10. The instances of the Asymmetric TSP were solved by means of the solver described in [9].
All the codes were tested on a Linux machine clocked at 2.67 GHz and equipped with 8 GB of RAM. We considered the instances generated by Adamo et al. [13], based on the real travel time functions of two major European cities: Paris and London.

Parameter tuning
In a preliminary tuning phase we selected the most appropriate combination of parameters. Our datasets contained approximately 600–700 instances with 50 customers each: 90% was assigned to the training set, while the remaining 10% formed the test set. The neural network settings providing the best results, in terms of strength of captured relationships, were: three layers, hyperbolic tangent activation function, five neurons in the hidden layer, L-BFGS solver and constant learning rate. As far as customer aggregation is concerned, Table 1 and Table 2 report the neural network mean errors (in minutes) for each zone. For London, 8 clusters gave the best results in terms of coefficient of determination (R²), whilst for Paris 6 zones were the best case for neural network performance. It is worth noting that the R² scores (0.53 for the London instances and 0.60 for the Paris instances) suggest a moderate effect size. We set the parameter ǫ_i equal to the mean absolute error of the zone to which customer i ∈ V belongs. We considered a 5-minute time unit for the discretization D of the planning horizon. Finally, we set ρ equal to 1/min_{h=0,...,|Ω|−1}

Computational results
As illustrated in the previous section, the predictive capabilities of the ML techniques have been exploited for the fast computation of two Ω sets, associated to London and Paris respectively. Then the two test sets were solved by the MLPL-HTSP algorithm. The computational results are presented in Table 5 and Table 6, under the following headings:
• the name of the test instance;
• the objective value BK (in minutes) of the best-known solution determined by the exact algorithm proposed in [6], enhanced with the lower bound proposed in [3], with a time limit of 1 hour;
• the objective value z_Ω (in minutes) of the MLPL-HTSP solution;
• the percentage deviation DEV of z_Ω with respect to BK, computed as DEV = 100 × (z_Ω − BK)/BK;
• the time Time (in seconds) spent to determine z_Ω.
If z_Ω is a new best-known solution, it is indicated in bold. The average running time is 18.28 seconds for the London instances and 12.46 seconds for the Paris instances. The average percentage deviation between the MLPL-HTSP result and the best-known solution is 0.23% for the London instances and −0.18% for the Paris instances. In the worst case the percentage deviation is 2.15%, and in 31 cases a new best-known solution is obtained. For 38 instances, the MLPL-HTSP heuristic also obtains the best-known solution, whilst for 100 out of 140 instances the absolute value |BK − z_Ω| is less than or equal to 1 minute, which is the smallest time unit normally considered in real vehicle routing problems inside large cities.
We have also examined the impact of both the linear program (7)–(14) and the machine learning algorithm. For this purpose we have implemented a baseline heuristic, HTSP, where the auxiliary graph Ḡ is time-independent, with the constant cost associated to each arc (i, j) ∈ A set equal to max_{t∈[0,T]} τ_ij(t). We also consider PL-HTSP, the variant of our approach in which the auxiliary graph is built by the linear program (7)–(14) without the ML-based selection of Ω. Table 3 and Table 4 report the results for all three heuristics; column headings are self-explanatory.
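The baseline costs of HTSP can be obtained by sampling each travel time function over the horizon (a sketch, assuming τ is available as a callable; the sampling step is illustrative):

```python
def baseline_costs(arcs, tau, T_end, step=5.0):
    """HTSP constant costs: c[(i, j)] = max of tau(i, j, t) over sampled t in [0, T_end]."""
    ts = [k * step for k in range(int(T_end / step) + 1)]
    return {(i, j): max(tau(i, j, t) for t in ts) for (i, j) in arcs}
```

Using the maximum travel time makes each constant cost an upper estimate of the arc's duration at any departure time, at the price of ignoring when along the horizon the arc is actually traversed.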
Results associated to PL-HTSP highlight that the computation of the approximation τ̄_Ω provides a remarkable increase of the solution quality, at the price of a higher computing time, w.r.t. the baseline heuristic HTSP. By leveraging machine learning, the MLPL-HTSP heuristic obtains both a solution quality improvement and a reduction (by an order of magnitude) of the computing time w.r.t. the PL-HTSP heuristic. Moreover, we observe that the MLPL-HTSP heuristic provides remarkable improvements in terms of both the worst case and the best case, i.e. the maximum and minimum values of DEV in Table 3.
As far as the computing time is concerned, Table 4 shows that MLPL-HTSP represents a good trade-off between the baseline algorithm and PL-HTSP. Indeed, the maximum computing time of MLPL-HTSP is remarkably lower than the minimum time of PL-HTSP, whilst the minimum computing time of MLPL-HTSP is only a few seconds above the maximum time of HTSP.
These results clearly illustrate that high quality results are obtained by the MLPL-HTSP algorithm for instances that correspond to realistic travel time functions.

Conclusions
The main contribution of this paper is an algorithm that learns from past data to solve the TDTSP in an efficient and effective manner. Computational results on two European cities show that the average gap with respect to the best-known solutions is only 0.001% and the average computation time is 15 seconds. Furthermore, new best solutions have been produced for several test instances. This is achieved by solving a time-invariant Asymmetric TSP, where the (constant) arc costs are suitably defined by the combined use of an LP-based approach and a mix of unsupervised and supervised Machine Learning techniques. In particular, we make use of the ETA predictions provided by a feedforward neural network trained on past instances solved to optimality or near-optimality. With regard to future research, we want to investigate the definition of new features for the neural network, as well as the use of deep learning methods [15].
Another noteworthy research goal concerns the study of a more efficient algorithm for (approximately) minimizing the fitting deviation between the travel time function τ and its approximation τ̄_Ω. Finally, future research could focus on the adaptation of the ideas introduced in this paper to other routing problems.

Definition 1 (Valid cost function). A time-invariant cost function c : A → R+ is valid for the TDTSP defined on G = (V ∪ {0}, A, τ) if a least duration solution p* ∈ arg min_{p∈P} z(p, 0) corresponds to a least cost solution of the time-invariant ATSP defined on G_c = (V ∪ {0}, A, c), that is:

arg min_{p∈P} z(p, 0) = arg min_{p∈P} Σ_{(i,j)∈p} c(i, j).

(7)–(14), where x_ij and x̄_ij model, respectively, the minimum and maximum value of the variables x_ijk, with (i, j) ∈ A and k = 0, . . ., |Ω_ij| − 1. A solution of such a linear programming model represents the parameters of a constant piecewise function y(t) and constant values x_ijk, with k = 0, . . ., |Ω_ij| − 1 and (i, j) ∈ A.

Table 1 :
Mean errors in the London instances

Table 2 :
Mean errors in the Paris instances

Table 3 :
Impact of the approximation τ̄_Ω and of the machine learning algorithm on solution quality

Table 4 :
Impact of the approximation τ̄_Ω and of the machine learning algorithm on computing time

Table 5 :
Computational results of MLPL-HTSP for the London testset

Table 6 :
Computational results of MLPL-HTSP for the Paris testset