1. Introduction
The performance of a neural network relies on the depth of a model to learn the structural correlations of the features [1]. Correspondingly, the performance of graph neural networks (GNN), heavily relies on the ability to train deep networks. Nevertheless, due to the possible reasons of over fitting, vanishing gradient and over squashing, GNNs suffer from decreasing performance as the number of layers increases which restricts the amount of information propagation between nodes [2]. On the other hand, providing optimal solutions to combinatorial optimization problems such as Capacitated Vehicle Routing Problems (CVRP) plays an important role in intelligent transportation system [3]. Classical solutions of CVRP fall into categories of heuristic [4], approximate [5], and exact approaches. Heuristic algorithms are known to have good computation performance though they have the drawback of lacking a theoretical guarantee. Moreover, they also require frequent customization and domain-specific knowledge which makes them unable to support large-scale optimization tasks. Exact approaches can provide optimal solutions although they are inherently computation-intensive which makes them unsuitable to solve large-scale problems. On the other hand, approximate algorithms can usually obtain quality-guaranteed solutions, yet they can only offer weaker optimality warrants compared to exact algorithms [6].