I. Introduction
Combinatorial optimization problems [1], [2], such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP), arise frequently in a wide range of real-world applications and research fields [3]. Developing highly efficient methods that find optimal (or near-optimal) solutions from a finite set of discrete candidates has long been an active research topic [4]. Over the past decades, the methods developed for solving combinatorial optimization problems can be roughly grouped into three categories: exact methods [5], heuristic methods [6], [7], and learning-based methods [8], [9].