In this paper we present high-performance GPU implementations of the 2-opt and 3-opt local search algorithms used to solve the Traveling Salesman Problem (TSP). The main idea behind both algorithms is to take a route that crosses over itself and reorder it so that it does not. This is a very important local search technique. Using a GPU greatly decreases the time needed to find the best edges to swap in a route. Our results show that at least 90% of the time during Iterated Local Search is spent on the local search itself. For testing, we used 13 TSPLIB problem instances with sizes ranging from 100 to 4461 cities. Our results show that with our GPU algorithm, the time needed to find optimal swaps decreases by a factor of approximately 3 to 26 compared to parallel CPU code using 32 cores. Additionally, we point out the memory bandwidth limitation of current parallel architectures. We show that on multi-core systems re-computing data is usually faster than reading it from memory, and we propose this approach as a solution.
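To make the 2-opt step concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described above. It assumes Euclidean distances and a best-improvement strategy, and all names (`dist`, `tour_length`, `best_two_opt_move`, `apply_two_opt`, `two_opt`) are hypothetical. Distances are re-computed from city coordinates on every evaluation instead of being read from a precomputed table, which illustrates the re-computation idea in the simplest possible form.

```python
import math


def dist(coords, a, b):
    """Euclidean distance between cities a and b, re-computed on demand."""
    (xa, ya), (xb, yb) = coords[a], coords[b]
    return math.hypot(xa - xb, ya - yb)


def tour_length(coords, tour):
    """Length of the closed tour (the last city connects back to the first)."""
    n = len(tour)
    return sum(dist(coords, tour[i], tour[(i + 1) % n]) for i in range(n))


def best_two_opt_move(coords, tour, eps=1e-12):
    """Evaluate every pair of non-adjacent edges and return (delta, (i, j))
    for the swap that shortens the tour the most, or (0.0, None) if no
    swap improves it. This exhaustive scan is the part that is assumed
    to be evaluated in parallel on a GPU."""
    n = len(tour)
    best_delta, best_move = 0.0, None
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two edges share a city, nothing to swap
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
            delta = (dist(coords, a, c) + dist(coords, b, d)
                     - dist(coords, a, b) - dist(coords, c, d))
            if delta < best_delta - eps:
                best_delta, best_move = delta, (i, j)
    return best_delta, best_move


def apply_two_opt(tour, i, j):
    """Reverse the segment between positions i + 1 and j (inclusive)."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]


def two_opt(coords, tour):
    """Apply the best improving swap repeatedly until none remains."""
    while True:
        _, move = best_two_opt_move(coords, tour)
        if move is None:
            return tour
        tour = apply_two_opt(tour, *move)
```

For example, with the four corners of a unit square given as `coords = [(0, 0), (1, 1), (1, 0), (0, 1)]`, the tour `[0, 1, 2, 3]` uses both diagonals and therefore crosses itself. Calling `two_opt(coords, [0, 1, 2, 3])` reverses one segment and returns a tour that follows the perimeter.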