Multi-Colony Collaborative Ant Optimization Algorithm Based on Cooperative Game Mechanism

Aiming at the problems that the ant colony algorithm easily falls into local optima and converges slowly, a multi-colony collaborative ant optimization algorithm based on a cooperative game mechanism (CCACO) is proposed. First, this article presents a pheromone matrix adaptive matching strategy and an improved generative adversarial nets (GAN) model according to the cooperative game mechanism. The pheromone matrix adaptive matching strategy allocates a pheromone matrix to each colony so as to maximize the overall benefit of the colonies by establishing a cooperative game model. The improved GAN model is built according to the game between the global optimal solution and the new solutions of each colony, which improves the convergence speed of the algorithm and the quality of solutions. In addition, a collaborative optimization mechanism is proposed to improve the quality of the solution. In this mechanism, the cloning strategy binds the cities on each common path together, which increases the pheromone concentration of the common path to accelerate convergence; the central diffusion strategy diffuses pheromones from central cities to nearby cities, which increases the diversity of solutions; forward and backward propagators adjust the amount of released pheromone to regulate the convergence rate of the algorithm. Finally, information entropy is used to measure the diversity of CCACO. When the value of information entropy is less than a threshold, the cooperative game mechanism and the collaborative optimization mechanism are used to regulate the relationship between the convergence speed and the quality of the solution. Experimental results on TSPLIB standard library instances demonstrate that the proposed algorithm outperforms other state-of-the-art multi-colony ant colony optimization algorithms. The algorithm is also applied to robot path planning, which reflects its practicality.


I. INTRODUCTION
Ant system (AS) was proposed according to the behavior of ants [1]. The algorithm selects the next position by a random-proportional rule. After all paths are built, the pheromones on the paths are updated. Ant Colony System (ACS) was proposed to improve AS based on Q-learning [2]. ACS put forward two ideas: local pheromone update and global pheromone update. MAX-MIN Ant System (MMAS) was proposed to confine the pheromone on each path to a certain range [3]. AS, ACS and MMAS are considered classical ant colony algorithms. (The associate editor coordinating the review of this manuscript and approving it for publication was Nadeem Iqbal.)
When solving large-scale combinatorial optimization problems, the traditional ant colony algorithm has a slow convergence rate and tends to fall into local optima. More and more scholars propose relevant optimization algorithms on the basis of classical ant colony algorithm.
To overcome premature convergence, [4] adaptively adjusted the pheromone on the path according to the existing solution, which enabled the algorithm to escape local optima. Reference [5] proposed a continuous interacting ant colony (CIAC), based on the study of how interacting ant colonies process information. CIAC used two communication channels showing the properties of trail and direct communication, which improved the quality of the algorithm's solutions. Reference [6] proposed an ant colony algorithm based on multiple behaviors: different subpopulations had different ways of evolution, and the populations combined migration operators to update pheromones and improve the quality of the solution. An operationally independent pheromone initialization strategy was proposed in [7], which could be flexibly combined with the ant colony algorithm to improve its robustness and solution quality. Reference [8] proposed an ant colony algorithm-improved Back Propagation (BP) neural network. The ant colony algorithm was introduced to absorb the behavior characteristics of the ant colony during the weight training of the BP neural network, and the algorithm continuously optimized the weights in the learning process to avoid falling into local extrema. Reference [9] used fractional order differences with long-term memory characteristics to make full use of historical information; it used the information of a few steps forward, combined with transition probabilities, to improve the exploration ability of the algorithm. Reference [10] proposed the heuristic information multi-SURF* and integrated it into ant selection rules to discover the optimal solution.
(Volume 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.)
Reference [11] improved the classic ant colony algorithm into a sorted ant colony algorithm to improve its efficiency in a dynamic environment, and combined it with a fuzzy algorithm to solve the optimal path problem and improve the quality of solutions. Reference [12] proposed a variation on elitist ACO in which the pheromone contribution of the best path is further predicated by a probability factor, which improved the accuracy of the algorithm but reduced its convergence speed. Reference [13] introduced the crossover idea of the genetic algorithm into the ant colony algorithm to jump out of local cycles of the solution. Local optimization of time planning was then carried out to reduce the time complexity, so that the algorithm could achieve a comprehensively optimal solution.
To improve the convergence speed of the algorithm, [14] combined ACS and a genetic algorithm in a cooperative and parallel way. The information exchange between ACS and the genetic algorithm ensures the selection of the optimal solution to accelerate convergence in the next round. Reference [15] put forward an alarm pheromone based on the traditional guiding pheromone. The alarm pheromone tells an ant that it has entered an infeasible area, which avoids invalid search and improves search efficiency. Reference [16] fused the characteristics of the A* algorithm and the MAX-MIN ant system to solve the path planning problem. The heuristic information of the ant colony algorithm was improved by introducing the evaluation function and bending suppression operator of the A* algorithm, which improved the convergence speed and the smoothness of the global path. Reference [17] constructed an unequal allocation of initial pheromone, which avoided blind search in the initial stage of planning and sped up the convergence of the algorithm.
To overcome premature convergence and accelerate convergence, [18] dynamically adjusted the influence of the ant colony on pheromone updating and the probability of path selection according to the distribution of the ant colony, which kept the balance between convergence speed and diversity. Reference [19] proposed a two-pheromone-trail strategy to accelerate the pheromone accumulation process of ants, together with a fast-memory and frequency-memory method to further improve the quality of solutions. Aiming at the problems of wireless sensor network systems, [20] proposed a data fusion method combining the ant colony algorithm and the minimum spanning tree algorithm, which regulated the effective data rate by the number of nodes.
To solve the scheduling problem of parallel batch machines, [21] designed heuristic information for each candidate list to induce ants to make choices, which improved the convergence speed. A fuzzy local optimization algorithm was then introduced to improve the solution quality, so that the algorithm could find a better solution in a reasonable time. Reference [22] proposed a multi-colony ant optimization based on KL divergence. With the change of the global state of KL divergence, different subgroups were selected to communicate, and the time of subgroup communication was selected according to the local state of information entropy. Reference [23] added an angle calculation rule to the pheromone update rule. The angle calculation rule expanded the next set of optional cities, which increased population diversity and improved the accuracy of solutions. The algorithm updated the pheromones on both the worst and the optimal paths and increased the weight of pheromones on the optimal path, which made ants concentrate near the optimal path more quickly and increased the convergence rate.
GAACO was proposed in [24]. In order to further improve GAACO's search performance and apply it to path planning, CCACO is proposed in this paper. The main contributions of this paper are summarized as follows:
1. A cloning strategy is proposed to improve the convergence speed of the algorithm based on imitation learning. The strategy is made up of pheromone matrix cloning and common path cloning. Pheromone matrix cloning copies the pheromone matrix, which facilitates the operation of the strategies. Common path cloning retains optimal path segments to improve the convergence speed of the algorithm.
2. Central diffusion strategy is proposed based on the star topology and common paths. The purpose of the mechanism is to increase the diversity of solutions. The strategy takes the cities on the common path as the central cities; central cities diffuse pheromones to nearby cities in a star topology.
3. Forward and backward propagators are proposed to regulate the relationship between the convergence speed and the quality of the solution. Forward propagator indicates whether a population has found a better solution than the global optimal solution in current iteration. Backward propagator is used to improve the pheromone update formula of the algorithm. The two operators effectively regulate the release of pheromones.
The structure of this paper is as follows. Section II introduces classical ant colony algorithms, path planning and other related work. Section III describes the proposed algorithm: the cloning strategy, the central diffusion strategy, the forward and backward propagators, the improved GAN model, the pheromone matrix adaptive matching strategy and the pretreatment. Section IV presents experimental results and analysis. Section V applies CCACO to path planning. Finally, Section VI summarizes the algorithm.

II. RELATED WORK
Recently, more and more researchers have come up with new ideas to improve the performance of the ant colony algorithm. Reference [25] placed pheromone on nodes to improve the efficiency of pheromone storage and updates. Reference [26] introduced the filter method instead of the wrapper method to compute subsets and reduce computational complexity. Reference [27] proposed a novel multi-objective optimization algorithm based on the ant colony algorithm (ACO) to solve the community detection problem in complex networks; it updated the pheromone by the Pareto concept and a Pareto archive.
The purpose of the traveling salesman problem is to find a shortest closed trip that visits each city exactly once. The ant colony algorithm model is established on the traveling salesman problem, where m is the number of ants and n is the number of cities.

A. ANT SYSTEM
1) CONSTRUCT THE SOLUTION
The probability selection rule from city i to city j for the k-th ant is defined as (1):

p_ij^k(t) = ( [τ_ij(t)]^α · [η_ij]^β ) / ( Σ_{s ∈ allowed_k} [τ_is(t)]^α · [η_is]^β ), if j ∈ allowed_k; p_ij^k(t) = 0, otherwise. (1)

where allowed_k is the set of cities the k-th ant has not yet visited, i.e. its current feasible city set. α and β are parameters that control the relative importance of trail versus visibility. η_ij is the visibility, defined as (2):

η_ij = 1 / d_ij (2)

where d_ij (i, j = 1, 2, . . . , n) is the distance between city i and city j, and τ_ij(t) represents the concentration of pheromone between city i and city j at the t-th iteration.
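For illustration, the random-proportional rule of Eq. (1) can be sketched as a minimal roulette-wheel selection in Python; the matrices `tau` and `eta` and the parameter values below are illustrative, not the paper's settings.

```python
import random

def select_next_city(i, allowed, tau, eta, alpha=1.0, beta=2.0):
    """Random-proportional rule of AS (Eq. (1)): choose j with probability
    proportional to tau[i][j]^alpha * eta[i][j]^beta over the feasible set."""
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for j, w in zip(allowed, weights):
        acc += w
        if acc >= r:
            return j
    return allowed[-1]  # numerical safety net
```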

2) PHEROMONE UPDATE
After all ants have finished building their paths, the pheromone on each path is updated according to (3):

τ_ij(t + n) = (1 − ρ) · τ_ij(t) + Δτ_ij (3)

where ρ is a coefficient that represents the evaporation of trail after the movement, and Δτ_ij is the total pheromone deposited on edge (i, j):

Δτ_ij = Σ_{k=1}^{m} Δτ_ij^k (4)

where Δτ_ij^k is the quantity per unit of length of the trail substance (pheromone in real ants) laid on edge (i, j) by the k-th ant after its movement. It is given by:

Δτ_ij^k = Q / L_k, if the k-th ant uses edge (i, j) in its tour; Δτ_ij^k = 0, otherwise. (5)

where Q is a constant and L_k is the tour length of the k-th ant.
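The update of Eqs. (3)-(5) can be sketched as follows; a symmetric TSP is assumed, and each entry of `tours` is one ant's closed tour given as a list of city indices.

```python
def update_pheromone(tau, tours, lengths, rho=0.5, Q=1.0):
    """AS update (Eqs. (3)-(5)): evaporate every trail by (1 - rho), then
    let each ant k deposit Q / L_k on every edge of its closed tour."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour, L in zip(tours, lengths):
        # close the tour: last city connects back to the first
        for a, b in zip(tour, tour[1:] + tour[:1]):
            tau[a][b] += Q / L
            tau[b][a] += Q / L  # symmetric instance
    return tau
```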

B. ANT COLONY SYSTEM
Ant colony system (ACS) is an improved version of the ant system (AS). ACS proposed a global pheromone update that reinforces the pheromone on the optimal path of each iteration, which accelerates the convergence of the algorithm. In addition, ACS introduces a pseudo-random proportional rule into the path construction rule, which can increase the diversity of the algorithm.

1) CONSTRUCT THE SOLUTION
The pseudo-random proportional rule is added to the path construction rule, which can increase the diversity of the algorithm. See (6):

j = argmax_{u ∈ allowed_k} { [τ_iu] · [η_iu]^β }, if q ≤ q_0; j = S, otherwise. (6)

where q is a random variable uniformly distributed between 0 and 1; q_0 (0 ≤ q_0 ≤ 1) is a parameter adjusted experimentally; and S is a city selected according to the probability distribution of (1).
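The pseudo-random proportional rule of Eq. (6) can be sketched as follows, falling back to the roulette-wheel rule of Eq. (1) when q > q_0; the parameter values are illustrative.

```python
import random

def acs_select(i, allowed, tau, eta, beta=2.0, q0=0.9):
    """ACS rule (Eq. (6)): with probability q0 exploit the best edge
    (argmax of tau * eta^beta); otherwise explore with the
    random-proportional rule of Eq. (1)."""
    if random.random() <= q0:
        return max(allowed, key=lambda j: tau[i][j] * (eta[i][j] ** beta))
    weights = [tau[i][j] * (eta[i][j] ** beta) for j in allowed]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for j, w in zip(allowed, weights):
        acc += w
        if acc >= r:
            return j
    return allowed[-1]
```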

2) PHEROMONE UPDATE
There are two types of pheromone update in ACS: local pheromone update and global pheromone update. Global pheromone updating only allows the ant on the optimal path of each generation to deposit pheromone, which accelerates the convergence rate of the algorithm. The global pheromone is updated by (7), and the local pheromone by (9):

τ_ij ← (1 − ρ) · τ_ij + ρ · Δτ_ij^bs (7)

where ρ is the evaporation rate of the global pheromone update and Δτ_ij^bs is the pheromone increment:

Δτ_ij^bs = 1 / L_gb (8)

where L_gb is the length of the global-best tour found so far. The local pheromone update, applied each time an ant crosses an edge, is:

τ_ij ← (1 − ε) · τ_ij + ε · τ_0 (9)

where τ_0 is the initial pheromone and ε is the evaporation rate of the local pheromone update.
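Both ACS updates can be sketched as follows; a symmetric TSP is assumed.

```python
def acs_global_update(tau, best_tour, L_gb, rho=0.1):
    """Global update (Eqs. (7)-(8)): only edges of the global-best tour
    are reinforced, with increment 1 / L_gb."""
    for a, b in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[a][b] = (1.0 - rho) * tau[a][b] + rho * (1.0 / L_gb)
        tau[b][a] = tau[a][b]  # symmetric instance
    return tau

def acs_local_update(tau, i, j, tau0, eps=0.1):
    """Local update (Eq. (9)): applied each time an ant crosses edge (i, j)."""
    tau[i][j] = (1.0 - eps) * tau[i][j] + eps * tau0
    tau[j][i] = tau[i][j]
    return tau
```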

C. MAX-MIN ANT SYSTEM
MMAS builds paths according to (1). The pheromone trail update rule is given by (10):

τ_ij ← (1 − ρ) · τ_ij + Δτ_ij^best (10)

where Δτ_ij^best = 1 / f(s_best), and f(s_best) is the length of either the iteration-best solution or the global-best solution.
MMAS limits the range of pheromone values, which reduces the pheromone gap between edges and increases the diversity of the algorithm. The limit range is [τ_min, τ_max]: if τ_ij ≤ τ_min, we set τ_ij = τ_min; if τ_ij ≥ τ_max, we set τ_ij = τ_max. The values of τ_max and τ_min are given in (11) and (12):

τ_max = 1 / (ρ · L(T_gb)) (11)

τ_min = τ_max / (2n) (12)

where T_gb is the global optimal path and L(T_gb) is its length.
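The trail-limit step can be sketched as follows, applied to every entry of the pheromone matrix after each update.

```python
def clamp_pheromone(tau, tau_min, tau_max):
    """MMAS trail limits: keep every pheromone entry inside
    [tau_min, tau_max] after each update."""
    for i in range(len(tau)):
        for j in range(len(tau[i])):
            tau[i][j] = min(tau_max, max(tau_min, tau[i][j]))
    return tau
```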

D. INFORMATION ENTROPY
Information entropy was introduced by Shannon [28]. It is a way to measure diversity. The entropy is defined as (13):

H(X) = − Σ_{i=1}^{n} P(x_i) · log_b P(x_i) (13)

where b is the base of the logarithm used and P(x_i) is the probability mass function. L is the threshold value of information entropy, adjusted experimentally. When information entropy < L, we consider that the algorithm has fallen into a local optimum.
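Eq. (13) can be computed from occurrence counts of the distinct solutions in a population, for example:

```python
import math

def information_entropy(counts, b=2):
    """Eq. (13): H = -sum p_i * log_b(p_i) over the empirical
    distribution given here as occurrence counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, b)
    return h
```

A uniform distribution over many distinct solutions yields high entropy (high diversity); a population concentrated on one solution yields entropy 0, signalling stagnation.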

E. IMITATION LEARNING
Imitation learning theory was first proposed by N. E. Miller in 1941 [29]. They believed that if the behavior of the observer is consistent with the behavior of the demonstrator and is reinforced often enough, the observer can learn to imitate. With exchanges between disciplines, imitation learning has been developed and applied as a mature theory. There are two main approaches: behavioral cloning [30], which learns a policy as a supervised learning problem over state-action pairs from expert trajectories, and inverse reinforcement learning [31], which finds a cost function under which the expert is uniquely optimal.

F. GRID METHOD
Grid method [32] is a method of map modeling: the work environment is divided into grids of the same size and the grids are numbered. As shown in Fig. 1, a grid map of size M × M (M = 6) is built. The white grids are activity grids that are allowed to pass; the black grids are obstacle grids that are not allowed to pass.
The properties of the grid (activity or obstacle) are represented by (14). The correspondence between the i-th grid and its central coordinate (x_i, y_i) is given by (15), where mod is the remainder operation and int is the rounding operation.
Assume that the robot is in the center of the grid. It can only select its adjacent activity grid as the next position. As shown in Fig. 2, the robot has eight directions to choose from at an activity grid.
B is the grid where the robot is currently located, A is the next grid the robot chooses, and d_AB is the distance between grid A and grid B.
When the robot moves up, down, left, or right, A and B satisfy (16); when the robot moves to the upper or bottom left, A and B satisfy (17); when the robot moves to the upper left or bottom right, A and B satisfy (18). The robot moves NL times from grid S (starting point) to grid E (end point), generating a path whose length is calculated by (19):

L = Σ_{i=1}^{NL} d(p_i, p_{i+1}) (19)

where {p_1, p_2, . . . , p_{NL+1}} is the sequence of grids that make up the path; p_1 = S; p_{NL+1} = E.
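The grid indexing and the path length of Eq. (19) can be sketched as follows. Since the exact coordinate convention of Eq. (15) is not reproduced in the text, the 1-based row-major numbering with unit grid side below is an assumption for illustration only.

```python
import math

def grid_center(i, M):
    """Assumed convention: 1-based, row-major numbering from the
    top-left corner, grid side length 1; returns the cell center."""
    x = ((i - 1) % M) + 0.5
    y = M - ((i - 1) // M) - 0.5
    return x, y

def path_length(path, M):
    """Eq. (19): sum of Euclidean distances between consecutive grids
    (1 for orthogonal steps, sqrt(2) for diagonal steps)."""
    pts = [grid_center(p, M) for p in path]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
```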

III. MULTI-COLONY COLLABORATIVE ANT OPTIMIZATION ALGORITHM BASED ON COOPERATIVE GAME MECHANISM
This algorithm includes collaborative optimization mechanism and cooperative game mechanism. Collaborative optimization mechanism is proposed in this paper. In this mechanism, central diffusion strategy is proposed to increase the pheromone near the common paths; cloning strategy is proposed in combination with imitation learning to accelerate the convergence of the CCACO; forward and backward propagators are proposed to control the convergence rate of the algorithm by adjusting the release of pheromone. The cooperative game mechanism improves the three theories of GAACO.
(1) Improved GAN model. We improve the call conditions of the GAN model in GAACO, which significantly reduces the running time of the algorithm.
(2) Pheromone matrix adaptive matching strategy. A slight improvement is made on the basis of the adaptive stagnation avoidance strategy.
(3) Pretreatment. Based on GAACO's pretreatment method, the initial pheromone matrix only updates the optimal path of each colony. This operation reduces the running time.
The specific ideas of CCACO are as follows.

A. COLLABORATIVE OPTIMIZATION MECHANISM
1) COMMON PATH SET
The current optimal path of each colony is its best path in the current iteration. After all colonies have completed building paths, we compare the global optimal path with the current optimal path of each colony (this operation is not performed if the global optimal path is the same as the current optimal path of each colony). The overlapping path segments are defined as the common path set; a common path contains at least three cities. For further explanation, as shown in Fig. 3: Path0 is the global optimal path; Path1, Path2 and Path3 are parts of the current optimal paths of three colonies respectively; the overlapping path segment of Path0 and Path1 is 7, 3, 11; the overlapping path segment of Path0 and Path2 is 9, 13, 16; the overlapping path segment of Path0 and Path3 is 18, 4, 15. All overlapping path segments make up the common path set.
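For illustration, the extraction of overlapping segments can be sketched as follows. For simplicity, paths are treated here as open, direction-fixed city sequences; real tours are cyclic and a segment may also match in reverse, which this sketch omits.

```python
def common_segments(global_path, colony_path, min_len=3):
    """Collect maximal runs of at least min_len consecutive cities that
    appear, in the same order, in both the global-best path and a
    colony's current-best path."""
    segs, n = [], len(global_path)
    i = 0
    while i < n:
        # try the longest candidate segment starting at i first
        for j in range(n, i, -1):
            seg = global_path[i:j]
            if len(seg) >= min_len and _contains(colony_path, seg):
                segs.append(seg)
                i = j - 1  # resume after the matched segment
                break
        i += 1
    return segs

def _contains(path, seg):
    k = len(seg)
    return any(path[t:t + k] == seg for t in range(len(path) - k + 1))
```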

2) CLONING STRATEGY
To improve the convergence speed of the algorithm, we propose a cloning strategy based on imitation learning. The strategy consists of two methods.
Pheromone matrix cloning: each colony uses a copy of its pheromone matrix to search for paths again; when the algorithm finds a path shorter than the global optimal path, the pheromone of that path is updated.
Common path cloning: the cities of each common path are bound together in order. Each common path has two endpoint cities. When an ant selects one endpoint city of a common path, it traverses all cities of the path and arrives at the other endpoint city.
Take, for example, a common path of three cities. Fig. 4 further depicts the strategy: the order of the common path is A, B, C or C, B, A; the ant looks for a path based on its pheromone matrix; when it reaches endpoint city A, it passes directly through city B to the other endpoint city C.
To reduce the running time of the algorithm, the number of bound common paths is limited to fewer than the number of cities. Suppose the number of cities is city_number; if the number of common paths is greater than city_number, then city_number common paths are randomly selected for binding.
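The traversal step of common path cloning can be sketched with the illustrative helper below (the function name and data layout are assumptions, not the paper's notation): if the chosen city is an endpoint of a bound common path, the ant emits the whole segment at once and emerges at the other endpoint.

```python
def expand_with_clone(next_city, bound_paths):
    """If next_city is an endpoint of a bound common path, traverse the
    whole path in one step; otherwise return just the single city."""
    for path in bound_paths:
        if next_city == path[0]:
            return list(path)            # enter at A, exit at C
        if next_city == path[-1]:
            return list(reversed(path))  # enter at C, exit at A
    return [next_city]
```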

3) CENTRAL DIFFUSION STRATEGY
To overcome premature convergence, the central diffusion strategy is proposed to increase population diversity. The strategy is invoked when the algorithm falls into a local optimum. First, the common path set is found according to Section III-A-1). All cities on the common paths form a central city set T_X. The purpose of this strategy is to increase the search probability of cities near the common paths by increasing the pheromone of the central cities and their nearby cities. The method is shown in Fig. 5: each black dot represents a city; central city T (T ∈ T_X) is the center of a circle; the circle is the diffusion region of central city T; r is the radius of the circle; city T diffuses pheromones to the other cities in the circle.
Diffusion radius r is calculated by (20).
where d_min is the minimum distance between cities, d_max is the maximum distance between cities, and r_0 is the control coefficient of the radius. The quantity of diffused pheromone is inversely proportional to the distance from the other city to central city T. The pheromone diffusion formula from central city T to city j in the diffusion region is given by (21).
where d_tj is the distance between city T and city j, and τ_tj is the pheromone between city T and city j.
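Since the exact forms of Eqs. (20) and (21) are not reproduced in the text, the sketch below only illustrates the stated idea under explicit assumptions: the radius is assumed to scale between d_min and d_max via r_0, and each city inside the circle is assumed to receive extra pheromone inversely proportional to its distance to T.

```python
def diffuse(tau, T, cities, dist, d_min, d_max, r0=0.5, amount=1.0):
    """Hedged sketch of central diffusion. Assumed radius (stand-in for
    Eq. (20)): r = d_min + r0 * (d_max - d_min). Assumed deposit
    (stand-in for Eq. (21)): amount * d_min / d_tj for cities within r."""
    r = d_min + r0 * (d_max - d_min)
    for j in cities:
        if j != T and dist[T][j] <= r:
            tau[T][j] += amount * d_min / dist[T][j]
            tau[j][T] = tau[T][j]  # symmetric instance
    return tau
```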
To avoid excessive accumulation of pheromone, which can lead to premature convergence, the pheromone update range is set to [P_MIN, P_MAX]. L_f is the length of the global optimal path and city_number is the total number of cities. P_MAX is calculated by (22) and P_MIN by (23).

4) FORWARD AND BACKWARD PROPAGATORS
To regulate the relationship between the convergence speed and the quality of the solution in the multi-colony ant algorithm, forward and backward propagators are proposed. When the algorithm falls into a local optimum, the propagators are called. Each colony has a forward propagator and a backward propagator. The forward propagator indicates whether a colony has found a better solution than the global optimal solution in the current iteration. It is calculated by (24).
where R_X represents the forward propagation operator of the X-th colony and length_best_now_X is the optimal solution found by the X-th colony in the current iteration. If R_X ≥ 0, colony X did not find a better solution in the current iteration, and the algorithm considers appropriately reducing the convergence speed to avoid premature convergence. If R_X < 0, colony X found a better solution in the current iteration, and the algorithm considers appropriately increasing the convergence speed.
The backward propagator is proposed based on the forward propagator. It is calculated by (25).
where R (R ∈ [0, 1]) is the backward coefficient; and it is adjusted according to the experiment.
To control the convergence rate of the algorithm, the backward propagation operator is used to improve the pheromone accumulation formulas: Eq. (4) (AS), Eq. (9) (ACS) and Eq. (10) (MMAS). As can be seen from the three improved formulas (26), (27) and (28): if R_X ≥ 0, the amount of released pheromone decreases, which slows the convergence of the algorithm; if R_X < 0, the amount of released pheromone increases, which accelerates convergence.
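Since Eqs. (24)-(28) are not reproduced in the text, the sketch below captures only their described effect under explicit assumptions: the forward propagator is assumed to be the signed gap between the colony's iteration best and the global best, and the deposit is assumed to shrink by a factor tied to the backward coefficient R when no improvement occurs, and to grow otherwise.

```python
def forward_propagator(length_best_now, length_global_best):
    """Assumed stand-in for Eq. (24): R_X >= 0 means no improvement over
    the global best in this iteration, R_X < 0 means an improvement."""
    return length_best_now - length_global_best

def scaled_deposit(delta, R_X, R=0.5):
    """Assumed stand-in for the effect of Eqs. (25)-(28): reduce the
    pheromone deposit when R_X >= 0, increase it when R_X < 0.
    R in [0, 1] is the backward coefficient."""
    return delta * (1.0 - R) if R_X >= 0 else delta * (1.0 + R)
```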

B. COOPERATIVE GAME MECHANISM
1) IMPROVED GAN MODEL
This paper improves the GAN model in GAACO.
In GAACO, the discriminative model D is called every iteration, which produces unnecessary costs in time and space. To reduce unnecessary calls, we add a call condition: the improved GAN model is called only when the algorithm falls into a local optimum. Fig. 6 shows the idea of the GAN model. The basic model is as follows. The generative model G is composed of the lengths of all the solutions from the three colonies. It is defined by (29), where L_i is the length of the i-th ant's path, i = 1, 2, . . . , m. The discriminative model D removes the poor paths in model G and updates the pheromone of the better paths. It is defined by (30), where L_f is the length of the global optimal path and γ is the proportional coefficient of L_f. As can be seen from (29) and (30), model G and model D influence each other. If L_i ≤ D, the pheromone is updated by (7).
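The discriminative step can be sketched as follows. The threshold form D = γ · L_f is an assumption consistent with the text ("γ is the proportional coefficient of L_f"), not the paper's exact Eq. (30): paths no longer than D are kept and rewarded, the rest are discarded.

```python
def discriminate(lengths, L_f, gamma=1.1):
    """Hedged sketch of model D: keep indices of paths whose length is
    at most the assumed threshold D = gamma * L_f; the kept paths would
    then receive a pheromone update by Eq. (7)."""
    D = gamma * L_f
    kept = [i for i, L in enumerate(lengths) if L <= D]
    return kept, D
```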

2) PHEROMONE MATRIX ADAPTIVE MATCHING STRATEGY
The pheromone matrix adaptive matching strategy is proposed to improve the adaptive stagnation avoidance strategy in GAACO. In the adaptive stagnation avoidance strategy, the three colonies have a certain probability of being assigned the same pheromone matrix, and the reduced diversity of pheromone matrices may affect the optimization performance of the algorithm. The pheromone matrix adaptive matching strategy is therefore proposed. The strategy builds a cooperative game model, which is called when the algorithm falls into a local optimum. The cooperative game model is as follows: each colony uses only one algorithm; after all colonies have constructed paths, each colony generates a pheromone matrix. The model specifies that players cannot choose the same strategy. The three colonies are the players, defined as P1, P2 and P3; the strategies available to them are defined as Strategy1, Strategy2 and Strategy3. Each player chooses a strategy, which produces a payoff matrix, shown in Table. 1. In the table, a benefit represents the optimal solution found by a colony using a pheromone matrix to construct paths.
The benefit of P_i is expressed as I_i, which is defined as (31).
When the value of I_best is minimized, the corresponding pheromone matrices are assigned to the colonies for the next cycle of path construction.
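For illustration, the assignment step can be sketched as an exhaustive search over permutations (feasible here since there are only three players). The `benefit` matrix below is illustrative: `benefit[p][s]` stands for the tour length player p obtains with pheromone matrix s, and the assignment minimizing the total benefit is chosen.

```python
from itertools import permutations

def best_assignment(benefit):
    """Sketch of the cooperative game model: players must pick distinct
    pheromone matrices; return the permutation minimizing the total
    benefit (the minimized I_best) together with that total."""
    n = len(benefit)
    best_perm, best_val = None, float("inf")
    for perm in permutations(range(n)):
        val = sum(benefit[p][perm[p]] for p in range(n))
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val
```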

3) PRETREATMENT
Based on GAACO's preprocessing method, the pretreatment is proposed. The pheromone matrix records the pheromones between city paths; when the algorithm applies pheromone update rules, the pheromone matrix is updated correspondingly. The initial pheromone matrix is the initial state of the pheromone matrix, and each colony has one before the ants construct paths. The three colonies use the algorithms AS, ACS and MMAS respectively to find paths N times, and each colony produces a current optimal solution. The best of the three current optimal solutions is the initial global optimal solution. The initial pheromone matrix is updated according to (33). Pretreatment only updates three good paths, which saves running time and makes the direction of convergence clearer.

Algorithm 1 CCACO Algorithm for TSP
1. Initialize the pheromone and the parameters
2. Calculate the distance between cities
3. Pretreatment by (33)
4. While NE < NE_MAX
5.   Construct ant solutions for AS, ACS and MMAS
6.   Find the common path set
7.   Invoke the cloning strategy to further improve the quality of the solution
8.   Calculate information entropy by (13)
9.   IF information entropy is less than L
10.    Calculate the forward and backward propagators
11.    Invoke the central diffusion strategy
12.    Call the pheromone matrix adaptive matching strategy by (31) and (32)
13.  END IF
14.  NE = NE + 1
15. End-While
where C_k is the length of the optimal path for the k-th colony.

C. FRAMEWORK OF PROPOSED ALGORITHM
NE records the number of algorithm loops; NE_MAX is the maximum of NE.

IV. EXPERIMENT AND SIMULATION
This experiment is simulated in the environment of MATLAB R2016a. We select various scale TSP instances to verify the performance of the algorithm. Each algorithm is executed 20 times. Section A shows and analyses the experimental data of CCACO and classical ant colony algorithms in some TSP instances. CCACO is superior to ACS and MMAS in convergence rate and optimal solution. Section B analyses the function of collaborative optimization mechanism and cooperative game mechanism. Experiments have shown that cooperative game mechanism effectively improves the convergence speed of the algorithm and the quality of solutions; collaborative optimization mechanism improves the quality of solutions.
Section C shows the comparison between CCACO algorithm and other multi-population ant colony optimization  algorithms. Experimental results show that CCACO is superior to other multi-colony ant algorithms. Especially in lin318, the advantage is more obvious.

A. EXPERIMENT ANALYSIS
1) PARAMETER SETTINGS
The parameters used in this paper are shown in Table. 2.

2) COMPARISON OF THREE ALGORITHMS
In order to show the performance of CCACO more comprehensively, we test on TSP instances of various scales. This paper analyzes the performance of the algorithm with six metrics: the optimal solution (Opt), the worst solution (Worst), the average solution (Mean), Error rate, convergence iteration (Convergence) and Standard deviation. Experimental data are shown in Table. 3. Error rate measures the difference between the optimum found by each ACO and the standard optimal solution. It is defined as (34):

Error rate = (L_best − L_opt) / L_opt × 100% (34)

where L_best is the optimal solution found by the ACO algorithm and L_opt is the standard optimal solution of the instance. Standard deviation measures the stability of each ACO and is defined by (35):

SD = sqrt( (1/T) · Σ_{i=1}^{T} (l_i − L_avg)^2 ) (35)

where T is the number of times each TSP instance is tested (in this paper, T = 20), l_i is the current optimal solution of the i-th run, and L_avg is the average solution over the runs. In Table. 3, CCACO is superior to the classical ant colony algorithms in the key metrics (the optimal solution, the worst solution and the average solution). CCACO also converges significantly faster than ACS and MMAS while producing better solutions, and the error rate of CCACO remains within 1%.
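The two evaluation metrics of Eqs. (34) and (35) can be computed as follows:

```python
import math

def error_rate(L_best, L_opt):
    """Eq. (34): relative gap (in percent) between the found optimum
    and the known standard optimum."""
    return (L_best - L_opt) / L_opt * 100.0

def std_dev(results):
    """Eq. (35): population standard deviation over the T runs."""
    T = len(results)
    avg = sum(results) / T
    return math.sqrt(sum((l - avg) ** 2 for l in results) / T)
```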
In some small scale instances (Eil51, Eil76, rand100, KroA100), CCACO is superior to ACS and MMAS in all metrics. For KroB100, CCACO found the optimal solution, but ACS and MMAS did not; only its convergence speed is lower than that of MMAS. In some medium scale instances (ch150, KroA150, KroB150), CCACO found the optimal solutions. ACS only found the optimal solution for KroB150, and MMAS did not find the best solution for any of them. For KroB150, CCACO performs significantly better than ACS in the other metrics. In some large scale instances (TSP225, lin318), CCACO does not find the standard optimal solution; however, its optimal solution is far superior to those of ACS and MMAS. The standard deviation shows the stability of the algorithm. As can be seen from Table. 3, most standard deviations of CCACO are smaller than those of the classical ant colony algorithms, which indicates that CCACO is more stable.
To sum up, CCACO can quickly find high-quality solutions compared to ACS and MMAS, and CCACO is more stable. As for the limitation of the algorithm, the optimal solution found by CCACO is close to the standard optimal solution, but it is difficult to find the standard optimal solution in a short time. Fig. 7 shows the convergence curves of CCACO, ACS and MMAS on 9 TSP instances. As can be seen from Fig. 7, CCACO maintains a faster convergence rate than ACS and MMAS in the early stages and converges to the optimal solution faster than the other two algorithms, while also reaching better solutions. The cloning strategy increases the pheromone concentration of the common paths to accelerate convergence. The central diffusion strategy increases the probability of ants selecting cities near the central cities. The improved GAN model and the forward and backward propagators regulate the convergence rate of the algorithm. The pheromone matrix adaptive matching strategy further improves population diversity to prevent the algorithm from falling into local optima.

3) OPTIMAL SOLUTION
To verify the authenticity of the data, Fig. 8 shows the optimal solution found by CCACO in several TSP instances.

B. THEORETICAL ANALYSIS
Denote CCACO without the collaborative optimization mechanism as CCACO1, and CCACO without the cooperative game mechanism as CCACO2. Taking KroA100 as an example, the experimental results are shown in Table. 4, and Fig. 9 shows the effectiveness of the two mechanisms. As can be seen from Table. 3, Table. 4 and Fig. 9, CCACO1 and CCACO2 are significantly better than ACS and MMAS; CCACO1 converges to the optimal solution faster than CCACO2, but the average solution, the worst solution and the standard deviation of CCACO2 are smaller than those of CCACO1. This phenomenon shows that the cooperative game mechanism effectively improves the convergence speed of the algorithm and the quality of solutions, and that the collaborative optimization mechanism can effectively improve the quality of solutions.

C. COMPARISON WITH OTHER MULTI-COLONY ANT COLONY OPTIMIZATION
CCACO is compared with other multi-colony ant colony optimization algorithms. Table 5 selects several TSP experimental results for comparison. As can be seen from Table 5, the performance of all algorithms is similar on the small-scale instances Eil51 and KroA100, but CCACO is better than most of the algorithms. On the larger-scale instances, CCACO finds better solutions faster than the other algorithms; in particular, for Lin318 the solution found by CCACO is much closer to the optimal solution than those of the other algorithms.

V. APPLICATION AND RESEARCH OF ROBOT PATH PLANNING
A. SIMULATION RESULTS AND ANALYSIS
To show the practicability of CCACO, three raster maps of different scales (40 × 40, 50 × 50, and 60 × 60) were designed and tested in MATLAB R2016a. Table 6 shows the comparison of CCACO with ACS and MMAS. As can be seen from Table 6, CCACO's solution is better than those of ACS and MMAS, and it converges faster. Fig. 10 shows the convergence process of the algorithms, and Fig. 11 shows the tours of the optimal solutions found by CCACO, ACS, and MMAS on the 3 raster maps.
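As a minimal sketch of how path quality can be compared on such raster maps, the snippet below computes the Euclidean length of a grid path, assuming unit-spaced cells, so straight moves cost 1 and diagonal moves cost √2; the `(row, col)` list representation of a path is an illustrative assumption, not the paper's data structure:

```python
import math

def path_length(cells):
    """Euclidean length of a raster-map path given as (row, col) cells.

    Consecutive cells are assumed unit-spaced, so a horizontal or
    vertical step contributes 1 and a diagonal step contributes sqrt(2),
    which is consistent with the fractional path lengths reported for
    the grid experiments.
    """
    return sum(math.hypot(r2 - r1, c2 - c1)
               for (r1, c1), (r2, c2) in zip(cells, cells[1:]))
```

Under this metric, a path mixing straight and diagonal steps accumulates a non-integer total length, which is why the reported optimal path lengths are fractional values.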

B. PRACTICAL APPLICATION
To show the practicability of CCACO, we built a map of a real-world environment and used the proposed algorithm to find a path. TurtleBot was created at Willow Garage by Melonee Wise and Tully Foote in November 2010; this paper uses the TurtleBot2 for path planning. In Fig. 12, subgraph (a) shows the real environment we built. The TurtleBot2 then uses a Hokuyo sensor to create a static map, shown in subgraph (b). The static map is converted to a raster map using MATLAB R2016a. The number of ants is set to 10 and the number of iterations to 1000; the other parameters are set as shown in Table 2. Subgraph (c) shows the raster map and the optimal paths found by CCACO, ACS, and MMAS. In subgraph (c), the red path is the optimal path found by CCACO, with length 32.5563; the black path is that of MMAS, with length 69.3966; and the blue path is that of ACS, with length 74.0163. In conclusion, CCACO has better practicability than ACS and MMAS.

VI. CONCLUSION
This paper proposes a multi-colony collaborative ant optimization algorithm based on a cooperative game mechanism. The improved generative adversarial nets (GAN) model is built on the relationship between the global optimal solution and the solutions of the individual colonies; the game between model G and model D improves both the convergence speed of the algorithm and the quality of solutions. The pheromone matrix adaptive matching strategy, derived from the cooperative game, selects an appropriate pheromone matrix for each colony to improve accuracy. The collaborative optimization mechanism effectively improves the quality of solutions and comprises three methods: the cloning strategy, the central diffusion strategy, and the forward and backward propagators. The cloning strategy retains better paths, which improves the convergence speed of the algorithm; the central diffusion strategy diffuses pheromones from central cities to nearby cities, which improves the diversity of solutions; and the forward and backward propagators control the release of pheromone, which regulates the convergence speed of the algorithm. The experimental results show that CCACO outperforms the traditional single-colony ant colony algorithms and several other multi-colony ant colony optimization algorithms on TSP, and that it has better practicability than ACS and MMAS. The advantages of CCACO are that it effectively balances the quality of solutions against the convergence speed, improves the quality of solutions, and improves the convergence speed of the algorithm. Its disadvantage is that, for large-scale problems, the running efficiency is low and the computation takes a long time. In future work, we will continue to explore other ideas for improving ant colony algorithms and demonstrate their practicability through experiments.