Game Theory-Based Optimal Cooperative Path Planning for Multiple UAVs

This paper presents new cooperative path planning algorithms for multiple unmanned aerial vehicles (UAVs) using Game theory-based particle swarm optimization (GPSO). First, the formation path planning is formulated into the minimization of a cost function that incorporates multiple objectives and constraints for each UAV. A framework based on game theory is then developed to cast the minimization into the problem of finding a Stackelberg-Nash equilibrium. Next, hierarchical particle swarm optimization algorithms are developed to obtain the global optimal solution. Simulation results show that the GPSO algorithm can generate efficient and feasible flight paths for multiple UAVs, outperforming other path planning methods in terms of convergence rate and flexibility. The formation can adjust its geometrical shape to accommodate a working environment. Experimental tests on a group of three UAVs confirm the advantages of the proposed approach for a practical application.


I. INTRODUCTION
Unmanned aerial vehicles (UAVs) have rapidly emerged with many interesting applications in both military and civilian domains [1], [2]. To carry out complicated tasks, it is required the teaming of multiple UAVs flying in formation or swarm. The cooperative control of a group of UAVs working together for a robotic task can result in such advantages as high efficiency, reliability, and flexibility, compared to the task execution with single UAVs [3]. Therefore, the problem of path planning for UAV formation control has received great interest. Indeed, path planning plays a significant role in completing flight missions for controlling multiple unmanned vehicle systems. Of importance in formation path planning for UAVs is how to avoid threats while completing a task by taking various constraints and cooperative cohesion conditions into consideration [4]. The strategy here is to plan a flight route for individual UAVs from the starting location to the goal while minimizing the total flight cost.
The associate editor coordinating the review of this manuscript and approving it for publication was Gerardo Flores .
For unmanned vehicles, the commonly-used A* algorithm can be incorporated with a prediction technique and route replanning strategy, or combined with numerical optimal control to solve the path planning problem subject to constraints [5]. The Voronoi graph-based techniques have been introduced by simplifying the space representation [6]. As mainly developed for 2D, the classical A* search and Voronoi techniques would need some extension to cope with cooperative tasks in complex 3D dynamic environments.
The artificial potential field (APF) is another technique for path planning of UAVs [7]. In this approach, the operation space is considered as a potential field that is characterized as ''attractive'' surrounding the target and ''repulsive'' in the neighborhood of obstacles. For UAV cooperative control, additional potential fields are also included to result in attraction effects for maintaining the formation configuration, and repulsion effects for inter-UAV collision avoidance. The paths are then generated when the total force acting on each UAV is induced from all the potential fields at each position. This technique can produce smooth and continuous paths.
On the other hand, it encounters the problem of local minima when the total force is equal to zero.
From the hierarchical perspective, path planning for UAVs can be considered as a high level control problem, whereby optimal and predictive control methods can be used for minimization of a cost function [8]. For multiple UAVs, cooperative path planning can be formulated into an optimization problem for single vehicles subject to multiple constraints [9]. To generate the paths, optimal control can be applied to individual UAVs and then extended to the whole group. Although this approach can solve the optimization problem under different constraints for the formation when performing a robotic task, it involves high computational complexity.
Computational evolution algorithms have been applied to multi-UAV cooperative path planning with the capability to find optimal solutions in complex scenarios [10]. This approach, not requiring discretization of the workplace, can generate smooth cooperative paths for the UAVs. However, if cooperative constraints and maneuvering tasks are not properly addressed, it may converge to sub-optimal solutions.
The key problem of UAV formation control is to resolve possible conflicts and interactions among the leader and followers or among the members of the group to maintain a desired shape under various constraints when executing a robotic task. In this context, the game theory, an important branch of mathematics for studying conflicts and interactions among rational decision-makers [11], can offer a powerful tool to determine an optimal strategy. Indeed, as a solid framework for strategic interactions among competing players, the game theory has found various successful applications, e.g., in distributed energy generations [12], cooperative spectrum sensing [13], or recently in highway maintenance [14].
During a game, to maximize the profit, players can choose to take action depending on not only their own strategy but also of others. Therefore, the best strategy is often decided on what a player expects others to do. A game, in general, can be categorized as cooperative or non-cooperative [15]. In cooperative games (CGs), several players share a common goal to win or to achieve a profit better than by playing alone. A major issue with CGs is trade-offs between stability and efficiency of the overall system [16]. In contrast, players in noncooperative games (NCGs), possessing available information of their own intentions, payoff functions, and procedural details of the game, can pursue their own strategies. For NCGs, each player in a Nash game is equally aware of other players' strategies to reach an equilibrium, which is called the Nash equilibrium. Although having information about the decisions of others, each player has to simultaneously make one's own decision in a symmetric competition, such as to find optimal control parameters [17] or to seek strategies for multiple clusters in a distributed way [18]. On the other hand, in Stackelberg games [19], players have to adopt sequential steps depending on the moves of the leading players. After the leaders take their first actions in priority, the followers can adapt their strategies accordingly.
In this paper, the path planning problem is formulated into a game with UAVs as the players. By incorporating the Stackelberg and Nash game into a two-layer framework with a defined Stackelberg-Nash equilibrium, each UAV in the group can develop a self-enforcing controller to establish and maintain a desired geometric shape. The desired formation is kept in the control layer while constraints on each UAV's path are taken into account via a cost function in the planning layer. Here, the cost function is minimized by using a new algorithm named the game theory-based particle swarm optimization (GPSO), developed for UAV cooperative path planning. Notably, our method considers cooperative constraints on the optimal paths of individual UAVs in the group to cover all requirements on formation, path feasibility, flexibility, and safety. For this, a hierarchical particle swarm optimization (PSO) algorithm is developed to determine the strategy for each UAV to reach the Stackelberg-Nash equilibrium. The resulting optimal paths can be achieved while maintaining the formation. Unlike other formation control methods, the generated paths here can be self-adjusted to reconfigure the whole group so that they can better adapt to changes in the operating environment.
The contributions and innovations of our work are threefold: (i) comprehensively formulating the cooperative path planning problem for multiple UAVs into the minimization of a multi-objective cost function, (ii) casting the solution of the minimization problem into the search for the equilibrium of a two-layer Stackelberg-Nash game, and (iii) developing new algorithms using game theory-based particle swarm optimization (GPSO) for finding optimal solutions. The results of this work enable the effective deployment of flexible and cooperative formations of UAVs to accommodate complex scenarios to effectively improve the autonomy of UAV-based task execution under harsh, hazardous, and maybe hostile conditions. This paper is organized as follows. The cooperative path planning problem is formulated in Section II. The proposed Stackelberg-Nash game and hierarchical PSO algorithms are introduced in Section III. Section IV provides numerical simulation results. Section V presents the setup and experimental validation results, followed by Section VI for the conclusion.

II. PROBLEM FORMULATION
Consider a fleet of UAVs flying in a known environment with obstacle models prescribed by cylinders as illustrated in Figure 1, wherein the inertial frame, xyz, is defined relative to the sea level with the z axis pointing upward. The flight formation has a hierarchical leader-follower structure, as illustrated in Figure 2. The structure is organized into l layers, l = 1, 2, . . . , . This structure, having advantages over the behavior-based or virtual ones in maintaining the stability of the formations under different environments, is quite popular in multi-robot coordination and often used for data acquisition and communication among UAVs [20], [21]. We denote V l a UAV at layer l and consider it as the leader of N l+1 followers at layer l + 1, which are, in turn, denoted as V (l+1)n ,  T , P 2 T , . . . , P S T ] T , where S is the total number of UAVs in the team and P n = (x n , y n , z n ) T is the position of the n-th vehicle, n = 1, 2, . . . , S. To define the formation shape, a set of reference positions P r = [P 1 r T , P 2 r T , . . . , P S r T ] T is given, where P n r = (x n r , y n r , z n r ) T is the reference position of the n-th UAV. The desired vector between two neighbors n and n is computed as P n r n r = P n r − P n r .
In a cooperative path planning problem, the aim is to find optimal paths for all UAVs from their starting points to target locations, fulfilling all requirements imposed by constraints on formation shape, path length, threat avoidance, and turning angle limit. We approach this problem with a cost function for each UAV in the team by incorporating those constraints to formulate the cooperative path planning into an optimization problem. The cost function associated with UAV n thus has the form: where X n represents the path of UAV n , J i (X n ) is the cost corresponding to constraint i, ω ni is a weighting factor, and η is the number of constraints. The path X n is defined by a set of K nodes, represented by waypoints P n (k) = (x n (k), y n (k), z n (k)) T , k = 1, 2, . . . , K , that connect the flight path of UAV n . The cost function J i (X n ), i = 1, 2 . . . , η, for each constraint is determined in the following.

A. FORMATION CONSTRAINT
The formation constraints are determined from the desired structure of the geometric shape and interactions among UAVs. Using the graph theory, we define UAV n as a vertex v n and its interconnection with UAV n as an edge m = (v n , v n ) of a directed graph G. Let V = {v 1 , v 2 , . . . , v S } be the set of vertices and E = { 1 , 2 , . . . , M } be the set of edges, the graph G can be represented as G = (V, E). To establish the formation, the graph must be connected, i.e., for any two vertices (v n , v n ) ∈ V, there exists an interconnection between them, or an edge in E. This interconnection is weighed by µ nn as our graph is an edge-weighted graph. The incidence matrix D of the graph has the dimension of S ×M , where its element is equal to 1 if the UAV is the head of an edge, −1 if the UAV is the tail of an edge, and 0 otherwise. The formation error for edge (v n , v n ) is computed from P n − P n − P n r n r . The total formation error can be expressed via the incidence matrix as Define L = DW D T as the Laplacian of the graph G, which is symmetric and positive semi-definite,L =DŴD T = L ⊗ I 3 is also symmetric and positive semi-definite [22]. The formation error can be rewritten as E = ||P − P r || 2L . The cost function coming from the formation constraint for UAV n is then defined as ( Apart from maintaining the formation, it is also important to avoid intervehicle collisions among the UAVs. Letd n (k) be the Euclidean distance from UAV n to its nearest neighbor at waypoint k, i.e., r n and d s be respectively the radius of UAV n and its safety clearance. To avoid collisions, the distance between a UAV to its nearest neighbor should then be greater than the sum of the clearance d s and double of the UAV radius r n . Accordingly, the cost function (3) is revised as, where the infinity assigned to a cost function implies an infeasible solution so that node k, at risk of collision, should not be considered by an optimizer.

B. PATH LENGTH COST
When planning a path, it is required that its length is minimized to save time and energy, especially for a low-cost UAV. In autonomous operations, a path typically includes a list of waypoints uploaded to the UAV as references for the flight controller to track [23]. With K waypoints, it can be represented by a set of K − 1 line segments connecting the waypoints as shown in Fig.1. The path length is then simply the sum of those segments. Denoting P n (k) as waypoint k of path n, the cost representing the length of path n is then computed as: C. OBSTACLE AVOIDANCE During operation, each UAV needs to avoid collision with not only other UAVs but also obstacles in its working environment, such as trees or buildings. In this work, each obstacle is considered as a threat modeled by a cylinder with radius r τ . Let p k of coordinates (x k , y k ) be the projection of waypoint P(k) on the Oxy-plane, the distance d(k) from the center of the obstacle C(x c , y c ) to the path segment k is determined as the shortest distance from C to p k p k+1 . If p k+1 − p k = 0, d(k) is calculated as the normal distance from point C to the line connecting two points p k and p k+1 as shown in Figure 3, or the smaller distance to those points in the case they are located at one side of the obstacle. Thus, d(k) is computed as where The threat cost J 3 (X n ) is then calculated across waypoints P n (k) for T obstacles as below: D. ALTITUDE CONSTRAINT In many applications, such as surface inspection, it is essential that the UAVs fly within a certain altitude range limited by the minimum and maximum heights, h min and h max , respectively. Let h n (k) be the relative height of the UAV with respect to the ground. The desired flying height for it is thenh = 0.5(h min + h max ). Thus, the altitude cost can be computed as

E. SMOOTHNESS
Due to motion restraints of UAV dynamics, they are unable to make a sharp turn, and thus it is required for the algorithm to generate smooth paths. This can be achieved by limiting changes in the turning and climbing angles. The turning angle between two consecutive segments, θ n (k), is computed as where −−→ p n (k) = Proj /Oxy {P n (k + 1) − P n (k)} is the projection of segment (P n (k + 1) − P n (k)) on the plane Oxy.
The climbing angle, ϕ n (k) between the path segment (P n (k + 1) − P n (k)) and −−→ p n (k) is calculated as: The smoothness cost can then be defined as where β 1 and β 2 are the penalty coefficients of the turning and climbing angles, respectively.

III. GPSO DEVELOPMENT
Given the cost function J (X n ) defined for each UAV, the cooperative path planning becomes finding paths X n , VOLUME 10, 2022 n = 1, . . . , S, to simultaneously minimize J (X n ). Since the value of J (X n ) depends on the path X n generated for UAV n itself as well as other paths of the remaining UAVs in the team, finding optimal solutions remains a challenging problem. The game theory concept of resolving conflicts and handling interactions has been well-recognized [24]. Aligning well with cooperative path planning in robotics, it can be promising for a game-theoretic framework to solve the formation control problem with the development of a suitable optimization tool. To deal with the problem of cooperative path planning for UAVs, we propose here a game theorybased particle swarm optimization (GPSO) approach consisting of two steps. In the first step, a Stackelberg-Nash game is formulated from the cooperative path planning problem. A hierarchical PSO-based algorithm is then developed to solve for the Stackelberg-Nash equilibrium to obtain the optimal paths.

A. STACKELBERG-NASH GAME FOR COOPERATIVE PATH PLANNING
We first introduce some definitions. Definition 1: Each UAV in the group is defined as a decision-maker or player. Hence, the leading and following UAVs are considered respectively as the leading and following decision-makers.
Definition 2: A strategy of player UAV n is its path X n . Definition 3: The payoff for player UAV n is its cost function J (X n ).

1) STACKELBERG GAME
The Stackelberg game is a model aiming to resolve the asymmetric competition among a leading decision-maker and some following decision-makers. In this game, the leader conducts his movement first. The followers then decide their strategies to respond to the leader's decision [25]. Here, the Stackelberg game can be used to model interactions among the UAVs. Now, the Stackelberg game for UAVs can be described as F) is a set of players with leading player L and following players F defined as a subset F = (F 1 , F 2 , . . . , F N ). The pair (S l , S f ) stands for the strategy sets of the leading player, S l , and followers, S f . They are defined respectively as S l = (X l 1 , X l 2 , . . . , X l ), where X l σ is the decision strategy made by the leader (σ = 1, 2, . . . , ), and S f = (S f 1 , S f 2 , . . . , S f N ), where S f n = (X n 1 , X n 2 , . . . , X n ) represents all decision strategies made by the n-th follower. The set (J l , J f ) is the players' payoffs.
Let * X l and * . . , * X f N ) be respectively the best strategy of the leader and of the followers, the Stackelberg equilibrium is defined as The relation X f (X l ) represents the strategy X f of the following players as a function of the leader's strategy X l . From (14a) and (14b), * 2) NASH GAME Apart from interactions between the leader and followers, those among the followers should also be taken into account. We use the Nash game for their modeling, making use of the symmetry in the roles of the players [26]. In a Nash equilibrium, each player is assumed to know the best strategies of other rivals, and no player can gain more payoff by changing only their own plan. The Nash equilibrium thus provides an approach to obtain optimal results for all symmetric players, which represent the following UAVs. The Nash game can be expressed as G N = F, S f , J f . The Nash equilibrium is defined as * is the optimal strategy set of X n 's rivals. The Nash equilibrium is obtained as * 3) STACKELBERG-NASH GAME By combining the two above models, the cooperative path planning problem for multiple UAVs can be represented by a Stackelberg-Nash game as illustrated in Figure 4. The game is expressed as The Stackelberg-Nash equilibrium, * To find * S, a hierarchical PSO-based algorithm is developed as described in the following.

B. HIERARCHICAL PSO FOR STACKELBERG-NASH GAME
From the above, the UAV cooperative path planning is reduced to finding the strategy * S of a Stackelberg-Nash game that fulfills all requirements of conditions (19a) and (19b) to bring the game to its equilibrium. However, simultaneously solving inequalities (19a) and (19b) is challenging and even impractical for analytical methods as involving nondifferentiable cost functions J (X n ) and dependent variables X f (X l ). Instead, heuristic optimization techniques based on swarm intelligence are more viable and computationally efficient for this problem. Among heuristic optimization techniques, PSO has been widely used for path planning problems as being effective in the optimal search. This is owing to the balance between exploration and exploitation of swarm particles [23], [27], [28]. The main advantages of PSO include its robustness, fast convergence, and computational efficiency, making it thus suitable for optimizing interactions among players in a game theory framework. In this paper, a hierarchical optimization algorithm based on PSO is developed to find the best strategies for the Stackelberg-Nash equilibrium. Those strategies represent the optimal paths of the UAVs.

1) PSO AND LEADER-FOLLOWER PATH PLANNING
Particle swarm optimization (PSO) is a population-based method that exploits a population of individuals to probe promising regions of the search space for optimal solutions. In this context, the population is called a swarm, and the individuals are called particles. Each particle moves with an adjustable velocity to find its best position. Besides, the best position of the swarm is shared among all individuals to navigate their direction.
Consider a D-dimensional search space, S ∈ R D , and a swarm consisting of N p particles. The position and velocity of particle i are D-dimensional vectors denoted as X i = (x i1 , x i2 , . . . , x iD ) T and V i = (v i1 , v i2 , . . . , v iD ) T , respectively. The best position encountered by particle i is a point in S denoted as Q i = (q i1 , q i2 , . . . , q iD ) T . Let g be the particle that has the best position among all individuals in the swarm at time t. The evolvement of the swarm is updated by the following equations: where c 0 is an inertia factor, c 1 and c 2 are, respectively, the cognitive and social coefficients, and r 1 and r 2 are random samples uniformly distributed in the interval [0, 1]. For path planning with PSO, we encode the position of a particle by flight path X n . The whole swarm thus includes N p candidate path solutions and their evolvement results in the optimal solution. To better explore the search space, we use the recently-developed spherical vector-based particle swarm optimization (SPSO) algorithm [23]. The SPSO uses spherical coordinates to describe the nodes of a flight path so that it can exploit the correspondence between spherical variables and maneuverable parameters of the UAV to speed up the search process. Here, we propose a game theory-based hierarchical approach to extend the SPSO for cooperative path planning involving multiple UAVs during the search for the Stackelberg-Nash equilibrium.
Since there are conflicts and interactions among the leader and followers' strategies within these two objectives expressed in (19a) and (19b), the proposed GPSO algorithm to obtain the overall Stackelberg-Nash equilibrium also covers two loops in general. The pseudo-code for the whole hierarchy of leader-follower optimization is described in Algorithm 1. Therein, to incorporate the interactions within the hierarchy, from line 1 to line 6 of the inner loop, optimal strategies for the followers are first obtained by solving the Nash equilibrium, wherein the leader's strategy X l remains unchanged. The optimal strategies of the followers * X f (X l ) obtained are then used to find the best strategy for the leader at the outer loop overall from line 1 to line 9. Further explanations of both the inner and outer loop are presented respectively in Algorithms 2 and 3 in the following. * X l ; 10. Substituting X l = * X l into * X f (X l ); 11. Obtain * X f ( * X l );

2) HIERARCHICAL GAME-THEORETIC PSO IMPLEMENTATION
As per the optimization hierarchy designed for the Stackelberg-Nash game equilibrium, PSO algorithms are developed to implement the framework and obtain optimal solutions. The implementation procedure consists of four steps: Nash game initialization, inner-loop optimization, Stackelberg game initialization, and outer-loop optimization.

a: NASH GAME INITIALIZATION
Initially, the parameters of the PSO and random strategies of the followers are generated at a given strategy X l of the leader. The followers' payoffs are then optimized to obtain their best VOLUME 10, 2022 strategies, * X f (X l ), with respect to the leader strategy. At this stage, the PSO algorithm is implemented in the inner loop, corresponding to the Nash game for followers. In this inner loop, parameters of the PSO such as weights c 0f , c 1f , c 2f , number of iterations maxIt f , and number of particles nPop f for the followers are first selected for initialization.

b: INNER LOOP OPTIMIZATION FOR FOLLOWER STRATEGIES
After having the leader's strategy X l and initializing random strategies for the followers, the cost of follower n at each iteration can be described as is the best strategies of the other followers previously. By minimizing this cost, the optimal strategy * X f n (It f ) of player n can be achieved correspondingly. Based on * X f n (It f ) recorded after each iteration, the strategy of the remaining followers is then adjusted according to swarm dynamics (20) and (21). Eventually, the inner loop optimization process will yield the best strategies * X f (X l ) of all followers for strategy X l of the leader. At the termination of the inner loop, the Nash equilibrium is obtained for all the followers since none of the players can gain benefit just by altering its own strategy. The pseudo-code for the optimization implementation to achieve the Nash equilibrium is presented in Algorithm 2.
The followers' optimal strategies * X f (X l ) resulting from the Nash game are then fed to the outer loop for Stackelberg game optimization.

Algorithm 2 Inner-Loop PSO Implementation for Followers Path
Require: Leader's strategy X l ; 1. Initialize PSO parameters: c 0f , c 1f , c 2f , maxIt f , nPop f ; 2. Set It f = 0, generate random follower's strategies; 3. Obtain the initial optimal follower's strategies * The GPSO algorithm is initiated with a search map and path planning information depending on the task requirements and operating environment. The swarm for the outer loop of the optimization process are initialized with parameters including weights c 0l , c 1l , c 2l , number of iterations maxIt l , and population nPop l .

d: OUTER LOOP OPTIMIZATION FOR LEADER STRATEGY
From the optimal strategies of the followers * X f (X l ), the payoff J l (X l , * X f (X l )) is obtained to evaluate the leader's profit.
It is then used to adjust the leader strategy according to equations (20) and (21) of the PSO. This strategy is then fed back to Algorithm 2 to start a new cycle in the inner loop for optimization. The optimization process terminates when the maximum number of iterations is exceeded or no profits are gained. Finally, the best strategies * X f ( * X l ) of the followers with respect to the best leader strategies are obtained as an overall result of the game theory-based optimization.
The pseudo-code for the optimization process to achieve the Stackelberg-Nash equilibrium is presented in Algorithm 3.

Algorithm 3 Outer-Loop PSO Implementation for Leader Path
Require: Search map and initial path planning information; 1. Initialize PSO parameters: c 0l , c 1l , c 2l , maxIt l , nPop l ; 2. Set It l = 0, generate random leader's strategies, X l ; for It l = 0 : maxIt l do 3. Run Algorithm 2 to obtain * X f (X l ); 4. Calculate the the leader's profit, J l (X l , * X f (X l )); 5. Record * X l (It l ); 6. Update X l (It l ); end for 7. Obtain ( * X l , * X f ).
The converged outcomes eventually present the global Nash-Stackelberg equilibrium because the leader's strategy is optimized in the outer loop, which covers optimal strategies of all followers in the inner loop given a path of the leader. However, some local optima may be obtained if the results neither converge to a single point nor have oscillatory behavior. In that case, the Nash-Stackelberg equilibrium does not exist. Since the leader's and followers' strategies are generated randomly at the first iteration, Algorithm 1 should be run several times to obtain histograms for the converged cost values, for example, subject to a given threshold.

C. SPEED PROFILES FOR UAV COOPERATION
From Algorithms 1-3, safe and optimal paths can be generated for all vehicles of the formation, defined by a set of K nodes as discussed above. Since distances between two arbitrary nodes and the total path lengths of the UAVs could be different, they would not reach waypoints and targets at the same time by using a constant speed to establish the formation. Therefore, after proceeding with the proposed GPSO hierarchy to obtain waypoints, it is required to apply suitable speed profiles for the UAVs to reach the waypoint and target positions. To achieve this goal, velocity is used as an independent variable along with the generated paths. The following algorithm is therefore developed to compute the required speed profile for each UAV path.
Let T k , k = 1, 2, . . . , K +1, be the time required for a UAV to travel over the flight segment k. For timely cooperation and synchronization, T k should be the same for all UAVs. This leads to the calculation of speed profiles as presented in the pseudo-code of Algorithm 4. Given a reference speed, ν ref , the reference time T k required to complete the longest path segment, l max k , among all the k-th segments is first determined. The flight speed, ν n (k), of UAV n for path segment k is then calculated proportionally to the segment length, i.e. ν n (k) = l n (k)/T k , where l n (k) is the length of path segment k for UAV n .

IV. SIMULATION RESULTS
This section presents the simulation results of the proposed algorithm. Due to similarities in the multi-layer hierarchical structure of UAVs, we conduct simulations for a general layer to evaluate the path planning performance. Generally, a layer consists of one leader V m and several followers V (m+1)n , n = 1, 2, . . . , N . In the simulation, we consider two commonly used formation shapes, including the equilateral triangle and diamond formations [29], [30]. However, it should be noted that there are no restrictions on the choice of the formation shape.

A. EVALUATION WITH EQUILATERAL TRIANGLE FORMATION
In this scenario, the objective is to generate paths for three UAVs flying in an equilateral triangle formation. The incidence matrix D 1 is thus defined as The  20 20]. The formation reference was given according to the target position, i.e. P m r = P m (end), P (m+1)1 r = P (m+1)1 (end), and P (m+1)2 r = P (m+1)2 (end). The parameters of the hierarchical PSO were set by the trial-and-error as c 0 = 0.98 and c 1 = c 2 = 1.5. Both the outer and inner loops of the PSO run with 150 particles and 100 iterations. In the total cost function of the leader, weight factors were chosen as [ω l1 , ω l2 , ω l3 , ω l4 , ω l5 ] = [0.01, 10, 100, 1, 1] in order to obtain a short distance avoidance-free path. Meanwhile, weight factors in the total cost function of the followers were chosen as [ω f 1 , ω f 2 , ω f 3 , ω f 4 , ω f 5 ] = [10, 0.01, 100, 1, 1] in order to obtain a formation-maintaining avoidance-free path. The number of waypoints for each path was set as K = 10, excluding the start and target positions.
To show the merit of the proposed framework using the Stackelberg-Nash game for UAV cooperative path planning, we have compared it with a distributed algorithm based on the Stag Hunt game approach [31], [32]. The paths of the three UAVs are shown in Figure 5. As can be seen in Figure 5a, the triangular formation with the Stackelberg-Nash game is maintained throughout the flight while both intervehicle collisions and obstacle collisions can be avoided. The Stackelberg-Nash equilibrium is reached after 50 iterations for the leader and 55 iterations for the followers at the convergence of cost values as depicted in Figure 6a. Compared to the Stag Hunt game, our algorithm converged faster and with smoother paths. The stable convergence can be confirmed in Figure 7 showing histograms of the final cost values in 35 trials.

B. EVALUATION WITH DIAMOND FORMATION
To further evaluate the performance of the proposed cooperative path planning algorithm, we compared it with stateof-the-art methods that considered the group of UAVs as a rigid body, and path planning was obtained for a virtual vehicle located at the centroid of the group [33], [34]. In this comparison, we considered a task in which four UAVs were required to move in a diamond formation. The incidence matrix D 2 was determined as Similarly to the triangular formation case, the weights for interconnections among UAVs were chosen as The map used was an area of Christmas Island in Australia obtained from a real digital elevation model (DEM) map [35], as depicted in Figure 8. It has the size of VOLUME 10, 2022     The GPSO parameters are remained as presented in the previous simulation for an equilateral triangle formation.
Comparative results for Scenario 1 with four obstacles can be obtained as shown in Figure 8. It can be seen that both methods, the proposed GPSO and rigid body planning, are able to generate collision-free paths with the formation being well-maintained. However, the game theory-based algorithm is capable of shrinking and enlarging the formation shape depending on the presence of obstacles and thus results in a shorter path, 4391.0 m, compared to the rigid body method, 4397.8 m. This advantage is more prevalent in a more complex situation, Scenario 2, as shown in Figure 9. In this scenario, the UAV formation generated by the rigid body method cannot flight through narrow space between the obstacles so it has to travel a long distance, 5228.0 m, to avoid them (see Figure 9b). The game theory-based method, on the other hand, can change the formation size to generate an optimal path with the length of 4389.1 m.
To evaluate the capability of split and merge of UAV n at waypoint k, a formation flexibility index (FFI), ξ n (k), is introduced. It is computed as a ratio of the formation displacement and its corresponding formation reference. For UAV n , it is defined as For the rigid body formation technique, the FFI is zero, i.e., the team of UAVs is inflexible. Meanwhile, a small FFI of less than 0.1 can be obtained for the UAV diamond in Scenario 1. In Scenario 2, the second follower relaxes its formation constraint between the 5-th and 8-th waypoints to avoid obstacles, resulting in the higher FFI of 0.28. This value presents the capability to split and merge the UAV team and further demonstrates the advantage of the proposed approach.
In terms of computational complexity, the proposed method requires running the inner-loop PSO for each leader's strategy to achieve the Stackelberg-Nash equilibrium, resulting in a higher computational cost, particularly for more players and/or with an increased number of constraints. However, since the path planning algorithm is carried out offline, this cost is worth for better overall optimal results.

V. EXPERIMENTAL VALIDATION
In this section, we describe the testbed and the experiments conducted to verify the feasibility and effectiveness of the proposed GPSO algorithm.

A. EXPERIMENTAL SETUP
For experiments, our setup includes three 3DR Solo drones with remote controllers, a ground control station, and communication hardware, as shown in Figure 10. Each drone is equipped with one ARM Cortex A9 processor for running the Arducopter flight operating system and two Cortex M4 168 MHz processors for low-level control. For data acquisition, cameras, laser scanners, or environmental sensors can be attached to the onboard computer of the UAV depending on applications. Communication between the ground control station and the drone is carried out via a private network called 3DR Link Secure WiFi created by modules integrated into the remote controller of the drone [34]. The ground control station used the software named QGround Control to upload the planned path to the drones, fly them autonomously, and download logged data for analysis. In the experiments, the default PID controller was implemented in the 3DR Solo.
The UAV equilateral triangle formation was tested on a park set up with four obstacles as shown in Figure 11. The   20 20]. The corresponding target and reference positions were chosen as P m r = P m (end) = (85; 88.66; 20), P (m+1)1 r = P (m+1)1 (end) = (80; 80; 20), and P (m+1)2 r = P (m+1)1 (end) = (90; 80; 20). The coordinates of waypoints are first converted into their longitude, latitude, and altitude. The speed profiles of the UAVs are then computed according to Algorithm 4, with V ref = 1(m/s). After that, the waypoints and corresponding speeds are uploaded to the drones using the QGround Control software for their autonomous execution. Figure 12 shows the drones forming a triangular shape when performing a bridge inspection task. The objectives of path planning and formation maintenance of the three drones are achieved with our proposed GPSO, as shown in the experimental results of Figure 11, in which the solid lines are the planned paths and the dotted lines are the actual paths of the UAV triangle.

B. VERIFICATION RESULTS
To evaluate tracking performance, Figure 13 presents the position errors and speed profiles of the UAVs. It can be seen that all the three UAVs can track their planned path with small tracking errors of 0.4m to 0.6m maximally. In fact, those tracking errors are mainly caused by the GPS positioning  errors rather than by the tracking controller of the drones themselves. The figure also depicts that all measured speeds accurately track their corresponding desired values generated from the GPSO path planning calculations. This, again, confirms that all UAVs reach the waypoints while maintaining well the formation configuration.

VI. CONCLUSION
In this paper, we have presented a novel method based on the game theory and particle swarm optimization algorithms for the cooperative path planning problem of multiple UAVs navigating in a desired geometric configuration. The UAV collaborative path planning problem is solved by finding the equilibrium of a Stackelberg-Nash game. A hierarchical optimization framework using PSO was integrated to find the game equilibrium by minimizing a global cost function while simultaneously taking into account constraints for the desired shape as well as path length, collision avoidance, altitude, and smoothness. The proposed GPSO cooperative path planning approach can generate a safe path for each UAV in the team with a flexible formation size. Extensive simulation and comparison results are provided for evaluating its performance. Field experiments have also been conducted to demonstrate the feasibility and effectiveness of the proposed method in various scenarios. Our future work will focus on improving the algorithm for online cooperative path planning of UAVs in the presence of dynamic obstacles.