A Task-Driven Sequential Overlapping Coalition Formation Game for Resource Allocation in Heterogeneous UAV Networks

A heterogeneous unmanned aerial vehicle (UAV) network, in which UAVs carrying different resources form coalitions and cooperatively carry out tasks, is crucial for fulfilling diverse tasks. However, existing coalition formation (CF) game models only optimize the composition of UAVs within a single coalition, which results in disjoint coalitions. To tackle this issue, a sequential overlapping coalition formation (OCF) game is proposed that considers the overlapping and complementary relations of resource properties as well as the task execution order. Moreover, in contrast to the Pareto and selfish orders, a bilateral mutual benefit transfer (BMBT) order is proposed to optimize cooperative task resource allocation through partial cooperation among overlapping coalition members. Furthermore, using the preference relation between UAVs carrying resources and tasks requiring the same type of resource, a preference gravity-guided tabu search (PGG-TS) algorithm is developed to obtain a stable coalition structure. Numerical results verify that the proposed PGG-TS algorithm increases the average utility of tasks by 12.5% and 38.5% compared with the split-merge preferred OCF algorithm and the non-overlapping CF algorithm, respectively. The utility of the proposed BMBT order increases by 25.1% and 34.3% compared with the selfish and Pareto orders, respectively.


INTRODUCTION
With the rapid development of aerospace and wireless communication systems, unmanned aerial vehicles (UAVs) have been developed for various large-scale and complex tasks without terrain restrictions, such as battlefield operations, search and rescue under disaster circumstances, reconnaissance and monitoring, emergency charging, and the like [1], [2], [3], [4], owing to their versatility, cost efficiency, intrinsic maneuverability, and flexible deployment [5], [6]. Although task execution by UAVs brings more opportunities, challenges such as task assignment and resource allocation must be overcome in a multi-UAV network. Most existing works adopted one-to-one task assignment to optimize the task performance of UAVs [7], [8]. However, a single UAV may not have sufficient resources to accomplish tasks alone. Hence, a group of UAVs (a coalition) needs to cooperate and pool resources to satisfy the resource requirements of tasks [5], [9], [10]. Several important studies have investigated scenarios where each UAV coalition selects a task and performs it under a non-overlapping coalition formation (CF) game, where the goal of coalition formation is to complete the assigned tasks as soon as possible under resource constraints [11], [12], [13].
Although UAVs forming coalitions to perform tasks cooperatively is promising, challenges remain, such as more detailed task resource allocation, the task execution schedule, and the cost incurred by UAV coalitions in carrying out tasks. Recent advances in UAV technology allow a single UAV to carry a variety of equipment, and a UAV may have to use various types and different quantities of resources to perform a task [7]. For instance, UAVs can perform ground reconnaissance and strike operation tasks on the battlefield [2]. In civilian applications, UAVs can be used to assist mobile edge computing or to provide coverage to ground users by exploiting multiple UAVs as wireless base stations [5]. Therefore, the attributes of the resources required for different types of tasks need to be differentiated in more detail, and the resource allocation should then be determined by the capabilities of UAVs to increase the effectiveness of task execution. In this case, the UAVs can allocate their resources to multiple tasks and then perform multiple tasks in overlapping coalitions based on the task schedule, which differs from the situation in which each UAV coalition performs one task in a non-overlapping CF game. Hence, this paper investigates the problem of cooperative task execution in a heterogeneous multi-UAV network by forming an overlapping coalition structure for resource allocation.

Related Works and Motivations
Recently, low-cost and highly flexible UAVs have been widely used to perform complex tasks. In [6], the authors proposed a UAV-enabled mobile edge computing system, in which the UAVs collect data from sensors for forest fire monitoring. To solve the problem of the limited communication quality of ground relays, the authors in [3] proposed a UAV relay-assisted communication network based on a matching game to accomplish data collection tasks. To ensure the efficiency of task execution, the CF game has been used to assign tasks appropriately in multi-UAV networks [12], [13], [14], [15]. In [12], the authors proposed a two-stage optimal CF algorithm that allocates an appropriate number of UAVs to perform tasks. To improve the cooperative reconnaissance and attack performance of heterogeneous multi-UAV networks with communication constraints, the authors in [15] investigated a novel coalition formation method to find the appropriate coalition structure.
These works mainly focused on UAV coalition formation according to task requirements. In fact, most tasks are not simply executed one-to-one by UAV coalitions; instead, task execution follows schedules that depend on task attributes. As shown in Fig. 1, four types of task execution schedules based on a CF game in a multi-UAV network are investigated. In Fig. 1a, each UAV decides which task coalition to join. Then, the UAVs form multiple non-overlapping coalitions to perform the tasks separately. Based on this task execution schedule under the CF game, the authors in [16] investigated the joint auction and coalition formation problem in a UAV-relay Internet of Vehicles (IoV) scenario. In [14], the authors proposed a CF game model based on a reputation-based schedule to ensure reliable cooperation of UAVs when some UAVs in the network are self-interested. Considering the communication demands of members within a coalition, the authors in [13] investigated the task assignment problem in a multi-UAV network by jointly optimizing the coalition selection and spectrum allocation. These works all assumed that one UAV can only perform one task [9], [13], [16]. However, it is reasonable that a UAV can perform other tasks after completing one. In Fig. 1b, the UAVs and tasks are both regarded as players and can decide how to form a coalition. Then, the UAVs form multiple non-overlapping coalitions and sequentially perform tasks. In [17], the authors investigated agents that collect data from located tasks in wireless networks, where agents and tasks were both regarded as players in a CF game and could decide whether to join or leave a coalition. Hence, the agents were required to perform multiple tasks in the same coalition. Subsequently, the authors proposed a hedonic CF game model in UAV-aided wireless networks in their extended version [18]. In Fig. 1c, UAVs form multiple non-overlapping coalitions. Then, one coalition executes multiple tasks at the same time.
In [19], the authors studied the problem of task cooperation in multi-UAV networks, where the UAVs are equipped with various devices and can perform multiple tasks such as data collection and picture capture. The authors in [20] quantified the sub-task types of resources required, and the tradeoff between task completion benefit and energy loss was taken into consideration.
Due to geographical location and task property factors, a UAV may not be able to perform multiple tasks at the same time in some scenarios [22]. Considering the impact of task execution time on task performance, UAVs need to prioritize the task with the larger value to improve task performance. Therefore, a task execution sequence is set to determine the order of task execution. In [22], the authors investigated an energy-efficient trajectory optimization algorithm to solve the cooperative task resource allocation problem under the sequential task execution schedule. However, the processes by which UAVs perform tasks were assumed to be uncorrelated in the above works [13], [16], [17], [18]. On the one hand, carrying out one task requires multiple UAVs to provide different resources that complement each other. On the other hand, multiple UAVs can provide the same resources to improve the performance of the corresponding tasks. Hence, the overlapping and complementary relations of resource properties between UAVs and tasks are investigated. In Fig. 1d, UAVs form multiple overlapping coalitions by allocating a part of their resources to multiple tasks. Then, UAVs join different coalitions to execute multiple tasks sequentially. For example, after completing the task of coalition 1, UAV 1 performs another task as a member of coalition 3. In this case, the non-overlapping CF game model is no longer applicable to such a sequential task execution schedule. In [23], the authors investigated the problems of dynamic task assignment and resource allocation in a multi-UAV network based on the task sequence schedule. The authors in [24] mapped the task assignment problem to a two-sided matching problem and proposed a genetic algorithm to optimize the trajectory of UAVs. However, existing works only addressed a single UAV that performs multiple tasks based on a sequential task execution schedule.
Hence, the formation of overlapping UAV coalitions to complete multiple tasks based on a sequential task execution schedule needs to be further investigated and, to the best of our knowledge, is still an open problem. To clearly indicate the difference between the sequential overlapping coalition and the main related schemes, a comparison in terms of multiple performance metrics is provided in Table 1. If the task execution schedule satisfies the corresponding attribute, it is marked with "✓"; otherwise, with "✗".
Note that the preference order is an important notion in the OCF game that is used to compare two coalition structures, and the Pareto order is a common preference order that guarantees that no member's benefit is damaged [25], [26]. To avoid falling into a local optimum under the strong constraints of the Pareto order, the selfish order, which focuses only on individual utility, is applied in [17], [27]. In general, the Pareto order is so restrictive that it results in limited utility, whereas the selfish order only pays attention to individual utility, which results in a local optimum.
In addition, the studies on task execution by UAVs mainly focused on task completion while ignoring the task completion time and the energy consumption of UAVs [13], [14], [16]. With limited energy available for flight, UAVs are expected to complete tasks timely and efficiently with as little cost as possible. Hence, completion time and energy consumption are also important factors in estimating the performance of task allocation results. Therefore, the benefits and costs of task execution should be comprehensively considered for the performance of task completion.

Contributions and Organization
Motivated by the above observations, the main contribution of this paper is to develop a novel sequential OCF game based on the sequentiality of task execution in heterogeneous and cooperative multi-UAV networks. Compared with existing CF games that assume non-overlapping coalitions, the proposed sequential OCF game enables UAVs to make more flexible resource allocation decisions so as to improve task execution performance. To the best of our knowledge, this is the first work that studies overlapping coalition formation in a task-driven heterogeneous and cooperative UAV network. To balance the benefits and costs of task execution, this paper combines the task completion degree, the task waiting time, and the flight energy consumption of UAVs to determine the utility of task execution. Note that the OCF game model defines the coalition structure as the resource allocation of each UAV, leading to a large decision-making space and unguaranteed performance of frequently used algorithms such as the best response algorithm. Hence, taking into consideration the preference relation between UAVs carrying resources and tasks requiring the same type of resource, a preference gravity-guided tabu search (PGG-TS) algorithm for overlapping coalition formation is proposed to improve the utility of task execution. Specifically, the main contributions of this paper are as follows:
1) A sequential OCF game and resource allocation approach is proposed, where the UAVs that allocate resources to the same task form a coalition to perform the task cooperatively. Note that a UAV can allocate its resources to multiple task coalitions, which contributes to the formation of an OCF structure. Hence, compared with the non-overlapping coalition model, which only decides whether UAVs leave or join tasks, this model is more flexible in scheduling task resources.
2) A bilateral mutual benefit transfer (BMBT) order is proposed. Unlike common orders, among which the Pareto order is too restrictive and the selfish order only focuses on individual utility, the proposed BMBT order cares more about the sum utility of related coalitions. Cooperation with other UAVs in the same coalition is encouraged. It is also proved that the proposed OCF game model under the BMBT order is an exact potential game (EPG) that has at least one pure Nash equilibrium (NE), known as the stable coalition structure.
3) A relatively low-complexity preference gravity-guided tabu search (PGG-TS) algorithm is proposed for distributed overlapping coalition formation. Under the proposed PGG-TS algorithm, UAVs can make exchange operations cooperatively, with which a close-to-optimal solution can be achieved.
4) The results of this paper indicate that the proposed OCF scheme under the PGG-TS algorithm outperforms other OCF and CF schemes. The impacts of parameter settings in task-driven heterogeneous UAV networks on network performance and convergence are demonstrated through comparisons with other algorithms. Numerical results show that the proposed PGG-TS algorithm increases the average utility of tasks by 12.5% and 38.5% compared with the split-merge preferred OCF algorithm and the non-overlapping CF algorithm, respectively. The utility of the proposed BMBT order increases by 25.1% and 34.3% compared with the selfish and Pareto orders, respectively.
[Table 1: comparison of task execution schedules. Recoverable entries: the schedule of Fig. 1a is used in [14], [16]; multiple overlapping coalitions perform multiple tasks sequentially in Fig. 1d.]
The remainder of this paper is organized as follows. In Section 2, the system model and problem statement are described. The proposed problem is formulated as an OCF game in Section 3. In Section 4, an overlapping coalition formation method based on the preference gravity-guided tabu search algorithm is designed. Numerical results and performance analyses are presented in Section 5. Finally, Section 6 concludes this paper.
Notations: The notations used in this paper are listed in Table 2.

SYSTEM MODEL AND PROBLEM FORMULATION
A network consisting of N UAVs is considered. The UAVs are required to perform M tasks that are randomly distributed in the network. The sets of UAVs and tasks are denoted as $\mathcal{N} = \{1, \ldots, n, \ldots, N\}$ and $\mathcal{M} = \{1, \ldots, m, \ldots, M\}$, respectively. Different tasks have multiple heterogeneous sub-tasks with various resource requirements. As depicted in Fig. 2, the UAVs form coalitions by allocating a part of their resources and perform the tasks cooperatively.

1) Types of sub-tasks
Assume there are Z types of sub-tasks. The set of sub-task types is $\mathcal{T} = \{T_1, \ldots, T_{z_b}, \ldots, T_{z_c}, \ldots, T_Z\}$, where $z_b$ and $z_c$ represent the sub-task types that require consumable and non-consumable resources, respectively (specific definitions are provided in the next paragraph). The sub-task types are divided into two categories: those that require consumable resources, e.g., fire rescue (task 1 in Fig. 2) and wireless charging (task 4 in Fig. 2), and those that require non-consumable resources, e.g., information collection (task 2 in Fig. 2) and video surveillance (task 3 in Fig. 2).
2) Types of resources
The required amount of resources by task m is denoted by [equation], where $l_m^{(z_b)}$ and $I_m^{(z_c)}$ represent the type and amount of consumable and non-consumable resources required to perform task m, respectively. It is assumed that the UAVs are heterogeneous in terms of the types and available amounts of carried resources. We roughly categorize the resources into two types, i.e., consumable and non-consumable. Consumable resources are reduced when they are used to complete tasks, while non-consumable resources are not. Let [equation], where $d_{n,m}$ is the distance between UAV n and task m; $p_{z_c}$ is the transmit power of the $z_c$-th communication device; a is the path loss factor; and $N_0$ is the power spectral density of noise.

Table 2 (recovered entries). Notations
$\mathcal{T} = \{T_1, \ldots, T_{z_b}, \ldots, T_{z_c}, \ldots, T_Z\}$: set of sub-task types
$z_b$: consumable sub-task type
$z_c$: non-consumable sub-task type
(symbol not recovered): amount of sub-task type resources required by task m
$B_n = \{b_n^{1}, \ldots, b_n^{(z_b)}, m_n^{(z_c)}, \ldots, m_n^{Z}\}$: amount of sub-task type resources carried by UAV n

Overlapping Coalition Model of UAVs Cooperatively Performing Tasks
Tasks cannot be performed well by a single UAV due to its limited resources. Therefore, the UAVs form coalitions by allocating a part of their resources to perform the task cooperatively. Note that one UAV may assign resources to different task coalitions, which contributes to the formation of an overlapping coalition structure. Let $A_m = \{A_m^{(1)}, \ldots, A_m^{(n)}, \ldots, A_m^{(N)}\}$ represent the resource allocation vector of the UAVs at task m, where $A_m^{(n)}$ is the amount of resources that UAV n allocates to task m and can be expressed as [Eq. (2)], where $t_{n,m}^{(z_b)}$ is the amount of the $z_b$-th type consumable resource that UAV n allocates to task m, and $\varepsilon_{n,m}^{(z_c)}$ is the amount of the $z_c$-th type non-consumable resource that UAV n allocates to task m. Furthermore, $Mem(A_m)$ represents the coalition member set of UAVs that allocate resources to task m and can be expressed as [Eq. (3)].
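The overlapping structure described above can be illustrated with a small Python sketch. The dictionary layout, the function names `members` and `coalitions_of`, and the numeric values are assumptions made for illustration only; the sketch treats a UAV as a member of the coalition of task m whenever it allocates a positive amount of any resource to that task.

```python
from typing import Dict, List, Set

# allocation[m][n] is the (hypothetical) resource vector that UAV n gives to task m.
Allocation = Dict[int, Dict[int, List[float]]]


def members(allocation: Allocation, task: int) -> Set[int]:
    """UAVs that allocate a positive amount of some resource to `task`."""
    return {
        uav
        for uav, vector in allocation.get(task, {}).items()
        if any(amount > 0 for amount in vector)
    }


def coalitions_of(allocation: Allocation, uav: int) -> Set[int]:
    """Tasks whose coalition contains `uav` (more than one means overlap)."""
    return {task for task in allocation if uav in members(allocation, task)}


# Illustrative values: three tasks, three UAVs, two resource types.
allocation: Allocation = {
    1: {1: [2.0, 0.0], 2: [0.0, 1.0]},
    2: {2: [0.0, 0.0], 3: [1.0, 0.0]},
    3: {1: [1.0, 0.0], 3: [0.0, 1.0]},
}

print(members(allocation, 1))        # coalition of task 1
print(coalitions_of(allocation, 1))  # UAV 1 belongs to two coalitions
```

In this toy allocation UAV 1 contributes to tasks 1 and 3, so the two coalitions overlap, whereas UAV 2 allocates nothing to task 2 and is therefore not a member of that coalition.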

Utility Function of Task Completion
Due to the heterogeneity of tasks, i.e., various resource requirements and task priorities, different tasks have different tolerance levels for the completion quality. Hence, a sigmoid function $U_m(A_m)$ is designed to measure the completion quality of task m [20]; that is, the utility of completing task m is formulated as [Eq. (4)], where $C_{req}$ is the expected completion quality of task m and $b_m$ is the priority of task m: the smaller $b_m$ is, the lower the slope of the sigmoid curve, meaning that the resource requirement of this task is not urgent; otherwise, the requirement is urgent. $C_m(A_m)$ is the actual completion quality of task m. The goal of resource allocation is to complete tasks timely and effectively. Hence, $C_m(A_m)$ contains the following three performance indicators: 1) the completion quality of a task, 2) the waiting time of a task, and 3) the energy consumption of UAVs, and is designed as [equation], where D is a constant that ensures $C_m > 0$; $v_1$, $v_2$, and $v_3$ are weight coefficients that set the proportions of task completion degree, waiting time, and UAV energy consumption in the network utility; $r(A_m)$ is the completion degree of task m; $t_m^{(wait)}$ is the waiting time of task m; and $e_m^{(n)}$ is the flight energy consumption of UAV n for performing task m, which is defined by the proportion of resources allocated by UAV n among all resources consumed at task m and can be expressed as [equation], where $E_n$ is the total propulsion energy consumption of UAV n. The first performance indicator is designed so that the members of the coalition satisfy the resource requirements of the tasks as much as possible; the second and third indicators are designed so that the time and flight energy cost of the formed coalitions are considered. Specifically, these performance indicators are defined as follows:
1) The completion quality of a task: in the process of UAV resource allocation, the completion quality and quantity of sub-task types should be considered.
The task completion degree, which represents the proportion of the actual resource allocation to the resource demand of task m, is investigated in [20]. When the total amount of resources allocated by the UAV coalition exceeds the demand of the task, the task completion degree reaches 100%; otherwise, it is less than 100%. For task m, its average task completion degree $r(A_m)$ is defined as [equation], where each ratio represents the actual amount of consumable (non-consumable) resource allocated to task m divided by the amount of consumable (non-consumable) resource required to perform task m; $\sum_{n \in Mem(A_m)} t_{n,m}^{(z_b)}$ is the total amount of the $z_b$-th sub-task type resource allocated by UAVs to task m; $s_m^{(z_c)} = \sum_{n \in Mem(A_m)} R_{n,m}^{(z_c)}$ is the sum transmission rate supported by the $z_c$-th type resource in executing task m; $I_m^{(z_c)}$ is the transmission rate of the $z_c$-th sub-task type required at task m; and $|\cdot|$ represents the size of a set. For example, suppose task m requires two types of resources, the amounts of which are $\{l_m^{(1)} = 10, I_m^{(2)} = 5\}$. The total amounts of resources invested by the UAV coalition are 8 for the first type and $s_m^{(2)} = 3$ for the second. Then the average completion degree of task m is $(8/10 + 3/5)/2 = 0.7$.
2) The waiting time of a task: as UAVs form coalitions to perform each task cooperatively, each UAV should arrive at the task destination in time. As shown in Fig. 3, the time span consumed by UAVs to perform tasks can be decomposed into the flight duration $t_n^{(fly)}$ and the hover duration $t_n^{(hover)}$. UAVs schedule the task execution sequence according to the priority of tasks [28]. The priority of each task is different and is denoted by $b = \{b_1, \ldots, b_m, \ldots, b_M\}$. For example, when a UAV is allocated to both task 1 and task 3, it should fly to perform task 1 first if the priority of task 1 is higher than that of task 3. According to this rule, the task execution sequence of UAV n is denoted by $Task_n^{(UAV)} = \{task_n^{(0)}, \ldots, task_n^{(i)}, \ldots, task_n^{(z)}\}$, $task_n^{(i)} \in \mathcal{M}$, where z is the length of the task execution sequence for the UAV and $task_n^{(0)}$ is the initial position of UAV n.
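The completion degree and the sigmoid utility discussed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the completion degree is taken as the average of per-type ratios capped at 1, which matches the verbal description and the two-resource example, and the sigmoid is written in the standard logistic form with slope `priority` and midpoint `expected_quality`, which is an assumption because the exact expression of (4) is not reproduced in this text. The function names are hypothetical.

```python
import math
from typing import Sequence


def completion_degree(allocated: Sequence[float], required: Sequence[float]) -> float:
    """Average over resource types of allocated/required, each capped at 100%."""
    ratios = [min(a / r, 1.0) for a, r in zip(allocated, required) if r > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def task_utility(quality: float, expected_quality: float, priority: float) -> float:
    """Assumed logistic form: a larger priority gives a steeper curve."""
    return 1.0 / (1.0 + math.exp(-priority * (quality - expected_quality)))


# Worked example from the text: demands 10 and 5, invested amounts 8 and 3.
degree = completion_degree([8, 3], [10, 5])
print(round(degree, 3))

# Over-provisioning does not push the degree above 100%.
print(completion_degree([20, 5], [10, 5]))
```

With the assumed logistic form, a task whose actual quality equals its expected quality obtains utility 0.5 regardless of priority, and a higher priority makes the utility more sensitive to deviations from the expected quality.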
During the execution of tasks, the location and flight path of each UAV are different, leading to asynchronous arrival times at the task destination if UAVs fly at the same speed. Without loss of generality, the arrival time of each UAV member at the task is adjusted to be the same, which is realized in the following way. The flight time of UAV n from $task_n^{(i-1)}$ to $task_n^{(i)}$ is denoted as [equations], $s_m^{(z_c)} \neq 0$, which means that the time UAVs take to perform non-consumable sub-tasks depends on the largest sub-task execution duration. Therefore, the more complementary and overlapping the UAV resources are, the shorter the task execution takes. Then, the total hover time of UAV n is defined as the sum of the time spent performing its tasks, which is [equation]. After sorting by task priority, the execution sequence before task m is denoted by $Task_m^{(point)} = \{task_m^{(1)}, \ldots, task_m^{(j)}, \ldots, task_m^{(J)}\}$ with $task_m^{(j)} \in \mathcal{M}$ and $task_m^{(J)} = m$, where J is the length of the task execution sequence before the task. Then, the waiting time of UAVs in task m is defined as [equation].
3) The energy consumption of UAVs: the communication energy consumption is often ignored because the propulsion energy consumption is much larger [29]. For a rotary-wing UAV flying at a constant speed V, the propulsion power is defined as [equation], where $P_0$ and $P_1$ represent the blade profile power and induced power in the hover state, respectively; $U_{tip}$ and $v_0$ represent the rotor tip speed and mean rotor velocity in the hover state, respectively; h and $s_0$ represent the rotor solidity and disc area, respectively; and $f_0$ and r represent the fuselage drag ratio and air density, respectively [29]. Hence, the total propulsion energy consumption of UAV n is the sum of the flight energy consumption and the hover energy consumption, and can be expressed as [equation]. In addition, the consumable resources carried by UAVs are limited.
The UAVs can only perform non-consumable tasks if the consumable resources are exhausted.
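As an illustration of the propulsion energy term, the following Python sketch evaluates a widely used rotary-wing propulsion power model that takes exactly the parameters listed above (blade profile power, induced power, tip speed, mean rotor velocity in hover, rotor solidity, disc area, fuselage drag ratio, air density). Whether this is the precise expression used in the paper is an assumption, since the equation is not reproduced in this text; the numeric values in `PARAMS` are illustrative and are not taken from the paper, and the function names are hypothetical.

```python
import math

# Illustrative parameter values (assumptions, not taken from the paper).
PARAMS = {
    "p0": 79.86,          # blade profile power in hover (W)
    "p1": 88.63,          # induced power in hover (W)
    "u_tip": 120.0,       # rotor tip speed (m/s)
    "v0": 4.03,           # mean rotor velocity in hover (m/s)
    "drag_ratio": 0.6,    # fuselage drag ratio
    "air_density": 1.225, # air density (kg/m^3)
    "solidity": 0.05,     # rotor solidity
    "disc_area": 0.503,   # rotor disc area (m^2)
}


def propulsion_power(speed, p0, p1, u_tip, v0, drag_ratio, air_density,
                     solidity, disc_area):
    """Propulsion power (W) at constant speed: blade profile + induced + parasite."""
    blade_profile = p0 * (1.0 + 3.0 * speed**2 / u_tip**2)
    induced = p1 * math.sqrt(
        math.sqrt(1.0 + speed**4 / (4.0 * v0**4)) - speed**2 / (2.0 * v0**2)
    )
    parasite = 0.5 * drag_ratio * air_density * solidity * disc_area * speed**3
    return blade_profile + induced + parasite


def propulsion_energy(fly_time, hover_time, speed, params):
    """Flight energy plus hover energy (J) for one UAV."""
    flying = propulsion_power(speed, **params) * fly_time
    hovering = propulsion_power(0.0, **params) * hover_time
    return flying + hovering


print(propulsion_power(0.0, **PARAMS))            # hover power equals p0 + p1
print(propulsion_energy(60.0, 30.0, 10.0, PARAMS))
```

At zero speed the model reduces to the sum of the blade profile and induced powers, which is why hovering is charged at `p0 + p1` in `propulsion_energy`.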
Due to the various types and scattered distribution of the tasks, the UAVs carrying different resources form multiple coalitions to perform tasks. A distributed coalition formation scheme is designed. Specifically, the UAV that detects a new task acts as the coalition leader for transmitting the task information and forming the coalition. To guarantee that there is only one leader in a coalition, we use a token mechanism in which a UAV with strong decision-making ability (supported by advanced computation and communication devices) is assigned a larger token number [30]. The UAV with the largest token number becomes the coalition leader. The signaling for implementing distributed coalition formation is described below, and each stage is introduced in detail.
Stage 1 (Task detection): the UAV that detects a new task acts as a leader, which is responsible for the collection of task execution information and coalition formation.
Stage 2 (Coalition formation request): the leader UAV broadcasts information about the task to form a coalition.
Stage 4 (Coalition formation process): UAVs then decide the amount of resources allocated to each task coalition based on the proposed order during the formation process.
Stage 5 (Resource allocation result notification and final coalition formation): after forming a stable coalition structure, the leader UAV informs the selected UAVs about resource allocation results.
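The token mechanism for selecting a single leader can be sketched as below. This is only an illustration: the function name `elect_leader` and the tie-breaking rule (the lowest UAV index wins among equal tokens) are assumptions, since the text only states that the UAV with the largest token number becomes the leader.

```python
from typing import Dict


def elect_leader(tokens: Dict[int, int]) -> int:
    """Return the UAV with the largest token; ties go to the lowest UAV index."""
    if not tokens:
        raise ValueError("at least one candidate UAV is required")
    # Sort key: larger token first, then smaller UAV index (assumed tie-break).
    return max(tokens, key=lambda uav: (tokens[uav], -uav))


# Hypothetical token numbers of the UAVs that detected the same task.
candidates = {1: 3, 2: 7, 3: 5}
print(elect_leader(candidates))
```

Because the rule is deterministic, every UAV that knows the same token table reaches the same leader without further negotiation.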

Problem Formulation
The proposed multi-UAV cooperative task execution can be regarded as a typical distributed multi-agent decision process, in which UAVs make decisions through information interaction and evaluate their behaviors according to the potential utility. OCF game theory follows the same idea, as it considers the cooperative relationship among players. In the OCF game, the players form coalitions by devoting a part of their resources and gain utility from the coalitions they join. Note that one player may allocate resources to different coalitions, which contributes to the formation of an overlapping coalition structure. The OCF game is therefore leveraged to analyze how UAVs perform tasks through cooperation. The optimization objective is to maximize the entire network utility of task execution by forming an optimal overlapping coalition structure for resource allocation, $SC^{(*)}$: [objective] s.t. $E_n \le E_n^{(th)}, \forall n \in \mathcal{N}$; [deadline and resource constraints]; and the non-consumable allocation takes its full value if UAV n decides to allocate the non-consumable resources to task m, and 0 otherwise. We take the energy consumption, UAV movement, and resource consumption constraints into account. In constraint (15), if the flight energy of UAV n reaches its threshold $E_n^{(th)}$, the UAV must exit the task execution; constraint (16) guarantees that UAVs start executing task m before the deadline $t_m^{(dead)}$; the resource constraint guarantees that the amount of resources a UAV invests cannot exceed its remaining amount; constraint (17) implies that the non-consumable resources are used in an all-in mode if UAV n allocates them to task m, and in a none-at-all mode otherwise. Due to the combinatorial nature of the coalition formation problem, obtaining the optimal coalition structure is NP-hard. Thus, we design a relatively low-complexity OCF algorithm with which a close-to-optimal solution can be achieved.
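The four constraints described above can be checked for a single UAV and a single task with a short Python sketch. The data layout, the class `UavState`, and the function `is_feasible` are assumptions introduced for illustration; in particular, the energy constraint is read here as "energy used must not exceed the threshold", and the resource check is applied per task rather than across all tasks of the UAV.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UavState:
    energy_used: float                      # propulsion energy consumed so far
    energy_threshold: float                 # threshold at which the UAV must exit
    remaining_consumable: Sequence[float]   # remaining amount per consumable type
    carried_non_consumable: Sequence[float] # carried amount per non-consumable type


def is_feasible(uav: UavState,
                consumable_alloc: Sequence[float],
                non_consumable_alloc: Sequence[float],
                start_time: float,
                deadline: float) -> bool:
    """Check energy, deadline, remaining-resource, and all-or-nothing constraints."""
    if uav.energy_used > uav.energy_threshold:
        return False  # energy constraint violated
    if start_time > deadline:
        return False  # task would start after its deadline
    for amount, remaining in zip(consumable_alloc, uav.remaining_consumable):
        if amount < 0 or amount > remaining:
            return False  # cannot invest more than what remains
    for amount, carried in zip(non_consumable_alloc, uav.carried_non_consumable):
        if amount != 0 and amount != carried:
            return False  # non-consumable resources are all-in or none-at-all
    return True


uav = UavState(energy_used=50.0, energy_threshold=100.0,
               remaining_consumable=[5.0, 2.0], carried_non_consumable=[3.0])
print(is_feasible(uav, [4.0, 2.0], [3.0], start_time=10.0, deadline=20.0))
print(is_feasible(uav, [4.0, 2.0], [1.0], start_time=10.0, deadline=20.0))
```

A search procedure can call such a check before evaluating the utility of a candidate allocation, so that infeasible exchange operations are discarded early.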

OVERLAPPING COALITION FORMATION GAME FORMULATION
The OCF game provides an effective tool for designing fair, robust, and efficient coalition formation strategies. In the OCF game, UAVs allocate resources and form an overlapping coalition structure based on the preference order to perform tasks cooperatively.

Game Model
The proposed task collaboration based on the OCF game is modeled as $G = \{\mathcal{N}, U_m, SC, X\}$, where $\mathcal{N}$ is the set of UAV players; $U_m$ is the utility function of task coalition m given in (4); SC is the overlapping coalition structure; $X = \{x_1, \ldots, x_n, \ldots, x_N\}$ is the UAV resource decision vector that determines the allocation of task resources, where $x_n$ collects the allocations $A_m^{(n)}$ of UAV n defined in (2); and $Mem(A_m) \subseteq \mathcal{N}$ is the member set of coalition m given in (3). Because a UAV may allocate its resources to multiple task coalitions, the proposed model enables overlapping coalitions, which differs from existing disjoint coalition game models.
In addition, each UAV obtains a share of the utility from its coalitions. The basic proportional fairness of the Shapley value is adopted to divide the utility among UAV members. Specifically, the utility of UAV n can be expressed as [equation]. Under the utility division rule in (19), a UAV is encouraged to allocate more resources to coalitions that bring it a large utility.
Definition 2 (Preference relation [25]). For any UAV player $n \in \mathcal{N}$ and any two coalition structures $SC_Q = \{A_1^{(q)}, \ldots, A_m^{(q)}\}$ and $SC_P = \{A_1^{(p)}, \ldots, A_m^{(p)}\}$, a preference order $\succ_n$ is defined as a complete, reflexive, and transitive binary relation over the set of all coalitions that UAVs can possibly generate. For any UAV player n, $SC_Q$ being superior to $SC_P$ is denoted as $SC_Q \succ_n SC_P$.
Consequently, $SC_Q \succ_n SC_P$ indicates that UAV player $n \in \mathcal{N}$ prefers to allocate its task resources as in the coalition structure $SC_Q$ rather than as in the structure $SC_P$.
$\ldots \cup \{m_n^{(z_c)}\}$. Note that the two coalition structures before and after an exchange operation have to satisfy the preference order. That is, the new coalition structure $SC_Q$ is preferred over the original coalition structure $SC_P$. Hence, the convergence of the coalition structure is decided by the preference order. Different types of preference orders may lead to different converged coalition structures and thus influence the entire network utility. Next, two major orders in the OCF game model [31] are introduced.
Definition 4 (Pareto order). For each UAV player $n \in \mathcal{N}$ and any two coalition structures $SC_P$ and $SC_Q$ generated by an exchange operation, the Pareto order is defined as [equation].
Definition 5 (Selfish order). For each UAV player $n \in \mathcal{N}$ and any two coalition structures $SC_P$ and $SC_Q$ generated by an exchange operation, the selfish order is defined as [equation].
Pareto and selfish orders are the most common preference relations in the OCF game model [31], [32], [33]. The Pareto order is a public preference order. It assumes that all members are equal and requires that a player make an exchange operation without damaging any other player's utility. It ensures that every exchange operation improves the sum utility of all coalitions. However, the solution easily falls into an inferior local optimum due to this strong restriction. Under the selfish order, the UAV only cares about its own utility while ignoring the utility of other coalitions and members. Hence, the selfish order can lead the UAV to make decisions that "benefit oneself at the expense of others".
To avoid falling into an inferior local optimum, a bilateral mutual benefit transfer (BMBT) order is proposed to evaluate the preference of UAV n between two coalition structures.
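The verbal descriptions of the two orders can be sketched as Python predicates. The formal definitions are not reproduced in this text, so the code follows the common textbook reading as an assumption: under the selfish order the deciding UAV only needs its own utility to increase, whereas under the Pareto order its utility must increase and no other UAV may lose utility. The function names and the utility values are hypothetical.

```python
from typing import Dict


def selfish_prefers(uav: int, new: Dict[int, float], old: Dict[int, float]) -> bool:
    """Selfish order (assumed form): only the deciding UAV's utility matters."""
    return new[uav] > old[uav]


def pareto_prefers(uav: int, new: Dict[int, float], old: Dict[int, float]) -> bool:
    """Pareto order (assumed form): the UAV gains and nobody else is hurt."""
    others_not_hurt = all(new[i] >= old[i] for i in old if i != uav)
    return new[uav] > old[uav] and others_not_hurt


# Hypothetical per-UAV utilities before and after an exchange made by UAV 1.
old_utility = {1: 0.40, 2: 0.50, 3: 0.30}
new_utility = {1: 0.55, 2: 0.45, 3: 0.30}

print(selfish_prefers(1, new_utility, old_utility))
print(pareto_prefers(1, new_utility, old_utility))
```

In this toy case the exchange is accepted under the selfish order but rejected under the Pareto order, because UAV 2 loses utility, which illustrates why the Pareto order is the more restrictive of the two.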

Definition 6 (Bilateral mutual benefit transfer order). For each UAV player $n \in \mathcal{N}$ and any two coalition structures $SC_P$ and $SC_Q$ generated by an exchange operation, the proposed BMBT order is represented in (18), where $A(n) = \{A_m \in SC \mid A_m^{(n)} \neq \emptyset, m \in \mathcal{M}\}$ is the set of other coalitions to which UAV n allocates resources.
As depicted in Fig. 5, UAV n makes an exchange operation based on the proposed BMBT order. According to the proposed BMBT order, terms 1 and 3 in (19) represent the differences in utility of the original and new coalitions, respectively. Terms 2 and 4 in (19) represent the utility of the other coalitions to which UAV n allocates resources before and after the exchange operation, respectively. Under the proposed BMBT order, each UAV cares more about cooperation with other UAVs in the same coalition. Specifically, a change in the resource allocation of one UAV may change the structures of several related coalitions. Each UAV is therefore more likely to perform the exchange operation, which provides more opportunities to increase the sum utility of the changed coalitions.
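One way to read the BMBT order is sketched below in Python. The exact expression is not reproduced in this text, so the sketch rests on an explicit assumption: an exchange by UAV n is preferred when the sum utility of the coalitions it affects (the original coalition, the new coalition, and the other coalitions to which UAV n allocates resources) strictly increases. The function name `bmbt_prefers` and all numeric values are hypothetical.

```python
from typing import Dict, Iterable


def bmbt_prefers(old_task_utility: Dict[int, float],
                 new_task_utility: Dict[int, float],
                 affected_tasks: Iterable[int]) -> bool:
    """Assumed BMBT test: the sum utility over the affected coalitions increases."""
    affected = list(affected_tasks)
    before = sum(old_task_utility[m] for m in affected)
    after = sum(new_task_utility[m] for m in affected)
    return after > before


# Hypothetical task utilities: UAV n moves resources from task 1 to task 2
# and also contributes to task 3, so tasks 1, 2 and 3 are affected.
old_u = {1: 0.80, 2: 0.30, 3: 0.60, 4: 0.90}
new_u = {1: 0.70, 2: 0.55, 3: 0.60, 4: 0.90}

print(bmbt_prefers(old_u, new_u, affected_tasks=[1, 2, 3]))
```

Here the original coalition loses 0.10 while the new coalition gains 0.25, so the exchange is accepted even though one coalition is worse off; a Pareto-style test would typically reject it, while a purely selfish test would ignore the coalition-level effect altogether.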

Analysis of the Stable Coalition Structure
In this subsection, the stability of the final coalition structure under the given preference order is proved by showing the existence of a Nash equilibrium solution. The notion of o-core proposed in [32] is suitable for extending the stable coalition structure to the OCF game model. To introduce the notion of o-core in combination with the proposed OCF game model, the following definition is given.
The definition of a stable coalition structure is given as follows. If no UAV can unilaterally make an o-profitable exchange operation that improves its utility under the preference order, the current coalition structure $SC_P$ is called o-core, or o-stable [31]. Because the BMBT order encourages cooperation with other UAVs in the same coalition, the stability of the coalition structure under the BMBT order needs to be examined again. Based on the property of the Nash equilibrium solution in an exact potential game [34], [35], the stable coalition structure in the OCF game model is deduced. According to the proposed BMBT order specified in (18), the utility function of UAV n in the OCF game model is designed as

Definition 8 (Exact potential game [36]). For a given potential function $\phi$, if the change in the utility function equals the change in the potential function whenever a single UAV unilaterally changes its resource allocation decision, the game is an exact potential game with potential function $\phi$. That is,

$$R_n(x_n', x_{-n}) - R_n(x_n, x_{-n}) = \phi(x_n', x_{-n}) - \phi(x_n, x_{-n}), \quad \forall n \in \mathcal{N}, \tag{24}$$

where $x_n'$ denotes the changed decision of UAV n and $x_{-n}$ denotes the decisions of the other UAVs.

Theorem 2. In the overlapping coalition formation game proposed in this paper, there is at least one stable coalition structure under the proposed BMBT order.
Proof. Refer to Appendix B, available in the online supplemental material. □
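The exact potential property in Definition 8 can be checked numerically on a toy game. The Python sketch below is an illustration under stated assumptions, not the paper's utility design: each UAV picks one of two tasks, a task served by k UAVs has the concave value sqrt(k), the potential is the total network utility, and the utility of a UAV is its marginal contribution to that total. All names and the value function are hypothetical.

```python
from itertools import product

TASKS = (0, 1)   # toy task set (assumption)
N_UAVS = 3       # toy number of UAVs (assumption)


def task_value(k: int) -> float:
    """Concave value of a task served by k UAVs (toy assumption)."""
    return k ** 0.5


def potential(profile) -> float:
    """Total network utility: sum of the task values over all tasks."""
    return sum(task_value(sum(1 for x in profile if x == m)) for m in TASKS)


def utility(n: int, profile) -> float:
    """Marginal contribution of UAV n to the total network utility."""
    without_n = tuple(x for i, x in enumerate(profile) if i != n)
    return potential(profile) - potential(without_n)


def is_exact_potential(tol: float = 1e-12) -> bool:
    """Check the equality in (24) for every profile and unilateral deviation."""
    for profile in product(TASKS, repeat=N_UAVS):
        for n in range(N_UAVS):
            for new_choice in TASKS:
                deviated = profile[:n] + (new_choice,) + profile[n + 1:]
                d_util = utility(n, deviated) - utility(n, profile)
                d_pot = potential(deviated) - potential(profile)
                if abs(d_util - d_pot) > tol:
                    return False
    return True


print(is_exact_potential())  # True
```

The check passes because the term subtracted in the marginal contribution does not depend on the choice of UAV n, so a unilateral change alters the utility and the potential by the same amount.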

PROPOSED ALGORITHM FOR OVERLAPPING COALITION FORMATION
According to the OCF game model, the benefit and cost of performing tasks are calculated based on the coalition structure of resource allocation. A preference gravity-guided tabu search (PGG-TS) algorithm is proposed for distributed overlapping coalition formation. With the exchange operation based on the preference order, UAVs automatically form overlapping coalitions and improve the utility of the network.

Whole-Process Task-Driven Resource Allocation With Overlapping Coalition Formation
We discuss the whole-process task-driven resource allocation algorithm based on the OCF game. Fig. 6 illustrates the OCF algorithm determination process. It consists of three main stages: task information collection, distributive coalition formation, and sequential task execution. The specific procedure is given in Algorithm 1.
1) Task information collection: the UAVs patrol the task area. When a new task is discovered, the UAV collects the task execution information (e.g., required resources and location) and informs the nearby UAVs.
2) Distributive coalition formation stage: all the UAVs perform exchange operations based on the proposed PGG-TS algorithm. The specific procedure is given in Algorithm 2. Given the current coalition structure, an exchange operation is performed whenever (18) is satisfied, and the coalition structure of resource allocation is readjusted continuously until convergence.
3) Sequential task execution: the UAVs generate the task execution sequence based on the stable coalition structure and then perform the tasks sequentially. To increase the utility of a task, the UAVs adjust their hover positions for better communication quality. Moreover, when a new task is generated, a new coalition is formed to execute it. The UAVs decide whether to allocate resources to the new task based on its position and the amount of their remaining resources.

As illustrated in Fig. 4, in the proposed distributed coalition formation approach, when making an exchange operation of resource allocation, each UAV only needs to communicate with the coalition leaders to judge whether the proposed preference order is satisfied. If it is satisfied, the UAV performs the exchange operation; otherwise, it does not. Hence, repeated information interactions among UAV members are avoided.

Algorithm 1. Whole-process Task-driven Resource Allocation With Overlapping Coalition Formation
Input: $\mathcal{N}$, $\mathcal{M}$, $L_m(0)$, $B_m(0)$, $t_{max}$, $t_{cnt}$;
If $In = 1$ then
Input $L_m(t)$ and $B_n(t)$ to Algorithm 2 for optimal overlapping coalition structure $SC^{(*)}(t)$;
The UAVs sequentially perform the tasks by $SC^{(*)}(t)$;
For each UAV $\forall n \in \mathcal{N}$ do
    UAV n moves in steps within the coverage area of task m it performs;
    Calculate the utility of task that it performs at the current position;
    If the utility of task m increases then
        Keep moving until the utility of task m can't be improved by UAV's movement;
    End if
End
Update $L_n(t)$ and $B_n(t)$ according to the remaining resources of UAVs and the completion of tasks;
$t = t + t_{cnt}$;
If new task was detected by UAVs then
    $In = 1$;
End if
End while.
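The control flow of the whole-process procedure can be sketched in Python as follows. This is a skeleton under explicit assumptions: the loop condition (t <= t_max), the initial value of the flag In, and its reset after coalition re-formation are not visible in the listing above and are guessed here. The callbacks `form_coalitions`, `execute_tasks`, and `new_task_detected` are hypothetical stand-ins for Algorithm 2, the task execution with hover-position adjustment and resource update, and the task discovery step.

```python
from typing import Callable, List, Tuple


def run_mission(
    t_max: int,
    t_cnt: int,
    form_coalitions: Callable[[int], str],
    execute_tasks: Callable[[str, int], None],
    new_task_detected: Callable[[int], bool],
) -> List[Tuple[int, str]]:
    """Skeleton of the task-driven loop; returns (time, structure) pairs."""
    t = 0
    reform = True        # plays the role of the flag In (assumed to start at 1)
    structure = ""
    log: List[Tuple[int, str]] = []
    while t <= t_max:    # assumed loop condition
        if reform:
            structure = form_coalitions(t)  # stand-in for Algorithm 2
            reform = False                  # assumed reset of the flag
        # Perform tasks, adjust hover positions, update remaining resources.
        execute_tasks(structure, t)
        log.append((t, structure))
        t += t_cnt
        if new_task_detected(t):
            reform = True                   # In = 1: re-form coalitions next round
    return log


# Toy run: a new task appears at t = 2, which triggers one re-formation.
log = run_mission(
    t_max=3,
    t_cnt=1,
    form_coalitions=lambda t: f"SC@{t}",
    execute_tasks=lambda sc, t: None,
    new_task_detected=lambda t: t == 2,
)
print(log)  # [(0, 'SC@0'), (1, 'SC@0'), (2, 'SC@2'), (3, 'SC@2')]
```

The coalition structure is recomputed only when a new task is detected, which mirrors the task-driven nature of the procedure.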

A Preference Gravity-Guided Tabu Search Algorithm for Overlapping Coalition Formation
Existing works mostly adopt the unguided best response algorithm and other similar random algorithms for coalition formation [13], [20]. In the CF game model, a UAV only decides whether to leave or join a coalition, and this small strategy space guarantees the performance of the best response algorithm. However, these algorithms converge slowly and easily fall into local optima, which makes them unsuitable for our OCF game scenarios with a large strategy space.

Algorithm 2. A Preference Gravity-Guided Tabu Search (PGG-TS) Algorithm for Distributed Overlapping Coalition Formation
Input: $\mathcal{N}$, $\mathcal{M}$, $L_m(t)$, $B_m(t)$, $L_{tabu}$, $K_{len}$, $K_{max}$;
. . . ; $SC^{(k)}\}$ to empty;
Calculate $F_n^{(z)}(1)$ and $P_n^{(z)}(1)$ by (26) and (27);
To get the initial coalition structure $SC^{(1)}$, all initial resources carried by UAVs are allocated to the tasks by $P_n^{(z)}(1)$.
Loop $\forall n \in \mathcal{N}$
    UAV n randomly selects partial consumable resource $d_n^{(z_b)}$ or non-consumable resource $m_n^{(z_c)}$ to leave the current coalition;
    Calculate $F_n^{(z)}(k)$ and $P_n^{(z)}(k)$ by (26) and (27)

Tabu search (TS) is a meta-heuristic algorithm for escaping local optima [37]. Its central idea is to mark local optima that have already been searched by establishing a tabu list. In subsequent exploration, tabu search avoids these searched local optima and is therefore likely to find a better solution. We propose a preference gravity-guided tabu search (PGG-TS) algorithm for distributed overlapping coalition formation. Specifically, we establish a tabu list that records the coalition structure after each exchange operation to avoid repeated operations. Furthermore, since the tabu search algorithm is sensitive to the search strategy [38], preference gravity is designed to guide the tabu search process, which accelerates convergence and improves performance.

Fig. 6. Illustration of overlapping coalition formation algorithm determination process.

1) Tabu list establishment: a tabu list $Tabu_{SC} = \{SC^{(k-L_{tabu})}, \ldots, SC^{(k)}\}$ in the kth iteration is established according to the resource allocation under historical coalition structures, where $L_{tabu}$ is the tabu length that represents the existence time of a coalition structure. Note that the UAVs cannot repeat an exchange operation that leads to a coalition structure already in the tabu list.
2) Preference gravity calculation: to make appropriate resource allocation decisions, the concept of preference gravity is introduced based on the tasks and the UAVs with remaining unallocated resources. The resource allocation vector of UAV n for task m in the kth iteration is defined as

This vector changes with the amount of resources that the UAVs allocate to perform tasks. The preference gravity can be regarded as the degree of preference between the remaining resources required by task m and the remaining resources carried by UAV n. The preference gravity of the zth type of resource $d_n^{(z)}$ or $m_n^{(z)}$ of UAV n for each task is defined as $F_n^{(z)}(k) = [f_{1,n}^{(z)}(k), \ldots, f_{m,n}^{(z)}(k)]$, where $f_{m,n}^{(z)}(k)$ is the preference gravity for task m, given by (26). As depicted in Fig. 7, the better the resources carried by a UAV match the resource requirements of a task, the greater the preference gravity between them, indicating that the UAV is more willing to allocate resources to this task. For example, task m requires two types of resource, namely type-2 and type-3 resources. UAV 1 carries exactly these two types of resource, while UAV 2 carries only one type of resource that task m needs, i.e., the type-3 resource. Therefore, the preference gravity between UAV 1 and task m is greater than that between UAV 2 and task m.
3) Search strategy: we define the probability vector $P_n^{(z)}(k)$ as in (27), where $G(k)$ represents the tradeoff between exploration and exploitation. In the update rule of $G(k)$, $G_{max}$ is the maximum value of $G(k)$. A smaller $G(k)$ means that UAV n may choose a suboptimal preference gravity strategy to explore, whereas a larger $G(k)$ means that UAV n is prone to choose the optimal preference gravity strategy. As k increases, $G(k)$ gradually increases from 0 to $G_{max}$, which means that the UAVs are at first more willing to explore solutions that have not been tried and then gradually tend toward the solutions that bring a larger utility.
4) Coalition formation process: according to the selection probability established by preference gravity, UAVs make the exchange operations specified in Definition 3. If the proposed preference order is satisfied, the UAVs make the exchange operation of resource allocation and thereby improve the total utility of the network; otherwise, the UAVs maintain the original coalition structure. The specific procedure is given in Algorithm 2.
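The tabu list and the gravity-guided selection in the four steps above can be sketched in Python as follows. The sketch makes several assumptions that should be read as such: the selection probability is taken to be a Boltzmann (softmax) distribution over preference gravities, which is suggested by the "Boltzmann coefficient" entry for $G_{max}$ in Table 4 but is not confirmed by the expressions (26) and (27), which are not reproduced here; the schedule of $G(k)$ is assumed to be a linear ramp from 0 to $G_{max}$; and the names `selection_probabilities`, `g_schedule`, and `TabuList` are hypothetical.

```python
import math
import random
from collections import deque
from typing import Hashable, List, Sequence


def selection_probabilities(gravity: Sequence[float], g_k: float) -> List[float]:
    """Boltzmann weights over tasks (assumed form): p_m proportional to exp(G(k) * f_m)."""
    peak = max(gravity)  # subtract the maximum for numerical stability
    weights = [math.exp(g_k * (f - peak)) for f in gravity]
    total = sum(weights)
    return [w / total for w in weights]


def g_schedule(k: int, k_max: int, g_max: float) -> float:
    """Assumed linear ramp of G(k) from 0 to g_max over k_max iterations."""
    return g_max * min(k, k_max) / k_max


class TabuList:
    """Fixed-length memory of recent coalition structures."""

    def __init__(self, length: int) -> None:
        self._items = deque(maxlen=length)  # oldest entries drop out automatically

    def __contains__(self, structure: Hashable) -> bool:
        return structure in self._items

    def add(self, structure: Hashable) -> None:
        self._items.append(structure)


# Hypothetical preference gravities of one UAV resource toward three tasks.
gravity = [0.2, 1.0, 0.5]
early = selection_probabilities(gravity, g_schedule(0, 100, 10.0))    # pure exploration
late = selection_probabilities(gravity, g_schedule(100, 100, 10.0))   # near-greedy
print([round(p, 3) for p in early])
print([round(p, 3) for p in late])

# Draw a target task with the late-stage probabilities (seeded for repeatability).
rng = random.Random(0)
chosen_task = rng.choices(range(len(gravity)), weights=late, k=1)[0]

# Skip a candidate structure if it is still in the tabu list.
tabu = TabuList(length=2)
tabu.add(("task0", "task1"))
candidate = ("task0", "task1")
print(candidate in tabu)  # True: this structure was visited recently
```

With $G(k) = 0$ every task is equally likely, while a large $G(k)$ concentrates the probability on the task with the greatest preference gravity, which matches the exploration-then-exploitation behavior described in step 3.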
Theorem 3. Under the proposed BMBT order, the stability of the final coalition structure $SC^{(*)}$ resulting from the overlapping coalition formation algorithm is guaranteed.
Proof. Given the amount of resources carried by UAVs and required by tasks, the total number of possible coalition structures is finite. Owing to the tabu list, each resource exchange operation forms a coalition structure of resource allocation that has not appeared before, with a larger network utility than the initial coalition structure. In Theorem 2, it is proved that the difference of the utility function of a UAV under the BMBT order equals that of the potential function when any UAV unilaterally changes its resource allocation strategy. Therefore, the OCF game model under the proposed BMBT order is an exact potential game (EPG). The potential function is the total utility of the UAV network, and the game therefore has the finite improvement property. Hence, our proposed PGG-TS algorithm converges to a stable coalition structure in the OCF game model. Theorem 3 is proved. □

Fig. 7. The preference gravity between the task and UAV.
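The finite-improvement argument in the proof can be illustrated on a toy problem. The Python sketch below is not the PGG-TS algorithm: it assumes a small game in which each UAV serves exactly one of two tasks, the potential is the sum of sqrt(k) over tasks served by k UAVs, moves are accepted only when the potential strictly increases, and a set of visited structures plays the role of an unbounded tabu list. All names are hypothetical.

```python
def potential(profile, n_tasks=2):
    """Total utility: sum over tasks of sqrt(number of UAVs on the task)."""
    return sum(sum(1 for x in profile if x == m) ** 0.5 for m in range(n_tasks))


def improve_until_stable(profile, n_tasks=2):
    """Apply strictly improving unilateral moves until none is left."""
    profile = tuple(profile)
    visited = {profile}          # acts like a tabu list of unlimited length
    steps = 0
    improved = True
    while improved:
        improved = False
        for n in range(len(profile)):
            for choice in range(n_tasks):
                candidate = profile[:n] + (choice,) + profile[n + 1:]
                if candidate in visited:
                    continue     # never revisit a structure
                if potential(candidate, n_tasks) > potential(profile, n_tasks) + 1e-12:
                    profile = candidate
                    visited.add(profile)
                    steps += 1
                    improved = True
    return profile, steps


def is_stable(profile, n_tasks=2):
    """True if no single UAV can raise the potential by switching tasks."""
    base = potential(profile, n_tasks)
    return all(
        potential(profile[:n] + (c,) + profile[n + 1:], n_tasks) <= base + 1e-12
        for n in range(len(profile))
        for c in range(n_tasks)
    )


final, steps = improve_until_stable((0, 0, 0, 0))
print(final, steps)       # (1, 1, 0, 0) 2
print(is_stable(final))   # True
```

Because the potential strictly increases with every accepted move and the number of structures is finite, the loop must terminate, and the structure it stops at admits no improving unilateral move.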

Complexity Analysis
The time complexity of coalition formation mainly depends on Algorithm 2. As shown in Table 3, the complexity of each step is calculated for the worst-case scenario. Detailed analyses are given as follows.

1) In Step 1 (namely, coalition formation request and feedback), UAVs receive coalition formation requests from leader UAVs and respond to these requests based on their remaining resources. The time complexity of this step is denoted by $O(C_1)$, where $C_1$ is a constant decided by the duration of request and feedback.
2) In Step 2 (namely, tabu list comparison), UAVs randomly select resources and make an exchange operation to obtain a new coalition structure that is not in the tabu list $Tabu_{SC}$. With the duration of one comparison denoted by $C_2$ and the length of the tabu list by $L_{tabu}$, the complexity is $O(C_2 L_{tabu})$.
3) In Step 3 (namely, coalition structure exchange operations), UAVs estimate the received utility over a period of time and decide to perform the exchange operation if the proposed order is satisfied. In the worst case, each UAV attempts to make exchange operations with all other coalitions, so the complexity is decided by the scale of the coalition structure, which is smaller than $NM$. The time complexity can be expressed as $O(C_3 NM)$, where $C_3$ is a constant decided by the duration of estimation and decision.
4) In Step 4 (namely, resource allocation result notification and final coalition formation), the leader UAVs inform the finally selected UAVs of the resource allocation coalition structure $SC^{(*)}$. The complexity can be expressed as $O(C_4)$, where $C_4$ is a constant decided by the duration of signal propagation for result notification.

To sum up, the time complexity of the algorithm can be expressed as $O(C_1) + O(C_2 L_{tabu}) + O(C_3 NM) + O(C_4)$. Due to the cost of coalition formation, UAVs will not form a very large coalition. Moreover, once the UAVs form a larger coalition, the number of exchange operations before converging to the stable coalition structure decreases since the feasible operating space shrinks. Consequently, the complexity of exchange operations is affordable.
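The four-step cost expression can be evaluated for a concrete network size with a short Python helper. This is only an arithmetic illustration: the constants $C_1$ to $C_4$ are set to 1 by default and the tabu length of 10 is a placeholder, since the paper does not give their values here; the function name is hypothetical.

```python
def worst_case_cost(
    n_uavs: int,
    n_tasks: int,
    l_tabu: int,
    c1: float = 1.0,
    c2: float = 1.0,
    c3: float = 1.0,
    c4: float = 1.0,
) -> float:
    """Evaluate C1 + C2 * L_tabu + C3 * N * M + C4 for one iteration (worst case)."""
    request_feedback = c1                 # Step 1
    tabu_comparison = c2 * l_tabu         # Step 2
    exchange = c3 * n_uavs * n_tasks      # Step 3
    notification = c4                     # Step 4
    return request_feedback + tabu_comparison + exchange + notification


# 8 UAVs and 5 tasks as in the Fig. 8 scenario; tabu length 10 is a placeholder.
print(worst_case_cost(n_uavs=8, n_tasks=5, l_tabu=10))  # 52.0
```

With unit constants, the exchange step (40 of 52 units) dominates, which is consistent with the $O(C_3 NM)$ term being the one that scales with the network size.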

PERFORMANCE EVALUATION
To verify the efficiency and effectiveness of our proposed sequential OCF scheme and PGG-TS algorithm, the convergence behavior and utility performance under the proposed PGG-TS algorithm are compared with those of the split-merge preferred OCF and non-overlapping CF algorithms. Then, the performance comparisons under different parameters of the UAV network are demonstrated. Furthermore, the utility performance under the proposed BMBT order and other preference orders is demonstrated.

Simulation Setup and Example Results
We use the dataset of the reconnaissance and monitoring tasks in [13] to evaluate our proposed PGG-TS algorithm. Moreover, the flight and hover power consumption parameters of UAVs are set according to [39]. The simulation parameters are presented in Table 4. Additionally, all curves are averaged over 1000 independent trials.
The task execution based on OCF in a multi-UAV network is demonstrated in Fig. 8, with 8 UAVs and 5 tasks in the current network. Without loss of generality, the initial positions of all UAVs and tasks are generated uniformly at random in the delimited area. Each UAV decides its resource allocation according to its remaining resources and the task attributes. For example, as can be seen in Fig. 8, UAVs 2 and 4 form a coalition and allocate corresponding resources to perform task 3. When task 3 is completed, UAV 2 flies to its next task and allocates corresponding resources to perform it.

Performance Evaluation
To demonstrate the advantages of our proposed PGG-TS algorithm, the benchmark algorithms including the split-merge preferred OCF algorithm [40] and non-overlapping CF algorithm [13] are compared.

TABLE 4 Simulation Parameters

| Parameter | Value |
| --- | --- |
| The amount of data $I_m^{(z_c)}$ [14] | 50 ~ 100 Mbit |
| Transmit power $p_{z_c}$ [14] | 1 W |
| Power spectral density of noise $N_0$ [14] | −169 dBm/Hz |
| Bandwidth $m_n^{(z_c)}$ [14] | 1 or 2 MHz |
| Task necessary completion time $t_m^{(com)}$ [14] | 50 ~ 120 s |
| Path loss factor $\alpha$ [14] | 2 |
| The max flight speed of UAVs $v_n^{(max)}$ [40] | 2 ~ 14 m/s |
| Fuselage drag ratio $f_0$ [40] | 0.3 |
| Air density $\rho$ [40] | 1.125 kg/m^3 |
| Mean rotor velocity $v_0$ [40] | 7.3 m/s |
| Tip speed of the rotor $U_{tip}$ [40] | 200 m/s |
| Task completion degree weight coefficient $v_1$ | 0.1 ~ 0.6 |
| Waiting time weight coefficient $v_2$ | 0.01 ~ 0.04 |
| Energy consumption weight coefficient $v_3$ | 0.05 ~ 0.35 |
| Boltzmann coefficient $G_{max}$ | 2 ~ 10 |

1) Split-merge preferred overlapping coalition formation (OCF) algorithm [40]: all users make merge-split preferred operations to increase the utility. If no user can make a profitable merge-split preferred operation, the users form a stable overlapping coalition structure.
2) Non-overlapping coalition formation (CF) algorithm [13]: all UAVs constantly try swap operations to decide which coalition to join with the aim of increasing the utility. Finally, a stable non-overlapping task coalition structure is formed.

Fig. 9 shows the average utility curves of tasks versus the number of UAVs. It can be observed that the average utility of tasks increases with the number of UAVs. Besides, owing to the more flexible resource allocation coalition structure, the OCF algorithms outperform the non-overlapping CF algorithm. Under the proposed PGG-TS algorithm, the average utility of tasks increases by 12.5% and 38.5% compared with the split-merge preferred OCF algorithm and non-overlapping CF algorithm, respectively. The reason is that our PGG-TS algorithm seeks a tradeoff between exploration and exploitation and guides UAVs away from locally optimal overlapping coalition structures. Moreover, we use exhaustive search to obtain the optimal overlapping coalition structure.
By comparison, we observe that our proposed PGG-TS algorithm achieves a close-to-optimal solution.

Fig. 10 shows the average number of iterations versus the number of UAVs. Because obtaining the optimal coalition structure is NP-hard, the number of iterations required by exhaustive search increases exponentially with the number of UAVs. Furthermore, the average number of iterations of the proposed PGG-TS algorithm is 26.1% lower than that of the split-merge preferred algorithm. Although the non-overlapping CF algorithm needs fewer iterations than the OCF algorithms, its utility is 35.7% lower than that of our proposed PGG-TS algorithm in Fig. 9.

Fig. 11 shows the detailed cumulative distribution function (CDF) of the iterations, obtained from 1000 independent trials at each network scale. Compared with the split-merge preferred algorithm, which converges in an average of 350 iterations, the proposed OCF scheme based on the PGG-TS algorithm converges in an average of 200 iterations. Owing to its smaller combination space than the OCF model, the non-overlapping CF scheme converges in an average of 125 iterations.
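An empirical CDF of the kind plotted in Fig. 11 can be computed from per-trial iteration counts with a few lines of Python. The helper below is a generic sketch; the sample values are made up for illustration and are not simulation results from the paper.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of samples that are less than or equal to x."""
    ordered = sorted(samples)
    count = len(ordered)

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(ordered, x) / count

    return cdf


# Hypothetical iteration counts from five trials (illustrative only).
iterations = [180, 190, 200, 210, 220]
cdf = empirical_cdf(iterations)
print(cdf(200))  # 0.6
print(cdf(170))  # 0.0
print(cdf(220))  # 1.0
```

Reading the curve at a given iteration budget gives the fraction of trials that have converged within that budget, which is how the convergence speeds of the compared algorithms can be contrasted.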
Fig. 12 evaluates the convergence behavior of the proposed PGG-TS algorithm. It can be observed that the proposed PGG-TS algorithm outperforms the split-merge preferred OCF algorithm in both convergence speed and utility. Therefore, UAVs can quickly make resource allocation decisions and form overlapping coalitions to perform tasks based on the proposed PGG-TS algorithm, which ensures real-time task allocation. Moreover, Fig. 12 shows that the network utility of the proposed OCF game scheme increases by 28.6% compared to the non-overlapping CF scheme, with only a few extra iterations.
Simulation results in Figs. 9, 10, 11 and 12 show that our distributed coalition formation based on the PGG-TS algorithm effectively reduces the number of iterations while achieving an approximately optimal solution. Therefore, the efficiency and effectiveness of our proposed algorithm are demonstrated.

Fig. 13 shows the average utility curves of the tasks versus the number of tasks. As the number of tasks grows, the amount of resources required exceeds the resource carrying capacity of the UAVs, resulting in a decrease of utility. The proposed PGG-TS algorithm increases the average utility of tasks by 11.9% and 42.8% compared with the split-merge preferred OCF algorithm and non-overlapping CF algorithm, respectively. Moreover, as the number of tasks increases, the average utility of the tasks under the proposed OCF model decreases more slowly than that under the CF model. The reason is that a UAV under the CF model can join only one coalition, which gradually reduces the cooperation among UAVs. Under the proposed OCF model, however, UAVs have more flexible space for resource allocation. Hence, their cooperation is not greatly reduced.

Fig. 14 shows the energy efficiency versus the number of UAVs. It is observed that the energy efficiency achieved by the proposed PGG-TS algorithm increases by 13.3% and 26.3% compared with the split-merge preferred OCF algorithm and non-overlapping CF algorithm, respectively. Furthermore, as the number of UAVs increases, the energy efficiency under the task completion degree-based OCF algorithm first increases and then decreases because it ignores the energy cost.

Fig. 15 illustrates the average completion degree curves of tasks versus the number of UAVs. The average completion degree of tasks increases with the number of UAVs. The reason is that more UAVs gather more kinds of resources to better perform tasks. It is observed that the average completion degree of tasks under the proposed PGG-TS algorithm increases by 10.1% compared with that of the split-merge preferred algorithm.
Fig. 16 shows the energy efficiency curves of tasks versus the average flight speed of UAVs. We can see that the utility of UAVs first increases and then decreases. The reason is that a higher speed lets UAVs spend less time completing tasks, which increases the utility. As the average UAV flight speed increases further, the energy cost becomes critical, which reduces the utility.

Fig. 17 shows the average utility curves of tasks under different preference orders versus the number of UAVs. It is observed that the average utility of tasks increases with the number of UAVs. In particular, the utility under the proposed BMBT order increases by 25.1% and 34.3% compared with that under the selfish and Pareto orders, respectively. The underlying reason is that the BMBT order cares more about the sum utility of related coalitions than about the utility of an individual UAV. Cooperation with other UAVs in the same coalition is encouraged. Specifically, a change in the resource allocation of one UAV may change the structures of several related coalitions. Each UAV is therefore more likely to perform the exchange operation, which provides more opportunities to increase the sum utility of these changed coalitions. In contrast, the UAV utility under the Pareto order is difficult to improve because of the strong restriction, while the selfish order cares only about the individual utility and easily falls into an inferior local optimum. Hence, our proposed BMBT order achieves a better utility than the Pareto and selfish orders.

CONCLUSION
In this paper, the cooperative task assignment and resource allocation problems in task-driven heterogeneous UAV networks have been investigated. By adopting the proposed sequential overlapping coalition formation (OCF) game, each UAV can decide the amount of resources allocated to each task, while optimizing the tradeoff between the benefits and costs of task execution. Moreover, a bilateral mutual benefit transfer (BMBT) order has been proposed to maximize the utility of coalitions. We have proposed a relatively low-complexity preference gravity-guided tabu search (PGG-TS) algorithm for distributed overlapping coalition formation with which a close-to-optimal solution can be achieved. The simulation results showed that our proposed PGG-TS algorithm increases the average utility of tasks by 12.5% and 38.5% compared with the split-merge preferred OCF algorithm and non-overlapping CF algorithm, respectively. The utility of the proposed BMBT order increases by 25.1% and 34.3% compared with selfish and Pareto orders, respectively.

Zanqi Huang received the BS degree in communications and information systems from the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2020. He is currently working toward the MA degree in communications and information systems in the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics. His research interests include game theory and resource allocation in UAV communication networks.