Multiagent UAV Routing: A Game Theory Analysis With Tight Price of Anarchy Bounds

We study the multiagent unmanned aerial vehicle (UAV) routing problem, in which a set of UAVs must collect information by surveilling an area of operation. Each UAV is autonomous and does not rely on a reliable communication medium to coordinate with other UAVs. We formulate the problem as a game where UAVs are players and their strategies are the different routes they can take. Our model also incorporates the useful concept of information fusion. This results in a new variant of weighted congestion-type games. We show that the price of anarchy (PoA) of the game is at most 2, irrespective of the number of UAVs and their sensor capabilities. This also validates the empirical results of earlier works. Furthermore, we identify classes of games in which a pure Nash equilibrium is guaranteed to exist. To the best of our knowledge, these are the first such theoretical results in the related literature. Finally, we conduct experimental studies using randomly generated instances with several multiagent UAV routing policies. Our insights are that the PoA increases with the congestion level, that is, when the same number of UAVs search a smaller area or more UAVs search the same area, and that, on average, our proposed policies are less than 10% worse than the centralized optimum for the problem scenarios attempted.
Note to Practitioners: UAVs are becoming increasingly popular for information collection tasks in defense and civilian applications alike. When the collection area is large, it is not unusual that a fleet of UAVs is deployed. Routing of a fleet can be performed in a centralized or decentralized manner. Decentralized routing might be the only possibility when centralized situational awareness is not possible due to bandwidth limitations, or when centralized optimal routes for each UAV in the fleet are too complex to compute. Autonomous solutions have several other advantages, not least simplicity.
For managers of UAV systems, our work provides the first theoretical characterization of how bad decentralized routing can be. Under various scenarios of information fusion, specifically weak and strong, and of the attribution of the collected information to each UAV of a team, we prove that the fleet will collect at least 50% of what the best centralized solution collects. Empirically, we show that the performance of the fleet is in fact much better, generally within 10% of the best centralized solution. Hopefully, our routing strategies provide valuable guidance to the practicing engineer or manager of a UAV fleet.


I. INTRODUCTION
Unmanned aerial vehicles (UAVs) are increasingly being used for intelligence, defense, and civilian information gathering and monitoring. This is particularly due to their utility in high-endurance and perilous environments that are, as characterized by [1], dull, dirty, and dangerous. A popular application of UAVs is information collection via surveillance of an area of operations. Often, a fleet of UAVs is dispatched for large geographical coverage and multiple intelligence, surveillance, and reconnaissance (ISR) missions, where the goal is to maximize the amount of information collected. This gives rise to the problem of routing and coordination.
This problem can be formulated in two different ways depending on the application and environment. In one formulation, all UAVs coordinate and cooperate either directly or through an omniscient planner or both. This relies on the existence of a reliable communication medium between all agents [UAVs, planner (if present), etc.]. In this case, a centralized and cooperative solution that maximizes the system throughput is desired, and the problem essentially becomes an optimization problem, which has been well studied but remains computationally challenging (see Section I-B).
In another formulation, agents are autonomous, i.e., each needs to decide its own route. However, if multiple agents obtain the same information, then they need to share the payoff received in an appropriate manner. Full autonomy is the general trend in UAV research because it does not depend on a reliable communication medium. Such a medium can have several issues, as discussed in [2], and may be unavailable in missions that require radio silence, under jamming, or with a limited communication radius. Autonomy is also natural when different competitive units participate with different subobjectives, or when competitive self-interested agents want to maximize their own payoff (i.e., the amount of information collected), among other reasons.
While the former problem has the optimal outcome, it is unclear as to how much efficiency is lost due to duplication of effort in the latter. This paper provides an answer to this question. We formulate the problem by dividing the region of surveillance into discrete cells each having an associated information value, which we treat as an abstract entity allowing flexibility for what it represents; possibly a prior probability of a target being present, which could be obtained using a prior surveillance activity (say using a satellite). Such discretization is common in models for UAV path planning such as [3]. Our basic information model has some similarities with the earlier studied models, e.g., [4], [5], with some interpretational differences with various parameters (see Section I-B for details). In addition, our model incorporates a useful concept of information fusion, which can be defined as "the combination of multiple sources to obtain improved information (cheaper, greater quality, or greater relevance)" [6] and has been used in robotics and military applications [7]. This makes the information collection problem versatile in terms of applicability, especially when the fleet is heterogeneous in sensing capabilities.
Despite the problem being a cooperative scenario on a broad level, the autonomy of the agents and the lack of communication with each other (due to various aforementioned reasons) make it plausible to assume that they can obtain the information about the current positions of other players but cannot coordinate with each other to come up with a cooperative solution. Hence, our model studies what happens if they simply follow a selfish behavior maximizing their own payoffs. A natural framework to model such a situation is a noncooperative game where UAVs are the players and their strategies are the different routes that they can take. The stable outcomes of such a game are given by the Nash equilibria (NE), where no player gains by a unilateral deviation. Furthermore, their inefficiency is measured by the notions of price of anarchy (PoA) and price of stability (PoS), where PoA (PoS) is defined as the ratio of the value of optimal output to the value of the worst (best) NE outcome. We provide results on the existence of NE and tight bounds on the PoA and PoS, summarized in Table I. To the best of our knowledge, these are the first such theoretical results in the related literature. We note that all our results are applicable to a general setting, where the search space is given by a directed graph whose nodes represent the regions and the edges represent the connectivity among regions.

A. Our Contributions
In the basic game, each UAV $i$ has a sensor effectiveness of $\rho_i \in (0, 1]$. If $i$ visits a region having information value $v$, then it collects $\rho_i v$, and the remaining $(1 - \rho_i)v$ is left at this region. If two UAVs, $i$ and $j$, visit a region together, then they collect a total of $\rho_i v + \rho_j(1 - \rho_i)v = (1 - (1 - \rho_i)(1 - \rho_j))v$ and split it in proportion to their $\rho$ values, i.e., $i$'s share is a $\rho_i/(\rho_i + \rho_j)$ fraction of the total. When more than two UAVs visit a region together, the total value function, which captures the total amount of information collected, and the individual shares are defined similarly. Note that the total is independent of the order in which the UAVs are considered.
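As a small worked instance of the sharing rule above, consider the purely illustrative values $\rho_i = 0.5$, $\rho_j = 0.4$, and $v = 10$ (these numbers are assumptions chosen for the example and do not come from the experiments). The total is the same in either order, and the shares are proportional to the sensor effectiveness:

```latex
\begin{align*}
\text{total, } i \text{ first:}\quad & \rho_i v + \rho_j (1-\rho_i) v = 5 + 2 = 7,\\
\text{total, } j \text{ first:}\quad & \rho_j v + \rho_i (1-\rho_j) v = 4 + 3 = 7,\\
\text{share of } i:\quad & \tfrac{0.5}{0.9}\cdot 7 = \tfrac{35}{9} \approx 3.89,\\
\text{share of } j:\quad & \tfrac{0.4}{0.9}\cdot 7 = \tfrac{28}{9} \approx 3.11.
\end{align*}
```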
The UAV game turns out to be a novel variant of the class of well-known weighted congestion-type games, where ρ i can be treated as the weight of player i . The critical difference is that the total value function is not a function of the total weight of the players. To the best of our knowledge, no result is known for the weighted congestion-type games with the total value function we consider in this paper. Hence, in addition to the UAV application, our results may be of independent interest.
We show that a single-step $p$-player ($p \ge 2$) game always has a pure NE (PNE) by showing that it admits the finite improvement property. The existence of a pure equilibrium is particularly important for the practical use of the UAV game, since Nash's theorem [8] only implies the existence of a mixed equilibrium, in which players randomize over their pure strategies to reach a stable outcome. We show that the PoA of the UAV game is at most 2 using the framework of $(\lambda, \mu)$-smoothness [9]. Furthermore, in the case of a homogeneous fleet, i.e., when each UAV has the same $\rho$, we show that the PoA and PoS are at most $2 - 1/p$ and that these bounds are tight. These bounds validate previous empirical results on the basic model, e.g., [4], in an even more general setting.
We also study the role of information fusion in the UAV game. In many situations, UAVs together can obtain more refined information about a region by visiting it simultaneously (via fusion gain) than by visiting it singly, in which case they are awarded a greater payoff for the improved quality. We incorporate this using the fusion parameters $\gamma_j$, $1 \le j \le p$, as follows: when $j$ UAVs visit a region simultaneously, their resulting payoffs are multiplied by $\gamma_j$. By definition, we have $1 = \gamma_1 \le \cdots \le \gamma_p$. We consider two special cases, mild and strong fusion, where the value function is monotone nonincreasing and monotone nondecreasing, respectively (observe that it is nonincreasing in the case of no fusion). For these special cases, we show that a PNE always exists.
In a multiple-step multiplayer game, each UAV needs to decide a walk to visit multiple regions, and its payoff is the sum of partial payoffs it gets by visiting each cell on its walk. We consider two different games, temporal and nontemporal, which differ in the payoff formulations, each being useful in different situations. In the former, the payoff of visiting a region is immediately awarded, and in the latter, it is awarded at the end with no regard to when the region was visited. For both formulations, we show that PNE may not exist. Furthermore, we show that PoA is at most 2, irrespective of the number of UAVs, their sensor capabilities, and the length of the walk. Furthermore, these are strong PoA bounds as they apply to a more general solution concept called (coarse) correlated equilibrium of the game.

B. Related Work
The problem of information collection has been widely studied in the search theory literature. A classical problem here is to maximize the probability of detecting a hidden target, e.g., as in [10]-[16]. On the other hand, Ortiz-Pena et al. [5] associate a potential information gain with each subregion based on an entropy-based function and aim to maximize the total gain. Our basic game model is similar to this and to that of [4] in terms of the discretization of the search space, the time steps, and the payoff formulations. Also, our representation of the surveillance region as a graph is similar to that of, say, [17] and [18]; however, we use the reasonable simplification of unweighted edges and the useful generalization of directed edges.
A large body of the previous work on the routing problem, e.g., [19]-[21], assumes centralized control and full communication among the UAVs and the central controller. In situations where communication is feasible, challenges arise concerning the key communication facets of energy consumption, power transmission, and transmission data rate. Mozaffari et al. [22] investigate optimal UAV deployment with respect to wireless coverage, whereas Sikeridis et al. [23] present a holistic view of the challenges in efficient resource management and optimal communication establishment in the deployment of UAVs. Several other works focusing on these issues, in the context of UAVs and otherwise, have also used game-theoretic models for the corresponding resource allocation problems of minimizing energy or power consumption [24]-[27]. However, as mentioned earlier, in many situations, such as when conducting UAV surveillance in sensitive regions, e.g., [28], the infeasibility of communication is a critical constraint; consequently, full autonomy is desirable. The benefits of autonomous UAVs in various domains have been highlighted, such as in [29] for persistent ISR missions.
A game-theoretic analysis is well suited to tackle this problem under these circumstances. Game-theoretic models have been deployed in numerous other routing problems in transportation and networking applications, such as [30]-[33]. The class of games we formulate in this paper is closely related to the class of congestion games or resource-selection games, and numerous other variants, defined and studied most notably in [34]-[40]. Our class of games has some critical differences from these well-studied classes of games, in terms of cost-sharing protocols and player weights. Hence, the results established for them do not directly apply in our case, which makes our theoretical results on the existence of pure equilibria and the bounds on PoA and PoS interesting and nontrivial.

C. Organization of this Paper
In Section II, we present the UAV game model and establish its relationship to congestion-type games. Section III contains our results for the multiplayer single-step games. The existence of a PNE is shown in Section III-A. Section III-C presents the PoA and PoS results for the case of a homogeneous fleet. Section IV presents our results for the multiplayer multistep games. The results for multistep games with temporal aspects and without temporal aspects are presented in Sections IV-C and IV-D, respectively. In Section V, we perform numerical studies and compare the results for several routing strategies by simulating plausible scenarios with randomly generated game parameters.

A. Preliminaries
Consider a game with $n$ players. Let $S_i$ be the strategy set for player $i$, and $S = \times_i S_i$ be the set of joint strategy profiles. It is known that an NE always exists [8]; however, a PNE, defined as follows, may not exist.
Definition 1: A joint strategy profile $s^* \in S$ is a PNE if no unilateral deviation in strategy by any single player is profitable for that player, that is,
$$\pi_i(s_i^*, s_{-i}^*) \ge \pi_i(s_i, s_{-i}^*) \quad \forall s_i \in S_i,\ \forall i$$
where $\pi_i(s)$ is the payoff function of player $i$, and $s_{-i}$ denotes the strategies of all players except $i$.
The social welfare of an outcome is defined as the sum of the individual payoffs of the players. The PoS and the PoA are the two well-known metrics used in economics and game theory to quantify the inefficiency of equilibria, that is, how bad the social welfare at an equilibrium is compared to the optimum social welfare achievable. In the following definitions, the best and the worst equilibria refer to those which give the maximum and the minimum social welfare among all equilibria.
Definition 2: The PoS is defined as the ratio of the optimal social welfare to the welfare for the best NE. The PoA is defined as the ratio of the optimal social welfare to the welfare for the worst NE.
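The smoothness of these games is defined as follows.
Definition 3 ($(\lambda, \mu)$-smoothness [41]): A payoff-maximization game, i.e., one where each player strives to maximize its payoff, is $(\lambda, \mu)$-smooth if, for every pair of outcomes $s, s^* \in S$,
$$\sum_{i} \pi_i(s_i^*, s_{-i}) \ge \lambda\, W(s^*) - \mu\, W(s)$$
where $W(\cdot)$ denotes the social welfare.
The UAV game we model is closely related to the congestion-type games defined as follows.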
Definition 4: An arbitrary congestion-type payoff-maximization game consists of a resource set $C$ and a player set $P = \{1, \ldots, p\}$, where each resource $c \in C$ has a joint value function $M_c : 2^P \to \mathbb{R}_+$ defined on the subsets of $P$, which describes the worth of the resource as a function of the set of players sharing it. Each $i \in P$ has a strategy set $S_i \subseteq 2^C$ and a per-resource payoff function $U_i : \mathbb{R}_+ \to \mathbb{R}_+$: if $Q \subseteq P$ is the set of players using $c \in C$ (where $i \in Q$), the value of $c$ is $M_c(Q)$ and, consequently, $i$'s payoff for using $c$ is $U_i(M_c(Q))$. The net payoff for $i \in P$ on playing strategy $s_i \in S_i$ is the sum of its payoffs for using the resources $c \in s_i$.
Thus, these games can be defined as a tuple $(P, C, (S_i)_{i \in P}, (M_c)_{c \in C}, (U_i)_{i \in P})$ with each entity as defined earlier. Arbitrary congestion-type cost-minimization games are defined similarly. However, this paper only deals with payoff-maximization games, and for convenience, we simply call these arbitrary congestion-type games. When each strategy consists of a single resource, i.e., $|s_i| = 1$, $\forall s_i \in S_i$, $\forall i \in P$, we call them singleton congestion-type games.
Our game model makes extensive use of multisets, a generalization of the concept of sets, with the distinction that a multiset can have multiple instances of any of its elements.
Definition 5: A multiset can be represented by a two-tuple $(X, m)$, where $X$ is the set of distinct elements in the multiset, and $m$ is the multiplicity function, such that, for each $x \in X$, $m(x)$ is the number of instances of $x$ in the multiset. By convention, we have $x \notin X \Leftrightarrow m(x) = 0$. Thus, $X$ can be viewed as the support set of the function $m$. Hence, we often do not explicitly give the ground set $X$, as $m$ alone precisely captures the containment of an element in the multiset. For a multiset $A$, we denote its multiplicity function by $m_A$. We say, "$A$ is a multiset with support in $X$," to mean that the support of $m_A$ is a subset of $X$. Finally, standard operations on multisets are defined through the multiplicity functions; for instance, containment is given by $A \subseteq B \Leftrightarrow m_A(x) \le m_B(x)$ for all $x$.
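The multiset notions above map directly onto a counting dictionary. The following is a minimal sketch using Python's `collections.Counter`; the helper names `multiplicity`, `support`, and `is_contained` are hypothetical and introduced only for illustration, and containment is taken to mean $m_A(x) \le m_B(x)$ for every $x$.

```python
from collections import Counter

def multiplicity(ms: Counter, x) -> int:
    """Return m(x); elements outside the support have multiplicity 0."""
    return ms[x]  # Counter yields 0 for missing keys without inserting them

def support(ms: Counter) -> set:
    """Support set X = {x : m(x) > 0}."""
    return {x for x, count in ms.items() if count > 0}

def is_contained(a: Counter, b: Counter) -> bool:
    """A is contained in B iff m_A(x) <= m_B(x) for every x."""
    return all(count <= b[x] for x, count in a.items())

# Illustrative multisets of visited cells (cell names are made up).
visited_once = Counter(["c1", "c2"])
visited_twice = Counter(["c1", "c1", "c2"])
```

A multiset is a natural fit here because a walk may enter the same cell more than once, which a plain set cannot record.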

B. Model Description
We model the UAV routing problem as a game between the UAVs. Let $P = \{1, \ldots, p\}$ denote a finite set of $p$ players, each corresponding to a UAV. The geographical region of surveillance is discretized into a finite set $C$ of smaller subregions referred to as cells. The information collection environment is then represented as a directed graph, where the cells are the vertices, and the directed edges capture the connectivity between these cells. Time is also assumed to be discretized into time steps. The division into cells and their connectivity are assumed to be such that moving along any edge and surveilling the subsequent cell together take the same amount of time, namely one time step. The number of time steps for which the game lasts is denoted by $l$. Consequently, the goal for each player is to move in this network for $l$ time steps, capturing the information from the cells visited along the route, so as to maximize the information captured. Thus, the set of strategies for player $i$, denoted by $S_i$, is the set of walks of length $l$ starting from player $i$'s initial cell. The set of "joint strategy profiles," or simply "outcomes," is denoted by $S = \times_{i \in P} S_i$. Each cell has an associated information value given by a function $v : C \to \mathbb{R}_+$. Thus, $v(c)$ denotes the information initially available in cell $c$.
The UAVs have sensors through which they collect the available information: the better the quality of the sensors, the greater the fraction of the available information they can collect. Consequently, each player $i \in P$ is assigned a sensor effectiveness parameter $\rho_i \in (0, 1]$, which determines how much of the information available in the cell it visits the player can collect. Finally, the payoff of $i$, which is simply the collected information, depends on the outcome and is denoted by $\pi_i : S \to \mathbb{R}_+$.
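To make the strategy sets concrete, the sketch below enumerates all walks of a given length from a starting cell in a small directed graph. The adjacency-list encoding, the function name `walks`, and the three-cell example graph are assumptions made for illustration; a walk is reported as the sequence of cells entered after the starting cell.

```python
from typing import Dict, List, Tuple

# Hypothetical adjacency-list encoding: cell -> cells reachable in one time step.
Graph = Dict[str, List[str]]

def walks(graph: Graph, start: str, steps: int) -> List[Tuple[str, ...]]:
    """Enumerate every walk of `steps` moves that begins at `start`.

    Each walk is the tuple of cells entered, in order; the starting
    cell itself is not repeated in the tuple.
    """
    if steps == 0:
        return [()]
    found = []
    for nxt in graph.get(start, []):
        for tail in walks(graph, nxt, steps - 1):
            found.append((nxt,) + tail)
    return found

example_graph: Graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
single_step = walks(example_graph, "a", 1)  # strategies when l = 1
two_step = walks(example_graph, "a", 2)     # strategies when l = 2
```

The number of walks can grow exponentially with the number of steps, so explicit enumeration of this kind is only practical for small instances.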

III. SINGLE-STEP GAMES
In this section, we study single-step games, where the number of time steps is $l = 1$. These games have some nice properties that general multistep games do not. They have a simple structure, which allows us to incorporate an important facet: information fusion. The strategies of a player are simply the cells adjacent to its initial position. We define the payoffs as follows. As mentioned earlier, the parameter $\rho_i$ denotes the fraction of the available information that player $i$ can collect. Hence, player $i$, on visiting cell $c$ alone, gets $\rho_i v(c)$, leaving $(1 - \rho_i)v(c)$ in the cell. In other words, the information value of a cell depletes by a factor of $(1 - \rho_i)$ after $i$'s visit. Consequently, if a set of players $Q \subseteq P$ simultaneously visits $c \in C$, the information available depletes by a factor of $\prod_{i \in Q}(1 - \rho_i)$; accordingly, the total information collected from cell $c$ by the players in $Q$ is $\left(1 - \prod_{i \in Q}(1 - \rho_i)\right)v(c)$, which we define to be the aggregate payoff of the players. Furthermore, each player $i \in Q$ gets, as its individual payoff, a share of the aggregate payoff that is proportional to its $\rho_i$. Thus, this equals
$$\frac{\rho_i}{\sum_{j \in Q}\rho_j}\left(1 - \prod_{j \in Q}(1 - \rho_j)\right)v(c). \tag{2}$$
Next, we extend the model to incorporate information fusion.
Information Fusion: In the case of fusion, a combination of multiple sources of information can be utilized to obtain a greater quality of information. Consequently, UAVs can obtain more refined information from a cell by visiting it simultaneously (via fusion gain) than by visiting it singly, in which case they are awarded a greater payoff for the improved quality. We model this by introducing fusion parameters $\gamma_1, \ldots, \gamma_p$, where $\gamma_j$ captures the enhancement factor in the information quality, as well as in the resultant payoffs, when any $j$ players visit a cell simultaneously. We assume that these parameters are specified a priori as input to the problem, depending on the complementarity and similarity of the UAVs' functionalities.
By definition, we have $\gamma_1 = 1$, since there is no information fusion with just one UAV, and $\gamma_1 \le \cdots \le \gamma_p$, since more UAVs participating in the information fusion should not decrease the fusion gain factor. The case of no information fusion is simply when $\gamma_1 = \cdots = \gamma_p = 1$, and the payoffs are given by the expression obtained in (2). As the payoffs are improved by the appropriate fusion gain factors, the payoff of $i \in P$ on visiting $c \in C$ simultaneously with a set of players $Q \subseteq P$ (where $i \in Q$) equals
$$\gamma_{|Q|}\,\frac{\rho_i}{\sum_{j \in Q}\rho_j}\left(1 - \prod_{j \in Q}(1 - \rho_j)\right)v(c). \tag{3}$$
Note that the payoff function in (3) can be written in the general form $U_i(M_c(Q))$, with $U_i(x) = \rho_i x$ and
$$M_c(Q) = \gamma_{|Q|}\,\frac{1 - \prod_{j \in Q}(1 - \rho_j)}{\sum_{j \in Q}\rho_j}\,v(c).$$
Consequently, the single-step UAV game can be represented as a tuple $(P, C, (S_i)_{i \in P}, (M_c)_{c \in C}, (U_i)_{i \in P})$ with all the entities defined as earlier and, thus, is a singleton congestion-type game. We now characterize a class of singleton congestion-type games that contains this game and show an important result regarding the existence of a PNE for this class of games.
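The payoff rule with fusion can be sketched in a few lines of code. This is an illustrative sketch, not the implementation used in the experiments: the function names and the numeric inputs are assumptions, and `gammas[j - 1]` is taken to hold the fusion parameter for $j$ simultaneous visitors.

```python
from math import prod
from typing import List, Sequence

def aggregate_payoff(v: float, rhos: Sequence[float], gammas: Sequence[float]) -> float:
    """Total information collected from one cell of value v.

    rhos   : sensor effectiveness of the UAVs visiting the cell together
    gammas : gammas[j - 1] is the fusion factor for j simultaneous visitors
    """
    if not rhos:
        return 0.0
    collected_fraction = 1.0 - prod(1.0 - r for r in rhos)
    return gammas[len(rhos) - 1] * collected_fraction * v

def individual_payoffs(v: float, rhos: Sequence[float], gammas: Sequence[float]) -> List[float]:
    """Split the aggregate payoff in proportion to each UAV's rho."""
    total = aggregate_payoff(v, rhos, gammas)
    weight = sum(rhos)
    return [total * r / weight for r in rhos]

# Illustrative numbers only: two UAVs, cell value 10, fusion gain 1.2 for a pair.
total = aggregate_payoff(10.0, [0.5, 0.4], [1.0, 1.2])
shares = individual_payoffs(10.0, [0.5, 0.4], [1.0, 1.2])
```

With these inputs the pair collects 70% of the cell's value before the fusion factor is applied, and the shares always sum to the aggregate payoff.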
Definition 6: A singleton congestion-type game is well behaved if: 1) the per-resource payoff function $U_i$ is monotonically nondecreasing for each $i \in P$ and 2) the value functions $(M_c)_{c \in C}$ are either all monotonically nondecreasing, or all monotonically nonincreasing.
The first condition simply says that if the value of a resource increases, a player's utility for using it should not decrease. The second condition says that the effect of congestion in a resource on its value is always in the same direction for all the resources: either adding more players to any resource never increases its value, or it never decreases it. We need the following lemma to establish that the single-step UAV game belongs to the class of well-behaved singleton congestion-type games.
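Lemma 1: Let $0 < y \le 1$, and let $X$ be a set of $n$ ($\ge 0$) numbers such that $\forall x \in X$, $0 < x \le 1$. Then
$$\frac{y}{y + \sum_{x \in X} x}\left(1 - (1 - y)\prod_{x \in X}(1 - x)\right) \ge y \prod_{x \in X}(1 - x)$$
and, for $n \ge 1$,
$$\frac{1 - \prod_{x \in X}(1 - x)}{\sum_{x \in X} x} \ge \frac{1 - (1 - y)\prod_{x \in X}(1 - x)}{y + \sum_{x \in X} x}.$$
Proof: Let $S = \sum_{x \in X} x$ and $P = \prod_{x \in X}(1 - x)$. Consider the first inequality. Dividing by $y$ and multiplying by $y + S$, it is equivalent to $1 - (1 - y)P \ge (y + S)P$, that is,
$$(1 + S)P \le 1. \tag{4}$$
Next, consider the second inequality. Cross-multiplying gives $(1 - P)(S + y) \ge S(1 - (1 - y)P)$, which simplifies to $y(1 - P) \ge ySP$, that is, again to (4). Thus, proving (4) proves both the inequalities. For $n = 0$, we have $S = 0$ and $P = 1$, so (4) holds with equality. Now, for $n \ge 1$, by the inequality of arithmetic and geometric means, we have $(n - S)/n \ge P^{1/n}$ and hence
$$P \le \left(1 - \frac{S}{n}\right)^n. \tag{5}$$
Also, using the binomial theorem,
$$\left(1 + \frac{S}{n}\right)^n \ge 1 + S. \tag{6}$$
Hence, combining (5) and (6), $(1 + S)P \le (1 + S/n)^n(1 - S/n)^n = (1 - S^2/n^2)^n \le 1$. Thus, this proves (4) as required.
Lemma 2: Single-step games with no fusion are well behaved.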
Proof: Since $U_i(x) = \rho_i x$ for each player $i$ and, thus, is monotonically increasing, it satisfies the first condition for being well behaved. Now, when there is no fusion,
$$M_c(Q) = \frac{1 - \prod_{j \in Q}(1 - \rho_j)}{\sum_{j \in Q}\rho_j}\,v(c).$$
Then, using the second inequality from Lemma 1, we have $M_c(Q) \ge M_c(Q')$ whenever $Q \subset Q'$, which proves the claim.
As shown in Lemma 2, when there is no fusion, the value function for each cell is monotonically nonincreasing. The fusion parameters $\gamma_1, \gamma_2, \ldots, \gamma_p$ are nondecreasing and, thus, need not preserve the monotonicity of the value functions. We next consider two interesting special cases.
Definition 7: Given the players and the sensor effectiveness parameters, we say that information fusion is mild if the fusion parameters are gradually increasing so that all the value functions are monotonically nonincreasing. On the other hand, we say that it is strong if the fusion parameters are so rapidly increasing that all the value functions become monotonically nondecreasing.
By definition, single-step UAV games in both the abovementioned cases are well-behaved singleton congestion-type games.
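For a small fleet, whether a given set of fusion parameters is mild or strong can be checked by brute force over all subsets of players. The sketch below is illustrative: it assumes the value function has the form $M_c(Q) = \gamma_{|Q|}\,(1 - \prod_{j \in Q}(1 - \rho_j))\,v(c) / \sum_{j \in Q}\rho_j$ for nonempty $Q$, and it drops $v(c)$ because a positive constant does not affect monotonicity. The function names and numeric inputs are hypothetical.

```python
from itertools import combinations
from math import prod

def cell_value(rhos, gammas):
    """Value of a unit-information cell shared by players with the given rhos."""
    n = len(rhos)
    return gammas[n - 1] * (1.0 - prod(1.0 - r for r in rhos)) / sum(rhos)

def classify_fusion(rhos, gammas, tol=1e-12):
    """Return 'mild', 'strong', 'both', or 'neither'.

    Compares the value for every nonempty player set Q with the value
    after adding one more player, which suffices for set monotonicity.
    """
    players = range(len(rhos))
    nonincreasing = nondecreasing = True
    for size in range(1, len(rhos)):
        for subset in combinations(players, size):
            base = cell_value([rhos[i] for i in subset], gammas)
            for extra in players:
                if extra in subset:
                    continue
                grown = cell_value([rhos[i] for i in subset] + [rhos[extra]], gammas)
                if grown > base + tol:
                    nonincreasing = False
                if grown < base - tol:
                    nondecreasing = False
    if nonincreasing and nondecreasing:
        return "both"
    if nonincreasing:
        return "mild"
    if nondecreasing:
        return "strong"
    return "neither"

example_rhos = [0.5, 0.4, 0.8]  # illustrative sensor effectiveness values
```

Under this check, the no-fusion parameters fall into the nonincreasing ("mild") category, and parameter sets that are neither mild nor strong are exactly those not covered by the well-behaved class.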

A. Existence of a Pure Nash Equilibrium
We prove the existence of a PNE using the finite improvement property [42] defined as follows.
Definition 8: Finite Improvement Property (FIP): A sequence of strategy-tuples in which each tuple differs from the preceding one in one coordinate (such a sequence is called a path), and the unique deviator in each step strictly increases its payoff (an improvement path), is finite. Clearly, any maximal improvement path is terminated by an equilibrium.
Theorem 1: Every well-behaved singleton congestion-type game admits the FIP and, consequently, has a PNE.
Proof: We extend the argument of [35], which is used there for proving the FIP for symmetric congestion games. Suppose, for a contradiction, that there is an infinite improvement path. Since there are only finitely many joint strategies, there is an improvement cycle, say, of size $k$, given by $\sigma_1, \sigma_2, \ldots, \sigma_k, \sigma_1$, where each $\sigma_j \in S$ is the outcome in the $j$th step. Furthermore, let $Q_j(c)$ denote the set of players going to cell $c$ in the $j$th step of the improvement cycle. Let $C^{\#} = \{c \in C : Q_j(c) \ne Q_{j'}(c) \text{ for some } 1 \le j, j' \le k\}$ denote those cells which are not occupied by the same set of players throughout the whole improvement cycle. For a well-behaved game, we have that $U_i$ is monotonically nondecreasing for each $i \in P$, and we have two possibilities for the value functions $(M_c)_{c \in C}$. First, we prove the result for the case where $(M_c)_{c \in C}$ are all monotonically nonincreasing.
Without loss of generality, suppose the improvement cycle and the cells are enumerated such that $\min_{1 \le j \le k,\, c \in C^{\#}} M_c(Q_j(c)) = M_{c_1}(Q_k(c_1))$ and $Q_{k-1}(c_1) \subsetneq Q_k(c_1)$; such an enumeration exists because $M_{c_1}$ is nonincreasing, so the minimum is attained right after some player joins $c_1$. Thus, the unique deviator between $\sigma_{k-1}$ and $\sigma_k$, without loss of generality, say player 1, must be changing its strategy to $c_1$ from some other cell $c_i$, say. Thus, $c_i \in C^{\#}$, and further, for this deviation to be an improvement for player 1, it must be that $U_1(M_{c_1}(Q_k(c_1))) > U_1(M_{c_i}(Q_{k-1}(c_i)))$ and hence, since $U_1$ is nondecreasing, $M_{c_1}(Q_k(c_1)) > M_{c_i}(Q_{k-1}(c_i))$. This contradicts the assumption that $\min_{1 \le j \le k,\, c \in C^{\#}} M_c(Q_j(c)) = M_{c_1}(Q_k(c_1))$, and hence, there cannot exist an improvement cycle.
The case where (M c ) c∈C are all monotonically nondecreasing can be shown similarly.
Since the single-step game in the cases of mild or strong fusion is well behaved, we obtain the following corollary.
Corollary 1: The single-step game in the cases of mild or strong information fusion has a PNE. Next, we discuss the time complexity of NE computation.

B. Equilibrium Computational Complexity
Computing an NE in a two-player game is, in general, complete for the class polynomial parity arguments on directed graphs (PPAD) [43]. For the single-step UAV game, however, the FIP trivially implies that the problem of computing a PNE is in the class polynomial local search (PLS) [44], as the problem can be reduced to finding a sink in a directed acyclic graph formed over the outcomes (as vertices), with the directed edges capturing the unilateral improvement deviations. This also gives a finite-time algorithm for computing a PNE in the single-step UAV game. For the case of strong information fusion, we give an efficient algorithm based on a greedy strategy. It remains open whether a PNE can be computed efficiently in the mild information fusion case, or whether the problem is PLS-complete, as it is for the closely related class of congestion games.
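The finite-time procedure implied by the FIP amounts to following unilateral improvement moves until none is left. The sketch below illustrates this under stated assumptions: the payoff is taken to be a player's $\rho$-proportional share of the fused aggregate payoff of its cell, each player initially picks the first cell in its list, and all names and numbers are hypothetical. Termination is only guaranteed when the game admits the FIP, so a step limit is included as a safeguard.

```python
from math import prod

def payoff(player, profile, rho, value, gammas):
    """Individual payoff of `player` under `profile` (dict: player -> cell)."""
    cell = profile[player]
    group = [i for i, c in profile.items() if c == cell]
    collected = 1.0 - prod(1.0 - rho[i] for i in group)
    total = gammas[len(group) - 1] * collected * value[cell]
    return total * rho[player] / sum(rho[i] for i in group)

def improving_move(profile, strategies, rho, value, gammas, tol=1e-12):
    """Return (player, cell) for a strictly improving deviation, or None."""
    for i, cells in strategies.items():
        current = payoff(i, profile, rho, value, gammas)
        for c in cells:
            trial = {**profile, i: c}
            if payoff(i, trial, rho, value, gammas) > current + tol:
                return i, c
    return None

def improvement_dynamics(strategies, rho, value, gammas, max_steps=10_000):
    """Follow an improvement path until it ends at a sink, i.e., a PNE."""
    profile = {i: cells[0] for i, cells in strategies.items()}
    for _ in range(max_steps):
        move = improving_move(profile, strategies, rho, value, gammas)
        if move is None:
            return profile
        player, cell = move
        profile[player] = cell
    raise RuntimeError("step limit reached before an equilibrium was found")

# Illustrative two-UAV instance without fusion.
strategies = {1: ["a", "b"], 2: ["a", "b"]}
rho = {1: 0.5, 2: 0.4}
value = {"a": 10.0, "b": 9.0}
gammas = [1.0, 1.0]
equilibrium = improvement_dynamics(strategies, rho, value, gammas)
```

The number of improvement steps is not bounded polynomially by this argument alone, which is why membership in PLS, rather than an efficient algorithm, is what the FIP gives directly.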
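Efficient Algorithm for Strong Information Fusion: In this case, the value function for each cell is monotonically nondecreasing, i.e., the more players there are in a cell, the larger the value of the cell and, in turn, the individual payoff of each player there. Let $Q(c) \subseteq P$ denote the set of players which can visit a cell $c$ in one step as a possible strategy, i.e., $Q(c) = \{i \in P : c \in S_i\}$. The algorithm consists of a number of iterations. Starting with the set of all players and the set of all cells, in each iteration, some players are assigned a particular cell as their strategy to play, and the sets of remaining players and remaining cells are carried forward to the next iteration.
Algorithm 1: Greedy computation of a PNE under strong information fusion
  $P_1 \leftarrow P$, $C_1 \leftarrow C$
  for $i = 1, 2, \ldots$ do
    Terminate if no players remaining
    maxScore $\leftarrow -\infty$
    for each cell $c \in C_i$ with $Q(c) \cap P_i \ne \emptyset$ do
      score $\leftarrow M_c(Q(c) \cap P_i)$
      if score > maxScore then
        maxScore $\leftarrow$ score, $c_i^{\max} \leftarrow c$
      end if
    end for ($c_i^{\max}$ is computed)
    for each player $p \in Q(c_i^{\max}) \cap P_i$ do
      Fix cell $c_i^{\max}$ as strategy for $p$
    end for
    $P_{i+1} \leftarrow P_i \setminus Q(c_i^{\max})$, $C_{i+1} \leftarrow C_i \setminus \{c_i^{\max}\}$
  end for
In iteration $i$, for each remaining cell $c \in C_i$, we compute the value $M_c(Q(c) \cap P_i)$, where $P_i$ is the set of remaining players in that iteration, and $C_i$ is the set of remaining cells. Then, we choose the cell $c_i^{\max}$ for which the value thus computed is maximum and assign this cell as the strategy for all the players in $Q(c_i^{\max}) \cap P_i$. Subsequently, we update the sets of remaining players and cells as $P_{i+1} = P_i \setminus Q(c_i^{\max})$ and $C_{i+1} = C_i \setminus \{c_i^{\max}\}$ and move on to the next iteration.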
Theorem 2: Algorithm 1 computes a PNE of the single-step game with strong information fusion in O(|C| · |P|) time.
Proof: Without loss of generality, let the cells be enumerated such that $\forall i$, $c_i^{\max} = c_i$. Suppose player 1 is assigned the cell $c_i$ by the algorithm. Hence, $1 \notin Q(c_j)$, $\forall j < i$, since otherwise, it would have been assigned a cell before the $i$th iteration. Thus, the only cells player 1 could possibly deviate to are $\{c_j\}_{j > i}$. The payoff of player 1 before deviation, by playing $c_i$, is $U_1(M_{c_i}(Q(c_i) \cap P_i)) \ge U_1(M_{c_j}(Q(c_j) \cap P_i))$, whenever player 1 can access $c_j$. Hence, by the monotonically nondecreasing behavior of each $M_c$, player 1's payoff after deviation to $c_j$, i.e., $U_1(M_{c_j}((Q(c_j) \cap P_j) \cup \{1\}))$, is at most $U_1(M_{c_j}(Q(c_j) \cap P_i))$, which is at most its payoff before deviation. Hence, it has no incentive to switch, and the same argument applies to each player. Hence, the algorithm does produce an NE. It is easy to check that the algorithm takes time $O(|C| \cdot |P|)$.
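The greedy idea behind Algorithm 1 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, the data layout (a dict from player to accessible cells), and the numeric instance are assumptions, and the fusion parameters are chosen so that the value functions are nondecreasing (strong fusion) for the homogeneous example fleet.

```python
from math import prod

def cell_value(cell, group, rho, value, gammas):
    """M_c(Q) for a nonempty group Q of players sharing `cell`."""
    collected = 1.0 - prod(1.0 - rho[i] for i in group)
    return gammas[len(group) - 1] * collected * value[cell] / sum(rho[i] for i in group)

def greedy_pne(strategies, rho, value, gammas):
    """Assign cells greedily: repeatedly fill the cell of maximum value
    with every remaining player that can reach it."""
    remaining_players = set(strategies)
    remaining_cells = {c for cells in strategies.values() for c in cells}
    assignment = {}
    while remaining_players:
        best = None  # (score, cell, group)
        for c in sorted(remaining_cells):
            group = sorted(i for i in remaining_players if c in strategies[i])
            if not group:
                continue
            score = cell_value(c, group, rho, value, gammas)
            if best is None or score > best[0]:
                best = (score, c, group)
        if best is None:
            raise ValueError("some remaining player has no accessible cell")
        _, chosen, group = best
        for i in group:
            assignment[i] = chosen
        remaining_players -= set(group)
        remaining_cells.discard(chosen)
    return assignment

# Illustrative instance: homogeneous fleet, strong fusion (values 1.0, 1.2, 1.4
# per unit of information for one, two, and three co-located UAVs).
strategies = {1: ["a", "b"], 2: ["a", "b"], 3: ["b", "c"]}
rho = {1: 0.5, 2: 0.5, 3: 0.5}
value = {"a": 10.0, "b": 8.0, "c": 9.0}
gammas = [1.0, 1.6, 2.4]
assignment = greedy_pne(strategies, rho, value, gammas)
```

In this instance, cell "a" with UAVs 1 and 2 scores highest in the first iteration, after which UAV 3 prefers "c" over "b"; no player can gain by deviating alone from the resulting assignment.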

C. PoS and PoA Bounds
Since single-step games are special cases of multistep games which we analyze in Section IV, the upper bound of 2 on the PoA and PoS proved there applies here as well and can be shown to be tight. In this section, we establish a stronger bound for the special case of homogeneous fleet, i.e., all players are identical, and there is either no fusion or mild fusion.
When all UAVs have the same $\rho$, the payoff of a player depends only on the number of players it shares a cell with and not on the actual subset. This makes the game an unweighted singleton congestion game. While PoA bounds for these games have been shown for various classes of payoff (or cost) functions, such as affine and polynomial functions, to the best of our knowledge, the payoff function of the UAV game herein does not fall in any of the previously studied classes. Kleinberg and Oren [40] study a class of games called the Project Game. The scenario therein with identical players very closely resembles the setting studied here, and for no information fusion, the following result can be derived from the corresponding results for the Project Game. However, our tight bounds shown below are for the more general case of mild information fusion, which does not follow from [40]. The main result is the following.
Theorem 3: The PoS and PoA, in unweighted singleton congestion games with either no fusion or mild fusion, are at most 2 − 1/ p. Furthermore, these bounds are tight.
Proof: We denote the individual payoff of a player when $n$ players share a cell $c$ by $v_n(c)$, given by $v_n(c) = \gamma_n v(c)\left(1 - (1 - \rho)^n\right)/n$. Let $\sigma_e$ be an equilibrium and let $\sigma_m$ be a joint strategy which provides the maximum social welfare. Suppose, starting with $\sigma_e$, $\sigma_m$ is achieved by a series of deviations, where each deviation refers to a player switching from a cell $c_i$ to a cell $c_j$. Since the players are identical, only the cells involved in a deviation matter, and not the player who deviates. We represent this as a deviation graph $G$, where each cell is a vertex and a deviation from one cell to another is represented as a directed edge. Note that since only the number of players in a cell matters in computing any payoffs, any path in the graph of length more than 1, say between nodes $u$ and $w$, can be replaced by a single edge $(u, w)$, since both equivalently result in the number of players at $u$ decreasing by 1, the number of players at $w$ increasing by 1, and the other cells on the path being unaffected. Thus, $G$ can be reduced to a graph, say $G^*$, that has neither a cycle nor a path of length more than 1. Thus, $G^*$ only has sources, sinks, and isolated vertices. Fig. 1 illustrates this with an example. Vertices such as $c_3$, with a larger in-degree than out-degree in $G$, become sinks in $G^*$. Similarly, vertices like $c_1$, having a larger out-degree in $G$, become sources in $G^*$, and the remaining ones like $c_7$, where the in- and out-degrees are equal in $G$, become isolated in $G^*$.
Now, consider the group of players who are in a cell $c$ at equilibrium. We consider the following three cases.
1) c is an isolated vertex in G * : The payoff of every player here remains the same.
2) c is a sink in $G^*$: There are at least as many players in $c$ at $\sigma_m$ as there were at the equilibrium $\sigma_e$. Hence, the payoff of these players is bounded above by their payoff at the equilibrium.
3) c is a source in $G^*$: Let $x$ be the number of players in $c$ at the equilibrium $\sigma_e$, and suppose $y$ of them ($1 \le y \le x$) leave $c$ to reach $\sigma_m$. Each deviating player moves to a sink, so by the equilibrium condition its payoff at $\sigma_m$ is at most its equilibrium payoff $v_x(c)$, while the payoff of each of the $x - y$ remaining players changes from $v_x(c)$ to $v_{x-y}(c)$. Hence, if all the players deviate from $c$, i.e., $y = x$, the total welfare of this group of $x$ players cannot increase. On the other hand, if $y < x$, the total social welfare for this group of players can increase by a factor of at most
$$\frac{(x-y)\,v_{x-y}(c) + y\,v_x(c)}{x\,v_x(c)} = \frac{\gamma_{x-y}\,\bigl(1-(1-\rho)^{x-y}\bigr)}{\gamma_x\,\bigl(1-(1-\rho)^{x}\bigr)} + \frac{y}{x}.$$
It can be shown that this expression is a monotonically increasing function of $\rho$ for $\rho \in (0, 1]$ (by showing its derivative with respect to $\rho$ to be strictly positive for $\rho \in (0, 1]$ when $y < x$). Hence, its maximum value is attained at $\rho = 1$, where it equals $\gamma_{x-y}/\gamma_x + y/x$. Using $y < x \le p$, we have $\gamma_{x-y}/\gamma_x \le 1$ and $y/x \le (p-1)/p$, giving a bound of $2 - 1/p$ on the welfare gain ratio (which can occur at $y = p - 1$, $x = p$, that is, when all players visit the same cell at equilibrium, and only one of them visits the cell to obtain the maximum social welfare).
Thus, for a group of players which are in a particular cell at equilibrium, the sum of their payoffs either remains the same, decreases, or increases by a factor of at most $2 - 1/p$, as analyzed for the three cases above. Hence, the total social welfare of all the players, which is the sum of the welfare of all such groups, can increase by a factor of at most as much as any of the individual groups, which is $2 - 1/p$, giving us the required bound. (Note that the analysis above holds for any equilibrium, and thus for the worst equilibrium in particular, giving the bound on PoA.)
Furthermore, this bound can be shown to be tight not only for PoA but also for PoS, with the following example. Let there be $p$ players with $\rho = 1$ for each player, and no information fusion. Let there be $p$ cells, with every cell being a valid strategy for every player. Let the information available in the cells be as follows: $v(c_1) = p$ and $v(c) = 1 - \epsilon$ for all $c \ne c_1$, with $\epsilon > 0$.
Clearly, $c_1$ is a dominant strategy for every player, giving a unique equilibrium $(c_1, \ldots, c_1)$. The total welfare here is $p$. However, it is maximum for the joint strategy $(c_1, c_2, \ldots, c_p)$, where it equals $p + (p-1)(1-\epsilon)$. The ratio of the two approaches $2 - 1/p$ as $\epsilon \to 0$.
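The tight example can be checked by brute force on small instances. The following is a minimal sketch, assuming no fusion ($\gamma_n = 1$), $\rho = 1$, and the per-player payoff $v_n(c) = v(c)\,(1-(1-\rho)^n)/n$ used in the proof; the function names (`share`, `poa_pos`, and so on) are illustrative and do not come from the paper.

```python
from itertools import product

def share(v, n, rho=1.0, gamma=1.0):
    """Individual payoff v_n(c) when n identical players share a cell of value v."""
    return gamma * v * (1.0 - (1.0 - rho) ** n) / n

def player_payoffs(profile, values, rho=1.0):
    """Payoff of every player for a joint strategy (one cell index per player)."""
    counts = {c: profile.count(c) for c in set(profile)}
    return [share(values[c], counts[c], rho) for c in profile]

def is_pure_nash(profile, values, rho=1.0):
    """True if no player gains by unilaterally switching to another cell."""
    base = player_payoffs(profile, values, rho)
    for i in range(len(profile)):
        for c in range(len(values)):
            alt = profile[:i] + (c,) + profile[i + 1:]
            if player_payoffs(alt, values, rho)[i] > base[i] + 1e-12:
                return False
    return True

def poa_pos(p, eps):
    """Brute-force PoA and PoS for v(c_1) = p and v(c) = 1 - eps elsewhere."""
    values = [float(p)] + [1.0 - eps] * (p - 1)
    profiles = list(product(range(p), repeat=p))
    welfare = {s: sum(player_payoffs(s, values)) for s in profiles}
    best = max(welfare.values())
    nash = [welfare[s] for s in profiles if is_pure_nash(s, values)]
    return best / min(nash), best / max(nash)

if __name__ == "__main__":
    for p in (2, 3, 4):
        # Compare the brute-force ratios with the limit 2 - 1/p.
        print(p, poa_pos(p, eps=1e-3), 2 - 1 / p)
```

Enumeration grows as $p^p$, so this is only meant for very small $p$; it illustrates that the PoA and PoS coincide in this instance because the equilibrium is unique.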

IV. MULTISTEP GAMES
In this section, we extend our analysis from a single step to multiple steps. Recall the notation from Table II. The payoff of $i$ depends on the outcome and is denoted by $\pi_i : S \to \mathbb{R}_+$. It is the sum of the partial payoffs $i$ gets by visiting each cell on its walk. With a slight abuse of notation, we let $\pi_i(s, c)$ denote $i$'s payoff for visiting $c \in C$ when the outcome is $s \in S$, so that $\pi_i(s) = \sum_{c \in C} \pi_i(s, c)$. Naturally, $\pi_i(s, c)$ is zero if $i$ does not visit $c$ at all when playing $s_i$. However, when it does visit the cell (possibly more than once), the value $\pi_i(s, c)$ can be defined in two different ways depending on the logistics of the real-world scenario, giving rise to two different games: temporal and nontemporal.

A. With Temporal Aspect
In this case, a player gets an instant payoff after visiting a cell (in a manner described below), and these payoffs accumulate to constitute its net payoff. Any player $i$, on visiting a cell $c$, gets a payoff that is a $\rho_i$ fraction of the value left in $c$ at the time of its visit, leaving behind a $(1 - \rho_i)$ fraction of that value. Thus, if a sequence of $k$ players, say $(x_1, x_2, \ldots, x_k)$, visit $c$ one after the other, then the $i$th visitor $x_i$ gets a payoff of $\rho_{x_i} \prod_{j<i} (1 - \rho_{x_j})\, v(c)$ corresponding to that visit; if the same player is also the $j$th visitor for some $j \ne i$, it gets a payoff for each such visit, defined similarly. The combined payoff of all these players from visiting $c$ is
$$\Bigl(1 - \prod_{j=1}^{k} (1 - \rho_{x_j})\Bigr)\, v(c).$$
Note that this combined payoff is independent of the order of the players. Thereby, if these players visit $c$ in the same time step, then we define their aggregate payoff as above, and further, the payoff of $x_i$ as the share of this combined payoff proportional to $\rho_{x_i}$, that is,
$$\frac{\rho_{x_i}}{\sum_{j=1}^{k} \rho_{x_j}} \Bigl(1 - \prod_{j=1}^{k} (1 - \rho_{x_j})\Bigr)\, v(c).$$
Thus, a player's payoff from visiting a cell depends on which players visit before it and which players visit simultaneously.
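The temporal payoff rule for a single cell can be sketched as follows. This is an illustrative implementation under the description above: each time step is given as a list of the players visiting the cell in that step, simultaneous visitors split what they jointly collect in proportion to their $\rho$, and the function name and argument layout are assumptions made for the example.

```python
from math import prod

def temporal_payoffs(value, steps, rho):
    """Payoffs from one cell under the temporal rule.

    value: information initially in the cell, v(c)
    steps: list of lists of player ids, one list per time step, in order
    rho:   dict mapping player id -> sensor effectiveness in (0, 1]
    """
    payoff = {i: 0.0 for i in rho}
    left = value
    for group in steps:
        if not group:
            continue
        remaining_fraction = prod(1.0 - rho[i] for i in group)
        collected = left * (1.0 - remaining_fraction)   # combined payoff of the group
        total_rho = sum(rho[i] for i in group)
        for i in group:
            payoff[i] += collected * rho[i] / total_rho  # share proportional to rho_i
        left *= remaining_fraction                        # value left behind
    return payoff

if __name__ == "__main__":
    rho = {1: 0.5, 2: 0.25, 3: 0.8}
    print(temporal_payoffs(1.0, [[1], [2], [3]], rho))
    print(temporal_payoffs(1.0, [[1, 2, 3]], rho))
```

Individual payoffs differ between the two calls, but their sum is the same, which is the order independence of the combined payoff noted in the text.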

B. Without Temporal Aspect
In this case, the payoff from visiting a cell is determined at the end of the game, regardless of the order in which the players visit the cell. Since the order is immaterial, we can represent the visitors of a cell $c$ as a multiset, say $P'$, having support in $P$ and an associated multiplicity function denoted by $m_{P'}(\cdot)$. In case of no ambiguity, we drop the subscript and denote it simply as $m(\cdot)$. The payoff of a visitor from a single visit is precisely as in (8), and thus, with possibly multiple visits, the payoff of $i \in P'$ is given by
$$\pi_i(s, c) = \frac{m(i)\,\rho_i}{\sum_{j \in P'} m(j)\,\rho_j} \Bigl(1 - \prod_{j \in P'} (1 - \rho_j)^{m(j)}\Bigr)\, v(c).$$
Next, we establish results for both these games on the existence of pure equilibria, PoA bounds, etc.
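A small sketch of the nontemporal rule follows. It assumes that every individual visit earns a share proportional to the visitor's $\rho$, so that a player with $m(i)$ visits receives the fraction $m(i)\rho_i / \sum_j m(j)\rho_j$ of the combined payoff; this reading, like the function name, is an assumption made for illustration.

```python
from collections import Counter
from math import prod

def nontemporal_payoffs(value, visits, rho):
    """Payoffs from one cell when only the multiset of visitors matters.

    value:  information initially in the cell, v(c)
    visits: Counter mapping player id -> number of visits m(i)
    rho:    dict mapping player id -> sensor effectiveness in (0, 1]
    """
    left_fraction = prod((1.0 - rho[i]) ** m for i, m in visits.items())
    collected = value * (1.0 - left_fraction)            # combined payoff
    weight = sum(m * rho[i] for i, m in visits.items())  # total rho-weighted visits
    return {i: collected * m * rho[i] / weight for i, m in visits.items()}

if __name__ == "__main__":
    rho = {1: 1.0, 2: 1.0, 3: 1.0}
    print(nontemporal_payoffs(1.0, Counter({1: 1, 2: 2, 3: 2}), rho))
```

With three players of $\rho = 1$ visiting once, twice, and twice, the players with two visits each receive $2/5$ of the cell's value.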

C. Multistep Games With Temporal Aspect
In this section, we analyze the game with a temporal aspect. As discussed earlier, the payoff of a player from visiting a cell depends not only on which players visit the cell but also on the order in which they visit it. The combined payoff, however, when a sequence of players visit a cell $c$ (some of them possibly simultaneously), does not depend on their order, and can be easily computed as in (8). Let $A$, $B$ be multisets with support in $P$. For a cell $c$, let $\pi^A_B(c)$ denote the combined payoff which the visitors in $B$ would obtain by visiting cell $c$ (as many times as the respective multiplicities in $B$) when preceded by all (and only) the visitors represented by $A$. Note that the multiset representation is sufficient for this to be well defined, since the order of visitors in $A$ among themselves, and similarly of those in $B$ among themselves, does not matter when computing the said combined payoff. Indeed, the exact expression is
$$\pi^A_B(c) = v(c) \prod_{j \in A} (1 - \rho_j)^{m_A(j)} \Bigl(1 - \prod_{j \in B} (1 - \rho_j)^{m_B(j)}\Bigr).$$
Here, the quantity $\prod_{j \in A} (1 - \rho_j)^{m_A(j)}$ denotes the fraction of $v(c)$ left in $c$ after the visitors in $A$ have visited, and the fraction of it collected by $B$ is computed similarly.
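The combined payoff $\pi^A_B(c)$ can be sketched with multisets represented as `Counter` objects. The code below follows the verbal description (fraction left after $A$, times the fraction of the remainder collected by $B$); the helper names are illustrative.

```python
from collections import Counter
from math import prod

def left_fraction(visitors, rho):
    """Fraction of a cell's value left after all visits in the multiset `visitors`."""
    return prod((1.0 - rho[j]) ** m for j, m in visitors.items())

def combined_payoff(value, A, B, rho):
    """pi^A_B(c): combined payoff of the visitors in B when preceded by those in A."""
    return value * left_fraction(A, rho) * (1.0 - left_fraction(B, rho))

if __name__ == "__main__":
    rho = {1: 0.5, 2: 0.25}
    A, B = Counter({1: 2}), Counter({1: 1, 2: 1})
    empty = Counter()
    # Splitting the visitors into "first A, then B" does not change the total.
    print(combined_payoff(2.0, empty, A + B, rho))
    print(combined_payoff(2.0, empty, A, rho) + combined_payoff(2.0, A, B, rho))
```

The two printed values agree, and enlarging the preceding multiset $A$ can only shrink what $B$ collects, since less value is left in the cell.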

Lemma 3: Let A, A , B, B , and D be multisets with support in P such that A ⊆ A and B ⊆ B. Then, (1) π
Rearranging gives the required result.
Next, suppose $c$ is a cell, and $A$ is a multiset with support in $P$. For each player $j$, let $S'_j \subseteq S_j$ denote the subset of strategies in which $j$ visits $c$ exactly $m_A(j)$ times. Then, $S' = \times_j S'_j$ is the set of outcomes for which the multiset $A$ precisely captures which players visit cell $c$ and how often. In case of such an outcome, we refer to $A$ as the visitor set for $c$. Also, for any multiset $A$ and a player $i \in A$, let $A|_i$ denote the multiset $m_A(i) \otimes \{i\}$, i.e., $A|_i$ contains only $i$, with the same multiplicity as in $A$, and let $A|_{-i}$ denote the multiset $A \setminus A|_i$. The following lemma shows an important result.
Lemma 4: Let $A$ be any multiset with support in $P$, $i \in A$, $c$ be any cell, and $S' \subseteq S$ be the set of outcomes for which $A$ is the visitor set for $c$. Then, $\forall s \in S'$, $\pi_i(s, c) \ge \pi^{A|_{-i}}_{A|_i}(c)$.
Proof: Let $s$ be an outcome with the visitor set for $c$ being $A$. Suppose player $i$ visits $c$ $m$ times when the visitor set is $A$. The outcome $s$ can be naturally associated with two well-defined sequences $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_m$ as follows. For each $j$, $X_j$ denotes the set of players visiting $c$ in the same time step as the $j$th visit of $i$, and $Y_j$ denotes the multiset of all the visitors visiting strictly before. Naturally, each $X_j$ must contain $i$, and each $Y_j$ must contain $i$ with a multiplicity of $j - 1$. Also, we have $\emptyset \subseteq Y_1 \subset \ldots \subset Y_m \subseteq A \setminus \{i\}$ by definition. Let $v$ be the information initially available in $c$. The combined payoff of the visitors in $X_j$, as per our notation, is $\pi^{Y_j}_{X_j}(c)$. Player $i$ gets a share of it proportional to $\rho_i$, which summed over all visits gives its total payoff from visiting $c$:
$$\pi_i(s, c) = \sum_{j=1}^{m} \frac{\rho_i}{\sum_{k \in X_j} \rho_k}\, \pi^{Y_j}_{X_j}(c). \tag{11}$$
For each $j$, let $v_j$ denote the information available in $c$ just before the visitors in $X_j$ visit, which evaluates to $v_j = v \prod_{k \in Y_j} (1 - \rho_k)^{m_{Y_j}(k)}$. Then (11) can be written as
$$\pi_i(s, c) = \sum_{j=1}^{m} \frac{\rho_i}{\sum_{k \in X_j} \rho_k}\, v_j \Bigl(1 - \prod_{k \in X_j} (1 - \rho_k)\Bigr).$$
Now, let $s'$ be another outcome with similarly defined sequences $X'_1, \ldots, X'_m$ and $Y'_1, \ldots, Y'_m$ such that the only difference from $s$ is that, for each $j$, the players visiting $c$ in the same time step as the $j$th visit of player $i$ in the outcome $s$ now visit strictly before it in $s'$. Formally, for each $j$, we have $Y'_j = Y_j \uplus (X_j \setminus \{i\})$ and $X'_j = \{i\}$. Thus, the visitor set remains $A$ for $s'$. Now, by definition, $v_j$ is the information available in $c$ after the visitors in $Y_j$ have visited it. Hence, the information left after the players in $X_j \setminus \{i\}$ subsequently visit it is $v_j \prod_{k \in X_j \setminus \{i\}} (1 - \rho_k)$.
Hence, as player $i$'s $j$th visit to $c$ follows, it gets a payoff that is a $\rho_i$ fraction of the value available, and this summed over all the visits gives the total payoff of player $i$ from visiting $c$ for the outcome $s'$:
$$\pi_i(s', c) = \sum_{j=1}^{m} \rho_i\, v_j \prod_{k \in X_j \setminus \{i\}} (1 - \rho_k).$$
Now, applying the first inequality from Lemma 1 on $y = \rho_i$, $X = \{\rho_k \mid k \in X_j \setminus \{i\}\}$, we get
$$\pi_i(s, c) \ge \pi_i(s', c). \tag{14}$$
Furthermore, let $s''$ be another outcome with similarly defined sequences $X''_1, \ldots, X''_m$ and $Y''_1, \ldots, Y''_m$ and the visitor set for $c$ still being $A$, such that all the visits of player $i$ are strictly after all the visits of all the other players. Formally, for each $j$, $Y''_j = A|_{-i} \uplus \bigl((j-1) \otimes \{i\}\bigr)$ and $X''_j = \{i\}$. Now, since $s'$ and $s''$ are such that all the visits of player $i$ are unaccompanied, we can write its payoff from the $j$th visit as simply $\pi^{Y'_j}_{\{i\}}(c)$ and $\pi^{Y''_j}_{\{i\}}(c)$, respectively. Now, the multiplicity of $i$ is the same ($= j - 1$) in $Y'_j$ and $Y''_j$, whereas all other players reside in $Y''_j$ with the maximum multiplicity possible for the visitor set $A$. Thus, $Y'_j \subseteq Y''_j$, and
$$\pi^{Y'_j}_{\{i\}}(c) \ge \pi^{Y''_j}_{\{i\}}(c). \tag{15}$$
Thus, it follows from (14) and (15) that $\pi_i(s, c) \ge \pi_i(s', c) \ge \pi_i(s'', c) = \pi^{A|_{-i}}_{A|_i}(c)$. Since this holds for any outcome $s$ for which the visitor set for cell $c$ is $A$, the lemma is proven.
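Using these, we now show that this game is (1,1)-smooth. Let $s$ and $s^*$ be any two outcomes. For every player $i$, let $q_i$ denote the outcome $(s^*_i, s_{-i})$. For any cell $c$, let multisets $A_c$ and $A^*_c$ denote the visitor sets for cell $c$ when the outcomes are $s$ and $s^*$, respectively. Note that when the outcome is $q_i$, the visitor set of cell $c$ can be written as $A^*_c|_i \uplus A_c|_{-i}$. With this notation, we can write
$$\sum_{i \in P} \pi_i(s) = \sum_{c \in C} \pi^{\emptyset}_{A_c}(c), \tag{16}$$
$$\sum_{i \in P} \pi_i(s^*) = \sum_{c \in C} \pi^{\emptyset}_{A^*_c}(c). \tag{17}$$
Finally, for the outcomes $q_i$, we can write $\sum_{i \in P} \pi_i(q_i) = \sum_{i \in P} \sum_{c \in C} \pi_i(q_i, c)$. Now, if player $i$ does not visit cell $c$ when playing strategy $s^*_i$, equivalently, if it is not contained in the visitor set $A^*_c$, its payoff from visiting $c$ is simply zero. Hence, we get
$$\sum_{i \in P} \pi_i(q_i) = \sum_{c \in C} \sum_{i \in A^*_c} \pi_i(q_i, c) \ge \sum_{c \in C} \sum_{i \in A^*_c} \pi^{A_c|_{-i}}_{A^*_c|_i}(c). \tag{18}$$
Here, (18) follows from Lemma 4. Finally, adding (16) and (18) and subtracting (17) gives
$$\sum_{i \in P} \pi_i(q_i) + \sum_{i \in P} \pi_i(s) - \sum_{i \in P} \pi_i(s^*) \ge \sum_{c \in C} \Bigl(\sum_{i \in A^*_c} \pi^{A_c|_{-i}}_{A^*_c|_i}(c) + \pi^{\emptyset}_{A_c}(c) - \pi^{\emptyset}_{A^*_c}(c)\Bigr). \tag{19}$$
Now, we show that each term of the summation on the right-hand side of (19) is always nonnegative.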
Lemma 5: Let $A$ and $A^*$ be the visitor sets of a cell $c \in C$ for outcomes $s, s^* \in S$, respectively. Then
$$\sum_{i \in A^*} \pi^{A|_{-i}}_{A^*|_i}(c) + \pi^{\emptyset}_{A}(c) - \pi^{\emptyset}_{A^*}(c) \ge 0.$$
Proof: We will show that
$$\sum_{i \in A^*} \pi^{A|_{-i}}_{A^*|_i}(c) + \pi^{\emptyset}_{A}(c) \ge \pi^{\emptyset}_{A^*}(c) \tag{20}$$
holds, via an intermediate inequality (21). Recall the definitions related to multisets from Section IV. It is easy to see that the left-hand side of (20) is not less than that of (21), since the latter possibly excludes some terms in the summation, and each term is nonnegative by definition. It is also easy to see that the right-hand side of (20) is no greater than that of (21) by Lemma 3. Thus, it suffices to prove (21) to prove this lemma. Now, we prove (21) by induction on the number of players in $A^* \setminus A$, denoted by, say, $a$. The base case $a = 0$ is when $A^* \subseteq A$, i.e., $A^* \setminus A = \emptyset$. This holds trivially, as both sides of the inequality to be proven become equal to $\pi^{\emptyset}_{A}(c)$. Assume, as the inductive hypothesis, that (21) holds whenever $a < a_0$, for some $a_0 \in \mathbb{Z}_+$. Now, consider the case when $a = a_0$. Arbitrarily fix some $x \in A^* \setminus A$. Then, $(A^* \setminus A)|_{-x}$, or equivalently, $A^*|_{-x} \setminus A$, has $a_0 - 1$ distinct elements. We then get (21) for $a = a_0$, using Lemma 3 twice in the derivation.
Fig. 2. Two-player game with temporal aspect. The only equilibrium has a welfare of 1, the maximum possible being $2 - \epsilon$, showing that the PoA can be arbitrarily close to 2.
Hence, this completes the inductive step and the proof by induction for (21), as required. Thus, it follows from (20) and (19) that $\sum_{i \in P} \pi_i(q_i) + \sum_{i \in P} \pi_i(s) \ge \sum_{i \in P} \pi_i(s^*)$. Hence, using Definition 3, we have the desired result as follows.
Theorem 4: The multistep game with temporal aspect is (1, 1)-smooth.
As shown in [41], a $(\lambda, \mu)$-smooth payoff-maximization game has a PoA of at most $(1 + \mu)/\lambda$, and this bound applies to the PoA with respect to all equilibrium concepts (mixed and correlated, not just pure).
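For pure equilibria, the step from smoothness to the PoA bound is short enough to write out. The derivation below is a sketch that assumes Definition 3 is the standard smoothness condition for payoff-maximization games, stated in the first comment line, with $W(s) = \sum_{i \in P} \pi_i(s)$ denoting social welfare, $s$ a PNE, and $s^*$ a welfare-maximizing outcome.

```latex
% Assumed smoothness condition, for all outcomes s and s^*:
%   \sum_{i \in P} \pi_i(s_i^*, s_{-i}) \ge \lambda\, W(s^*) - \mu\, W(s).
\begin{align*}
W(s) = \sum_{i \in P} \pi_i(s)
  &\ge \sum_{i \in P} \pi_i(s_i^*, s_{-i})
     && \text{(no player gains by deviating from the PNE $s$)} \\
  &\ge \lambda\, W(s^*) - \mu\, W(s)
     && \text{(smoothness)} \\
\Longrightarrow \quad \frac{W(s^*)}{W(s)}
  &\le \frac{1 + \mu}{\lambda} = 2
     && \text{(for $\lambda = \mu = 1$).}
\end{align*}
```

The extension to mixed and correlated equilibria uses the same inequality in expectation, as established in [41].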
Corollary 2: The multistep game with temporal aspect has a PoA of at most 2.
Thus, the game has a constant PoA bound independent of the number of players and the number of time steps. We now show with an example that this bound is tight.
1) Tight Example: Consider the game shown in Fig. 2. Let the number of time steps be $l$ ($\ge 2$). The graph is a simple path, with cells $c_i$ and $c_{i+1}$ being neighbors of each other for each $i$. Initially, the information in all the cells is 0, except for $c_l$ and $c_{2l}$, which hold information of 1 and $1 - \epsilon$, respectively, where $\epsilon$ is a small positive constant. There are two players with sensor effectiveness $\rho_1 = \rho_2 = 1$, initially in cells $c_0$ and $c_{2l-1}$, respectively. Observe that player 2 cannot grab information from both $c_l$ and $c_{2l}$ within the $l$ available time steps. Furthermore, the path $c_{2l-1} \to c_{2l-2} \to \ldots \to c_l \to c_{l-1}$ is a dominant strategy for player 2, in which case it gets a payoff of 1 and, in response, player 1 gets 0 from any strategy. Clearly, this is a pure equilibrium, leading to a social welfare of 1. On the other hand, if player 2 captures information from (only) $c_{2l}$, allowing player 1 to capture from $c_l$, which it can reach in the $l$th time step, then the social welfare reaches its maximum value of $2 - \epsilon$. Since there is only one equilibrium, the PoS as well as the PoA for this game is $2 - \epsilon$, i.e., it approaches 2 as $\epsilon$ approaches 0. Thus, we have the following theorem.
Theorem 5: The multistep game with temporal aspect has a PoS and PoA of 2.
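The two-player path example can be verified by exhaustive enumeration for small $l$. The sketch below makes two assumptions that are not stated in the text: a player must move to an adjacent cell at every time step (no waiting), and a player collects only from cells it moves into. All function names are illustrative.

```python
from itertools import product

def walks(start, steps, n_cells):
    """All walks of `steps` moves on the path c_0 .. c_{n_cells-1}."""
    found = []
    for moves in product((-1, 1), repeat=steps):
        pos, walk = start, []
        for move in moves:
            pos += move
            if not 0 <= pos < n_cells:
                break                      # walked off the path: discard
            walk.append(pos)
        else:
            found.append(tuple(walk))
    return found

def payoffs(walk1, walk2, info):
    """Temporal payoffs of two players with rho = 1 (first visitor takes all)."""
    left = list(info)
    pay = [0.0, 0.0]
    for a, b in zip(walk1, walk2):
        if a == b:                         # simultaneous visit: equal shares
            pay[0] += left[a] / 2
            pay[1] += left[a] / 2
        else:
            pay[0] += left[a]
            pay[1] += left[b]
        left[a] = left[b] = 0.0            # rho = 1 leaves nothing behind
    return tuple(pay)

def poa_and_pos(l, eps):
    """Brute-force PoA and PoS of the path example with l time steps."""
    n_cells = 2 * l + 1
    info = [0.0] * n_cells
    info[l], info[2 * l] = 1.0, 1.0 - eps
    s1, s2 = walks(0, l, n_cells), walks(2 * l - 1, l, n_cells)
    table = {(a, b): payoffs(a, b, info) for a in s1 for b in s2}

    def is_nash(a, b):
        u1, u2 = table[(a, b)]
        return (all(table[(x, b)][0] <= u1 + 1e-12 for x in s1)
                and all(table[(a, y)][1] <= u2 + 1e-12 for y in s2))

    welfare = {k: sum(v) for k, v in table.items()}
    nash = [welfare[k] for k in table if is_nash(*k)]
    best = max(welfare.values())
    return best / min(nash), best / max(nash)

if __name__ == "__main__":
    print(poa_and_pos(3, 0.01))
```

Several strategy profiles are equilibria in this enumeration (player 1 is indifferent among its walks), but they all have welfare 1, so the PoA and PoS coincide.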
2) Existence of Pure Equilibria: Unlike the single-step game, the general multistep game may not always have a PNE, as demonstrated by the following example. Consider the game as shown in Fig. 3. The connectivity between cells is given by the directed edges and the information initially available in each cell is shown. The number of time steps is 4. Players 1 and 2, with sensor effectiveness ρ 1 = ρ 2 = 1, are initially in cells P and Q, respectively. Thus, player 1 has two strategies: paths P → A → C → E → D and P → A → B → D → E. Let these be called "Left" and "Right," respectively. Player 2 similarly has two strategies, say "Up" and "Down" corresponding to paths Q → U → V → B → D and Q → K → L → M → N. Then, the payoff matrix is given by It is easy to check that there is no PNE in this case.

D. Multistep Games Without Temporal Aspect
In this section, we analyze the game without temporal aspect. Here, the payoff of a player from a visit to a cell depends only on which players visit the cell over the complete course of the game and how many times, regardless of the order in which they visit the cell. Thereby, the combined payoff of players from their visits to a cell $c$ (some of them possibly simultaneously) also does not depend on the order of visits, and can be easily computed using (9). Because the setting of this game differs from the one with temporal aspect, we opt for a slightly different notation. Let $A$, $B$ be multisets with support in $P$, with $m_A$, $m_B$ the respective multiplicity functions, such that $B \subseteq A$. Then, for a cell $c$, let $\theta^A_B(c)$ denote the combined payoff which the visitors in $B$ would obtain by visiting cell $c$ (as many times as the respective multiplicities in $B$), when the complete set of visitors for $c$ is given by $A$. Naturally, this is only meaningful when $B \subseteq A$.
Lemma 6: Let A, B, B , and D be multisets with support in P subject to B ⊆ B ⊆ A and B D ⊆ A. Then, (1) Proof: The first part follows from definition, as both sides equal the combined payoff of visitors in B and D when the complete set of visitors is given by A. Next, applying the first part on A, B , B \ B , respectively, we get θ Rearranging gives the result in the second part. For the third part, note that in a game with temporal aspect, in the case when the visitor set of a cell c is A, one possible outcome s corresponds to all the visitors in A visiting in the same time step, and thus, θ A A| i (c) is a possible payoff of player i from cell c when its visitor set is fixed to A. Consequently, the result follows from Lemma 4.
To show smoothness for this game, we proceed similarly as in the game with temporal aspect. Let s and s * be any two outcomes. For every player i , let q i denote the outcome (s * i , s −i ). For any cell c, let multisets A c and A * c denote the visitor sets for cell c when the outcomes are s and s * respectively. Analogous to (16), (17), and (18), we get Next, adding the first and third equations, and subtracting the second gives As seen in Section IV-C, the right-hand side of the above is nonnegative by Lemma 5, and in turn, so is the left-hand side, which provides the desired result and the corollary using [41].
Corollary 3: The multistep game without temporal aspect has a PoA at most 2.
Next, we demonstrate the tightness of this bound.

E. Tight Example
Consider the game as shown in Fig. 4. The set of players is strategy c i 0 → c → c → . . . → c → c gives every player $i > 1$ a payoff which evaluates to $l/((p-1)l+1)$. Thus, no player $i$ wants to deviate to the other possible strategy $c^i_0 \to c^i_1 \to \ldots \to c^i_l$, as it gives a smaller payoff of $l/((p-1)l+1) - \epsilon$. Thus, the aforesaid outcome is a PNE, which has a social welfare of 1. However, it can be seen that the social welfare increases as more and more players switch to the respective alternative strategy, and in the extreme case of every player $i > 1$ switching to the respective strategy $c^i_0 \to c^i_1 \to \ldots \to c^i_l$, the social welfare reaches the maximum value of $2 - 1/((p-1)l+1) - (p-1)\epsilon$, giving the same value of PoA. Thus, as $\epsilon \to 0$, it approaches $2 - 1/((p-1)l+1)$, which, in turn, can become arbitrarily close to 2 if the parameters $p$ or $l$ become arbitrarily large, showing that the bound of 2 is tight.
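The arithmetic of this example can be checked with exact fractions. The sketch below takes the payoffs stated above as given and assumes, consistently with the stated maximum, that player 1 collects one full unit of information once every other player has switched to its alternative walk; the function name is illustrative.

```python
from fractions import Fraction

def tight_example_poa(p, l, eps):
    """Ratio of maximum welfare to PNE welfare in the nontemporal tight example."""
    shared = Fraction(l, (p - 1) * l + 1)   # payoff of each player i > 1 at the PNE
    alone = shared - eps                    # payoff of i > 1 on its alternative walk
    nash_welfare = Fraction(1)              # PNE welfare stated in the text
    # Assumption: player 1 then collects 1, and the p - 1 others collect `alone` each.
    max_welfare = 1 + (p - 1) * alone
    return max_welfare / nash_welfare

if __name__ == "__main__":
    eps = Fraction(1, 1000)
    for p, l in [(3, 4), (10, 10), (100, 100)]:
        print(p, l, float(tight_example_poa(p, l, eps)))
```

With $\epsilon$ fixed, increasing $p$ also increases the $(p-1)\epsilon$ term, so approaching 2 requires letting $\epsilon \to 0$ before (or faster than) $p$ grows.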
Existence of PNE: The following example shows that, unlike the single-step game, the general multistep game without temporal aspect may not always have a PNE.
Consider the game as shown in Fig. 6. The connectivity between cells is given by the directed edges, and the information initially available in each cell is shown. The number of time steps is 3. Players 1 and 2, with sensor effectiveness $\rho_1 = \rho_2 = 0.8$, are initially in cells P and Q, respectively. Thus, player 1 has two strategies: paths $P \to A_1 \to A_2 \to A$ and $P \to B_1 \to B_2 \to B$. Let these be called "$s_A$" and "$s_B$," respectively. Player 2 has eight strategies; however, since the sequence of the visits does not matter, there are four distinct ones. Let these be called "$A_3B_0$," "$A_2B_1$," "$A_1B_2$," and "$A_0B_3$," where "$A_iB_j$" denotes a strategy which visits A $i$ times and B $j$ times. Then, the payoff matrix for this game is given by It follows that there is no pure equilibrium in this case.

F. Maximum Social Welfare Computational Complexity
In Section V, we propose routing policies which achieve near-optimal social welfare and can be computed fast. The need for such policies arises from the hardness of computing the maximum social welfare (MSW) (and a strategy that yields it) in the multistep UAV games. We call this problem UAV-MSW in short. For our experiments, we consider a surveillance environment in the form of a grid of cells-the corresponding graph can be seen to be a more generalized version of the rectangular subgrid graph (RSG) defined as follows [45].
Definition 9: Let G ∞ be the infinite (undirected) graph whose vertex set consists of all points of the plane with integer coordinates and in which two vertices are connected if and only if the (Euclidean) distance between them is equal to 1.
If the edges are directed, we call the graph a directed RSG (DRSG). We describe the experimental setup in Section V, which shows how DRSGs are a special case of the graphs we consider. Hence, to establish the computational difficulty, it suffices to show the following theorem.
Theorem 7: UAV-MSW on DRSGs is NP-hard.
Proof: The Hamiltonian path (HP) problem is the decision problem of determining whether there exists a path in a given graph which visits all the vertices exactly once. Solving HP on RSGs is shown to be NP-complete in [45]. This can be reduced to UAV-MSW on RSGs as follows. Let G = (C, E) be an m × n RSG given as input for the HP problem.  |C|−1. This reduction requires polynomially many calls to the UAV-MSW oracle, and this shows that UAV-MSW is NP-hard. (Note that the reduction works regardless of whether the game is with or without the temporal aspect.)

V. EXPERIMENTATION
In a competitive game environment like in the UAV problem, a typical solution is a PNE. As seen from the previous sections, however, a PNE may not exist in the general multistep game. In this section, we devise alternative easily implementable routing policies with reasonable performance guarantees. We propose such policies and simulate plausible problem scenarios in the temporal setting with randomly generated game parameters and evaluate these policies for their performance and study the dependence on various game parameters.
Previous work using similar models, in particular [4], has studied fundamentally similar policies; however, it does not provide significant insights into the social welfare performance, which is the goal of our experimental evaluation. In addition, we provide the analysis for much broader parameter settings in terms of the number of players, the number of time steps, the search arena size and connectivity, and the information distributions. We discuss this setup in detail as follows.

A. Setup
The randomly generated game instances are in the following setting. The set of cells is in the form of a grid of fixed dimensions (which can be varied across different scenarios to study the dependence on the grid size), which is a common representation, such as in [28]. To allow arbitrary connectivity constraints, we have a cell connectivity parameter δ which functions as follows. For each cell, each of the cells within a Chebyshev distance of 1 (i.e., rowwise, columnwise, or diagonalwise adjacent cells and the cell itself) is independently chosen to be its out-neighbor with probability δ = 0.5. Since DRSGs can have edges only between cells that are at Euclidean distance 1, it follows that the set of DRSGs is a subset of the grid environments we consider, and consequently, UAV-MSW is NP-hard due to Theorem 7. In every game instance, the sensor effectiveness parameters of the UAVs are In many real-world problems such as target detection, a few independent cells are of high value, and the value decays as one moves away from these cells. Such cells of the highest value are referred to as the peaks. The cell information values for these scenarios are set as follows.
k peaks (k > 0): We first fix k peaks p 1 , . . . , p k as points from the grid chosen u.a.r. independently. Then, the information value around a peak p i as a function of distance d from it is given by a generalized normal distribution

B. Zero Peaks
In this scenario, cell information values are simply chosen u.a.r. from [0, 1]. Fig. 5 shows randomly generated instances on 6×6 grids for these scenarios. To study any of the scenarios, we generate a large number of instances n with all the parameters randomly chosen as described earlier. We let the UAVs apply a routing policy to move in the search space and collect information accordingly. Our experiments evaluate the performance of these policies and the dependence on various game parameters.
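The connectivity rule of the setup and the zero-peak information values can be sketched as follows. This is an illustrative generator only: the function name, the `seed` argument, and the dictionary layout are assumptions, and it does not cover the peaked scenarios or the sampling of sensor parameters.

```python
import random

def random_grid(rows, cols, delta=0.5, seed=None):
    """Random grid instance for the zero-peak scenario.

    Returns (out_neighbors, values): out_neighbors maps each cell (r, c) to the
    list of cells it has a directed edge to; values maps each cell to its
    information value drawn uniformly at random from [0, 1).
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    out_neighbors = {}
    for r, c in cells:
        # Cells within Chebyshev distance 1, including the cell itself.
        candidates = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
        # Each candidate becomes an out-neighbor independently with probability delta.
        out_neighbors[(r, c)] = [n for n in candidates if rng.random() < delta]
    values = {cell: rng.random() for cell in cells}
    return out_neighbors, values

if __name__ == "__main__":
    neighbors, values = random_grid(6, 6, delta=0.5, seed=0)
    print(len(neighbors), sum(len(n) for n in neighbors.values()))
```

With this rule a cell can end up with no out-neighbors at all; how such cells are handled is not specified in the text, so a practical generator would need an explicit policy for them.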

C. Routing Policies
We define the routing policies with a parameter τ, called the horizon, which denotes the length of the walks that the players consider as strategies. This is often the case in the UAV routing domain, due to limitations on computational power and/or the visual sensor span, etc. Thus, the total number of time steps l could be much larger, and the UAVs route by successively committing to walks of length τ. We first describe some rudimentary policies which are later used in describing our proposed policies and are also useful for comparison as baselines.
1) Greedy Parameter τ : In this policy, the strategies considered at each step are walks of length τ . The strategy which yields the highest payoff to the player disregarding other UAVs in the environment is chosen under this policy.
2) Stepwise Social Welfare Maximizer Parameter τ: This policy also considers walks of length τ as strategies. Each player computes the joint strategy which induces the maximum social welfare and plays the individual strategy corresponding to this outcome.
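The greedy baseline can be sketched as an exhaustive search over walks of length τ that ignores all other UAVs. The graph encoding (a dictionary from cell to out-neighbors) and the function names are assumptions for illustration; the solo payoff follows the rule that each visit collects a ρ fraction of what is left in the cell.

```python
def walks(graph, start, tau):
    """All walks of exactly tau moves from `start`; graph maps cell -> out-neighbors."""
    if tau == 0:
        return [()]
    return [(n,) + rest
            for n in graph[start]
            for rest in walks(graph, n, tau - 1)]

def solo_payoff(walk, values, rho):
    """Payoff of a walk for a single UAV when no other UAV is present."""
    left = dict(values)
    total = 0.0
    for cell in walk:
        total += rho * left[cell]      # collect a rho fraction of what is left
        left[cell] *= 1.0 - rho        # the rest stays in the cell
    return total

def greedy_walk(graph, values, start, tau, rho):
    """Walk of length tau with the highest solo payoff, or None if none exists."""
    return max(walks(graph, start, tau),
               key=lambda w: solo_payoff(w, values, rho),
               default=None)

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["b"], "c": ["d"], "d": ["d"]}
    values = {"a": 0.0, "b": 0.6, "c": 0.5, "d": 0.5}
    print(greedy_walk(graph, values, "a", 1, 1.0))
    print(greedy_walk(graph, values, "a", 2, 1.0))
```

In this toy instance the choice flips with the horizon: with τ = 1 the UAV heads for the single richest neighbor, while with τ = 2 it prefers the branch whose two cells are jointly worth more.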
Next, we propose routing policies which are based on a pure-strategy NE.
3) Single-Step NE: In this policy, a UAV computes and plays a PNE of the single-step game at every time step. Such a PNE is guaranteed to exist in single-step games via Theorem 1. To ensure that all the players compute the same equilibrium, we require that the procedure of computing it is mutually agreed upon before the game begins, as follows: the equilibrium is obtained by applying the best-response (BR) dynamics, the initial outcome for the BR dynamics is taken as the one with every player playing as per the greedy myopic policy, and the order of players for sequentially computing the BRs is taken to be the descending order of ρ i s.

4) Multistep NE Parameter τ :
In this policy as well, a UAV tries to compute a PNE via a mutually agreed BR dynamics procedure as in the single-step NE case, except that the walk is of length τ. Since such a PNE is not guaranteed to exist, if it is not found within 2pτ rounds of BR dynamics, the player reduces the horizon to τ − 1 and attempts to find a PNE. The horizon is repeatedly lowered until a PNE is found, for horizon (say) τ′. Note that τ′ ≥ 1 since a PNE always exists in the single-step game. Having computed such a PNE, the players choose the respective strategies, which are walks of length τ′. Table III shows the multistep NE policy in operation for various scenarios. When operated with τ = 5, a PNE was found at the highest horizon in 894 out of 1000 times for the two-player game with no peaks. However, this proportion drops as the number of players increases (for the same number of peaks). Also, it drops as the number of peaks increases (for the same number of players). Indeed, for the game with two peaks and five players, it is a modest 156 out of 1000 times. These values and trends suggest limiting τ to at most 5 in our experiments.
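The horizon-lowering procedure can be sketched generically. In the sketch below, a "round" is taken to be one pass over all players in the agreed order, and the round limit at a lowered horizon h is taken to be 2ph; both are assumptions, as are the callback arguments `strategies_for`, `payoff_for`, and `start_for`, which stand in for the game-specific parts (walk enumeration, payoff evaluation, and the greedy starting profile).

```python
def best_response_dynamics(strategies, payoff, start, order, max_rounds):
    """Sequential best-response dynamics; returns a PNE profile or None.

    strategies: list with one list of candidate strategies per player
    payoff:     function (player index, profile tuple) -> float
    start:      initial profile (one strategy per player)
    order:      player indices in the agreed update order
    """
    profile = list(start)
    for _ in range(max_rounds):
        changed = False
        for i in order:
            best, best_val = profile[i], payoff(i, tuple(profile))
            for s in strategies[i]:
                trial = profile[:i] + [s] + profile[i + 1:]
                val = payoff(i, tuple(trial))
                if val > best_val + 1e-12:
                    best, best_val = s, val
            if best != profile[i]:
                profile[i] = best
                changed = True
        if not changed:
            return tuple(profile)      # nobody can improve: a PNE
    return None

def multistep_ne(strategies_for, payoff_for, start_for, rho, tau):
    """Lower the horizon from tau until BR dynamics find a PNE."""
    p = len(rho)
    order = sorted(range(p), key=lambda i: -rho[i])   # descending rho
    for h in range(tau, 0, -1):
        result = best_response_dynamics(strategies_for(h), payoff_for(h),
                                        start_for(h), order, 2 * p * h)
        if result is not None:
            return h, result
    return None
```

Because every UAV runs the same deterministic procedure from the same starting profile and player order, all of them arrive at the same profile without communicating, which is the purpose of agreeing on the procedure beforehand.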
For convenience, we denote the Greedy policy with horizon τ by GR τ, and the stepwise social welfare (SW) maximizer policy with horizon τ by SW τ. Similarly, we denote the multistep NE policy with horizon τ by NE τ, and thus, the single-step NE policy in particular by NE 1.

D. Evaluation and Results
In this section, we discuss how the policies are evaluated. We compute the social welfare optimality of the N E τ policy by comparing it against SW τ . As explained before using Table III, the marginal utility of having a high horizon sharply decreases while the computation time naturally increases rapidly with horizon as the strategy space for each player increases exponentially with it. Similarly, computing the maximum social welfare for a large horizon becomes computationally intractable. Even for the scenarios studied below with low horizon and number of players, the brute force search takes time exponential in these parameters, and hence, we compute it using an elaborate branch-and-bound technique. We conduct experimental runtime analysis on a small grid size 6 × 6 yielding results as shown in Table IV. For l = 6 time steps, comparing policies SW 6 , G R 6 , and N E 6 shows that the runtime (in seconds) increases with the number of players much more rapidly for SW 6 while it is relatively stable and small for G R 6 and N E 6 . Similarly, for a fixed number of players p = 4, the runtime increases very rapidly with l for the SW policy while it is relatively stable for the greedy and the NE policies. Also, the runtimes are seen to be greater for the scenario with one peak as compared to others.
1) Empirical PoA: We first note that SW τ is guaranteed to achieve the maximum social welfare only when τ = l. Hence, in our first experiment, we set l = τ = 5 and compare N E 5 against SW 5 . To evaluate the performance of N E 5 on a game instance, we compute the social welfare obtained when all the UAVs employ N E 5 and the one obtained when all employ SW 5 , and then obtain their ratio (that of latter to former). To obtain the empirical PoA, we average this ratio over n = 1000 randomly generated game instances. In our results below, we also include the minimum and the maximum ratios obtained across the n instances, for more insight.
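The statistic described here is an average of per-instance ratios, not a ratio of averages. A minimal sketch, with illustrative names and made-up input numbers used only to exercise the function:

```python
from statistics import mean

def empirical_poa(sw_welfare, ne_welfare):
    """Mean, minimum, and maximum of the per-instance ratio SW / NE.

    sw_welfare[k] and ne_welfare[k] are the social welfare values obtained on
    instance k by the SW and NE policies, respectively. Instances where the NE
    policy collects nothing would need separate handling (division by zero).
    """
    ratios = [sw / ne for sw, ne in zip(sw_welfare, ne_welfare)]
    return mean(ratios), min(ratios), max(ratios)

if __name__ == "__main__":
    # Hypothetical welfare values, not taken from the paper's experiments.
    print(empirical_poa([2.0, 3.0, 4.0], [2.0, 2.5, 4.0]))
```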
We analyze the dependence of the empirical PoA on the number of players by varying p between 2 and 4, fixing the grid size to 5 × 5. Similarly, we analyze the dependence on grid size by varying it between 6 × 6, 5 × 5, and 4 × 4, while fixing p = 4. We consider scenarios with up to two peaks. The results are shown in Table V.
Observe that, in each scenario, the empirical PoA increases with the number of players and decreases with the grid size. A plausible explanation for this is that, with only a few players, or with a large grid, players are more likely to be far from each other, thus with little overlap between their strategies. As a result, their BRs to each other are more likely to be dominant strategies which yield the individual maximum payoffs possible when there are no other players. Thus, the N E 5 policy is more likely to get a social welfare close to the maximum, resulting in a low PoA. To capture this more closely, we define and analyze the degree of overlap as follows.
We fix the number of players to 2 and for n = 1000 instances, we compute the average probability that two strategies, one each drawn u.a.r. from the set of strategies of the two players, respectively, overlap. Observe that when the two players are in the same location or very close, this probability is very high, whereas, when the players are far away, their strategies are nonoverlapping walks, making this probability 0. Table VI shows the dependence of this degree of overlap on the grid size. As we intuitively expect, this dependence does align with that of the empirical PoA in Table V, as both the quantities are higher for the smaller grid size and for the scenario with no peaks.
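For a single instance, the degree of overlap can be computed exactly by enumerating all pairs of strategies. The sketch below assumes that two walks "overlap" when they share at least one cell, regardless of timing; the text does not pin this down, so the criterion and the function name are assumptions.

```python
from itertools import product

def overlap_probability(strategies_1, strategies_2):
    """Probability that two walks drawn u.a.r., one per player, share a cell."""
    pairs = list(product(strategies_1, strategies_2))
    hits = sum(1 for w1, w2 in pairs if set(w1) & set(w2))
    return hits / len(pairs)

if __name__ == "__main__":
    # Toy strategy sets: each strategy is a walk given as a tuple of cells.
    s1 = [("a", "b"), ("a", "c")]
    s2 = [("c", "d"), ("e", "f")]
    print(overlap_probability(s1, s2))
```

Averaging this quantity over the randomly generated instances gives the degree of overlap reported per scenario.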
Overall, the results indicate that the N E 5 policy does well as compared to the optimal SW 5 policy, with the empirical PoA smaller than a meager 1.1 on average. Also, the worst instances found in all the scenarios have a ratio well below the theoretical bound of 2 while there are always instances where the policy does achieve the maximum social welfare.

2) Experiments on Large Problem Instances:
Here, we consider larger game parameters by setting the grid size to 10 × 10, the number of time steps l = 25, and the number of players p = 4. We compare NE τ against SW τ, keeping the horizon equal for the policies being compared. We vary τ from 1 through 5 to study the dependence on the horizon. The evaluation is done by obtaining the ratio of social welfare averaged over n = 1000 randomly generated game instances, as described before. We call this the welfare quotient of NE τ (and not the empirical PoA, since it is not compared against the true maximum social welfare). Naturally, the lower the quotient, the better the performance. The results are shown in Table VII.
We observe that the average welfare quotient of N E τ for all horizons (up to 5) is very close to 1. As pointed out earlier, SW τ does not yield the true maximum social welfare, and hence, N E τ could yield a higher welfare than SW τ at times causing the quotient to be less than 1, which is prominently reflected in many instances, such as one in the scenario with no peaks where SW 2 achieves only about 75% of the welfare achieved by N E 2 . Thus, having a horizon much lower than the number of time steps critically affects the performance of SW τ , and as a result, N E τ does almost as well in comparison, with a mild rise in the welfare quotient as τ increases.

VI. CONCLUSION
UAVs are becoming increasingly popular in search and reconnaissance missions due to their ability to conduct tedious tasks inexpensively and without risk to human life. When the area of operations is large, a fleet of UAVs might be deployed. This paper addressed the problem of autonomous routing of a UAV fleet in a communication-denied area where the UAVs are not allowed to exchange information or negotiate trajectories. The problem is modeled in a game-theoretic framework. More specifically, we formulated the multiagent UAV routing problem as a game where UAVs are players and their strategies are the different routes they can take. Our model incorporates several useful concepts, including the temporal aspect and information fusion. The paper contributes results on both the methodological and practical sides. For the commander or manager of the fleet, it is reassuring to know that the multiagent routing policies developed in this paper are guaranteed to produce outcomes that achieve at least one-half of the welfare of the optimal centralized policy, if, in fact, one could come up with it. This bound is independent of the length of the routes and of the fleet size. Furthermore, we have categorized the single-step game as a new variant of weighted congestion-type games. For this class, and for the general formulations of the multistep UAV game, we prove the existence or nonexistence of a pure Nash equilibrium. These results, together with tight bounds on PoA and PoS for the various formulations, constitute the majority of the technical results of this paper.
To learn the actual PoA and PoS values in practice, we juxtapose our theoretical results with empirical studies. We propose a multiagent UAV routing policy which, on average, produces a total information gain much closer to the centralized solution than to the theoretical bound. We also find that PoA increases with the congestion level, that is, when the UAVs compete for a smaller area or when more UAVs are added to the search area. Finally, we highlight the limitations of having a small horizon: as a consequence, our proposed routing policy performs almost as well as the policy that greedily maximizes the social welfare at each horizon.
Future effort could be directed at studying these policies for longer rolling horizons and for persistent surveillance scenarios. It is also natural to consider the obsolescence of information, where the collected value decays with the passage of time and a UAV has to be routed back to collect the information being "built up." When intermittent communication is permitted, new cooperation strategies can be conceived and the resulting reduction in PoA can be studied. Routing to avoid an adversarial team is also an interesting area that can benefit from the framework developed in this paper.