Joint Power Control and Scheduling for High-Dynamic Multi-Hop UAV Communication: A Robust Mean Field Game

Multi-hop unmanned aerial vehicle (UAV) communication is a promising technology, but it still faces challenges. How to keep allocated time slots valid in high-dynamic scenarios, and how to allocate power rationally and robustly in massive scenarios, remain open problems. In this paper, we propose a robust mean field game (MFG) framework to manage time slot and power resources. On the one hand, the framework predicts potential time slot conflicts and avoids them in time. On the other hand, it adapts well to massive scenarios with complex interference and thereby realizes power control. Simulation results show that the robust MFG-based resource management scheme not only reduces the probability of packet collision and packet loss, but also improves communication energy efficiency (EE) with better robustness.


I. INTRODUCTION

A. OPPORTUNITIES AND CHALLENGES
With the rapid development of UAV-enabled communication technologies, the flying ad hoc network (FANET) has gradually expanded from a few low-speed UAVs to massive numbers of high-dynamic UAVs [1]. Multiple cooperative UAVs have found wide application in natural disasters (earthquakes and floods), agriculture, and military battlefields [2]. The advantages of FANET communications are as follows:
• Effective cost: Deploying massive small-scale UAVs is cheaper than deploying one large-scale UAV. Therefore, multi-UAV cooperation has better economic benefits [3].
• Extend coverage: Massive small UAVs, deployed at the same cost as one large UAV, have greater coverage and can serve or monitor larger areas.
The associate editor coordinating the review of this manuscript and approving it for publication was Bong Jun David Choi .
• Improve performance: Multi-UAV cooperation can perform more complex tasks. For example, it is more resilient: if some UAVs crash, the remaining UAVs can still complete the mission.
• Better concealment: Multiple small UAVs are less likely than a single large UAV to be discovered by the enemy's radars.
There exist many other reasons, such as rapid deployment and quick information dissemination, that are driving the widespread use of multiple cooperative UAVs. However, multi-hop UAV communication technology still faces several challenges:
• Time allocation: On the one hand, UAVs move at high speed, which leads to rapid topology changes. In this circumstance, the time slots allocated for the current situation may become invalid as the topology changes. Correspondingly, the delivery rate decreases and the delay of data packets increases. On the other hand, the power control strategy causes huge signaling overhead when the number of UAVs is large. In addition, UAVs are distributed in three-dimensional space, so the number of nodes around a UAV is greater than it would be for nodes distributed in two dimensions on the ground. The resulting complicated interference among UAVs makes proper power control more difficult.
• Robust adaption: The fluctuations of the wireless channels are more severe due to the channel state information quantization noise and the estimation error caused by the high-speed movement of UAVs. The channel state information acquired by the UAVs is uncertain, as ideal channel state information is hard to feed back. The high dynamics and uncertainty of UAVs impose higher adaptability requirements on resource management strategies.
These challenges cause UAV nodes to fail to exchange information and to consume more energy when performing tasks. Therefore, it is necessary to design an appropriate time slot allocation strategy for high-dynamic scenarios, and a robust power control strategy for complex interference and uncertain environments. The goal is to spend less energy while delivering more data in the FANET.

B. RELATED WORK OF FANET
In this paper, we mainly focus on two aspects of the resource allocation strategy: the time slot allocation strategy and the power control strategy. For the time slot allocation strategy, the authors in [4] proposed a linear programming method based on the interference model to carry more traffic. In [5], a time slot allocation strategy was proposed for multi-hop communication. Based on time slot multiplexing, time slots were allocated sequentially according to services, and the delay was reduced. In [6], the authors proposed a distributed variable-length time slot protocol that addresses the asymmetry between wireless links, balancing the amount of data transmitted by each node and reducing the queue length. In [7], the authors took a cross-layer approach and jointly considered routing and scheduling, using linear programming to optimize the nodes' data transmission and resolve slot collisions. However, some of these time slot allocation strategies did not take into account the impact of node mobility [4], [5], and some did not consider the overhead caused by information exchange between nodes [6].
For power control strategies, the authors in [8] optimized the UAVs' trajectories to improve the end-to-end throughput. In [9], a power control strategy was proposed based on the connectivity degree of one-hop and two-hop neighbors. In [10], topology control technology [11] was used to analyze the information and identify redundant information. The nodes of the network were then grouped so that only some nodes need to transmit information, which saves energy. In [12], the authors proposed an energy-based clustering mechanism that selects the node with the largest residual energy as the cluster head. The data in the cluster head can be transmitted over multiple hops. However, some of these power control strategies limited the mobility of nodes [8], and some cannot optimize the energy of all the nodes in the network [9], [10], [12].
Power control is affected by interference from neighboring nodes, background noise, transmitters occupying the same frequency, etc. The interference is complex and mutual, so it is difficult to obtain a proper power control strategy. The MFG [13] is well suited to solving such complex interaction problems, and it has recently been applied to medium access control (MAC) in wireless networks [14]. More details on these application areas can be found in [15]. Next, we briefly introduce the MFG and the related work on applying the MFG to wireless communications.

C. RELATED WORK OF MFG
Except for very simple cases, such as random field theory and the one-dimensional Ising model, multibody systems with interactions between individuals are generally difficult to solve exactly. To deal with this problem, mean field theory studies a simple model that reflects a large-scale complex model. In a system of n individuals, the n-body problem is replaced with a single-body problem plus an average field: the mean field replaces the interactions of all other individuals with an arbitrary individual. In this way, a multibody problem can be reduced to a single-body problem that is easily solvable, thereby giving some insight into the behavior of the system at a relatively low cost. In mean field theory, the more interactions each field or particle in the original system has, the more accurate the mean field solution is.
In a traditional game, each player needs to collect every piece of information about the other players. When the number of players is large, the curse of dimensionality appears, resulting in lengthy game analysis and infeasible game solutions. The MFG model is based on mean field theory. By introducing the concept of the mean field to approximate the aggregate influence (such as interference) of the other players in the game, the interaction process is simplified, the difficulty of analysis is reduced, and a low-complexity equilibrium solution algorithm is realized. In addition, while an increase in the number of players is a challenge for traditional game theory, it is helpful for solving the MFG.
The MFG, which has the characteristic of fast convergence, is devoted to solving the decision-making problem of the participants. It can be used for many power control problems in wireless networks [16]-[24]. The MFG introduces the concept of the field to represent the influence of the other players on the player under study, and simplifies the interference interaction process [25]. The method effectively reduces the difficulty of analysis and yields low-complexity algorithms for reaching an equilibrium solution.
In [16], the authors applied the MFG to an ultra-dense heterogeneous network to achieve interference management and power control. The main goal of that paper was to achieve smaller interference management overhead and shorter convergence time through MFG interference management. In [17], the authors proposed an MFG-guided deep reinforcement learning (DRL) approach for task placement in a network with a large number of nodes, which helps servers make timely task placement decisions and significantly reduces the average service delay. In [18] and [19], to achieve power control in ultra-dense scenarios with complex interference, the authors used mean field approximation theory to convert the dynamic stochastic game model into a mean field game. By solving two tractable partial differential equations, efficient power control was obtained. In [20], the interference problem of the cellular network was solved in two steps, and the data was transmitted using appropriate power. The first step was to assign each macro cell user a subchannel that could guarantee its quality of service. In the second step, MFG theory was used to allocate the remaining subchannels to the small cells. This method was easier to implement than other game methods. In [21], the authors summarized the application of various types of game models in the UAV field, and discussed the case of using fixed-wing UAVs as base stations. In this case, there were many communication links, and the MFG was used to control the transmit power and the speed of the UAVs, thereby reducing power consumption and interference and increasing the flight time.
In [22], the authors proposed a distributed interference- and energy-aware power control algorithm for ultra-dense device-to-device (D2D) networks based on MFG theory. In [24], the authors proposed a downlink power control algorithm based on MFG theory. In [16], [18]-[24], MFGs are applied to the interference management and power control of wireless communication. They achieve good results with less overhead, and thus provide a good reference for our work.

D. CONTRIBUTIONS
Although many time slot allocation strategies and power control strategies have been proposed, we find that they are unsuitable for high-dynamic multi-hop UAV communication. On the one hand, few of the above time slot allocation strategies take into account the time slot conflicts caused by high dynamics. Predicting the movement of the UAV nodes is therefore an effective method to reduce time slot conflicts. On the other hand, for large-scale dynamic scenarios, quickly allocating power is a significant challenge. We use the MFG to allocate power to high-speed UAVs in a short time. To cope with complex interference and unpredictable randomness under high dynamics, we add robustness to the MFG power allocation algorithm. The main contributions of this paper are summarized as follows:
• Dynamic time slot allocation strategy based on UAVs' position prediction: In this paper, for the time-varying network topology, the location and speed information provided by the physical layer global positioning system (GPS) module is used to derive the MAC layer time slot schedule. Thus, the impact of the time-varying topology caused by the high-speed movement of nodes on network performance can be effectively reduced.
• Power control algorithm using MFG: The large number of nodes with limited resources in the network makes interference more serious. In this paper, the MFG is used to simplify the complex interference suffered by the nodes. The nodes can obtain the optimal power value by local optimization alone. This kind of power control algorithm using the MFG can achieve outstanding performance in large-scale scenarios.
• Robust power control: The state information acquired by the nodes in the high-dynamic network is uncertain because of the fast movement of nodes and measurement errors. Therefore, in this paper, we make the power control algorithm robust to this uncertainty.
The rest of this article is organized as follows. We describe the system model and formulate the problem in Section II. In Section III, we solve the slot allocation subproblem with prediction and dynamic reservation, and solve the power control subproblem with the robust MFG. The time slot allocation strategy and power control algorithm in the high-dynamic scenario are simulated and verified in Section IV, followed by the conclusion in Section V.

II. SYSTEM MODEL AND PROBLEM FORMULATION
The multi-hop UAV network is mainly composed of airborne platforms such as fixed-wing UAVs, rotary-wing UAVs, and airships. For convenience, all kinds of communication platforms are unified as UAV nodes, and it is assumed that each node uses an omnidirectional antenna. The onboard GPS can provide accurate time synchronization information for the nodes, and the node communication mode is half-duplex.

A. SYSTEM MODEL
The topology of the multi-hop UAV network is represented by a directed graph $G = (V, E)$, where the vertex set $V = \{v_1, v_2, \ldots, v_N\}$ represents the set of nodes in the network, and the edge set $E = \{e_{1,2}, e_{3,1}, \ldots, e_{i,j}\}$ represents the set of links between nodes, as shown in Fig. 1. Notations used are listed in Table 1. Suppose all nodes communicate at the same frequency $f$, and the bandwidth is $w$. Time is divided into repeated frames, and each frame is equally divided into several time slots, which comprise the broadcast slot, the response slot, and the data slot, as shown in Fig. 2.

1) COLLISION CONSTRAINT
Generally, in order to avoid collisions, nodes within two hops of each other cannot use the same time slot. Let $V_{\tau_i}$ be the set of transmitting nodes in slot $\tau_i$, and let $D(V)$ denote the minimum hop count between any two nodes in set $V$. The conflict constraint can then be represented as $D(V_{\tau_i}) > 2$.
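As an illustration of the collision constraint, the following minimal Python sketch checks whether a set of transmitters sharing one slot are pairwise more than two hops apart. The adjacency-dictionary representation, the function names, and the five-node chain are assumptions made for this example only; they are not taken from the protocol itself.

```python
from collections import deque
from itertools import combinations


def hop_distance(adj, src, dst):
    """Breadth-first search hop count from src to dst; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def slot_is_collision_free(adj, transmitters):
    """True if every pair of transmitters in the slot is more than two hops apart."""
    for a, b in combinations(transmitters, 2):
        d = hop_distance(adj, a, b)
        if d is not None and d <= 2:
            return False
    return True


# Hypothetical 5-node chain: 1 - 2 - 3 - 4 - 5 (links listed in both directions)
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
```

In this chain, nodes 1 and 4 may share a slot (three hops apart), while nodes 1 and 3 may not (two hops apart).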

2) INTERFERENCE MODEL
In order to better model the channel dynamics, we use the Ornstein-Uhlenbeck equation [26] to formulate the dynamic equation of the channel $h_{i,j}(t)$ between nodes $i$ and $j$, where $B(t)$ is the Brownian motion and the stationary distribution of $h_{i,j}(t)$ is a Gaussian distribution with mean $\kappa_h$ and variance $\sigma_h$. The path loss between nodes $i$ and $j$ is defined as in [27], where $\|x_i - x_j\|$ is the Euclidean distance between nodes $i$ and $j$, $\varepsilon > 0$, and $\alpha \ge 2$ is the path loss exponent. The channel gain $g_{i,j}(t)$ of link $e_{i,j}$ between nodes $i$ and $j$ is also given in [27]. Suppose node $i$ sends data to node $j$ in data slot $\tau_i$, and $V_{\tau_i} \subseteq V$ is the set of nodes occupying the same data slot $\tau_i$. Then the aggregate interference at node $j$ is

$$I_j(t) = \sum_{k \in V_{\tau_i},\, k \neq i} p_k(t)\, g_{k,j}(t),$$

where $p_k(t)$ is the transmit power of node $k$ at time $t$, and $g_{k,j}(t)$ is the channel gain between nodes $k$ and $j$ at time $t$.
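To make the interference model concrete, here is a minimal Python sketch under stated assumptions. It simulates a mean-reverting Ornstein-Uhlenbeck path with the Euler-Maruyama method, using the generic form dh = theta (kappa_h - h) dt + sigma_h dB, where the rate parameter `theta` is an assumption of this example and is not a symbol from the text. It also sums the received powers of the co-slot transmitters. The function names are hypothetical.

```python
import numpy as np


def simulate_ou(h0, kappa_h, sigma_h, theta, dt, steps, rng):
    """Euler-Maruyama path of dh = theta*(kappa_h - h) dt + sigma_h dB (assumed form)."""
    h = np.empty(steps + 1)
    h[0] = h0
    for n in range(steps):
        dB = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over one step
        h[n + 1] = h[n] + theta * (kappa_h - h[n]) * dt + sigma_h * dB
    return h


def aggregate_interference(powers, gains_to_j, transmitter):
    """Sum p_k * g_{k,j} over co-slot transmitters k, excluding the intended transmitter."""
    return sum(p * gains_to_j[k] for k, p in powers.items() if k != transmitter)
```

With `sigma_h = 0` the path relaxes deterministically toward `kappa_h`, which is a quick sanity check of the mean-reverting drift.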

B. PROBLEM FORMULATION
We formulate the resource allocation problem as a differential game, which is defined as a four-tuple $(N, S, A, C)$, where $N$ represents the number of players in the game. The players here are the transmitter-receiver communication pairs.
• S: S = I is the set of states of players, where I represents the interference state of the player.
• A: the set of actions taken by the players. In the formulated game, the actions refer to transmission powers.
• C: the set of the players' cost functions.
Assume that the path from the source node to the destination node has been selected, i.e., the next-hop node of each node is determined, and denote the nodes on the path by $1, 2, \ldots, i, j, \ldots, K$. The cost function between any two nodes on the path is given by (5), where $\tau_i$ is the data slot allocated to node $i$, and $\gamma_{i,j}$ is the signal to interference plus noise ratio (SINR) of receiving node $j$, which is related to the transmit power of the links transmitting simultaneously at the current moment.
The first term of (5) represents the cost relationship between the transmission power and the channel capacity. $v_{i,j}$ is the relative speed between nodes $i$ and $j$, $d_{i,j}^t$ is the distance between nodes $i$ and $j$ at moment $t$, and $R_i$ is the communication radius of node $i$. $\mathbf{1}(\cdot)$ is the indicator function: it equals 1 if the condition in $(\cdot)$ is true and 0 otherwise. $\beta e^{-\eta\gamma_{i,j}}$ is related to the symbol error rate, where $\beta$ and $\eta$ depend on transmission parameters such as the modulation order.
A larger SINR yields a smaller symbol error rate and a smaller cost.
The second term of (5) represents the cost relationship between the speed, data transmission error, and distance. The SINR $\gamma_{i,j}$ of node $j$ depends on the transmit power, so the first and second terms in (5) are coupled to each other. Under this cost function, greater transmission power, higher speed, and a greater symbol error rate incur a higher cost, and vice versa. The resource allocation optimization problem can then be formulated as (7), where (7b) is the power constraint of the transmitting node, (7c) represents the slot allocation rule that nodes within two hops cannot use the same slot, and (7d) indicates that packets can be successfully received by the receiving node only if the SINR is above the SINR threshold $\gamma_{th}$. The above modeling yields the cost function of joint power control and time slot allocation, but when the actual protocol runs, power allocation is carried out on the basis of the time slot allocation. In other words, once the node time slot allocation is completed, power is allocated to each transmission link that shares the same time slot. Considering the chronological order of MAC layer resource allocation and physical layer power allocation in practical applications, we first decompose the original optimization problem into a MAC layer time slot allocation subproblem and a physical layer power allocation subproblem. The primal decomposition method applies to problems with coupled variables. For our system model, there is a coupling relationship between the symbol error rate and the transmit power, i.e., between $\beta e^{-\eta\gamma_{i,j}}$ and $p_i$ or $\gamma_{i,j}$. After decomposing the problem, on the one hand, we design a time slot allocation strategy suitable for high-dynamic scenarios to reduce the packet loss rate. On the other hand, we optimize the power control based on the time slot allocation result.
The goal of the time slot allocation is to reduce the symbol error rate due to collisions and to increase the packet delivery rate (PDR). The goal of power allocation is to maximize the EE. The common goal of the two resource allocations is to reduce the cost function.
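The per-link constraints described above, a transmit power bound and an SINR threshold, can be sketched in Python as follows. This is a minimal illustration, not the paper's formulation: the SINR is written in its usual form (desired received power over interference plus noise), and the parameter names `noise_power`, `p_max`, and `gamma_th`, as well as the bound $0 \le p_i \le p_{\max}$, are assumptions of this example.

```python
def sinr(p_i, g_ij, interference, noise_power):
    """SINR at receiver j: desired received power over interference plus noise."""
    return p_i * g_ij / (interference + noise_power)


def link_is_feasible(p_i, g_ij, interference, noise_power, p_max, gamma_th):
    """Check the assumed power bound 0 <= p_i <= p_max and the SINR threshold."""
    if not (0.0 <= p_i <= p_max):
        return False
    return sinr(p_i, g_ij, interference, noise_power) >= gamma_th
```

Raising `p_i` improves the SINR of link $e_{i,j}$ but also raises the interference seen by other links in the same slot, which is the coupling that the power control step has to resolve.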

III. SOLUTION TO THE FORMULATED PROBLEM
Through the above problem modeling, the joint power and time slot optimization problem is obtained. In Section III-A, we introduce the slot allocation strategy designed for high-dynamic scenarios. The time slot allocation strategy improves the reliability of data packet transmission and reduces the data transmission error rate. In Section III-B, we use the MFG to solve the robust power control problem with complex interference.

A. MOBILITY-AWARE TIME SLOT ALLOCATION
For the problem of time slot allocation, we build on the unifying slot assignment protocol (USAP), which is a distributed slot assignment protocol for mobile multi-hop networks [28]. To facilitate the introduction of the proposed algorithm, we make some assumptions about the MAC layer protocol. The structure of the frame is shown in Fig. 2.
The frame is divided into the broadcast subframe, the response subframe, and the data subframe. Each subframe is further divided into several time slots, and the node mainly broadcasts the net manager operational packets (NMOP) of the node in the broadcast subframe, wherein the NMOP mainly includes the time slot allocation information of the node itself and the one-hop neighbor nodes. The node can obtain the slot allocation status of the two-hop neighboring nodes through the relevant slot allocation information in the NMOP of neighbor nodes. The node informs the neighboring nodes of its slot reservation by filling the slot reservation field. It is assumed that the node needs to pass a time frame period after broadcasting the slot reservation request to use the corresponding data slot. The node sends the response frame in the response slot to reply to the time slot reservation request of other nodes.
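The way a node can build its two-hop slot view from received NMOPs may be sketched as follows. This is a simplified illustration: the `Nmop` class and its field names are hypothetical and do not reproduce the actual USAP packet layout; each packet is assumed to carry the sender's own data slots and those of its one-hop neighbors.

```python
from dataclasses import dataclass, field


@dataclass
class Nmop:
    """Hypothetical, simplified NMOP: slots of the sender and of its one-hop neighbors."""
    sender: int
    own_slots: set = field(default_factory=set)
    neighbor_slots: dict = field(default_factory=dict)  # neighbor id -> set of slots


def occupied_within_two_hops(received):
    """Union of all slots reported by one-hop neighbors for themselves and their neighbors."""
    busy = set()
    for pkt in received:
        busy |= pkt.own_slots
        for slots in pkt.neighbor_slots.values():
            busy |= slots
    return busy


def pick_free_slot(received, num_data_slots):
    """Lowest-numbered data slot not used within two hops, or None if all are taken."""
    busy = occupied_within_two_hops(received)
    for slot in range(num_data_slots):
        if slot not in busy:
            return slot
    return None
```

For example, after hearing `Nmop(2, {0}, {3: {1}})` and `Nmop(4, {2}, {5: {1, 3}})`, a node sees slots 0 to 3 as busy and would reserve slot 4 if the data subframe has at least five slots.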
Next, we will give two typical scenarios in which the high-speed movement of the nodes causes the degradation of the communication quality, and then the corresponding solutions are given.

1) SCENARIO ONE
As shown in Fig. 3, node A sends packets to node C. At time t1, the packets can be sent to node C directly, which takes only one frame. However, the communication link between nodes A and C will break before time t2 due to the rapid relative movement between the nodes, and two situations can occur. The worst case is that the link is disconnected after the end of the broadcast subframe of Frame2 and before time t2. In this case, node A is not able to discover the disconnection between nodes A and C, because the topology update has already been completed (here we assume that the node can update the topology through the exchange of NMOP during the broadcast subframe). The packet is lost and the packet loss rate increases. If the link is disconnected before the start of Frame2, node A discovers through the topology update that the link is disconnected, changes the next-hop node address field to node B, and forwards the packet through node B. Node A then needs to reserve a slot to node B in Frame3, and even if the slot is successfully reserved, the data have to wait for the corresponding data slot in Frame4. After that, node B needs the same process to reserve a slot, and the data packet is sent to node C in Frame6. This increases the end-to-end delay.
FIGURE 3. Scenario one.
VOLUME 9, 2021
Solution: High-speed UAVs, even those with small mass, have large inertia and cannot stop abruptly. Together with the acquired real-time speed and position information, this makes it possible to predict the next position of the UAVs. So node A predicts the state of the link between nodes A and C over the next two frames before sending the data packet. We obtain the longitude, latitude, and height information provided by the GPS of the UAV node. Mapping the position and velocity information of node A into the coordinate system gives the coordinates of node A at time $t + \Delta t$ as in (8), from which the distance $\mathrm{dist}_{t+\Delta t}(A, C)$ between nodes A and C is computed. If $\mathrm{dist}_{t+\Delta t}(A, C)$ is less than the maximum communication distance, the link is considered to be connected in the next frame, and if $\mathrm{dist}_{t+2\Delta t}(A, C)$ is greater than the maximum communication distance, the link is considered to be disconnected in the second frame ahead. When the link is predicted to be connected in the next frame but disconnected in the second frame ahead, node A starts to look for a relay node. For example, we choose node B as the relay node. Node A then modifies the slot reservation field in the data packet and sends the data packet. After receiving the data packet, node B obtains the slot reservation information from the slot reservation field in the packet and reserves the slot in the next frame. In the next frame, the packet can still be delivered directly. In the second frame ahead, the link between nodes A and C is disconnected, and node B can directly use the slot it reserved to relay the packet.
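The prediction step can be sketched in Python as follows. The sketch assumes constant-velocity motion over the prediction horizon and Cartesian coordinates already converted from GPS readings; both are simplifying assumptions of this example, and the function names are hypothetical.

```python
import math


def predict_position(pos, vel, dt):
    """Constant-velocity extrapolation of a 3-D position (assumed motion model)."""
    return tuple(p + v * dt for p, v in zip(pos, vel))


def predicted_distance(pos_a, vel_a, pos_c, vel_c, dt):
    """Euclidean distance between the two nodes after dt seconds."""
    return math.dist(predict_position(pos_a, vel_a, dt),
                     predict_position(pos_c, vel_c, dt))


def needs_relay(pos_a, vel_a, pos_c, vel_c, frame_time, max_range):
    """True if the link holds for the next frame but is predicted to break in the one after."""
    d_next = predicted_distance(pos_a, vel_a, pos_c, vel_c, frame_time)
    d_after = predicted_distance(pos_a, vel_a, pos_c, vel_c, 2 * frame_time)
    return d_next < max_range and d_after > max_range
```

With node A at rest at the origin, node C at 80 m moving away at 15 m/s, a 1 s frame, and a 100 m range, the predicted distances are 95 m and 110 m, so node A would start looking for a relay.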

2) SCENARIO TWO
As shown in Fig. 4, at time t1, nodes A and C are sending packets to nodes B and D, respectively, in the same slot without collision. It is assumed that before time t2, nodes A and C move into two-hop range of each other. The worst case is that this happens after the end of the broadcast subframe of Frame2 and before time t2. Since the topology update process has already been completed, the nodes are unable to find out that they are within two hops of each other, and a collision occurs at node D, causing packet loss. If it happens before the start of Frame2, nodes A and C can find that there is a potential conflict between them, and one of the two nodes ought to reserve a new slot in the next frame. This process increases the end-to-end delay.
Solution: Since the intermediate nodes B and D are one-hop or two-hop neighbors of nodes A and C, they can judge whether nodes A and C use the same slot, and further use the node position prediction model to predict whether a potential conflict will happen. The predictions of node A and node C may conflict. The node whose broadcast time slot comes earlier tries to avoid the conflict first: node A reserves, in the next broadcast time slot, a time slot that does not conflict with the time slot of node C. Node C knows that the potential conflict has been resolved as soon as it receives the reservation of node A. As a result, it is not necessary for node C to make adjustments. Fig. 5 shows a more general application scenario. To be specific, node A can send data to node C in one time slot, and in another time slot, node A sends data to node D and node F sends data to node E simultaneously. Multiple data services are being transmitted at the same time. Later, the link may be disconnected as node C moves far away from node A. Node A can use the prediction mechanism in scenario one to reserve a path from node A to node B in advance. Node F and node A move close to each other, causing the time slots of node A and node F to conflict. Therefore, node A and node F adjust the data transmission time slot in advance to avoid the conflict. Communicating node pairs moving apart and transmitting nodes moving close to each other are the root causes of the communication deterioration resulting from high-speed movement. The time slot reservation mechanism with prediction is suitable for time slot allocation in general high-dynamic scenarios, and with it these two situations can be avoided as much as possible.
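The tie-breaking rule for a predicted conflict can be illustrated with a short Python sketch. It assumes that the prediction step has already flagged the two transmitters as about to enter two-hop range, and it represents each node by a hypothetical dictionary with `id`, `data_slot`, and `broadcast_slot` keys; none of these names come from the protocol.

```python
def resolve_predicted_conflict(node_a, node_b, free_slots):
    """Return (node_id, new_slot) for the node that re-reserves, or None if no move is made."""
    if node_a["data_slot"] != node_b["data_slot"]:
        return None  # the nodes do not share a data slot, so nothing to resolve
    # The node whose broadcast slot comes first acts; the other keeps its slot.
    mover, other = sorted((node_a, node_b), key=lambda n: n["broadcast_slot"])
    for slot in sorted(free_slots):
        if slot != other["data_slot"]:
            return mover["id"], slot
    return None  # no alternative slot is available in this frame
```

Because only the node with the earlier broadcast slot moves, the two nodes do not both jump to new slots at once, and the later node can keep its reservation unchanged.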

B. ROBUST POWER CONTROL
After giving the time slot allocation strategy, we have minimized the value of $\beta e^{-\eta\gamma_{i,j}}$. In this subsection, we optimize the term $c_{i,j}(t)$, which depends on the transmit power $p_i(t)$. In high-speed movement scenarios, the nodes are subject to complex interference and inaccurate measurement of the transmitted signals. This kind of uncertainty cannot be ignored, so we need a robust mean field model [14] to make correct power control decisions in a simple and stable way. In the previous subsection, we determined the time slot allocation strategy. Next, each node needs to determine its power according to its current state and the policies of the other nodes.

1) STATE SPACE
Suppose that node $i$ sends data to node $j$. The interference $\mu_i(t)$ experienced by link $e_{i,j}$ is equal to the interference $I_j(t)$ experienced by node $j$. When there is a large number of interfering nodes in the network, the fluctuations of the different interfering channels can be averaged out by taking the expectation over these channels. Here we introduce the interference mean field approximation, which decouples the players from one another. The interference is then written in terms of $\bar{p}$, the mean transmit power of all the transmitting nodes at the current time. Since the channel gains of all nodes are independent of each other, the form of the mean field approximation is similar to that of $g_{i,j}(t)$, and its dynamics follow a stochastic differential equation in which $\kappa$ and $\sigma$ are restricted to non-negative real numbers and $B(t)$ is the Brownian motion. To achieve robust control, we cannot ignore the uncertainty of the nodes. When the nodes' uncertainty is taken into account, the power control algorithm can resist some random uncertain factors and achieve robust power control. Therefore, we add the uncertain term $\xi_i(t)$ to the dynamic game equation of the nodes. In the resulting dynamic equation $ds_i(t)$ of node $i$, $f_i(t)$ is a positive-valued function of $t$, and the coefficient formed from $\bar{p}$ and $\kappa$ is a constant.
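The idea behind the mean field approximation, replacing many individual interferers by an average one, can be checked numerically. The Python sketch below is an illustration under assumptions that are not taken from the text: transmit powers are drawn uniformly from [0, 1], channel gains are independent exponential variables with mean 0.01, and the mean-field estimate is simply the number of interferers times the mean power times the mean gain.

```python
import numpy as np


def exact_interference(powers, gains):
    """Aggregate interference: sum over interferers of p_k * g_k."""
    return float(np.dot(powers, gains))


def mean_field_interference(n_interferers, mean_power, mean_gain):
    """Mean-field estimate: every interferer is replaced by an average one."""
    return n_interferers * mean_power * mean_gain


rng = np.random.default_rng(0)
n = 10_000
powers = rng.uniform(0.0, 1.0, size=n)        # hypothetical transmit powers
gains = rng.exponential(scale=0.01, size=n)   # hypothetical independent channel gains

exact = exact_interference(powers, gains)
approx = mean_field_interference(n, powers.mean(), 0.01)
relative_error = abs(exact - approx) / exact
```

As the number of interferers grows, the relative error shrinks, which is why a single node can optimize against the averaged field instead of tracking every other transmitter.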

2) ROBUST COST FUNCTION
The definition of the cost function for the mean field game control optimization problem is given in Appendix (31).

For our robust mean field game, we consider the cost of player $i$ over $[0, T]$. To keep the cost finite under the worst-case disturbance, we bound the norm of $\xi_i(t)$ by introducing an inequality in which $\rho$ represents the robustness level. Under a specific structure of $\mu$, one looks for the smallest $\rho$ for which the corresponding robust cost function can be achieved, where $c_{i,j}(0)$ is assumed to be zero. Each player chooses an optimal power control policy $Q^*(t)$ to minimize the robust cost function over the time interval $[0, T]$. $Q^*(t)$ is obtained under the worst-case disturbance $\xi_i(t)$, and therefore the resulting power control policy accounts for the uncertainty of the information about the player's state.

3) HJB (HAMILTON JACOBI BELLMAN) EQUATION AND FPK (FOKKER PLANCK KOLMOGOROV) EQUATION OF SYSTEM
In the MFG, we use the HJB equation to express the interaction between an individual participant and the mean field, and use the FPK equation to describe the evolution of the mean field. The solution of the MFG system can be obtained by solving these two equations. The solution of the MFG is a two-tuple $(u, m)$, where $u = u(t, s)$ represents the participant's value function, and $m = m(t, s)$ is the mean field term, which represents the probability density distribution of the participants' states. In other words, the HJB equation governs the computation of the participant's optimal path, and the FPK equation governs the evolution of the participants' mean field. The interactive evolution of the two eventually reaches the mean field equilibrium.
In this robust power control model, the decision-maker space is the set of sending nodes, the action space is all available power levels, the state space is the interference of nodes, and the utility function is (15). In order to derive the HJB equation of the system, we first define the value function, where $s_i$ is the state of player $i$; the value function is the solution of the HJB equation. According to Bellman's principle of optimality, an optimal control strategy must satisfy the following property: regardless of the past states and decisions, the remaining decisions must constitute an optimal strategy. That is, any part of the optimal strategy must be optimal. Assuming that $dt$ is an infinitesimal time value, formula (18) can be obtained according to Bellman's principle of optimality.
Formula (18) can be expanded using Taylor's formula, and Ito's rule can be used to neglect the higher-order infinitesimals. Then, by taking the expectation of both sides of (18), eliminating the infinitesimal terms associated with the Brownian motion, and reorganizing the result, the HJB equation of the system is obtained by generalization, where $\Delta_s u(t, s) = \partial_{ss} u(t, s)$ and $H$ is a robust Hamiltonian function. The part $-\rho^2 \xi_i^2 + \partial_{s_i}\mu_{i,j} f_i(t)\, \xi_i$ of the Hamiltonian involving the uncertainty factor is a concave function of $\xi_i$, and therefore the value of the uncertainty factor that maximizes this part is $\xi_i^* = \partial_{s_i}\mu_{i,j} f_i(t) / (2\rho^2)$. The FPK equation of the system, defined in Appendix (32), can be obtained through the test function method. In it, $q = \nabla_s u(t, s)$, and the Hamiltonian function is taken with the influence of the uncertain factors removed. So far, the HJB and FPK equations of the system have been derived. According to [29], [30], if the Hamiltonian is smooth, then the HJB equation has at least one solution. Taking the derivative of the Hamiltonian, we find that its first derivative exists and is continuous; that is, the Hamiltonian is smooth. So the mean field equilibrium solution of the system can be derived by combining the HJB and FPK equations.
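The worst-case disturbance can be worked out directly from the uncertainty-dependent part of the Hamiltonian. The short derivation below uses the shorthand $a$ for the coefficient of $\xi_i$; this shorthand and the function name $\phi$ are introduced here for illustration only.

```latex
\begin{aligned}
\phi(\xi_i) &= -\rho^2 \xi_i^2 + a\,\xi_i, \qquad a := \partial_{s_i}\mu_{i,j}\, f_i(t),\\
\phi'(\xi_i) &= -2\rho^2 \xi_i + a = 0 \;\Longrightarrow\; \xi_i^{*} = \frac{a}{2\rho^2},\\
\phi''(\xi_i) &= -2\rho^2 < 0 \quad (\rho \neq 0),\\
\phi(\xi_i^{*}) &= \frac{a^2}{4\rho^2}.
\end{aligned}
```

The negative second derivative confirms that the stationary point is a maximum, and the maximal value $a^2 / (4\rho^2)$ shows that a larger robustness level $\rho$ reduces the contribution of the disturbance to the cost.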

4) SOLUTION OF MEAN FIELD EQUILIBRIUM
In this subsection, the mean field equilibrium is obtained by the finite difference method. We therefore discretize the time interval $[0, T]$ and the interference state space $[0, I_{\max}]$ with step sizes $\delta_t = T/X$ and $\delta_I = I_{\max}/Y$, respectively. Before solving the FPK equation by the Lax-Friedrichs method, we introduce the discrete operators it uses. Let $f(t, s) = f_i^j$, where $t = j\delta_t$ and $s = i\delta_I$; the corresponding discrete operators can then be defined. As shown in [14], $\partial_q H(t, s_i, q, m)$ can be used in place of the control variable $p_i(t, s)$. Substituting the discrete operators into the FPK equation gives the expression in (33), from which the forward update formula (34) follows.

Because of the Hamiltonian, the finite difference method cannot be applied directly to the HJB equation. Therefore, the HJB equation is reformulated as a corresponding optimal control problem, and the reformulated problem is solved by the Lagrange method. Using the Lagrange multiplier [32] $\lambda(t, s)$, we obtain the corresponding Lagrangian function (36). We then find the extremum of the Lagrangian function and obtain the update formula of the backward equation (35). With (34) and (35), we have the update formulas of $m_I^t$, $\lambda_I^t$, and $p_I^t$.

According to the de Finetti-Hewitt-Savage theorem, the convergence rate of the mean field is $O(1/\sqrt{n})$. De Finetti proved the result for infinite binary sequences, and Hewitt and Savage extended it to continuous compact spaces. The transmit power can be initialized to half of $p_{\max}$. In theory, the initial value of the mean field can be set arbitrarily, but it affects the speed of convergence; here we set the initial mean field to a Gaussian distribution. The Lagrange multiplier follows a backward equation, so we set its terminal value to 0.
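The discretization and the forward update described above can be illustrated with a generic Lax-Friedrichs step. The sketch below is not the update formula (34) itself: it treats only the transport part of an FPK-type equation, omits the diffusion term, and uses a periodic boundary for brevity. The function names `make_grid`, `gaussian_density`, and `lax_friedrichs_step`, and the drift values, are hypothetical.

```python
import math


def make_grid(T, I_max, X, Y):
    """Return the step sizes dt = T/X, dI = I_max/Y and the interference grid."""
    dt = T / X
    dI = I_max / Y
    grid = [i * dI for i in range(Y + 1)]
    return dt, dI, grid


def gaussian_density(grid, dI, mean, std):
    """Gaussian initial mean field, normalized so that sum(m) * dI equals 1."""
    raw = [math.exp(-0.5 * ((s - mean) / std) ** 2) for s in grid]
    total = sum(raw) * dI
    return [r / total for r in raw]


def lax_friedrichs_step(m, drift, dt, dI):
    """One forward step of dm/dt + d(drift * m)/ds = 0 with periodic boundary.

    m and drift are lists over the interference grid. The diffusion term
    of the FPK equation is left out (assumption made to keep this short).
    """
    n = len(m)
    flux = [drift[i] * m[i] for i in range(n)]
    new_m = []
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        average = 0.5 * (m[left] + m[right])          # Lax-Friedrichs averaging
        transport = dt / (2.0 * dI) * (flux[right] - flux[left])
        new_m.append(average - transport)
    return new_m
```

With a periodic boundary the scheme conserves the total mass of the mean field, and it keeps the density non-negative as long as the step sizes satisfy |drift| * dt / dI <= 1.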
According to these formulas, the optimal robust power control algorithm shown in Algorithm 1 can be designed. The formulas also show that the UAV nodes need only local information and mean field information to make power decisions.
So far, we have obtained the time slot allocation strategy and the power control method. As shown in Fig. 6, when a node starts, it is given initialization parameters, which include preset values and values obtained from GPS. The distance is first calculated from these inputs; then the link is predicted and the time slot is assigned. This step optimizes the second term of the cost function. Based on the result of the time slot allocation, a robust MFG is used for power control, which optimizes the first term of the cost function. The cost function is thus minimized through joint optimization of time slot allocation and power control.
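The distance calculation and link prediction step can be sketched as follows. This is a minimal illustration that assumes a constant-velocity motion model and a fixed communication range; the prediction rule, the function names `predict_position` and `link_will_hold`, and all parameter values are hypothetical and are not taken from the protocol description.

```python
import math


def predict_position(position, velocity, horizon):
    """Constant-velocity extrapolation of a 3-D position after `horizon` seconds."""
    return tuple(p + v * horizon for p, v in zip(position, velocity))


def link_will_hold(pos_a, vel_a, pos_b, vel_b, horizon, comm_range):
    """Return True if the predicted distance between two nodes stays in range.

    Positions are in meters (e.g. from GPS), velocities in m/s. The fixed
    comm_range threshold is an assumption made for this sketch.
    """
    future_a = predict_position(pos_a, vel_a, horizon)
    future_b = predict_position(pos_b, vel_b, horizon)
    return math.dist(future_a, future_b) <= comm_range
```

A node could run such a check for each neighbor before the next frame and look for a relay or a new slot only for the links predicted to break.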

IV. SIMULATION RESULTS
In this section, we use object-oriented programming to simulate nodes on the MATLAB platform, and load various protocols onto the nodes to simulate the communication between them. As shown in Fig. 7, network nodes are abstracted into node objects, and node functions and protocols are abstracted into corresponding classes. Different classes are responsible for the corresponding functions of the nodes. For example, the node mobility module is abstracted as a mobility model class (Waypoint), which updates the positions of the nodes during the simulation and controls their moving direction and speed. The node class is the most basic and important element of the simulation platform. Node objects are composed of these classes, and they communicate with each other through event response mechanisms or signaling interactions, so as to simulate the entire multi-hop UAV network.
The simulations illustrate the performance of the proposed method. We consider a multi-hop UAV network formed by 20 UAVs in a cube of 1 km × 1 km × 1 km, among which 10 nodes send data packets to their corresponding destination nodes using a constant bit rate (CBR) application. To reduce the impact of network layer performance on the MAC protocol, the method in [33] is used to find the route. The detailed simulation parameters are listed in Table 2. In Subsection IV-A, we use velocity as the key factor affecting network performance. To cover a comprehensive and demanding range of conditions, we let the velocity of the UAVs vary from 0 to 100 m/s [1]. The comparison shows that our time slot allocation performs well in high dynamic scenarios. In Subsection IV-B, we set different values for the uncertainties and show that the proposed power control algorithm achieves better EE.
To show the performance of the algorithm, the simulation mainly considers the PDR, the average end-to-end delay, and the number of collisions. The proposed algorithm consists of two phases, namely the MAC layer slot allocation phase and the power control phase. To demonstrate the performance more clearly, the two phases are evaluated separately.
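The first two metrics can be computed from simulation logs as in the following sketch. It assumes that each packet carries a unique identifier and that send and receive timestamps are recorded per packet; the function names and the log layout are hypothetical.

```python
def packet_delivery_ratio(num_sent, num_received):
    """PDR: fraction of sent packets that reached their destination."""
    if num_sent == 0:
        return 0.0  # convention chosen here for an empty run
    return num_received / num_sent


def average_end_to_end_delay(send_times, receive_times):
    """Mean delay over delivered packets.

    Both arguments map packet id -> timestamp in seconds. Packets that
    were never received are excluded from the average.
    """
    delays = [
        receive_times[pid] - send_times[pid]
        for pid in receive_times
        if pid in send_times
    ]
    if not delays:
        return 0.0
    return sum(delays) / len(delays)
```

Lost packets lower the PDR but do not enter the delay average, so the two metrics should be read together.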

A. PERFORMANCE OF DLP-TDMA IN MAC LAYER
In this section, we compare the network performance of the dynamic location prediction TDMA protocol (DLP-TDMA) with that of traditional TDMA protocols with fixed (Fix-TDMA) [34] and dynamic (D-TDMA) [35] time slot allocation. As seen from Fig. 8, the overall PDR of the Fix-TDMA protocol is the lowest, followed by D-TDMA, and the overall performance of the DLP-TDMA protocol is the best. When the speed is zero, the PDR of all three protocols is 1. As the speed increases, the PDR of Fix-TDMA drops the fastest. This is because the time slots of each node are fixed while the topology changes during the access process, which causes packet loss and indicates that the Fix-TDMA protocol is unsuitable for applications in which the nodes move fast. The D-TDMA and DLP-TDMA protocols can dynamically adjust the time slot allocation to avoid collisions. As the nodes move faster and the topology changes more severely, the DLP-TDMA protocol, which is based on node position prediction, can determine in advance whether a link will be disconnected and thereby select the relay node in advance. Therefore, the DLP-TDMA protocol has a higher PDR than the D-TDMA protocol in high dynamic scenarios.
With Fig. 9, we further examine the PDR when the UAV nodes move at high speed. We keep the nodes' moving speed at 80 m/s and vary the packet sending interval. The smaller the sending interval, the higher the sending frequency and the heavier the network load, so the interval serves as a measure of network load. As the network load is reduced, the PDR of all three protocols shows an overall upward trend, indicating that network performance degrades under heavy load. DLP-TDMA uses the node prediction mechanism to effectively avoid node conflicts and link disconnection, so its overall PDR is higher than that of the other two protocols.

As shown in Fig. 10, when the velocity is zero, there are no collisions in the network. As the velocity increases, the number of collisions of D-TDMA increases significantly. Each node in the Fix-TDMA protocol is assigned a unique fixed time slot, so its number of collisions is always zero. The number of collisions of DLP-TDMA is only slightly larger than that of Fix-TDMA, because the protocol predicts potential conflicts from topology changes and avoids them in advance, which greatly reduces the number of collisions. In contrast, UAV nodes using the D-TDMA protocol suffer collisions like those in scenario two because of the dynamic movement of the nodes.
These simulation results show that the DLP-TDMA protocol can effectively avoid packet collisions when the nodes move quickly, ensuring a higher PDR and a smaller end-to-end delay. It is therefore a time slot allocation strategy suitable for multi-hop UAV networks in high dynamic scenarios.

B. PERFORMANCE OF ROBUST POWER CONTROL IN PHYSICAL LAYER
After the slot allocation, the nodes that will send data at a given moment are determined. To illustrate that the proposed power control algorithm remains robust in scenarios with high node density and strong interference, we let 25 pairs of nodes transmit data in the same time slot. All the transmitting nodes run the power control algorithm over a time interval to improve the EE, which is defined in terms of the channel bandwidth B and the SINR.

In Fig. 11, the x-axis represents the time iteration, the y-axis represents the interference iteration, and the z-axis represents the distribution of the mean field. It can be seen from the figure that during the Lagrange multiplier iteration process, the mean field rapidly converges and stabilizes. As time progresses, the mean field also drops rapidly. This shows that the proposed power control method converges quickly and is suitable for high dynamic scenarios.

Fig. 12 shows the EE performance of the system when the uncertainty factor obeys Gaussian distributions with different variances. The horizontal axis is the variance of the uncertainty factor, and the vertical axis is the average EE of the system. The variance of the uncertainty factor is varied from 1.5 to 8 in steps of 0.1. The larger the variance is, the flatter the Gaussian distribution is, and the smaller the range of uncertainty factors is. It can be seen from Fig. 12 that the system EE rises slowly as the variance of the uncertainty factor increases, but the increase in the average EE is small. When the variance of the uncertainty factor decreases from 8 to 1.5, the range of values becomes larger and larger, but the degradation in system performance is small. This indicates that even if the value range of the uncertainty factor changes with the variance, the EE performance of the system does not change much, because of the robustness of the proposed algorithm.
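The relationship between the SINR, the bandwidth B, and the EE can be illustrated numerically. The sketch below assumes a commonly used form, namely the achievable rate B log2(1 + SINR) divided by the transmit power, with the SINR taken as received signal power over interference plus noise; this form, the function names, and the example values are assumptions made for illustration and may differ from the definitions used in this section.

```python
import math


def sinr(tx_power, channel_gain, interference, noise_power):
    """Received signal power over interference plus noise (linear scale)."""
    return tx_power * channel_gain / (interference + noise_power)


def energy_efficiency(bandwidth, tx_power, sinr_value):
    """Assumed EE in bit/J: achievable rate divided by transmit power."""
    rate = bandwidth * math.log2(1.0 + sinr_value)  # bit/s
    return rate / tx_power


# Example: at SINR = 1 the rate equals the bandwidth, so EE = B / p.
example_sinr = sinr(tx_power=2.0, channel_gain=0.5, interference=0.5, noise_power=0.5)
example_ee = energy_efficiency(bandwidth=1e6, tx_power=2.0, sinr_value=example_sinr)
```

Under this assumed form, raising the transmit power increases the rate only logarithmically while the denominator grows linearly, which is why a power control step is needed to maximize the EE.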
From Fig. 12, we can also see that the proposed power control method has better EE than the fixed power control method, which shows that it can transmit more information while consuming less energy. In Fig. 13, the upper subplot gives the average system EE as a function of time, and the lower subplot shows the change in the average system EE with the interference state (this interference is the uncertainty). The EE of the system gradually rises with time and then reaches a stable state, indicating that the overall performance of the system improves as the power control algorithm executes. The system takes only 10 ms to reach a steady state with the power control algorithm, which is very short. In the lower subplot, the average EE of the system decreases slowly as the interference state increases from 0 to 3. That is, when the uncertainty increases, the system performance remains stable within a range, so the system is robust.
These simulation results show that our proposed power control algorithm has excellent EE. When the uncertainties in the system become larger, the EE of the system drops slowly and smoothly. That is, our power control algorithm can still be very robust when the system is very uncertain. More importantly, this power control algorithm only needs to obtain local information to quickly converge.

V. CONCLUSION
In this paper, a resource allocation scheme combining the physical layer and the MAC layer is proposed, and an EE optimization model for cross-layer resource allocation is established. Position prediction is used to allocate time slots efficiently, and the MFG is used to control power efficiently. The simulation results show that the proposed time slot allocation and power control algorithm can effectively reduce the packet loss rate and the probability of collision, and improve the system EE. The simulation results also show that robust MFG performs well when applied to power control.

APPENDIX.
Definition 1: The cost functional J is defined with the following notation. $E_{x,t}$ denotes the expectation with respect to the measure induced by $x^{\alpha(\cdot)}(s)$ on $[t, T]$ started at s. $\mathcal{A}$ is a subset of all progressively measurable stochastic processes $\alpha$ with values in A.

Definition 2: Let $[0, T]$, $0 < T < \infty$, be the horizon of the game. $m_t(x)$ is a weak solution to the partial differential equation over $[0, T]$ if, for any infinitely continuously differentiable function $\varphi_t$ over $R \times (0, T)$ with compact support, the defining identity holds, where $E_{m_t}$ is the expectation with respect to $m_t$.
Equations (35) and (36) are shown at the bottom of the previous page.