Price of Anarchy in mmWave Backhaul Routing and Link Scheduling

In this paper we present and evaluate the performance of a routing and link scheduling algorithm for millimeter wave backhaul networks. The proposed algorithm models the access points as selfish, i.e., as always aiming to maximize their individual utility rather than the global optimization objective. Our system utilizes popular concepts from the economics and fairness literature. Specifically, in order to forward packets between the access points that comprise the backhaul network, the Shapley value method is applied, which is shown to induce solutions with reduced latency. The performance of the proposed algorithm is evaluated in terms of total delay and price of anarchy, which represents the inefficiency of a scheduling policy when users are allowed to adapt their rates in a selfish manner and reach an equilibrium. A relaxed version of the problem is also presented, providing a lower bound on the value of the optimal solution. According to simulation results, in terms of delay and price of anarchy, the system that employs the proposed algorithm outperforms both a system that uses a First-In-First-Out packet forwarding policy and a system that employs local search global optimization, under which users aim at optimizing the overall network delay.

A backhaul solution for 5G and 6G networks needs to accommodate considerably high throughput, reaching multi-Gb/s per link [5], comply with very strict constraints regarding round-trip latency, and guarantee availability and reliability. A complete solution for supporting this high demand in backhaul networks needs to account for all of the above technical challenges.
An attractive way to address the main technical challenge of increased capacity demand and small cell backhaul connectivity is the use of millimeter wave (mmWave) transmissions (30-300 GHz) [5]. One of the advantages of employing mmWave transmissions, through the use of the spectrum bands beyond 30 GHz, is the opportunity to use allocations of larger bandwidth, which result in higher data transfer rates. This significant increase in bandwidth, combined with the capabilities offered by mmWave solutions, allows backhaul links between access points (APs) to handle considerably higher capacity than legacy networks in densely populated areas or during large scale public events with high user density [6]. Moreover, the employment of mesh backhaul topologies for the connectivity of small cells, supported by the pencil-beam mmWave links and rapid beam steering capability, allows for improved reliability and flexibility against link failures via the support of multiple alternative routes. According to [4], the availability of mmWave small cells, and their adoption for dense and industrial environments, is expected to considerably increase, especially from 2023 onwards. More specifically, the adoption of small cell networks for mmWave communications is expected to ramp up and, by 2026, finally overtake the deployments in spectrum bands below 6 GHz [2].
However, the employment of efficient link scheduling and routing algorithms that can fulfill the tight constraints and requirements of mmWave backhaul networks in terms of end-to-end delay and throughput is not straightforward. A factor that introduces additional complexity to this problem is the vulnerability of mmWave links, and their susceptibility to unfavorable weather conditions. One of the main challenges in the area of mmWave backhauling is the design of efficient link scheduling algorithms that can adapt to the rapidly changing network conditions and support the stringent performance demands of mmWave backhaul networks with reasonable computational complexity. Optimal centralised control in such dense environments with strict performance requirements is an NP-Hard problem. Therefore, it is necessary to design and implement approximation algorithms that run in polynomial time, but that also achieve as close to optimal performance as possible.
However, the network designer has the ability to influence the quality of routing in the backhaul network by designing packet forwarding policies that lead to high quality outcomes by simulating APs employing selfish behaviors. More specifically, previous theory [7] suggests that, even in the absence of selfish users who define packet routes, settings with convex cost functions can benefit from schemes that perform resource sharing in a manner that aims to maximize the individual utility, rather than a global objective. This is precisely the problem we address in this work, in the context of backhaul networks: We design a routing algorithm and packet forwarding policy that is robust against selfish behavior and addresses standard technical challenges of mmWave backhaul networks. More specifically, part of our experimental results shows that the solution reached by simulating decisions by selfish agents outperforms a benchmark local search algorithm that uses the actual global objective function.

A. Review of the Relevant Literature
In this section we provide an overview of the bibliography in the areas of interest of this paper. More specifically, we start by reviewing works that address the problems of routing and resource allocation in mmWave backhaul networks [8], [9], [10], [11], [12], [13], [14], [15]. Their aim is to optimize the performance in terms of delay, throughput and energy efficiency. The majority of these proposals provide formulations that result in prohibitively high complexity. Therefore sub-optimal, but computationally efficient, solutions are also described. We continue with an overview of proposals in the area of routing in wireless mesh networks with the presence of selfish nodes [16], [17], [18], which allow us to appreciate the challenges, as well as the benefits of selfish user behavior modelling in such networks. Finally, we provide a brief overview of traditional schemes that address the problems of scheduling and congestion control in multihop wireless networks [19], [20], enabling us to gain insights into the algorithmic approaches considered.
In the recent bibliography, a number of proposals that aim at minimizing the end-to-end delay and optimizing the throughput in mmWave backhaul networks can be found [8], [9], [10], [11], [12], [13]. The authors in [8] propose a scheduler for maximum throughput. An edge-coloring based approximation algorithm is also proposed that achieves close to optimal performance with lower complexity. The work is extended in [9], considering no-interference and interference based scenarios. Approximation algorithms with performance bounds that can reach 80% of the optimum are also described. In [10], the problem of joint routing and resource allocation in a picocellular network is formulated. Its aim is to maximize the minimum backhaul throughput to each node, under interference and resource constraints. A lower complexity formulation that takes advantage of the localized nature of interference is also presented. In [11], a semi-distributed learning algorithm is proposed, aiming to minimize the end-to-end latency and improve robustness of the backhaul network. Route selection and scheduling is modelled as a Markov Decision Process (MDP). The model is solved using reinforcement learning and improves performance in terms of latency and throughput.
With the aim of end-to-end delay minimization, link scheduling algorithms are presented in [12]. Maximal collections of links that are allowed to be concurrently active are computed and then repeated in a cyclic manner. The scheduling and routing decisions are centrally made offline and signaled to the mesh backhaul nodes. The selection of the backhaul paths is based on criteria such as the number of hops, the resilience and load of the gateway nodes. The authors in [13] propose a routing algorithm that performs load balancing in mmWave backhaul architectures with multiple base stations (BSs). Their aim is to prevent large amounts of traffic being accumulated at specific BSs. This is achieved by the minimization of a load balancing factor, which represents the ratio of load difference between the BS with maximum traffic load and the one with minimum traffic load.
Apart from the optimization of the end-to-end delay and the throughput, there are works that aim at the minimization of energy consumption [14], [15]. A QoS-oriented joint optimization algorithm of concurrent scheduling and power control is described in [14]. Firstly, the problem is formulated as a mixed integer non-linear programming (MINLP) problem, where the energy consumption is reduced and the number of the successfully transmitted flows is enhanced. A heuristic mechanism is also proposed. The proposed solution outperforms state of the art solutions under various traffic modes and system parameters. Novel algorithms to solve the joint problem of energy-efficient user association, backhaul traffic routing and BS/backhaul link on/off switching are described in [15]. An optimization model is formulated, employing a realistic power model under robust capacity formulation, path conservation and on/off switching constraints. A heuristic method is also proposed that finds solutions that achieve 83% optimality in terms of energy efficiency at 200k times lower execution times.
As can be seen from the above, a large number of proposals focus on the problems of link scheduling and route selection in mmWave backhaul networks. However, to the best of the authors' knowledge, there are no works in this area modelling the user behavior as being selfish and aiming to address the respective problems from a game theoretic point of view. Game-theoretic approaches in the presence of selfish nodes in multihop wireless networks in general are presented in [16], [17], [18]. More specifically, two non-cooperative games that are played in sequence for radio resource allocation in multi-channel multi-radio wireless mesh networks are proposed in [16]. The first game assigns channels to the radio interfaces of each backhaul node, while the second distributes the resulting radio-channel pairs to the links connecting the different nodes. The proposed games are shown to always reach a Nash Equilibrium (NE), guaranteeing network connectivity and achieving minimal co-channel interference of each individual radio. The authors in [17] study fairness in multihop wireless backhaul networks in the presence of selfish backhaul nodes. An incentive-based mechanism, according to which the optimal strategy for each node is to forward transit data for other nodes, is proposed. This results in the elimination of the topology related unfairness in the backhaul network, i.e., end-to-end throughput obtained by nodes decreasing with the number of hops from the gateway node. The mechanism achieves improved results in terms of throughput and fairness even when there are idle backhaul nodes. Finally, the authors in [18] propose an adaptive cross-layer routing scheme, which selects the most reliable path in terms of packet error ratio. The authors consider the concept of evolutionary game theory, i.e., players having the ability to empirically adapt their routing strategies during the game. The proposed routing scheme is evolutionarily stable, while the speed of convergence increases with the amount of information provided by the physical layer to the routing layer.
Our work differs from these prior game theory works as follows. In contrast to [16], we study the route selection aspect of the problem, as opposed to channel allocation, and in contrast to [17] we do not focus on the fairness aspect of the problem, but rather on the efficiency and on optimizing the objective of the total delay suffered by the packets in the network. The work in [18] is in fact the one most closely related to ours as the authors study game-theoretic route selection and equilibrium convergence properties. However, in contrast to their focus on convergence to equilibrium, we are more focused on the quality of the induced solution. We prove theoretical results and present experimental evaluations on this aspect, something not present in [18]. Another differentiating factor of our work from this prior literature is that we are the first to study the impact of local packet forwarding policies on the induced routing solution. All previous works are implemented on the basis of the First-In-First-Out (FIFO) packet forwarding. Following the review of the above game theoretic approaches for multihop wireless networks we can also see that, although the proposed systems are evaluated in terms of typical performance metrics, the effect of the selfishness of the users on the overall system performance is not assessed. Game theory offers concepts that focus on the quality of the induced equilibrium solution for the system, and not for individual players. We leverage these concepts in our work and use local policies that induce solutions that are good in a global sense for the system.
We conclude the literature review with a brief overview of traditional scheduling schemes in multihop wireless networks. The authors in [19] introduce distributed scheduling and asynchronous congestion control algorithms that do not allow nodes to transmit and receive at the same time. Congestion control is performed by exchanging congestion price and packet arrival information asynchronously. Scheduling is performed in the Medium Access Control (MAC) layer through an algorithm that selects maximal matchings in a distributed manner. A Time Division Multiple Access (TDMA) MAC scheduling scheme with the aim to minimize the total scheduling delay is proposed in [20]. Firstly, an optimization problem is formulated in order to optimize the order of link transmissions in a conflict-free schedule. A sub-optimal scheme, which calculates the minimum number of slots required to schedule all links and then makes sure that the allocated rates can fit in the frame duration, is also introduced.

B. Contributions and Organization of the Paper
Motivated by the review of the relevant literature, in this paper we propose a routing and link scheduling algorithm that is robust to selfish behavior and aims to minimize the total delay in mmWave backhaul networks. More specifically, the main contributions of this paper with respect to the reviewed literature are summarized as follows:
1) The behavior of APs is modelled as selfish, i.e., aiming to always maximize their individual utility, which may contradict the social objective function set by the system designer. Such a system gracefully handles settings where APs route packet traffic to optimize their individual costs (e.g., packet latency). This is motivated by previous theory suggesting that simulating selfish user behavior and performing resource sharing with the aim to maximize the individual, rather than the global utility, results in improved performance.
2) Our system utilizes the Shapley value, an important concept from the fairness and economics literature. The Shapley value is a method that distributes an aggregate cost (e.g., aggregate packet delay in a queue) to a set of entities that contribute differently to it (e.g., by transmitting with different rates). Specifically, each entity suffers a cost equal to the expected increase that it causes to the aggregate cost over a uniformly random order of appearance in the system (see Definition 3 for the details). The Shapley value is known to possess certain desirable properties, such as fairness and improved system performance in terms of the total user utility [21], [22]. In this paper, we show through theoretical analyses and extensive simulations that the Shapley value induces solutions with low latency in our setting as well.
3) Routing decisions are modelled following the best response dynamics approach, according to which paths are allocated to APs in an iterative manner, until all of them use the minimum cost path and the network reaches a steady state with no AP wanting to use a different path. The benefit of using this approach comes from the fact that, when used with the Shapley value scheduling policy, it can guarantee that a NE will be reached [23].
4) The performance of the proposed schedule is evaluated in terms of the price of anarchy (POA), which represents the inefficiency of a scheduling policy when users are allowed to adapt their rates in a selfish manner and reach an equilibrium. Showing that a system has a low POA demonstrates that the solutions at hand properly handle selfish behavior and successfully contain possible inefficiencies caused by it. We prove that the POA of our system is bounded in a setting with M/M/1 delays on routers and show in realistic simulations that it also performs well in practice.
5) Measuring the POA requires computation of an optimal solution to compare against. In our setting computing an optimal solution is an NP-Hard problem. Moreover, running optimization heuristics to recover it even in typical instances proves to be extremely time consuming. Hence, we compare against a stronger baseline, i.e., against a lower bound to the optimal solution computed by a linear programming relaxation of the problem.
This paper extends the work in [1] as follows:
• A detailed interference model is considered, taking into account the effect of interference between neighboring APs in the link capacity. This allows for the consideration of a wider range of scenarios, in terms of mmWave antenna beam widths and directionality capabilities, compared to the more restrictive assumption of very narrow beam width for the practical elimination of interference, which was considered in [1].
• The proposed system considers the case of links breaking, e.g., due to unfavorable weather conditions, a well-known vulnerability of mmWave networks, and responds accordingly. In contrast, the effect of link interruptions was not studied or evaluated in [1].
• This paper employs the edge activation solution presented in [12], rather than the approach of [1] that, at each moment in time, randomly picks the matching to activate among a predefined set of maximal matchings.
• A discussion on the scalability of the proposed solution is included.
• The performance evaluation of the proposed solution is further enhanced by considering an enhanced interference model and by providing scenarios that employ increasing numbers of users per small cell AP, as well as different patterns of interruptions in link connectivity.
• The performance of the proposed solution is also compared against the work presented in [12]. In fact, this particular baseline corresponds to the FIFO baseline in our setting. Our algorithms are defined by means of the packet forwarding policies they use on the links of the network and the work in [12] uses the FIFO policy.
This paper is organized as follows. Section II introduces the system model. Section III provides a theoretical analysis of the proposed algorithm, which is then described in Section IV. The performance of the proposed algorithm is evaluated through extensive simulations in Section V. Finally, Section VI provides concluding remarks and discusses plans for future work.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. THE MODEL
The system model consists of an area covered by a number of small cell APs, each one serving a variable number of UE devices. The small cell APs are connected to each other via point-to-point mmWave backhaul links. Each AP only has one mmWave backhaul radio. The backhaul traffic is forwarded to/from the core network via a macrocell BS, which serves as a gateway node providing wired backhaul connection to the communication infrastructure, see Fig. 1. For the remainder of this paper we will consider the case of the gateway node being the source of all traffic, which is directed to the UE devices associated with each small cell. The different configuration tasks required for the operation of the proposed algorithm, such as the computation of maximal matchings, and initial calculation of the different routing paths to and from the gateway macrocell BS, are performed by a centralized controlling entity that can be either collocated with the macrocell BS, or can reside even deeper in the network.
We consider the constraint that a backhaul node can communicate in only one direction, either uplink or downlink (i.e., half-duplex), and with only one neighboring node, at a time. This can be the result of typical constraints in mmWave antenna designs, e.g., analog beamforming and limited RF units per backhaul node [24]. Using the highly directional beamforming antennae of the mmWave APs, switching from one neighbor to another is achieved using rapid beam steering.
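The one-neighbor-at-a-time constraint means that the links active at any moment form a matching of the backhaul graph. The following Python sketch illustrates this under simple assumptions: links are given as node pairs, and `is_matching` and `greedy_maximal_matching` are hypothetical helper names used only for illustration, not the procedure run by the centralized controller.

```python
def is_matching(links):
    """Return True if no node appears in more than one link."""
    busy = set()
    for u, v in links:
        if u in busy or v in busy:
            return False
        busy.update((u, v))
    return True


def greedy_maximal_matching(links):
    """Scan links in order and keep each one whose endpoints are both free."""
    busy, active = set(), []
    for u, v in links:
        if u not in busy and v not in busy:
            active.append((u, v))
            busy.update((u, v))
    return active


# Toy topology: gateway "s" and three APs (labels are illustrative).
links = [("s", "a"), ("a", "b"), ("s", "b"), ("b", "c")]
active = greedy_maximal_matching(links)
```

The greedy scan yields a maximal matching (no further link can be added), which is not necessarily a maximum one; different scan orders give different maximal matchings.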
In order to model the effect of interference between neighboring APs we consider the model of [25], according to which the received power from the transmitting node $t_e$ to the receiving node $r_e$ of link $e$ is formulated as follows:

$$P_r(t_e, r_e) = k \, P_t \, G_t(t_e, r_e) \, G_r(t_e, r_e) \, d_{t_e r_e}^{-n}.$$

$k$ is a factor proportional to $(\lambda / 4\pi)^2$, where $\lambda$ is the wavelength. $P_t$ is the transmission power of the transmitting node, $G_t(t_e, r_e)$ is the transmitted antenna gain in the direction from the transmitter $t_e$ to the receiver $r_e$, while $G_r(t_e, r_e)$ is the received antenna gain in the direction from $t_e$ to $r_e$. $d_{t_e r_e}$ is the distance between the transmitting and receiving nodes and $n$ is the path loss exponent. The received interference from the transmitting node $t_l$ of link $l$ to the receiving node $r_e$ of link $e$ is formulated as follows:

$$P_r(t_l, r_e) = \rho \, k \, P_t \, G_t(t_l, r_e) \, G_r(t_l, r_e) \, d_{t_l r_e}^{-n}.$$

$\rho$ is the multi-user interference (MUI) factor, which is related to the cross-correlation of signals from different links [26].
The gain of the directional antennae, in units of dB, is formulated as a function of the angle $\theta$, which lies within the range $[0^\circ, 180^\circ]$, and of $\theta_{-3\mathrm{dB}}$, the angle of the half-power beamwidth (HPBW). $\theta_{ml}$ is the main lobe width in units of degrees and is formulated as $\theta_{ml} = 2.6 \times \theta_{-3\mathrm{dB}}$. $G_{sl}$ is the sidelobe gain, which is formulated as $G_{sl} = 0.4111 \times \ln(\theta_{-3\mathrm{dB}}) - 10.579$. Based on the above, the capacity $C_e$ of link $e$ can be calculated with the use of Shannon's channel capacity as follows:

$$C_e = W \log_2\left(1 + \frac{P_r(t_e, r_e)}{N_0 W + \sum_{l \neq e} P_r(t_l, r_e)}\right),$$

where the sum runs over the links $l$ that are active concurrently with $e$, $W$ is the channel bandwidth and $N_0$ is the one-sided power spectral density of white Gaussian noise. The access and backhaul networks are considered to be operating in different frequency bands, thus not interfering with each other, while Radio Resource Management (RRM) at the access network is out of the scope of this paper. Time Division Duplexing (TDD) is considered for the separation of the uplink and downlink directions. This choice is motivated by the numerous advantages of this duplexing scheme with regards to spectrum efficiency, cost of the required hardware equipment, and channel reciprocity.
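As a rough numerical illustration of the link model, the following Python sketch computes a received power, an interference term and the resulting Shannon capacity. It assumes the usual path loss form (power proportional to the product of the two gains and to distance raised to minus the path loss exponent), treats interference as additional noise, and takes the antenna gains as inputs rather than deriving them from the beam pattern. All function and parameter names are illustrative assumptions.

```python
import math


def db_to_linear(gain_db):
    """Convert an antenna gain from dB to a linear factor."""
    return 10.0 ** (gain_db / 10.0)


def received_power(k, p_t, g_t, g_r, distance, n, rho=1.0):
    """Received power in linear units.

    rho=1.0 models the intended signal; rho<1.0 scales an interfering
    transmitter by the multi-user interference factor.
    """
    return rho * k * p_t * g_t * g_r * distance ** (-n)


def link_capacity(bandwidth, n0, signal, interference=()):
    """Shannon capacity in bit/s, with interference added to the noise."""
    sinr = signal / (n0 * bandwidth + sum(interference))
    return bandwidth * math.log2(1.0 + sinr)
```

For instance, when the signal power equals the noise power and no interferer is active, the SINR is 1 and the capacity equals the bandwidth in bit/s; adding any interferer lowers it.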
For our theoretical analysis we model the network as a directed graph $G = (V, E)$. Each player in our model corresponds to an AP that receives traffic from the gateway. Each edge $e$ in the graph corresponds to a backhaul link between APs and is associated with a capacity $c_e$. The delay experienced by packets passing through it is given as the M/M/1 delay with capacity $c_e$, i.e., $1/(c_e - f_e)$ seconds, when the arrival rate of packets at the link is $f_e$ per second. We refer to the sequence of packets that are demanded by an AP and arrive at a given rate as a flow of packets. The aggregate delay of a total of $X$ packets is then $X/(c_e - f_e)$ and if, for the sake of simplicity, we focus on the packets arriving over 1 second, i.e., $X = f_e$, we get the following expression for the aggregate delay, which we use for the remainder of the section as the canonical delay function:

$$d_e(f_e) = \frac{f_e}{c_e - f_e}. \tag{5}$$

Each player in the network, $t_i$, wishes to have a packet flow of rate $f_i$ routed to it from the gateway $s$. We write $N$ for the set of players. We consider a setting where a packet flow of an AP will be routed on a single path. The strategy of each player is then to select the $s$-$t_i$ path $P_i$ through which her flow will be routed. A set of player paths $P = (P_i)_{i \in N}$ is a solution of the game. Then $f(P)$ is the induced packet flow in the network, where $f_e(P) = \sum_{i : e \in P_i} f_i$. Depending on the packet forwarding policy, e.g., FIFO, the aggregate delay experienced by a player at any given link varies. Let $d_i^e(P)$ denote the aggregate delay suffered by player $i$ on edge $e$ in solution $P$. The goal of each player is to select the path that minimizes her total cost $d_i(P) = \sum_{e \in E} d_i^e(P)$. Note that this optimal path is typically dependent on the congestion on the links, which in turn depends on the decisions of other players. We will write $D(P) = \sum_{e \in E} d_e(f_e(P)) = \sum_{i \in N} d_i(P)$ for the total delay.
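The quantities defined above can be computed directly from a solution. The following Python sketch assumes that each path is stored as a list of directed edges (node pairs) and that an overloaded link has infinite delay; the function names and the toy network are illustrative assumptions.

```python
import math


def edge_flows(paths, rates):
    """Induced flow per edge: sum of the rates of the players using it."""
    flows = {}
    for player, path in paths.items():
        for edge in path:
            flows[edge] = flows.get(edge, 0.0) + rates[player]
    return flows


def mm1_aggregate_delay(flow, capacity):
    """Aggregate M/M/1 delay f / (c - f) of the packets arriving in 1 second."""
    if flow >= capacity:
        return math.inf  # unstable queue
    return flow / (capacity - flow)


def total_delay(paths, rates, capacities):
    """Total delay: sum of the aggregate delays over all used edges."""
    flows = edge_flows(paths, rates)
    return sum(mm1_aggregate_delay(f, capacities[e]) for e, f in flows.items())


# Toy instance: gateway "s", APs "a" and "b"; AP "b" is reached through "a".
capacities = {("s", "a"): 10.0, ("a", "b"): 10.0, ("s", "b"): 10.0}
rates = {"a": 2.0, "b": 3.0}
paths = {"a": [("s", "a")], "b": [("s", "a"), ("a", "b")]}
delay = total_delay(paths, rates, capacities)
```

In this instance link (s, a) carries 5 units of flow and contributes 5/(10 - 5) = 1, while link (a, b) carries 3 units and contributes 3/7.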
To summarize, the players in our game are the APs, each player's strategy set is the set of paths from the gateway to the corresponding AP, and the selfish objective of each player is to minimize the aggregate delay of her packets. We treat the game as a full information game. We next define our solution concept and our performance metric.
Definition 1 (Nash Equilibrium): A solution is a Nash Equilibrium (NE) if no player can improve her cost by deviating to a different path. Specifically, let $P$ be a solution and let $P'$ be the solution we get by letting player $i$ deviate to a different path. Then $P$ is a NE if $d_i(P) \le d_i(P')$ holds for every player $i$ and every such deviation $P'$.
Definition 2 (Price of Anarchy): The POA of a network is the ratio of the total delay in the worst equilibrium $P$, over the minimum total delay among all solutions. More formally, let $\mathcal{P}$ be the set of all solutions and let $\mathcal{P}_{NE}$ be the set of NE. Then

$$\mathrm{POA} = \frac{\max_{P \in \mathcal{P}_{NE}} D(P)}{\min_{P \in \mathcal{P}} D(P)}.$$
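For very small instances, both definitions can be checked by exhaustive enumeration. The following Python sketch does so for a hypothetical network of two parallel gateway links shared by two players of unit rate. The per-player cost used here, a split of each link's aggregate M/M/1 delay in proportion to the player's flow, is an assumption made only for the example, and the enumeration is exponential in the number of players.

```python
from itertools import product
from math import inf


def is_nash(profile, strategies, player_cost):
    """True if no player can strictly lower her cost by a unilateral deviation."""
    for i, options in enumerate(strategies):
        current = player_cost(i, profile)
        for alt in options:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if player_cost(i, deviated) < current - 1e-12:
                return False
    return True


def price_of_anarchy(strategies, player_cost, total_cost):
    """Worst equilibrium total cost divided by the optimal total cost."""
    profiles = list(product(*strategies))
    equilibria = [p for p in profiles if is_nash(p, strategies, player_cost)]
    worst = max(total_cost(p) for p in equilibria)
    best = min(total_cost(p) for p in profiles)
    return worst / best


# Hypothetical instance: each player picks link 0 or link 1.
capacities = (3.0, 1.9)
rates = (1.0, 1.0)
strategies = [(0, 1), (0, 1)]


def player_cost(i, profile):
    link = profile[i]
    flow = sum(r for r, choice in zip(rates, profile) if choice == link)
    cap = capacities[link]
    return rates[i] / (cap - flow) if flow < cap else inf


def total_cost(profile):
    return sum(player_cost(i, profile) for i in range(len(rates)))


poa = price_of_anarchy(strategies, player_cost, total_cost)
```

In this instance the only equilibrium places both players on the faster link, for a total delay of 2, whereas splitting them across the two links gives a total delay of 29/18, so the ratio is 36/29.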
We now discuss specific examples of packet scheduling policies, which determine the per player delays $d_i^e(P)$. In [27] it is suggested that when the FIFO policy is used, the delays suffered by players routing flow through a specific edge are proportional to their flows. So, if $f_e$ is the flow through link $e$, and $f_i$ is the flow of a player $i$ that is part of it, then the delay of player $i$ on $e$ is $d_e(f_e) f_i / f_e$. Policies that order the players and assign strict priorities to them result in delay $d_e(f_e^{>i} + f_i) - d_e(f_e^{>i})$ for player $i$, where $f_e^{>i}$ denotes the total flow of players with priority higher than $i$ on edge $e$. This follows from the fact that players with lower priority do not have any impact on the aggregate delay of players with higher priority. Hence, the aggregate delay of players up to and including $i$ is $d_e(f_e^{>i} + f_i)$ (i.e., as if they are the only ones on the link) and the aggregate delay of players up to and excluding $i$ is $d_e(f_e^{>i})$. It then follows that $i$ is experiencing the difference between these two aggregate delays. Randomizations over such policies result in delays corresponding to weighted Shapley values [21]:
Definition 3: A weighted Shapley value is defined by a vector $\gamma$ of sampling weights, one per player. Consider a given link and the set of players $S$ who route through it. We construct a random ordering $\pi$ of the players as follows: The player who goes last is picked with probability proportional to the sampling weights, i.e., $i$ has probability $\gamma_i / \sum_{j \in S} \gamma_j$ to go last. The process is repeated with the penultimate player being selected among the remaining ones in the same fashion, etc. Suppose now the players are placed on the link one by one according to $\pi$. The weighted Shapley value of $i$ is the expected increase she causes to the aggregate delay at the time of her placement, over all random orderings $\pi$.
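The policies discussed above can be sketched in a few lines of Python. The sketch assumes the aggregate M/M/1 delay f/(c - f) as the link delay function and flows strictly below capacity; the function names are illustrative, and `sample_weighted_order` is only one possible reading of the ordering procedure in Definition 3.

```python
import random


def link_delay(flow, capacity):
    """Aggregate M/M/1 delay f / (c - f); assumes flow < capacity."""
    return flow / (capacity - flow)


def fifo_share(f_i, f_e, capacity):
    """FIFO: the player's delay is proportional to her share of the link flow."""
    return link_delay(f_e, capacity) * f_i / f_e


def priority_share(f_i, higher_flow, capacity):
    """Strict priorities: the increase caused on top of higher-priority flow."""
    return link_delay(higher_flow + f_i, capacity) - link_delay(higher_flow, capacity)


def sample_weighted_order(weights, rng):
    """Build an ordering back to front: each next-to-last slot is drawn with
    probability proportional to the sampling weights of the remaining players."""
    remaining = dict(weights)
    back_to_front = []
    while remaining:
        players = list(remaining)
        pick = rng.choices(players, weights=[remaining[p] for p in players])[0]
        back_to_front.append(pick)
        del remaining[pick]
    return back_to_front[::-1]


order = sample_weighted_order({"a": 1.0, "b": 2.0, "c": 3.0}, random.Random(0))
```

Under both FIFO and strict priorities the per-player delays add up to the aggregate delay of the link; for example, with capacity 10 and flows 2 and 3, FIFO gives 0.4 and 0.6, while prioritizing the first player gives 0.25 and 0.75.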
A slight generalization of weighted Shapley values, where there are strict orderings between subsets of players, has been shown in [28] to be the unique class of functions that always induce games with a NE. In this paper we consider a special case among these policies, the (unweighted) Shapley value, which uniformly randomizes among orderings of the players (i.e., all sampling weights are equal to 1). From an implementation point of view, this corresponds to uniformly sampling a player from the queue at each step and forwarding her next available packet. The expression that gives the mean delay of a player in this setting is

$$d_i^e(P) = \mathbb{E}\left[ d_e(F_e^{>i} + f_i) - d_e(F_e^{>i}) \right], \tag{6}$$

where $F_e^{>i}$ is a random variable equal to the flow of players that come before $i$ in a uniformly random ordering of the players. In [22] it is shown that the Shapley value is the method that minimizes the POA among all methods that always induce games with NE in a general model that includes our framework. This suggests that this player sampling process should minimize the POA in our setting.
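For a handful of players on a link, the unweighted Shapley value can be computed exactly by averaging each player's marginal delay over all orderings. The Python sketch below does this by brute force, assuming the aggregate M/M/1 delay f/(c - f) and a total flow below the link capacity; `shapley_shares` is an illustrative name, and the enumeration grows factorially with the number of players.

```python
from itertools import permutations
from math import factorial


def link_delay(flow, capacity):
    """Aggregate M/M/1 delay f / (c - f); assumes flow < capacity."""
    return flow / (capacity - flow)


def shapley_shares(flows, capacity):
    """Average marginal delay of each player over all orderings of the players."""
    players = list(flows)
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        ahead = 0.0  # flow of the players placed before the current one
        for p in order:
            shares[p] += link_delay(ahead + flows[p], capacity) - link_delay(ahead, capacity)
            ahead += flows[p]
    n_orders = factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}


shares = shapley_shares({"a": 2.0, "b": 3.0}, 10.0)
```

With capacity 10 and flows 2 and 3, player a is charged (1/4 + 4/7)/2 = 23/56 and player b is charged (3/4 + 3/7)/2 = 33/56, which together equal the aggregate delay 5/(10 - 5) = 1.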

III. THEORETICAL ANALYSIS
In this section, we derive the POA in the backhaul routing game with Shapley values and M/M/1 queueing delays. Our analysis will follow the smoothness framework of [29]. We begin with the following proposition, which we use in the proof of Theorem 1.
Proposition 1: Any $\lambda \ge 0$ and $\mu \in (0, 1)$ that satisfy condition (7) yield a $\lambda/(1 - \mu)$ upper bound on the POA.
The proof of the proposition is included in the appendix. Taking a closer look at (7), we observe that setting $c = 1$ is without loss of generality and corresponds to merely changing the units of measurement. We may then rewrite the expression in the proposition to get (8). First, we observe that under worst case conditions (which, as we explain later in the paragraph, are unrealistic) the POA can be large. We can see that as follows: By setting $x = y = 1/2 - \epsilon$, for $\epsilon$ very small, we get $\lambda + \mu \ge (1 + 1/(2\epsilon))/2$. This implies that, as $\epsilon \to 0$, $\lambda$ must be arbitrarily large for (8) to hold, and hence, the bound on the POA is infinite. Moreover, [30] proves that in a setting more general than ours, the smoothness bound is tight and a matching lower bound exists. This means that not only is our upper bound infinite, but the POA itself is infinite and, hence, a NE can be arbitrarily bad when compared against the optimal solution. In light of this negative result, it is interesting to consider more restricted and practically relevant cases, for which the POA bounds are better. It is tempting to consider, for instance, specific network topologies for which the POA is smaller. However, the work in [31] shows that worst case instances for the POA tend to arise in the simplest networks of parallel links. Still, we are able to give improved bounds under other assumptions of practical interest, such as when the packet demand is not large enough to create unstable queues for the network links. Specifically, we now prove that in case the link capacities are even slightly underutilized, then the POA is bounded by a constant. Moreover, as the ratio between the maximum utilization and the capacities of links becomes smaller, so does the POA. The proof of the following theorem is included in the appendix.
Theorem 1: Assuming that the total flows that can be routed through any given link are equal to at most a $\beta < 1$ fraction of its capacity, the POA of Shapley routing games with M/M/1 queues is bounded by a constant that depends on $\beta$.
This theorem proves a second positive theoretical result with regard to the use of the Shapley value in our setting: not only does it guarantee the existence of a stable solution, but it also gives a constant upper bound on the inefficiency of the NE under our realistic restricting assumption. Given this fact, in the upcoming sections we design a system that utilizes the Shapley value and we conduct extensive evaluations to understand how our result carries over to practical settings.

IV. THE PROPOSED ALGORITHM
Motivated by the positive results of our theoretical analysis on the worst case POA of Shapley sharing in realistic scenarios, we proceed to design an algorithm and a packet forwarding policy that implements these costs. The proposed algorithm can be interpreted both as a mechanism to handle selfish behaviours and lead them to favourable outcomes, i.e., low delay, as well as an algorithm to compute an efficient routing solution by means of utilizing tools from game theory. In this section we first give a description of the specifics of our routing algorithm and forwarding policy, we then present our method for computing a benchmark that lower bounds the optimal solution for comparison purposes, and finally we describe how we practically perform node activation and respond to link breaking.

A. Modelling of Routing Decisions
As explained in Section III, each AP i will route her flow on a path that will minimize her total delay, which implies she will pick a shortest path on the graph where the edge costs are defined as the delay suffered by i's packets given the routing decisions of all other APs. To model AP routing decisions, we follow the sequence of decisions that is defined by best response dynamics: we start APs in an arbitrary configuration and we iterate over all APs checking if the AP is in best response, i.e., if she uses the minimum cost path. To do that, we compute the M/M/1 delay that the AP would suffer on each edge and then perform a Dijkstra shortest path computation using these numbers as the edge costs. If the AP is not using her best response path, we update to the path we computed for her. We keep iterating until the network reaches a steady state, which corresponds to a NE of the game. In such a steady state, no AP is willing to move to a different path. The details are given in Algorithm 1. Function ShapleyShare(i, e, S_e) in Algorithm 1 outputs the Shapley value of player i when she uses edge e together with APs S_e. Recall that the Shapley value is given by the expression in (6) and depends on the player set and the link capacity, through the delay function (5). Function Dijkstra(G, l, t_i) gives the shortest path to node t_i in graph G with edge costs given by vector l. It should be noted that best response dynamics are guaranteed to terminate with a NE for certain scheduling policies such as the Shapley value, but not necessarily for others such as FIFO [32].
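The best response dynamics described above can be sketched in Python as follows. This is an illustrative sketch and not the implementation used in the paper: it assumes that the joint cost of a link with load w and capacity c is the aggregate M/M/1 delay w/(c − w), that the Shapley share is the average marginal cost over all orderings of the link's users, and that exhaustive enumeration of orderings is affordable (small user sets). The names `joint_delay`, `shapley_share`, `dijkstra` and `best_response_dynamics`, the `max_rounds` guard, and the toy network are all hypothetical.

```python
import heapq
from itertools import permutations
from math import inf


def joint_delay(load, cap):
    """Aggregate M/M/1 delay on a link (assumed form: load / (cap - load))."""
    return load / (cap - load) if load < cap else inf


def shapley_share(i, users, flow, cap):
    """Average marginal delay of player i over all orderings of `users`."""
    orders = list(permutations(users))
    total = 0.0
    for order in orders:
        before = sum(flow[j] for j in order[:order.index(i)])
        if before + flow[i] >= cap:
            return inf  # some ordering would overload the link
        total += joint_delay(before + flow[i], cap) - joint_delay(before, cap)
    return total / len(orders)


def dijkstra(adj, cost, src, dst):
    """Cheapest src -> dst path as a list of directed edges, or None."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue  # stale heap entry
        for v in adj.get(u, ()):
            nd = d + cost[(u, v)]
            if nd < dist.get(v, inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in prev:
        return None
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


def best_response_dynamics(adj, cap, flow, src, dst, paths, max_rounds=100):
    """Move players to their cheapest path until nobody wants to deviate."""
    users = {e: set() for e in cap}
    for i, p in paths.items():
        for e in p:
            users[e].add(i)
    for _ in range(max_rounds):
        moved = False
        for i in paths:
            # Cost player i would pay on every edge, given the other players.
            cost = {e: shapley_share(i, users[e] | {i}, flow, cap[e]) for e in cap}
            p = dijkstra(adj, cost, src[i], dst[i])
            if p is None:
                continue
            if sum(cost[e] for e in p) < sum(cost[e] for e in paths[i]) - 1e-12:
                for e in paths[i]:
                    users[e].discard(i)
                for e in p:
                    users[e].add(i)
                paths[i] = p
                moved = True
        if not moved:
            break  # steady state: every player is in best response
    return paths


# Toy instance: two players, two parallel two-hop routes from s to t.
adj = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
cap = {("s", "a"): 10.0, ("a", "t"): 10.0, ("s", "b"): 10.0, ("b", "t"): 10.0}
flow = {1: 3.0, 2: 3.0}
via_a = [("s", "a"), ("a", "t")]
result = best_response_dynamics(adj, cap, flow, {1: "s", 2: "s"}, {1: "t", 2: "t"},
                                {1: list(via_a), 2: list(via_a)})
print(result)
```

In the toy instance both players start on the route through node a; the first player to be examined moves to the empty route through b, after which neither player can reduce her cost and the dynamics stop.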

B. Calculation of the Optimal Solution
We note that the calculation of the POA requires knowledge of the value of the optimal solution to the routing problem. However, the problem of finding this optimal solution is NP-hard, since the decision version of the problem (i.e., simply finding a set of routes that respect the edge capacities) is the single source unsplittable flow problem [33]. To circumvent this, we calculate a lower bound on the value of the optimal solution by solving a relaxed version of the problem, in which APs can fractionally split their flows across different paths. To confirm that this is a relaxation, note that any solution of the standard unsplittable flow version is a solution to this splittable version as well. This problem can be solved by a simple convex program, which minimizes the delay objective subject to the constraints C1 to C6 described below. We then use this lower bound as the value of the optimal solution in the definition of the POA.
Here $d_e(f_e)$ is the (convex) delay function on edge e, ε is an arbitrarily small parameter, and the variables are $f_e$, for e ∈ E, and $f^i_e$, for i ∈ N, e ∈ E.
• Constraint C1 requires the total size of all flows of an AP exiting a node to equal the total size of the flows of the AP that have entered the node. This constraint holds for all the inner nodes of a flow, i.e., neither the source nor the destination.
• Constraint C2 requires the sum of all flow sizes that exit the source node of AP i to equal the total size of the flow of AP i.
• Similarly, according to constraint C3, the sum of all flows that enter the destination node of AP i must be equal to the total size of the flow of that AP.
• Constraint C4 dictates that the total size of flow f through edge e is the sum of the flows of all APs over that edge.
• According to constraint C5, the size of the flow f through edge e should not exceed the capacity of the edge.
• Constraint C6 requires all flows to have a non-negative size.
Note that any solution where APs do not split their flows across paths can be encoded as a solution of the convex program above; hence, the optimal value obtained is a lower bound on the actual optimum, and any impact on the POA values we report is on the pessimistic side.
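For orientation, the verbal description of constraints C1 to C6 can be written out as the following sketch of a convex program. The exact form is an assumption: the objective is taken to be the sum of the edge delay functions, ε is assumed to enter through the capacity constraint C5, and the symbols $\delta^{+}(v)$, $\delta^{-}(v)$ (edges leaving and entering node v), $s_i$ (source of AP i) and $r_i$ (total flow size of AP i) are notation introduced here for illustration.

```latex
\begin{align*}
\min_{f}\quad & \sum_{e \in E} d_e(f_e) \\
\text{s.t.}\quad
& C_1:\ \sum_{e \in \delta^{+}(v)} f^i_e = \sum_{e \in \delta^{-}(v)} f^i_e,
  && \forall i \in N,\ \forall v \in V \setminus \{s_i, t_i\} \\
& C_2:\ \sum_{e \in \delta^{+}(s_i)} f^i_e = r_i, && \forall i \in N \\
& C_3:\ \sum_{e \in \delta^{-}(t_i)} f^i_e = r_i, && \forall i \in N \\
& C_4:\ f_e = \sum_{i \in N} f^i_e, && \forall e \in E \\
& C_5:\ f_e \le c_e - \epsilon, && \forall e \in E \\
& C_6:\ f^i_e \ge 0, && \forall i \in N,\ \forall e \in E
\end{align*}
```

All constraints are linear in the flow variables, so the program is convex whenever each $d_e$ is convex on the interval allowed by C5.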

C. Decision on Edge Activation and Response to Link Breaking
We finally note that in a wireless setting the selection of paths is not the only factor that determines which edges will be transmitting at any given time. Instead, we also need to consider RF constraints. One might attempt to use sophisticated techniques for edge activation; however, we consider this aspect to be outside the scope of our paper. In our simulations we use the edge activation approach presented in [12], which decides on the order of activation among a set of maximal matchings. Indicative maximal matchings are shown in Fig. 2.
In order to respond to cases of link breaking, e.g., due to unfavorable weather conditions, the proposed system takes the following actions throughout the duration of a link being broken.
1) The capacity of the broken link is considered to be zero, while its routing cost is considered to be infinite.
2) The routing costs of all the remaining links are recalculated based on the respective packet forwarding policy and the Dijkstra algorithm is used to calculate updated shortest paths.
3) When a maximal matching that contains the broken link is selected, this link is ignored and only the remaining, non-broken, links are activated.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
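The three link-breaking actions can be sketched as follows. This is a minimal illustration under stated assumptions: links are represented as hashable identifiers, capacities and routing costs are stored in dictionaries, and the recalculation of costs and shortest paths in step 2 is left to whatever routing routine is in use. The function names are hypothetical.

```python
from math import inf


def apply_link_break(capacity, cost, broken):
    """Steps 1 and 2 (partially): zero capacity and infinite routing cost for broken links.

    Returns new dictionaries so the original state can be restored once
    the link recovers. Shortest paths must then be recomputed on `new_cost`.
    """
    new_capacity = {e: (0.0 if e in broken else c) for e, c in capacity.items()}
    new_cost = {e: (inf if e in broken else c) for e, c in cost.items()}
    return new_capacity, new_cost


def active_links(matching, broken):
    """Step 3: activate only the non-broken links of the selected matching."""
    return [e for e in matching if e not in broken]


if __name__ == "__main__":
    capacity = {("A", "B"): 1.0e9, ("C", "D"): 1.0e9}
    cost = {("A", "B"): 0.2, ("C", "D"): 0.3}
    broken = {("A", "B")}
    print(apply_link_break(capacity, cost, broken))
    print(active_links([("A", "B"), ("C", "D")], broken))
```

Because a broken link carries an infinite cost, any shortest path computation that treats costs additively will avoid it whenever an alternative route exists.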

D. Scalability and Complexity Analysis
In this work we view and evaluate our system from two perspectives: i) as a system that handles selfish behavior by the APs and reduces the POA, and ii) as a system that uses principles from game theory to produce and implement an algorithm that schedules and routes packets in a network. In the former sense, our system can scale to arbitrary network sizes with low complexity. The packet forwarding policy is distributed and only depends on the local state on an edge in the network. Implementing the Shapley value cost sharing method on a link simply amounts to periodically picking a random ordering of the link's users and forwarding packets using priorities induced by this order. Each user's expected cost under this implementation, which is clearly efficient, matches the definition of the Shapley value in Section III. In the latter sense, and given that we use best response dynamics, we observe from related literature that finding the equilibrium solution can take an exponential number of steps [34]; however, special cases can be shown to converge quickly [34]. These special cases include networks with a single source and destination, networks with linear delay functions, or a relaxation of the NE concept to an approximate version where the dynamics only move players who have a cost improvement larger than a given threshold. Of special interest in our model is the result of [29], which leverages [34] to show that best response dynamics quickly converge to a solution with cost very close to the actual equilibrium. The condition for this is that the game satisfies (λ, μ) smoothness, something that we show is true for our model in Proposition 1.
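The claim that random orderings reproduce the Shapley value in expectation can be checked numerically with the short sketch below. It assumes that the joint cost of a link is the aggregate M/M/1 delay w/(c − w) and that a user's cost under a given priority order is her marginal contribution to that joint cost; both are modelling assumptions made here for illustration, and all function names and numbers are hypothetical.

```python
import random
from itertools import permutations


def joint_delay(load, cap):
    """Aggregate M/M/1 delay (assumed joint cost); requires load < cap."""
    return load / (cap - load)


def marginal(i, order, flow, cap):
    """Extra delay caused by user i given the users served before her."""
    before = sum(flow[j] for j in order[:order.index(i)])
    return joint_delay(before + flow[i], cap) - joint_delay(before, cap)


def exact_share(i, users, flow, cap):
    """Shapley share: average marginal cost over all orderings."""
    orders = list(permutations(users))
    return sum(marginal(i, o, flow, cap) for o in orders) / len(orders)


def sampled_share(i, users, flow, cap, rounds, rng):
    """Estimate the share by periodically drawing a fresh random priority order."""
    order = sorted(users)
    total = 0.0
    for _ in range(rounds):
        rng.shuffle(order)
        total += marginal(i, order, flow, cap)
    return total / rounds


if __name__ == "__main__":
    flow = {"a": 2.0, "b": 3.0, "c": 1.0}
    users = list(flow)
    rng = random.Random(0)
    for u in users:
        print(u, exact_share(u, users, flow, 10.0),
              sampled_share(u, users, flow, 10.0, 20000, rng))
```

The exact shares add up to the joint delay of the link (budget balance), and the sampled estimate only needs local information, namely the set of users currently on the link.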

V. PERFORMANCE EVALUATION
To evaluate the performance of the proposed selfish algorithm for mmWave backhaul networks, we use a simulation model built in MATLAB. The performance of the system employing the proposed algorithm is compared against the system proposed in [12] that uses the FIFO queueing policy, as well as a local search optimization version of best response dynamics, where each AP will move to the path that minimizes the overall delay in the network, as opposed to just her own delay. The algorithm that uses FIFO best response dynamics is the same as Algorithm 1, with the only difference being that the routing costs are calculated as proportional shares of the aggregate M/M/1 delay, i.e., $w \leftarrow \sum_{j \in S_e \cup \{i\}} f_j$, $l_e \leftarrow \frac{f_i}{w} \cdot \frac{w}{c_e - w}$. In the local search algorithm, the routing costs are calculated as the aggregate M/M/1 delay, i.e., $w \leftarrow \sum_{j \in S_e \cup \{i\}} f_j$, $l_e \leftarrow \frac{w}{c_e - w}$. The system model consists of a number of APs that are connected to each other via point-to-point mmWave links. An increasing number of UE devices is associated with each AP.
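The two baseline edge costs can be written as small functions that could replace the Shapley share in a best response loop. The sketch below follows the update rules given in the text; the function names and the guard that returns infinity on an overloaded link are assumptions added for illustration.

```python
from math import inf


def total_load(i, users, flow):
    """w: total flow on the edge once player i joins the current users."""
    return sum(flow[j] for j in set(users) | {i})


def fifo_cost(i, users, flow, cap):
    """Proportional share of the aggregate M/M/1 delay (FIFO baseline)."""
    w = total_load(i, users, flow)
    if w >= cap:
        return inf  # assumed convention for an unstable queue
    return (flow[i] / w) * (w / (cap - w))


def local_search_cost(i, users, flow, cap):
    """Aggregate M/M/1 delay of the edge (local search baseline)."""
    w = total_load(i, users, flow)
    if w >= cap:
        return inf  # assumed convention for an unstable queue
    return w / (cap - w)


if __name__ == "__main__":
    flow = {1: 3.0, 2: 3.0}
    print(fifo_cost(1, {2}, flow, 10.0), local_search_cost(1, {2}, flow, 10.0))
```

Note that the FIFO cost simplifies to f_i/(c_e − w), so it depends on the other users only through the total load of the edge.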
The following individual sub-systems constitute the system model: The channel model simulates the physical layer channel conditions of the mmWave backhaul links by providing path loss, shadowing, and short-term fading, and allows for the calculation of the average and instantaneous capacities of each backhaul link. Path loss between APs is $10 n \log_{10}(d)$, where n = 2 is the path loss exponent and d is the distance between the APs in m. The shadowing is log-normal with a standard deviation σ = 7.8 dB [35]. The link budget also considers transmitter and receiver antenna gains, the thermal noise power spectral density (PSD), and the system bandwidth, whose values are summarized in Table I. The transmitter antenna power is 30 dBm [36]. The received power, transmitter and receiver antenna gains and link capacities are calculated based on the interference model described in Section II.
The traffic generator is responsible for producing the data packets of the UE devices associated with each AP. The packet generation of each UE follows the Poisson distribution, with a variable rate $\lambda_p$ in the range of [25, 35] packets/s. The packet size is 1250 bytes. The packets generated are enqueued by the respective AP, and forwarded towards the final destination according to the link scheduling algorithm employed by each of the systems under comparison.
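As an illustration of the traffic generator, the sketch below draws the packet arrival times of a single UE from a Poisson process and computes the mean offered load of an AP. It is written in Python rather than the MATLAB used for the simulations, and the function names are hypothetical; only the rate range and the 1250-byte packet size come from the text.

```python
import random

PACKET_SIZE_BITS = 1250 * 8  # packet size of 1250 bytes


def packet_arrivals(rate, duration, rng):
    """Arrival times (s) of a Poisson process with `rate` packets/s over `duration` s."""
    times = []
    t = rng.expovariate(rate)  # exponential inter-arrival times
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def offered_load_bps(num_ue, rate):
    """Mean traffic (bit/s) that an AP with `num_ue` UEs injects into the backhaul."""
    return num_ue * rate * PACKET_SIZE_BITS


if __name__ == "__main__":
    rng = random.Random(1)
    rate = rng.uniform(25.0, 35.0)  # lambda_p drawn from [25, 35] packets/s
    arrivals = packet_arrivals(rate, 100.0, rng)
    print(len(arrivals), offered_load_bps(15, rate))
```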
The link scheduler is responsible for the proposed backhaul link scheduling algorithm, as well as the [12] (FIFO) and local optimal algorithms, whose performance is under comparison.
The channel bandwidth is 75 MHz, while the spectrum band considered is the E-band (71-76 GHz). The proposed algorithm operates in 1 ms slots. The link capacity is shared using Time Division Multiplexing (TDM). The small cell APs are deployed in a square area with a side of 300 m. The distance threshold beyond which two APs are considered to be out of each other's range, $d_{th}$, is 200 m [35].

TABLE I PERFORMANCE EVALUATION PARAMETERS
Three simulation scenarios are considered:
1) A constant number of APs ($N_{AP}$ = 10), each one with an increasing number of associated UE devices. The range of the number of UE devices per AP is [10, 30]. Each backhaul link will drop at a random time instance within the simulation duration. Each link will remain broken for a random duration that ranges from 0.1 s to 0.5 s.
2) The number of APs increases, while the number of UE devices per AP is equal to 15. The link breaking pattern is similar to that of the first scenario.
3) 10 APs and 15 UE devices per AP. Each link remains broken for an increasing duration that ranges from [0.1 s, 0.5 s] to [2 s, 2.5 s].
The total simulation time is 100 s. In order to achieve statistical accuracy, the results have been averaged over 55 simulation runs. In each case, the 95% Confidence Intervals (CI) are depicted in the form of error bars. The systems' performance is evaluated in terms of average and worst delay, as well as of the POA in the case of average delay.
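The averaging over simulation runs and the error bars can be reproduced along the following lines. The paper does not state how the 95% confidence intervals were computed, so this sketch assumes the common normal approximation (mean ± 1.96 standard errors); the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(samples):
    """Return (mean, half-width) of a 95% CI using the normal approximation."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    half_width = 1.96 * stdev(samples) / sqrt(len(samples))
    return mean(samples), half_width


if __name__ == "__main__":
    # Hypothetical per-run average delays (s), for illustration only.
    runs = [0.21, 0.25, 0.19, 0.30, 0.22]
    m, h = mean_ci95(runs)
    print(m - h, m, m + h)
```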
The performance evaluation parameters are summarized in Table I. Fig. 3 depicts the average end-to-end delay with respect to (a) an increasing number of UE devices per AP, (b) an increasing number of APs, and (c) an increasing link breaking duration. The average delay is defined as $\frac{1}{|K|} \sum_{k \in K} (t^{arrived}_k - t^{sent}_k)$, where $t^{sent}_k$ is the time that packet k was sent, $t^{arrived}_k$ is the time it was received by its destination, and K is the set of all packets that have successfully reached their destination. As seen in Fig. 3(a), the average delay of the system employing the proposed Shapley based solution is considerably lower than that of the [12] (FIFO) and local optimal systems. Specifically, the proposed system reduces the average delay by a factor of 1.58. The fact that the local search global optimization version performs worse than the Shapley value, under which APs optimize their own self-interests, might seem surprising at first. However, this observation is also supported by previous theory [7], where it is shown that when the cost functions of a resource selection game are convex, distributing the generated cost among the participants using the Shapley value leads to the best POA. This holds among all cost-sharing methods possessing the following properties: budget-balance (cost shares exactly covering the joint cost), stability (ability to reach a NE), and locality (cost-sharing on a resource does not depend on the system's state beyond that resource). Note that any packet forwarding policy by definition shares the aggregate delay among the users in a budget-balanced and local manner, while stability is a property that we desire and is given by the Shapley value. Our theoretical model in Section II is the special case of the one studied in [7] that fits the setting of mmWave networks. In this regard, our experimental results validate the theoretical optimality of the Shapley value as a local cost distribution policy (which in our case is instantiated as a packet queueing policy).
Moreover, as shown in Fig. 3(b), the average delay of all three systems increases with the number of APs. This is a result of the congestion in the network, as new APs come with their own sets of users. The system employing the proposed algorithm achieves a considerably lower average delay, by a factor of 1.64 at most, compared to the [12] (FIFO) and local optimal systems. The effect of the increasing duration of connectivity interruptions due to broken links on the average delay is shown in Fig. 3(c). The average delay of the proposed system remains considerably lower (by a factor of 6.43 at most) than the average delay of the [12] (FIFO) and local optimal systems for the vast majority of the cases.
As can be seen, the [12] (FIFO) and local optimal systems have a comparable performance in terms of average delay. This is actually expected once we inspect the expressions that give the edge costs used in the routing algorithms employed by these two systems. Consider a user with rate $r_i$ and a link e with capacity $c_e$ and total flow $f_e$. The edge cost used for that user with FIFO is given by $l^{FIFO}_{i,e} = \frac{r_i}{c_e - f_e}$, whereas the edge cost used in the local optimal system is given by $l^{local}_{i,e} = \frac{f_e}{c_e - f_e} - \frac{f_e - r_i}{c_e - (f_e - r_i)}$. Simple calculations show that the relative difference between the two is a very small number when $c_e$ is much larger than $f_e$, which is the case in our simulations and in realistic systems. This implies that the routing decisions of the two systems are the same most of the time.
It has to be noted that, for reasons of computational complexity, and in order to show the performance improvement potential of the proposed Shapley based algorithm under high congestion, we used a relatively small system bandwidth, i.e., 75 MHz. This is the reason why in all systems the average delay shows a considerable increase with the increase of the incoming traffic or the size of the network.
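The "simple calculations" behind the similarity of the FIFO and local optimal edge costs can be sketched as follows, using only the two cost expressions given in the text. The choice of normalization for the relative difference is an assumption; both variants are shown.

```latex
\begin{align*}
l^{local}_{i,e}
  &= \frac{f_e}{c_e - f_e} - \frac{f_e - r_i}{c_e - f_e + r_i}
   = \frac{f_e (c_e - f_e + r_i) - (f_e - r_i)(c_e - f_e)}{(c_e - f_e)(c_e - f_e + r_i)}
   = \frac{r_i\, c_e}{(c_e - f_e)(c_e - f_e + r_i)}, \\[4pt]
\frac{l^{local}_{i,e}}{l^{FIFO}_{i,e}}
  &= \frac{c_e}{c_e - f_e + r_i}, \\[4pt]
\frac{l^{local}_{i,e} - l^{FIFO}_{i,e}}{l^{local}_{i,e}}
  &= \frac{f_e - r_i}{c_e},
\qquad
\frac{l^{local}_{i,e} - l^{FIFO}_{i,e}}{l^{FIFO}_{i,e}}
   = \frac{f_e - r_i}{c_e - f_e + r_i}.
\end{align*}
```

Either way, the relative difference vanishes as $f_e / c_e \to 0$, so the two cost vectors rank paths almost identically on lightly loaded links.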
Moreover, by the nature of the problem and depending on the network topology, the average delay of the instance might vary significantly. For this reason we observe wide confidence intervals in the performance evaluations. In order to make sure that the proposed solution clearly outperforms the baselines, we also present indicative examples of the distributions of average delays induced by the different algorithms by means of their empirical cumulative distribution functions (CDFs). In Fig. 4, the CDF plots show that the proposed algorithm dominates the baselines in the sense that for (almost) every average delay X, the fraction of instances with average delay at most X is higher for the system that employs the proposed algorithm, compared with the [12] (FIFO) and local optimal systems.
Fig. 5 depicts the worst end-to-end delay with respect to (a) an increasing number of UE devices per AP, (b) the number of APs and (c) an increasing link breaking duration. The worst delay is defined as $\max_{k \in K} \{t^{arrived}_k - t^{sent}_k\}$. Similarly to the case of the average delay, the worst delay of the system employing the proposed solution is significantly lower compared to the [12] (FIFO) and local optimal systems in all three cases. Specifically, the proposed system reduces the worst delay by a factor of 2.78, see Fig. 5(a), 1.46, see Fig. 5(b), and 1.69, see Fig. 5(c), as a result of the Shapley value's capability to achieve improved system performance in terms of the total user utility. Similarly to the case of the average delay, this figure also depicts the similar performance of the [12] (FIFO) and local optimal systems, as a result of the similarity between the edge costs used for routing in the two systems established earlier.
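The average delay, the worst delay and the empirical CDF used in Fig. 3 to Fig. 5 can be computed from per-packet timestamps as in the sketch below. The data layout (dictionaries keyed by packet identifier, with only delivered packets present in `arrived`) and the function names are assumptions made for illustration.

```python
def delay_metrics(sent, arrived):
    """Average and worst end-to-end delay over the delivered packets K."""
    delays = [arrived[k] - sent[k] for k in arrived]  # K = keys of `arrived`
    if not delays:
        raise ValueError("no packet reached its destination")
    return sum(delays) / len(delays), max(delays)


def empirical_cdf(values, x):
    """Fraction of instances whose value is at most x."""
    return sum(v <= x for v in values) / len(values)


if __name__ == "__main__":
    sent = {1: 0.0, 2: 1.0, 3: 2.0}   # packet 3 is never delivered
    arrived = {1: 0.5, 2: 2.5}
    print(delay_metrics(sent, arrived))
    print(empirical_cdf([1.0, 2.0, 3.0, 4.0], 2.5))
```

One algorithm dominates another in the sense used above when its empirical CDF lies above the other's at (almost) every delay value.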
Fig. 6 depicts the POA of the average delay with respect to (a) an increasing number of UE devices per AP, (b) an increasing number of APs and (c) an increasing link breaking duration. The POA in this case is defined as the ratio of the average delay over the optimal delay, i.e., the minimum delay among all flows. Therefore, a high value of POA indicates very poor performance in terms of average delay. The optimal delay is approximated by calculating a lower bound on its value by solving a relaxed version of the problem, in which users can fractionally split their flows across different paths. The POA of the three systems under comparison is normalised by the POA of the proposed Shapley based system to exhibit the relative improvement induced by our method without having larger delay instances dominate the metrics. Under this normalization, our algorithm will have a normalized POA equal to 1 and the worse-performing baselines will have a larger normalized POA. As shown, the system employing the proposed selfish link scheduling algorithm has a lower POA, by approximately a factor of 27.79, see Fig. 6(a), 3.32, see Fig. 6(b), and 32.06, see Fig. 6(c), compared to the [12] (FIFO) and local optimal systems. The normalized POA of the [12] (FIFO) and local optimal systems declines with the increase in the number of APs and the average duration of link failures, because the increased congestion also increases the POA of the proposed system, which reduces the gap between the POA of the systems under comparison. Therefore, it is shown that the proposed algorithm has the ability to handle selfish behavior, and its consequent inefficiencies, in an appropriate manner, resulting in improved performance in terms of average delay compared to the [12] (FIFO) and local optimal systems.
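The normalization of the POA can be sketched as follows for a single instance. The function names, the system labels and the delay values are hypothetical; the paper additionally averages over simulation runs, which is not reproduced here.

```python
def price_of_anarchy(avg_delay, optimal_delay):
    """POA of one system: its average delay over the (lower bound on the) optimal delay."""
    return avg_delay / optimal_delay


def normalised_poa(avg_delays, optimal_delay, reference="shapley"):
    """Divide every system's POA by the POA of the reference system."""
    poa = {name: price_of_anarchy(d, optimal_delay) for name, d in avg_delays.items()}
    return {name: value / poa[reference] for name, value in poa.items()}


if __name__ == "__main__":
    # Hypothetical average delays (s) on one instance, for illustration only.
    delays = {"shapley": 2.0, "fifo": 3.0, "local": 3.2}
    print(normalised_poa(delays, optimal_delay=1.0))
```

By construction the reference system has a normalised POA of 1, and on a single instance the optimal delay cancels out, so the normalised value equals the ratio of the average delays.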
As discussed above, calculating the optimal delay is an NP-hard problem; therefore, we introduced the convex optimization program in Section IV-B to calculate a lower bound on the value of the optimal solution. However, for a very small backhaul network instance we also conducted a number of experiments to calculate the actual optimal delay (and not its lower bound) and provide the respective POA for the three systems under comparison. Fig. 7 depicts the (a) average delay, (b) worst delay and (c) normalised actual POA versus the number of UE devices per small cell AP, in a backhaul network consisting of 5 APs. The results have been averaged over 15 simulation runs. Due to the small size of the backhaul network, the average and worst delays are quite small for all three systems, but it has to be noted that the system employing the proposed Shapley based link scheduling algorithm still outperforms the [12] (FIFO) and local optimal systems in terms of average and worst delay as well as POA.
All the experiments conducted above consider static UEs. Since our proposed model considers the APs as the players, the movement of a UE is only expected to impact the overall model through the changes that it induces to the packet rate of the corresponding APs, i.e., the one that the user joins and the one that the user abandons. Once the new packet rates have been established, the proposed algorithm runs as previously and reaches a steady state. In this regard, the mobility of users in the proposed model produces slightly modified versions of the original instances. In our experimental framework, the packet generation rate $\lambda_p$ is drawn uniformly in [25, 35] packets/s per UE and then multiplied by the number of UE devices. Introducing mobility in effect only modifies this distribution. Under simple mobility models the distribution will remain centered at the same mean but will have a (slightly) larger window of extreme values, resulting in a similar qualitative message of the results. To demonstrate this, we conducted further experiments under the following assumptions: the backhaul network consists of $N_{AP}$ = 10 APs, a range of [10, 30] UE devices per AP, and a packet generation rate per UE $\lambda_p$ ranging in [20, 40] packets/s. The results have been averaged over 39 simulation runs. Fig. 8 depicts the respective (a) average delay, (b) worst delay and (c) normalised POA versus the number of UE devices per small cell AP, where we note the performance improvement of the Shapley based algorithm, which handles the larger variation of the packet generation rate more efficiently.

VI. CONCLUSION
In this paper we presented and evaluated the performance of a routing and packet forwarding algorithm for mmWave backhaul networks. The proposed algorithm is robust to selfish behavior and its aim is to minimize the total delay in the backhaul network. We modelled APs as being selfish, i.e., selecting their routes with the aim to minimize their individual delay, rather than considering the social objective of the network as a whole. In our setting, we used unweighted Shapley values in order to sample the players, i.e., the APs that route packets between two endpoints in the backhaul network, an approach which is shown to always result in games with a NE. We also show that the use of Shapley value minimizes the POA, which represents the inefficiency of a scheduling policy when users are allowed to have selfish behavior. According to the proposed routing algorithm and packet forwarding policy, routing decisions are modelled following the best response dynamics approach, i.e., performing iterations until all users are in their minimum cost path. We use best response dynamics with three scheduling policies: Shapley value, FIFO, and local search optimization. According to simulation results, the Shapley based scheduling policy results in the minimization of the overall network delay and the POA, compared to the systems that employ the FIFO and local search optimization policies.
Our plans for future work include methods for the enhancement of the proposed solution's scalability to large network sizes, and approaches to determine the optimal backhaul network size for the minimization of the end-to-end delay. Moreover, we plan to extend the proposed solution by lifting the restriction of a single RF chain per AP, offering the AP nodes the possibility to transmit in both directions and to more than one AP at the same time, and by employing a real-time traffic generation model that will consider specific deadlines per packet.

A. Proof of Proposition 1
Let P be a NE, P* the optimal solution, and $P^i$ the solution that we get when only i deviates from her path in P to her path in P* while everyone else stays on their paths from P. The first inequality follows from the NE condition in Definition 1. The second and third inequalities follow by convexity of the Shapley value, both as a function of the player's flow size and the flow size of any other player. The final inequality follows by (7). Rearranging the final inequality, we get the λ/(1 − μ) bound on D(P)/D(P*), which is the POA.

B. Proof of Theorem 1
We present values for λ and μ that satisfy (8), under the assumption that the corresponding x, y always satisfy x + y ≤ β: $\lambda = \frac{1 + 2(1-\beta)^2}{2(1-\beta)^2}$, $\mu = \frac{1}{2}$. By Proposition 1, it suffices to show that these values satisfy (7) or, equivalently, (8), which we rewrite as (9) after substituting the values for λ and μ.
Let β′ = x + y. Since β′ ≤ β by assumption, and since the right-hand side of (9) is increasing as a function of β, it suffices to show (10). We will consider two cases. First, we assume that x ≥ β′/(2 − β′). Then we get that 2x/(1 − x) ≥ β′/(1 − β′) and, hence, that (10) holds. For the second case, we assume the opposite, that x < β′/(2 − β′). This implies y = β′ − x > β′(1 − β′)/(2 − β′), which in turn again means that (10) holds, which completes the proof.

Fig. 3. Average delay versus (a) the number of UE devices per small cell AP, (b) the number of APs and (c) the range of breaking duration per link.

Fig. 4. Empirical CDF of the average delay in the case of (a) 10 APs and 30 UE devices per AP and (b) 25 APs and 15 UE devices per AP.

Fig. 5. Worst delay versus (a) the number of UE devices per small cell AP, (b) the number of APs and (c) the range of breaking duration per link.

Fig. 6. Price of Anarchy of the average delay versus (a) the number of UE devices per small cell AP, (b) the number of APs and (c) the range of breaking duration per link.

Fig. 7. (a) Average delay, (b) worst delay and (c) normalised actual POA versus the number of UE devices per small cell AP.

Fig. 8. (a) Average delay, (b) worst delay and (c) normalised POA versus the number of UE devices per small cell AP in the case of wider range of packet generation rate.
Algorithm 1: Shapley Best Response Dynamics
Input: Graph G = (V, E), player set N
Output: Set of Paths P
// S_e are the users of edge e and P_i the current path of user i.
repeat ← true;
while repeat do
    repeat ← false;
    for i ∈ N do
        for e ∈ E do l_e ← ShapleyShare(i, e, S_e);
        p ← Dijkstra(G, l, t_i);
        if p ≠ P_i then
            repeat ← true;
            for e ∈ P_i do S_e ← S_e \ {i};
            P_i ← p;
            for e ∈ P_i do S_e ← S_e ∪ {i};
return P