Resource Management for Maximizing the Secure Sum Rate in Dense Millimeter-Wave Networks

Due to the densification of users and the reflections caused by small-scale objects, secure communication is an important research problem in dense millimeter-wave (mmWave) networks. In this paper, we investigate the resource management problem of joint transmission reception point (TRP) selection, power control and beamwidth selection for maximizing the secure sum rate in a dense mmWave network, while considering beamforming training overheads, the blockage effect in mmWave communications, and imperfect channel state information (CSI) from each legitimate user equipment (LUE) to the corresponding malicious user equipment (MUE). To handle the problem with low complexity, we first decompose it into two subproblems (i.e., TRP selection, and joint power control and beamwidth allocation), and propose a two-stage game based decentralized resource management approach that solves them iteratively. In the first stage, we propose a matching game based distributed TRP selection algorithm for the first subproblem, in which novel utilities for both LUEs and TRPs are designed to handle the high directivity transmission problem. In the second stage, we propose a weakly acyclic game based two-dimensional-strategy concurrent better response algorithm for the second subproblem to deal with the huge strategy space, which is proved to converge to an invariant two-dimensional-strategy Nash equilibrium (NE). Moreover, to adapt to the three-dimensional optimization variables and the huge strategy space of the considered problem, we also propose a new three-dimensional-strategy iterative weakly acyclic game, where the three groups of optimization variables are optimized alternately during the decision-making process. Finally, extensive simulations are conducted to verify the effectiveness of the proposed schemes.


I. INTRODUCTION
With the popularity of various intelligent devices, conventional communication technologies struggle to meet the demand of explosively growing mobile traffic [1]-[5]. Millimeter-wave (mmWave) communication has emerged as one of the key enabling technologies to address this challenge thanks to the large amount of spectrum available at mmWave frequencies [6], [7]. Because its beams are narrow and it is susceptible to blockage by objects, mmWave communication is often considered hard to intercept and more robust against eavesdropping [8]. However, it still faces security issues for the following reasons. On the one hand, eavesdroppers in a dense mmWave network are more likely to reside in the signal beams of legitimate user equipments (LUEs), and thus can still intercept the confidential messages of LUEs [9]. On the other hand, small-scale objects within the signal beams of LUEs can cause reflections, enabling eavesdroppers outside the signal beams to receive the signals of LUEs [10]. Furthermore, the use of highly directional antennas by malicious user equipments (MUEs) provides more degrees of freedom to intercept the signals of LUEs [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Yanjiao Chen.
To enhance the security of wireless communication, security-oriented resource management in wireless networks has attracted great attention from the academic community, and many excellent works have already emerged, such as [12]-[21]. Specifically, in [12]-[15], radio resource allocation for physical layer security in device-to-device (D2D) communications was investigated to improve the system secrecy rate, where the interference between the cellular users and the D2D links is exploited to jam malicious eavesdroppers in [14], [15]. In [16], the optimal resource allocation for traffic offloading via dual connectivity was studied to guarantee the MUE's traffic offloading security. In [17], partner selection and incentive mechanisms were designed to maximize the secrecy rate in a scenario with multiple eavesdroppers. In [18], cooperative communication with multiple relays was proposed to improve the security of the wireless network, where the transmission power allocation and the intercept probability were derived. In [19], a three-stage Stackelberg game based joint relay selection and power control scheme was proposed to defend against active eavesdropping attacks in full-duplex networks. All of the above works mainly aim at improving the secrecy performance of conventional lower-frequency (LF) cellular networks, and all of them assume that full channel state information (CSI) of all transmission links is available, which is infeasible in practical scenarios. Moreover, most of them focus on the scenario with a single eavesdropper, such as [12]-[16], [18], [19]. Resource allocation for improving the security of mmWave communications was investigated in two recent works [20] and [21].
However, [20] and [21] focus only on the scenario with a single legitimate transmission link, and ignore the beam training overheads and the inter-beam interference of mmWave communications. In contrast, we concentrate on the scenario with multiple legitimate transmission links and multiple eavesdropping attack links, and account for the beam training overheads and the inter-beam interference of mmWave communications, which is more practical but also more intractable. Hence, the methods in [20] and [21] cannot be applied in our work.
Besides [8]-[11], many efforts have recently been devoted to studying the secrecy performance of mmWave networks, such as [22]-[26]. In [22], the secrecy performance of mmWave networks was analyzed systematically, where the locations of eavesdroppers and base stations were modeled as two independent homogeneous Poisson point processes using stochastic geometry. In [23], physical layer security in heterogeneous mmWave networks under pilot attack was investigated, where the coverage and secrecy performance were analyzed using stochastic geometry. In [24], a new wireless transmission architecture called switched phased-array was proposed to enhance the secrecy capacity, and simulation results revealed that the proposed scheme is a simple and efficient approach to enhancing communication security at mmWave bands. In [25], two physical layer security techniques were proposed to enhance the security of vehicular mmWave networks, both of which make full use of large mmWave antenna arrays and consider mmWave hardware constraints. In [26], the secrecy outage probability and secrecy throughput of a non-orthogonal multiple access assisted mmWave unmanned aerial vehicle (UAV) network were derived using stochastic geometry. In [27], non-orthogonal transmit beam construction schemes were proposed for massive connections over limited radio spectrum to improve the weighted sum of the ergodic rates. In [28] and [29], secrecy performance enhancement schemes for secure UAV networks with a reconfigurable intelligent surface and for multi-antenna broadcast channels were studied via joint trajectory and passive beamforming design and via cooperative rate-splitting, respectively. However, these works focus mainly on the performance analysis of secure mmWave communications rather than on radio resource management. Therefore, the design of resource management schemes for maximizing the secure sum rate in dense mmWave networks remains an open problem.
In summary, there are few works on security-oriented resource management for mmWave networks. Even though there are many studies on secure resource management for conventional LF cellular networks, these traditional resource allocation approaches cannot be directly applied to mmWave networks due to the peculiar nature of mmWave communications (e.g., sensitivity to blockage, beamforming training, and weak penetration). Accordingly, the design of a joint TRP selection and resource allocation scheme for maximizing the secure sum rate in dense mmWave networks is an urgent but much more sophisticated problem, which faces several challenges as follows. Firstly, due to the high directivity of mmWave transmissions, the interference environment in mmWave networks may change dynamically whenever an LUE changes its selected higher frequency (HF) transmission reception point (TRP), which challenges traditional TRP selection algorithms. Therefore, this high directivity transmission problem should be handled carefully by exploiting the distinct features of mmWave communications, such as high propagation loss and vulnerability to blockage. Secondly, the blockage effect is one of the distinct characteristics of mmWave communications; it is hard to predict but should be considered in the radio resource management of mmWave communications. Thirdly, since accurately monitoring the eavesdropper's CSI is infeasible in practical scenarios, especially for mmWave communications, the imperfect CSI from an LUE to its corresponding MUE should be taken into account to better adapt to the actual network environment.
Lastly, the huge action spaces of LUEs in dense mmWave networks, along with the limited number of beams of each HF TRP, pose a great challenge to the complexity and signaling overheads of centralized schemes, and hence efficient decentralized schemes with lower complexity are urgently desired.
Motivated by these challenges, in this work we focus on designing efficient decentralized solutions to the joint TRP selection and resource allocation problem for maximizing the secure sum rate in a dense mmWave network. Different from [12]-[19], which focus on resource management in LF communications, and from [20], [21], which concentrate on a single legitimate transmission link and ignore the beam training overheads, we focus on the scenario with multiple legitimate transmission links and multiple eavesdropping attack links while considering the peculiar features of mmWave communications, which is more practical but more complex. Our main contributions are summarized as follows.
• We investigate the joint TRP selection, power control and beamwidth selection problem to maximize the secure sum rate in dense mmWave networks, taking into account the beamforming training overheads, the blockage effect of mmWave communications, and the imperfect CSI from each LUE to its corresponding MUE. This is the first work to maximize the secure sum rate of dense mmWave networks while jointly considering these factors.
• To avoid high complexity, we divide the considered problem into two subproblems, i.e., the TRP selection problem and the joint power control and beamwidth allocation problem, and propose a two-stage game based decentralized resource management approach to solve them iteratively. In the first stage, we propose a matching game based distributed TRP selection algorithm that achieves a two-sided exchange-stable solution, where novel utilities for both LUEs and TRPs are designed to handle the high directivity transmission problem. In the second stage, we formulate the second subproblem as a weakly acyclic game, and propose a two-dimensional-strategy concurrent better response algorithm to deal with the huge strategy space, which is proved to converge to an invariant two-dimensional-strategy Nash equilibrium (NE).
• To adapt to the three-dimensional optimization variables and the huge strategy space of the considered problem, we propose a new three-dimensional-strategy iterative weakly acyclic game, where the three groups of optimization variables are optimized alternately during the decision-making process. We also investigate the convergence of the proposed three-dimensional-strategy iterative weakly acyclic game and the existence of its NE.
• We provide extensive simulations showing that 1) the proposed algorithms converge and are effective; 2) the secure sum rate decreases as $T_p/T_s$ increases; 3) when $T_p/T_s$ is very small, the secure sum rate decreases monotonically with the beamwidth, while when $T_p/T_s$ is large, the secure sum rate first increases and then decreases as the beamwidth broadens; and 4) accounting for the beam training overheads is meaningful for improving the secure sum rate only when $T_p/T_s$ is large.
The remainder of this work is organized as follows. The system model and problem formulation are introduced in Section II. In Section III, we present the details of the proposed two-stage game based decentralized resource allocation approach. In Section IV, we discuss the proposed three-dimensional-strategy better response based resource management scheme. In Section V, the complexities of the proposed algorithms are analyzed. In Section VI, we provide the simulation results and analysis of the proposed schemes. The conclusion is drawn in Section VII.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A. NETWORK ARCHITECTURE AND NETWORK TOPOLOGY
We consider an uplink heterogeneous cloud radio access network (HCRAN), where an mmWave cell cluster composed of N small cells is covered by a macro cell, as shown in Fig. 1. An LF TRP is located at the center of the macro cell, and an HF TRP is located at the center of each small cell. U LUEs are randomly distributed in the N small cells. For each LUE u, there exists an MUE m in its vicinity that attempts to intercept its information. The sets of HF TRPs, LUEs and MUEs are denoted by N = {1, 2, . . . , N}, U = {1, 2, . . . , U} and M = {1, 2, . . . , M} with M = U, respectively. In the HCRAN architecture, the LF TRP operates in the band below 6 GHz and each HF TRP operates in the mmWave band. All components of the HCRAN architecture have connections and functions similar to those in [30]. Therefore, the HCRAN can also achieve significant overhead savings in radio connection and release by decoupling the control plane (C-plane) and the user plane (U-plane) [31].

B. CHANNEL MODEL, BLOCKAGE MODEL, AND ANTENNA PATTERN
In this work, all HF TRPs are assumed to operate at a 28 GHz carrier frequency. According to the mmWave channel model in [32], the channel gain between HF TRP n and LUE u, considering both large-scale and small-scale fading, can be given by
$$g^c_{u,n} = |h_{u,n}|^2\, 10^{-L(d_{u,n})/10}, \qquad (1)$$
where $|h_{u,n}|^2$ denotes the channel gain corresponding to the small-scale fading, $d_{u,n}$ represents the distance between HF TRP n and LUE u in meters, and $L(d_{u,n})$ represents the path loss in dB. Due to the short-distance nature of communication in mmWave networks, we assume that the channel gain $g^c_{u,m}$ between LUE u and MUE m follows the same rule as in (1). For the blockage model, we adopt the model in [33], which is simple yet flexible enough to capture the coverage trend and the blockage statistics of mmWave networks. Specifically, denote by $P^{los}_{u,n}$ the probability that the communication link between UE u and HF TRP n is line-of-sight (LOS); it can be defined as
$$P^{los}_{u,n} = e^{-\alpha d_{u,n}}, \qquad (2)$$
where α is a parameter that captures the density and size of obstacles and increases as the density and size of obstacles increase. Let $\xi_{u,n}$ be the binary variable indicating whether the link from UE u to TRP n is an LOS link, i.e., $\xi_{u,n} = 1$ if it is an LOS link and $\xi_{u,n} = 0$ otherwise. Due to the large path loss of non-line-of-sight (NLOS) transmissions, we only consider first-order reflected transmissions. Denote by $d_{u,n}$ and $d'_{u,n}$ the length of the LOS path between UE u and TRP n and the length of its corresponding reflection path, respectively. The relative reflection loss $\epsilon_{u,n}$ (in dB) of the reflection path is then given by (3) [34], where α represents the path loss exponent and $\eta(\theta_{u,n})$ denotes the reflection coefficient given in (4), in which $\theta_{u,n}$ represents the incident angle of the reflection path between UE u and TRP n, and ω is the dielectric constant, which depends on the inherent physical properties of the reflective material.
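The following Python sketch illustrates how an LOS or NLOS link gain could be evaluated in a simulation. It is a minimal sketch under stated assumptions, not the authors' simulator: it assumes the gain combines as $|h_{u,n}|^2 10^{-L(d_{u,n})/10}$, that the LOS probability of [33] decays exponentially with distance at rate α, and that an NLOS (first-order reflected) link is attenuated by the relative reflection loss $\epsilon_{u,n}$ in dB. The path loss $L(d_{u,n})$ and $\epsilon_{u,n}$ are passed in as plain numbers because their expressions are not reproduced here; all function names are hypothetical.

```python
import math
import random


def channel_gain(small_scale_power: float, path_loss_db: float) -> float:
    """ASSUMED form of (1): g^c = |h|^2 * 10^(-L(d)/10), with L(d) in dB."""
    return small_scale_power * 10 ** (-path_loss_db / 10)


def los_probability(distance_m: float, alpha: float) -> float:
    """ASSUMED exponential blockage model: P_los = exp(-alpha * d)."""
    if distance_m < 0 or alpha < 0:
        raise ValueError("distance and alpha must be non-negative")
    return math.exp(-alpha * distance_m)


def sample_los(distance_m: float, alpha: float, rng: random.Random) -> int:
    """Draw the binary LOS indicator xi_{u,n} (1 = LOS, 0 = NLOS)."""
    return 1 if rng.random() < los_probability(distance_m, alpha) else 0


def effective_gain(gain: float, los: int, reflection_loss_db: float) -> float:
    """LOS links keep the full gain; first-order reflected (NLOS) links are
    attenuated by the relative reflection loss epsilon_{u,n} given in dB."""
    return gain if los else gain / 10 ** (reflection_loss_db / 10)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible draw
    g = channel_gain(small_scale_power=1.0, path_loss_db=100.0)
    xi = sample_los(distance_m=50.0, alpha=0.01, rng=rng)
    print(g, xi, effective_gain(g, xi, reflection_loss_db=10.0))
```

Drawing $\xi_{u,n}$ per link in this way produces the LOS/NLOS mixture that the SINR expressions weight by $\xi_{u,n}$ and $1-\xi_{u,n}$.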
Due to its low power consumption and low cost, we focus on analog beamforming, which has been implemented in commercial systems such as IEEE 802.11ad [35]. With analog beamforming aided by multiple-antenna techniques, narrow beams can be generated to improve the security of the network [36], and the beamwidth can be adjusted by changing the number of active antennas [37]. For analytical tractability, the sector-based antenna model, which has been commonly used in [21], [38], [39] for radio resource management and system performance analysis, is adopted to approximate the directional antenna array gain. Denote by $\phi^t_{u,n}$ and $\phi^r_{u,n}$ the beamwidths of transmitter u and receiver n, respectively. The transmission gain of transmitter u associated with receiver n can be given by
$$g^t_{u,n}(\phi^t_{u,n}) = \begin{cases} \dfrac{2\pi - (2\pi - \phi^t_{u,n})z}{\phi^t_{u,n}}, & \text{in the main lobe},\\[4pt] z, & \text{otherwise}, \end{cases} \qquad (5)$$
where $0 \le z < 1$ represents the side-lobe gain, with $z \ll 1$ for narrow beams. Similarly, replacing $\phi^t_{u,n}$ with $\phi^r_{u,n}$ yields the reception gain $g^r_{u,n}(\phi^r_{u,n})$ of receiver n associated with transmitter u.
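To see how the beamwidth trades off against gain under the sector model, here is a small Python sketch. It assumes the common sector-model form in which the main-lobe gain is $(2\pi - (2\pi-\phi)z)/\phi$ and the side-lobe gain is $z$, so that the total radiated power over $2\pi$ is conserved; the function name `sector_gain` is hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def sector_gain(beamwidth: float, side_lobe: float, in_main_lobe: bool) -> float:
    """Sector-model gain at one end of a link (ASSUMED form of the sector model).

    The main-lobe gain is chosen so that
    gain * beamwidth + side_lobe * (2*pi - beamwidth) == 2*pi,
    i.e. narrowing the beam concentrates the same power into a smaller angle.
    """
    if not 0 < beamwidth <= TWO_PI:
        raise ValueError("beamwidth must lie in (0, 2*pi]")
    if not 0 <= side_lobe < 1:
        raise ValueError("side-lobe gain z must satisfy 0 <= z < 1")
    if in_main_lobe:
        return (TWO_PI - (TWO_PI - beamwidth) * side_lobe) / beamwidth
    return side_lobe


if __name__ == "__main__":
    # Narrower beams give larger main-lobe gain (illustrative z = 0.05).
    for width_deg in (5, 10, 30):
        phi = math.radians(width_deg)
        print(width_deg, round(sector_gain(phi, 0.05, True), 2))
```

The larger main-lobe gain of a narrow beam is what the alignment overhead has to pay for, since more beams must be searched within a sector.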

C. PROBLEM FORMULATION
Following [38], we assume that a scheduling period consists of an alignment phase and a data transmission phase, as shown in Fig. 2, and that sector-level alignment has been performed before beam-level alignment. Denote by $\psi^t_{u,n}$ and $\psi^r_{u,n}$ the sector-level beamwidths of LUE u and TRP n, respectively. According to the continuous approximation of the alignment time in [38], the alignment overhead can be expressed as
$$\tau_{u,n} = \frac{\psi^t_{u,n}}{\phi^t_{u,n}}\,\frac{\psi^r_{u,n}}{\phi^r_{u,n}}\,T_p, \qquad (6)$$
where $T_p$ denotes the duration of a pilot transmission. Since the alignment time $\tau_{u,n}$ should not exceed the scheduling duration $T_s$, we have the following inequality:
$$\tau_{u,n} \le T_s. \qquad (7)$$
In addition, we require $\phi^t_{u,n} \le \psi^t_{u,n}$ and $\phi^r_{u,n} \le \psi^r_{u,n}$ to guarantee that the beam training process is carried out within the sector. Denote by $p_u$ the transmission power of LUE u, which can be chosen from a finite set $\mathcal{P}_u = \{p_{u1} < p_{u2} < \dots < p_{uK_u}\}$, where $K_u$ denotes the cardinality of $\mathcal{P}_u$. Since accurately monitoring the eavesdropper's CSI is infeasible in practical scenarios, unlike previous works [11], [12], which assume that full CSI of all transmission links is available (an impractical assumption for physical layer secrecy), we assume that the CSI of the transmission links of LUEs is available to the network, while the channel power gain between LUE u and its corresponding MUE m has X positive states covering LOS and NLOS transmissions, denoted as $\hat{g}_{u,m,1}, \dots, \hat{g}_{u,m,X}$, with respective probabilities $\beta_{u,m,1}, \dots, \beta_{u,m,X}$. These states can be obtained in scenarios where the eavesdroppers are active and their transmissions can be monitored by the network [40].
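The sketch below enumerates the beamwidth pairs that satisfy the sector constraints $\phi^t_{u,n} \le \psi^t_{u,n}$ and $\phi^r_{u,n} \le \psi^r_{u,n}$ together with the overhead constraint $\tau_{u,n} \le T_s$. It assumes the continuous approximation of [38] in which the overhead equals the number of transmit beams times the number of receive beams inside the sectors, multiplied by $T_p$; the helper names are hypothetical and the numbers in the usage example are illustrative only.

```python
import math
from itertools import product


def alignment_overhead(psi_t, phi_t, psi_r, phi_r, t_pilot):
    """ASSUMED continuous approximation: (psi_t/phi_t) * (psi_r/phi_r) * T_p."""
    return (psi_t / phi_t) * (psi_r / phi_r) * t_pilot


def feasible_beam_pairs(psi_t, psi_r, tx_widths, rx_widths, t_pilot, t_slot):
    """Return (phi_t, phi_r, tau) for every candidate beamwidth pair that stays
    inside the sectors and whose alignment overhead does not exceed T_s."""
    pairs = []
    for phi_t, phi_r in product(tx_widths, rx_widths):
        if phi_t > psi_t or phi_r > psi_r:
            continue  # beam training must be carried out within the sector
        tau = alignment_overhead(psi_t, phi_t, psi_r, phi_r, t_pilot)
        if tau <= t_slot:
            pairs.append((phi_t, phi_r, tau))
    return pairs


if __name__ == "__main__":
    widths = [math.pi / 2, math.pi / 8]
    t_s = 10.0
    for phi_t, phi_r, tau in feasible_beam_pairs(math.pi / 2, math.pi / 2,
                                                 widths, widths, 1.0, t_s):
        print(f"phi_t={phi_t:.3f} phi_r={phi_r:.3f} tau={tau:.1f} "
              f"data fraction={1 - tau / t_s:.2f}")
```

Narrowing both beams from π/2 to π/8 multiplies the overhead by 16, so that pair is rejected here; this is the mechanism through which the ratio $T_p/T_s$ interacts with the beamwidth choice.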
Then the signal to interference plus noise ratio (SINR) experienced by LUE u associated with HF TRP n can be expressed as
$$\mathrm{SINR}_{u,n} = \frac{\xi_{u,n} p_u g^t_{u,n} g^c_{u,n} g^r_{u,n} + (1-\xi_{u,n})\, p_u g^t_{u,n} g^c_{u,n} g^r_{u,n}/10^{\epsilon_{u,n}/10}}{\sum_{k\in\mathcal{U}\setminus\{u\}}\left(\xi_{k,n} p_k g^t_{k,n} g^c_{k,n} g^r_{k,n} + (1-\xi_{k,n})\, p_k g^t_{k,n} g^c_{k,n} g^r_{k,n}/10^{\epsilon_{k,n}/10}\right) + N_0 B}, \qquad (8)$$
where B denotes the mmWave bandwidth and $N_0$ represents the noise power spectral density. The first term in the numerator denotes the received power when the link between UE u and TRP n is LOS, the second term in the numerator represents the received power when that link is NLOS, and the first part of the denominator (the aggregate interference) has similar meanings. As it is difficult to obtain accurate channel information for the link from a legitimate user to a malicious user, we use the approach in [41] to approximate the channel gains of the wiretapping links. Then the SINR experienced by MUE m, which tries to eavesdrop on the information of LUE u, is given in (9), where each term has a meaning similar to that in (8); note, however, that unlike [42], the channel gain here is replaced with its expectation due to the imperfect CSI.
Similar to [38], the achievable rates of LUE u and of the corresponding MUE m in a scheduling period, denoted by $C_u$ and $C_m$, are given by (10) and (11), respectively, where $x_{u,n}$ denotes the binary selection variable, i.e., $x_{u,n} = 1$ if LUE u selects HF TRP n as its serving HF TRP, and $x_{u,n} = 0$ otherwise. From the perspective of information theory, the secure rate of LUE u can be given by [12]
$$C^{\sec}_u = [C_u - C_m]^+, \qquad (12)$$
where $[x]^+ = \max(x, 0)$. From (7)-(12), one can observe that the secure rate of LUE u depends on the beam alignment overhead, the channel conditions of the LUE and of its corresponding MUE, and its transmission power. Therefore, the joint optimization of TRP selection, power control and beamwidth selection is essential for maximizing the secure sum rate of all LUEs, which can be formulated as problem P1 in (13), where constraint C1 indicates that each HF TRP supports a limited number of beams because of its limited RF chains, with $N^{RF}_n$ representing the maximum number of beams of HF TRP n. Constraints C2 and C3 ensure that each LUE selects at most one serving HF TRP. Constraint C4 indicates that the transmission power level of each LUE is chosen from the predetermined set $\mathcal{P}_u$. Similar to [43], constraints C5 and C6 indicate that the beamwidths of the LUE and of the HF TRP are chosen from the predetermined sets $\Phi^t_u$ and $\Phi^r_u$, respectively. Constraint C7 ensures that the beam training overhead of each beam pair does not exceed the period $T_s$ of a scheduling frame.

III. TWO-STAGE GAME BASED DECENTRALIZED RESOURCE ALLOCATION APPROACH FOR MAXIMIZING SECURE SUM RATE
Since problem P1 is a constrained combinatorial optimization problem, it is challenging to solve in a centralized manner: centralized algorithms must frequently exchange information to acquire the global CSI of all UEs, which results in heavy signaling overheads, especially in large-scale or dense networks. Therefore, an efficient decentralized solution that relies only on local information is desired. Game theory, which can model individual, independent decision makers whose strategies interact, provides a set of mathematical tools for investigating scenarios with incomplete information about the wireless environment [44]. Consequently, game theory is particularly suitable for optimizing the performance of decentralized networks and can avoid the heavy signaling overheads caused by frequent information exchange. We therefore resort to game theory to solve the considered problem. To tackle problem P1, in this section we propose a two-stage game based decentralized resource allocation approach for maximizing the secure sum rate. In the first stage, we use a matching game to obtain the TRP selection policy for a given beamwidth control and power allocation policy; in the second stage, we exploit a potential game to calculate the joint beamwidth and power allocation policy based on the results of the first stage; then, with the result of the second stage, we return to the first stage to recalculate the TRP selection policy. By repeating this iterative process until the algorithms converge, we obtain the solution of problem P1. The detailed process is presented in the following subsections.
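Both stages repeatedly evaluate the per-LUE secure rate. The following Python sketch shows one way to compute it under explicit assumptions: a received-power term follows the LOS/NLOS split of the SINR of LUE u in (8); the eavesdropper's channel gain is replaced by its expectation $\sum_x \beta_{u,m,x}\hat{g}_{u,m,x}$; and, as an assumption since the rate expressions (10) and (11) are not reproduced here, each achievable rate is taken as the data-phase fraction $(1-\tau_{u,n}/T_s)$ times $B\log_2(1+\mathrm{SINR})$. The secure rate then applies $[x]^+ = \max(x,0)$. All names and numeric values are hypothetical.

```python
import math
from typing import Sequence


def received_power(p: float, g_t: float, g_c: float, g_r: float,
                   los: int, reflection_loss_db: float) -> float:
    """One LOS/NLOS term of (8): full power if LOS, reflected power otherwise."""
    power = p * g_t * g_c * g_r
    return power if los else power / 10 ** (reflection_loss_db / 10)


def sinr(signal: float, interferers: Sequence[float],
         noise_psd: float, bandwidth: float) -> float:
    """Signal over aggregate interference plus noise power N_0 * B."""
    return signal / (sum(interferers) + noise_psd * bandwidth)


def expected_gain(states: Sequence[float], probs: Sequence[float]) -> float:
    """Expected wiretap gain sum_x beta_x * g_hat_x under imperfect CSI."""
    if len(states) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("need one probability per state, summing to 1")
    return sum(g * b for g, b in zip(states, probs))


def rate(sinr_value: float, bandwidth: float, tau: float, t_slot: float) -> float:
    """ASSUMED rate form: only the data phase (1 - tau/T_s) carries data."""
    return max(0.0, 1.0 - tau / t_slot) * bandwidth * math.log2(1.0 + sinr_value)


def secure_rate(legit_rate: float, eve_rate: float) -> float:
    """Secure rate [C_u - C_m]^+ as in (12)."""
    return max(legit_rate - eve_rate, 0.0)


if __name__ == "__main__":
    # Illustrative (made-up) numbers: 1 GHz bandwidth, tau = 0.1 * T_s.
    bw, n0 = 1e9, 4e-21
    s = received_power(0.1, 20.0, 1e-9, 20.0, los=1, reflection_loss_db=10.0)
    i = [received_power(0.1, 0.05, 1e-9, 0.05, los=0, reflection_loss_db=10.0)]
    c_u = rate(sinr(s, i, n0, bw), bw, tau=0.1, t_slot=1.0)
    g_eve = expected_gain([1e-9, 1e-11], [0.2, 0.8])
    # Eavesdropper interference is ignored here for brevity.
    s_eve = received_power(0.1, 20.0, g_eve, 0.05, los=1, reflection_loss_db=0.0)
    c_m = rate(sinr(s_eve, [], n0, bw), bw, tau=0.1, t_slot=1.0)
    print(f"C_u={c_u:.3e}  C_m={c_m:.3e}  C_sec={secure_rate(c_u, c_m):.3e}")
```

Clipping at zero means an LUE whose eavesdropper enjoys a better channel contributes nothing to the secure sum rate, which is why the utilities in both stages are built from these clipped values.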

A. MATCHING GAME BASED TRP SELECTION UNDER FIXED POWER CONTROL AND BEAMWIDTH ALLOCATION
For a fixed power control and beamwidth allocation policy, the TRP selection problem can be written as problem P2 in (14). Problem P2 is a large-scale combinatorial optimization problem in dense mmWave networks, so its optimal solution is also hard to find. Therefore, it is desirable to solve the TRP selection problem with a decentralized method in which the TRPs and UEs interact to make resource allocation decisions based on their local information. Since a matching game can define individual utility functions for the players according to their interests and can provide efficient, low-complexity solutions to many problems, it has been commonly used for radio resource management [45]. Accordingly, we design a matching game based TRP selection scheme to solve problem P2. Each player in a matching game must rank the players in the opposite set according to a preference relation that captures its evaluation of those players. To describe the TRP selection process, we define $\succ_u$ as the preference relation of LUE u ∈ U and write $(n, \mu) \succ_u (n', \mu')$ if LUE u prefers HF TRP n in matching µ to HF TRP n′ in matching µ′. Similarly, we define the preference relation $\succ_n$ of each HF TRP n ∈ N. Since each HF TRP can be selected by more than one LUE while each LUE can select only one HF TRP, the TRP selection subproblem can be formulated as the following many-to-one matching game.
Definition 1: For two disjoint sets of players U and N, the TRP selection can be expressed as a matching relation µ : U → N that satisfies: i) ∀u ∈ U, µ(u) ∈ N; ii) ∀n ∈ N, µ(n) ⊆ U and |µ(n)| ≤ $q_n$, where $q_n$ is the quota of player n; iii) µ(u) = n if and only if u ∈ µ(n).
Condition i) indicates that each LUE is matched with an HF TRP, condition ii) means that each HF TRP is matched with a set of LUEs, and condition iii) reflects the two-sided nature of a matching: an LUE is matched with an HF TRP if and only if the LUE belongs to the set of LUEs matched with that HF TRP. Now, we design the utility functions of LUEs and HF TRPs that determine their respective preference profiles. The interference environment in the network may change dynamically whenever an LUE changes its selected HF TRP. However, owing to the high directivity, high path loss and vulnerability to blockage of mmWave communications, the interference change caused by a change in the pairing between an LUE and an HF TRP affects only the neighboring nodes of that LUE. To exploit these characteristics, we define a local interference graph similar to that in [30] to determine the neighboring nodes of each LUE. In the local interference graph, each node represents an LUE, and two LUEs are connected only if the distance between them is less than a predetermined threshold. Letting $\mathcal{E}$ denote the edge set of the local interference graph, the neighboring node set of LUE u can be expressed as
$$\mathcal{S}_u = \{k \in \mathcal{U} \mid (u,k) \in \mathcal{E}\}. \qquad (15)$$
For a swap pair (u, u′) and any HF TRP n ∈ N in a given matching µ, if there is no overlap between the neighboring nodes of LUE u and those of LUE u′ (the LUE that will swap with LUE u), we define the utility of LUE u for any n ∈ N as
$$\Phi_u(n,\mu) = C^{\sec}_u + \sum_{k\in\mathcal{S}_u} C^{\sec}_k, \qquad (16)$$
where the first term represents the secure rate of LUE u and the second term denotes the aggregate secure rate of the neighboring nodes of LUE u. If the neighboring nodes of LUE u and LUE u′ overlap, we define the utility of LUE u for any n ∈ N as
$$\Phi_u(n,\mu) = \sum_{k\in\mathcal{S}_u\cup\mathcal{S}_{u'}\cup\{u,u'\}} C^{\sec}_k. \qquad (17)$$
The utility $\Phi_n(u,\mu)$ of HF TRP n for u ∈ U is defined in (18) in terms of $\mathcal{U}_n$, the set of LUEs that select TRP n as their serving HF TRP, and $\mathcal{S}_n$, the set of neighboring HF TRPs of HF TRP n, which can be determined in a similar way as in (15). Using these utilities, the preference relation of LUE u can be expressed as
$$(n,\mu) \succ_u (n',\mu') \Leftrightarrow \Phi_u(n,\mu) > \Phi_u(n',\mu'), \qquad (19)$$
which indicates that LUE u prefers HF TRP n in µ to HF TRP n′ in µ′ only if LUE u obtains a higher utility from HF TRP n.
[Algorithm 1 (swap matching based TRP selection), partially recovered: … Otherwise, keep the current matching state unchanged. 5: until no (u, u′) blocks the current matching.]
VOLUME 8, 2020
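The preference relation of HF TRP n can be expressed as
$$(u,\mu) \succ_n (u',\mu') \Leftrightarrow \Phi_n(u,\mu) > \Phi_n(u',\mu'), \qquad (20)$$
where $\Phi_n(\cdot)$ denotes the utility of HF TRP n defined in (18). This implies that HF TRP n prefers LUE u in µ to LUE u′ in µ′ only if HF TRP n obtains a higher utility by allowing LUE u to select it as the serving HF TRP.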
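A Python sketch of the local interference graph and of the LUE utility used in a swap decision follows. It assumes planar LUE positions, uses the Euclidean distance against the predetermined threshold, and treats the neighbor sets of u and u′ as overlapping when they share at least one LUE; the secure rates are supplied as precomputed numbers for the candidate matching, and the HF TRP utility is omitted. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

Position = Tuple[float, float]


def interference_graph(positions: Dict[Hashable, Position],
                       threshold: float) -> Dict[Hashable, Set[Hashable]]:
    """Neighbor sets S_u: LUEs u and k are adjacent iff their distance < threshold."""
    neighbors: Dict[Hashable, Set[Hashable]] = {u: set() for u in positions}
    for u, k in combinations(positions, 2):
        if math.dist(positions[u], positions[k]) < threshold:
            neighbors[u].add(k)
            neighbors[k].add(u)
    return neighbors


def lue_utility(u: Hashable, partner: Hashable,
                neighbors: Dict[Hashable, Set[Hashable]],
                secure_rates: Dict[Hashable, float]) -> float:
    """Utility of LUE u when evaluating a swap with `partner` (cf. (16)-(17))."""
    if neighbors[u].isdisjoint(neighbors[partner]):
        # No overlap: own secure rate plus that of u's neighbors.
        return secure_rates[u] + sum(secure_rates[k] for k in neighbors[u])
    # Overlap: joint secure rate of both LUEs and all of their neighbors.
    group = neighbors[u] | neighbors[partner] | {u, partner}
    return sum(secure_rates[k] for k in group)


if __name__ == "__main__":
    pos = {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (10.0, 0.0), 4: (1.5, 0.0)}
    nbrs = interference_graph(pos, threshold=2.0)
    rates = {1: 1.0, 2: 2.0, 3: 4.0, 4: 8.0}
    print(nbrs)                            # {1: {4}, 2: {4}, 3: set(), 4: {1, 2}}
    print(lue_utility(1, 2, nbrs, rates))  # overlap via LUE 4 -> 11.0
    print(lue_utility(1, 3, nbrs, rates))  # no overlap -> 9.0
```

In the overlapping case both LUEs end up evaluating the same quantity, which is what the second case of the proof of Theorem 1 relies on.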
From (16)-(20), we can observe that each HF TRP's preferences depend not only on its selected LUEs but also on its neighboring HF TRPs and their corresponding LUEs, while each LUE's preferences depend not only on its selected HF TRP but also on its neighboring LUEs and their corresponding HF TRPs, due to the inter-beam interference. Therefore, subproblem P2 is actually a many-to-one matching game with externalities, or peer effects. As a result, the deferred acceptance algorithm, which has been widely utilized to obtain stable matchings for conventional matching problems, cannot be directly applied to our problem. Fortunately, the swap matching in [46] can be used to deal with the externalities. To enable two LUEs to exchange their selected HF TRPs, we introduce the definition of a swap matching as follows.
Definition 2: Given a matching µ with µ(u) = n and µ(u′) = n′, a swap matching is defined as $\mu_u^{u'} = \{\mu \setminus \{(u,n),(u',n')\}\} \cup \{(u,n'),(u',n)\}$.
The players involved in a swap are two HF TRPs and two LUEs. The two LUEs switch their associated HF TRPs while all other associations remain unchanged. Note that one of the LUEs involved may be a "hole", which represents an available vacancy in some HF TRP that an LUE can directly move to fill.
Definition 3: Given a matching µ, a pair of LUEs (u, u′) with µ(u) = n and µ(u′) = n′ is a swap-blocking pair only when the following conditions hold: i) $\forall i \in \{u, u', n, n'\}$, $\Phi_i(\mu_u^{u'}) \ge \Phi_i(\mu)$; ii) $\exists i \in \{u, u', n, n'\}$ such that $\Phi_i(\mu_u^{u'}) > \Phi_i(\mu)$, where $\Phi_i$ denotes the utility of player i defined in (16)-(18). In Definition 3, condition i) indicates that no involved player's utility decreases after the swap operation, while condition ii) means that at least one involved player's utility increases after the swap operation. Based on the above analysis, the swap matching based TRP selection algorithm is described in Algorithm 1.
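Definition 3 can be checked mechanically once the utilities of the four involved players are known before and after the swap. The sketch below assumes these utilities are given as plain dictionaries keyed by player; "holes" (vacant slots) are not modeled, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def swap(assignment: Dict[Hashable, Hashable],
         u: Hashable, v: Hashable) -> Dict[Hashable, Hashable]:
    """Swap matching: LUEs u and v exchange HF TRPs; all other pairs are unchanged."""
    swapped = dict(assignment)
    swapped[u], swapped[v] = assignment[v], assignment[u]
    return swapped


def is_swap_blocking(before: Dict[Hashable, float],
                     after: Dict[Hashable, float],
                     players: Iterable[Hashable]) -> bool:
    """Definition 3: (i) no involved player is worse off, (ii) at least one is better off."""
    players = list(players)
    no_loss = all(after[i] >= before[i] for i in players)
    some_gain = any(after[i] > before[i] for i in players)
    return no_loss and some_gain


if __name__ == "__main__":
    mu = {"u1": "n1", "u2": "n2", "u3": "n1"}
    print(swap(mu, "u1", "u2"))  # {'u1': 'n2', 'u2': 'n1', 'u3': 'n1'}
    before = {"u1": 1.0, "u2": 2.0, "n1": 3.0, "n2": 3.0}
    after = {"u1": 1.5, "u2": 2.0, "n1": 3.0, "n2": 3.0}
    print(is_swap_blocking(before, after, list(before)))  # True
```

Requiring weak improvement for all four players and strict improvement for at least one is what rules out swaps that merely shuffle utility between players.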
Two-sided exchange stability is a very important notion for swap matchings. Next, we first introduce the concept of two-sided exchange stability and then investigate the two-sided exchange stability of the proposed Algorithm 1.
Definition 4: A matching µ is two-sided exchange-stable (2ES) if there exist no swap-blocking pairs.
Theorem 1: The secure sum rate of all LUEs increases after each swap operation.
Proof: Suppose the matching state changes from µ to $\mu_u^{u'}$. According to Algorithm 1, a swap operation occurs only when both conditions in Definition 3 hold. There are two cases for the neighboring nodes of LUE u and LUE u′ when a swap operation occurs: i) there is no overlap between the neighboring nodes of LUE u and LUE u′, and ii) the neighboring nodes of LUE u and LUE u′ overlap. We discuss the two cases in turn. For the first case, assume that LUE u meets condition ii) of Definition 3, namely $\Phi_u(\mu_u^{u'}) > \Phi_u(\mu)$. Then the difference between the secure sum rates of matching µ and matching $\mu_u^{u'}$ is given in (21), where (*) follows from the fact that LUE u does not affect the LUEs in the set $\mathcal{U}\setminus\mathcal{S}_u\setminus\{u\}$ during the transition from µ to $\mu_u^{u'}$, so the secure sum rate of the LUEs in this set remains unchanged. Therefore, the secure sum rate of all LUEs always increases during the transition from µ to $\mu_u^{u'}$. If LUE u′ meets condition ii) of Definition 3, (21) also holds. If HF TRP n or n′ meets condition ii) of Definition 3, the secure rates of some LUEs must increase when the swap operation occurs. This also proves that the secure sum rate of all LUEs increases after each swap operation.
For the second case, the utilities of LUE u and LUE u′ defined in (17) are the same, i.e., $\Phi_u(\mu) = \Phi_{u'}(\mu) = \sum_{k\in\mathcal{S}_u\cup\mathcal{S}_{u'}\cup\{u,u'\}} C^{\sec}_k$.
Let $\bar{\mathcal{K}} = \mathcal{S}_u\cup\mathcal{S}_{u'}\cup\{u,u'\}$ and $\hat{\mathcal{K}} = \mathcal{U}\setminus\mathcal{S}_u\setminus\mathcal{S}_{u'}\setminus\{u,u'\}$, and suppose that LUE u meets condition ii) of Definition 3, namely $\Phi_u(\mu_u^{u'}) > \Phi_u(\mu)$. Then the difference between the secure sum rates of matching µ and matching $\mu_u^{u'}$ can be expressed as in (22), where (**) follows from the fact that neither LUE u nor LUE u′ affects the LUEs in the set $\hat{\mathcal{K}}$ during the transition from µ to $\mu_u^{u'}$. Similarly, if LUE u′, HF TRP n or HF TRP n′ meets condition ii) of Definition 3, it can also be proved that the secure sum rate of all LUEs increases when the swap operation occurs. Therefore, the secure sum rate of all LUEs increases after each swap operation.
Theorem 2: The proposed Algorithm 1 will converge after limited swap operations.
Proof: According to Theorem 1, the secure sum rate of all LUEs strictly increases after each swap operation, so no matching can be visited twice. Since the numbers of LUEs and HF TRPs are finite, the number of possible matchings, and hence of potential swap-blocking pairs, is finite. Furthermore, the secure sum rate of all LUEs is upper bounded because radio resources such as the frequency band and the transmission power are limited. As a result, after a finite number of swap operations, no swap-blocking pair can be found and the secure sum rate of all LUEs remains unchanged. Therefore, Algorithm 1 converges after a finite number of swap operations.
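To make the termination argument of Theorems 1 and 2 concrete, the following Python sketch runs a swap-based matching in which exchanging the TRPs of two LUEs is accepted only if it strictly raises a secure-sum-rate objective. It is an illustration under stated assumptions, not Algorithm 1 itself: the objective `secure_sum_rate` is a hypothetical toy (each TRP's load crudely divides the legitimate and eavesdropper SNRs), and the acceptance test uses the strict increase of the total, which Theorem 1 guarantees for swap-blocking pairs, instead of the two conditions of Definition 3.

```python
import itertools
import math
import random


def secure_sum_rate(assign, gain, eve):
    """Hypothetical secure sum rate of a TRP assignment (assign[u] = TRP serving LUE u).

    Toy model: the number of LUEs on a TRP divides both SNRs, and the per-LUE
    secure rate is [log2(1 + gain/load) - log2(1 + eve/load)]^+.
    """
    load = {}
    for n in assign:
        load[n] = load.get(n, 0) + 1
    total = 0.0
    for u, n in enumerate(assign):
        legit = math.log2(1 + gain[u][n] / load[n])
        eaves = math.log2(1 + eve[u][n] / load[n])
        total += max(0.0, legit - eaves)
    return total


def swap_matching(assign, gain, eve, tol=1e-12):
    """Swap the TRPs of two LUEs whenever this strictly raises the objective."""
    assign = list(assign)
    value = secure_sum_rate(assign, gain, eve)
    swaps = 0
    improved = True
    while improved:  # stops once a full pass finds no improving swap
        improved = False
        for u, v in itertools.combinations(range(len(assign)), 2):
            if assign[u] == assign[v]:
                continue
            assign[u], assign[v] = assign[v], assign[u]
            new_value = secure_sum_rate(assign, gain, eve)
            if new_value > value + tol:
                value, swaps, improved = new_value, swaps + 1, True
            else:
                assign[u], assign[v] = assign[v], assign[u]  # undo the swap
    return assign, value, swaps
```

Because every accepted swap raises the objective by more than `tol` and only finitely many assignments exist, the loop must stop, mirroring the proof above; at termination no single swap improves the objective.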
Theorem 3: The final matching achieved by Algorithm 1 is two-sided exchange-stable (2ES).
Proof: First, swap operations take place only when the secure rate of the involved players strictly increases. Second, since Algorithm 1 converges (Theorem 2), when it terminates no LUE u ∈ U can find another LUE u' ∈ U with which to form a swap-blocking pair. As a result, no LUE can improve its match through any further swap in the current matching. Therefore, the final matching achieved by Algorithm 1 is 2ES.

B. POTENTIAL GAME BASED POWER CONTROL AND BEAMWIDTH ALLOCATION UNDER FIXED TRP SELECTION
For a fixed TRP selection policy, the optimization problem P1 can be rewritten as problem P3. Since problem P3 still involves the optimization of three groups of variables, i.e., transmission power levels, transmission beamwidths and reception beamwidths, direct methods face a very high complexity because of the huge joint action space. Therefore, to avoid this complexity, we divide problem P3 into two subproblems: i) discrete power control, and ii) joint transmission and reception beamwidth allocation. To facilitate the subsequent descriptions, we first define some notation. Let $\bar{\mathcal{A}}_u$ and $\hat{\mathcal{A}}_u$ denote the set of available transmission power levels of LUE u and the set of its available combinations of transmission and reception beamwidths, respectively. Denote by $\bar{a} = (\bar{a}_1, \bar{a}_2, \ldots, \bar{a}_U)$ and $\hat{a} = (\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_U)$ the transmission power profile and the beamwidth allocation profile, respectively, where $\bar{a}_u \in \bar{\mathcal{A}}_u$ and $\hat{a}_u \in \hat{\mathcal{A}}_u$. Note that each action $\hat{a}_u$, ∀u ∈ U, is a combination of the transmission beamwidth and the reception beamwidth of LUE u. Inspired by [47], we propose a two-dimensional-strategy iterative weakly acyclic game $G = [\mathcal{U}, \{\bar{\mathcal{A}}_u\}_{u \in \mathcal{U}}, \{\bar{F}_u\}_{u \in \mathcal{U}}, \{\hat{\mathcal{A}}_u\}_{u \in \mathcal{U}}, \{\hat{F}_u\}_{u \in \mathcal{U}}]$ to solve problem P3, in which two kinds of strategies, $\bar{a}_u$ and $\hat{a}_u$, are available to each player u ∈ U, and player u has two utilities, $\bar{F}_u$ and $\hat{F}_u$, corresponding to $\bar{a}_u$ and $\hat{a}_u$, respectively, during the decision-making process. We first recall the definition of a weakly acyclic game.
Definition 5 (Weakly Acyclic Game) [48]: A game G is weakly acyclic if, for any strategy profile $a \in \bar{\mathcal{A}}$, where $\bar{\mathcal{A}} = \bar{\mathcal{A}}_1 \times \bar{\mathcal{A}}_2 \times \cdots \times \bar{\mathcal{A}}_U$ is the joint strategy space of all players, there exists a better reply path that starts from a and ends at some pure strategy NE of game G.
Here, a better reply path is a sequence of action profiles a[1], a[2], ..., a[L] in which any two consecutive profiles a[l] and a[l+1] differ only in the action of a single player u, and $F_u(a[l+1]) > F_u(a[l])$, where $F_u$ denotes the utility of player u. The discrete power control and joint transmission and reception beamwidth optimization problem can then be solved iteratively: at each iteration, either the power profile $\bar{a}$ is improved with the beamwidth profile $\hat{a}$ fixed, or $\hat{a}$ is improved with $\bar{a}$ fixed. This process is repeated until no further improvement can be achieved, and the final strategy profile must then be a pure strategy NE of game G. During the iterative process, the players can thus be viewed as playing two games alternately, i.e., the power control game $G_1$ and the beamwidth allocation game $G_2$. Therefore, to clarify the characteristics of game G, we first analyze games $G_1$ and $G_2$.
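The notion of a better reply path used in Definition 5 can be checked mechanically. The Python sketch below is illustrative: the helper names `is_better_reply_path` and `is_pure_ne` are hypothetical, profiles are tuples of action indices, and utilities are passed in as a function of the player index and the joint profile.

```python
from typing import Callable, Sequence, Tuple

Profile = Tuple[int, ...]
Utility = Callable[[int, Profile], float]


def is_better_reply_path(path: Sequence[Profile], utility: Utility) -> bool:
    """Each step must change exactly one player's action and strictly raise its utility."""
    for prev, nxt in zip(path, path[1:]):
        movers = [u for u in range(len(prev)) if prev[u] != nxt[u]]
        if len(movers) != 1:
            return False
        u = movers[0]
        if not utility(u, nxt) > utility(u, prev):
            return False
    return True


def is_pure_ne(profile: Profile, n_actions: Sequence[int], utility: Utility) -> bool:
    """True if no player can strictly gain by a unilateral deviation."""
    for u, size in enumerate(n_actions):
        for k in range(size):
            deviation = list(profile)
            deviation[u] = k
            if utility(u, tuple(deviation)) > utility(u, profile):
                return False
    return True
```

For instance, in a two-player coordination game where both players earn 1 when they pick the same action and 0 otherwise, the path (0, 1), (1, 1) is a better reply path that ends at the pure NE (1, 1).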

1) POTENTIAL GAME MODEL WITH LOCAL INTERACTION
A local interaction mechanism is required for two main reasons. On the one hand, a change in the transmission power level or beamwidth selection of a player may change the interference environment of its neighboring nodes. Therefore, to maximize the secure sum rate, the utility of each player should account for the secure rates of its neighboring nodes, which requires a local interaction mechanism for information exchange. On the other hand, owing to the short transmission distances and high directionality of mmWave communications, multiple players that have little influence on each other can update their strategies simultaneously, which speeds up the convergence of the algorithm. A local interaction mechanism is therefore also needed to determine each player's neighboring nodes. For simplicity, a distance-based local interference graph similar to (15) is used to determine the neighboring nodes of each player. Without loss of generality, we take the power control game as an example to illustrate how to construct the local interaction game. Let $\bar{\mathcal{A}}_{-u} = \bar{\mathcal{A}}_1 \times \cdots \times \bar{\mathcal{A}}_{u-1} \times \bar{\mathcal{A}}_{u+1} \times \cdots \times \bar{\mathcal{A}}_U$ and $\bar{a}_{-u} = (\bar{a}_1, \ldots, \bar{a}_{u-1}, \bar{a}_{u+1}, \ldots, \bar{a}_U) \in \bar{\mathcal{A}}_{-u}$ be the joint strategy space and the strategy profile of all players except player u, respectively. Additionally, we denote by $\bar{a}_{S_u} \in \bar{\mathcal{A}}_{S_u}$ and $\bar{a}_{D_u} \in \bar{\mathcal{A}}_{D_u}$ the joint strategy profiles of player u's neighboring LUEs and of player u's interactional neighboring LUEs, respectively, where $\bar{\mathcal{A}}_{S_u} = \times_{i \in S_u} \bar{\mathcal{A}}_i$ and $\bar{\mathcal{A}}_{D_u} = \times_{i \in D_u} \bar{\mathcal{A}}_i$ are the corresponding joint strategy spaces, and $D_u = S_u \cup \left( \bigcup_{i \in S_u} S_i \right)$ is the interactional neighbor set of player u, which contains the neighboring LUEs $S_u$ of player u and the neighboring LUEs $\bigcup_{i \in S_u} S_i$ of these neighbors.
VOLUME 8, 2020
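A minimal Python sketch of the neighbor bookkeeping described above is given below. Since the exact rule (15) is not reproduced here, the sketch assumes a hypothetical distance threshold `d_th` for $S_u$. It builds $D_u = S_u \cup \bigcup_{i \in S_u} S_i$, dropping u itself because $\bar{a}_u$ enters the utility as a separate argument, and then greedily draws a random group of players none of whom lies in another member's interactional set; this is one conservative, assumed way of picking players that can update simultaneously.

```python
import math
import random


def neighbor_sets(pos, d_th):
    """S_u: LUEs within distance d_th of LUE u (hypothetical stand-in for rule (15))."""
    n = len(pos)
    S = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(pos[u], pos[v]) <= d_th:
                S[u].add(v)
                S[v].add(u)
    return S


def interactional_sets(S):
    """D_u = S_u plus the neighbours of every i in S_u, with u itself removed."""
    D = []
    for u, s_u in enumerate(S):
        d_u = set(s_u)
        for i in s_u:
            d_u |= S[i]
        d_u.discard(u)
        D.append(d_u)
    return D


def select_non_interfering(D, rng):
    """Random greedy group in which no member lies in another member's D set."""
    order = list(range(len(D)))
    rng.shuffle(order)
    chosen = []
    for u in order:
        if all(u not in D[v] for v in chosen):  # D is symmetric, so one check suffices
            chosen.append(u)
    return chosen
```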
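Then, we define the utility of player u as
$$\bar{F}_u(\bar{a}_u, \bar{a}_{D_u}) = C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u}) + \sum_{i \in S_u} C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}_{S_i}), \tag{25}$$
where the first term in (25) is the secure rate of player u, and the second term is the aggregate secure rate of player u's neighboring LUEs. The local interaction based discrete power control game can then be formulated as
$$G_1 = [\mathcal{U}, \{\bar{\mathcal{A}}_u\}_{u \in \mathcal{U}}, \{\bar{F}_u\}_{u \in \mathcal{U}}].$$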

2) NE AND CONVERGENCE ANALYSIS
Since the Nash equilibrium is a commonly used solution concept for game-theoretic problems, we recall its definition.
Definition 6 (Nash Equilibrium (NE)): A transmission power level or beamwidth selection profile $a^* = (a^*_1, a^*_2, \ldots, a^*_U) \in \mathcal{A}$ is a pure strategy NE of game G if, for any player u ∈ U and any alternative strategy $a_u \neq a^*_u$,
$$F_u(a^*_u, a^*_{-u}) \geq F_u(a_u, a^*_{-u}),$$
where $F_u$ represents the utility of player u when choosing the corresponding strategy.

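Definition 7 (Exact Potential Game): A game $G = [\mathcal{U}, \{\mathcal{A}_u\}_{u \in \mathcal{U}}, \{F_u\}_{u \in \mathcal{U}}]$ is an exact potential game if there exists a function $\Phi: \mathcal{A} \rightarrow \mathbb{R}$ such that
$$F_u(a_u, a_{-u}) - F_u(a'_u, a_{-u}) = \Phi(a_u, a_{-u}) - \Phi(a'_u, a_{-u})$$
for any $a_u, a'_u \in \mathcal{A}_u$, ∀u ∈ U and ∀$a_{-u} \in \times_{m \neq u} \mathcal{A}_m$, where Φ is called the potential function of game G.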
Theorem 4: The power control game G 1 is an exact potential game.
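Proof: A potential function for game $G_1$ can be constructed as
$$\Phi(\bar{a}) = \sum_{u \in \mathcal{U}} C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{-u}), \tag{28}$$
which represents the aggregate secure rate of all players.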
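Since the local interaction rule gives $C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{-u}) = C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u})$, (28) can be rewritten as
$$\Phi(\bar{a}) = \sum_{u \in \mathcal{U}} C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u}).$$
If player u unilaterally changes its strategy from $\bar{a}_u$ to $\bar{a}'_u$, the resulting change of the potential function is
$$\Phi(\bar{a}'_u, \bar{a}_{-u}) - \Phi(\bar{a}_u, \bar{a}_{-u}) = C^{\mathrm{sec}}_u(\bar{a}'_u, \bar{a}_{S_u}) - C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u}) + \sum_{i \in \mathcal{U} \setminus \{u\}} \left[ C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}'_{S_i}) - C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}_{S_i}) \right],$$
where $C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}'_{S_i})$ is the secure rate of player i after player u unilaterally changes its strategy. Since player u's action can only affect the secure rates of its neighbors, and the distance-based neighbor relation is symmetric (i ∈ $S_u$ if and only if u ∈ $S_i$), every term with $i \notin S_u$ vanishes, so
$$\Phi(\bar{a}'_u, \bar{a}_{-u}) - \Phi(\bar{a}_u, \bar{a}_{-u}) = C^{\mathrm{sec}}_u(\bar{a}'_u, \bar{a}_{S_u}) - C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u}) + \sum_{i \in S_u} \left[ C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}'_{S_i}) - C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}_{S_i}) \right].$$
On the other hand, by (25), the change of player u's utility due to the same unilateral deviation is
$$\bar{F}_u(\bar{a}'_u, \bar{a}_{D_u}) - \bar{F}_u(\bar{a}_u, \bar{a}_{D_u}) = C^{\mathrm{sec}}_u(\bar{a}'_u, \bar{a}_{S_u}) - C^{\mathrm{sec}}_u(\bar{a}_u, \bar{a}_{S_u}) + \sum_{i \in S_u} \left[ C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}'_{S_i}) - C^{\mathrm{sec}}_i(\bar{a}_i, \bar{a}_{S_i}) \right].$$
Combining the above expressions yields
$$\bar{F}_u(\bar{a}'_u, \bar{a}_{D_u}) - \bar{F}_u(\bar{a}_u, \bar{a}_{D_u}) = \Phi(\bar{a}'_u, \bar{a}_{-u}) - \Phi(\bar{a}_u, \bar{a}_{-u}),$$
which indicates that the change of the potential function caused by any player's unilateral deviation equals the change of that player's utility. Therefore, the proposed power control game $G_1$ is an exact potential game according to Definition 7.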
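The exact-potential property can also be checked numerically. The following Python sketch uses a hypothetical ring topology and a toy secure-rate expression that depends only on $(\bar{a}_u, \bar{a}_{S_u})$, defines the utility in the form of (25) and the potential in the form of (28), and reports the largest mismatch between the utility change and the potential change over random unilateral deviations. It illustrates the proof, not the system model of this paper; the power set is the one listed in Section VI.

```python
import math
import random

POWER_LEVELS = [0.05, 0.1, 0.15, 0.2]  # W, the power set used in Section VI


def ring_neighbors(n):
    """Symmetric neighbour sets S_u on a ring (hypothetical topology)."""
    return [{(u - 1) % n, (u + 1) % n} for u in range(n)]


def secure_rate(u, a, S, g, e):
    """Toy C_u^sec that depends only on (a_u, a_{S_u}), as the local interaction rule requires."""
    p = POWER_LEVELS[a[u]]
    noise_plus_interf = 0.01 + sum(POWER_LEVELS[a[j]] for j in S[u])
    legit = math.log2(1 + g[u] * p / noise_plus_interf)
    eaves = math.log2(1 + e[u] * p / noise_plus_interf)
    return max(0.0, legit - eaves)


def local_utility(u, a, S, g, e):
    """Utility in the form of (25): own secure rate plus the neighbours' secure rates."""
    return secure_rate(u, a, S, g, e) + sum(secure_rate(i, a, S, g, e) for i in S[u])


def potential(a, S, g, e):
    """Potential in the form of (28): aggregate secure rate of all players."""
    return sum(secure_rate(u, a, S, g, e) for u in range(len(a)))


def max_potential_gap(n=6, trials=200, seed=0):
    """Largest |dF_u - dPhi| over random unilateral deviations (should be ~0)."""
    rng = random.Random(seed)
    S = ring_neighbors(n)
    g = [rng.uniform(5, 20) for _ in range(n)]
    e = [rng.uniform(0.5, 3) for _ in range(n)]
    gap = 0.0
    for _ in range(trials):
        a = [rng.randrange(len(POWER_LEVELS)) for _ in range(n)]
        u = rng.randrange(n)
        b = list(a)
        b[u] = rng.randrange(len(POWER_LEVELS))  # unilateral deviation of player u
        d_f = local_utility(u, b, S, g, e) - local_utility(u, a, S, g, e)
        d_phi = potential(b, S, g, e) - potential(a, S, g, e)
        gap = max(gap, abs(d_f - d_phi))
    return gap
```

The gap stays at floating-point rounding level because both differences sum exactly the same secure-rate terms, which is the cancellation used in the proof.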
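Similarly, when the transmission and reception beamwidths are optimized, we define the utility of player u as
$$\hat{F}_u(\hat{a}_u, \hat{a}_{D_u}) = C^{\mathrm{sec}}_u(\hat{a}_u, \hat{a}_{S_u}) + \sum_{i \in S_u} C^{\mathrm{sec}}_i(\hat{a}_i, \hat{a}_{S_i}). \tag{35}$$
Accordingly, the local interaction based transmission and reception beamwidth allocation game can be formulated as
$$G_2 = [\mathcal{U}, \{\hat{\mathcal{A}}_u\}_{u \in \mathcal{U}}, \{\hat{F}_u\}_{u \in \mathcal{U}}].$$
Theorem 5: The beamwidth allocation game $G_2$ is an exact potential game.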
Proof: The proof is similar to that of Theorem 4 and is omitted here. Therefore, whenever a player switches to a better response in either $\bar{a}$ or $\hat{a}$ at an iteration, the total secure rate of all players strictly increases. Next, based on games $G_1$ and $G_2$, we analyze the proposed game G. Similar to [47], we obtain the following conclusions: i) the proposed game G has at least one pure two-dimensional-strategy NE, and the optimal solution of problem P3 coincides with a pure two-dimensional-strategy NE of G; and ii) if all players adhere to the proposed Algorithm 2, the game G converges to an invariant two-dimensional-strategy NE almost surely. The detailed proofs are omitted here.
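A compact Python sketch of a two-dimensional better response with inertia, in the spirit of Algorithm 2, is shown below. Everything about the system is a hypothetical toy (the ring topology, the beamwidth set `BEAMS`, the overhead constant `OVERHEAD` and the rate expression); only the control flow follows the algorithm: a random group is selected, the strategy dimension is drawn with probability ξ for power and 1 − ξ for beamwidth, and each selected player keeps its action with probability ζ or otherwise moves to a randomly chosen element of its better reply set (a uniform choice is assumed). Selected players are kept at ring distance at least 3 so that their simultaneous moves change disjoint sets of secure rates.

```python
import math
import random

P_LEVELS = [0.05, 0.1, 0.15, 0.2]  # W, the power set used in Section VI
BEAMS = [0.2, 0.4, 0.8]            # hypothetical beamwidth products phi_t * phi_r (rad^2)
OVERHEAD = 0.02                    # hypothetical beam-training overhead constant


def rate(u, p, b, g, e):
    """Toy secure rate of LUE u on a ring; depends only on u and its two neighbours."""
    n = len(p)
    interf = sum(P_LEVELS[p[j]] * BEAMS[b[j]] for j in ((u - 1) % n, (u + 1) % n))
    gamma = max(0.0, 1 - OVERHEAD / BEAMS[b[u]])  # wider beam -> less training time lost
    sinr_l = g[u] * P_LEVELS[p[u]] / (0.01 + interf)
    sinr_e = e[u] * BEAMS[b[u]] * P_LEVELS[p[u]] / (0.01 + interf)  # wider beam leaks more
    return gamma * max(0.0, math.log2(1 + sinr_l) - math.log2(1 + sinr_e))


def utility(u, p, b, g, e):
    """Local utility in the form of (25)/(35): own rate plus the two neighbours' rates."""
    n = len(p)
    return sum(rate(i, p, b, g, e) for i in (u, (u - 1) % n, (u + 1) % n))


def potential(p, b, g, e):
    """Aggregate secure rate of all players."""
    return sum(rate(u, p, b, g, e) for u in range(len(p)))


def better_replies(u, p, b, g, e, dim, tol=1e-12):
    """Indices in dimension `dim` that strictly raise u's utility, all else fixed."""
    prof, size = (p, len(P_LEVELS)) if dim == "power" else (b, len(BEAMS))
    current, old, out = utility(u, p, b, g, e), prof[u], []
    for k in range(size):
        if k != old:
            prof[u] = k  # try the deviation in place ...
            if utility(u, p, b, g, e) > current + tol:
                out.append(k)
    prof[u] = old  # ... and restore the original action
    return out


def select_players(n, rng):
    """Random group whose members are at ring distance >= 3 from each other."""
    order = list(range(n))
    rng.shuffle(order)
    chosen = []
    for u in order:
        if all(min(abs(u - v), n - abs(u - v)) >= 3 for v in chosen):
            chosen.append(u)
    return chosen


def algorithm2_sketch(n=8, steps=5000, xi=0.5, zeta=0.3, seed=0):
    rng = random.Random(seed)
    g = [rng.uniform(5, 20) for _ in range(n)]   # hypothetical legitimate gains
    e = [rng.uniform(0.5, 3) for _ in range(n)]  # hypothetical eavesdropper gains
    p = [rng.randrange(len(P_LEVELS)) for _ in range(n)]
    b = [rng.randrange(len(BEAMS)) for _ in range(n)]
    history = [potential(p, b, g, e)]
    for _ in range(steps):
        dim = "power" if rng.random() < xi else "beam"  # Pr = xi versus 1 - xi
        moves = {}
        for u in select_players(n, rng):
            replies = better_replies(u, p, b, g, e, dim)
            if replies and rng.random() >= zeta:  # inertia: stay put with probability zeta
                moves[u] = rng.choice(replies)
        prof = p if dim == "power" else b
        for u, k in moves.items():  # all selected players update simultaneously
            prof[u] = k
        history.append(potential(p, b, g, e))
    return p, b, g, e, history
```

Calling `algorithm2_sketch()` returns the final power and beamwidth indices together with the potential trajectory, which is non-decreasing because each accepted move raises the aggregate secure rate.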

C. JOINT TRP SELECTION, POWER CONTROL AND BEAMWIDTH ALLOCATION
Based on the above analysis, the joint TRP selection, power control and beamwidth allocation algorithm is summarized in Algorithm 3, and its convergence is established in the following theorem.
Theorem 6: The proposed Algorithm 3 converges to a stationary solution of problem P1 after a finite number of iterations.
Proof: According to Algorithm 1 and Algorithm 2, when $\{x_{u,n}\}$, $\{p_u\}$, $\{\varphi^t_{u,n}\}$ and $\{\varphi^r_{u,n}\}$ are updated iteratively, the value of the objective function in (13) either increases or stays the same. As a result, the repeat loop in Algorithm 3 generates a non-decreasing sequence of objective values for (13). Moreover, the objective value in (13) is upper bounded because the radio resources are limited. Since all optimization variables take values in finite sets, the objective can take only finitely many values, so this non-decreasing sequence stops increasing after a finite number of iterations. Therefore, the proposed iterative Algorithm 3 for maximizing the secure sum rate in dense mmWave networks converges to a stationary solution of problem P1 after a finite number of iterations.
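The monotonicity argument of Theorem 6 is that of block-coordinate ascent. The generic Python sketch below alternates two block updates, each of which keeps the current value on ties so that the objective can never drop, and stops when a full round brings no increase. The toy objective and the integer search ranges used in the test are hypothetical stand-ins for the TRP selection block (Algorithm 1) and the joint power and beamwidth block (Algorithm 2).

```python
def improve_block(objective, current, candidates, tol=1e-12):
    """Best candidate for one block; ties keep `current`, so the objective never drops."""
    best, best_val = current, objective(current)
    for c in candidates:
        val = objective(c)
        if val > best_val + tol:
            best, best_val = c, val
    return best


def alternating_maximization(f, x0, y0, xs, ys, max_iter=100, tol=1e-12):
    """Alternate an x-update (y fixed) and a y-update (x fixed) until no increase."""
    x, y = x0, y0
    values = [f(x, y)]
    for _ in range(max_iter):
        x = improve_block(lambda c: f(c, y), x, xs)  # stage 1, cf. Algorithm 1
        y = improve_block(lambda c: f(x, c), y, ys)  # stage 2, cf. Algorithm 2
        values.append(f(x, y))
        if values[-1] <= values[-2] + tol:  # the non-decreasing sequence has stalled
            break
    return x, y, values
```

At termination neither block can be improved with the other held fixed, which corresponds to the stationary point reached by Algorithm 3; a global optimum is not guaranteed.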

D. EFFECT OF BEAMWIDTH ON SECURE RATE
For the sake of analysis, (12) can be rewritten in terms of $R^{\mathrm{sec}}_{u,n}$ and $\gamma_{u,n}$, where $R^{\mathrm{sec}}_{u,n} = \log_2(1 + \mathrm{SINR}_{u,n}) - \log_2(1 + \mathrm{SINR}_{u,m})$, $\gamma_{u,n} = B x_{u,n}(1 - \ldots)$, and $\mathrm{SINR}_{u,n}$ and $\mathrm{SINR}_{u,m}$ are given in (8) and (9), respectively. If $T_p/T_s$ is very small, e.g., 0.0001, the trend of $C^{\mathrm{sec}}_u$ is nearly the same as that of $R^{\mathrm{sec}}_{u,n}$. The narrower the beams between LUE u and TRP n are, the larger $\log_2(1 + \mathrm{SINR}_{u,n})$ becomes and, for MUEs at fixed locations, the smaller $\log_2(1 + \mathrm{SINR}_{u,m})$ generally becomes. Therefore, in this case, the secure rate $C^{\mathrm{sec}}_u$ decreases as the beamwidth increases. If $T_p/T_s$ is relatively large, e.g., 0.01, the secure rate $C^{\mathrm{sec}}_u$ is affected by both $\gamma_{u,n}$ and $R^{\mathrm{sec}}_{u,n}$. As the product $\varphi^t_{u,n}\varphi^r_{u,n}$ of the transmission and reception beamwidths increases, $\gamma_{u,n}$ increases following an inverse-proportional law, whereas $R^{\mathrm{sec}}_{u,n}$ decreases logarithmically. Consequently, the secure rate $C^{\mathrm{sec}}_u$ first increases and then decreases as the beamwidth increases.

Algorithm 2
3: At iteration t, a group of mutually non-interfering LUEs, denoted by $\mathcal{U}_s$, is selected randomly. Every selected player u ∈ $\mathcal{U}_s$ calculates its utility according to (25) or (35) by exchanging information with its neighboring players over the C-plane. The strategy type to be optimized is chosen with probabilities $\Pr(\bar{a}) = \xi$ and $\Pr(\hat{a}) = 1 - \xi$, where 0 < ξ < 1.
4: Each selected player u ∈ $\mathcal{U}_s$ searches its better reply set and updates its strategy:
5: if the strategy $\bar{a}$ is optimized then
6: each player u ∈ $\mathcal{U}_s$ keeps its previous power level $\bar{a}_u[t-1]$ with probability ζ and otherwise switches to a strategy in its better reply set (if this set is empty, it keeps $\bar{a}_u[t-1]$), where ζ ∈ (0, 1) is the player's inertia, and the better reply set of player u contains the power levels that strictly increase $\bar{F}_u$ given the current strategies of the other players;
7: else
8: each player u ∈ $\mathcal{U}_s$ updates $\hat{a}_u$ by the same rule, with $\hat{F}_u$ in place of $\bar{F}_u$;
9: end if
10: Update t = t + 1.
11: until t ≥ $T_{\max}$.
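The trade-off described in this subsection can be illustrated with a toy model in Python. All modeling choices in the sketch are hypothetical: the number of swept beam pairs is taken as $\pi^2/(\varphi^t\varphi^r)$, so that the time share behaves like $1 - \kappa (T_p/T_s)/(\varphi^t\varphi^r)$ with $\kappa = \pi^2$; the legitimate SNR scales as $1/(\varphi^t\varphi^r)$ and the eavesdropper SNR as $5\,\varphi^t\varphi^r$; and $\varphi^t = \varphi^r$. Under these assumptions, on a 5° to 60° grid the best beamwidth is the narrowest one when $T_p/T_s = 10^{-4}$ and lies strictly inside the grid when $T_p/T_s = 10^{-2}$, which reproduces the qualitative behavior, not the values, of the analysis above.

```python
import math

SECTOR = math.pi      # hypothetical angular sector swept by each side (rad)
KAPPA = SECTOR ** 2   # number of beam pairs is assumed to be KAPPA / (phi_t * phi_r)
A_LEGIT = 1.0         # hypothetical legitimate SNR scale: SNR = A_LEGIT / x
E_EAVES = 5.0         # hypothetical eavesdropper SNR scale: SNR = E_EAVES * x
GRID = list(range(5, 61))  # common transmission/reception beamwidth in degrees


def secure_rate(phi_deg, tp_over_ts):
    """Toy gamma * R^sec as a function of the beamwidth (phi_t = phi_r)."""
    x = math.radians(phi_deg) ** 2                  # phi_t * phi_r
    gamma = max(0.0, 1.0 - KAPPA * tp_over_ts / x)  # share of the slot left after training
    r_sec = math.log2(1 + A_LEGIT / x) - math.log2(1 + E_EAVES * x)
    return gamma * max(0.0, r_sec)


def best_index(tp_over_ts):
    """Index in GRID of the beamwidth that maximizes the toy secure rate."""
    values = [secure_rate(phi, tp_over_ts) for phi in GRID]
    return max(range(len(values)), key=values.__getitem__)
```

With a small overhead the toy rate is dominated by the logarithmic term and falls with the beamwidth, while a larger overhead makes very narrow beams too expensive to train, creating an interior optimum.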

IV. THREE-DIMENSIONAL-STRATEGY BETTER RESPONSE BASED RESOURCE MANAGEMENT SCHEME
Algorithm 3 Resource Allocation for Maximizing Secure Sum Rate in Dense mmWave Networks
1: Initialize the strategy profiles $\bar{a}^0 = (\bar{a}^0_1, \bar{a}^0_2, \ldots, \bar{a}^0_U)$ and $\hat{a}^0 = (\hat{a}^0_1, \hat{a}^0_2, \ldots, \hat{a}^0_U)$ randomly, and set the iteration index $t_w = 1$.
2: repeat
3: For a given joint power and beamwidth allocation policy $a^{t_w}$, calculate the TRP selection policy $x^{t_w+1}$ via Algorithm 1.
4: After obtaining the TRP selection policy $x^{t_w+1}$, calculate the joint power and beamwidth allocation policy $a^{t_w+1}$ according to Algorithm 2.
5: Update $t_w = t_w + 1$.
6: until the utility of all LUEs no longer increases.

In this section, to handle the three-dimensional optimization variables and the huge strategy space of problem P1, we propose a new three-dimensional-strategy iterative weakly acyclic game $G' = [\mathcal{U}, \{\bar{\mathcal{A}}_u, \hat{\mathcal{A}}_u, \check{\mathcal{A}}_u\}_{u \in \mathcal{U}}, \{\bar{F}_u, \hat{F}_u, \check{F}_u\}_{u \in \mathcal{U}}]$ to solve it, where three kinds of strategies, $\bar{a}_u \in \bar{\mathcal{A}}_u$, $\hat{a}_u \in \hat{\mathcal{A}}_u$ and $\check{a}_u \in \check{\mathcal{A}}_u$, are available to each player u ∈ U, and player u has three utilities, $\bar{F}_u$, $\hat{F}_u$ and $\check{F}_u$, corresponding to strategies $\bar{a}_u$, $\hat{a}_u$ and $\check{a}_u$, respectively. The three groups of optimization variables are then optimized alternately during the decision-making process. For the convenience of analysis, we introduce the following definition.
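Definition 8 (Three-Dimensional-Strategy Nash Equilibrium): A strategy profile $a^* = (\bar{a}^*, \hat{a}^*, \check{a}^*)$ is a pure three-dimensional-strategy NE if and only if, for any player u ∈ U, its utility cannot be improved by a unilateral deviation in $\bar{a}_u$, $\hat{a}_u$ or $\check{a}_u$, i.e.,
$$\begin{aligned} \bar{F}_u(\bar{a}^*_u, \bar{a}^*_{-u}) &\geq \bar{F}_u(\bar{a}_u, \bar{a}^*_{-u}), \ \forall \bar{a}_u \in \bar{\mathcal{A}}_u, \\ \hat{F}_u(\hat{a}^*_u, \hat{a}^*_{-u}) &\geq \hat{F}_u(\hat{a}_u, \hat{a}^*_{-u}), \ \forall \hat{a}_u \in \hat{\mathcal{A}}_u, \\ \check{F}_u(\check{a}^*_u, \check{a}^*_{-u}) &\geq \check{F}_u(\check{a}_u, \check{a}^*_{-u}), \ \forall \check{a}_u \in \check{\mathcal{A}}_u, \end{aligned} \tag{40}$$
where, in each inequality, the other two strategy dimensions are fixed at their values in $a^*$. Similarly, we decompose game G' into three games $G_1$, $G_2$ and $G_3$, each of which optimizes one strategy dimension with the other two fixed, the first being the power control game. Using the same method as in the previous section, we can prove that each of the three games is an exact potential game. Therefore, a better-response update of $\bar{a}$, $\hat{a}$ or $\check{a}$ at any iteration increases the secure sum rate of all players. Moreover, we have the following theorems.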
Theorem 7: The proposed game G' has at least one pure three-dimensional-strategy NE, and the optimal solution of problem P1 coincides with a pure three-dimensional-strategy NE of G'.
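Proof: Assume that one of the actions $\bar{a}$, $\hat{a}$ or $\check{a}$ is randomly selected for optimization at each iteration t. Then, for player u, a better response in the selected dimension strictly increases its corresponding utility. Moreover, since games $G_1$, $G_2$ and $G_3$ are exact potential games, the total utility strictly increases along any better reply path, so the path cannot loop back on itself. Furthermore, the strategy spaces $\bar{\mathcal{A}}$, $\hat{\mathcal{A}}$ and $\check{\mathcal{A}}$ are finite and the total utility is upper bounded, so the path cannot be extended indefinitely. Hence, a better reply path starting from any joint action $(\bar{a}, \hat{a}, \check{a})$ must end at a three-dimensional-strategy NE of game G'. Finally, if $(\bar{a}^*, \hat{a}^*, \check{a}^*)$ is the optimal solution to problem P1, no unilateral deviation in any dimension can increase the potential function, i.e., the total secure rate, and hence, by the exact potential property, it cannot increase the deviating player's utility either; thus (40) holds. Therefore, the optimal solution to problem P1 coincides with a pure three-dimensional-strategy NE of G' in light of Definition 8.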
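On a small game, the last step of this proof can be verified by brute force. The Python sketch below builds a hypothetical three-dimensional toy game on a four-player ring: each player picks one index in each of three binary dimensions, and its toy secure rate is a random preference minus a penalty for every neighbor using the same action tuple. It enumerates all joint profiles, takes the potential maximizer, and checks that no player can gain by deviating in any single dimension, which is the NE condition of Definition 8.

```python
import itertools
import random

DIMS = (2, 2, 2)  # sizes of the three strategy dimensions (hypothetical)
N = 4             # players on a ring, S_u = {u - 1, u + 1}
ACTIONS = list(itertools.product(*(range(k) for k in DIMS)))

_rng = random.Random(7)
BASE = [{a: _rng.uniform(0, 1) for a in ACTIONS} for _ in range(N)]


def nbrs(u):
    return ((u - 1) % N, (u + 1) % N)


def rate(u, prof):
    """Toy secure rate: own preference minus a penalty per neighbour using the same tuple."""
    clash = sum(prof[j] == prof[u] for j in nbrs(u))
    return BASE[u][prof[u]] - 0.3 * clash


def utility(u, prof):
    return rate(u, prof) + sum(rate(i, prof) for i in nbrs(u))


def potential(prof):
    return sum(rate(u, prof) for u in range(N))


def is_3d_ne(prof, tol=1e-12):
    """True if no player gains by changing a single dimension of its action tuple."""
    for u in range(N):
        current = utility(u, prof)
        for d, size in enumerate(DIMS):
            for k in range(size):
                if k == prof[u][d]:
                    continue
                new_action = list(prof[u])
                new_action[d] = k
                deviation = list(prof)
                deviation[u] = tuple(new_action)
                if utility(u, deviation) > current + tol:
                    return False
    return True


best = max(itertools.product(ACTIONS, repeat=N), key=potential)
```

The converse does not hold in general: a three-dimensional-strategy NE found by better responses may be a local rather than the global maximizer of the potential.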
Based on the above analysis, we propose a new three-dimensional-strategy better response algorithm, named Algorithm 4, to find a pure three-dimensional-strategy NE of G'. Since its procedure is very similar to that of Algorithm 2, we only provide its flow chart in Fig. 3.
Theorem 8: If all players adhere to Algorithm 4, the game G' converges to an invariant three-dimensional-strategy NE almost surely.
Proof: In Algorithm 4, each player keeps its previous strategy $\bar{a}_u[t-1]$, $\hat{a}_u[t-1]$ or $\check{a}_u[t-1]$ with probability ζ even if it has an opportunity to improve its utility. Denote $a^0 = (\bar{a}^0, \hat{a}^0, \check{a}^0) = a[t]$. There exists a positive constant Y, independent of t, such that the current action profile $a^0$ is repeated for Y consecutive iterations, i.e., $a[t] = \cdots = a[t+Y-1] = a^0$ for all players, with probability at least $\zeta^{|\mathcal{U}_s|(Y-1)}$, where $|\mathcal{U}_s|$ is the number of selected players. If $a^0$ is a three-dimensional-strategy NE, the process is complete. Otherwise, there is at least one player u ∈ $\mathcal{U}_s$ with an action $\bar{a}_u$, $\hat{a}_u$ or $\check{a}_u$ that strictly increases its corresponding utility $\bar{F}_u$, $\hat{F}_u$ or $\check{F}_u$. The event that exactly one player u changes its action, i.e., $a^1 = a[t+Y]$ for some $a^1$ that differs from $a^0$ only in player u's action, occurs with probability at least $(1-\zeta)\zeta^{|\mathcal{U}_s|-1}$, and then $\Phi(a^1) > \Phi(a^0)$, where Φ is the potential function of game G'. Repeating this argument yields a sequence of profiles $a^0, a^1, \ldots, a^Z$ satisfying $\Phi(a^0) < \Phi(a^1) < \cdots < \Phi(a^Z)$, where Z is a constant independent of t and $a^Z$ is a pure strategy NE. Consequently, for any $a^0$ at any sufficiently large t, there exist constants $Y^* > 0$ and $\zeta^* > 0$, independent of t, such that $a[t+Y^*]$ is a pure strategy NE of game G' with probability at least $\zeta^*$. Since an NE is invariant under Algorithm 4 (no player has a better reply) and the probability of not reaching an NE within k such windows is at most $(1-\zeta^*)^k \to 0$, a[t] converges to a pure strategy NE of game G' almost surely.

V. COMPLEXITY ANALYSIS
The complexities of the proposed algorithms are analyzed in this section. According to [49], the complexity of Algorithm 1 in searching for a stable matching is $O\left(\frac{\Gamma_{\mathrm{final}} - \Gamma_{\mathrm{initial}}}{\Gamma_{\min}}\right)$, where $\Gamma_{\mathrm{final}}$, $\Gamma_{\mathrm{initial}}$ and $\Gamma_{\min}$ represent the secure sum rate of the final stable matching, the secure sum rate of the initial matching, and the minimum increase of the secure sum rate after each swap operation, respectively. Obtaining the two-dimensional-strategy NE with Algorithm 2 requires $T_{\mathrm{iter}}(O(C_1) + O(C_2) + O(C_3))$, where $C_1$, $C_2$ and $C_3$ are three small constants that depend on the complexities of calculating the utility, selecting players and updating the strategy, respectively, and $T_{\mathrm{iter}}$ is the number of iterations until Algorithm 2 converges. As a result, the complexity of Algorithm 3 is $T^{w}_{\mathrm{iter}}\left[O\left(\frac{\Gamma_{\mathrm{final}} - \Gamma_{\mathrm{initial}}}{\Gamma_{\min}}\right) + T_{\mathrm{iter}}(O(C_1) + O(C_2) + O(C_3))\right]$, where $T^{w}_{\mathrm{iter}}$ is the number of iterations until Algorithm 3 converges. The complexity of Algorithm 4 is $T'_{\mathrm{iter}}(O(C'_1) + O(C'_2) + O(C'_3))$, where $T'_{\mathrm{iter}}$, $C'_1$, $C'_2$ and $C'_3$ have meanings similar to those of $T_{\mathrm{iter}}$, $C_1$, $C_2$ and $C_3$ for Algorithm 2. Therefore, the proposed Algorithm 3 and Algorithm 4 have lower complexities than the best response algorithm in [9] when there are a large number of LUEs and HF TRPs, as confirmed by the simulation results.

VI. SIMULATION RESULTS AND ANALYSIS
In this section, extensive simulations are presented to evaluate the performance of the proposed schemes. Similar to [30], we consider an HF TRP cluster in the macro cell, where the HF TRPs are uniformly distributed within the cluster and all UEs are randomly located in the coverage area of the cluster. The transmission power set of each LUE is {0.05, 0.1, 0.15, 0.2} W. The number of channel states from each LUE to its corresponding MUE is X = 10, where each state has a random small-scale channel gain and a random probability, with the probabilities of all states summing to 1 [41]. The channel parameters for mmWave networks in [32] are used in this work. The other main simulation parameters are summarized in Table 1. Figs. 4, 5 and 6 show the convergence of Algorithm 1, of Algorithms 2 and 4, and of Algorithm 3, respectively. Figs. 4 and 5 show that Algorithms 1, 2 and 4 converge well, all reaching a stable point within 700 iterations. Moreover, as the number of LUEs increases, the feasible solution space grows, and thus all of them converge more slowly. Fig. 6 shows that Algorithm 3 converges to a stable point within 2 iterations and hence also has good convergence performance. Fig. 7 shows the effect of the beamwidth on the secure sum rate, where the transmission and reception beamwidths of all LUEs are set to the same value for simplicity. When the beam training overheads are very small (e.g., Tp/Ts = 0.0001), the secure sum rate decreases as the beamwidth increases, whereas when Tp/Ts is relatively large (e.g., Tp/Ts = 0.01), the secure sum rate first increases and then decreases as the beamwidth increases. In this case, there is an optimal beamwidth that maximizes the secure sum rate. The reasons are explained in Section III-D.
Fig. 8 shows the effect of beam training overheads on the secure sum rate, where, for each of the proposed algorithms, we take the best NE over 100 runs in order to show more clearly the performance difference between considering and not considering the beam training overheads. When the beam training overheads are very small, e.g., Tp/Ts = 0.0001∼0.001, considering them yields little performance gain, whereas when Tp/Ts is relatively large, e.g., Tp/Ts = 0.001∼0.01, considering them improves the secure sum rate significantly. This means that the scheduling period and the pilot period largely determine whether the beam training overheads should be taken into account. In other words, in scenarios with fast channel variation or ultra-reliable low-latency communication, where the scheduling period is often shorter and Tp/Ts is therefore larger for a fixed pilot period, considering the beam training overheads is more important for improving the secure sum rate of the network.
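The channel-state model for the imperfect LUE-to-MUE CSI can be sketched in Python as follows. The draw is an assumption for illustration only: the state probabilities are obtained by normalizing uniform random weights and the small-scale power gains are exponential with unit mean (Rayleigh fading), whereas the actual distributions follow [41]. The hypothetical helper `expected_eavesdropper_rate` then averages an eavesdropping rate over the X = 10 states.

```python
import math
import random


def make_channel_states(x=10, seed=None):
    """Draw X channel states: random probabilities summing to 1 and random power gains.

    Assumption: gains are exponential with unit mean (Rayleigh fading).
    """
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(x)]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalized so that the probabilities sum to 1
    gains = [rng.expovariate(1.0) for _ in range(x)]
    return probs, gains


def expected_eavesdropper_rate(probs, gains, snr_e):
    """Probability-weighted average of log2(1 + snr_e * h_x) over the channel states."""
    return sum(p * math.log2(1 + snr_e * h) for p, h in zip(probs, gains))
```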
The optimal solution of problem P1 can be obtained by learning methods such as log-linear learning [44] and federated learning [50]. Federated learning is mainly used to protect data privacy and improve information security in data exchange during training, and it requires training data, whereas log-linear learning can reach the optimal solution with an arbitrarily high probability [44] without training data. We therefore use log-linear learning to obtain the optimal solution of problem P1 and thereby quantify the performance gap between the optimal and the proposed solutions. Fig. 9 compares the secure sum rates achieved by different algorithms for problem P1, with simulation settings similar to those in Fig. 8. The best response algorithm, the immune optimization algorithm (IOA) and the proposed scheme2 achieve very similar secure sum rates, which are very close to that of the log-linear learning algorithm; the proposed scheme1 achieves a slightly lower secure sum rate, and the greedy scheme achieves the lowest. Fig. 10 compares the secure sum rates of different schemes, where the nearest-distance-based TRP selection scheme is used for the schemes in which TRP selection is not optimized. When all schemes consider beam training overheads, whether the overheads are small or large, the proposed schemes achieve higher secure sum rates than optimizing only the beamwidth, optimizing only the TRP selection, optimizing only the transmission power, and the sum-rate maximization scheme. Therefore, jointly optimizing TRP selection, power control and beamwidth selection is important for improving the secure sum rate. Fig. 11 compares the running times until convergence of different algorithms for problem P1 on a computer with an Intel i7 2.6 GHz CPU, 16 GB RAM, Windows 10 and MATLAB R2018b.
The IOA algorithm has the longest running time, reaching several hundred seconds, followed by log-linear learning, the best response algorithm and the greedy algorithm, while the proposed scheme1 has the shortest running time and the proposed scheme2 runs slightly longer than the proposed scheme1. Therefore, the proposed schemes achieve satisfactory secure sum rates with lower complexities.
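For reference, a minimal Python sketch of asynchronous log-linear learning on a hypothetical local-interaction toy game is given below: at each step one randomly chosen player revises its action, choosing action k with probability proportional to $\exp(F_u(k, a_{-u})/\tau)$. In an exact potential game, the stationary distribution of this process is proportional to $\exp(\Phi/\tau)$ and therefore concentrates on potential maximizers as the temperature τ decreases, which is why it can serve as a near-optimal benchmark. The toy rates, the ring topology, τ = 0.2 and the step count are assumptions, and the sketch additionally records the best profile visited.

```python
import itertools
import math
import random

N, K = 4, 3  # players on a ring and actions per player (hypothetical)
_rng = random.Random(3)
BASE = [[_rng.uniform(0, 1) for _ in range(K)] for _ in range(N)]


def rate(u, a):
    """Toy secure rate: own preference minus a penalty per neighbour with the same action."""
    clash = sum(a[j] == a[u] for j in ((u - 1) % N, (u + 1) % N))
    return BASE[u][a[u]] - 0.3 * clash


def utility(u, a):
    """Local utility: own rate plus the two neighbours' rates (exact potential game)."""
    return sum(rate(i, a) for i in (u, (u - 1) % N, (u + 1) % N))


def potential(a):
    return sum(rate(u, a) for u in range(N))


def brute_force_max():
    return max(potential(p) for p in itertools.product(range(K), repeat=N))


def log_linear_learning(steps, tau, seed=0):
    rng = random.Random(seed)
    a = [rng.randrange(K) for _ in range(N)]
    best, best_val = tuple(a), potential(a)
    for _ in range(steps):
        u = rng.randrange(N)  # one randomly chosen player revises its action
        vals = []
        for k in range(K):
            a[u] = k
            vals.append(utility(u, a))
        top = max(vals)  # subtract the maximum for numerical stability
        weights = [math.exp((v - top) / tau) for v in vals]
        a[u] = rng.choices(range(K), weights=weights)[0]
        val = potential(a)
        if val > best_val:
            best, best_val = tuple(a), val
    return best, best_val
```

Lowering τ sharpens the concentration on the optimum but slows mixing, which is consistent with log-linear learning taking longer than the proposed schemes.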

VII. CONCLUSION
In this work, to improve communication security in dense mmWave networks, we have investigated the joint TRP selection, transmission power control and beamwidth selection problem with the aim of maximizing the secure sum rate, considering the beam training overheads, the blockage effect in mmWave communications, and the imperfect CSI from an LUE to its corresponding MUE. To reduce the computational complexity, we proposed a two-stage game based decentralized resource management approach and a three-dimensional-strategy better response based resource management algorithm to solve the considered problem. Simulation results have shown that the proposed schemes converge well and achieve satisfactory secure sum rates compared with existing schemes. Moreover, the simulations revealed the following observations: i) the secure sum rate decreases as Tp/Ts increases; ii) when Tp/Ts is very small, the secure sum rate decreases monotonically with the beamwidth, while when Tp/Ts is large, the secure sum rate first increases and then decreases as the beamwidth broadens; and iii) taking the beam training overheads into account meaningfully improves the secure sum rate only when Tp/Ts is large.