Joint Resource Allocation, User Association, and Power Control for 5G LTE-Based Heterogeneous Networks

The aim of 5G wireless networks to provide Mbps and Gbps data rates to end users is expected to be fulfilled by the advanced technologies such as multi-input multi-output (MIMO), carrier aggregation (CA), inter/intra-cell communication, and adaptive modulation and coding techniques, which would be all realized in the Long Term Evolution-Advanced (LTE-A) heterogeneous network constituted by macrocells (MCs) and small cells (SCs) adopting these 5G advanced techniques. Given the potential of significantly increasing the network performance, the resource allocation (RA) problem involved becomes harder than ever especially when MIMO and CA are included in the RA problem involving multiple types of resources to be concurrently determined for the global optimization. Facing this challenge, we develop a framework to jointly optimize energy efficiency (EE), spectrum efficiency (SE), and queue length for downlink transmissions with an overall and comprehensive consideration of dynamically allocating resource blocks (RBs), component carriers (CCs), modulation and coding schemes (MCSs), and deciding user association (UA) with a power control (PC) mechanism on discrete power levels (PLs) in the heterogeneous LTE-based MIMO wireless networks. Specially, for the complex joint RA, UA, and PC problem, we conduct a mixed integer programming model to accommodate the stochastic optimization problem involved with the drift-plus-penalty (DPP) approach for Lyapunov opportunistic optimization. In particular, although it involves a nondeterministic polynomial time (NP) problem, we can still show a reduced problem to be solved easily through linear relaxation when its coefficient matrix is totally unimodular (TUM), and to be solved efficiently as well even when the TUM property is not guaranteed. Based on the reduction, we further develop a distributed or semi-distributed algorithm operated on two levels to approach the optimal results with lower complexity if the UA requirement can be relaxed. Finally, apart from exhibiting its performance on the weighting parameters, the numerical experiments also show our approach to make a good tradeoff among SE, EE, and queue length, and outperform the greedy-based state-of-the-art algorithms.

energy efficiency (EE). In particular, the 5G cellular wireless systems with a multi-tier architecture consisting of MCs and different types of SCs are expected to serve user equipments (UEs) with varying quality-of-service (QoS) requirements. Resource allocation (RA) in such 5G systems will be extremely complex due to many different types of resources to be concurrently allocated in the irregular and pseudorandom network topology, and the existing management approaches may not be sufficient.
Specifically, for a LTE-A heterogeneous network wherein serval SCs are deployed within a MC service area, how to allocate the radio resources involved while deciding the UE-to-cell association (UA) and the power levels for all transmitters would be a major issue especially when MC and its SCs share the radio resources from the same service provider. To resolve this issue, SCs can use their own frequency bands different from those of MC [2]. However, the easy solution of dedicated radio usage would sacrifice SE. Alternatively, SCs can totally or partially use the same frequency bands of MC as noted in [3] adopting a co-channel deployment or a partial co-channel deployment to manage the radio resources. However, it could cause cross-tier interference, and hence certain partial co-channel deployments were considered in [4], [5] to alleviate this difficulty. Apart from the above, the heterogeneous network is also complicated by the 5G advanced techniques such as carrier aggregation (CA) and multi-input multi-output (MIMO) which are anticipated as the vital breakthrough that is necessary for 5G [6]. Specifically, they will continue to serve as the key techniques of 5G because CA can aggregate several component carriers (CCs) to support high data rate transmission, and MIMO can enhance signal to noise ratio (SNR) through transmit diversity (TD) or increase data rate through spatial multiplexing (SM).
In the heterogeneous and complex environment, RA would be tremendously difficult due to many unique constraints from these techniques to be satisfied simultaneously and the existing RA schemes can hardly be sufficient. For instance, the previous works [7], [8] addressed the problem of downlink radio resource allocation with CA by employing load balancing mechanisms that assign CCs to UEs first and then schedule RBs of CCs for them at every transmission time interval (TTI) to optimize the radio resource usage. Given their contributions, these works, however, consider only heuristic or greedy approaches, which provide no performance guarantees and no benefits brought by MIMO. In addition, even though more sophisticated greedy algorithms such as those in [9], [10] have been proposed to guarantee performance lower bounds, they still involve no MIMO. In fact, most of the RA researches for CA-enabled LTE networks [11]- [13] assumed CC selection, resource block (RB) allocation, and MCS assignment as completely separate problems, which may lead to the degradation of network performance. By jointly considering these problems in a framework, the related works [9], [10], [14]- [16] exhibited more complete solutions among the others. However, they still lack to consider MIMO [9], [10], [14], [15], heterogeneous network with multiple cells [9], [15], [16], or power control [9], [10], [15], [16].
Taking these issues into account, in this work we study a joint RA, UA, and PC problem for downlink transmissions in 5G LTE-A heterogeneous wireless networks. In particular, we consider discrete power control to reflect the fact that 3GPP LTE cellular networks only support discrete power levels (PLs) in the downlink via a user-specific datato-pilot-power offset parameter [1], which would be still useful in the 5G framework. Further, if the discrete power allocation (PC) and the UE-to-cell association (UA) are referred to as a kind of resource allocation, respectively, then RA, UA, and PC in this work could be collectively denoted by UE/RB/CC/MCS/cell/PL allocation problem or multi-resource allocation (MRA) problem for short. Our objective is then to jointly optimize EE, SE and queue length in the long-term, under the constraints resulted from the MRA problem, in addition to those from the LTE. For example, we would take into account the constraint that a UE can be only exclusively served by a single cell, and the constraint that the same RB can not be allocated to two different cells if there is interference between the two cells, which would avoid complicating the power control.
Given the various constraints to be addressed concurrently, the joint stochastic optimization problem is expressed here as a mixed integer programming problem via a transformation, whose solution typically requires prohibitive time complexity. As far as we know, the long-term metrics of EE, SE, and queue length have not been jointly investigated in the LTE-based heterogeneous wireless networks subject to all the specific constraints including those just being exemplified. In particular, although SE, EE, or their relationship had been investigated in, e.g., [17]- [19], these related works typically presume statical channel state and infinite queue length. Based on these assumptions, the channel state information (CSI) and the queue state information (QSI) could be ignored, and the arrival traffic may not be transmitted in time and the corresponding queue length would accumulate unboundedly when the data is given in burst without the awareness of CSI. Unlike the above, in this work we try to maximize EE and SE while guaranteeing the network stability in the heterogeneous networks, wherein RA, UA and discrete PC are jointly considered to be a complex MRA problem. Apart from the MRA that is a combinatorial problem to be solved, our work involves also the stochastic nature caused by varying channel and traffic state. For the combined difficulty, in the high layer, we formulate the joint optimization problem as a stochastic optimization problem and resolve it through the drift plus penalty (DPP) approach in the framework of Lyapunov optimization to accommodate time-varying channel conditions and traffic arrivals without prior knowledge of them. In the lower layer, we show that even it is NP in general, the MRA problem can still be easily solved through linear relaxation when its coefficient matrix is totally unimodular (TUM). Then, inspired by the TUM property, we reduce this problem and develop a distributed or semi-distributed algorithm operated on two levels to approach the optimal result with lower computational complexity if the UA requirement can be relaxed. Finally, to know its performance, we conduct numerical experiments, showing that our approach can make a good tradeoff among EE, SE and queue length, and outperform the greedy-based start-of-the-art algorithms, while showing the performance metrics varied by the weighting parameters. Specially, it is also exhibited that the two-level MRA algorithm proposed would obtain a solution allowing the system to achieve more than half of the optimal spectrum efficiency (SE) and throughput, and approach the optimal energy efficiency (EE) while maintaining a larger queue length, which represents a significant performance gain against the computation cost decreased from nondeterministic polynomial (NP) to polynomial (P). As a summary, the characteristics of this work can be outlined as follows: • Unlike existing studies on LTE-based wireless networks where the performance metrics such as SE, EE, and delay, the system techniques such as MIMO, CA, and PC, or both, are partially considered, in this work a synthetic framework is proposed to jointly consider EE, SE, and queue length in the 5G LTE-based heterogeneous networks equipped with MIMO, CA, and PC. To also account for the time-varying channel and traffic, the joint design is formulated as a stochastic multiobjective optimization (MOO) problem subject to the constraints on the network stability, the constraints from the multi-cell environment, and the unique RA rules for LTE.
• A Lyapunov DPP technique is adopted to transform the MOO problem to a mixed integer linear programming problem. Further, a linear programming model is found to easily solve the high-dimensional MRA problem involved when the coefficient matrix is totally unimodular (TUM) in a reduced model. Based on this model, a distributed two-level MRA algorithm is proposed for more computationally efficient solutions to the NP allocation problem, in addition to the joint optimization algorithm developed for the whole stochastic MOO problem.
• Apart from the Lyapunov approach to guarantee the network stability even without prior knowledge of the system state on channel and traffic, this work is also aided by the weighted sum method that introduces different weights to the objective function to make an optimal tradeoff among SE, EE, and queue length. The remainder of this paper is organized as follows. First, in Sec. II we introduce the system model and formulate our problem in terms of transmission modes for LTE, energy efficiency model, and scheduling constraints. Then, in Sec. III we adopt the weighted sum method to formulate the MOO problem, and use the Lyapunov DPP approach to optimize EE, SE, and queue length subject to the various constraints involved. Following that, a distributed two-level RA algorithm is proposed in Sec. IV to resolve the NP problem if the UA requirement can be relaxed. The performance analysis and numerical experiments are given in Secs. V and VI, respectively, and finally conclusions are drawn in Sec. VII.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In this work, we consider 5G LTE-based networks with a multi-tier and multi-cell heterogeneous architecture, as exemplified in Fig. 1, which consists of |S| base stations (BSs) (including a micro base station (MBS) and |S| − 1 small base stations (SBSs)), and |U| UEs located in their service areas. A number of |C| CCs obtained with CA are deployed in the environment, and without loss of generality, each CC has the same number of |B| RBs to be allocated. Then, |L| MCSs are dedicated to each RB for transmission. Similarly, |P| discrete power levels (PLs) denoted by P = {σ 1 , σ 2 , . . . , σ |P| } × P max are designed for each BS, where 0 < σ 1 < σ 2 , . . . , < σ |P| = 1 and P max denotes the maximum power.

A. TRANSMISSION MODE
As specified in LTE [20], there are different multi-input multi-output transmission modes (TMs) to be used in the system, which can be distinguished in terms of the antenna mapping and the type of CSI feedback adopted. In this work, we consider transmit diversity (TD) and spatial multiplexing (SM) as the two major types of TMs usually thought for the future 5G network development. As specified, TD sends the same data via different antennas, and each antenna stream uses different coding, which could enhance the signalto-noise ratio (SNR) and reduce the block error rate (BLER). Specially, in the case of two antennas, TD could be done based on space-frequency block coding (SFBC), and in the case of four antenna ports, TD can be realized through a combination of SFBC and frequency-switched transmit diversity (FSTD) [21], [22]. Different from the above, SM supports spatial multiplexing of two to four layers that are multiplexed to two to four antennas respectively. In the specification, the open loop spatial multiplexing does not rely on pre-coding matrix indicator (PMI) being reported by UE and selects PMI based on a predefined method, while the closed loop spatial multiplexing selects PMI based on the CSI feedback of UE [21]. In this work, the latter is assumed for the MRA problem to be introduced. Specifically, with TD, RBs are allocated from a single transport block (TB) per carrier component (CC). In contrast, with SM, RBs are allocated from two TBs per CC. Taking both into account, we define a transmission  mode table (Table 1) for our MRA problem wherein each index specifies a specific TM mode used and the modulation and coding scheme (MCS) per TB adopted. As shown therein, 29 MCSs defined in the 3GPP LTE standard [23] are considered in this work. For such a stochastic transmission system wherein traffic arrivals and channel conditions are both time-varying, we propose an online algorithm that can dynamically resolve the stochastic optimization problem with the Lyapunov DPP approach to achieve the system stability while maximizing the network utilization. To this end, we first introduce the channel and power model, the spectrum and energy efficiency, and the queueing dynamic. Then, we formulate the constraints specific to the system, and present a complete programming model for the optimization problem addressing the complex RA, UA, and PC issues to be involved.

B. CHANNEL AND POWER MODEL
For high-rate networks with reduced degree of mobility, it is vital for a resource allocation (RA) algorithm is conducted to accommodate a slow fading network wherein channel conditions would remain unchanged during an allocation period (Ch. 6 of [24]). Accordingly, for downlink transmissions in the networks, the signal to interference and noise ratio (SINR) from BS s to UE u using RB b of CC c at PL p in time t can be represented by where the channel gain from BS s to UE u using RB b of CC c is denoted by h c,b s,u , the distance from s to u by d s,u , and the path-loss factor by ρ. In addition, when s transmits to u on RB b of CC c, the noise on u is denoted by N c,b s,u . The channel is supposed to be Rayleigh fading and its gain to be exponentially distributed, and further, an empirical downlink SINR to channel quality indicator (CQI) mapping for LTE is adopted to estimate the CQIs returned to BSs [25]. If MBS is in charge of RA and these CQIs are collected, MBS would decide each MCS index (u, c, b, s, p) for the downlink transmission from BS s to UE u using RB b of CC c at PL p. Then, it can transmit the decision to all SBSs it associates. Here, based on the 3GPP specification [26], the data rate would be represented by r l through a mapping table for each RB based on MCS l in the index . Let u,c,b,s,p be the index of the highest-rate MCS on u, c, b, s, and p. As u,c,b,s,p is the highest MCS to be obtained, the achieved transmission rate v u,c,b, ,s,p on would be l r l where r l is the data rate corresponding to MCS l in TB i ∈ {1, 2} in the entry of index , and it would be 0 if l is not available in the TM, or exceeds u,c,b,s,p which would result in unacceptable error rate. Providing a RA matrix X which accommodates RA, UA, and PC through a high-dimensional (6-dimensional) representation instead of defining different variables for different metrics brought by RA, UA, and PC, respectively, to unnecessarily complicate its representation, the total data rate can be simply obtained by R tot (t) = ∀u,c,b, ,s,p x u,c,b, ,s,p (t) × v u,c,b, ,s,p (t) . Similarly, the total power consumption can be obtained by P tot (t) = ∀u,c,b, ,s,p x u,c,b, ,s,p (P p s,u (t) + P c s,u ) , where P p s,u (t) denotes the transmit power from s to u at power level p, and P c s,u denotes the constant circuit power for s and u.

C. SPECTRUM/ENERGY EFFICIENCY AND QUEUEING DYNAMIC
Unlike the previous works focusing on SE, EE, or both at the moment of observation [27], [28], in the stochastic system with channel conditions and traffic arrivals to be timevarying, we pay our attention to the limits of the time-average expectations of these metrics. Specifically, the long-term time-averaged expected transmission data rate and energy consumption can be represented by Here, the data rate R tot is normalized by the channel bandwidth. The spectrum efficiency (SE) is defined as the long-term average data rate on all the transmissions in (2), i.e., Following that, the energy efficiency is defined as the ratio of the long-term aggregated data rate (i.e., η SE ) to the long-term total energy consumption. That is, Apart from SE and EE, the queueing delay (or queue length) is also jointly optimized in this work. For this, the system is considered to be time-slotted, and the downlink traffics to UEs at time t are aggregately represented by A(t) = (A 1 (t), . . . , A |U | (t)), which are independently and identically distributed over t with E{A(t)} = λ = (λ 1 , . . . , λ |U | ). In addition, to ensure the system stability, it is assumed that A u (t) will not exceed a peak or maximum value A max u , i.e., 0 ≤ A u (t) ≤ A max u , ∀u ∈ U and ∀t ≥ 0. The assumption is based on the fact that the statistics of A(t) are usually unknown and the capacity region involved is hard to estimate in a practical system. In the real situation, a flow control is generally required to limit the admitted traffic R u (t) to be lower than the arrival A u (t) for the system stability. Thus, an admission control algorithm is required here to determine R u (t) from A u (t), and a BS is conducted to make RA decisions to provide link rates µ u (t) to serve the admitted traffics. With the admitted traffic R u (t) and the service rate µ u (t), the data queue dynamic for UE u ∈ U can then be formulated as In the above, the queue is considered stable if it has a bounded time-averaged backlog and finite average queueing delay [29]. According to Little's law [30], the average delay would be proportional to the average queue length with a specific arrival rate. Thus, it could use the queue length and further the queue stability to describe the delay. Accordingly, the strong stability for the average data queue length on u would be considered for the system, defined as

D. MULTIPLE RESOURCE ALLOCATION AND ITS CONSTRAINTS
For the MRA problem, we use u ∈ U to denote a UE, c ∈ C a CC, b ∈ B an RB, ∈ L an MCS index, s ∈ S a cell, and p ∈ P a PL, as already appeared in the index of x. At each TTI t, the system aims to allocate UEs/CCs/ RBs/MCSs/cells/PLs simultaneously or decide x u,c,b, ,s,p (t) to maximize the network utility. Because the channel condition is time-varying, each UE is conducted to use the reference signals from MBS or SBSs for estimating the channel condition, and accordingly, to transmit its CQIs to MBS or SBSs. Then, a CQI for RB can be mapped to the highest-rate MCS for a UE using the RB [26], and hence the channel conditions on all UEs and RBs can be perceived by the system through the CQIs reported [9]. All cells can eventually share the channel state information (CSI) for the optimization with each other via a backhual network. Unlike the previous works [9], [10], [15] paying no attention to MIMO, our system accommodates 2 different MIMO transmission modes (TMs), i.e., transmit diversity (TD) and spatial multiplexing (SM). As noted before, the TM indices for both selected MIMO TM per CC and MCS per TB are summarized in Table 1, wherein the first 29 MCS indices are given for TD since only a single TB per CC is considered while 29 × 29 = 841 indices starting at 30 are given for SM as two TBs per CC will be used in this TM.
Providing that, for scheduling the multiple resources in the network with CA and MIMO, we have the following constraints which ignore the time index t for brevity. First, as the basic unit for transmission, RB can at most be assigned to a single UE u on a certain MCS , and represented by In the left hand side of (8), the summation on all p, i.e., ∀p∈P , is used to ensure that each RB can be transmitted at only one power level p. In the right hand side, y 1 u,c, ,s is defined to be an auxiliary binary variable representing a CC allocation, where y 1 u,c, ,s = 1 denotes that CC c is assigned to UE u in cell s with TM index . Complying with LTE, all RBs allocated to UE u in CC c should have identical , i.e., Next, a monopoly principle specific to the multi-cell environment is considered that a UE u can only be served by a single cell s, and given that, it can not be served by the other cells s ∈ S\s. To realize the above in a linear form, we have the following two constraints. The first is where ∀ ∈L on x is used to denote that each RB can be transmitted with only one TM index in addition to the fact that only one power level is adopted for the transmission as already noted previously by ∀p∈P in (8). In addition, y 2 u,s is an auxiliary binary variable used to represent an RB allocated to UE u in cell s if its value is 1, and 0 otherwise. Given that, the monopoly principle that a UE u can only be served by a single cell s, is further enforced by the second constraint: In addition, to reduce the inter-cell interference, another monopoly principle about the multi-cell would be also specified that if an RB b of CC c is already allocated to a cell s, it cannot be assigned to its neighboring cells s ∈ N s which would cause significant interferences to s. That is, a specific RB can be either allocated in a cell s or its neighboring cells s , but not both. This involves a logical either-or constraint which can be transformed to regular linear constraints with the aid of an auxiliary binary variable, y 3 c,b,s , to denote whether RB b of CC c can be allocated to cell s or not.
Accordingly, there are two conditions to be specified. The first is for RB b of CC c allocated to cell s, denoted by ∀u∈U ,∀ ∈L,∀p∈P The second is for RB b of CC c allocated to its neighboring cells s ∈ N s , which can be similarly denoted by Further, it is worth noting that in a LTE-based network, the number of CC allocated to cell s would be limited to a certain number, say f s . For example, a UE of LTE 8/9 can only use 1 CC while a LTE UE is allowed to use 2 CCs. Specifically, the cardinality constraint can be realized by the following inequalities. The first is where y 4 c,s is a binary variable whose value 1 denotes CC c being allocated to cell s, and 0 otherwise. With the aid of auxiliary variable y 4 c,s , the cardinality of f s is further enforced by the second constraint: Apart from cell in the above, UE could have its own cardinality constraint on CC as well; that is, each UE u can be allocated at most d u CCs for its transmission. Similarly, it can be realized by linear inequalities, and the first is ∀b∈B,∀ ∈L,∀s∈S,∀p∈P where y 5 u,c is a binary variable whose value 1 denotes CC c being allocated to UE u, and 0 otherwise. Then, with y 5 u,c , the cardinality constraint on d u can be finalized by

E. SERVICE RATE, THROUGHPUT, AND POWER CONSUMPTION
Providing the MRA satisfying the above constraints, the service rate µ u (adopted in (6)) can be obtained by Given that, the total data rate introduced previously can be also represented by R tot (t) = u µ u (t). Clearly, there are two parameters directly impacting the data queue dynamic (6). The first is the data rate µ u (t) just given. The second is the throughput the admitted and transmitted data rate for u. In this work, the time-average throughput r u in the long term serves as one of the performance metrics, which should be higher than the requirement R req u from u. Taking these into account, we have Clearly, in the long term, the time-average throughput r u will not exceed the time-average arrival rate λ u , i.e., Similarly, in the short term, the throughput of u at time t, i.e., R u (t), can not exceed its arrival rate A u (t), i.e., In parallel with the above, the power consumption of u at time t can be obtained by P s (t) = u P p s,u (t) + P c s,u (t), and in the long term, it can not exceed the maximum P max s as well. That is, As mentioned before, our work is to maximize SE and EE simultaneously constrained by the queue stability, involving the three key performance metrics (SE, EE, and queue length) to be optimized at the same time. Here, as the average queue length links the stability and the delay, the queueing delay can be managed by investigating the queue stability. Apart from the queue length or delay to be enforced by using constraints, SE and EE representing our research targets would serve as the objectives in the stochastic MOO problem. However, due to the different units between SE and EE, it is more conveniently considered to maximize the normalized data rate R tot /R max and to minimize the normalized total power consumption ∀u,c,b, ,s,p x u,c,b, ,s,p (P p s,u + P c s,u ) /P max , where R max denotes the maximum data rate and P max the maximum power in the system. Now, given the throughput constraints in (19)-(21), the power consumption constraint in (22), and the scheduling constraints in (8)- (17) in addition to the queue stability constraint, we can formulate the MOO problem with the following programming model: As shown readily, it is a highly challenging stochastic optimization problem involving a large amount of stochastic information on channel conditions and traffic arrivals to be considered, and a high-dimensional variable representation on 6 different types of resources to be determined. This requires an online control and scheduling algorithm to obtain the solutions within a reasonable time limit. In addition, for the MOO problem, it is essential to jointly optimize the multiple types of resources cooperatively representing RA, UA, and PC, which is always a complicated mixed integer programming problem that is NP in general. In addition, BS also needs to concurrently maximize the data rate and minimize the power consumption while keeping the average queue length to be stable, which requires BS to maintain a good balance among SE, EE, and queue length.

III. STOCHASTIC OPTIMIZATION BASED ON LYAPUNOV DPP TECHNIQUE
To resolve the MOO in (23), we adopt the Lyapunov DPP technique to design an online algorithm with admission control to resolve the complex joint RA, UA, and PC problem. Specifically, it involves the following key components.

A. VIRTUAL QUEUES
In (23), C2 represents the stability constraint to ensure the arrivals to be eventually served by the system. To address this constraint, we define virtual queues [29], and after the initial state H u (0) = 0, conduct the queue to be updated by In addition, we transform the average power constraint C3 to a queue stability problem by introducing virtual queues Z(t) = {Z 1 (t), Z 2 (t), . . . , Z s (t)}, and after the virtual Z s initialized with 0, update it by By the transformation, if the virtual power queues Z s (t), ∀s ∈ S are all stable, the power constraint C3 will be satisfied [29].

B. LYAPUNOV DPP AND PROBLEM TRANSFORMATION
Next, according to the Lyapunov DPP, we can define (t) = and Z s (t) queues, and then introduce a quadratic Lyapunov function to represent a scalar metric of queue congestion as follows: It can be seen that a small value of the Lyapunov function would represent small lengths of the data and virtual queues. Thus, by pushing the Lyapunov function towards a lower congestion state, the queue stability can be ensured. To be clearer, a one-slot conditional Lyapunov drift is defined by Clearly, the drift function denotes the expected change of the Lyapunov function between two contiguous slots conditioned on the current state (t). By using the drift ( (t)), we can force the Lyapunov function in a lower congestion state, and make queues keep stable to control the queueing delay [29]. Given that, the queue stability constraint C1, the average throughput constraint C2, and the average power constraint C3 can be transformed to minimize the drift. Specifically, to accommodate the different units used in the metrics, we let R tot (t) = R tot (t)/R max and and Q, respectively, and divide P s (t) and P max s by P max as P s (t) and P max s as well, for the normalization. Providing the above, the MOO problem is transformed via the weighted sum method to whereV denotes the weight on the queue length, and W the weight on the system performances including the spectrum efficiency and the power consumption which cooperatively exhibit the energy efficiency. To further accommodate the quantitative metric difference between the queue length and the system performances, we letV /ω = V with ω to absorb the difference. Here, the weighted sum method is adopted as it is extensively used for MOO problems to provide not only multiple solution points by varying the weights consistently, but also a single solution point reflecting the preferences incorporated in the selection of a single set of weights [31]. However, due to the drift term involved, directly solving (28) is still challenging even given the weighted sum method. For this, our DPP-based dynamic control algorithm is conducted to make decisions on allocating UEs/RBs/ CCs/MCSs/cells/PLs to minimize an upper bound of the following drift-plus-penalty at each time slot t. Specifically, such an upper bound on the the Lyapunov drift ( (t)) resulted from the normalized metrics Q u (t), H u (t), and Z s (t) can be shown by the following theorem.
Theorem 1: For all t and (t), the drift-plus-penalty with any joint RA, UA, and PC strategy satisfies the inequality: Proof: See Appendix A. Given that, we can multiply both sides of (29) by V and add −W E{ R tot (t)| (t)} + (1 − W )E{ P tot (t)| (t)} at both sides to obtain an upper bound of the objective in (28) as wherein Q u (t) denotes the data queue in (6) obtained with the normalized R u (t) and µ u (t), H u (t) obtained with R u (t) and R req u , and Z s (t) obtained with P s (t) and P max s . For a more concise representation, the constants (V , , R req u , P req s (t), and P c s,u ) and the involved terms can be neglected. In addition, R tot and P tot can be shown in terms of µ u (t) and P s (t), respectively, while ignoring any constant to be involved. Further, as R u (t) ≤ A u (t) with the assumption of A u (t) ≤ A max u (t) taken at every slot t in C5 implies r u = 1 , the latter (C4) as well as the other expectation operations could be removed when considering the optimization at each slot t by employing the concept of opportunistically minimizing the expectation. Finally, by optimizing the right hand side of (30), we can transform problem (28) to In the sequel, we aim to find a solution to the MOO problem by decoupling the programming model (31) to an admission control sub-problem and a transmission control sub-problem, which can be solved independently and simultaneously.

1) TRAFFIC ADMISSION CONTROL
From the objective of (31), we can see that the first term, can be used to admit the traffic out of A u (t) arrivals to transmit subject to the constraints involving R u (t). Specifically, by decoupling this term from the joint problem, a traffic admission control sub-problem can be formulated as For the linear problem obtained, we have a simple thresholdbased admission control strategy as This strategy clearly shows that the newly arrivals can be admitted to transmit only when the the virtual queue length H u (t) is larger than the actual data queue length Q u (t). Otherwise, the arrivals will not be admitted so that the data queue can be stable. In fact, the threshold-based admission control would reduce the value of H u (t) to push r u (t) towards R u (t), thus stabilizing all the data queues involved.

2) MULTI-RESOURCE ALLOCATIONS FOR TRANSMISSION CONTROL
As the MRA problem for the transmission control is a core of the framework, our main challenge is to concurrently determine UEs, CCs, RBs, MCSs, cells, and PLs at each time slot t to make an optimal tradeoff decision among SE, EE, and queue length. To this end, the MRA problem is formulated by decoupling the joint optimization problem to concurrently consider the terms with respect to the transmission data rate µ u (t) and the energy consumption P s (t) in the right hand side of (30) subject to the scheduling constraints. Specifically, with their signs reversed for changing minimization to maximization, the transmission control sub-problem can be represented by subject to scheduling constraints (8) − (17), ∀t (34) As shown in Sec. II-D, the variables x u,c,b, ,s,p in the scheduling constraints are binary, and thus the transmission control sub-problem involving only these variables is a binary integer programming (BIP) problem that is NP hard in general. Moreover, as x is not only binary but also highdimensional, finding an optimal solution to this problem would be time-consuming even given optimization tools.
To address this challenge, in addition to solving the BIP with an IP solver, we develop also a distributed or semi-distributed algorithm wherein the network nodes, or SBSs, can perform the allocation independently or by the minimal assistance of the central controller, or MBS, to be a more efficient solution for practical implementations because of reduced computational complexity. VOLUME 8, 2020

IV. DISTRIBUTED TWO-LEVEL MULTI-RESOURCE ALLOCATION ALGORITHM
To realize a distributed or semi-distributed scheduling algorithm to resolve the MRA problem in (34) for the downlink transmissions, we first reduce the programming model to a single-cell problem with only one power level. Inspired by the reduced model, we then propose a distributed two-level MRA algorithm for more computationally efficient solutions.

A. REDUCED MODEL
As the first step for the reduced model, the binary variable is reduced to x u,c,b, that involves no cells s and PLs p. Given that, the constraints to avoid allocating RBs to neighboring cells are no longer necessary, and can be reduced to In addition, the CC cardinality constraint on each cell is also unnecessary and can be ignored while the constraint on the number of CC that each UE can have could be rewritten as where y u,c, is a new auxiliary variable adopted here, and its value 1 represents that CC c is assigned to UE u with TM index , and 0 otherwise. Further, because this scenario considers only one power level in a single cell, constraints (8) and (9) can be simplified as Now, with a linear object function subject to the constraints (35), (36), and (37), we can reduce our model to a single-cell problem without multiple discrete power levels as that in [9] using certain equivalent max operations in their constraints. There is no doubt that an IP problem like the above is NP-hard in general. However, for the reduced problem, the coefficient matrix of constraints could be a totally unimodular (TUM) matrix whose determinate for every square submatrix equals -1, 0, or 1. To show this possiblity, we first let u = |U|, c = |C|, = |L|, and b = |B| to more concisely represent these quantities, and then order the variables x and y to construct the following vectors whose length is uc b + uc . Given that, the constraints (35), (36), and (37) can be more concisely represented. Specifically, let blkd(K , k) be a block diagonal matrix wherein a block K is repeated k times in the diagonal line and the other elements are all 0. Given that, the identify matrix 1 b is denoted by blkd(1, b), which leads to β 1 = [1 b · · · 1 b ] b×b . Then, (35) can be reformulated by where A 1 = [blkd(β 1 , c) · · · blkd(β 1 , c)] cb×cbu . Further, let d = [d 1 · · · d u ] and A 2 = blkd(1 T c , u). Then, (36) can be rewritten as Similarly, by defining β 3 = −blkd(1 b , uc ) and then A 3 = [1 uc b , β 3 ] uc b×(uc b+uc ) in addition to A 4 = blkd(1 T , uc), we can transform (37) to Finally, by integrating all the constraints into a canonical form, we have Av c where and As noted before, the coefficient matrix of constraints, A, could be totally unimodular (TUM). As a simple example, given u = 2, c = 1, = 2 and b = 1, the coefficient matrix A will be  Thanks to the small size, its TUM can be verified by exhaustively checking every square submatrix of A to have determinant equal to ±1 or 0. As Schrijver revealed [32], if A of an integer linear problem (ILP) is TUM, the ILP can be relaxed to a linear programming problem (LP) by removing the integrality constraints. Then, the LP relaxation of an ILP could be solved through any standard LP technique. This is further verified in our numerical experiments for the two-level approach wherein the optimal results for the single-cell problem could be obtained by using a LP solver in usual cases. However, the TUM property is not guaranteed. For example, given u = 2, c = 1, = 2 and b = 2, In the above, the determinant of its square submatrix could be 2 in addition to ±1, or 0. Although it is not always TUM, the non-TUM integer programming problems only contribute a small part of the experiments in Sec. VI, and have a coefficient matrix like (48) being 2-regular 1 which could be still solved efficiently as that shown in [34].

B. DISTRIBUTED TWO-LEVEL MRA ALGORITHM
Inspired by the reduced problem or model, we next propose a distributed or semi-distributed two-level MRA algorithm to resolve the MRA problem for more computationally efficient solutions. Specifically, in view of the start-of-the-art RA algorithms such as those implemented in [9] adopting a two-step assignment in a single cell, we would first allocate CCs in an optimal sense, and then allocate the other resources in the multi-cell scenario.

1) THE FIRST LEVEL RA (ON CC)
For the CC allocation, we sort c CCs in decreasing order on the data rate perceived by cell s according to the 3GPP wherein W i = 1 i represents the weight of CC C s i that is the ith element in the list C s in decreasing order. This weight implies that the higher SNR or data rate the cell s perceived, the higher preference in the objective function. In addition, x s c is a binary variable deciding whether cell s is allocated with CC c or not, and x s s i denotes the variable corresponding to CC C s i . Given that, D1 denotes the constraint to enforce each cell to be allocated at least one CC for its UEs. D2 ensures that each cell can have at most f s CCs. D3 represents a constraint that as long as a CC is allocated to cell s, it can not be assigned to its neighboring cells s ∈ N s to prevent excess interference.
Although the above CC allocation seems to be less complex and may be solved more easily when compared with the joint optimization problem in question, it is still an IP problem that is NP in general if no special structures are imposed. To resolve this problem, we conduct a greedy algorithm performed by MBS to reduce the complexity of solving the RA problem on CC while giving suboptimal solutions to be satisfactory enough. To this end, the data rates of cell s on different CCs c based on the CQIs reported by UEs are recorded in a table L of MBS, in addition to a table C initialized to be empty to record the allocation results. As shown in Algorithm 1, for each CC c ∈ C, it finds the cell s * that has the highest data rate obtainable, and if this rate is non-negative and the number of CC allocated to this cell does not exceed its limit f s * , then it permits the allocation and records this with C(c, s * ) = 1. While allocating, it forbids the neighboring cellsŝ ∈ N (s * ) to allocate the same CC by setting L(c,ŝ) = −1 so that these neighboring cells have no v * greater than 0 to enter the procedure starting at line 6. while v * > 0 do 6: if c C(c, s * ) < f s * then As shown in the literature such as [9], given a number of CCs, the research works usually focus on the RA problem to allocate the radio resources (RBs and CCs) to UEs in a single cell. However, without the viewpoint of multiple cells, they inevitably ignore the problem that if multiple neighboring cells are allocated with the same CCs, they could use the same RBs of these CCs to cause inter-cell interference. In this work, the first level RA is already conducted to avoid the inter-cell interferences by allocating different CCs to the neighboring cells. Then, supposing that UE associating with the nearest BS is given in advance at a larger time scale, our programming model could be degraded for the second level RA to consider only one cell but still take into account multiple PLs and the other resources. This is different from the reduced model in Sec. IV-A or that shown in [9] addressing no power control. Specifically, without the notion of s, the scheduling constraints (8) and (9) could be reformulated for the second level RA as where x u,c,b, ,p corresponds to the binary variable x u,c,b, ,s,p , and y 1 u,c, to y 1 u,c, ,s . In addition, similar to that for the reduced model in Sec. IV-A, the constraints specific to the multi-cell environment should be also deleted or modified to fit the single-cell scenario, as summarized as follows: • First, the monopoly constraint on UE to be served by a single cell, represented by (10) and (11), are no longer needed and thus deleted here.
• Second, the monopoly constraint to enforce that a specific RB is either allocated in a cell s or its neighboring cells s ∈ N s would be modified to consider only the allocation of an RB of a CC for a given cell. Thus, the constraints (12) and (13) can be reduced to ∀u∈U ,∀ ∈L,∀p∈P x u,c,b, ,p ≤ 1, ∀c ∈ C, ∀b ∈ B (52) • Third, the cardinality constraint f s on s, i.e., the number of CC allocated to cell s not to exceed the limit f s , has been done by the first level allocation on CC, and can be eliminated. That is, (14) and (15) can be removed here.
• Fourth, the cardinality constraint that the number of CC allocated to UE u can not exceed d u is still valid despite the number of cells. However, in the case of given s, the notion on cells should be eliminated, and thus (16) and (17) would be changed to Here, we would rather use y 5 u,c to correspond to y 5 u,c given in Sec. II-D to preserve a similar representation than adopt a new auxiliary variable like y u,c,l in Sec. IV-A which is tailored for the notational simplicity in the reduced model considering no power control.

3) HARDNESS RESULT
As shown readily in the above, providing that CCs are given by MBS, the binary variables x in the second level RA as well as r u and R u shown in Sec. III-C can be independently decided by SBSs for their UEs at each time slot t, leading to the Lyapunov DPP-based dynamic control in Sec. III-B realized in a distributive manner. In addition, as indicated in Sec. IV-A, the coefficient matrix involved could be TUM or possibly 2-regular, leading to the corresponding RA problem solved efficiently as shown in our numerical experiments. The above is worth noting because an MRA problem is NP-hard in general if no special structures are imposed. In our case, even the reduced model considered in Sec. IV-A is NP-hard, similar to those already proved in the literature. For example, by mapped to the well-known 3-SAT problem, a radio resource scheduling problem in [16] was proved to be NP-hard subject to the constraints: (i) each RB can be assigned up to one UE, and (ii) only one MIMO mode can be selected for all assigned RBs for a UE. Here, when considered with only one CC and two MCSs, the reduced model can be regarded as a special case of the scheduling problem, in which the MCS selection is equivalent to the MIMO mode selection in the scheduling problem, and hence it could be reduced to a NP-hard problem through the same way of 3-SAT mapping. In contrast to the NP-hardness, if the coefficient matrix in this model is TUM, the LP resulted would be of order (u + cb) × (uc b + uc ) maximizing the objective over all v in R (uc b+uc ) such that Av c. According to [35], the worst-case complexity of solving the LP problem would be O( √ u + cb + uc b + uc ln 1 ), where determines the accuracy of the solutions obtained with the barrier method in [35]. When u approaches infinity, it could be simply represented by O(u 1/2 ), showing a significant improvement on the time complexity for the NP-hard problem. On the other hand, the state-of-the-art greedy algorithm in [9] that is based on submodular set functions has the time complexity O(u 2 d u cb ), or simply O(u 2 ) if u approaches infinity, which is clearly higher than the former.
Finally, the overall optimization algorithm for solving the stochastic MOO problem (28) is given in Algorthm 2 as a summary for easy reference.

V. PERFORMANCE BOUNDS
Thanks to the Lyapunov DPP approach, the data and virtual queues involved will be mean rate stable, which can be proved as that shown in [29]. In addition to the stability on queues, the weighted sum function defined by is also considered that can capture the performance of EE and SE for a specific range of W [36]. Specifically, with x(t) to denote x u,c,b, ,s,p (t) for the simplicity on its representation, we have the following theorem regarding this metric: Theorem 2: If problem (28) is feasible, problem (31) is optimally solved, and E{L( (0)} < ∞, then the weighted sum function f (x(t)) has the performance bounds as E{f (x(τ ))} < f opt (56)
Proof: Please refer to Appendix B.

VI. NUMERICAL EXPERIMENTS
To numerically evaluate our proposal, we conduct a simulation topology consisting of 1 micro base station (MBS) and 3 small base stations (SBSs). As shown in Fig. 2, each base station (MBS or SBS) initially serves 3 user equipments (UEs) that are located within its transmission range. The other parameters for the experiments are summarized in Fig. 3 for reference. As shown therein, the numbers of resources for the multiple resource allocation (MRA) problem involved would be significantly high enough for an optimization tool to obtain a solution to the high-dimensional combinatorial problem within a reasonable period of time. Given that, each UE is simulated to estimate the channel quality on each resource block (RB) of each component carrier (CC) by using reference signals transmitted from base  stations (BSs), measuring their signal to noise ratios (SNRs) and using a mapping table such as that in [37] to obtain the channel quality indicators (CQIs) to be reported to BSs. Here, SNR of each RB perceived by UE is assumed to be a random variable uniformly distributed in the range between −5 and 22.38 according to the SNR-CQI index mapping in [37], so that the allocation results would involve all possible mapping values in the simulation. Given that, the CQIs collected, the MCS index mapping, and TBS index tables specified in [38] are resulted and used by BS to estimate the achievable data rates for UEs required by the different algorithms for comparison in the experiments.

A. PERFORMANCE TRADEOFF ON SE, EE, AND QUEUE LENGTH
In the first set of experiments, we focus on the critical factors to impact the optimization algorithm. To show this, the time-average spectrum efficiency η SE , energy efficiency η EE , data queue length Q, and throughput γ are represented by the mean values of the corresponding metrics in their normal scales obtained from all UEs involved. As shown in Figs. 4(a) and 4(b), the performance trends of spectrum efficiency (SE) and energy efficiency (EE) are shown by varying the system parameters (or weights), W and V , given ω = 0.002. Specifically, in Fig. 4(a), it can be seen that, given the same V , the SE value would continuously increase with W due to the higher transmitted power adopted to enhance the transmission rate. Unlike the performance trend on SE, it can be seen in Fig. 4(b) that EE would rise up first and then turn to decrease as W increases. The trend on EE can be so observed because when W is high, the transmit power consumption is negligible as compared to the circuit power consumption, and in this condition SE would grow faster than the total power consumption dominated by the circuit. Thus, as shown in these sub-figures (Figs. 4(a) and 4(b)), when W is low, EE increases as SE increases. However, after the peak value of EE, the transmit power would dominate the total power consumption instead of the circuit power, and the increment of the total power consumption would be larger than that of SE afterward, leading to EE gradually decreased as shown in this sub-figure ( Fig. 4(b)).
From another viewpoint, it can be also seen that, for a given W , increasing V could decrease SE or EE at a milder degree than the above. Specifically, when W < 0.9 along with the other parameters in the experiments, EE is exhibited to decrease as V increases while SE is shown to increase with the growth of V . However, when W ≥ 0.9, the trend is slightly reversed with some fluctuations. This trend could be observed because, when W is small, increasing V would require the wireless links to increase their data rates so as to decrease the average queue backlog, which leads to a better SE. At the same time, as the emphasis on the data rate is given to all links in the network, the total transmit power would thus increase, which eventually degrades EE. As just indicated, this trend is slightly changed when W is large. In this case, increasing V provides a stronger enforcement on reducing the queue length to a lower congestion state as indicated. However, due to a large weight W on the data rate, lowering the energy consumption would contribute more to EE when a larger data rate is resulted in this case.
Apart from the above, the results on the average queue length (Q) are summarized in Fig. 4(c). As exhibited therein, although with slight fluctuations, this metric has the trend to descend with W and V . With respect to V , it has been shown in the above that as V increases, the system would emphasize more on decreasing the queue length, and therefore the queue backlog declines. With respect to W , it can be seen from the objective function that a larger W represents a stronger emphasis on SE. When applied, the transmission data rate is enhanced and thus the average queue length is decreased.
Finally, we present the performance trend on the system throughput γ defined as the sum of UE throughputs, i.e., γ = ∀u r u . By comparing Fig. 4(a) with Fig. 4(d), we can see that the SE and the throughput have the same trend because the admission control is based on the average queue length, and a higher transmission data rate (SE) would reduce the average queue length, which leads to more traffic to be admitted for entering the queues of UEs (i.e., increasing the throughput). However, it can be also seen that when W < 0.2, the SE (η SE ) is less than the throughput (γ ). This could be observed because in the experiments the traffic arrival rate is conducted to saturate the system; that is, its value is much higher than the throughput requirement, A u (t) R req u , ∀u, t. When W < 0.2 in the experiments, the service rate could not fulfill the requirement, and queue will build up after admitting the arrivals. Given that, the transmission data rate would take a long time to resolve the queue backlog, leading to a low SE and a high queue length as shown in Fig. 4(a) and Fig. 4(c), respectively, in the W < 0.2 range. In this range, the system throughput in the long term would be larger than the SE while satisfying the minimum requirement R req u , ∀u. In contrast, when W increases to be larger than 0.2, the service rate and the SE approach the system capacity that is much higher than R req u , ∀u. In this situation (W ≥ 0.2), the queue length could decrease as shown in Fig. 4(c). Therefore, to satisfy the throughput requirement R req u , ∀u that is less than the capacity, the throughput γ would be less than the SE according to the admission control, which can be observed when comparing Fig 4(a) with Fig. 4(d), as well.

B. PERFORMANCE COMPARISON
In the second set of experiments, we let W be 0.1 and vary V to exemplify the performance differences between the optimization algorithm (Algorithm 2) using an IP solver, and that using the distributed two-level MRA algorithm to resolve the MRA problem. For the distributed two-level algorithm in the latter, we use the second level RA algorithm proposed in Sec. IV-B2 to obtain the optimal solutions in the reduced programming model, or adopt the greedy algorithm as well as the LL+RS algorithm in [9] to obtain the heuristic solutions in the second level. Similarly, we use either an optimal IP solver or the greedy CC allocation algorithm shown in Algorithm 1 to resolve the first level RA problem on CC (49). Specifically, by first indicating the RA algorithm used in the second level, and then that used in the first level, we denote the variant methods based on the distributed two-level MRA algorithm by 2L ''optimal/greedy/LL+SS RA'' with ''optimal/greedy CC'', resulting in 3 × 2 = 6 different two-level (2L) method-names as shown in the legends of Fig. 5.
In Fig. 5(a), providing W = 0.1, the SE is shown to increase as V increases, complying with the results in the first set of experiments with W < 0.9. Clearly, all the two-level methods have the same trend. However, the 2L optimal RA-based methods would outperform the 2L greedy-based methods (including those using the greedy RA algorithm and the LL+SS RA algorithm in the second level) in spite of the CC allocation obtained either optimally or greedily in the first level. In addition, it can be also seen that in the greedy-based methods, the methods using the greedy RA algorithm would outperform those using the LL+RS RA algorithm, in spite of the CC allocation as well. Nevertheless, how to allocate CC still has its own impact on the performance, exemplified by the fact that the greedy CC allocation algorithm would improve the computational complexity at the cost of slightly decreasing SE when compared with the optimal.
Similarly, from Fig. 5(b), we can see that by means of the second level RA algorithm (in Sec. IV-B2) and the greedy CC allocation algorithm (i.e., Algorithm 1) in the first level, the 2L optimal RA with greedy CC method can approach the optimal EE. This suggests that using the distributed two-level algorithm proposed in Sec. IV-B to replace an IP solver to resolve the MRA problem is a good way to trade the EE performance off against the time complexity. In addition, as already noted in Sec. VI-A, the trend is the same for all the methods for comparison that EE would decrease as V increases if W < 0.9. However, when compared with the 2L optimal RA-based methods, the 2L greedy-based methods have worse performances and would drop even more significantly, especially when V increases larger than 0.4.
From the viewpoint of queue length, the tradeoff would be seen more clearly. Specifically, although the 2L optimal RA-based methods can approach the joint optimization with an IP solver in terms of EE, the queue length would be the cost, as shown by its value not so close to the optimal exhibited in Fig. 5(c). In addition, among the 2L greedy-based methods, the methods with the greedy RA algorithm would be better than those with the LL+RS RA algorithm in terms of queue length. The performance difference also provides a valuable reference to choose the greedy-based methods in addition to that based on SE and EE. Further, from Fig. 5(d), we can see that the performance trend on the throughput is the same as that on SE. It is expected because the traffic allowed for transmission reflects the data rate resulted in the dynamic system through the Lyapunov-based admission control. More specifically, the trend complies with that shown in Sec. VI-A, which can be also observed here by comparing Fig. 5(a) with Fig. 5(d) to show that γ would be larger than SE when W < 0.2 (in this case W = 0.1), and these performance metrics would increase as V grows in this case. Finally, as shown in all the sub-figures of Fig. 5, the greedy CC allocation algorithm might decrease the performance metrics only at a slight degree when compared with the optimal CC allocation. Apart from the CC allocation, the whole two-level MRA algorithm would lead the system to achieve more than half of the optimal SE and γ , and approach the optimal EE while maintaining a larger queue length. These confirm our design aim to obtain an effective algorithm to reduce the complexity for the RA optimization while obtaining sub-optimal solutions to be satisfactory enough.
Next recall that, in Sec. II-B, the channel condition is assumed to remain unchanged during an allocation period which leads to a CQI for RB to be mapped to the highest-rate MCS for a UE using the RB [26], and the channel conditions on all UEs and RBs can be perceived by the system through the CQIs reported [9]. However, if certain issues affecting the condition arise, e.g., the channel suddenly varies fast during the period, or the BSs adopt a large-scale CSI (including path-loss and shadowing) scheme for SINR to reduce the CQI overhead [40], the MCSs obtained could be overestimated to result in an unacceptable bit error rate for the transmission. For this, in addition to the full CSI assumption adopted before (represented here by P e = 0), we assume the over-estimation error to be happened with a probability P e = 0.1, 0.3, or 0.5, which diminishes the data rate of a MRA to 0 due to the unacceptable bit error rate resulted, while fixing W = 0.1 and V = 0.5 to exemplify the performance trend. Otherwise, the data rate is considered to be correctly represented by v u,c,b, ,s,p (t).
The results are now summarized in Fig. 6 for reference. Specifically, it is shown in Figs. 6(a) and 6(b) that SE and EE would degrade on their metrics, respectively, and if normalized with respect to the error-free results (i.e., SE and EE with P e = 0), their values would be around the four levels of P e (i.e., 0, 0,1, 0.3, and 0.5) for each method with little fluctuations, as expected. In Fig. 6(d), the throughput is similarly shown to degrade, but if normalized with respect to its error-free result, it would be lower than the P e levels, respectively, except the first 0. That is to say, even with a less data rate due to P e , the throughput or admitted traffic does not decrease as much, and the excess traffic can be absorbed by the increased queue length as shown in Fig. 6(c). This reveals a unique merit brought by the DPP-based dynamic control with the data and virtual queues to achieve the system stability while maximizing the network utility.
Finally, we show that the distributed two-level RA algorithm (represented here by 2L RA-based and greedy-based methods) would be computationally efficient in terms of the LP optimal ratio and the objective improvement. First, by LP optimal ratio, we mean the number of optimal results obtained by a linear programming (LP) problem solver divided by the total number of results in the experiments. As shown in Table 2, this ratio is around 96% despite the CC allocation algorithm, exhibiting the fact that most of the experiment instances would have their coefficient matrices to be TUM and could be solved easily without an integer programming (IP) problem solver for the MRA problem. Second, by objective improvement, we mean the improvement degree on the objective function values obtained by  a linear programming problem solver compared with those by an integer programming problem solver. Here, the small negative value around −0.066% exemplifies the performance trend that LP would only slightly degrade the objective function while significantly improve the computational complexity from nondeterministic polynomial (NP) to polynomial (P). Taking a closer look to see the components in the objective function, i.e., SE, EE, and queue length (Q), we can find also that as the objective is a weighted sum of these metrics, it is not necessary that all of them would be degraded as the objective function itself. Specifically, SE and Q are degraded while EE is improved, and γ is also degraded, reflecting the same trend on SE.
As shown in the above, most of the problems encountered have their coefficient matrices to be TUM and can be solved efficiently. However, a MRA problem may still have its coefficient matrix without the property of TUM or 2-regular. In this case, developing a (re)formulation that leads to an integer polyhedron for a subset of constraints can be valuable. This in fact motivates the study to find the classes of constraints so that specific cutting plans can be found to yield an exact representation or a tighter approximation of the convex hull of the feasible integer points. It further invokes the decomposition-based approaches that decompose the original IP formulation into several subproblems, one or more of which can be solved efficiently through an exact or approximated representation developed for the integer solutions [39]. In our work, such a development for the MRA problem would be very complex if it is not impossible, which requires our future study.

VII. CONCLUSION
In this work, we have addressed a joint stochastic optimization problem on energy efficiency (EE), spectrum efficiency (SE), and queue length for downlink transmissions in the 5G LTE-based heterogeneous wireless networks with the advanced techniques for 5G such as multi-input multi-output (MIMO) and carrier aggregation (CA). Specifically, for the multiple objectives to be optimized concurrently, we proposed a Lyapunov optimization framework on the downlink transmissions with an overall and comprehensive consideration for the joint problem concurrently involving resource allocation (RA), user association (UA), and power control (PC) in the complex networks. In particular, for the multiple resource allocation (MRA) problem involved, which is NP-hard and served as a key issue in this work, we had shown a reduced problem to be solved easily via linear relaxation when its coefficient matrix is totally unimodular (TUM), and accordingly, developed a distributed or semi-distributed algorithm with low computational complexity to resolve this NP-hard problem. Finally, in the numerical experiments, we have demonstrated that our framework can make a good tradeoff among EE, SE, and queue length, provide the design insights on the tradeoff decision through the system weights designed, and produce the performance metrics outperforming the greedy-based state-of-the-art counterparts.

APPENDIX A PROOF OF THEOREM 1
First, by squaring both sides of the data queue dynamic would hold. Then, by summing over all u ∈ U and taking the fact R u (t) ≤ A max u (t) and µ u (t) ≤ µ max u (t) into account, we have Similarly, for the virtual queue dynamics H u and Z s , we have Next, by combining these bounds and taking the expectation with respect to (t) on the both sides of the result, we derive the one-slot conditional Lyapunov drift as In the above, = u∈U ( A max u ) 2 + s∈S ( P max s ) 2 + 1 2 u∈U ( µ max u ) 2 + 1 2 u∈U ( R max req u ) 2 is obtained by combining the constant terms in the right hand sides of (58), (59) and (60), where µ max u denotes the maximum of the transmission rate µ u that can be obtained on u, A max u is A max u /R max , and R max req u represents the maximum request allowed for u. Finally, by removing the expectation operations on the constant terms, we can obtain the inequality shown in (29).

APPENDIX B PROOF OF THEOREM 2
Assume E{R u (t)} ≤ η 1 , E{P s (t)} ≤ η 2 , and E{A u (t)} ≤ η 3 , where η 1 , η 2 , and η 3 are finite positive constants. According to Theorem 4.5 in [29], if problem (28) is feasible and the boundedness assumption is given, then for any δ > 0 there is one policy that can satisfy E{f (x * (t))| (t)} = E{f (x * (t))} ≤ f opt − δ (62) E{ P * s (t)| (t)} = E{ P * s (t)} ≤ P max s + δ (64) E{ R * u (t) − µ u (t)| (t)} = E{ R * u (t) − µ u (t)} ≤ δ (65) As the optimization is turned to minimize the R.H.S of (30), the optimal solution to this problem must satisfy V ( (t)) − E{f (x(t))| (t)} Given that, we can take expectations of both sides of (68) and then adopt the law of iterated expectations, resulting in The results for all t ∈ {0, 1, . . . , T −1} can be futher summed up, and dealt with the law of telescoping sums to yield E{f (x(t))} + TV −Tf opt (70) Next, by rearranging the above while neglecting non-negative terms when appropriate, we can obtain, for all T > 0, As T → ∞, the lower bound is obtained while the upper bound is clearly represented by f opt as the the maximum value of E{f (x(t))} as defined, which completes the proof. YU-CHEN HU (Senior Member, IEEE) received the Ph.D. degree in computer science and information engineering from the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan, in 1999. He is currently a Professor with the Department of Computer Science and Information Management, Providence University, Taichung City, Taiwan. His research interests include image and signal processing, data compression, information security, data engineering, and computer networks. He is also a member of the Phi Tau Phi Society, China.