Power- and Rate-Adaptation Improves the Effective Capacity of C-RAN for Nakagami-$m$ Fading Channels

We propose a quality-of-service (QoS) driven power- and rate-adaptation scheme for wireless cloud radio access networks (C-RAN), where each remote radio head (RRH) is connected to the baseband unit (BBU) pool through high-speed optical links. The RRHs jointly support the users by efficiently exploiting the enhanced spatial degrees of freedom attained with the aid of the powerful cloud computing facilitated by the BBU pool. Our proposed scheme aims to maximize the effective capacity (EC) of the user subject to both per-RRH average- and peak-power constraints, where the EC is defined as the maximum tele-traffic arrival rate that can be supported by the C-RAN under the statistical delay-QoS requirement. We first transform the EC maximization problem into an equivalent convex optimization problem. By using the Lagrange dual decomposition method and satisfying the Karush-Kuhn-Tucker (KKT) conditions, the optimal transmission power of each RRH can be obtained in closed form. Furthermore, an online tracking method is provided for approximating the average power of each RRH for the sake of updating the Lagrange dual variables. For the special case of two RRHs, the average power to be assigned to each RRH can be expressed in explicit form and evaluated numerically. Hence, the Lagrange dual variables can be computed in advance in this special case. Our simulation results show that the proposed scheme converges rapidly for all the scenarios considered and achieves up to 20% higher EC than the benchmark method, where each RRH's power is independently optimized.


I. INTRODUCTION
The fifth-generation (5G) wireless system to be deployed by 2020 is expected to offer a substantially increased capacity while reducing the energy consumption relative to the fourth-generation (4G) system [1]. To achieve this ambitious goal, the cloud radio access network (C-RAN) concept has been regarded as one of the most promising solutions, which relies on the techniques of network function virtualization (NFV), software-defined networks (SDN) and cloud computing [1]. In particular, C-RAN is composed of three key components: 1) a pool of baseband units (BBUs) centrally located at a cloud data center and relying on the techniques of cloud computing, NFV, and SDN; 2) low-cost, low-power distributed remote radio heads (RRHs) deployed in the network; 3) high-bandwidth, low-latency fronthaul links that connect the RRHs to the BBU pool. Under the C-RAN architecture, most of the baseband signal processing of conventional base stations has been shifted to the BBU pool and the RRHs are only responsible for simple transmission and reception functions. Thanks to the recent development of cloud computing techniques, centralized signal processing operations such as cooperative transmission, transmit precoding and user scheduling can be carried out at the BBU pool. Hence, the network capacity and energy efficiency can be significantly improved.
Recently, the performance of C-RAN has been extensively studied [2]-[7]. However, these papers have focused on physical-layer issues and give no cognizance to the delay experienced at the upper layers. Most multimedia services, such as video conferencing, interactive gaming and mobile TV, have stringent quality of service (QoS) requirements in terms of the tolerable delay. Due to the time-varying characteristics of fading channels, it is impossible to impose a deterministic delay-bound guarantee for wireless communications. However, satisfying a certain maximum delay-outage probability is feasible. For example, in the Long Term Evolution (LTE) Advanced standard, the probability that the delay of online gaming is higher than 50 ms should be below 2% [8]. For the sake of analyzing the statistical QoS performance, Wu et al. [9] introduced the notion of effective capacity (EC), which can be interpreted as the maximum constant packet arrival rate that can be supported by the system, whilst satisfying a maximum buffer-violation probability constraint. Since then, substantial research attention has been dedicated to studying the EC maximization problem by accounting for diverse QoS requirements.
However, due to the complex expression of the EC, most of the existing papers focus on the EC maximization problem for the simple scenario, where there is only a single transmitter [10]-[16]. Specifically, a QoS-driven power- and rate-adaptation scheme was proposed for single-input single-output (SISO) systems communicating over flat-fading channels in [10], with the objective of maximizing the system throughput subject to both delay-QoS and average power constraints, where the delay-QoS requirement is characterized by the QoS exponent θ. A smaller θ corresponds to a looser QoS guarantee, while a higher value of θ represents a more stringent QoS requirement. The power allocation scheme developed in [10] adapts both to the time-variant channel conditions and to the QoS requirements. The results of [10] showed that in the extreme case of very loose QoS requirements (θ → 0), the power allocation reduces to the conventional water-filling solution. By contrast, when the QoS requirements are extremely stringent (θ → ∞), the optimal power allocation becomes the channel inversion scheme, where the system operates at a fixed transmission rate. The problem of maximizing the EC of a frequency-selective fading channel was studied in [11], where the authors showed that independently optimizing the power of each channel by using the method of [10] yields a poor performance. Hence, a joint power allocation over all channels was derived in [11], which can achieve both a high throughput and a low delay at the same time. The EC maximization problem was studied in [12] in the context of cognitive radio networks, where the power constraints of [10], [11] were replaced by the maximum tolerable interference-power at the primary user. Closed-form expressions of both the power allocation and of the EC were derived for the secondary user. Michail et al. [13] provided a detailed EC analysis of Nakagami-m, Rician and generalized-K multiple-input single-output (MISO) fading channels by using random matrix theory. The power minimization problem subject to EC constraints was considered in [14] for three different scenarios, namely for a point-to-point link, a multihop amplify-and-forward relay network and a multiuser downlink cellular network. Significant power savings can be achieved by using the power allocation scheme of [14]. As a further advance, both subcarrier and power allocation were investigated in [15] for a one-way relay network, wherein the optimal subcarrier and power allocation was derived by adopting the Lagrangian dual decomposition method. Most recently, Wenchi et al. [16] considered both the average- and peak-power constraints when maximizing the EC, and provided the specific conditions under which the peak power constraints can be removed.
Other related contributions [17]-[21] aimed for maximizing the effective energy efficiency (EE) of single-transmitter systems, where the effective EE is defined as the ratio of the EC to the average power consumption.
To avoid any traffic congestion and attain a good C-RAN performance, the adaptive power allocation scheme should take the diverse QoS requirements into account to guarantee the satisfaction of users. In this paper, we aim for jointly optimizing the power allocation of each RRH in order to maximize the EC of a user of the C-RAN, where both the average and peak power constraints of each RRH are considered. This user is jointly served by all RRHs of the C-RAN. Unfortunately, the power allocation schemes developed in the aforementioned papers for single-transmitter scenarios [10]-[21] cannot be directly applied to C-RANs, which rely on multiple RRHs for serving the user and thereby exploit the spatial degrees of freedom. The reason can be explained as follows. In these papers, there is only a single transmitter and only a sum-power constraint is imposed. The Lagrange method can be used to find the optimal power allocation, which is in the form of a water-filling-like solution in general. However, for the C-RAN, all RRHs have their individual power constraints and the power cannot be shared among the RRHs. Yu et al. [22] provided a detailed reason as to why the classic Lagrange method cannot be readily applied in C-RAN. Hence, new methods have to be developed. Specifically, the contributions of this paper can be summarized as follows:
1) We derive the optimal power allocation for each RRH by resorting to the Lagrangian dual decomposition and the Karush-Kuhn-Tucker (KKT) conditions. The power allocation solutions depend both on the user's QoS requirements and on the joint channel conditions of the RRHs. For the special case of a single RRH, the power allocation reduces to the solution obtained in [16].
2) The Lagrangian dual decomposition requires us to calculate the subgradient, which in turn requires the average power of each RRH in every iteration. However, it is numerically challenging to derive the expression of the average power for each RRH. To tackle this issue, we provide an online tracking method for approximating the average power of each RRH. For the special case of a single RRH, the closed-form expression of the average power can be obtained. For the more complex case of two RRHs, we also provide the expression of the average power for each RRH in explicit form, which can be numerically evaluated.
3) Our simulation results will show that the proposed algorithms converge promptly for any of the scenarios considered and can provide as much as 20% higher EC than the independent power allocation schemes, where the power allocation of each RRH is independently optimized by using the method of [16].
The rest of this paper is organized as follows. In Section II, the C-RAN system is introduced along with the concept of effective capacity and our problem formulation. In Section III, we provide the optimal power allocation for the general case of any number of RRHs. In Section IV, we consider the special case of two RRHs and derive explicit integral expressions, as well as closed-form solutions where available, for the average transmit power of each RRH for the sake of updating the Lagrangian dual variables. Numerical results are provided in Section V for quantifying the efficiency of the proposed algorithm in terms of the EC achieved. Finally, our conclusions are drawn in Section VI.

II. SYSTEM MODEL
Consider a downlink C-RAN consisting of $I$ RRHs and a single user (see Footnote 1), where each RRH and the user have a single antenna, as depicted in Fig. 1. The set of RRHs is denoted as $\mathcal{I} = \{1, 2, \cdots, I\}$. All the RRHs are assumed to be connected to the BBU pool through fronthaul links relying on high-speed fiber-optic cables. The baseband signal processing of conventional base stations is migrated to the BBU pool. As shown in Fig. 2, the upper-layer packets are first buffered in a first-in-first-out (FIFO) queue and are then transmitted to the physical layer of the RRHs. At the data-link layer, the upper-layer packets are partitioned into frames and then each frame is mapped into bit-streams at the physical layer. The channel is assumed to obey the stationary block-fading model, implying that it remains fixed during each time frame of length $T_f$, while it changes independently over different time frames.
To elaborate, we consider a Nakagami-$m$ block-fading channel, which accounts for most of the practical wireless communication channels. The probability density function (PDF) of the Nakagami-$m$ channel spanning from the $i$th RRH to the user is given by:
$$f(\alpha_i) = \frac{m^m \alpha_i^{m-1}}{\Gamma(m)\,\bar{\alpha}_i^{m}} \exp\left(-\frac{m\alpha_i}{\bar{\alpha}_i}\right), \quad \alpha_i \ge 0,$$
where $\Gamma(m) = \int_0^\infty w^{m-1} e^{-w}\,dw$ is the Gamma function, $m$ represents the fading parameter, $\alpha_i$ denotes the instantaneous channel-power-to-noise ratio (CPNR) from the $i$th RRH to the user, and $\bar{\alpha}_i$ is the average received CPNR at the user from the $i$th RRH, given by $PL_i/\sigma^2$, where $PL_i$ is the large-scale fading channel gain spanning from the $i$th RRH to the user that includes the path loss and shadowing effect, and $\sigma^2$ is the noise power. Let us define $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \cdots, \alpha_I]^T$.
Footnote 1: The method developed in this paper can also be applied to the multi-user scenario, where all users apply the classical orthogonal frequency division multiple access (OFDMA) technique to remove the multi-user interference.
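As a sanity check on the fading model, the following minimal Python sketch evaluates the PDF of the CPNR and draws samples from it. It assumes the standard power-domain form of Nakagami-$m$ fading, i.e. a Gamma density with shape $m$ and mean $\bar{\alpha}_i$. The function names and the numerical values are illustrative and are not taken from the paper.

```python
import math
import random


def cpnr_pdf(alpha, m, alpha_bar):
    """PDF of the instantaneous CPNR under Nakagami-m fading.

    Assumption: Gamma density with shape m and mean alpha_bar.
    Evaluated in the log domain to avoid overflow for large m.
    """
    if alpha <= 0.0:
        return 0.0  # only strictly positive CPNR values are evaluated here
    log_pdf = (m * math.log(m / alpha_bar)
               + (m - 1.0) * math.log(alpha)
               - m * alpha / alpha_bar
               - math.lgamma(m))
    return math.exp(log_pdf)


def sample_cpnr(m, alpha_bar, rng):
    """Draw one CPNR realization (Gamma with shape m, scale alpha_bar / m)."""
    return rng.gammavariate(m, alpha_bar / m)


# Illustrative values only: m = 2, average CPNR of 1.5.
rng = random.Random(0)
draws = [sample_cpnr(2.0, 1.5, rng) for _ in range(10000)]
print("empirical mean CPNR:", sum(draws) / len(draws))
print("pdf at the mean:", cpnr_pdf(1.5, 2.0, 1.5))
```

For $m = 1$ the density reduces to the exponential distribution of the Rayleigh-fading channel power.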

A. Effective Capacity
Based on large deviation theory, Chang et al. [23] showed that for a dynamic queueing system associated with stationary ergodic arrival and service processes, as well as certain other conditions, the queue length process $Q(t)$ converges to a random variable $Q(\infty)$ obeying
$$-\lim_{Q_{th}\to\infty} \frac{\ln\left(\Pr\{Q(\infty) > Q_{th}\}\right)}{Q_{th}} = \theta,$$
where $Q_{th}$ denotes the queue length bound and the parameter θ is a positive real value. The above equation shows that the probability of the queue length exceeding a certain bound decays exponentially with the queue length bound. To elaborate, θ is an important parameter, representing the decay rate of the QoS violation probabilities. A smaller θ corresponds to a slower decay rate, which indicates that the delay requirement is loose, while a larger θ corresponds to a faster decay rate, which implies that the system is capable of supporting a more stringent delay requirement.
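For a large queue length bound, the limit above is commonly used in the approximate form Pr{Q(∞) > Q_th} ≈ exp(−θ Q_th). Under that approximation, which is an assumption of this sketch rather than a statement made in the text, the QoS exponent needed for a target violation probability follows by taking logarithms. The function names and numbers below are hypothetical.

```python
import math


def qos_exponent(queue_bound, violation_prob):
    """theta such that exp(-theta * queue_bound) equals violation_prob.

    Assumption: the large-bound approximation
    Pr{Q > queue_bound} ~ exp(-theta * queue_bound).
    """
    if queue_bound <= 0.0 or not 0.0 < violation_prob < 1.0:
        raise ValueError("need queue_bound > 0 and 0 < violation_prob < 1")
    return -math.log(violation_prob) / queue_bound


def violation_probability(theta, queue_bound):
    """Approximate probability that the queue length exceeds queue_bound."""
    return math.exp(-theta * queue_bound)


# Illustrative numbers: a 2% violation target at a bound of 100 units.
theta = qos_exponent(100.0, 0.02)
print("theta:", theta)
print("check:", violation_probability(theta, 100.0))
```

A tighter violation target or a smaller bound yields a larger θ, in line with the interpretation of θ as the stringency of the delay requirement.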
In other words, when θ → 0, an arbitrarily long delay can be tolerated by the system, which corresponds to the capacity studied in Shannonian information theory. On the other hand, θ → ∞ implies that no delay is tolerated by the system, which corresponds to the most stringent statistical delay-bound QoS constraint. Since θ is closely related to the statistical QoS requirement, it is termed the QoS exponent [9].
In the following, we introduce the important notion of effective capacity, which is defined as the maximum constant frame arrival rate that a given service process can support, while obeying the delay requirement indicated by θ. Let the sequence $\{R[k], k = 1, 2, \cdots\}$ represent the data service-rate, which follows a discrete-time stationary and ergodic stochastic process. The parameter $k$ is the time frame index. Let us denote by $S[t] \triangleq \sum_{k=1}^{t} R[k]$ the partial sum of the service process over the time sequence spanning from $k = 1$ to $k = t$. Let us furthermore assume that the Gärtner-Ellis limit of $S[t]$, which is denoted by $\Lambda_C(\theta) = \lim_{t\to\infty} (1/t)\log(\mathbb{E}\{e^{\theta S[t]}\})$, is a convex differentiable function for all real-valued θ [10]. Then, the effective capacity of the service process specified by θ is defined as
$$E_C(\theta) \triangleq -\frac{\Lambda_C(-\theta)}{\theta} = -\lim_{t\to\infty} \frac{1}{\theta t}\log\left(\mathbb{E}\{e^{-\theta S[t]}\}\right),$$
where $\mathbb{E}\{\cdot\}$ denotes the expectation operator.
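When the service rates R[k] are independent and identically distributed across frames, as under the block-fading model, the expectation of exp(−θ S[t]) factorizes and the EC reduces to −(1/θ) log E{exp(−θ R[k])}. The Python sketch below estimates this quantity by Monte Carlo simulation. It assumes a single transmitter with constant power and a rate of the form T_f B log2(1 + p α); these modelling choices, the function names and the parameter values are illustrative only.

```python
import math
import random


def effective_capacity(rates, theta):
    """Monte Carlo estimate of -(1/theta) * log E[exp(-theta * R)].

    Assumption: i.i.d. per-frame service rates, so the Gartner-Ellis
    limit reduces to a single-frame expectation.
    """
    exponents = [-theta * r for r in rates]
    peak = max(exponents)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(e - peak) for e in exponents) / len(exponents)
    return -(peak + math.log(mean_exp)) / theta


def sample_rates(n, m, alpha_bar, power, tf_times_b, rng):
    """Per-frame rates for constant transmit power (illustrative model)."""
    rates = []
    for _ in range(n):
        alpha = rng.gammavariate(m, alpha_bar / m)  # Nakagami-m power gain
        rates.append(tf_times_b * math.log2(1.0 + power * alpha))
    return rates


rng = random.Random(0)
rates = sample_rates(20000, m=2.0, alpha_bar=1.0, power=1.0,
                     tf_times_b=1.0, rng=rng)
mean_rate = sum(rates) / len(rates)
ec_loose = effective_capacity(rates, theta=1e-3)  # close to the mean rate
ec_tight = effective_capacity(rates, theta=5.0)   # pulled towards low rates
print(mean_rate, ec_loose, ec_tight)
```

The estimate never exceeds the mean rate and does not increase with θ, which mirrors the two limiting regimes described above.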

B. Problem formulation
For the user, the instantaneous service rate of a single frame, denoted by $R(\nu)$, can be expressed as follows:
$$R(\nu) = T_f B \log_2\left(1 + \sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\right),$$
where $\nu \triangleq (\boldsymbol{\alpha}, \theta)$ represents the network condition that includes both the channel's power gains and the QoS exponent requirement, $p_i(\nu)$ is the transmit power of the $i$th RRH, and $B$ is the system's bandwidth. Note that the transmit power of each RRH depends not only on its own channel state, but also on the other RRHs' channel states.
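In this paper, we aim for optimizing the transmit power in order to maximize the EC for the user under two different types of power limitations for each RRH: an average power constraint and a peak power constraint. The first one is related to the long-term power budget, while the second guarantees that the instantaneous transmit power remains within the linear range of practical power amplifiers. Mathematically, this optimization problem can be formulated as follows:
$$\begin{aligned} \max_{\{p_i(\nu)\}} \quad & -\frac{1}{\theta}\log\left(\mathbb{E}_{\boldsymbol{\alpha}}\left\{e^{-\theta R(\nu)}\right\}\right) \\ \text{s.t.} \quad & \mathbb{E}_{\boldsymbol{\alpha}}\{p_i(\nu)\} \le P_i^{\text{avg}}, \ \forall i \in \mathcal{I}, \\ & 0 \le p_i(\nu) \le P_i^{\text{peak}}, \ \forall i \in \mathcal{I}, \end{aligned} \tag{6}$$
where $\mathbb{E}_{\boldsymbol{\alpha}}\{\cdot\}$ denotes the expectation over $\boldsymbol{\alpha}$, while $P_i^{\text{avg}}$ and $P_i^{\text{peak}}$ denote the $i$th RRH's maximum average transmit power constraint and peak transmit power constraint, respectively.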
To avoid the trivial solution, we assume that the peak transmit power is larger than the average transmit power, i.e. we have P peak i > P avg i , ∀i ∈ I. In this problem, the power allocations are adaptively optimized according to the CSI feedback from the user for each time frame, while the delay requirement is reflected by θ.

III. OPTIMAL POWER ALLOCATION METHOD
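By exploiting the fact that log(·) is a monotonically increasing function, Problem (6) can be equivalently simplified as
$$\begin{aligned} \min_{\{p_i(\nu)\}} \quad & \mathbb{E}_{\boldsymbol{\alpha}}\left\{\left(1 + \sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\right)^{-\varepsilon(\theta)}\right\} && \text{(7a)} \\ \text{s.t.} \quad & \mathbb{E}_{\boldsymbol{\alpha}}\{p_i(\nu)\} \le P_i^{\text{avg}}, \ \forall i \in \mathcal{I}, && \text{(7b)} \\ & 0 \le p_i(\nu) \le P_i^{\text{peak}}, \ \forall i \in \mathcal{I}, && \text{(7c)} \end{aligned}$$
where we have $\varepsilon(\theta) = \theta T_f B/\ln 2$. In Appendix A, we prove that Problem (7) is a convex optimization problem. Hence, the Lagrangian duality method can be used to solve Problem (7) with zero duality gap.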
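The step from Problem (6) to Problem (7) can be made explicit. The short derivation below assumes that the single-frame rate has the form $R(\nu) = T_f B \log_2(1 + \sum_i p_i(\nu)\alpha_i)$, which is consistent with the definition of $\varepsilon(\theta)$; the intermediate steps are a sketch and are not quoted from the paper.

```latex
\begin{align*}
e^{-\theta R(\nu)}
  &= \exp\!\Big(-\theta T_f B \log_2\big(1 + \textstyle\sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\big)\Big) \\
  &= \exp\!\Big(-\frac{\theta T_f B}{\ln 2}\,
       \ln\big(1 + \textstyle\sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\big)\Big) \\
  &= \Big(1 + \sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\Big)^{-\varepsilon(\theta)},
  \qquad \varepsilon(\theta) = \frac{\theta T_f B}{\ln 2}.
\end{align*}
% For theta > 0 the map x -> -(1/theta) log(x) is strictly decreasing, hence
% maximizing -(1/theta) log E{e^{-theta R}} is equivalent to
% minimizing E{(1 + sum_i p_i alpha_i)^{-epsilon(theta)}}.
```

The constraints are unaffected by this transformation, so the feasible set of the two problems is the same.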
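Note that in Problem (7), only the average power constraints are ergodic, while the others are instantaneous power constraints. Similar to [24], we first introduce the dual variables associated with the average transmit power constraints. Then, the original problem can be decomposed into several independent subproblems, each corresponding to one fading state. In addition, the instantaneous power constraints are enforced for each fading state. Let $\boldsymbol{\lambda} = [\lambda_1, \cdots, \lambda_I]^T$ represent the nonnegative dual variables associated with the average power constraints. The Lagrangian function of Problem (7) can be written as
$$L(\mathbf{P}(\nu), \boldsymbol{\lambda}) = \mathbb{E}_{\boldsymbol{\alpha}}\left\{\left(1 + \sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\right)^{-\varepsilon(\theta)}\right\} + \sum_{i\in\mathcal{I}} \lambda_i\left(\mathbb{E}_{\boldsymbol{\alpha}}\{p_i(\nu)\} - P_i^{\text{avg}}\right),$$
where $\mathbf{P}(\nu) = [p_1(\nu), \cdots, p_I(\nu)]^T$. Let us now define $\mathcal{P} = \{\mathbf{P}(\nu) \,|\, \text{(7c)}\}$. The Lagrange dual function is then given by
$$g(\boldsymbol{\lambda}) = \min_{\mathbf{P}(\nu)\in\mathcal{P}} L(\mathbf{P}(\nu), \boldsymbol{\lambda}). \tag{9}$$
The dual problem is defined as
$$\max_{\boldsymbol{\lambda} \succeq \mathbf{0}} \; g(\boldsymbol{\lambda}). \tag{10}$$
As proven in Appendix A, Problem (7) is a convex optimization problem, which implies that there is no duality gap between the dual problem and the original problem. Thus, solving the dual problem is equivalent to solving the original problem.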
To solve the dual problem in (10), we should solve the problem in (9) for a fixed λ, then update the Lagrangian dual variables λ by solving the dual problem in (10). Iterate the above two steps until convergence is reached.
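1) Solving the dual function in (9): For a given $\boldsymbol{\lambda}$, we should find the dual function $g(\boldsymbol{\lambda})$, which can be rewritten as
$$g(\boldsymbol{\lambda}) = \mathbb{E}_{\boldsymbol{\alpha}}\{\tilde{g}(\boldsymbol{\lambda})\} - \sum_{i\in\mathcal{I}} \lambda_i P_i^{\text{avg}},$$
where we have:
$$\tilde{g}(\boldsymbol{\lambda}) = \min_{\mathbf{P}(\nu)\in\mathcal{P}} \left(1 + \sum_{i\in\mathcal{I}} p_i(\nu)\alpha_i\right)^{-\varepsilon(\theta)} + \sum_{i\in\mathcal{I}} \lambda_i p_i(\nu).$$
Note that $\tilde{g}(\boldsymbol{\lambda})$ can be decoupled into multiple independent subproblems, each corresponding to a specific fading state. Those subproblems have the same structure for each fading state. Hence, to simplify the derivation, ν is omitted in the following. Each subproblem can be expressed as:
$$\begin{aligned} \min_{\{p_i\}} \quad & \left(1 + \sum_{i\in\mathcal{I}} p_i\alpha_i\right)^{-\varepsilon(\theta)} + \sum_{i\in\mathcal{I}} \lambda_i p_i && \text{(13a)} \\ \text{s.t.} \quad & 0 \le p_i \le P_i^{\text{peak}}, \ \forall i \in \mathcal{I}. && \text{(13b)} \end{aligned}$$
The above problem is convex and can be solved by using standard convex optimization techniques, such as the interior point method [25]. However, the complexity of such methods is high. In the following, we obtain the closed-form solution by solving the KKT conditions of Problem (13).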
First, we introduce the nonnegative dual variables of µ i , ∀i ∈ I, and δ i , ∀i ∈ I for the associated constraints in (13b). The KKT conditions for the optimal solutions of Problem (13) can be expressed as with p * i ≥ 0, δ * i ≥ 0, and µ * i ≥ 0, ∀i. Based on these KKT conditions, first the following lemma can be obtained.
Lemma 1: For any two arbitrary RRHs i and j, if p * i > 0 and p * j = 0, then the following relationship must hold Proof : Please see Appendix B.
Lemma 1 shows that the RRHs associated with a smaller $\lambda_i/\alpha_i$ should be assigned a non-zero power, while the RRHs having a larger value should remain silent. Let us hence introduce $\pi$ as a permutation over $\mathcal{I}$, so that we have $\lambda_{\pi(i)}/\alpha_{\pi(i)} \le \lambda_{\pi(j)}/\alpha_{\pi(j)}$ when $i < j$, $i, j \in \mathcal{I}$. Let $\mathcal{I}' \subseteq \mathcal{I}$ be the set of RRHs that transmit at a non-zero power. Then, according to Lemma 1, it can be readily verified that we have $\mathcal{I}' = \{\pi(1), \cdots, \pi(|\mathcal{I}'|)\}$.

Lemma 2:
There is at most one RRH associated with 0 < p * i < P peak i , and the RRH index is Proof : Please see Appendix C.
Next, we derive the optimal power allocation of Problem (13), which is stated in Theorem 1.

Theorem 1:
The optimal solution of Problem (13) is shown as follows, where |I | is the largest value of x, so that we have Proof : Please refer to Appendix D.
Corollary 1: When the number of RRHs is equal to one, i.e. I = 1, the optimal transmit power is given by Note that the above result is consistent with the point-to-point result obtained in Theorem 2 of [16].
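The structure implied by Lemma 1 and Lemma 2 suggests a simple procedure for a single fading state: serve the RRHs in ascending order of λ_i/α_i, fill each one up to its peak power, and stop at the first RRH whose marginal gain no longer exceeds its price. The Python sketch below follows that idea. It assumes that the per-state objective is (1 + Σ p_i α_i)^(−ε) + Σ λ_i p_i with 0 ≤ p_i ≤ P_i^peak and strictly positive α_i; it is derived directly from the KKT structure and is not a transcription of the expression in Theorem 1. All names and values are hypothetical.

```python
def objective(power, alpha, lam, eps):
    """Assumed per-state objective: (1 + sum p_i a_i)^(-eps) + sum lam_i p_i."""
    snr = sum(p * a for p, a in zip(power, alpha))
    return (1.0 + snr) ** (-eps) + sum(l * p for l, p in zip(lam, power))


def allocate_power(alpha, lam, p_peak, eps):
    """Greedy minimizer of the assumed objective under 0 <= p_i <= p_peak_i."""
    n = len(alpha)
    power = [0.0] * n
    # Price of one unit of received SNR bought from RRH i is lam_i / alpha_i.
    order = sorted(range(n), key=lambda i: lam[i] / alpha[i])
    snr = 0.0
    for i in order:
        price = lam[i] / alpha[i]
        if price <= 0.0:
            power[i] = p_peak[i]  # power is free, so use the full peak budget
        else:
            # SNR at which the marginal gain eps*(1+snr)^(-eps-1) equals price
            target = (eps / price) ** (1.0 / (eps + 1.0)) - 1.0
            if target <= snr:
                break  # this RRH and all costlier ones remain silent
            power[i] = min(p_peak[i], (target - snr) / alpha[i])
        snr += power[i] * alpha[i]
        if power[i] < p_peak[i]:
            break  # at most one RRH transmits strictly below its peak
    return power


# Illustrative values for two RRHs.
alpha = [2.0, 1.0]
lam = [0.02, 0.05]
p_peak = [1.0, 1.5]
p_opt = allocate_power(alpha, lam, p_peak, eps=1.0)
print(p_opt, objective(p_opt, alpha, lam, 1.0))
```

With a single RRH the procedure reduces to clipping ((ε α/λ)^(1/(ε+1)) − 1)/α to the interval [0, P^peak], which has the water-filling-like shape discussed for the point-to-point case.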
2) Solving the dual problem (10): To solve the dual problem (10), we invoke the subgradient method, which is a simple method of optimizing non-differentiable objective functions [26]. The subgradient method requires a subgradient of g(·) at $\boldsymbol{\lambda}^{(k)} = [\lambda_1^{(k)}, \cdots, \lambda_I^{(k)}]^T$ in the $k$th iteration. In the following theorem, we provide this subgradient.
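Theorem 2: The subgradient of g(·) at $\boldsymbol{\lambda}^{(k)}$ in the $k$th iteration is given by
$$\mathbf{d}^{(k)} = \mathbb{E}_{\boldsymbol{\alpha}}\{\mathbf{P}^{*}(\nu)\} - \mathbf{P}^{\text{avg}},$$
where $\mathbf{P}^{*}(\nu)$ is the optimal solution of Problem (9) when we have $\boldsymbol{\lambda} = \boldsymbol{\lambda}^{(k)}$, and $\mathbf{P}^{\text{avg}} = [P_1^{\text{avg}}, \cdots, P_I^{\text{avg}}]^T$. Proof: Please see Appendix E.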
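Based on Theorem 2, the Lagrangian dual variables can be updated as
$$\boldsymbol{\lambda}^{(k+1)} = \left[\boldsymbol{\lambda}^{(k)} + \zeta^{(k)}\mathbf{d}^{(k)}\right]^{+}, \tag{20}$$
where $[\cdot]^{+} = \max\{\cdot, 0\}$ is applied elementwise and $\zeta^{(k)}$ is the step size in the $k$th iteration. The subgradient method is guaranteed to converge if the step sizes $\zeta^{(k)}$ follow a diminishing step-size rule. In summary, the solution of Problem (6) is given in Algorithm 1.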
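One dual update can be sketched in a few lines of Python. The sketch assumes that the subgradient is the average power minus the average power budget and that the dual variables are projected onto the nonnegative orthant; the step-size rule ζ(k) = ζ0/√k is one common diminishing choice and is an assumption here, not the rule used in the paper. The function names are hypothetical.

```python
import math


def diminishing_step(k, step0):
    """Assumed diminishing step size zeta_k = step0 / sqrt(k), for k >= 1."""
    return step0 / math.sqrt(k)


def dual_update(lam, avg_power, p_avg, step):
    """One projected subgradient step on the dual variables.

    lam       : current dual variables, one per RRH
    avg_power : (estimated) average transmit power of each RRH
    p_avg     : average power budget of each RRH
    """
    return [max(0.0, l + step * (p - b))
            for l, p, b in zip(lam, avg_power, p_avg)]


# Illustrative values: RRH 1 exceeds its budget, RRH 2 is well below it.
lam = [0.5, 0.1]
lam = dual_update(lam, avg_power=[1.2, 0.3], p_avg=[1.0, 1.0],
                  step=diminishing_step(4, 1.0))
print(lam)
```

A dual variable grows when the corresponding RRH overspends its average budget, which makes its power more expensive in the next iteration, and it is clipped at zero when the budget is slack.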

Algorithm 1 Solving Problem (6)
Initialize: the iteration index $k$ and the dual variables $\boldsymbol{\lambda}^{(k)}$;
Repeat:
1) For the given $\boldsymbol{\lambda}^{(k)}$, compute the optimal power allocation according to Theorem 1;
2) Compute the subgradient $\mathbf{d}^{(k)}$ according to Theorem 2;
3) Update the dual variables according to (20), increase $k$ by 1;
Until convergence

To execute Algorithm 1, there is another issue that has to be tackled, namely how to calculate the average power for each RRH in order to obtain the subgradient $\mathbf{d}^{(k)}$. Given the dual variables $\boldsymbol{\lambda}^{(k)}$, the expression of the optimal power at each RRH depends on the realization of the channel gains $\boldsymbol{\alpha}$. Different realizations of $\boldsymbol{\alpha}$ lead to different orderings of $\lambda_i/\alpha_i$, $i = 1, \cdots, I$, and thus to different power expressions. Even for a fixed ordering, the power allocation expressions require multiple integrations, which imposes a high computational complexity. As a result, it is a challenge to obtain the expression of the average power for each RRH in closed form for any given $\boldsymbol{\lambda}^{(k)}$. In fact, even for the simple case of two RRHs, the average powers generally do not have simple closed-form solutions, as shown in the next section.
To resolve the above issue, we propose an online calculation method for tracking the average power required for each RRH. The main idea is to replace the expectation operator by averaging the power allocations for all samples of channel generations during the channel fading process.
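Specifically, let us define
$$\bar{P}_i^{(k)} = \frac{1}{k}\sum_{j=1}^{k} p^{*}_{(i,\boldsymbol{\lambda}^{(j)})}\left(\boldsymbol{\alpha}^{(j)}, \theta\right),$$
where $p^{*}_{(i,\boldsymbol{\lambda}^{(j)})}(\boldsymbol{\alpha}^{(j)}, \theta)$ denotes the optimal power allocation of the $i$th RRH for the $j$th channel realization, and $\boldsymbol{\lambda}^{(j)}$ represents the corresponding dual variables. Then, the expectation over $\boldsymbol{\alpha}$ is approximated by $\bar{P}_i^{(k)}$, which can be recursively obtained based on the previous $\bar{P}_i^{(k-1)}$ as
$$\bar{P}_i^{(k)} = \frac{k-1}{k}\bar{P}_i^{(k-1)} + \frac{1}{k}\, p^{*}_{(i,\boldsymbol{\lambda}^{(k)})}\left(\boldsymbol{\alpha}^{(k)}, \theta\right).$$
It is worth noting that this algorithm can be readily applied to the case when the fading statistics are unknown.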
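The recursive form of the sample average avoids storing past power allocations. A minimal Python sketch of the recursion is given below; the function name and the sample values are hypothetical, and the per-frame powers would in practice come from the optimal allocation for each channel realization.

```python
def update_running_mean(prev_mean, k, new_sample):
    """Mean of k samples, given the mean of the first k-1 samples.

    Algebraically equal to ((k - 1) / k) * prev_mean + new_sample / k.
    """
    return prev_mean + (new_sample - prev_mean) / k


# Illustrative per-frame transmit powers of one RRH.
frame_powers = [0.4, 1.3, 0.0, 2.2, 0.9]
tracked = 0.0
for k, p in enumerate(frame_powers, start=1):
    tracked = update_running_mean(tracked, k, p)
print("tracked average power:", tracked)
```

Only the previous average and the frame counter have to be kept per RRH, so the tracking cost per frame is constant.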
Fortunately, for the case of a single RRH (see Footnote 3), the average power can be obtained in closed form, which is given in Appendix F. The average power expression for the case of I = 2 is considerably more complex and is studied in detail in the following section.

IV. SPECIAL CASE: I = 2
To show the difficulty of obtaining the closed-form solution for each RRH, we only consider the simplest scenario of two RRHs, i.e., I = 2. To this end, we first introduce the following lemma, which will be used in the ensuing derivations.
Lemma 3: In the region of x ≥ 0, there are only three possible curves for the function h(x), which are shown in Fig. 3. The conditions for each case are given as follows:
Footnote 3: Note that [16] did not provide the closed-form expression for the average power for the case of one RRH.
Based on Lemma 3, we commence by deriving the expression of the average transmit power.
According to Theorem 1, there are three different possible cases for I = 2: 1) $|\mathcal{I}'| = 0$; 2) $|\mathcal{I}'| = 1$; 3) $|\mathcal{I}'| = 2$. Obviously, the average power contributed by the first case is zero, hence we only consider the latter two cases, which are discussed in the following two subsections.
A. Scenario 1: There is only one RRH that transmits at a non-zero power, i.e., $|\mathcal{I}'| = 1$
Since we have assumed that the inequality (23) holds, only RRH 1 is transmitting at a non-zero power and RRH 2 remains silent in this case. According to Theorem 1, the conditions for $|\mathcal{I}'| = 1$ are given by and only RRH 1 transmits at non-zero power, which is given by It may be readily seen that there are two possible values of $p_1^{*}$, and the conditions for each value are discussed as follows.
1) $p_1^{*} = P_1^{\text{peak}}$
: In this case, the following condition should be satisfied: which is equivalent to Notice that the left hand side of (28) is in the form of h(x) defined in Lemma 3, with x = α 1 , a, b, c given by To guarantee that there exists a positive α 1 , the function h(x) should be reminiscent of Fig. 3-(c), and the conditions for Case 3 in Lemma 3 should be satisfied. Otherwise, p * 1 is not equal to P peak 1 . In the following, we assume that the conditions are satisfied. Let us denote the solutions of h(α 1 ) = 0 as α l 1 and α u 1 , where α l 1 < α u 1 . These solutions can be readily obtained by using the classic bisection method. Then, condition (28) is equivalent to By combining (23) and (24), we obtain the feasible region of α 2 in the form of: where the last equality holds by using (28). Similarly, by combining (25) and (30), we obtain the feasible region of α 1 as: In this context, we prove that λ 1 /ε (θ) < α l 1 in Appendix H. Hence, the feasible region of α 1 is given by α l 1 ≤ α 1 ≤ α u 1 . Based on the above discussions, we obtain the conditions for p * 1 = P peak 1 , p * 2 = 0 as follows: where a, b, c are given in (29).
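The two roots of h(α_1) = 0 mentioned above can be located by bisection once a sign change has been bracketed, for instance between 0 and a point where h is negative, and between that point and a sufficiently large value. The Python sketch below only illustrates the bracketing logic. Since the specific h(x) of Lemma 3 is not restated here, a stand-in function with two positive roots is used, and all names are hypothetical.

```python
def bisect_root(func, lo, hi, tol=1e-10, max_iter=200):
    """Root of func in [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid                 # sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid    # sign change lies in the right half
    return 0.5 * (lo + hi)


def h_demo(x):
    """Stand-in for h(x): positive, then negative, then positive again."""
    return (x - 1.0) * (x - 3.0)


interior = 2.0  # a point where h_demo is negative (hypothetical choice)
root_low = bisect_root(h_demo, 0.0, interior)
root_high = bisect_root(h_demo, interior, 10.0)
print(root_low, root_high)
```

The same routine is applied twice because each call returns only the root inside its own bracket.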
If Condition C1 in (33) is satisfied, the average power assigned to RRH 1 in this case is given by T C1 By substituting the PDF of α 2 in (2) into (34), (34) can be simplified to Unfortunately, the closed-form expression of T C1 RRH 1 cannot be obtained even for the special case of m = 1. However, the value of T C1 RRH 1 can be obtained at a good accuracy by using the numerical integration function of Matlab.
2) $0 < p_1^{*} < P_1^{\text{peak}}$:
In this case, the following condition should be satisfied: which leads to h(α 1 ) > 0 with a, b, c defined in (29). As seen from Fig. 3, when the first two conditions in Lemma 3 are satisfied, the inequality (36) holds for any α 1 ≥ 0. When the third condition in Lemma 3 is satisfied, the inequality (36) holds when 0 ≤ α 1 ≤ α l 1 and α 1 ≥ α u 1 , where α l 1 and α u 1 (α l 1 < α u 1 ) are the solutions of h(α 1 ) = 0 with a, b, c defined in (29). Now, we first assume that the first two conditions in Lemma 3 are satisfied C2 :ab − c ≥ 0, or ab − c < 0 and c ab where a, b, c are given in (29). Then, the condition in (36) can be neglected. According to (23) and (24), the feasible region of α 2 is given by where (36) is used in the last equality. From (25), the feasible region of α 1 is given by α 1 ≥ λ 1 ε(θ) . As a result, the average power of RRH 1 contributed by the case when condition C2 is satisfied is given by Fortunately, when m is an integer, the closed-form expression of T C2 RRH 1 can be obtained. Let us define If m = 1, the Nakagami-m channel reduces to the Rayleigh channel, and the average power of RRH 1 in (39) can be simplified to: If m is an integer that is larger than one, i.e., m ≥ 2, we can obtain the closed-form expression where J 1 , J 2 , J 3 and J 4 are respectively given by The details of the above derivations can be found in Appendix I.
Next, we consider the case, where Condition 3) in Lemma 3 is satisfied. According to the condition in (36), the feasible region of $\alpha_1$ is $0 \le \alpha_1 \le \alpha_1^{l}$ or $\alpha_1 \ge \alpha_1^{u}$. Additionally, from (25), $\alpha_1 \ge U$ should hold. According to Appendix H, $U \le \alpha_1^{l}$ always holds. Hence, the overall feasible region is $U \le \alpha_1 \le \alpha_1^{l}$ or $\alpha_1 \ge \alpha_1^{u}$. Furthermore, the feasible region of $\alpha_2$ is the same as in (38). As a result, the expression of the average power for RRH 1 contributed in this case can be obtained similarly to (39), except for the different integration intervals for $\alpha_1$. The expression for the special case when m is an integer can be obtained similarly.
Let us now define the following function Then, T C2 RRH 1 given in (39) is equal to T C2 RRH 1 = F (U ). If the following condition is satisfied: with a, b, c given in (29), the average power for RRH 1 contributed under this condition is Note that Condition C3 is the same as Condition C1, but the power allocation for RRH 1 is different.
B. Scenario 2: Both RRHs are transmitting at a non-zero power, i.e., $|\mathcal{I}'| = 2$
In this case, RRH 1 will transmit at full power, i.e., $p_1^{*} = P_1^{\text{peak}}$, and RRH 2 will transmit at a non-zero power. According to Theorem 1, the following condition should be satisfied: and the transmit power of RRH 2 is given by The conditions for each value of $p_2^{*}$ are discussed as follows.
1) $p_2^{*} = P_2^{\text{peak}}$: In this case, the following condition should be satisfied: Combining conditions (23), (47) and (49), we can obtain the feasible region of $\alpha_1$ as follows To guarantee that we have a nonempty set of $\alpha_1$, the following condition should be satisfied: which is in the form of the function h(x) defined in Lemma 3 with $x = \alpha_2$, and a, b, c given by To guarantee the existence of a positive $\alpha_2$, the curve of the function $h(\alpha_2)$ should be similar to that in Fig. 3-(c), and the conditions are given by C4 :ab − c < 0, and c ab where a, b, c are given in (52). Let us denote the solutions of $h(\alpha_2) = 0$ as $\alpha_2^{l}$ and $\alpha_2^{u}$, where we have $\alpha_2^{l} < \alpha_2^{u}$. Then, the feasible region of $\alpha_2$ is given by $\alpha_2^{l} < \alpha_2 < \alpha_2^{u}$. Under Condition C4, the average powers of RRH 1 and RRH 2 are respectively calculated as where A is defined in (50).

2) $0 < p_2^{*} < P_2^{\text{peak}}$:
In this case, the following condition should be satisfied: By combining (23), (47) and (56), the feasible region of α 1 can be obtained as follows: where A is defined in (50). Note that B > A always holds. Hence, to guarantee that there exists a feasible α 1 , the following condition should be satisfied: Again, the right hand side of (58) is in the form of the function h(x) defined in Lemma 3, with x = α 2 , and a, b, c are given by To guarantee that there exists a positive α 2 , the third condition in Lemma 3 should be satisfied: where a, b, c are given in (59). When a, b, c are given in (59), we can denote the solutions of h (α 2 ) = 0 asα l 2 andα u 2 withα l 2 <α u 2 . Hence, the feasible region of α 2 is given bỹ α l 2 < α 2 <α u 2 . The remaining task is to determine the lower bound of α 1 as shown in (57).
Case I: If the following condition is satisfied: the feasible region of α 1 is given by where B is defined in (57). The inequality (61) can be rewritten as which is equivalent to h(α 2 ) ≥ 0 with a, b and c given in (52).
As seen from Fig. 3, when the first two conditions in Lemma 3 are satisfied, the inequality (63) holds for any α 2 > 0. Combining this with the condition (58), the feasible region of α 2 is given byα l 2 < α 2 <α u 2 , whereα l 2 andα u 2 are the solutions of h(α 2 ) = 0 with a, b, c defined in (59). Define the following condition with a, b, c defined in (52) 4 : C5 : C X and ab − c ≥ 0, or C X , ab − c < 0 and c ab and C as Under Condition C5, the average power of RRH 1 and RRH 2 are respectively given by where B is defined in (57),α l 2 andα u 2 are the solutions of h(α 2 ) = 0 with a, b, c defined in (59). On the other hand, when the third condition in Lemma 3 is satisfied, inequality (63) holds for 0 ≤ α 2 ≤ α l 2 or α 2 ≥ α u 2 , where α l 2 and α u 2 are the solutions of h(α 2 ) = 0 with a, b, c defined in (52). Additionally, it is easy to verify that the curve of h(α 2 ) associated with a, b, c defined in (52) is above the curve of h(α 2 ) with a, b, c defined in (59). Hence, the following relations hold:α Combining (68) with condition (58), the feasible region of α 2 is given bỹ As a result, when the following conditions are satisfied with a, b, c defined in (52): C6 : C X , ab − c < 0 and c ab the average power required for RRH 1 and RRH 2, denoted as T C6 RRH 1 and T C6 RRH 2 respectively, is similar to the expressions of T C5 RRH 1 and T C5 RRH 2 except that the integration interval of α 2 becomes (69).
Case II: If the following condition is satisfied: the feasible region of α 1 is B > α 1 > A, where B is defined in (57). The above condition is equivalent to h(α 2 ) < 0 with a, b, c defined in (52). To guarantee the existence of a positive α 2 , the third condition in Lemma 3 should be satisfied. Assuming that this condition is satisfied, the feasible region of α 2 is given by α l 2 < α 2 < α u 2 , where α l 2 and α u 2 are the solutions of h(α 2 ) = 0 with a, b, c defined in (52). Again, by using the relations in (68) and the condition (58), the feasible region of α 2 is given by α l 2 < α 2 < α u 2 . Hence, if the following conditions are satisfied with a, b, c defined in (52): C7 : C X , ab − c < 0 and c ab that the average powers required for RRH 1 and RRH 2 are respectively given by whereJ 1 andJ 2 are given bỹ

C. Discussion of the results
In this subsection, we summarize the results discussed in the above two subsections in Table I. The average power required for each RRH contributed under condition (23) is given by where ε (·) is an indicator function, defined as

Conditions Power Allocation Average Power
Then, the average power required for each RRH is given by where P denotes the average transmit power under condition λ 1 α 1 > λ 2 α 2 for RRH 1 and RRH 2, which can be calculated as the condition of λ 1 α 1 ≤ λ 2 α 2 .
To provide more insights concerning the joint power allocation results for the case of I = 2, in Fig. 4 we plot the regions corresponding to different cases of the dual variables. For clarity, we only consider the case of λ 1 α 1 > λ 2 α 2 . Fig. 4-(a) corresponds to the case of λ 1 = λ 2 = 1. From this figure, we can see that our proposed joint power allocation algorithm divides the region of λ 1 α 1 > λ 2 α 2 into two exclusive regions separated by the solid lines. If (α 1 , α 2 ) falls into the region T 1 , both RRHs remain silent. On the other hand, if (α 1 , α 2 ) falls into the region T 2 , Condition C2 is satisfied and only RRH 1 will transmit with a non-zero power that is lower than the peak power.
Fig. 4-(b) corresponds to the case of λ 1 = 1/5, λ 2 = 1. In this case, the region of λ 1 α 1 > λ 2 α 2 is divided into five exclusive regions. Similarly, in region T 1 , none of the RRHs transmit. In regions T 2 and T 4 , Condition C3 is satisfied and only RRH 1 is assigned non-zero power for data transmission, which is lower than the peak power. If (α 1 , α 2 ) falls into region T 3 , Condition C1 is satisfied, and RRH 1 will transmit at peak power while RRH 2 still remains silent. However, if (α 1 , α 2 ) falls into the region T 5 , Condition C5 holds, so RRH 1 will transmit at peak power and RRH 2 is allocated a positive power that is lower than the peak power.
Fig. 4-(b). If (α 1 , α 2 ) falls into regions T 6 and T 8 , Condition C6 is satisfied, hence RRH 1 will transmit at its peak power and RRH 2 is assigned a non-zero power for its transmission that is lower than the peak power. If (α 1 , α 2 ) falls into region T 7 , Condition C7 is satisfied. Similarly, RRH 1 transmits at its maximum power and RRH 2 transmits at a non-zero power below its peak power.
On the other hand, if (α 1 , α 2 ) falls into region T 7 , Condition C4 is satisfied and both RRHs will transmit at their peak power.
It is interesting to find that with the reduction of the dual variables, more RRHs will transmit at non-zero power or even the peak power. This can be explained as follows. The dual variables can be regarded as a pricing factor, where a lower dual variable will encourage the RRHs to be involved in transmission due to the low cost.

V. SIMULATIONS
In this section, we characterize the performance of our proposed algorithm. We consider a C-RAN network covering a square area of 2 km × 2 km. The user is located at the center of this region and the RRHs are independently and uniformly distributed in this area. Unless otherwise stated, we adopt the same simulation parameters as in [16]. The path-loss is modeled as P L i = 148.1 + 37.6 log 10 d i (dB) [8], where d i is the distance between the ith RRH and the user measured in km. Shadow fading is also considered, which obeys the lognormal distribution having a zero mean and a standard deviation of 8 dB. The step size in the kth iteration is set to ζ (k) = 1/k.
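The large-scale channel model above can be sketched in a few lines. This is a minimal illustration rather than the simulator used for the figures: the function names are hypothetical, and the shadowing term is assumed to be a zero-mean Gaussian in dB that is added to the path loss.

```python
import math
import random


def path_loss_db(distance_km: float) -> float:
    """Path loss PL_i = 148.1 + 37.6 * log10(d_i) in dB, with d_i in km."""
    return 148.1 + 37.6 * math.log10(distance_km)


def large_scale_gain(distance_km: float, shadowing_db: float = 0.0) -> float:
    """Linear power gain after path loss and a given shadowing value in dB."""
    return 10.0 ** (-(path_loss_db(distance_km) + shadowing_db) / 10.0)


def sample_large_scale_gain(distance_km: float, rng: random.Random) -> float:
    """Draw one lognormal-shadowed gain (zero mean, 8 dB standard deviation)."""
    return large_scale_gain(distance_km, rng.gauss(0.0, 8.0))


def step_size(k: int) -> float:
    """Step size of the k-th iteration, zeta(k) = 1/k, for k = 1, 2, ..."""
    return 1.0 / k
```

For instance, an RRH at a distance of 1 km sees a path loss of 148.1 dB before shadowing, and each tenfold increase of the distance adds 37.6 dB.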
We compare our algorithm to the following algorithms:
1) Nearest RRH serving algorithm: In this algorithm, the user is only served by its nearest RRH, and the optimization method proposed in [16] for point-to-point systems is adopted to solve the power allocation problem in this setting. This algorithm serves to show the benefits of cooperative transmission in C-RAN.
2) Constant power allocation algorithm: In this algorithm, the transmit power is set equal to the average power constraint in every time slot. Since the peak power limit is higher than the average power limit, both the average power constraints and the peak power constraints are satisfied. This algorithm serves to illustrate the benefits of the dynamic power allocation proposed in this paper.
3) Independent power allocation algorithm: In this algorithm, each RRH independently optimizes its power allocation by using the method of [16], based only on its own channel condition. This algorithm is invoked for showing the benefits of jointly optimizing the power allocation according to the joint channel conditions of all RRHs.
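To make the comparison metric concrete, the following sketch estimates the effective capacity of the constant power allocation benchmark by Monte Carlo simulation. It relies on several assumptions that are not taken from this section: the standard definition EC(θ) = −(1/θ) ln E[exp(−θR)], a normalized per-frame rate R = log2(1 + Σ p_i α_i), and Nakagami-m fading modeled through Gamma-distributed power gains. All function names and parameter values are illustrative.

```python
import math
import random


def effective_capacity(rates, theta):
    """Sample estimate of EC(theta) = -(1/theta) * ln E[exp(-theta * R)]."""
    mean_exp = sum(math.exp(-theta * r) for r in rates) / len(rates)
    return -math.log(mean_exp) / theta


def constant_power_rates(powers, mean_gains, m, n_samples, seed=0):
    """Per-frame rates when every RRH always transmits at a fixed power.

    A Nakagami-m amplitude corresponds to a Gamma(shape=m, scale=mean/m)
    power gain, which is what random.gammavariate draws here.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_samples):
        snr = sum(p * rng.gammavariate(m, g / m)
                  for p, g in zip(powers, mean_gains))
        rates.append(math.log2(1.0 + snr))
    return rates


if __name__ == "__main__":
    rates = constant_power_rates([1.0, 1.0], [1.0, 0.5], m=2, n_samples=5000)
    for theta in (0.01, 0.1, 1.0):
        print(theta, effective_capacity(rates, theta))
```

Under these assumptions the estimate never exceeds the average rate and decreases as θ grows, which is the behaviour expected of any effective capacity curve.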
In the following, we first consider the case of I = 2, where the average power required for each RRH can be numerically obtained according to the results of Section IV. Then, we study the more general case associated with more than two RRHs. Fig. 5 shows a randomly generated network topology relying on two RRHs, and Figs. 6-9 are based on this network topology. In this setting, for the nearest RRH serving algorithm, the user is only served by RRH 2, since it is closer to the user than RRH 1.
In Fig. 6, we study the convergence behaviour of the proposed algorithm under different delay exponents. It is seen that the proposed algorithm converges promptly for all delay exponents considered, and that a higher QoS exponent may lead to lower convergence speed in this setting.       IV, and stored in the memory.
Let us now study the impact of the delay requirements on the effective capacity of the various algorithms in Fig. 7. As expected, when the delay-QoS requirement becomes more stringent, i.e., when the QoS exponent θ increases, the effective capacity achieved by all algorithms is reduced. By employing two RRHs for jointly serving the user to exploit higher spatial degrees of freedom, the proposed algorithm significantly outperforms the nearest RRH based algorithm, where only one RRH is invoked for transmission. For example, for θ = 0.1 the effective capacity provided by the former algorithm is roughly 44% higher than that of the latter algorithm. It is observed that the performance of the nearest RRH based algorithm is even inferior to that of the naive constant power allocation algorithm. Since our proposed algorithm adapts its power allocation to the delay-QoS requirement, while additionally taking into account the channel conditions, it achieves a higher effective capacity than the constant power allocation algorithm, especially in the loose QoS requirement region, where the performance gain attained is about 36%. By optimizing the power allocation based on the joint channel conditions, the proposed algorithm performs better than the independent one, where the joint relationship of the channel conditions is ignored. The performance gain is more prominent when the delay-QoS requirement is moderate, where it can be up to 20% in this example. By contrast, when the delay-QoS requirement is either very stringent or very loose, both algorithms have a similar performance, which implies that in these two extreme cases, independently optimizing the power allocation across the two RRHs approaches the optimal solution at a lower complexity than the proposed algorithm. Note that a similar trend has been observed in [16] for a point-to-point system.
The figure also shows that our proposed algorithm has a much better performance than the other three algorithms, and the performance gain increases with the Nakagami fading parameter m. It is interesting to observe from the figure that the independent power allocation algorithm performs slightly better than the constant power allocation algorithm, and the performance gain diminishes when m is large. This may be due to the fact that as m increases, the channel becomes more deterministic, and thus each RRH will use a constant transmission power under the independent power allocation algorithm. By contrast, the performance gain of the proposed algorithm over the existing methods is still significant when m is large, since the higher degrees of freedom are exploited by our algorithm.
our proposed algorithm keeps on increasing for all values of P peak i . The reason is that the joint relationship of these two channels is effectively exploited by our algorithm. This again reveals the advantage of our proposed algorithm over the existing algorithms.
Finally, we consider a more general case in Fig. 10, where five RRHs are randomly located in a square. In this setting, for the nearest RRH serving algorithm, the user is only served by RRH 3. The remaining figures are based on this network topology.
We first study the convergence behaviour of the online tracking method in Fig. 11, where different QoS requirements are tested. As shown in Fig. 11, the online tracking method converges promptly for all considered values of θ, and there is only a slight oscillation within a small dynamic range. These observations demonstrate the efficiency of the online tracking method. In contrast to the case of two RRHs, a larger value of θ leads to a faster convergence.
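The interplay between the tracked average power and the dual variables can be sketched as follows. This is one common way of realizing such a scheme and not necessarily the exact update used here: the running-mean tracker, the projected subgradient step and the function names are assumptions, while the step size 1/k is the one stated in Section V.

```python
def track_average_power(prev_avg: float, new_power: float, k: int) -> float:
    """Running mean after k samples: avg_k = avg_{k-1} + (p_k - avg_{k-1}) / k."""
    return prev_avg + (new_power - prev_avg) / k


def update_dual(lam: float, tracked_avg: float, avg_limit: float, k: int) -> float:
    """Projected subgradient step with step size 1/k.

    The dual variable grows when the tracked average power exceeds the
    average power limit, shrinks otherwise, and is kept non-negative.
    """
    return max(0.0, lam + (tracked_avg - avg_limit) / k)


if __name__ == "__main__":
    avg, lam = 0.0, 1.0
    for k, power in enumerate([1.0, 2.0, 3.0, 4.0], start=1):
        avg = track_average_power(avg, power, k)
        lam = update_dual(lam, avg, avg_limit=2.0, k=k)
    print(avg, lam)
```

Because the tracker only needs the previous average and the newest power sample, no history of channel realizations has to be stored.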
In Fig. 12, we again study the effect of QoS requirements on the performance of various algorithms for the general case of five RRHs. As in the case of two RRHs, the effective capacity achieved by all algorithms decreases as θ increases. The superior performance of our proposed algorithm over the other three algorithms is observed again. Moreover, the performance gain is higher than in the case of two RRHs. For example, the proposed algorithm has almost twice the effective capacity of the nearest RRH based algorithm. It also exhibits a substantial performance gain over the independent power allocation algorithm in the two extreme cases, when the delay-QoS requirement is very loose or very stringent.

VI. CONCLUSIONS
We considered joint power allocation for the EC maximization of C-RAN, where the user's delay-QoS requirement has to be guaranteed for a transmission to be declared successful. Both the per-RRH average and peak power constraints were considered. We first showed that the EC maximization problem can be equivalently transformed into a convex optimization problem, which was solved by using the Lagrange dual decomposition method and by studying the KKT conditions. An online tracking method was proposed for calculating the average power of each RRH. For the special case of two RRHs, the expression of the average power of each RRH can be obtained in closed form. The simulation results showed that our proposed algorithm converges promptly and performs much better than the existing algorithms. Specifically, an EC gain of up to 20% can be achieved by our proposed algorithm over the independent algorithm in some cases. By adapting to both the channel conditions and the QoS exponent, our proposed algorithm significantly outperforms the constant power allocation scheme.

APPENDIX A PROOF OF CONVEXITY OF PROBLEM (7)
Since the expectation operator is a linear additive operator, we only have to consider the objective function for each channel fading realization. For simplicity, we omit the dependency of p i (ν) on ν and the objective function can be expressed as where P denotes the collection of power allocations. The second partial derivatives of y(P) can be calculated as ∂y 2 (P) and ∂y 2 (P) Hence, the Hessian matrix of the function y(P) is which is a positive semidefinite matrix. Obviously, the constraints in Problem (7) are linear.
Hence, the proof is completed.
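The positive semidefiniteness argument can be checked numerically. The sketch below assumes that the per-realization objective has the form y(P) = (1 + Σ α_i p_i)^(−ε) with ε > 0, which is the usual form of a transformed effective capacity objective but is an assumption here, since the expression itself is not reproduced above. Under that assumption the Hessian is a non-negative multiple of the rank-one matrix α α^T.

```python
import random


def objective(p, alpha, eps):
    """Assumed objective y(P) = (1 + sum_i alpha_i * p_i) ** (-eps)."""
    s = 1.0 + sum(a * x for a, x in zip(alpha, p))
    return s ** (-eps)


def hessian(p, alpha, eps):
    """Analytic Hessian eps * (eps + 1) * s ** (-eps - 2) * alpha alpha^T."""
    s = 1.0 + sum(a * x for a, x in zip(alpha, p))
    scale = eps * (eps + 1.0) * s ** (-eps - 2.0)
    return [[scale * ai * aj for aj in alpha] for ai in alpha]


def quadratic_form(h, v):
    """Evaluate v^T H v for a square matrix given as a list of rows."""
    n = len(v)
    return sum(v[i] * h[i][j] * v[j] for i in range(n) for j in range(n))


def min_quadratic_form(p, alpha, eps, trials=1000, seed=0):
    """Smallest v^T H v over random directions v; non-negative if H is PSD."""
    rng = random.Random(seed)
    h = hessian(p, alpha, eps)
    return min(
        quadratic_form(h, [rng.uniform(-1.0, 1.0) for _ in alpha])
        for _ in range(trials)
    )
```

Since v^T H v equals the scale factor times (α^T v)^2, the quadratic form cannot be negative for any direction v, which is what the random search confirms.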

APPENDIX B PROOF OF LEMMA 1
As p * i > 0, p * j = 0, from (14b) and (14c) we have µ * i ≥ 0, δ * i = 0 and µ * j = 0, δ * j ≥ 0, respectively. Then, from (14a) we have Hence, it follows that

APPENDIX C PROOF OF LEMMA 2
We first prove the first part of Lemma 2. Assume that there are two RRHs i and j such that 0 < p * i < P peak i and 0 < p * j < P peak j for i ≠ j. Then, according to (14b) and (14c), we have Since α i is independent of α j , and λ i and λ j are fixed, it is concluded that the above equality holds with zero probability. Thus, there is at most one RRH i with 0 < p * i < P peak i .
Let us also assume that there are two RRHs i, k ∈ I with 0 < p * i < P peak i and p * k = P peak k . By using (14b) and (14c), we have µ * i = 0, δ * i = 0, µ * k ≥ 0, and δ * k = 0. According to (14a), it follows that Hence, we conclude that i = π(|I |).

APPENDIX D PROOF OF THEOREM 1
Lemma 2 implies that the optimal solution must be one of the following two cases:
• Case I: p * π(a) = P peak a , a = 1, · · · , |I |;
• Case II: p * π(a) = P peak a , a = 1, · · · , |I | − 1, p * π(|I |) can be calculated as:
Then, we have to prove that |I | is the largest value of x that satisfies the following condition First, we show that in both Case I and Case II, for any RRH π(a) ∈ I , the above inequality holds. For Case I, as p * π(|I |) = P peak |I | , by using (14b) and (14c), it follows that µ |I | ≥ 0 and δ |I | = 0. Then, substituting them into (14a) yields Hence, the above inequality holds for a = |I |. From Lemma 1, the left hand side (LHS) of (D.1) increases as x increases, while the right hand side (RHS) of (D.1) decreases as x increases.

APPENDIX F THE AVERAGE POWER FOR THE CASE OF I = 1
Let us now define the function g(α 1 ) as in (F.1). Taking the first-order derivative of g(α 1 ) with respect to α 1 and setting it to zero yields (F.2). By solving (F.2), we can obtain the solution α * 1 . It can be easily verified that the function g(α 1 ) first increases in the region of α 1 ∈ [λ 1 /ε(θ), α * 1 ] and then decreases when α 1 > α * 1 . Hence, we can obtain the maximum value of the function g(α 1 ) by substituting α * 1 into (F.1), which yields (F.3). It can be readily verified that when α 1 → ∞, g(α 1 ) → 0. In addition, g(λ 1 /ε(θ)) = 0. Hence, depending on the comparative value between g(α * 1 ) and P peak 1 , two cases should be considered when computing the average power, as illustrated in Fig. 13. The details are given as follows:
Case I: g(α * 1 ) ≤ P peak 1 . In this case, the transmit power is always lower than the peak power as shown in Fig. 13. Hence, the peak power constraints are redundant and can be removed. Thus, the average power can be expressed as which can be expanded as
Case II: g(α * 1 ) > P peak 1 . In this case, there must exist two solutions that satisfy g(α 1 ) = P peak 1 as shown in Fig. 13. Denote these two solutions as α L 1 and α U 1 with α L 1 < α U 1 . These two solutions can be numerically obtained by existing algorithms, such as the classic bisection search method.
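The bisection step of Case II can be sketched as follows. The routine only relies on the stated shape of g: zero at λ 1 /ε(θ), increasing up to α * 1 , and decaying to zero afterwards. The function `example_g` is a hypothetical stand-in with this shape (zero at 1, peak value 0.25 at 4) and is not claimed to be the expression (F.1); all names are illustrative.

```python
def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


def peak_crossings(g, alpha_zero, alpha_star, p_peak):
    """Return (alpha_L, alpha_U) with g = p_peak, or None in Case I."""
    if g(alpha_star) <= p_peak:
        return None  # Case I: the peak power constraint is never active
    shifted = lambda a: g(a) - p_peak
    alpha_l = bisect_root(shifted, alpha_zero, alpha_star)
    hi = 2.0 * alpha_star
    while g(hi) > p_peak:  # expand until g has dropped below the peak power
        hi *= 2.0
    alpha_u = bisect_root(shifted, alpha_star, hi)
    return alpha_l, alpha_u


def example_g(a):
    """Hypothetical unimodal power profile: zero at a = 1, maximum at a = 4."""
    return a ** -0.5 - 1.0 / a
```

With a peak power of 0.2 the stand-in profile is clipped between roughly 1.91 and 13.09, whereas a peak power of 0.3 exceeds its maximum of 0.25 and the routine reports Case I.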
As seen from Fig. 13, the transmit power is equal to g(α 1 ) when α 1 ∈ [λ 1 /ε(θ), α L 1 ] or α 1 ∈ [α U 1 , ∞), and equal to P peak 1 when α 1 ∈ (α L 1 , α U 1 ). Hence, the average power can be expressed as (F.5). Note that O 1 and O 3 can be derived similarly to the average power in Case I, and are omitted for simplicity. We only provide the expression of O 2 as follows: where γ (s, x) = ∫_0^x t^(s−1) e^(−t) dt is the lower incomplete gamma function.
It is easy to verify that h (λ 1 /ε (θ)) > 0 when a, b, c are given in (29). Hence, when the third condition of Lemma 3 given in Condition C1 is satisfied, according to Fig. 3-c, λ 1 /ε (θ) must fall into one of two regions: 1) 0 < λ 1 /ε (θ) < α l 1 ; 2) λ 1 /ε (θ) > α u 1 . In the following, we prove by contradiction that the probability of λ 1 /ε (θ) falling into the second region is zero.
Suppose that λ 1 /ε (θ) > α u 1 holds. Then, according to Fig. 3-c, λ 1 /ε (θ) must be larger than the optimal point, i.e., λ 1 /ε (θ) > x * , where x * is given in Appendix G. By substituting a, b, c in (29) into this inequality and after some further simplifications, we have . (H.1) Additionally, by inserting a, b, c in (29) into Condition C1 and after some further simplifications, we have (H.2) By substituting (H.2) into the right hand side of (H.1) and with some simple further operations, one obtains λ 1 P peak 1 > 1, which contradicts (H.2), since the latter requires λ 1 P peak 1 < 1. Hence, the assumption that λ 1 /ε (θ) > α u 1 cannot hold, which completes the proof.
By plugging the above expression with x = W α 1 into (39), we obtain the average power for RRH 1 as follows: With some simple variable substitutions, the closed-form expressions of J 1 , J 2 , J 3 and J 4 can be easily calculated as in (43) and (44), respectively.