Stochastic Downlink Power Control for Various User Requirements

Power control (PC) can coordinate mutual interference between cells in heterogeneous cellular networks (HCNs). Most existing works focus on real-time PC problems based on instantaneous channel state information (CSI) for all users. However, such a scheme may result in a low feasible probability and high energy consumption. If the PC problem is frequently infeasible, the users that require low-latency communications will fail to get service in time. In this paper, we classify the users into two categories according to their sensitivity to latency: delay-sensitive-users (DSUs) and non-delay-sensitive-users (NDSUs). We use instantaneous signal-to-interference-plus-noise-ratio (SINR) constraints to ensure the success of data transmission in each time slot to meet DSUs’ low-latency requirements, and long-term mean data rate constraints to ensure NDSUs’ average data rate requirements. On the one hand, the long-term constraints allow the system to sacrifice NDSUs’ short-term performance to guarantee DSUs’ instantaneous performance when the channel condition is poor. On the other hand, the system appropriately improves NDSUs’ performance to ensure their target mean data rate when the channel condition is good. Under this scheme, we formulate the PC problems under the perfect CSI, bounded CSI error, and stochastic CSI error scenarios as a uniform problem, which is a non-convex stochastic constrained problem. The recently proposed constrained stochastic successive convex approximation (CSSCA) technique is utilized to handle this problem. Simulation results show that the proposed scheme can significantly improve the feasible probability of DSUs’ instantaneous constraints and reduce the network’s energy consumption.


I. INTRODUCTION
The fifth-generation (5G) cellular network targets a 1000× capacity increase and millisecond-level low-latency solutions [1]. It is well understood that the conventional cellular network architecture using macro-cells only cannot support the growing demand. A promising solution is to deploy different types of base stations (BSs), thus forming heterogeneous cellular networks (HCNs) [2]. HCNs can efficiently improve the system capacity by reusing the same frequency spectrum [3]. However, spectrum reuse inevitably introduces interference among macro-cells and small-cells, which severely degrades the communication performance and causes a significant amount of power waste. To address this difficulty, the power control (PC) strategy, which keeps the aggregate interference at receivers within an acceptable level, is utilized in HCNs to achieve interference management [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Cong Pu.
PC has been extensively investigated in the literature. In ultra-dense small cell networks, the PC problem is formulated to maximize the energy efficiency of all the small cells while keeping the interference to the macro-cell users tolerable [5]. To achieve higher energy efficiency, A. Zappone et al. in [6] develop a general framework that obtains globally optimal solutions of the energy efficiency maximization problem by merging fractional programming and sequential optimization. P. He et al. investigate PC in a multi-user wireless system to maximize the energy efficiency while meeting the power constraints of each individual user and of the whole system [7]. Perfect channel state information (CSI) is assumed in the above works. In practical systems, channel uncertainties are inevitable. Generally, robust resource allocation designs are developed under two types of CSI errors, i.e., bounded and stochastic CSI errors [8]. The former assumes that the CSI lies in a bounded uncertainty region. The latter assumes that the statistical information of the CSI errors is known at the transmitters. Under the assumption of bounded CSI error, C. Shi et al. study the downlink PC in HCNs based on a worst-case robust Stackelberg game [9]. In the sparse code multiple access based cloud-radio access network, the authors in [10] jointly optimize the resource allocation and user association based on worst-case optimization. The conservative worst-case optimization inevitably causes performance degradation. To avoid this deficiency, S. Parsaeefard et al. in [11] apply the differential norm and the chance constrained approaches in cognitive radio networks to maximize the secondary users' throughput.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Under the assumption of stochastic CSI errors, the authors in [12] consider the use of PC and beamforming for femtocells to provide the desired SINR to femtousers near the femtocell edge while minimizing the interference among the serving femtousers and adjacent macrousers. S. Bu et al. propose a game-theoretical scheme using energy-efficient resource allocation and interference pricing for an interference-limited environment in HCNs [13]. The authors in [14] investigate the optimal power allocation scheme to maximize the energy efficiency and secrecy of wireless networks.
With the rapid development of the mobile Internet, the services required by users are becoming more diverse. Some services, such as video sessions and online games, require low latency. In contrast, other services, such as online video and file downloading, are not sensitive to latency but require a high average data rate [15], [16]. We refer to the users who require low-latency services as delay-sensitive-users (DSUs) and the other users as non-delay-sensitive-users (NDSUs). However, the requirements of all users in the above works [5]-[7], [9]-[14] are formulated as the same instantaneous QoS constraints. In practical systems, the feasibility of real-time PC problems is not guaranteed because of channel fluctuation. Under poor channel conditions, the PC problem may be infeasible, so DSUs cannot get service in time and suffer a significant drop in experience. Even if the PC problem is feasible, meeting the instantaneous requirements of all users consumes high energy. To address these issues, we use long-term mean data rate constraints to formulate the requirements of NDSUs. This relaxes the original instantaneous PC problem, so more resources are reserved for DSUs to meet their strict low-latency requirements. Furthermore, to cope with the inevitable channel uncertainty, we also consider PC under the bounded and the stochastic CSI error scenarios. The main contributions of this paper can be summarized as follows:
• For DSUs, instantaneous SINR constraints are utilized to ensure successful data transmission in each time slot, which reduces the delay caused by link outage and data retransmission. For NDSUs, long-term mean data rate constraints are utilized to ensure their high average data rate. The long-term constraints allow the system to appropriately reduce the short-term performance of the NDSUs in order to preferentially guarantee DSUs' performance.
• In the bounded CSI error scenario, the actual unknown SINR is replaced by its lower bound, so that PC is performed for the worst case and the design is robust.
• In the stochastic CSI error scenario, outage probability constraints are utilized in each time slot to ensure the reliability of DSUs' data transmission. These constraints are non-convex and have no closed-form expression. To address this issue, a Bernstein-type inequality [17] is utilized to construct a conservative convex approximation of the original outage probability constraints. As for NDSUs, the expectation of the data rate in a single time slot has no concise closed-form expression either. We handle this difficulty by constructing a tight lower bound of it according to the distribution of the CSI error.
• The PC problems under the perfect CSI, bounded CSI error, and stochastic CSI error scenarios are formulated as a uniform problem. This problem is stochastically constrained because of NDSUs' long-term constraints and non-convex because of the inter-BS interference. The recently proposed CSSCA technique [18] is utilized to tackle this problem.
The rest of this paper is organized as follows. Section II describes the system model. The PC problems under the perfect CSI, bounded CSI error, and stochastic CSI error scenarios are formulated in Section III. In Section IV, the PC problems under the different scenarios are further formulated as a uniform problem, which is then solved through the CSSCA technique. The simulation studies of the proposed algorithm are presented in Section V. Section VI concludes this paper.
The notation used in this paper is summarized in Table 1.

II. SYSTEM MODEL
As shown in Fig. 1, we consider the downlink of an OFDMA-based two-tier HCN consisting of a macro-cell and B small-cells, where each small-cell comprises a small-cell base station (SBS) and a small-cell user (SU), while the macro-cell consists of a macro-cell base station (MBS) and a macro-cell user (MU). The MBS and SBSs share a spectrum in the network, so the cross-tier interference greatly restricts the network performance. The set of all base stations (BSs), including both the MBS and SBSs, is denoted by $\mathcal{B} \triangleq \{0, 1, 2, \ldots, B\}$, where indexes 0 and $\{1, 2, \ldots, B\}$ correspond to the MBS and SBSs, respectively. We denote the set of all users, including the MU and SUs, by $\mathcal{K} \triangleq \{0, 1, 2, \ldots, B\}$, in which the index k represents the user that is associated with BS $k \in \mathcal{B}$. As previously mentioned, the users are classified into DSUs and NDSUs according to their sensitivity to delay. The sets of DSUs and NDSUs are denoted by $\mathcal{K}_1$ and $\mathcal{K}_2$, respectively. We assume that $|\mathcal{K}_1| = K_1$, $|\mathcal{K}_2| = K_2$, $\mathcal{K}_1 \cup \mathcal{K}_2 = \mathcal{K}$, and $\mathcal{K}_1 \cap \mathcal{K}_2 = \emptyset$.
The concept of slow adaptive resource allocation [19] is employed in this paper. Slow adaptive resource allocation updates the spectrum allocation every time window instead of every time slot, where each time window consists of T time slots. Thus, the computational complexity caused by the frequent spectrum allocation in fast resource allocation can be significantly reduced. We assume that the association between users and BSs and the spectrum allocation scheme remain unchanged within a time window [19]. In this paper, we focus on PC within a time window. In each slot within a time window, the system determines the transmission power of each BS subject to the performance requirements of the users. We also assume that PC is performed in a centralized manner by the control center and that the CSI of the entire network is collected at this control center.
Let g k,b and p b represent the complex channel gain between BS b and user k and the transmission power of BS b, respectively. Defining g k g k,0 , g k,1 , .

III. PROBLEM FORMULATION
This section gives the proposed scheme and the associated PC problems under perfect CSI, bounded CSI error and stochastic CSI error scenarios. The conventional real-time user-specific SINR constrained PC problem under the assumption of perfect CSI is [20] (P bm ) min where P max b is the maximum transmission power of BS b at the considered spectrum, k is the target SINR of user k. In each time slot, the system solves (P bm ) to determine the transmission power of each BS. (P bm ) satisfies the hard SINR requirements of both DSUs and NDSUs in each time slot. The constraint C2 can ensure DSUs' low latency requirements by forcing their SINR to exceed the preset threshold in each time slot. But for the NDSUs, such constraints are too harsh and will cause unnecessary energy consumption, since NDSUs have no stringent requirement of low latency. Furthermore, due to the channel fluctuation, (P bm )'s feasibility is not guaranteed, especially in the large scale networks under severe interference environment, which will cause significant experience drop at DSUs.
A continuously stable high SINR reduces the time needed to transmit data and thus achieves low-latency transmission. Therefore, this paper utilizes real-time instantaneous SINR constraints to characterize the maximum tolerable delay for DSUs. Considering that NDSUs require a high average data rate and are not sensitive to latency, we replace their original instantaneous SINR constraints with mean data rate constraints. On the one hand, the short-term performance of NDSUs can be lowered to preferentially guarantee DSUs' instantaneous SINR when the channel condition is poor. On the other hand, the short-term performance of NDSUs can be appropriately increased to ensure their long-term performance when the channel condition is good. Since this scheme fully considers the channel fluctuation, the feasible probability and energy efficiency can be significantly improved.
DSUs' SINR constraints are non-convex due to the fractional structure of the SINR. Fortunately, they can be equivalently rewritten as affine constraints through simple algebraic manipulation. As for the NDSUs, the mathematical expectation of the data rate, which is defined on the time-varying channel, is utilized to model their average data rate. Thus, the PC problem with perfect CSI under the proposed scheme can be formulated as (P ne ) min where $R_k^{\min}$ is the target data rate of user $k \in \mathcal{K}_2$ and W is the bandwidth of the considered subcarrier. For notational convenience, we define $\mathrm{SE}_k \triangleq R_k^{\min}/W$. From (P ne ), one can see that the channel gain is important information for PC. The channel gain is obtained by channel estimation. However, channel uncertainties are inevitable due to link delay, quantization error, estimation error, and measurement error in practical systems. The channel uncertainty may cause the actual communication performance to fall below the preset target. Therefore, we need to take the robustness of the algorithm design into account, which is investigated in the rest of this section.
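The algebraic step that turns a fractional SINR constraint into an affine one can be illustrated with a small sketch. The SINR definition is not reproduced in this section, so the code assumes the usual single-carrier form, SINR_k = G[k][k] p[k] / (sum over b != k of G[k][b] p[b] + noise), with G[k][b] standing for the power gain |g_{k,b}|^2. The function names and the two-cell numbers are hypothetical and only serve the illustration.

```python
def interference_plus_noise(k, p, G, noise):
    """Aggregate interference at user k plus receiver noise power."""
    return sum(G[k][b] * p[b] for b in range(len(p)) if b != k) + noise


def sinr(k, p, G, noise):
    """Assumed SINR of user k, served by BS k; G[k][b] = |g_{k,b}|^2."""
    return G[k][k] * p[k] / interference_plus_noise(k, p, G, noise)


def affine_slack(k, p, G, noise, target):
    """Affine in p; non-negative exactly when sinr(k, ...) >= target."""
    return G[k][k] * p[k] - target * interference_plus_noise(k, p, G, noise)


# Hypothetical two-cell example (all numbers are illustrative).
G = [[1.0, 0.1], [0.2, 0.8]]
noise = 0.01
target = 2.0
for p in ([1.0, 1.0], [0.1, 1.0]):
    for k in range(2):
        ok_fractional = sinr(k, p, G, noise) >= target
        ok_affine = affine_slack(k, p, G, noise, target) >= 0.0
        print(p, k, ok_fractional, ok_affine)
```

Because the denominator of the SINR is strictly positive, multiplying both sides of the constraint by it preserves the inequality, which is why the two tests agree for every power vector.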

A. POWER CONTROL UNDER BOUNDED CSI ERROR
In communication systems where the users employ a quantizer to quantize the CSI and feed it back to the transmitters, the channel uncertainty is bounded [21]. Let $\tilde{g}_{k,b} \in \mathbb{C}$ and $e_{k,b} \in \mathbb{C}$ with $|e_{k,b}| \le \upsilon_{k,b}$ represent the channel estimate and the associated bounded CSI error, respectively. The actual CSI is $g_{k,b} = \tilde{g}_{k,b} + e_{k,b}$ and the SINR obtained by user k can be expressed as which is now unpredictable because of the presence of the unknown $e_{k,b}$, $\forall k \in \mathcal{K}$, $\forall b \in \mathcal{B}$. Now, the task of PC is to determine the optimal transmission power of each BS within the range of the bounded channel uncertainty, which can be expressed as the following problem.
Due to the presence of constraint C5, the problem (5) is still NP-hard even if the stochastic constraint C4 is omitted [8]. Therefore, we need to handle the constraint C5 properly before solving problem (5). Worst-case optimization can be utilized to handle this difficulty [22]; its basic idea is to replace the original SINR with its lower bound. Apparently, the SINR γ k can be lower bounded by Substituting (6) into (5), we can get the PC problem under bounded CSI error:
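The worst-case idea can be sketched numerically. The bound (6) itself is not reproduced in this section, so the code below uses one standard bound as a stand-in: by the triangle inequality, the serving-link amplitude is at least max(|g_est| - ups, 0) and each interfering amplitude is at most |g_est| + ups. The function names, the channel values, and the single-carrier SINR form are assumptions made for illustration and may differ from the paper's expression.

```python
import cmath
import math
import random


def worst_case_sinr(k, p, g_est, ups, noise):
    """Lower bound on user k's SINR over all errors with |e_{k,b}| <= ups[k][b]."""
    signal_amp = max(abs(g_est[k][k]) - ups[k][k], 0.0)
    interference = sum((abs(g_est[k][b]) + ups[k][b]) ** 2 * p[b]
                       for b in range(len(p)) if b != k)
    return signal_amp ** 2 * p[k] / (interference + noise)


def actual_sinr(k, p, g, noise):
    """SINR of user k for one realization g of the true channel."""
    interference = sum(abs(g[k][b]) ** 2 * p[b]
                       for b in range(len(p)) if b != k)
    return abs(g[k][k]) ** 2 * p[k] / (interference + noise)


def random_bounded_error(radius, rng):
    """Complex error drawn inside the disc of the given radius."""
    return cmath.rect(radius * rng.random(), rng.uniform(0.0, 2.0 * math.pi))


# Hypothetical two-cell channel estimates and error radii.
rng = random.Random(1)
g_est = [[1.0 + 0.5j, 0.2 - 0.1j], [0.1 + 0.3j, 0.9 - 0.2j]]
ups = [[0.1, 0.1], [0.1, 0.1]]
p, noise = [1.0, 0.5], 0.01
bound = worst_case_sinr(0, p, g_est, ups, noise)
for _ in range(1000):
    g = [[g_est[k][b] + random_bounded_error(ups[k][b], rng) for b in range(2)]
         for k in range(2)]
    # Every admissible realization stays above the worst-case bound.
    assert actual_sinr(0, p, g, noise) >= bound - 1e-12
```

Enforcing the SINR target on the bound rather than on the unknown true SINR is what makes the resulting power allocation robust, at the cost of some conservatism.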

B. POWER CONTROL UNDER STOCHASTIC CSI ERROR
In communication systems where channel reciprocity holds between uplink and downlink and the CSI is obtained through uplink channel estimation, the CSI error is mainly caused by the interference and noise at the receivers [23] and can be modeled as a zero-mean complex Gaussian random variable [24]. In order to distinguish from the previous notation in bounded CSI error scenario, we represent the channel esti- The SINR obtained by user k can be expressed as Due to the presence of stochastic CSI error, the SINR γ k is now a random variable. The QoS enjoyed by users can no longer be described by the deterministic performance in (P bm ). In order to reduce the delay caused by link interruption and data retransmission, the SINR outage probability of the link between BSs and DSUs should be kept at a low enough level, i.e., where ε k ∈ (0, 1] is the maximum tolerable outage probability of user k. For NDSUs, the expectation of the achievable data rate in each time slot for the given channel estimate $\hat{g}_k$ is $\mathbb{E}_{e_k}[\log(1 + \gamma_k) \mid \hat{g}_k]$. Therefore, the long-term mean data rate constraints for NDSUs can be expressed as Now, the PC problem under stochastic CSI error can be formulated as We can observe that both the constraints C8 and C9 are intractable non-convex stochastic constraints and have no closed-form expression. In particular, the constraint C9 contains an inner expectation in each time slot and an outer one across time slots. The SINR outage probability in C8 and the inner expectation in C9 are defined on the stochastic CSI error, which cannot be measured directly. Consequently, P e k [γ k ≥ k ] and $\mathbb{E}_{e_k}[\log(1 + \gamma_k) \mid \hat{g}_k]$ cannot be learned online based on measured samples through the online-SAA [25] or CSSCA [18] techniques. To overcome this difficulty, our scheme is to find appropriate approximations of C8 and of $\mathbb{E}_{e_k}[\log(1 + \gamma_k) \mid \hat{g}_k]$ in C9.
Last but not least, these approximations must be conservative so that the users' requirements are guaranteed to be met. The appropriate approximations are given as follows.
Proposition 1 (Approximation of SINR Outage Probability Constraints): The SINR outage probability constraints of DSUs in each time slot, i.e., constraint C8, can be conservatively approximated by the set defined by the following inequalities: where k ∈ K 1 , y k is an auxiliary variable, Proof: The proof is given in Appendix A.
We can observe that the constraint C10 is an intersection of Lorentz cones and linear constraints. Therefore, C10 is convex. As pointed out in Appendix A, these constraints constitute a conservative approximation of C8, which means that the constraint C8 is satisfied whenever C10 is satisfied.
Proposition 2 (Approximation of Long-Term Mean Data Rate Constraints): The long-term mean data rate constraints of NDSUs, i.e., constraint C9, can be conservatively approximated by Proof: The proof is given in Appendix B. As stated in Appendix B, the lower-bound in (28) is very tight and can be viewed as a good approximation when the channel estimate is accurate enough.
Replacing C8 with C10 and C9 with C11, the original problem (11) can be conservatively approximated by where $\mathbf{y} \triangleq \{y_k\}_{\forall k \in \mathcal{K}_1}$ is an auxiliary variable.
Remark 1 (Convexity of the PC Problems): The constraints of (P ne ), (P be ) and (P se ) have a similar structure: several deterministic constraints and one stochastic constraint. The deterministic constraints are convex. The mathematical expectations in these stochastic constraints are all defined on functions with the structure $\log(1 + X/Y)$, so the stochastic constraints are non-convex and have no closed-form expression. Therefore, the difficulty of solving (P ne ), (P be ) and (P se ) mainly lies in C4, C7 and C11.
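To make the meaning of the SINR outage probability concrete, the following offline sketch estimates it by Monte Carlo simulation for a fixed power vector. It is only an illustration of the quantity that C8 constrains, not the Bernstein-type approximation of Proposition 1, and it cannot be run online since the CSI error is not observable. The additive error model g = g_hat + e with e drawn from CN(0, err_var), the single-carrier SINR form, and all names and numbers are assumptions.

```python
import random


def complex_gaussian(variance, rng):
    """Draw a zero-mean circularly symmetric complex Gaussian sample."""
    sigma = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def outage_probability(k, p, g_hat, err_var, noise, target,
                       trials=20000, seed=0):
    """Monte Carlo estimate of Pr[SINR_k < target] given the estimate g_hat."""
    rng = random.Random(seed)
    n_bs = len(p)
    outages = 0
    for _ in range(trials):
        # Power gains of the true channels under the assumed error model.
        gains = [abs(g_hat[k][b] + complex_gaussian(err_var, rng)) ** 2
                 for b in range(n_bs)]
        interference = sum(gains[b] * p[b] for b in range(n_bs) if b != k)
        if gains[k] * p[k] / (interference + noise) < target:
            outages += 1
    return outages / trials


# Hypothetical two-cell channel estimates.
g_hat = [[1.0 + 0.5j, 0.2 - 0.1j], [0.1 + 0.3j, 0.9 - 0.2j]]
p, noise = [1.0, 0.5], 0.01
estimate = outage_probability(0, p, g_hat, err_var=0.05, noise=noise, target=2.0)
print("estimated outage probability:", estimate)
```

A conservative approximation such as C10 must keep this probability at or below the tolerated level for every power vector it admits.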

IV. THE PROPOSED POWER CONTROL ALGORITHM
In this section, the PC problems under the perfect and imperfect CSI scenarios are further formulated as a uniform problem. A CSSCA based stochastic PC algorithm is then proposed to solve it. This algorithm adjusts the PC scheme according to the real-time CSI in each time slot. If the channel state allows, the system serves all users with minimal transmission power. Otherwise, it serves NDSUs on a best-effort basis while still satisfying DSUs' requirements.

A. CONVEX APPROXIMATION OF MEAN DATA RATE
Based on the observations stated in Remark 1, the problems (P ne ), (P be ) and (P se ) can be written in the following uniform problem.
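where $i \in \mathcal{I}$ with $\mathcal{I} \triangleq \{ne, be, se\}$, and z is the auxiliary variable: z = y when i = se and z = ∅ otherwise. $\mathcal{X}_i$ is the deterministic constraint set corresponding to the demand of DSUs: $\mathcal{X}_{ne} \triangleq \{\mathbf{p} \in \mathbb{R}_+^{B+1} : \mathrm{C1}, \mathrm{C3}\}$, $\mathcal{X}_{be} \triangleq \{\mathbf{p} \in \mathbb{R}_+^{B+1} : \mathrm{C1}, \mathrm{C6}\}$, and $\mathcal{X}_{se} \triangleq \{\{\mathbf{p}, \mathbf{z}\} \in \mathbb{R}_+^{B+1+K_1} : \mathrm{C1}, \mathrm{C10}\}$. The random states are $\xi_{ne,k} \triangleq h_k$, $\xi_{be,k} \triangleq h_k$, and $\xi_{se,k} \triangleq \hat{h}_k$. The definitions of the function $w_{i,k}(\mathbf{p}; \xi_{i,k})$ for each $i \in \mathcal{I}$ are given in Appendix C for conciseness.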
The function $w_{i,k}(\mathbf{p}; \xi_{i,k})$ is non-convex for a fixed $\xi_{i,k}$, and the expectation function $f_{i,k}(\mathbf{p})$ has no closed-form expression. Therefore, (P uf ) is a non-convex stochastic problem. We focus on designing an efficient algorithm to find a stationary point of it. One crucial observation is that the mathematical expectation in the stochastic constraint in (P uf ) is defined on measurable random parameters, i.e., the channel estimate $\xi_{i,k}$. Thus $\mathbb{E}_{\xi_{i,k}}[w_{i,k}(\mathbf{p}; \xi_{i,k})]$ can be learned online from the current and historical channel samples through stochastic optimization techniques such as online-SAA and CSSCA. Compared to the CSSCA technique, online-SAA usually requires many more channel samples to obtain an accurate approximation of the stochastic constraints [18], so CSSCA is utilized for solving (P uf ).
The CSSCA technique is based on solving a sequence of convex optimization problems obtained by replacing the non-convex stochastic constraint functions in the original problem with convex surrogate functions. For a fixed $\xi_{i,k}$, the function $w_{i,k}(\mathbf{p}; \xi_{i,k})$ is non-convex. Fortunately, as stated in Appendix C, this function is the sum of a convex function and a concave one. Based on this observation, one can construct a structured surrogate function for $f_{i,k}(\mathbf{p})$ by preserving the convex part and linearizing the concave part of $w_{i,k}(\mathbf{p}; \xi_{i,k})$. The work in [26] points out that such a structured surrogate function enables a faster convergence speed and a more accurate solution.
Proposition 3 (Convex Surrogate Function of Mean Data Rate): Given the approximation center p (n) and the current channel estimate ξ (n) i,k at the n-th CSSCA iteration (the n-th time slot in current time window), the structured surrogate witĥ i,k = 0, and ρ (n) ∈ (0, 1] is a sequence to be properly chosen. The expressions of the above terms of each i ∈ I are given in Appendix D for conciseness. The basic idea of the convex surrogate function is to learn the expectation E ξ i,k [w i,k (p; ξ i,k )] online from the historical PC results and channel estimate. Specifically speaking, the surrogate function is constructed through weighted summing the current and historical approximation functions together. And the weight of the current approximation function is ρ (n) , and that of the historical results is 1 − ρ (n) . The right hand side of the first line in (16) is the accumulation of historical data, where the scalar w ; ξ i,k )] and the vector f i,k ). The second line of (17) is the linearization of the non-convex part of w i,k (p; ξ (n) i,k ) around the point p (n) .
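The weighting described above, with weight ρ(n) on the current approximation and 1 − ρ(n) on the accumulated history, can be sketched on a scalar quantity. This is only an illustration of the recursive averaging, not the surrogate function of (16): the samples stand in for evaluations of the per-slot function at a fixed power vector, the step-size sequence is the one quoted in the simulation setup (ρ(1) = 1 and ρ(n) = 2/(2+n)^0.6 for n ≥ 2), and the function names are hypothetical.

```python
import random


def rho(n):
    """Surrogate step size: rho(1) = 1, rho(n) = 2 / (2 + n)^0.6 for n >= 2."""
    return 1.0 if n == 1 else 2.0 / (2.0 + n) ** 0.6


def recursive_average(samples):
    """Track an expectation online by weighting history against new samples."""
    estimate = 0.0
    history = []
    for n, sample in enumerate(samples, start=1):
        weight = rho(n)
        # Weight (1 - rho) on the accumulated history, rho on the new sample.
        estimate = (1.0 - weight) * estimate + weight * sample
        history.append(estimate)
    return history


# Noisy observations of a quantity whose true mean is 1.0 (illustrative).
random.seed(0)
samples = [1.0 + random.gauss(0.0, 0.5) for _ in range(2000)]
track = recursive_average(samples)
print("final estimate:", track[-1])
```

Because ρ(n) decays slowly, recent samples keep a noticeable weight, which lets the estimate follow the moving approximation center while still averaging out the channel randomness.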

B. THE CSSCA BASED POWER CONTROL ALGORITHM
Replacing the function $f_{i,k}(\mathbf{p})$ in (P uf ) with $\bar{f}^{(n)}_{i,k}(\mathbf{p})$, we get the convex approximation of (P uf ) at the n-th CSSCA iteration. However, due to the channel fluctuation, this approximation is not necessarily feasible, i.e., the system may not be able to satisfy the needs of all users at present. The requirements of DSUs are inelastic: short-term service interruptions may cause a significant drop in DSUs' experience. In contrast, the requirements of NDSUs are more elastic: temporary performance degradation has little impact on NDSUs' experience. Therefore, we relax NDSUs' performance requirements to improve the feasible probability of DSUs' performance constraints. This purpose can be achieved by solving where $u^*$ represents the difference between NDSUs' target data rate and their maximum achievable data rate under the current channel condition. $\bar{\mathbf{p}}^{(n)}$ in (18) is the power allocation that maximizes NDSUs' data rate in the current time slot to the greatest extent while meeting the needs of DSUs. If $u^* \ge 0$, the current channel condition is poor and the data rate requirements of NDSUs cannot be met at present. In this case, the system determines the transmission power of each BS according to $\bar{\mathbf{p}}^{(n)}$. If $u^* < 0$, the current channel condition is good, and the requirements of all users can be met at present. In order to achieve the objective of the original problem (P uf ), i.e., minimizing the total transmission power, the system further solves to determine the transmission power of each BS in the current time slot. The problems (18) and (19) are convex and thus can be optimally solved by existing convex optimization solvers such as CVX [27]. Finally, the approximation center for the next CSSCA iteration (the next time slot) is updated according to where the step size $\beta^{(n)} \in (0, 1]$ is a sequence to be properly chosen. The term $\bar{\mathbf{p}}^{(n)} - \mathbf{p}^{(n)}$ is the updating direction of the transmission power p, which plays the role of the descent direction in a general descent method.
Now, the CSSCA based power control algorithm is described in Algorithm 1. In this algorithm, the choice of the length of a time window is very important. On the one hand, a time window should be long enough so that NDSUs' mean data rate can converge to the target value. On the other hand, a long time window will reduce the system's ability to support user mobility. Therefore, T should be as short as possible while ensuring the convergence of the long-term average data rate. Simulation results show that Algorithm 1 demonstrates good scalability, so the length of a time window can be determined by field test.
i,k (p) according to (16).
5: Solve problem (18) to get $\bar{\mathbf{p}}^{(n)}$ and $u^*$.
6: if $u^* \le 0$ then
7: Solve problem (19) to get $\bar{\mathbf{p}}^{(n)}$.
8: end if
9: Update $\mathbf{p}^{(n+1)}$ according to (20).
10: Estimate the channels ξ
11: Set n ← n + 1.
12: until n ≥ T.
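The per-slot control flow of Algorithm 1 can be sketched as follows. The two convex subproblems (18) and (19) are not reproduced in this section, so they are passed in as hypothetical callables (solve_relaxed and solve_min_power) that would wrap a convex solver in practice and would also carry the surrogate update of (16). The center update is written in the standard CSSCA form p(n+1) = (1 − β(n)) p(n) + β(n) p̄(n), which is an assumption consistent with the described updating direction; the branch condition follows the algorithm listing (u* ≤ 0).

```python
def beta(n):
    """Center step size: beta(1) = 1, beta(n) = 2 / (2 + n)^0.61 for n >= 2."""
    return 1.0 if n == 1 else 2.0 / (2.0 + n) ** 0.61


def run_time_window(T, p_init, estimate_channel, solve_relaxed, solve_min_power):
    """Run T slots; return the final approximation center and applied powers."""
    p = list(p_init)
    applied = []
    for n in range(1, T + 1):
        xi = estimate_channel(n)
        # Stand-in for problem (18): best effort for NDSUs under DSU constraints.
        p_bar, u_star = solve_relaxed(p, xi, n)
        if u_star <= 0:
            # Stand-in for problem (19): all targets reachable, minimize power.
            p_bar = solve_min_power(p, xi, n)
        applied.append(list(p_bar))
        step = beta(n)
        # Assumed form of update (20): move the center toward p_bar.
        p = [(1.0 - step) * a + step * b for a, b in zip(p, p_bar)]
    return p, applied


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_channel(n):
    return 1.0 if n % 2 else 0.2


def toy_relaxed(p, xi, n):
    return [1.0, 1.0], (-1.0 if xi > 0.5 else 1.0)


def toy_min_power(p, xi, n):
    return [0.5, 0.5]


final_p, applied = run_time_window(4, [1.0, 1.0], toy_channel,
                                   toy_relaxed, toy_min_power)
print(applied)
```

In the toy run, slots with a good channel apply the minimum-power solution and slots with a poor channel fall back to the relaxed solution, mirroring the two branches of the algorithm.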

C. CONVERGENCE RESULTS
In this subsection, we describe the convergence of Algorithm 1. The convergence of the CSSCA algorithm has been established in [18] and [28]. For completeness, this subsection states the properties of the step sizes and checks whether the surrogate functions in this paper satisfy the convergence conditions.
First of all, we make the following assumptions on the problem (P uf ) and the step sizes.
Assumption 1 (Feasibility of the original problem): Let p * F be any stationary point of the following feasibility problem: We assume that f i,k p * F ≤ 0, ∀k ∈ K 2 . This assumption ensures that problem (P uf ) is feasible. It should be mentioned that the system can check the problem's feasibility by solving the corresponding feasibility problem [29]. If the problem is infeasible for multiple consecutive time slots, the problem can be seen as infeasible. In this case, the system can update the resource allocation to start a new round of PC. On the other hand, if the problem is feasible for multiple consecutive time slots, the problem can be seen as feasible.
Assumption 2 (Properties of the step sizes): = 0. We can observe that the functionŵ i,k p, q; ξ i,k in (17) has the following properties described in Lemma 1.
Lemma 1: For all i ∈ I, k ∈ K 2 , we have 1)ŵ i,k p, p; ξ i,k = w i,k p; ξ i,k and ∇ pŵi,k p, p; ξ i,k 2)ŵ i,k p, q; ξ i,k is uniformly strongly convex in both p and q.
3) For any ξ i,k ∈ i,k and q ∈ X i , the function w i,k p, q; ξ i,k is Lipschitz continuous in both p and q.
4) The functionŵ i,k p, q; ξ i,k , its derivative, and its second order derivative w.r.t. p are uniformly bounded.
3) For any p ∈ X i , the functionf (n) i,k (p), its derivative, and its second order derivative are uniformly bounded.
Proof: The proof is given in Appendix E.

Lemma 3 (Asymptotic Consistency of the Surrogate Function):
For all i ∈ I and k ∈ K 2 , we have Under Lemma 3, Algorithm 1 will converge to a stationary point almost surely ( [18], Theorem 1). A formal statement about this convergence is given in the following theorem.
Theorem 1 (Convergence of Algorithm 1): For any subsequence $\{\mathbf{p}^{(n_j)}\}_{j=1}^{\infty}$ converging to a limit point $\mathbf{p}^*$, if the Slater condition is satisfied at $\mathbf{p}^*$, then $\mathbf{p}^*$ is a stationary point of problem (P uf ) almost surely.

D. SIGNALLING OVERHEAD AND COMPUTATIONAL COMPLEXITY
Centralized PC is considered in this paper. In each time slot, every BS transmits its CSI to the control center, which requires the transmission of one complex number. After the current power allocation scheme is determined, the control center informs each BS of its transmission power, which requires the transmission of one real number. Therefore, the signalling overhead between a BS and the control center is O(1) per time slot.
Since the length of a time window is fixed, the complexity of the proposed algorithm depends on the complexity of solving the convex approximation in each time slot. The primal-dual interior point algorithm is utilized to solve these convex approximations. Under the perfect CSI and bounded CSI error scenarios, the dimension of the optimization variable is (B + 1) × 1 and the number of constraints is B + 1. The dimension of the coefficient matrix of the Newton equation is (B + 1) × (B + 1). So the complexity of calculating Newton's direction is $O((3(B+1))^3)$. In addition, the interior point algorithm requires $O(\log(1/\varepsilon))$ iterations when the optimization accuracy is ε. Therefore, the computational complexity of the proposed algorithm under the perfect CSI and bounded CSI error scenarios is $O((3(B+1))^3 \log(1/\varepsilon))$ per time slot. As for the stochastic CSI error scenario, the dimension of the optimization variable is $(B + 1 + K_1) \times 1$ and the number of constraints is $B + 1 + BK_1$. Therefore, the computational complexity of the proposed algorithm under the stochastic CSI error scenario is $O((3(B+1) + 2BK_1)^3 \log(1/\varepsilon))$ per time slot.

V. SIMULATION STUDIES
In this section, we demonstrate the performance of our proposed PC scheme through numerical simulations.

A. SYSTEM SETUP
The radius of the macro-cell is 1 km, the maximum transmission power of the MBS is 40 dBm, and that of the SBS is 20 dBm. The noise power at the user equipment is −95 dBm. The bandwidth W = 20 kHz and $R_k^{\min}$ = 20 Kb/s, $\forall k \in \mathcal{K}_2$, are considered. The radius of the small-cell is 100 m. The MBS is located at the center of the area and the SBSs are evenly distributed over this area. The distance between an SBS and its associated SU is uniformly distributed in (0, 100] m, and that between the MU and the MBS is uniformly distributed in (0, 500] m. The path loss $PL_{k,b}$ is defined by $10 \log_{10} PL_{k,b} = -34.5 - 38 \log_{10} d_{k,b}$ [5], where $d_{k,b}$ is the distance between BS b and user k. The small scale channel gain k,b ∼ CN (0, 1). Thus the channel gain between BS b and user k is g k,b = PL k,b k,b . Throughout this simulation, the parameter $\tau_k = 1 \times 10^{-8}$. The step size of the surrogate function is $\rho^{(1)} = 1$ and $\rho^{(n)} = 2/(2+n)^{0.6}$, n ≥ 2. The update step size of the CSSCA approximation center is $\beta^{(1)} = 1$ and $\beta^{(n)} = 2/(2+n)^{0.61}$, n ≥ 2. The simulation is conducted in MATLAB 2018a. The convex approximation problem at each CSSCA iteration is solved through the CVX toolbox [27]. Every data point in Figs. 4, 5, 8, 9, 11, 13, and 14 is the average of 10 simulation experiments with 200 channel realizations per experiment. The positions of the users are updated in each experiment.
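A quick link-budget sketch helps relate the setup parameters to each other. It evaluates the stated path loss model and converts the dBm figures, assuming the distance is expressed in metres (the unit is not stated explicitly here) and ignoring small-scale fading and interference. The function names are hypothetical.

```python
import math


def path_loss_db(distance_m):
    """Path loss model from the setup: -34.5 - 38 log10(d), d assumed in metres."""
    return -34.5 - 38.0 * math.log10(distance_m)


def dbm_to_watt(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def mean_rx_snr_db(tx_dbm, distance_m, noise_dbm=-95.0):
    """Average received SNR in dB, ignoring fading and interference."""
    return tx_dbm + path_loss_db(distance_m) - noise_dbm


# An SBS at full power (20 dBm) serving a user at the small-cell edge (100 m).
print("path loss at 100 m [dB]:", path_loss_db(100.0))
print("SBS max power [W]:", dbm_to_watt(20.0))
print("edge SNR [dB]:", mean_rx_snr_db(20.0, 100.0))
```

Under these assumptions a cell-edge small-cell user has only a few dB of margin over the noise floor before interference is counted, which is consistent with the observation that instantaneous SINR targets can become infeasible under poor channel conditions.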

B. PERFECT CSI SCENARIO
The benchmark scheme in this scenario is the problem (P bm ). The target SINR for NDSUs in the benchmark is fixed at k = 0 dB, ∀k ∈ K 2 . Fig. 2 records a PC process of 150 channel realizations under perfect CSI scenario. The first subfigure records the real-time SINR of a DSU. We can see that the instantaneous SINR of the DSU exactly reaches the target value in each time slot. The second subfigure records the real-time data rate of a NDSU, in which the mean data rate is the average of the real-time data rate across the whole 150 channel realizations. Although the instantaneous data rate of the NDSU oscillates with the fluctuation of channel gain, its average data rate still reaches the target value. In a word, both DSUs' instantaneous SINR and NDSUs' mean data rate requirements are achieved.  The convergence process of NDSUs' mean data rate under perfect CSI scenario is shown in Fig. 3. The ''mean data rate'' at the i-th channel realization is the average of the data rate from the first to the i-th channel realization, i.e., i r i i where r i is the data rate at the i-th channel realization. In each subfigure, NDSUs' mean data rate reaches the preset target value at almost the same number of iterations under different number of small-cells. So, the scale of the network (i.e., number of cells) has little effect on the convergence rate of the proposed algorithm under the same target SINR. In other word, Algorithm 1 demonstrates good scalability. There are two reasons for this: firstly, the path loss of the channel gain is exponentially attenuated, and the path loss factor is greater than 2, the interference suffered by the users mainly comes from the base station around them; secondly, the small-scale fading of the channel gain is ergodicity, the surrogate function can quickly converge to the actual expectation function. Fig. 4 shows the relationship between the feasible probability of DSUs' instantaneous SINR constraints and the number of SBSs. 
The feasible probability is the probability that the instantaneous constraints of the DSUs are feasible. In each time slot, we solve the feasibility problem $\min_{p} \{0 : \{p, z\} \in \mathcal{X}_i\}$ [29] to check whether the PC problem is feasible. The label "x% DSUs" means that the MU and x% of the SUs are DSUs and the remaining users are NDSUs. On the one hand, the feasible probability decreases as the proportion of DSUs and the target SINR increase. On the other hand, the feasible probability of Algorithm 1 is always higher than that of the benchmark scheme. Finally, it is important to point out that NDSUs have lower priority than DSUs. In each time slot, PC can only be performed when the DSUs' constraints are feasible, so the NDSUs' target mean data rate does not affect the feasible probability.
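As a rough illustration of a per-slot feasibility check, the sketch below uses the classic closed-form test for fixed SINR targets rather than the convex feasibility problem solved in the paper. It assumes one user per BS (user k is served by BS k) and ignores the auxiliary variable z; the function name `sinr_feasible` and all argument names are hypothetical.

```python
import numpy as np

def sinr_feasible(G, gamma, sigma2, p_max):
    """Check whether fixed SINR targets are achievable within power limits.

    Assumptions (not from the paper): user k is served by BS k only.
    G[k, b]   : channel power gain from BS b to user k
    gamma[k]  : target SINR of user k (linear scale)
    sigma2[k] : noise power at user k
    p_max[b]  : maximum transmit power of BS b
    Returns (feasible, p), where p is the component-wise minimal power vector.
    """
    G = np.asarray(G, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    d = np.diag(G).copy()
    # Normalized interference matrix F and noise vector u, so that the
    # SINR constraints read p >= F p + u.
    F = gamma[:, None] * G / d[:, None]
    np.fill_diagonal(F, 0.0)
    u = gamma * np.asarray(sigma2, dtype=float) / d
    # The targets are achievable with finite power iff the spectral radius < 1.
    if np.max(np.abs(np.linalg.eigvals(F))) >= 1.0:
        return False, None
    p = np.linalg.solve(np.eye(u.size) - F, u)
    # The minimal solution must also respect the per-BS power limits.
    return bool(np.all(p <= np.asarray(p_max, dtype=float) + 1e-12)), p
```

Counting the fraction of time slots in which such a check succeeds gives an empirical estimate of the feasible probability.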
The average total transmission power of the proposed scheme is shown in Fig. 5. The total transmission power is the sum of the transmission power of each BS. As the proportion of DSUs and the target SINR increase, the total transmission power of the system increases. However, it is always lower than that of the benchmark scheme. Combining the observations in Figs. 4 and 5, we conclude that the proposed algorithm achieves a much higher feasible probability for DSUs' instantaneous constraints and lower energy consumption than the conventional scheme.
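The running mean data rate plotted in the convergence curves of this subsection can be computed as in the following minimal sketch. The function name `running_mean` and the sample rates are illustrative assumptions, not values from the simulations.

```python
import numpy as np

def running_mean(rates):
    """Return the mean of rates[0..i] for every index i (hypothetical helper)."""
    rates = np.asarray(rates, dtype=float)
    # Cumulative sum divided by the number of realizations seen so far.
    return np.cumsum(rates) / np.arange(1, rates.size + 1)

# Example with made-up per-slot data rates.
print(running_mean([2.0, 4.0, 6.0]))
```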

C. BOUNDED CSI ERROR SCENARIO
In this subsection, the complex channel gain is $g_{k,b} = PL_{k,b}(\tilde{g}_{k,b} + e_{k,b})$, where $\tilde{g}_{k,b}$ and $e_{k,b}$ are the estimate of the small-scale channel gain and the associated bounded CSI error, respectively. The error bound $|e_{k,b}|^2 \le 1/100$ is considered. In the benchmark scheme for this scenario, the target SINR for NDSUs is fixed at 0 dB for all $k \in \mathcal{K}_2$. Fig. 6 records a PC process over 150 channel realizations under the bounded CSI error scenario. It can be observed that the DSUs' real-time SINR in each time slot exceeds the target value and the NDSUs' average data rate is slightly higher than the associated target value. This is because of the worst-case optimization in $(P_{be})$, which leads to conservative PC results.
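To illustrate why worst-case optimization is conservative, the sketch below computes a simple lower bound on the SINR over all errors within a given radius and compares it with the SINR of sampled errors. The bound comes from the triangle inequality and is only an illustration; it is not necessarily the bound used in $(P_{be})$. The path loss is assumed to be folded into the estimates, and the function names are hypothetical.

```python
import numpy as np

def worst_case_sinr(g_est, p, k, sigma2, radius):
    """Lower bound on the SINR of user k when every |e_b| <= radius.

    g_est[b]: estimated complex gain from BS b to user k (path loss folded in,
              an assumption of this sketch); user k is served by BS k.
    """
    g_est = np.asarray(g_est, dtype=complex)
    p = np.asarray(p, dtype=float)
    mag = np.abs(g_est)
    # Smallest possible desired-signal gain, largest possible interfering gains.
    signal = max(mag[k] - radius, 0.0) ** 2 * p[k]
    mask = np.arange(p.size) != k
    interference = np.sum((mag[mask] + radius) ** 2 * p[mask])
    return signal / (interference + sigma2)

def actual_sinr(g, p, k, sigma2):
    """SINR of user k for the true complex gains g."""
    h = np.abs(np.asarray(g, dtype=complex)) ** 2
    p = np.asarray(p, dtype=float)
    return h[k] * p[k] / (np.dot(h, p) - h[k] * p[k] + sigma2)
```

With `radius = 0.1`, which corresponds to the bound $|e_{k,b}|^2 \le 1/100$, any error realization yields an SINR at or above the bound, so a power allocation designed against the bound overshoots the target in practice.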
The convergence of NDSUs' mean data rate under the bounded CSI error scenario is shown in Fig. 7. Since the original SINR is replaced by its lower bound in $(P_{be})$, the value to which the average data rate converges is slightly higher than the target value. We can also observe that the number of SBSs has little effect on the convergence speed. Therefore, the proposed scheme retains good scalability when bounded CSI error is present. Fig. 8 shows the relationship between the feasible probability of DSUs' instantaneous SINR constraints and the number of SBSs. The average total transmission power is shown in Fig. 9. It can be seen that the proposed scheme achieves a much higher feasible probability for DSUs with less energy consumption than the conventional PC scheme under the same performance constraints. In other words, the proposed scheme has higher energy efficiency.

D. STOCHASTIC CSI ERROR SCENARIO
The simulation results under stochastic CSI error are given in this subsection. The complex channel gain is $g_{k,b} = PL_{k,b}(\hat{g}_{k,b} + e_{k,b})$, where $\hat{g}_{k,b}$ and $e_{k,b} \sim \mathcal{CN}(0, \frac{1}{400})$ are the estimate of the small-scale channel gain and the associated stochastic CSI error, respectively. The maximum tolerable SINR outage probability $\varepsilon_k = 0.01$ is considered. In the benchmark scheme for this scenario, the target SINR and outage probability for NDSUs are fixed at 0 dB and $\varepsilon_k = 0.1$ for all $k \in \mathcal{K}_2$, respectively. Fig. 10 records a PC process over 150 channel realizations under stochastic CSI error. The DSUs' instantaneous SINR constraints are violated in some time slots. This is because the probabilistic constraints C8 used in this scenario allow outage to occur within a certain probability. The outage probability under different system setups is shown in Fig. 11. The outage probability increases with the number of small-cells and the target SINR. We can also observe that the actual outage probability is much lower than the maximum tolerable value, which means that the DSUs' instantaneous performance is guaranteed with a large margin. This is because the constraint C10 in $(P_{se})$ is a very conservative approximation of the original SINR outage probability constraint. On the negative side, this conservatism introduces unnecessary energy consumption and reduces the feasible probability of the PC problem. To overcome this drawback, a more accurate approximation of C8 needs to be found, which is left for future study.
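The actual outage probability of a given power allocation can be estimated by Monte Carlo sampling of the CSI error, as in the minimal sketch below. It assumes that the path loss is folded into the estimates and that user k is served by BS k; the function name `empirical_outage` and its arguments are hypothetical.

```python
import numpy as np

def empirical_outage(g_est, p, k, sigma2, target_sinr,
                     err_var=1.0 / 400.0, n=20000, seed=0):
    """Estimate Pr{SINR_k < target_sinr} when e_b ~ CN(0, err_var).

    g_est[b]: estimated complex gain from BS b to user k (assumption:
              path loss already included).
    """
    rng = np.random.default_rng(seed)
    g_est = np.asarray(g_est, dtype=complex)
    p = np.asarray(p, dtype=float)
    shape = (n, p.size)
    # Circularly symmetric errors: real and imaginary parts each carry err_var / 2.
    e = np.sqrt(err_var / 2.0) * (rng.standard_normal(shape)
                                  + 1j * rng.standard_normal(shape))
    h = np.abs(g_est + e) ** 2
    signal = h[:, k] * p[k]
    interference = h @ p - signal
    sinr = signal / (interference + sigma2)
    return float(np.mean(sinr < target_sinr))
```

Comparing this estimate with the tolerated value $\varepsilon_k$ shows how much margin a conservative approximation of the outage constraint leaves.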
The convergence of NDSUs' mean data rate under stochastic CSI error is shown in Fig. 12. The proposed scheme again demonstrates good scalability. The feasible probability and the average total transmission power are shown in Figs. 13 and 14, respectively, which show that the proposed scheme under stochastic CSI error achieves higher energy efficiency together with a higher feasible probability for DSUs' instantaneous SINR constraints.

VI. CONCLUSION
Taking various user requirements into account, this paper has proposed a PC scheme that minimizes the total transmission power based on stochastic optimization. We have used instantaneous SINR constraints to ensure the success of data transmission in each time slot, which meets the DSUs' low latency requirements, and mean data rate constraints to meet the NDSUs' need for a high average data rate. Under this scheme, we have formulated the PC problems under the perfect CSI, bounded CSI error and stochastic CSI error scenarios as a uniform problem. Because of the NDSUs' mean data rate constraints, the uniform PC problem is a non-convex stochastic constrained problem. The recently proposed CSSCA technique has been utilized to handle this problem. We have also presented the signalling overhead and computational complexity of the proposed algorithm. Extensive simulations have shown that the proposed scheme can significantly improve the feasible probability of DSUs' instantaneous SINR constraints while maintaining a high average data rate for NDSUs and reducing energy consumption at the same time. Additionally, the proposed scheme demonstrates good scalability, which makes it applicable to large-scale HCNs.

APPENDIXES
APPENDIX A
Let $e_k = C_k^{1/2} v_k$ with $v_k \sim \mathcal{CN}(0, I)$. We have the following lemma about quadratic functions of Gaussian random variables.
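Lemma 4 [17]: Let $G = v^H Q v + 2\,\mathrm{Re}\{v^H u\}$, where $Q \in \mathbb{H}^{B+1}$ is a complex Hermitian matrix, $u \in \mathbb{C}^{B+1}$, and $v \sim \mathcal{CN}(0, I)$. Then for any $\delta \ge 0$, we have
$$\Pr\left\{ G \ge \mathrm{Tr}(Q) - \sqrt{2\delta}\sqrt{\|Q\|_F^2 + 2\|u\|^2} - \delta s^+(Q) \right\} \ge 1 - e^{-\delta},$$
where $s^+(Q) = \max\{\lambda_{\max}(-Q), 0\}$, in which $\lambda_{\max}(-Q)$ denotes the maximum eigenvalue of the matrix $-Q$, and $\|\cdot\|_F$ denotes the matrix Frobenius norm.
For user $k \in \mathcal{K}_1$, the probabilistic constraint C8 can be rewritten as (24). By Lemma 4, (24) can be further represented as (25). Defining $q \triangleq \mathrm{diag}\{Q\} = C_k B_k p$, we have
$$\mathbf{1}^T C_k B_k p - \sqrt{2\delta_k}\sqrt{\|q\|_2^2 + 2\|u\|^2} - \delta_k y_k \ge \sigma_k^2 - \hat{h}_k^T B_k p,$$
$$y_k - c_{k,b}^2 p_b \ge 0, \quad \forall b \in \mathcal{B},\ b \ne k, \quad (26)$$
which can be further expressed as (27). Substituting $u = C_k^{1/2} B_k \hat{G}_k p$ and $q = C_k B_k p$ into (25), we obtain the constraints C10.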
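A quick numerical sanity check of this kind of result is sketched below. It assumes that Lemma 4 takes the commonly used Bernstein-type form $\Pr\{G \ge \mathrm{Tr}(Q) - \sqrt{2\delta}\sqrt{\|Q\|_F^2 + 2\|u\|^2} - \delta s^+(Q)\} \ge 1 - e^{-\delta}$ with $G = v^H Q v + 2\,\mathrm{Re}\{v^H u\}$. The matrix, vector and function names are illustrative assumptions.

```python
import numpy as np

def bernstein_threshold(Q, u, delta):
    """Right-hand threshold of the assumed Bernstein-type inequality."""
    Q = np.asarray(Q, dtype=complex)
    u = np.asarray(u, dtype=complex)
    # s^+(Q) = max{lambda_max(-Q), 0}; eigvalsh is valid because Q is Hermitian.
    s_plus = max(float(np.max(np.linalg.eigvalsh(-Q))), 0.0)
    spread = np.sqrt(np.linalg.norm(Q, "fro") ** 2 + 2.0 * np.linalg.norm(u) ** 2)
    return float(np.trace(Q).real - np.sqrt(2.0 * delta) * spread - delta * s_plus)

def empirical_prob(Q, u, delta, n=50000, seed=0):
    """Monte Carlo estimate of Pr{G >= threshold} for v ~ CN(0, I)."""
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=complex)
    u = np.asarray(u, dtype=complex)
    dim = u.size
    v = (rng.standard_normal((n, dim))
         + 1j * rng.standard_normal((n, dim))) / np.sqrt(2.0)
    # G = v^H Q v + 2 Re{v^H u}, evaluated for every sample row.
    quad = np.einsum("ni,ij,nj->n", v.conj(), Q, v).real
    lin = 2.0 * (v.conj() @ u).real
    return float(np.mean(quad + lin >= bernstein_threshold(Q, u, delta)))
```

For example, choosing $\delta = \ln 100$ targets a probability of at least 0.99; the empirical value is typically well above that, which reflects the conservatism of the resulting constraint noted in the simulation section.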

APPENDIX B
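Proposition 2 is equivalent to
$$\mathbb{E}_{e_k}\left[\log(1+\gamma_k) \mid \hat{g}_k\right] \ge \log\left(1 + \frac{\hat{h}_{k,k}\, p_k}{(\hat{h}_k + C_k \mathbf{1})^T A_{-k}\, p + \sigma_k^2}\right), \quad (28)$$
which gives a lower bound on $\mathbb{E}_{e_k}\left[\log(1+\gamma_k) \mid \hat{g}_k\right]$. To prove Proposition 2, we only need to prove (28).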
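First of all, we have the following lemma about logarithmic functions.
Lemma 5: For any $\gamma > 0$ and $\hat{\gamma} > 0$,
$$\log(1+\gamma) \ge \alpha \log(\gamma) + \beta, \quad (29)$$
where $\alpha = \frac{\hat{\gamma}}{1+\hat{\gamma}}$ and $\beta = \log(1+\hat{\gamma}) - \frac{\hat{\gamma}}{1+\hat{\gamma}}\log(\hat{\gamma})$.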
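The logarithmic lower bound used in this appendix can be checked numerically. The sketch below assumes the bound has the form $\log(1+\gamma) \ge \alpha \log(\gamma) + \beta$, with $\alpha$ and $\beta$ built from an expansion point $\hat{\gamma}$ in the same way as $\alpha_k$ and $\beta_k$ in this appendix. The function name `log_lower_bound` is hypothetical.

```python
import numpy as np

def log_lower_bound(gamma, gamma_hat):
    """Evaluate alpha * log(gamma) + beta for the expansion point gamma_hat."""
    alpha = gamma_hat / (1.0 + gamma_hat)
    beta = np.log1p(gamma_hat) - alpha * np.log(gamma_hat)
    return alpha * np.log(gamma) + beta

# Compare the bound with log(1 + gamma) on a grid of SINR values.
grid = np.logspace(-3, 3, 200)
gap = np.log1p(grid) - log_lower_bound(grid, 1.0)
print(gap.min())  # the gap stays nonnegative and vanishes near gamma = gamma_hat
```

The gap is zero only at $\gamma = \hat{\gamma}$, where the two sides also share the same slope, so the bound is tight at the expansion point.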
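It is worth noting that the right-hand side (RHS) and left-hand side (LHS) of (29) are equal at $\gamma = \hat{\gamma}$, and the same holds for their derivatives with respect to $\gamma$ evaluated at $\gamma = \hat{\gamma}$. The inner mathematical expectation in C9 has a lower bound obtained through steps (a) to (d), where $\alpha_k = \frac{\hat{\gamma}_k}{1+\hat{\gamma}_k}$ and $\beta_k = \log(1+\hat{\gamma}_k) - \frac{\hat{\gamma}_k}{1+\hat{\gamma}_k}\log(\hat{\gamma}_k)$ with $\hat{\gamma}_k = \frac{\hat{h}_{k,k}\, p_k}{\sum_{b \ne k} (\hat{h}_{k,b} + c_{k,b}^2)\, p_b + \sigma_k^2}$. The inequality (a) follows from the well-known Jensen's inequality $\mathbb{E}\left[\log(1 + 1/x)\right] \ge \log(1 + 1/\mathbb{E}[x])$. The inequality (b) follows from Lemma 5. In equality (c), the random variable is $X = \left|\frac{\sqrt{2}}{c_{k,k}}\hat{g}_{k,k} + \frac{\sqrt{2}}{c_{k,k}} e_{k,k}\right|^2$. Therefore, $X$ follows a non-central chi-square distribution with 2 degrees of freedom and non-centrality parameter $\frac{2\hat{h}_{k,k}}{c_{k,k}^2}$, i.e., $X \sim \chi^2\left(\frac{2\hat{h}_{k,k}}{c_{k,k}^2}, 2\right)$. As pointed out in [30], the RV $Z \sim \chi^2(\theta, 2)$ satisfies $\mathbb{E}_Z[\log(Z)] = \log(\theta) + E_1(\theta)$, where $E_1(z) = \int_z^{\infty} \frac{e^{-t}}{t}\,dt$ is the exponential integral. The equality (d) follows from $\lim_{z\to\infty} E_1(z) = 0$ and ≈ 0 when the channel estimate is sufficiently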