C-RAN Zero-Forcing with Imperfect CSI: Analysis and Precode&Quantize Feedback

Downlink joint transmission by a cluster of remote radio heads (RRHs) is an essential technique for enhancing throughput in future cellular networks. This method requires global channel state information (CSI) at the processing unit that designs the joint precoder. To this end, a large amount of CSI must be shared between the RRHs and that unit. This paper proposes two contributions. The first is a new upper bound on the rate loss, which implies a lower bound on the achievable rate, obtained by a cluster of RRHs that employ joint zero-forcing (ZF) with incomplete CSI. The second contribution, which follows insights from the bound, is a new CSI sharing scheme that drastically reduces the large overhead associated with acquiring global CSI for joint transmission. In a nutshell, each RRH applies a local precoding matrix that creates low-dimensional effective channels that can be quantized more accurately with fewer bits, thereby reducing the overhead of sharing CSI. In addition to the CSI sharing-overhead, this scheme reduces the data rate that must be delivered to each RRH in the cluster.

However, JT requires ultra-high-rate data sharing and low-latency CSI sharing between the BBU and each RRH. This exchange usually necessitates a direct wired link, whose deployment might be impossible due to cost and other infrastructural constraints typical of urban areas (see, e.g., [2]). The problem may be tackled by reducing the RRH cluster size and connecting more functional RRHs via higher-latency links. These RRHs carry out latency-constrained operations locally, whereas the most computationally demanding functions remain at the BBU (cf. footnote 1). That topology makes JT more challenging compared to a fully centralized BBU with a fixed wired fronthaul.
If JT is limited to the latter case, it will lead to small, inflexible RRH clustering that excludes the more functional RRHs and cannot adapt to varying network loads and user deployments. Developments in several fields play an essential role in realizing large-scale JT, also known as cell-free massive MIMO. These include advanced pilot allocation [4], [5], [6] and robust design [7], which alleviate pilot contamination, fronthaul data compression [8] and allocation [9], energy-efficient algorithms [10], [11], [12], and retransmission protocols at the network edge [8]. Other key factors are the emergence of software-defined networks (SDN) and fog-based RAN [13], which decouple the control plane and data plane, and their incorporation into C-RAN [3], providing a suitable environment for non-centralized JT [14], [15], [16], [17].
SDN deploys multi-access edge computing (MEC) units close to RRHs for heavy processing; each MEC is connected to the core network separately. Furthermore, SDN coordinates these MECs through a virtual infrastructure manager (VIM) interconnected to the MEC via a dedicated link designated for control signals typically delivered at low latency and lower rate than the data. Merging these modules with state-of-the-art C-RAN control/data plane units [15] may improve spectrum utilization significantly via flexible JT involving an RRH cluster with interconnection used only for control signals [3], [15], [17]. It is, therefore, essential to consider non-centralized JT involving RRHs of different clusters and more functional RRHs (cf. footnote 1).

Fig. 1. System model. Link L1 interconnects the S-RRHs to higher-level C-RAN functions (cf. footnote 3). The joint precoding-matrix computation unit (JPMCU), located close to the S-RRHs (cf. footnote 2), is connected via the low-latency, rate-limited link L2, which may be physically separated from L1.
We consider JT, in which a joint computation unit having global CSI calculates the joint precoding matrix (JPM) used for JT. Explicitly, each distributed transmitter, henceforth dubbed smart-RRH (S-RRH), sends its CSIT to that unit, henceforth dubbed JPMCU, via a low-latency albeit rate-limited link, as depicted in Fig. 1 (cf. footnote 2). Beyond the current C-RAN configuration (cf. footnote 3), the S-RRH terminology is convenient for describing JT in the evolving SDN topology, where the MECs and their RRHs may be considered S-RRHs and the VIM may be considered as hosting the JPMCU.
D-MIMO setups differ in the type of channel state information at the transmitter (CSIT). In the first type, dubbed centralized channel state information at the transmitter (C-CSIT) [18], each RRH sends its CSIT to the BBU. The latter thus has a single estimate of the global CSIT, from which it calculates the JPM. Finally, the BBU feeds each RRH its corresponding JPM sub-block perfectly. In another type of CSIT, dubbed distributed channel state information at the transmitter (D-CSIT) [2], [19], [20], no single entity calculates the JPM based on a single global-CSIT estimate. Instead, each RRH broadcasts its local CSIT to the other RRHs (e.g., via a low-latency wireless broadcast channel) and then estimates the global CSIT locally, leading to a different global-CSIT estimate for each RRH. Finally, each RRH calculates its JPM from its locally known global CSIT.

Footnote 2: We use the term joint precoding matrix computation unit (JPMCU), which is not a standard in C-RAN, as a convenient, concise, logical representation of the joint precoding matrix (JPM) computation operation frequently referred to in the paper.

Footnote 3: Examples of practical systems where our setup (cf. Fig. 1) is suitable are C-RAN configurations with functional splits where the distribution unit (DU) and central unit (CU) are physically separate and connected via a mid-haul link, which may have too-high latency (cf. [3], Sec. F., G. and L.), preventing JPM calculation at the CU. Assuming such mid-hauls where several DUs jointly serve multiple MSs, we treat each DU and its corresponding RRH (or RRHs) as the module dubbed S-RRH in this paper for convenience. Fig. 1 describes that setup if one treats the BBU as the CU and considers each S-RRH as a distinct DU, where L1 is the mid-haul link. Then, to facilitate JT, one may realize the JPMCU in one of the DUs and use the Xn link [3] (which may be rate-limited) as the L2 link. The fronthaul in this case is embedded within the S-RRH.
In this paper, we consider only CSI-quantization errors while neglecting CSI errors due to latency (outdated CSI). Upon receiving global CSIT, the JPMCU calculates the JPM. However, unlike the C-CSIT setup (where the error is only in the CSIT at the BBU), the JPMCU does not send each S-RRH its corresponding submatrices perfectly but instead sends a quantization. The proposed setup is similar to the D-CSIT in that the employed precoding matrix contains errors compared to that of the centralized design. The difference is in the error type. While in D-CSIT, the additional JPM error (compared to C-CSIT) follows from independent CSIT-errors at each S-RRH; in the proposed scheme, that error is due to the quantization of the centralized JPM.
The paper presents two contributions. The first is a new upper bound on the rate loss incurred when the JPMCU sets the overall joint-ZF precoding matrix using imperfect CSI (cf. Fig. 1), compared to perfect CSI, where the CSI errors are due to quantization. That upper bound yields a lower bound on the achievable rate. We assume that each S-RRH quantizes its local CSI using random vector quantization (RVQ) [31]. Similar bounds for the broadcast channel and D-MIMO with imperfect CSI appear in [21], [22] and [5], [28], [29], [32], and [6], respectively, all of which consider C-CSIT. The bound proposed here differs from the latter bounds due to the JPM quantization, which does not exist in C-CSIT. Furthermore, in [21] and [22], the overall channel to each terminal is quantized as a whole, whereas here it is quantized in sub-blocks. This sub-block quantization induces an entirely different CSI error distribution, leading to a distinct bound. Moreover, [5], [28], [29], [32], and [6] consider the large system regime, whereas the analysis here does not (cf. footnote 4). Finally, [5] and [6] deal with channel impairment due to pilot contamination, whereas in this paper, the error is due to CSI quantization. Another relevant rate-loss bound is [19], which, unlike here, considers the D-CSIT setup, which is different as discussed above. Moreover, the bound in [19] differs from the proposed bound because it assumes single-antenna transmitters and considers the high signal-to-noise ratio (SNR) regime. A recent bound under no such assumption for the D-CSIT setup appears in [2]. However, beyond the D-CSIT, the latter bound considers the large system regime, whereas the proposed bound does not. Finally, regardless of C-CSIT or D-CSIT, the proposed bound is limited neither to the large system regime nor to the high-SNR regime, as are all the D-MIMO results above.
The second contribution is a new precoding and CSI sharing scheme, dubbed precode and quantize (P&Q), with two key features. First, it reduces the number of CSI quantization bits transferred on the L2-link (cf. fig. 1) between the S-RRHs and the JPMCU. The other feature is reducing the overall data rate between the S-RRHs and the BBU; i.e., the P&Q reduces JT overhead incurred on L1 due to delivering additional user data. There are different approaches for reducing JT CSI-overhead. One method, designated for uplink JT, compresses the CSI delivered to the JPMCU [33]. Other techniques are robust (to inaccurate CSI) precoding [34], [35] and compressive CSI acquisition [36]. De Kerret and Gesbert [20] proposed spatial CSIT allocation policies maximizing the generalized degrees of freedom. Sanguinetti et al. [29] designed linear precoders that minimize power consumption under a target-rate constraint. Pan et al. [12] and [7] presented low-complexity user selection and JT designs.
A key distinguishing characteristic of the P&Q is that it applies front-end precoding matrices at the S-RRHs before CSI quantization. These matrices aim at improving CSI accuracy at the JPMCU. Each S-RRH autonomously calculates and applies a matrix based on its local CSI, thereby creating an effective channel of lower dimensionality that can be quantized more accurately. These channels are then quantized and sent to the JPMCU, which calculates a joint precoding matrix and feeds it back to the S-RRHs. We show, theoretically and numerically, that this scheme significantly increases the network throughput compared to the standard scheme, in which each S-RRH quantizes its local CSI and feeds it back to the JPMCU. This performance gain remains for a wide range of CSI quantization bits and SNR.
Notation: Boldface lower (upper) case letters denote vectors (matrices). $(\cdot)^{*}$ and $(\cdot)^{\dagger}$ denote the conjugate and the conjugate transpose operations, respectively, and $\odot$ and $\otimes$ are the Hadamard and Kronecker products, respectively. Let $a$, $b$ be vectors; then $\bar{a} = a/\|a\|$ and $\angle(a, b)$ is the angle between $a$ and $b$. In addition, let $\mathcal{Q}$ be a set and $q \in \mathcal{Q}$; then $\mathcal{Q}_{-q}$ denotes $\mathcal{Q} \setminus \{q\}$. Furthermore, $\Pi_{H}$ and $\Pi_{H}^{\perp}$ denote the projection matrices into the space spanned by $H$ and into its orthogonal complement, respectively. Also, $\chi_{A}(x)$ represents the indicator function; that is, $\chi_{A}(x) = 1$ if $x \in A$ and 0 otherwise, $I_N$ denotes an $N \times N$ identity matrix, and $1_N$, $0_N$ denote an $N \times 1$ vector of ones and zeros, respectively. Finally, we use log for the base-2 logarithm.

II. SYSTEM MODEL
Consider a cluster of M S-RRHs, each with N t antennas, that jointly serve Q single-antenna MSs, as depicted in Fig. 1.
We denote the set of S-RRHs {1, . . . , M} by $\mathcal{M}$ and the set of MSs {1, . . . , Q} by $\mathcal{Q}$. Assuming flat fading channels, the downlink signal observed by MS-q is

$$y_q = \sum_{m \in \mathcal{M}} h_{q,m}^{\dagger} x_m + n_q,$$

where $n_q$ is an additive, proper-complex Gaussian noise, $n_q \sim \mathcal{CN}(0, \sigma_n^2)$; $x_m \in \mathbb{C}^{N_t \times 1}$ is the signal transmitted by S-RRH-m; and $h_{q,m} \in \mathbb{C}^{N_t \times 1}$ is the channel between S-RRH-m and MS-q. We further denote $h_q = [h_{q,1}^{\dagger}, \ldots, h_{q,M}^{\dagger}]^{\dagger}$. The channels are Rayleigh, independent identically distributed (i.i.d.) block-fading (see [37], Ch. 5.4). Moreover, we assume large-scale fading (e.g., pathloss and shadowing effects), expressed by an attenuation factor $\alpha_{q,m}$. Explicitly, the channel varies at each coherence time, whereas $\alpha_{q,m}$ remains constant during the entire codeword.

Definition 1: We use a practically oriented short-time power constraint $P_{\max}$ for each S-RRH; i.e., $E\{\|x_m\|^2 \mid U\} \le P_{\max}$, $\forall m \in \mathcal{M}$, for every coherence time, where $U$ is the overall instantaneous CSI. We further employ a linear precoding scheme in which $x_m = \sum_{q \in \mathcal{Q}} s_q p_{q,m}$, where $s_q \in \mathbb{C}$ is the information-bearing signal intended for MS-q and $p_{q,m} \in \mathbb{C}^{N_t \times 1}$ is the precoding vector from S-RRH-m to MS-q. Finally, $s_1, \ldots, s_Q$ are assumed i.i.d. and $s_q \sim \mathcal{CN}(0, P_q)$.
We focus on a fully cooperative multi-cell system; thus, the joint downlink transmission can be conveniently modeled as a large multiple-input single-output (MISO) broadcast channel with $M N_t$ transmitting antennas such that the signal observed by MS-q is

$$y_q = h_q^{\dagger} p_q s_q + \sum_{j \in \mathcal{Q}_{-q}} h_q^{\dagger} p_j s_j + n_q, \quad (4)$$

where $E\{|s_q|^2\} = P_q$, $\|p_q\|^2 = 1$ and $p_q$ is the overall joint beamforming vector designated for MS-q; i.e.,

$$p_q = [p_{q,1}^{\dagger}, \ldots, p_{q,M}^{\dagger}]^{\dagger}. \quad (5)$$

We assume channel reciprocity (such as in time division duplex) and consider single-user detection (SUD); i.e., each MS treats the interfering signals as noise. Therefore, every S-RRH estimates the channels between it and each MS served by the cluster.

Assumption 1: The long-term channel characteristics are locally known at each S-RRH and globally known at the JPMCU; i.e., for each $m \in \mathcal{M}$, S-RRH-m knows $\{\alpha_{q,m}\}_{q \in \mathcal{Q}}$, whereas the JPMCU knows $\{\alpha_{q,m}\}_{m \in \mathcal{M}, q \in \mathcal{Q}}$. Since these parameters are conveyed to the JPMCU only once, we neglect the associated overhead on the L2-link (cf. Fig. 1). Moreover, for simplicity, we assume that each S-RRH-m has perfect local CSI $\{h_{q,m}\}_{q \in \mathcal{Q}}$; i.e., no estimation errors.
Definition 2: S-RRH-m quantizes its CSI and sends the indices of the quantization codewords $\{c_{q,m}\}_{q \in \mathcal{Q}}$, with an overall number of B bits, to the JPMCU. Upon receiving all the codewords $U = \{c_{q,m}\}_{q \in \mathcal{Q}, m \in \mathcal{M}}$, the JPMCU estimates $h_q$, $\forall q \in \mathcal{Q}$, as

$$\hat{h}_q = [\hat{h}_{q,1}^{\dagger}, \ldots, \hat{h}_{q,M}^{\dagger}]^{\dagger}, \quad (6)$$

where $\hat{h}_{q,m}$ is the estimate of $h_{q,m}$, $\forall q \in \mathcal{Q}, m \in \mathcal{M}$. For now, we do not restrict ourselves to a particular quantization or estimation method. Henceforth, we refer to this procedure as the standard CSI feedback scheme. Based on $\{\hat{h}_q\}_{q \in \mathcal{Q}}$, the JPMCU calculates the overall joint precoding matrix according to (7), where the columns of $N_q \in \mathbb{C}^{M N_t \times (M N_t - (Q-1))}$ form an orthonormal basis for the null space of $\{\hat{h}_j\}_{j \in \mathcal{Q}_{-q}}$. Henceforth, we refer to this scheme as ZF beamforming. After setting $p_q$, $\forall q \in \mathcal{Q}$, the JPMCU quantizes it and feeds each S-RRH with its corresponding components.

Definition 3: For each m, the JPMCU quantizes $\{p_{q,m}\}_{q \in \mathcal{Q}}$ with overall B bits and then sends the result to S-RRH-m. The corresponding estimate at S-RRH-m is denoted by $\hat{p}_{q,m}$.

Because $p_q$ is orthogonal to $\{\hat{h}_j\}_{j \in \mathcal{Q}_{-q}}$ rather than $\{h_j\}_{j \in \mathcal{Q}_{-q}}$, there is a performance loss compared to the case of perfect CSI due to residual interference, even if $p_{q,m}$ is quantized without errors. For simplicity and analytical tractability, we assume that the data signals $s_q$, $q \in \mathcal{Q}$, are delivered to the S-RRHs without errors (cf. footnote 5). Moreover, we assume the same for the CSIT and the JPM after they have been quantized; i.e., the JPMCU receives $\hat{h}_{q,m}$ from all S-RRHs via L2 without errors, and the latter receive $\hat{p}_{q,m}$ from the former without errors (cf. footnote 6).

III. DOWNLINK C-RAN-JT: PERFORMANCE ANALYSIS FOR ZF WITH IMPERFECT CSI
This section introduces a new upper bound on the throughput degradation under limited CSI compared to perfect CSI. We consider vector quantization where the channel directional information (CDI) $\bar{h}_{q,m} = h_{q,m}/\|h_{q,m}\|$ is quantized separately using RVQ [31] (cf. footnote 7), with independent codebooks for every q, m.
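The ZF beamforming step of Definition 2 can be illustrated with a minimal sketch. It assumes one common choice of ZF beamformer, namely the estimated channel of MS-q projected onto the null space of the other estimated channels and then normalized; whether this matches the exact expression in (7) is an assumption. The function names (`inner`, `normalize`, `zf_beamformer`) and the toy dimensions are hypothetical.

```python
import math
import random


def inner(a, b):
    """Return a^dagger b for complex vectors stored as Python lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(inner(v, v).real)
    return [x / n for x in v]


def zf_beamformer(h_hat, q):
    """Project h_hat[q] onto the orthogonal complement of the other
    estimated channels (Gram-Schmidt) and normalize the result."""
    # Orthonormal basis of span{h_hat[j] : j != q}.
    basis = []
    for j, h in enumerate(h_hat):
        if j == q:
            continue
        v = list(h)
        for b in basis:
            c = inner(b, v)
            v = [x - c * y for x, y in zip(v, b)]
        basis.append(normalize(v))
    # Remove from h_hat[q] every component lying in that span.
    p = list(h_hat[q])
    for b in basis:
        c = inner(b, p)
        p = [x - c * y for x, y in zip(p, b)]
    return normalize(p)


# Toy example: M = 2 S-RRHs with Nt = 2 antennas each, Q = 3 MSs.
rng = random.Random(0)
M, Nt, Q = 2, 2, 3
h_hat = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M * Nt)]
         for _ in range(Q)]
p = [zf_beamformer(h_hat, q) for q in range(Q)]

# Largest leakage |h_hat_j^dagger p_q| over all j != q (numerically zero).
leak = max(abs(inner(h_hat[j], p[q]))
           for q in range(Q) for j in range(Q) if j != q)
norms = [math.sqrt(inner(v, v).real) for v in p]
```

The beamformers null the interference only with respect to the estimated channels; with respect to the true channels a residual term remains, which is the source of the rate loss analyzed in this section.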
Moreover, for simplicity and analytical tractability, we assume that the channel magnitude information (CMI) $\|h_{q,m}\|$, $\forall q \in \mathcal{Q}, m \in \mathcal{M}$, is perfectly conveyed to the JPMCU (cf. footnote 8). We assume the same about $\{\|p_{q,m}\|\}_{q \in \mathcal{Q}, m \in \mathcal{M}}$.
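We now review some of the properties of RVQ. Let $\hat{\bar{h}}_{q,m}$ be the output of RVQ with b bits applied to $\bar{h}_{q,m}$. Then,

$$\bar{h}_{q,m} = \sqrt{1 - Z_{q,m}}\, \hat{\bar{h}}_{q,m} + \sqrt{Z_{q,m}}\, s_{q,m}, \quad (9)$$

where $s_{q,m}$ is a random vector uniformly distributed on the unit sphere of the null space of $\hat{\bar{h}}_{q,m}$, and $Z_{q,m}$ is a random variable, independent of $s_{q,m}$, and distributed as the minimum of $2^b$ beta$(N_t - 1, 1)$ random variables [21]. Under the assumption of perfect CMI, the JPMCU uses $\hat{h}_{q,m} = \|h_{q,m}\|\, \hat{\bar{h}}_{q,m}$ as the estimate of $h_{q,m}$.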
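The RVQ procedure reviewed above can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the codebook consists of $2^b$ isotropic unit vectors drawn afresh for every quantization, the codeword with the largest correlation is selected, and the distortion is measured as $Z = 1 - |c^{\dagger} \bar{h}|^2$. The function names and the parameters (dimension 4, 1 and 6 bits, 300 trials) are hypothetical and chosen only to keep the run short.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw an isotropic unit vector in C^n."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def rvq(h_bar, bits, rng):
    """Quantize the unit vector h_bar with a random codebook of 2**bits
    unit vectors. Returns (codeword, Z) with Z = 1 - |c^dagger h_bar|^2."""
    best, best_corr = None, -1.0
    for _ in range(2 ** bits):
        c = random_unit_vector(len(h_bar), rng)
        corr = abs(sum(x.conjugate() * y for x, y in zip(c, h_bar))) ** 2
        if corr > best_corr:
            best, best_corr = c, corr
    return best, 1.0 - best_corr


def mean_distortion(n, bits, trials, seed=0):
    """Average Z over independent channel directions and codebooks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        _, z = rvq(random_unit_vector(n, rng), bits, rng)
        total += z
    return total / trials


z_low = mean_distortion(n=4, bits=1, trials=300)   # coarse quantization
z_high = mean_distortion(n=4, bits=6, trials=300)  # finer quantization
```

The average distortion shrinks as the number of bits grows and, for a fixed number of bits, grows with the vector dimension, which is the effect the P&Q scheme later exploits.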
where $s_q$ is defined in definition 1, and denote the signal transmitted by S-RRH-m to MS-q as $s_{q,m} = \sqrt{P_{q,m}}\, \bar{s}_q$. In this paper, $n_q \sim \mathcal{CN}(0, 1)$ (cf. (4)), and to prevent S-RRHs from violating their power constraint $P_{\max}$, we set the power profile according to (10) and (11) (cf. footnote 9). From (4), (10) and (11), one obtains the signal-to-interference-plus-noise ratio (SINR) at MS-q, denoted $\mathrm{SINR}_q$, in which $\hat{p}_{q,m}$ is the estimate of $p_{q,m}$ (see (5) and (7)) under RVQ, similar to (9). Given that the CSIT at the JPMCU is $\{\hat{p}_q, \hat{h}_q\}$, and that the channel state information at the receiver (CSIR) of MS-q is $\{h_q^{\dagger} \hat{p}_q, \mathrm{SINR}_q\}$, the ergodic rate achievable under SUD and the ZF beamformer $\hat{p}_q$ (cf. (13) and footnote 10) is $\hat{R}_q$, defined in (14). To evaluate performance, we later compare $\hat{R}_q$ to the corresponding throughput $R_q$ without quantization error, defined in (15), where $p_q$ is given in (7) while substituting $\hat{h}_q = h_q$, $\forall q \in \mathcal{Q}$, and is assumed to be fed back perfectly.

Footnote 9: Note that the signal transmitted by S-RRH-m can be written as $x_m = \sum_{q=1}^{Q} \sqrt{P_{q,m}}\, \bar{s}_q\, p_{q,m}$, where $p_{q,m}$ are defined in (5). In the case where $\{h_{q,m},\ q \in \mathcal{Q}, m \in \mathcal{M}\}$ are i.i.d., it can be shown that $p_q$ are also i.i.d. Hence, the overall transmit power of S-RRH-m is $E\{\|x_m\|^2\} = \sum_{q=1}^{Q} P_{q,m} E\{\|p_{q,m}\|^2\}$. Now let $\vartheta = E\{\|p_{q,m}\|^2\}$; then one may set $P_{q,m}$, $q \in \mathcal{Q}$, such that $E\{\|x_m\|^2\} = \vartheta \sum_{q=1}^{Q} P_{q,m} = P_{\max}$. Moreover, (5) and (7) imply that $p_{q,m}$, $m \in \mathcal{M}$, are coupled since all must increase simultaneously. Nevertheless, while the assumption is not optimal (see, e.g., [11]), we adopt it to simplify the theoretical analysis.

Footnote 10: The necessary CSI may be obtained using standard techniques, in which, after applying the beamformer, there is a second training phase in which MS-q estimates its effective channels $h_q^{\dagger} \hat{p}_i$, $i \in \mathcal{Q}$, using dedicated pilots (see, e.g., [22] and [38]). Note that each $h_q^{\dagger} \hat{p}_i$ is a scalar channel and that this process is made in one step. It can be shown that the resulting equivalent channel satisfies the suppositions of Proposition 3 in [39], which leads to $\hat{R}_q$ in (14). The expectation in (14) is over the joint distribution of the channels and the RVQ random codebook. Hence, both the receiver and the transmitter can calculate $\hat{R}_q$ since these distributions are known. Further details of the technique used to derive (14) appear in [38].
Theorem 1: Consider the signal (3) and the power profile (10), (11). Further consider assumption 1 and definitions 2 and 3, and let $\Delta R_q = R_q - \hat{R}_q$ be the rate loss, where $\hat{R}_q$ and $R_q$ are defined in (14) and (15), respectively. Consider $q \in \mathcal{Q}$ and assume that $h_{q',m}$ and $p_{q',m}$ are quantized with $B/Q \in \mathbb{N}$ bits each (cf. definitions 2 and 3), $\forall q' \in \mathcal{Q}$, $\forall m \in \mathcal{M}$; then $\Delta R_q \le \Delta R_{1,q} + \Delta R_{2,q}$, where $\Delta R_{1,q}$ and $\Delta R_{2,q}$ are given in (16) and (17), respectively, and $\beta(\cdot)$ is the Beta function.

Remark 1: The perfect-CSI rate, $R_q$, can be calculated based on known results. For example, consider the case where the long-term channel attenuation is equal for each S-RRH; i.e., $\alpha_{q,m} = \alpha_{q,m'}$ $\forall m, m' \in \mathcal{M}$ (cf. (3)), and without loss of generality, assume that $\alpha_{q,m} = 1/M$ $\forall m \in \mathcal{M}$. In this case, it is straightforward to show that $R_q$ is given by (20), where $T = M N_t - (Q - 1)$ and $\Gamma(\cdot, \cdot)$ is the incomplete Gamma function. In the case where $\exists\, m \ne m'$ such that $\alpha_{q,m} \ne \alpha_{q,m'}$, an expression for $R_q$ is complicated. A closed-form expression can be found in [40] (after straightforward adaptations to ZF) in the two-user case. For more than two users, such an expression is too complicated; nevertheless, it can be approximated, see [41] Sec. IV.A for the two-user case and [42], [43] for more than two users.

Proof of theorem 1: By the assumptions of theorem 1 and using (12), (14), (15), it follows that $\Delta R_q$ satisfies (21), where $\hat{p}_q$ and $p_q$ are defined in (12) and (15), respectively. The inequality (21) follows because $P \sum_{j \in \mathcal{Q}_{-q}} |h_q^{\dagger} p_j|^2 \ge 0$ and $\log(1 + x)$ is a monotone increasing function. The desired bound on $\Delta R_q$ (cf. theorem 1) then follows from the following lemmas.
Proof: See App. A and B for lemmas 2 and 3, respectively.
Remark 2 (The Accumulated Effect of JPM Quantization): Examining proof of theorem 1, we note the additional quantization of the JPM doubles the rate gap given by (16) and (17); i.e., channel and JPM quantization have the same accumulating effect. This property is insightful regarding the bit allocation tradeoff of both quantities.
We now present an asymptotic expression for theorem 1.

Corollary 4: The bound $\Delta R_q \le \Delta R_{1,q} + \Delta R_{2,q}$ on $\Delta R_q$ (cf. theorem 1), where $\Delta R_{1,q}$ and $\Delta R_{2,q}$ are defined in (16) and (17), respectively, can be further approximated as in (22), where $\Gamma(\cdot)$ is the Gamma function and $V_M(a)$ is expressed through $\Gamma(\cdot)$.

Proof: The result follows by substituting $W_1$, $W_2$ and $W_3$ into (23) while taking lower order terms.
We conclude this section with some insights. From Corollary 4, it follows that, as $P \to \infty$, the rate gap decreases with B at the rate $2^{-\frac{B}{2Q(N_t-1)}}$. Therefore, to maintain the overall number of degrees of freedom, $2^{-\frac{B}{2Q(N_t-1)}}$ should decrease at least as fast as $1/\sqrt{P}$; i.e., the number of bits per channel should, at least, increase linearly with the SNR in dB as well as with the number of MSs. Otherwise, the network is interference limited. This result is consistent with previous findings on the single-Tx broadcast channel (cf. [21]). Finally, the decrease rate $2^{-\frac{B}{2Q(N_t-1)}}$ implies that it is possible to reduce the rate gap without increasing B by having a smaller Q or an effective number of antennas less than $N_t$. The latter insight is the motivation for the P&Q CSI sharing scheme presented in the following section. However, while $\Delta R_q$ improves if $N_t$ or Q decreases, $R_q$ deteriorates due to a loss in antenna gain. This trade-off determines whether the achievable rate, $\hat{R}_q$ (cf. theorem 1), increases or decreases.
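The bit-scaling statement above can be made explicit with a short worked derivation. It is a sketch under one assumption: the condition is read as requiring the factor $2^{-B/(2Q(N_t-1))}$ to decay at least as fast as $c/\sqrt{P}$, where the constant $c > 0$ is hypothetical and introduced only for illustration.

```latex
\begin{align}
2^{-\frac{B}{2Q(N_t-1)}} \le \frac{c}{\sqrt{P}}
\;&\Longleftrightarrow\;
B \ge Q(N_t-1)\log_2 P - 2Q(N_t-1)\log_2 c, \\
\log_2 P = \frac{P_{\mathrm{dB}}}{10\log_{10} 2} \approx \frac{P_{\mathrm{dB}}}{3}
\;&\Longrightarrow\;
\frac{B}{Q} \gtrsim \frac{N_t-1}{3}\,P_{\mathrm{dB}} - 2(N_t-1)\log_2 c .
\end{align}
```

Under this reading, the number of bits per channel, $B/Q$, grows linearly with the SNR in dB with slope roughly $(N_t-1)/3$, and the total budget B grows linearly with Q, which matches the qualitative statement in the text.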
In the sequel, we show thatR q can be drastically improved under a good precoding strategy in most cases. Numerical results for the proposed bounds are given in Sec. VI.

IV. THE PRECODE AND QUANTIZE CSI SHARING SCHEME
The P&Q CSI sharing scheme aims to reduce the CSI overhead in the L2-link and the fronthaul information rate. Each S-RRH, say S-RRH-m, applies a front-end precoding matrix $A_m$ that creates effective channels of lower dimensionality, which can be quantized more accurately than $h_{q,m}$ [38].

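Definition 4 (MS Allocation Policy): Let $\tilde{h}_q = [\tilde{h}_{q,1}^{\dagger}, \ldots, \tilde{h}_{q,M}^{\dagger}]^{\dagger}$, where $\tilde{h}_{q,m}$ is the effective channel between S-RRH-m and MS-q after applying $A_m$, be the overall effective channel. To set $A_m$, S-RRH-m picks a subset of the MSs $\bar{\mathcal{S}}_m \subset \mathcal{Q}$, where $|\bar{\mathcal{S}}_m| = \bar{Q}$, according to the policy detailed next. Knowing $\{\alpha_{q,m}\}_{q \in \mathcal{Q}}$, S-RRH-m picks the $\bar{Q}$ MSs that have the most significant attenuation; that is, $\bar{\mathcal{S}}_m$ includes MSs such that $\alpha_{q,m} \le \alpha_{q',m}$, $\forall q \in \bar{\mathcal{S}}_m$, $q' \in \mathcal{Q} \setminus \bar{\mathcal{S}}_m$.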
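The allocation policy of Definition 4 amounts to a sort on the attenuation factors. The following minimal sketch shows it for a single S-RRH; the function name `allocate`, the variable names and the numeric attenuation values are hypothetical and serve only as an illustration.

```python
def allocate(alpha_m, q_bar):
    """Split the MSs of one S-RRH into (dropped, served) index lists.

    alpha_m[q] is the attenuation factor alpha_{q,m} of MS-q at this S-RRH.
    The q_bar MSs with the smallest alpha_{q,m} (strongest attenuation)
    are dropped; the remaining MSs are served.
    """
    order = sorted(range(len(alpha_m)), key=lambda q: alpha_m[q])
    dropped = sorted(order[:q_bar])
    served = sorted(order[q_bar:])
    return dropped, served


# Hypothetical attenuation factors of Q = 4 MSs at one S-RRH.
alpha_m = [0.30, 0.05, 0.40, 0.10]
dropped, served = allocate(alpha_m, q_bar=2)
```

Here MS 1 and MS 3 have the weakest long-term channels and are dropped, so every dropped MS has an attenuation factor no larger than that of any served MS, as the definition requires.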
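Given $\bar{\mathcal{S}}_m$, $A_m$ is set as the projection matrix into the null space of the matrix whose columns are given by $\{h_{q,m}\}_{q \in \bar{\mathcal{S}}_m}$; i.e., the columns of $A_m$ (cf. (25)) form an orthonormal basis for the orthogonal complement of $\mathrm{span}(\{h_{q,m}\}_{q \in \bar{\mathcal{S}}_m})$. Thus, S-RRH-m now serves only $Q - \bar{Q}$ MSs, denoted by $\mathcal{S}_m = \mathcal{Q} \setminus \bar{\mathcal{S}}_m \subset \mathcal{Q}$ (cf. footnote 11). From (25), and because each S-RRH has perfect local CSI, $\tilde{h}_{q,m} = 0_{\tilde{N}_t}$, $\forall q \in \bar{\mathcal{S}}_m$. Thus, S-RRH-m now sends the JPMCU only $Q - \bar{Q}$ channels $\{\tilde{h}_{q,m}\}_{q \in \mathcal{S}_m}$, of lower dimension $\tilde{N}_t < N_t$, which can be quantized more accurately. Denote the estimate of $\tilde{h}_{q,m}$ at the JPMCU by $\hat{\tilde{h}}_{q,m}$ and the corresponding overall estimate by $\hat{\tilde{h}}_q$.

Footnote 11: Under this policy, MSs may remain unserved; i.e., $q \in \bar{\mathcal{S}}_m$, $\forall m \in \mathcal{M}$. In this case, these MSs can be reallocated at the expense of MSs that are served by the largest number of S-RRHs.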
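Since the JPMCU knows $\bar{\mathcal{S}}_m$ (cf. footnote 12), it also knows that $\tilde{h}_{q,m} = 0_{\tilde{N}_t}$, $\forall q \in \bar{\mathcal{S}}_m$, $m \in \mathcal{M}$; hence, it only estimates $\{\tilde{h}_{q,m}\}_{q \in \mathcal{S}_m, m \in \mathcal{M}}$, whereas the remaining estimates are set to zero; i.e., $\hat{\tilde{h}}_{q,m} = 0_{\tilde{N}_t}$, $\forall m \in \mathcal{M}$, $q \in \bar{\mathcal{S}}_m$. Upon receiving the CSI from all S-RRHs, $\{\hat{\tilde{h}}_q\}_{q \in \mathcal{Q}}$, the JPMCU computes $\{\tilde{p}_q\}_{q \in \mathcal{Q}}$, where

$$\tilde{p}_q = [\tilde{p}_{q,1}^{\dagger}, \ldots, \tilde{p}_{q,M}^{\dagger}]^{\dagger} \quad (27)$$

is the overall beamformer designated for MS-q. The estimation process is the same as in section II while using (27) rather than (5). The beamformer $\tilde{p}_q$ also reduces the data overhead on L1, which in turn reduces the fronthaul-data overhead. This reduction in fronthaul data load follows because each S-RRH serves only a subset of the MSs; hence, full data sharing is unnecessary.

Let $v_q$ be an $M \times 1$ vector whose m-th entry is 1 if S-RRH-m serves MS-q, and 0 otherwise (in the standard scheme every S-RRH serves every MS, hence $v_q = 1_M$, $\forall q \in \mathcal{Q}$). Therefore, if $\tilde{p}_q \ne \tilde{p}_q \odot (v_q \otimes 1_{\tilde{N}_t})$, it follows that some S-RRHs, which do not serve MS-q, do transmit $s_q$. Explicitly, if $\tilde{h}_{q,m} = 0_{\tilde{N}_t}$ and $\tilde{p}_{q,m} \ne 0_{\tilde{N}_t}$ for some $m \in \mathcal{M}$, S-RRH-m must transmit the signal $s_q$, which MS-q does not receive. To avoid transmitting more data than necessary, we set the beamformer $\tilde{p}_q$ according to the orthogonality condition (28); i.e., the beamformer's weights corresponding to S-RRHs that do not serve MS-q are zero. By not sending $\{s_q\}_{q \in \bar{\mathcal{S}}_m}$ to S-RRH-m, we reduce the number of data streams for that S-RRH to $Q - \bar{Q}$, rather than Q as in the standard scheme.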
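Definition 5: The P&Q beamformer for MS-q, $\tilde{p}_q$, is expressed in terms of the matrix $\tilde{N}_q$ and the factor $\tilde{Q}_q$. The factor $\tilde{Q}_q$ is the number of MSs j such that $\tilde{h}_q^{\dagger} \tilde{h}_j \ne 0$; i.e., $\tilde{Q}_q = Q - \sum_{j \in \mathcal{Q}_{-q}} \chi_{\{0\}}(M_{q,j})$, where $M_{q,j}$ is the number of S-RRHs that serve both MS-q and MS-j.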
We note that the coefficient $\tilde{Q}_q$ (cf. definition 5) is the number of MSs served by at least one of the S-RRHs that serve MS-q. Accordingly, $\tilde{Q}_q - 1$ is the number of MSs to which the ZF precoder must zero the interference inflicted by MS-q.
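The coefficient described above can be computed directly from the serving sets. The sketch below assumes the expression $Q - \sum_{j \ne q} \chi_{\{0\}}(M_{q,j})$, with $M_{q,j}$ the number of S-RRHs serving both MS-q and MS-j; the function names and the toy serving sets are hypothetical.

```python
def serving_rrhs(served, q):
    """Indices of the S-RRHs whose served set contains MS-q."""
    return {m for m, s in enumerate(served) if q in s}


def q_tilde(served, num_ms, q):
    """Q minus the number of MSs j != q that share no S-RRH with MS-q."""
    m_q = serving_rrhs(served, q)
    zero_overlap = sum(
        1 for j in range(num_ms)
        if j != q and len(m_q & serving_rrhs(served, j)) == 0
    )
    return num_ms - zero_overlap


# Hypothetical served sets for M = 3 S-RRHs and Q = 4 MSs.
served = [{0, 1}, {1, 2}, {3}]
values = [q_tilde(served, 4, q) for q in range(4)]
```

For these sets, MS 1 shares an S-RRH with MS 0 and with MS 2, so its coefficient is 3, whereas MS 3 is served alone and its coefficient is 1; in both cases the value equals the number of MSs served by at least one S-RRH that serves the MS in question.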
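After setting $\tilde{p}_q$ according to definition 5, the JPMCU quantizes $\tilde{p}_{q,m}$ (cf. (27)) and sends each S-RRH its relevant precoders. Moreover, because $\tilde{p}_{q,m} = 0_{\tilde{N}_t}$ for all $q \in \bar{\mathcal{S}}_m$, $m \in \mathcal{M}$, the JPMCU does not have to send S-RRH-m the entire set $\{\tilde{p}_{q,m}\}_{q \in \mathcal{Q}}$, but rather sends $\{\tilde{p}_{q,m}\}_{q \in \mathcal{S}_m}$, which consists solely of $Q - \bar{Q}$ beamformers. In more explicit terms, it sends the quantization of $\{\tilde{p}_{q,m}\}_{q \in \mathcal{S}_m}$ to S-RRH-m. Since the latter have a lower dimension $\tilde{N}_t < N_t$, they can be quantized more accurately. Once having received these quantizations, S-RRH-m sets its overall beamformer toward MS-q based on $\hat{\tilde{p}}_{q,m}$, which denotes the estimate of $\tilde{p}_{q,m}$.

Definition 6: The overall P&Q beamformer designated for MS-q is denoted by $\tilde{p}_q^{\mathrm{P\&Q}}$.

Definition 7: Let $\mathcal{M}_q$ be the set of S-RRHs that serve MS-q. Furthermore, let $\mathcal{M}_{q,j} = \mathcal{M}_q \cap \mathcal{M}_j$ be the set of S-RRHs that serve both MS-q and MS-j, and denote $M_{q,j} = |\mathcal{M}_{q,j}|$.

Similar to (4), MS-q observes a signal $y_q$ in which $\tilde{h}_q = [\tilde{h}_{q,1}^{\dagger}, \ldots, \tilde{h}_{q,M}^{\dagger}]^{\dagger}$ and $\tilde{p}_q$ is given in definition 6. We note that $\tilde{h}_q$ replaces $h_q$ because each S-RRH applies $A_m$ (cf. (25)); moreover, the sum runs over $\mathcal{Q}_{-q}$ because of the particular choice of $A_m$ and $\tilde{p}_j$, $j \in \mathcal{Q}$ (definition 5), as discussed in section IV. The signal can be written as

$$y_q = \sum_{m \in \mathcal{M}_q} \tilde{h}_{q,m}^{\dagger} \tilde{p}_{q,m}\, s_q + \sum_{j \in \mathcal{Q}_{-q}} \sum_{m \in \mathcal{M}_{q,j}} \tilde{h}_{q,m}^{\dagger} \tilde{p}_{j,m}\, s_j + n_q,$$

where $\mathcal{M}_q$ and $\mathcal{M}_{q,j}$ are given in definition 7.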
The advantage of the proposed scheme is twofold. From [21], it is known that when quantizing an N-dimensional uncorrelated Rayleigh fading channel with b bits, the quantization error is bounded above by $2^{-\frac{b}{N-1}}$. Therefore, the P&Q has a smaller CSI-quantization error because the channels and beamformers are $\tilde{N}_t$-dimensional, rather than $N_t$-dimensional. Furthermore, since each S-RRH serves fewer MSs, fewer channels and beamformers are delivered to the JPMCU and S-RRHs, respectively, through the limited-rate links. Considering an overall budget of B bits for each S-RRH, it follows that the P&Q scheme allocates each channel $B/(Q - \bar{Q})$ bits rather than $B/Q$ as in the standard scheme. Consequently, the quantization error is bounded by $2^{-\frac{B}{(Q - \bar{Q})(\tilde{N}_t - 1)}}$. The second advantage of the P&Q scheme is in reducing the fronthaul data load, which is a major problem in C-RAN. This reduction is because each S-RRH serves only $Q - \bar{Q}$ MSs. Hence, fewer data signals must be transferred via the fronthaul between the BBU and each S-RRH. Moreover, because each S-RRH now serves fewer MSs, the overall power allocated for each MS may be increased.
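The first advantage can be checked with a small numerical comparison of the two error bounds. This is a sketch under stated assumptions: the bound $2^{-b/(N-1)}$ is applied per channel, the effective dimension is taken as the original dimension minus the number of dropped MSs (the generic dimension of the orthogonal complement), and the values of B, Q and the antenna count are hypothetical.

```python
def rvq_error_bound(bits, dim):
    """Upper bound 2^(-bits/(dim-1)) on the RVQ quantization error."""
    return 2.0 ** (-bits / (dim - 1))


# Hypothetical budget and dimensions.
B, Q, Nt = 240, 8, 8
q_bar = 4             # number of MSs dropped by each S-RRH
nt_eff = Nt - q_bar   # assumed effective dimension after local precoding

# Standard scheme: B/Q bits per Nt-dimensional channel.
standard = rvq_error_bound(B / Q, Nt)

# P&Q scheme: B/(Q - q_bar) bits per nt_eff-dimensional channel.
pq = rvq_error_bound(B / (Q - q_bar), nt_eff)
```

With these numbers, the standard scheme spends 30 bits on each 8-dimensional channel, while the P&Q scheme spends 60 bits on each 4-dimensional channel, so its bound is several orders of magnitude smaller. The price is the array-gain loss discussed in the next section.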

V. THE P&Q SCHEME: PERFORMANCE ANALYSIS
To analyse the P&Q scheme, we assume the following. Assumption 2: The long-term channel attenuation satisfies α q,m = 1/M, ∀q ∈ Q, m ∈ M.
We pose assumption 2, which is more restrictive than the assumptions in section III, to simplify the analysis of the P&Q scheme, which is more complicated than that of the standard scheme. This assumption holds, e.g., if one places S-RRHs on the edges of a regular polygon with M nodes and MSs close to each other at the center of that polygon. Then, MSs have approximately the same long-term channel attenuation to each S-RRH. In section VI, we present numerical results that do not adhere to assumption 2.
Definition 9: Let $\tilde{p}_q$ be the P&Q beamformer without quantization error; i.e., it is obtained by replacing $\hat{\tilde{h}}_q$ with $\tilde{h}_q$ in the beamformer of definition 5 as well as in the calculation of $\tilde{N}_q$. We further denote the P&Q inherent rate loss by $\Delta R_{AG,q}$. In other words, $\Delta R_{AG,q}$ is the difference between the standard-scheme and P&Q-scheme rates without quantization errors, resulting from the loss in array gain.
Proof: Similar to (21), it can be shown that the P&Q rate loss satisfies (33), in which the additional term is given by $\Delta R_{AG,q} = A_1 - \tilde{A}_1$, where $A_1$ is defined in (21). The proof then follows from the following lemmas.
Proof: See App. C. After substituting the inequalities of theorem 6 into (33) it remains to show (32). To this end, we use R = ϕ(T, P/M ) (cf. (20)) andÃ 1 = ϕ(T q ,P /M), obtained by applying the former while replacing T and P withT q andP , respectively.
Proof: Due to space limitations, we provide here an outline of the proof (a detailed proof is given in [44] Supplementary B). The first step shows that the sum in (30) runs over constant terms, and can therefore be replaced by a factor Q −Q − 1 in (34). To this end, one must show that Finally, we substitute the latter result forT q in (32) and obtainT q =T , whereT is given in this corollary.
Next, similar to Corollary 4, we have the following corollary. Corollary 8: Consider $\Delta \tilde{R}_1$ and $\Delta \tilde{R}_2$, given in theorem 7. Then the bound $\Delta \tilde{R} \le \Delta \tilde{R}_1 + \Delta \tilde{R}_2 + \Delta R_{AG}$ in (29) can be further approximated as in (36). Proof: The proof is identical to the proof of Corollary 4.
We conclude this section with a discussion and insights. By examining Corollaries 4 and 8, it follows that the rate loss $\Delta R$ in the standard scheme (which here is not a function of q, cf. remark 3) approaches zero as B increases. In contrast, the rate gap in the P&Q scheme, $\Delta \tilde{R}$, is bounded away from zero. Explicitly, it approaches $\Delta R_{AG} > 0$ (cf. definition 9), which is independent of B and is due to the array-gain loss induced by the dimension reduction. However, the other terms comprising $\Delta \tilde{R}$, namely $\Delta \tilde{R}_1 + \Delta \tilde{R}_2$, decrease to zero much faster than $\Delta R_1 + \Delta R_2$ (cf. (22) and (36)); therefore, $\Delta \tilde{R}$ approaches $\Delta R_{AG}$ much faster than $\Delta R$ approaches zero. Consequently, the P&Q rate approaches $R - \Delta R_{AG}$ much faster than the standard-scheme rate approaches $R$. The final observation is that the P&Q rate can be higher than the standard-scheme rate as long as $\Delta R$ is more significant than $\Delta R_{AG}$. Numerical results presented in the subsequent section indicate that the P&Q rate is indeed higher than the standard-scheme rate for a wide range of quantization bits.

VI. NUMERICAL RESULTS
Beginning with the theoretical analysis (section III), Figs. 2(a) and 2(b) depict the standard-scheme performance (ergodic rate, cf. (14)), evaluated via Monte Carlo (MC) simulation with $10^4$ channel realizations, compared to the bound described in remark 3 (perfect-CSI rate minus rate gap). Also included is the rate under perfect CSI (cf. (20)). Considering that transmitters could always turn off some of their antennas if it yields a higher rate, it follows that in some cases a more accurate estimate of channels with fewer antennas yields better performance. Hence, for each B, we picked the $N_t \in \{2, \ldots, 8\}$ with the maximum rate in both the MC and the bound. Fig. 2(a) considers Q = 2 MSs placed at (-80,0) and (80,0) (in meters), served by M = 4 S-RRHs. We placed an S-RRH for each $m \in \{1, \ldots, M\}$ such that its x-y coordinates are the real and imaginary parts of $80 e^{j\pi(1+2m)/4}$, respectively (in meters); hence $\alpha_1 = \alpha_2$, where $\alpha_q = \sum_m \alpha_{q,m}$. We used a path-loss exponent of 3.5 and set the power according to (10) and (11) such that $\alpha_q P_{\max}$ (cf. theorem 1) is 35 dB (black) and 15 dB (blue). Fig. 2(b) considers a symmetric network that satisfies assumption 2 with M = 4 S-RRHs and Q = 8 MSs with a similar power allocation. The results show that the bound gets tighter as B increases and exhibits the same behavior as the MC. Recalling the antenna turn-off, we note that the curves are not smooth at values of B for which the $N_t$ that yields the highest rate changes.
To understand the relation between the CSI plus JPM rates on L2 and the data rates on L1, which directly affect the fronthaul rates, we now calculate these rates in a practically oriented setup. We consider 5G numerology = 1 (cf. footnote 13) and a beamforming resolution of one resource block (RB) (see, e.g., [45]), which is the smallest resource that can be allocated to a single MS. Accordingly, we update the beamforming weights every 0.36 MHz in frequency and every time slot (RB duration), which is 0.5 ms in this case. This update requires B bits for CSI and B bits for JPM; hence, the overall bit rate at L2 is $R_{L2} = 2B$ bit$/(0.5$ ms$) = 4B$ kbit/s for every S-RRH. Next, considering the setup in Fig. 2(b) at high SNR (35 dB), it takes B = 1000 bits to reach the maximum spectral efficiency of 11.2 bit/s/Hz. It follows that $R_{L2} = 4$ Mbit/s, and the overall rate (for a single RB) at L1 is $R_{L1} = 11.2$ bit/s/Hz $\times$ 0.36 MHz $\times$ 8 $\approx$ 32 Mbit/s; hence $R_{L2}$ is 12.5% of $R_{L1}$. Repeating this calculation for 15 dB SNR yields 17.3%. These numbers can be reduced by increasing the beamforming frequency granularity, which may be updated every two RBs rather than one, and in some cases with even higher granularity [46].

Fig. 2(c) evaluates the P&Q in the same setup as Fig. 2(b) with $P_{\max} = 35$ dB. The figure shows the bound in theorem 7 and the P&Q rate evaluated via MC, where we maximized it also over all feasible values of $\bar{Q}$. The result indicates a significant performance gain; that is, the P&Q rate is much greater than the standard-scheme rate for at least 250 bits. Moreover, for the P&Q, the bound is tighter and approaches the MC simulation much faster than the corresponding bound in the standard scheme.

Fig. 3(a) studies the effect of CMI quantization on the overall rate. It depicts the same setup as in Fig. 2(a) for fixed B and varying levels of CMI quantization bits. The result shows that the CMI error is insignificant for $B_{CMI} > 12$, which is very low compared to B = 160 and B = 300.
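The L2 and L1 rate calculation for the setup of Fig. 2(b) can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text (B = 1000 bits, a 0.5 ms slot, a 0.36 MHz RB, 8 MSs and 11.2 bit/s/Hz); the function names are hypothetical.

```python
def l2_rate_bps(bits_per_update, slot_s=0.5e-3):
    """L2 rate per S-RRH: B bits of CSI plus B bits of JPM every slot."""
    return 2 * bits_per_update / slot_s


def l1_rate_bps(spectral_eff, bandwidth_hz=0.36e6, num_ms=8):
    """L1 data rate for one RB: spectral efficiency x bandwidth x MSs."""
    return spectral_eff * bandwidth_hz * num_ms


r_l2 = l2_rate_bps(1000)   # about 4 Mbit/s
r_l1 = l1_rate_bps(11.2)   # about 32.3 Mbit/s
ratio = r_l2 / r_l1        # about 0.124
```

Without rounding the L1 rate to 32 Mbit/s, the ratio comes out at roughly 12.4%, which agrees with the 12.5% quoted above up to rounding.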
To further investigate the P&Q, we study a practically oriented setup. The setup includes a cluster of M = 4 S-RRHs forming a rhombus with a 100 m edge length and an edge angle of 120°. Each S-RRH has $N_t = 8$ isotropic transmit antennas. Eight single-antenna MSs (Q = 8) were placed uniformly at random in the common area spanned by four hexagons, each centered at a different S-RRH. We set a minimum distance of 10 m between each MS and S-RRH. The results were averaged over 20 realizations of MS placements, where each realization determined a set of attenuation factors $\alpha = \{\alpha_{q,m} : q = 1, \ldots, 8,\ m = 1, \ldots, 4\}$ according to $\alpha_{q,m} = -128 - 37.6 \log_{10}(r_{q,m})$ (in dB), where $r_{q,m}$ is the distance from S-RRH-m to MS-q in km. The noise level at the receivers was −121 dBm. For each realization of MS placement, we calculated each MS rate by averaging over 40 channel realizations. Similar to Fig. 2, we maximized the rate over $N_t$ in the standard scheme, whereas in the P&Q scheme we maximized it over the dimension-reduction level (cf. definition 4) while keeping $N_t = 8$. Finally, we fixed the overall power transmitted to each MS; i.e., $P_q = P_{q'}$, $\forall q, q' \in \mathcal{Q}$ (cf. (4)). To maintain $\|p_q\| = 1$, each S-RRH had to back off its power until none violated its individual power constraint $P_{\max}$. We note that while this power allocation strategy is not optimal, it yields good performance at high SNR (see [44] for details). Fig. 3(b) presents the throughput as a function of $P_{\max}$ (cf. definition 1). The results show that the P&Q significantly outperformed the standard scheme. In the latter, the network is already interference-limited at 50 dBm, whereas in the former, this occurs only at 110 dBm. Therefore, while the perfect-CSI throughput in the standard scheme is higher than its P&Q counterpart, the P&Q throughput rises much faster. Fig. 3(c) presents the average throughput as a function of B under a per-S-RRH power constraint of $P_{\max} = 45$ dBm. The result shows that the P&Q throughput increases rapidly with B, thus outperforming the standard scheme for a wide range of B.
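To give a feel for the link budget implied by the attenuation model and noise level above, the following Python sketch evaluates the attenuation and a single-link SNR. It is an illustration only: the function names are hypothetical, and the SNR ignores fading, precoding gain, and interference, none of which are modeled here.

```python
import math

NOISE_DBM = -121.0  # receiver noise level from the text


def attenuation_db(distance_km):
    """alpha_{q,m} in dB: -128 - 37.6 * log10(r), with r in km."""
    return -128.0 - 37.6 * math.log10(distance_km)


def rx_snr_db(tx_power_dbm, distance_km):
    """Single-link SNR in dB.

    ASSUMPTION: one transmitter, no fading, no interference, no array gain.
    """
    return tx_power_dbm + attenuation_db(distance_km) - NOISE_DBM


for r_km in (0.01, 0.1):  # minimum MS distance (10 m) and rhombus edge (100 m)
    print(f"r = {1000 * r_km:5.0f} m: attenuation = {attenuation_db(r_km):6.1f} dB, "
          f"SNR at 45 dBm = {rx_snr_db(45.0, r_km):5.1f} dB")
```

At the 100 m edge length, the model gives an attenuation of −90.4 dB, so a 45 dBm transmitter yields a single-link SNR of about 75.6 dB, which is consistent with the network being limited by residual interference rather than noise in this regime.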

VII. CONCLUSION
This article makes two contributions. The first is a new upper bound on the rate degradation experienced by a cluster of S-RRHs that employ joint ZF with incomplete CSI, compared to perfect CSI. The second is a new CSI sharing scheme that reduces CSI and data overhead. We demonstrated, through analysis and simulation, that the proposed scheme achieves a significant performance gain.
Possible extensions of this work include optimizing the power allocation for each MS and optimizing the dimension-reduction level (cf. definition 4). Incorporating channel estimation errors and inter-cluster interference into the bound is another significant direction. Finally, it is necessary to explore channel models beyond independent Rayleigh fading.

APPENDIX A
To prove theorem 2, we begin by rewriting the decomposition in (8) as whereh q,m = h q,m / h q,m andĥ q,m =ĥ q,m / ĥ q,m , θ q,m is the angle betweenh q,m andĥ q,m , and s q,m ∈ C Nt×1 is a unit-norm random vector that is uniformly distributed over the null space ofĥ q,m [21]. Moreover, we definê wherep j,m = p j,m / p j,m andp j,m =p j,m / p j,m , φ j,m is the angle betweenp j,m andp j,m , and g j,m ∈ C Nt×1 is a unit-norm random vector that is uniformly distributed over the null space ofp j,m . We note that Interchangingp j,m andp j,m yields an equivalent decomposition of the quantized beamforming vector [48]; furthermore,p j,m is uniformly distributed.
Proposition 10: The terms E and F , given in (41), satisfy Now, denote w =ĥ q,m , g j,n ,ĥ q,n , h q,m , h q,n , p j,m , p j,n and using the same independence argument as in (44), the double sum in (54) can be written as Given w, all the arguments inside the internal expectation are constants, except g j,m . Furthermore, recalling that givenp j,m , g j,m is uniformly distributed on the unit sphere of the null space ofp j,m , it follows that E g j,m p j,m , w = 0 Nt . Thus, the double sum in (54) is equal to zero. Applying the Cauchy-Schwarz inequality to (53) and using the independence argument again as in (44), one obtains Next, from (57), where we used similar arguments as in (49) concerning the angles, and in addition, E{ h q,m 2 }/α q,m = N t [21], M m=1 α q,m = α q , and E p j,m 2 = 1 M . To further simplify (58), we treat each of the expressions in the r.h.s. separately. First where we used (50) and (N t −1) . [21] Next, considerĥ q,m = P gj,mĥq,m + P ⊥ gj,mĥq,m , where P gj,m , P ⊥ gj,m are the projection matrices into space spanned by g j,m and its orthogonal complement, respectively. It follows that where (a) follows because P ⊥ gj,m g j,m = 0 Nt and (b) follows from P gj,mĥq,m ≤ 1 (recall that ĥ q,m = 1). (c) follows because g j,m is independent of P gj,mĥq,m , and is uniformly distributed on the unit sphere of the (N t − 1)-dimensional null space ofp j,m . Thus, the expectation on the left-hand side of (c) is taken according to the β(1, N t − 2) distribution [21].
Substituting (60) and (61) into (58) establishes (52) for E. The proof for F is identical and is omitted here due to space limitations.
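The distributional fact used in step (c), namely that the squared projection of an isotropic unit vector in an $(N_t - 1)$-dimensional complex space onto a fixed unit vector follows a $\beta(1, N_t - 2)$ distribution with mean $1/(N_t - 1)$, can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions, not part of the proof: it works in coordinates of the null space, uses the first basis vector as the fixed direction, and all function names are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Isotropic unit vector in C^dim: normalized i.i.d. complex Gaussians."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def mean_squared_projection(num_tx, trials=20000, seed=0):
    """Empirical mean of |<e_1, g>|^2 for g isotropic in C^(num_tx - 1)."""
    rng = random.Random(seed)
    dim = num_tx - 1  # dimension of the null space of a vector in C^num_tx
    total = 0.0
    for _ in range(trials):
        g = random_unit_vector(dim, rng)
        total += abs(g[0]) ** 2  # squared projection onto the fixed vector e_1
    return total / trials


num_tx = 8
empirical = mean_squared_projection(num_tx)
expected = 1.0 / (num_tx - 1)  # mean of a beta(1, num_tx - 2) variable
print(f"empirical mean = {empirical:.4f}, beta(1, {num_tx - 2}) mean = {expected:.4f}")
```

The same isotropy also implies that the vector itself has zero mean, which is the property invoked when the double sum in (54) is shown to vanish.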
Proposition 11: The term G in (41) Proof: Similar to the derivation of (57), it can be shown that Next, E{|s † q,m g j,m | 2 } can be bounded using similar arguments as in (61), and by further employing [21], one obtains the desired result.
Proof: The proposition will be proven only for Ξ 1 , where for the rest Ξ i , i > 1 the proof is identical. Similar to (56), and with w =ĥ q,m ,p j,m ,ĥ q,n , h q,m , h q,n , p j,m , p j,n it can be shown that E g † j,n p j,n , w = 0 Nt . Thus, Ξ 1 = 0, which establishes the desired result.
To complete the proof, we apply propositions 9 to 12 to (41) and, in turn, substitute the result into (39), which establishes the desired result.