Full-Duplex Cell-Free mMIMO Systems: Analysis and Decentralized Optimization

Cell-free (CF) massive multiple-input-multiple-output (mMIMO) deployments are usually investigated with half-duplex nodes and high-capacity fronthaul links. To leverage the possible gains in throughput and energy efficiency (EE) of full-duplex (FD) communications, we consider a FD CF mMIMO system with practical limited-capacity fronthaul links. We derive closed-form spectral efficiency (SE) lower bounds for this system with maximum-ratio combining/maximum-ratio transmission processing and optimal uniform quantization. We then optimize the weighted sum EE (WSEE) via downlink and uplink power control by using a two-layered approach: the first layer formulates the optimization as a generalized convex program, while the second layer solves the optimization decentrally using the alternating direction method of multipliers. We analytically show that the proposed two-layered formulation yields a Karush-Kuhn-Tucker point of the original WSEE optimization. We numerically show the influence of weights on the individual EE of the users, which demonstrates the utility of the WSEE metric to incorporate heterogeneous EE requirements of users. We show that low fronthaul capacity reduces the number of users each AP can support, and the cell-free system, consequently, becomes user-centric.


I. INTRODUCTION
Massive multiple-input-multiple-output (mMIMO) wireless systems employ a large number of antennas at the base stations (BSs), and achieve higher spectral efficiency (SE) and energy efficiency (EE) with relatively simple signal processing [1], [2]. Two distinct mMIMO variants are being investigated in the literature: i) co-located, wherein all antennas are located at one place [1]; and ii) distributed, wherein antennas are spread over a large area [2, and the references therein], [3]- [5]. While co-located mMIMO systems have a low fronthaul requirement, distributed mMIMO systems, at the cost of higher fronthaul infrastructure, have greater spatial diversity to exploit and consequently have greater immunity to shadow fading [2]- [4]. Cell-free (CF) mMIMO is one of the most promising distributed mMIMO variants in the current literature [2]- [5]. CF mMIMO envisions a communication region with no cell boundaries, and promises substantial gains in SE and fairness over small-cell deployments [3]- [5].
Full-duplex (FD) wireless systems have now been practically realized with advanced self-interference (SI) cancellation mechanisms [6]-[9]. Co-located FD massive MIMO systems have also been extensively investigated [10], [11, and the references therein]. FD CF mMIMO is a relatively recent area of interest [12]-[14], where access points (APs) simultaneously serve downlink and uplink user equipments (UEs) on the same spectral resource. Vu et al. in [12] considered a FD CF mMIMO system with maximum-ratio combining and showed that if SI at the APs is suppressed up to a certain limit, it has higher throughput than its half-duplex (HD) counterpart and FD co-located systems. Wang et al. in [13] evaluated the SE of a network-assisted FD CF mMIMO system using zero-forcing and regularized zero-forcing beamforming. Reference [14] proposed a heap-based algorithm for pilot assignment to overcome pilot contamination in FD CF mMIMO systems.
In CF mMIMO, APs are connected to a central processing unit (CPU) using fronthaul links. The existing FD CF mMIMO literature assumes high-capacity fronthaul links [12]-[14]. These links, however, have limited capacity, and the information consequently needs to be quantized before it is sent over them. The limited-capacity fronthaul has been considered only for HD CF mMIMO systems in [15]-[17]. Femenias et al. in [16] studied a max-min uplink/downlink power allocation problem for HD CF mMIMO with limited-capacity fronthaul, while Masoumi et al. in [17] optimized the SE of a HD CF mMIMO uplink with limited-capacity fronthaul and hardware impairments. Bashar et al. in [15] derived the SE of HD CF mMIMO uplink with limited-capacity fronthaul. We consider quantized fronthaul for a FD CF mMIMO system to derive achievable SE expressions. To the best of our knowledge, the current work is the first to do so.
With the tremendous increase in network traffic, the EE has become an important metric to design a modern wireless system. Global energy efficiency (GEE), defined as the ratio of the network SE and its total energy consumption, is being used to design CF mMIMO communication systems [18]-[21]. Ngo et al. in [18] optimized the GEE for the downlink of a HD CF mMIMO system. Bashar et al. in [19] optimized the uplink GEE of a HD CF mMIMO system with optimal uniform fronthaul quantization. Alonzo et al. in [20] optimized the GEE of CF and UE-centric HD mMIMO deployments in the mmWave regime. Nguyen et al. in [21] maximized a novel SE-GEE metric for the FD CF mMIMO system using a Dinkelbach-like algorithm.
arXiv:2010.14110v4 [eess.SP] 10 Dec 2021
A UE with limited energy availability will accord a much higher importance to its EE than another UE with a sufficient energy supply. GEE is a network-centric metric and cannot accommodate such heterogeneous EE requirements [22]. The weighted sum energy efficiency (WSEE) metric, defined as the weighted sum of individual EEs [22], can prioritize the EEs of individual UEs by allocating them a higher weight [23], [24]. The WSEE is investigated in [23] for a general wireless network, and for a two-way FD relay in [24]. It is yet to be investigated for CF mMIMO HD and FD systems.
Decentralized designs, which accomplish a complex task by coordination and cooperation of a set of computing units, are being used to design mMIMO systems [25], [26]. This interest is driven by the high computational complexity and high interconnection data rate requirements between radio frequency chains and baseband units in centralized mMIMO system designs [25]. Jeon et al. in [25] constructed decentralized equalizers by partitioning the BS antenna array. Reference [26] proposed a coordinate-descent-based decentralized algorithm for mMIMO uplink detection and downlink precoding. Reference [27] employed the alternating direction method of multipliers (ADMM) to decentrally allocate edge-computing resources for vehicular networks. Such decentralized approaches have not yet been employed to optimize FD CF mMIMO systems. We next list our main contributions in this context:

1) Contributions regarding closed form SE lower bound:
We consider FD CF mMIMO communications with maximal ratio combining/maximal ratio transmission (MRC/MRT) processing and limited fronthaul with optimal uniform quantization. We note that for FD CF mMIMO systems, unlike their HD counterparts [2]-[5], uplink and downlink transmissions interfere to cause uplink downlink interference (UDI) and inter-/intra-AP residual interference (RI). Further, unlike the existing FD CF mMIMO literature [12]-[14], [21], which considers perfect high-capacity fronthaul links, it is critical to model and analyze the UDI, the inter-/intra-AP interferences and the limited-capacity impairments while deriving SE lower bounds for both uplink and downlink UEs. We model the UDI on the downlink and the RI on the uplink, and additionally capture the quantization distortion due to limited-capacity fronthaul links through the total quantization distortion (TQD) terms. We also show the impact of quantization on the uplink RI terms themselves, where the distortions in the downlink and uplink signals get coupled. We derive achievable SE expressions for both uplink and downlink UEs, which are valid for an arbitrary number of antennas at each AP.

2) Contributions regarding centralized WSEE optimization:
We use the derived SE expressions to maximize the non-convex WSEE metric. While the energy-efficient design of CF mMIMO systems has been studied in the literature [18]-[21], these works, except reference [21], focus on the GEE metric. The GEE, being a single ratio, can be expressed as a pseudo-concave (PC) function and can thus be maximized using Dinkelbach's algorithm [22]. Reference [21] is the only work so far which optimized the EE of FD CF mMIMO. It considered a novel SE-GEE objective, which also reduces to a PC function and is maximized using a Dinkelbach-like algorithm. The WSEE, in contrast, is a sum of PC functions, and is not guaranteed to be a PC function [22]. This makes the WSEE a highly non-trivial objective to maximize [22]. Further, the algorithm in [21] requires knowledge of instantaneous small-scale channel fading coefficients. The WSEE metric optimized here, in contrast, requires only large-scale channel coefficients, which remain constant for multiple coherence intervals [28].

3) Contributions regarding decentralized optimization:
We decentrally maximize the WSEE using a two-layered iterative approach which combines successive convex approximation (SCA) and ADMM. The first layer simplifies the non-convex WSEE maximization problem by using epigraph transformation, slack variables and series approximations. It then locally approximates the problem as a generalized convex program (GCP), which is solved iteratively using the SCA approach. The second layer decentrally optimizes the GCP by using the consensus ADMM approach, which decomposes the centralized version into multiple sub-problems, each of which is solved independently. The local solutions are combined to obtain the global solution. We note that the GCP for the FD system is not in the standard form required for applying ADMM, as it involves FD interference terms that couple power control coefficients from different UEs as well as from the uplink and downlink. We therefore create global and local versions of the power control coefficients separately for the downlink and uplink UEs, which decouple the FD interference terms. We consider separate sub-problems for the downlink and uplink UEs with a separate set of constraints for each. These constraints, rewritten using the local variables, define feasible sets for the sub-problems of the downlink and uplink UEs, respectively. We introduce separate Lagrangian parameters for the downlink and uplink UEs, and separate penalty parameters for the downlink and uplink power control variables. This enables us to properly define the augmented Lagrangian and decouple the respective sub-problems at the D-servers, which calculate the local solutions; these are then coordinated into the globally optimal solution at the C-server. The FD system requires these modifications to the standard ADMM approach, which, to the best of our knowledge, have not been attempted so far in the mMIMO literature.

4) Contributions regarding the AP selection algorithm:
We show that there is a fundamental limit to the number of UEs a FD AP can serve with a limited fronthaul capacity. We propose a proportionately-fair rule capping the maximum number of uplink and downlink UEs served by each AP. We use this rule to propose a fair AP selection algorithm which efficiently chooses the best subset of APs to serve each uplink and downlink UE. The proposed approach ensures a user-centric architecture for our system. The proposed algorithm, which has a trivial complexity, is shown to perform close to the optimal one proposed in [29].

5) Contributions regarding the convergence of the distributed optimization algorithm:
We not only analytically prove the convergence of the distributed optimization algorithm but also numerically show that it i) achieves the same WSEE as the centralized approach; and ii) is responsive to changing weights, which can be set to prioritize UEs' EE requirements.

II. SYSTEM MODEL
We consider, as shown in Fig. 1, a FD CF mMIMO system where $M$ FD APs serve $K = K_u + K_d$ single-antenna HD UEs on the same spectral resource, with $K_u$ and $K_d$ being the number of uplink and downlink UEs, respectively. Each AP has $N_t$ transmit and $N_r$ receive antennas, and is connected to the CPU using a limited-capacity fronthaul link which carries quantized uplink/downlink information to/from the CPU. We see from Fig. 1 that, due to the FD model:
• the uplink receive signal of each AP is interfered by its own downlink transmit signal and that of other APs. These intra- and inter-AP interferences are shown using purple and brown dashed lines, respectively.
• downlink UEs receive the transmit signals of uplink UEs, causing uplink downlink interference (UDI) (shown as black dotted lines between uplink and downlink UEs).
Additionally, the UEs experience multi-UE interference (MUI) as the APs serve them on the same spectral resource. We next explain various channels, their estimation and data transmission. We assume a coherence interval of duration $T_c$ (in s) with $\tau_c$ samples, which is divided into: a) a channel estimation phase of $\tau_t$ samples, and b) downlink and uplink data transmission of $(\tau_c - \tau_t)$ samples.

A. Channel description:
The channel from the kth downlink UE to the transmit antennas of the mth AP is $\mathbf{g}^d_{mk} \in \mathbb{C}^{N_t \times 1}$, while the channel from the lth uplink UE to the receive antennas of the mth AP is $\mathbf{g}^u_{ml} \in \mathbb{C}^{N_r \times 1}$. We model these channels as $\mathbf{g}^d_{mk} = (\beta^d_{mk})^{1/2}\,\tilde{\mathbf{g}}^d_{mk}$ and $\mathbf{g}^u_{ml} = (\beta^u_{ml})^{1/2}\,\tilde{\mathbf{g}}^u_{ml}$. Here $\beta^d_{mk}$ and $\beta^u_{ml} \in \mathbb{R}$ are the corresponding large scale fading coefficients, which are the same for all antennas at the mth AP [3], [12]. The vectors $\tilde{\mathbf{g}}^d_{mk}$ and $\tilde{\mathbf{g}}^u_{ml}$ denote small scale fading with independent and identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ entries. The UDI channel between the kth downlink UE and the lth uplink UE is modeled as $h_{kl} = (\tilde{\beta}_{kl})^{1/2}\,\tilde{h}_{kl}$ [12], [13], where $\tilde{\beta}_{kl}$ is the large scale fading coefficient and $\tilde{h}_{kl} \sim \mathcal{CN}(0, 1)$ is the small scale fading. The inter- and intra-AP channels from the transmit antennas of the ith AP to the receive antennas of the mth AP are denoted as $\mathbf{H}_{mi} \in \mathbb{C}^{N_r \times N_t}$ for $i = 1$ to $M$.
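The channel model above (large scale fading coefficient times i.i.d. CN(0, 1) small scale fading) can be illustrated with a minimal NumPy sketch. The function name `draw_channel` and the numerical values of the fading coefficient and antenna count are hypothetical, chosen only for illustration.

```python
import numpy as np

def draw_channel(beta, n_ant, rng):
    """Draw g = sqrt(beta) * g_tilde, with g_tilde having i.i.d. CN(0, 1) entries."""
    g_tilde = (rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)) / np.sqrt(2)
    return np.sqrt(beta) * g_tilde

rng = np.random.default_rng(0)
beta = 0.25   # hypothetical large scale fading coefficient
n_t = 4       # hypothetical number of transmit antennas at the AP

# Average per-antenna channel power over many realizations; it should approach beta.
samples = np.array([draw_channel(beta, n_t, rng) for _ in range(20000)])
avg_power = float(np.mean(np.abs(samples) ** 2))
```

Because the small scale entries have unit variance, the per-antenna channel power is governed entirely by the large scale coefficient, which is why the later SE expressions can depend on large scale statistics only.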

B. Uplink channel estimation:
Recall that the channel estimation phase consists of $\tau_t$ samples. We divide them as $\tau_t = \tau^d_t + \tau^u_t$, where $\tau^d_t$ and $\tau^u_t$ are the samples used as pilots for the downlink and uplink UEs, respectively. All the downlink (resp. uplink) UEs simultaneously transmit $\tau^d_t$ (resp. $\tau^u_t$)-length uplink pilots to the APs, which use them to estimate the respective channels. In this phase, both the transmit and receive antenna arrays of each AP, similar to [12], operate in receive mode. The kth downlink UE (resp. lth uplink UE) transmits a pilot signal based on the pilot sequence $\boldsymbol{\varphi}^d_k$ (resp. $\boldsymbol{\varphi}^u_l$). We assume, similar to [12], [18], that the pilots have unit norm. The pilot signals received by the transmit and receive antennas of the mth AP depend on the normalized pilot transmit signal-to-noise-ratio (SNR), $\rho_t$, and on the additive noise matrices $\mathbf{W}^{tx}_m \in \mathbb{C}^{N_t \times \tau^d_t}$ and $\mathbf{W}^{rx}_m \in \mathbb{C}^{N_r \times \tau^u_t}$, which have $\mathcal{CN}(0, 1)$ entries. Each AP independently estimates its channels with the uplink and downlink UEs to avoid channel state information (CSI) exchange overhead [12], [21]. To estimate the channels $\mathbf{g}^d_{mk}$ and $\mathbf{g}^u_{ml}$, the mth AP projects the received signal onto the pilot signals $\boldsymbol{\varphi}^d_k$ and $\boldsymbol{\varphi}^u_l$, respectively. These projections are used to compute the corresponding linear minimum-mean-squared-error (MMSE) channel estimates, $\hat{\mathbf{g}}^d_{mk}$ and $\hat{\mathbf{g}}^u_{ml}$ [12], [18].
After channel estimation, data transmission starts simultaneously on downlink and uplink.
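The projection-then-MMSE step described above can be sketched numerically. This is a minimal illustration under assumptions that are not taken from the text: a single UE with an orthogonal unit-norm pilot (no pilot contamination), a projected observation of the form sqrt(tau * rho_t) * g + noise, and the textbook linear MMSE scaling for that observation. The names `mmse_estimate`, `estimate_variance` and all numerical values are hypothetical.

```python
import numpy as np

def mmse_estimate(y_proj, beta, tau, rho_t):
    """Linear MMSE estimate of g from y_proj = sqrt(tau * rho_t) * g + w, w ~ CN(0, 1).

    Assumed model (not the paper's exact expression): orthogonal pilots,
    so the projection contains only the desired channel plus noise.
    """
    c = np.sqrt(tau * rho_t) * beta / (tau * rho_t * beta + 1.0)
    return c * y_proj

def estimate_variance(beta, tau, rho_t):
    """Per-antenna variance of the MMSE estimate under the same assumed model."""
    return tau * rho_t * beta ** 2 / (tau * rho_t * beta + 1.0)

def complex_normal(rng, size):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

rng = np.random.default_rng(1)
beta, tau, rho_t, n = 0.5, 10, 2.0, 200000   # hypothetical values

g = np.sqrt(beta) * complex_normal(rng, n)                 # true channel coefficients
y_proj = np.sqrt(tau * rho_t) * g + complex_normal(rng, n) # projected pilot observation
g_hat = mmse_estimate(y_proj, beta, tau, rho_t)

gamma = estimate_variance(beta, tau, rho_t)                # theoretical estimate variance
empirical_gamma = float(np.mean(np.abs(g_hat) ** 2))
empirical_error = float(np.mean(np.abs(g - g_hat) ** 2))   # should approach beta - gamma
```

Under this assumed model the estimate variance and the estimation error variance add up to the large scale coefficient, which is the property that makes closed-form SE bounds tractable.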

C. Transmission model:
An objective of this work is to derive a SE lower bound for FD CF mMIMO systems, where the M APs serve K u uplink UEs and K d downlink UEs simultaneously on the same spectral resource. We note that for the FD CF mMIMO systems, unlike the HD CF mMIMO systems [3], [15], [16], uplink and downlink transmissions interfere to cause UDI and inter-/intra-AP interferences. Further, unlike existing FD CF mMIMO literature [12], [13], [21], we consider a limited-capacity fronthaul. It is critical to model and analyze the UDI and inter-/intra-AP interferences and limited-capacity impairments while deriving the lower bound.

1) Downlink data transmission
The CPU chooses a message symbol $s^d_k$ for the kth downlink UE, which is distributed as $\mathcal{CN}(0, 1)$. It intends to send this symbol to the mth AP via the limited-capacity fronthaul link. Before doing that, it multiplies $s^d_k$ with a power-control coefficient $\eta_{mk}$, and then quantizes the resulting signal. The mth AP, due to its limited fronthaul capacity, is allowed to serve only a subset $\kappa_{dm} \subset \{1, \dots, K_d\}$ of downlink users, an aspect which is discussed later in Section II-D. The CPU consequently sends downlink symbols for the UEs in the set $\kappa_{dm}$ to the mth AP, which uses MMSE channel estimates to perform MRT precoding. The resulting transmit signal of the mth AP, $\mathbf{x}^d_m$, is given in (1). Here $\rho_d$ is the normalized maximum transmit SNR at each AP. The function $Q(\cdot)$ denotes the quantization operation, which is modeled as a multiplicative attenuation $\tilde{a}$ and an additive distortion $\varsigma^d_{mk}$ for the kth downlink UE in the fronthaul link between the CPU and the mth AP [15], [19]. Appendix A gives the distortion power $E\{|\varsigma^d_{mk}|^2\}$ in terms of the scalar constants $\tilde{a}$ and $\tilde{b}$, which depend on the number of fronthaul quantization bits.
The mth AP must satisfy the average transmit SNR constraint, i.e., $E\{\|\mathbf{x}^d_m\|^2\} \le \rho_d$. Using the expression of $\mathbf{x}^d_m$ from (1) and the above quantization error variance, $E\{|\varsigma^d_{mk}|^2\}$, this constraint can be simplified to (2). The kth downlink UE receives its desired message signal from a subset of all APs, denoted as $\mathcal{M}^d_k \subset \{1, \dots, M\}$, along with various interference and distortion components, as in (5). The mth AP serves the kth downlink UE iff $k \in \kappa_{dm} \Leftrightarrow m \in \mathcal{M}^d_k$. Here $x^u_l$ is the transmit signal of the lth uplink UE, which is modelled next.

2) Uplink data transmission
The $K_u$ uplink UEs also simultaneously transmit to all $M$ APs on the same spectral resource as that of the $K_d$ downlink UEs. The lth uplink UE transmits its signal $x^u_l = \sqrt{\rho_u \theta_l}\, s^u_l$, with $s^u_l$ being its message symbol with pdf $\mathcal{CN}(0, 1)$, $\rho_u$ being the maximum uplink transmit SNR and $\theta_l$ being the power control coefficient. To satisfy the average SNR constraint, $E\{|x^u_l|^2\} \le \rho_u$, the lth uplink UE satisfies the constraint
$$0 \le \theta_l \le 1. \tag{3}$$
The FD APs not only receive the uplink UE signals but also their own downlink transmit signals and that of the other APs, referred to as intra-AP and inter-AP interference, respectively. Using (1), the received uplink signal at the mth AP, $\mathbf{y}^u_m$, is given in (4). Here $\mathbf{w}^u_m \in \mathbb{C}^{N_r \times 1}$ is the additive receiver noise at the mth AP with i.i.d. $\mathcal{CN}(0, 1)$ entries.
The intra- and inter-AP interference channels vary extremely slowly and thus can be estimated with very low pilot overhead [13]. The receive antenna array of each AP, with the estimated channel, can only partially mitigate the intra- and inter-AP interference [12], [13]. The residual intra-/inter-AP interference (RI) channel $\mathbf{H}_{mi} \in \mathbb{C}^{N_r \times N_t}$ is modeled as Rayleigh-faded with i.i.d. $\mathcal{CN}(0, \gamma_{RI,mi})$ entries [6], [12], [13], [24]. Here $\gamma_{RI,mi} \triangleq \beta_{RI,mi}\,\gamma_{RI}$, with $\beta_{RI,mi}$ being the large scale fading coefficient from the ith AP to the mth AP, and $\gamma_{RI}$ being the RI power after its suppression.
The mth AP receives the signals from all the uplink UEs, and performs MRC for the lth uplink UE with $(\hat{\mathbf{g}}^u_{ml})^H$. Due to its limited fronthaul: i) the AP quantizes the combined signal before sending it to the CPU; and ii) as discussed in detail later in Section II-D, the CPU receives contributions for the lth uplink UE only from the subset of APs serving it, denoted as $\mathcal{M}^u_l \subset \{1, \dots, M\}$. Using (4), the signal received by the CPU for the lth uplink UE is expressed as in (6).
We denote the subset of uplink UEs served by the mth AP as $\kappa_{um} \subset \{1, \dots, K_u\}$. The mth AP serves the lth uplink UE iff $l \in \kappa_{um} \Leftrightarrow m \in \mathcal{M}^u_l$. The quantization operation $Q(\cdot)$ is mathematically modeled using a constant attenuation $\tilde{a}$ and an additive distortion $\varsigma^u_{ml}$, whose power $E\{(\varsigma^u_{ml})^2\}$ is derived in Appendix A.

D. User-centric behavior through limited fronthaul:
Initial CF mMIMO literature considered system models where all APs can serve all UEs [3]-[5]. However, for geographically large areas, each UE can only have practically feasible channels with a subset of APs in its vicinity. Therefore, recent CF mMIMO literature has increasingly focused on user-centric CF mMIMO system design [2, and the references therein]. In the subsequent discussion, we show that a user-centric CF deployment is a natural outcome of the design choice to impose fronthaul capacity constraints on the CF mMIMO system model, as shown in Fig. 1.
The fronthaul between the mth AP and the CPU uses $\nu_m$ bits to quantize the real and imaginary parts of the transmit signal of the kth downlink UE and of the uplink receive signal after MRC, i.e., $\sqrt{\eta_{mk}}\, s^d_k$ and $(\hat{\mathbf{g}}^u_{ml})^H \mathbf{y}^u_m$, respectively. Due to the limited-capacity fronthaul, the mth AP serves only $K_{um} \triangleq |\kappa_{um}|$ and $K_{dm} \triangleq |\kappa_{dm}|$ UEs on the uplink and downlink, respectively [15], [19]. For each UE, we recall that there are $(\tau_c - \tau_t)$ data samples in each coherence interval of duration $T_c$. The resulting fronthaul data rate between the mth AP and the CPU, $R_{fh,m}$, is given in (7). The fronthaul link between the mth AP and the CPU has capacity $C_{fh,m}$, which implies the constraint in (8). We propose the following lemma, where we consider a proportionally fair approach to calculate $K_{dm}$ and $K_{um}$ in proportion to the total number of downlink and uplink UEs, respectively. We use $\varepsilon \triangleq \{d, u\}$ to denote downlink and uplink, respectively, and recall that the total number of UEs is $K = K_u + K_d$.

Lemma 1. The maximum numbers of uplink and downlink UEs served by the mth AP, when connected via a limited optical fronthaul to the CPU with capacity $C_{fh,m}$, are given in (9).

Proof. Let $\bar{K}_{um}$ and $\bar{K}_{dm}$ denote the maximum number of uplink and downlink UEs served by the mth AP. We consider $\bar{K}_{um} \propto K_u$ and $\bar{K}_{dm} \propto K_d$ for proportional fairness on the uplink and downlink. Substituting these into (8) bounds $\bar{K}_{um}$ and $\bar{K}_{dm}$, and the lemma follows from the definition of the floor function $\lfloor \cdot \rfloor$.
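The proportional-fair cap of Lemma 1 can be illustrated with a short sketch. The closed-form expressions (7)-(9) are not reproduced in this text, so the formulas below are a reconstruction from the verbal description only and should be read as an assumption: a fronthaul rate of 2 * nu * (K_dm + K_um) * (tau_c - tau_t) / T_c bits per second (real and imaginary parts, each with nu bits, per data sample and per served UE), and a per-direction cap obtained by splitting the admissible UE count in proportion to K_d and K_u. All function names and numbers are hypothetical.

```python
import math

def fronthaul_rate(nu, k_dm, k_um, tau_c, tau_t, t_c):
    """Assumed fronthaul rate (bits/s): real and imaginary parts, nu bits each,
    for every data sample of every UE served by the AP."""
    return 2 * nu * (k_dm + k_um) * (tau_c - tau_t) / t_c

def max_ues_per_ap(c_fh, nu, k_d, k_u, tau_c, tau_t, t_c):
    """Proportionally fair caps on downlink/uplink UEs per AP under the assumed rate model."""
    total = c_fh * t_c / (2 * nu * (tau_c - tau_t))     # admissible UE count (real-valued)
    k_bar_d = math.floor(total * k_d / (k_d + k_u))     # downlink share, floored
    k_bar_u = math.floor(total * k_u / (k_d + k_u))     # uplink share, floored
    # An AP can never serve more UEs than exist in each direction.
    return min(k_d, k_bar_d), min(k_u, k_bar_u)

# Hypothetical numbers: 10 Mbps fronthaul, 2 quantization bits,
# 200-sample coherence interval of 1 ms with 20 pilot samples, 10 + 10 UEs.
k_dm, k_um = max_ues_per_ap(c_fh=10e6, nu=2, k_d=10, k_u=10, tau_c=200, tau_t=20, t_c=1e-3)
rate = fronthaul_rate(nu=2, k_dm=k_dm, k_um=k_um, tau_c=200, tau_t=20, t_c=1e-3)
```

With these illustrative numbers each AP can serve only 6 of the 10 UEs in each direction, which is the mechanism by which a limited fronthaul makes the deployment user-centric.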
Using the maximum limits obtained in (9), we assign $K_{um} = \min\{K_u, \bar{K}_{um}\}$ and $K_{dm} = \min\{K_d, \bar{K}_{dm}\}$. We see that the constraint imposed in (8) is similar to a UE-centric (UC) CF mMIMO system, wherein each UE is served by a subset of the APs [2]. We now define the procedure for AP selection to obtain the best subset of APs to serve each uplink and downlink UE, while satisfying (8). For this, we extend the procedure in [15] to a FD system as follows:
• The mth AP sorts the uplink and downlink UEs connected to it in descending order based on their channel gains ($\beta^u_{ml}$ and $\beta^d_{mk}$, respectively) and chooses the $K_{um}$ uplink UEs and $K_{dm}$ downlink UEs with the largest channel gains to populate the sets $\kappa_{um}$ and $\kappa_{dm}$, respectively.
• For the lth uplink UE and the kth downlink UE, we populate the sets $\mathcal{M}^u_l$ and $\mathcal{M}^d_k$, respectively, using the relations $l \in \kappa_{um} \Leftrightarrow m \in \mathcal{M}^u_l$ and $k \in \kappa_{dm} \Leftrightarrow m \in \mathcal{M}^d_k$.
• If an uplink or downlink UE is found with no serving AP, we use the procedure in Algorithm 1 to assign it the AP with the best channel conditions, while satisfying (8).
Algorithm 1: Fair AP selection for disconnected uplink and downlink UEs
for k = 1 to $K_d$ do
  if $\mathcal{M}^d_k = \emptyset$ then
    Sort the APs in descending order of channel gains, $\beta^d_{mk}$, and find the AP n with the largest channel gain.
    For this nth AP, sort the downlink UEs in $\kappa_{dn}$ in descending order of channel gains and find the qth downlink UE with minimum channel gain and at least one more connected AP.
    Remove the qth downlink UE from the set $\kappa_{dn}$ and add the kth downlink UE to it.
Repeat the same procedure for all the uplink UEs l = 1 to $K_u$.
Clearly, Lemma 1 ensures that each AP can only serve a limited number of UEs which do not violate the fronthaul capacity constraints. This makes the system effectively a user-centric system. Algorithm 1 ensures that, under limited fronthaul constraints, the strongest AP-UE connections are retained and the UE-centric cell-free system delivers good performance.
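The AP selection procedure and the repair step of Algorithm 1 can be sketched as follows for one link direction. This is an illustrative sketch, not the authors' implementation: the function name `select_aps`, the boolean serving matrix, and the fallback to the next-best AP when the best AP has no UE that can be swapped out are assumptions added here. The same routine would be run separately on the uplink and downlink gain matrices.

```python
import numpy as np

def select_aps(beta, k_max):
    """Return a boolean (n_aps, n_ues) matrix: serve[m, k] is True iff AP m serves UE k.

    beta  : (n_aps, n_ues) array of large scale channel gains
    k_max : maximum number of UEs each AP may serve (from the fronthaul cap)
    """
    beta = np.asarray(beta, dtype=float)
    n_aps, n_ues = beta.shape
    k_max = min(k_max, n_ues)
    serve = np.zeros((n_aps, n_ues), dtype=bool)

    # Step 1: every AP keeps its k_max strongest UEs.
    for m in range(n_aps):
        strongest = np.argsort(beta[m])[::-1][:k_max]
        serve[m, strongest] = True

    # Step 2 (repair): give each disconnected UE a slot at its strongest AP,
    # evicting the weakest UE there that still has another serving AP.
    for k in range(n_ues):
        if serve[:, k].any():
            continue
        for n in np.argsort(beta[:, k])[::-1]:   # assumption: fall back to next-best AP if needed
            swappable = [q for q in np.flatnonzero(serve[n]) if serve[:, q].sum() > 1]
            if swappable:
                q = min(swappable, key=lambda ue: beta[n, ue])
                serve[n, q] = False
                serve[n, k] = True
                break
    return serve

# Hypothetical downlink gains for 2 APs and 3 UEs; each AP may serve 2 UEs.
beta_dl = np.array([[0.9, 0.5, 0.1],
                    [0.8, 0.4, 0.2]])
serving = select_aps(beta_dl, k_max=2)
```

In this toy example both APs initially pick UEs 0 and 1, leaving UE 2 disconnected; the repair step swaps UE 1 out of AP 1, so every UE ends up with at least one serving AP while each AP still respects its cap.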

E. Self-interference mitigation methods
To ensure that our proposed FD CF mMIMO system has a substantial performance improvement over an equivalent HD CF mMIMO system, we need effective techniques to cancel the self-interference (SI) caused by inter-AP transmissions. We show in Eq. (5)-(6) that this SI cancellation results in a residual interference (RI), which is scaled by a suppression factor, $\gamma_{RI}$. We now discuss SI cancellation techniques from the existing literature which make the SI suppression easier by not requiring instantaneous SI channel knowledge.
• Passive cancellation: References [30], [31] suggest that a careful utilization of the passive self-interference suppression mechanisms (directional isolation, absorptive shielding, and cross polarization) can significantly suppress the SI. Reference [31] also showed that by additionally assuming statistical SI channel knowledge and by using antenna arrays of sources/destinations, the passive cancellation techniques can further suppress the SI.
• Large antenna array: Reference [32] argued that with large N, the channel vectors of the desired signal and the SI become nearly orthogonal. Beamforming techniques, e.g., MRC/MRT, inherently project the desired signal onto the orthogonal complement space of the SI, which significantly reduces the SI.
• Lower transmit power: Reference [32] also demonstrated that an alternative way to reduce interference is to reduce the transmit power, since the SI depends strongly on the transmit power. A cell-free massive MIMO system, due to its large number of transmit antennas, uses radically less transmit power per antenna than conventional MIMO systems, which significantly reduces the SI.
We therefore, similar to the existing massive MIMO FD literature [31]-[33], assume that the SI can be significantly mitigated by utilizing the above-mentioned SI cancellation techniques, without requiring knowledge of the SI channel. However, if required, the residual SI can be further reduced by employing the active (time-domain and spatial suppression) techniques developed in [34], which require SI channel knowledge.
• Active cancellation: The authors in [34] present an algorithm for SI channel estimation at the relay, which is equipped with a large number of antennas. It is also noted that the APs, which are infrastructure devices, are in a stationary environment, so the SI channel changes much more slowly than the channel from the users to the APs. It is therefore reasonable to assume that i) the SI channel remains constant for multiple consecutive blocks; and ii) the inter-AP pilot overhead is affordable because of the sufficiently longer coherence time of the residual SI channels. Similar to [34], one can estimate the SI channel by utilizing its slowly-varying nature using a cost-efficient expectation-maximization algorithm with reduced complexity.

III. ACHIEVABLE SPECTRAL EFFICIENCY
We now derive the ergodic SE for the kth downlink UE and the lth uplink UE, denoted respectively as $\bar{S}^d_k$ and $\bar{S}^u_l$. The AP employs MRC/MRT in the uplink/downlink and optimal uniform fronthaul quantization. We use $\varepsilon \triangleq \{d, u\}$ to denote downlink and uplink, respectively; $\phi \triangleq \{k, l\}$ to denote the kth downlink UE and the lth uplink UE, respectively; and $\upsilon^{\varepsilon}_{m\phi} \triangleq \{\eta_{mk} \text{ for } \phi = k,\ \theta_l \text{ for } \phi = l\}$. The ergodic SE expressions in (10) are calculated using (5) and (6), in terms of the signal, noise and interference powers for the kth downlink and lth uplink UEs. The expectation outside the logarithm in the SE expressions in (10) is mathematically intractable, and it is difficult to simplify them further [3], [12], [15]. We, similar to [3], employ the use-and-then-forget (UatF) technique to derive SE lower bounds. To use UatF, we rewrite the received signal at the kth downlink UE in (5), and at the CPU for the lth uplink UE in (6), as in (11), where the effective additive noise terms $n^{\varepsilon}_{\phi}$ are expressed in (12)-(13). The term $\mathrm{DS}^{\varepsilon}_{\phi}$ in (11) denotes the desired signal received over the channel mean, and the term $\mathrm{BU}^{\varepsilon}_{\phi}$ in (12)-(13) denotes the beamforming uncertainty, i.e., the signal received over the deviation of the channel from its mean. It is easy to see that the terms $n^{\varepsilon}_{\phi}$ are uncorrelated with their respective $\mathrm{DS}^{\varepsilon}_{\phi}$ terms. We, similar to [12], treat them as worst-case additive Gaussian noise, an approximation which is tight for mMIMO systems [12]. Using (11)-(12), we next derive an achievable SE lower bound.
Theorem 1. An achievable lower bound to the SE for the kth downlink UE with MRT and the lth uplink UE with MRC can be expressed respectively as in (14) and (15), in terms of the variables on which the SE depends. We recall from Section II that $\tilde{a}$ and $\tilde{b}$ in (14)-(15) depend on the number of quantization bits, $\nu$.
Proof. Refer to Appendix B.
The SE expressions are functions of large scale fading coefficients, $\gamma^d_{mk}$ and $\gamma^u_{ml}$, which we will use to optimize the WSEE. This is unlike [21], which requires instantaneous channel knowledge while optimizing the SE-GEE metric.
Remark 1. MRC/MRT has tractable SE expressions that depend solely on large-scale channel statistics, which remain constant over hundreds of coherence intervals [28]. This is in contrast to zero-forcing designs, which yield better SE but do not admit tractable SE expressions [2]. Further, MRC/MRT can be implemented in a distributed fashion with low complexity.

IV. TWO-LAYER DECENTRALIZED WSEE OPTIMIZATION FOR FD CF MMIMO
We now devise a decentralized algorithm which maximizes the WSEE by calculating the optimal downlink and uplink power control coefficients $\boldsymbol{\eta}^*$ and $\boldsymbol{\Theta}^*$, respectively. We use a "two-layered" approach to decompose the WSEE maximization into a sequential process with two distinct individual steps, each of which is called a "layer". The first layer simplifies the non-convex WSEE maximization into a successive convex approximation (SCA) setting. Its output is a generalized convex program (GCP) which needs to be solved iteratively for the optimal solution. The second layer optimally solves the above GCP, either centrally through standard interior-point approaches or decentrally using ADMM. The proposed procedure is outlined in Algorithm 2.

The EE of each UE is the ratio of its achievable rate, $B S^{\varepsilon}_{\phi}$, to its power consumption, $p^{\varepsilon}_{\phi}$ [23], where $B$ is the system bandwidth and $p^{\varepsilon}_{\phi}$ denotes the power consumed by each UE. The fronthaul links consume power for both downlink and uplink transmission. The APs consume power while transmitting data to the downlink UEs, and the uplink UEs consume power while transmitting their data. The power consumed by the system to transmit data to the kth downlink UE, $p^d_k$, and the power consumed by the lth uplink UE, $p^u_l$, are modeled as in [19], [21]. Here $\alpha_m$, $\alpha_l$ are the power amplifier efficiencies at the mth AP and the lth uplink UE, respectively [12], $N_0$ is the noise power, and $P^d_{tc,k}$, $P^u_{tc,l}$ are the powers required to run the transceiver chains at each antenna of the kth downlink UE and the lth uplink UE, respectively. The power consumed by the AP transceiver chains and the fronthaul between the APs and the CPU is modeled in terms of the following quantities. $P_{tc,m}$ is the power required to run the transceiver chains at each antenna of the mth AP. The fronthaul power consumption for the mth AP has a fixed component, $P_{0,m}$, and a traffic-dependent component, which attains a maximum value of $P_{ft}$ at full capacity $C_{fh,m}$. The term $R_{fh,m}$, given in (7), is the fronthaul data rate of the mth AP. The WSEE is now defined as the weighted sum of the EEs of the individual UEs [22], where $w^{\varepsilon}_{\phi}$ are weights assigned to the UEs to account for their heterogeneous EE requirements. The WSEE metric can prioritize the EE requirements of individual UEs by assigning them different weights [23], [24]. For example, it could assign a higher weight to a UE that is more energy-scarce.

After omitting the constant $B$ from the objective, the WSEE maximization problem can now be formulated as problem P1. The quality-of-service (QoS) constraints in (19a) guarantee a minimum SE, denoted by the constants $S^d_{ok}$ and $S^u_{ol}$, for each downlink and uplink UE, respectively. The first constraint in (19b) ensures that the fronthaul transmission rate for all APs is within the capacity limit. We observe that the number of quantization bits $\nu$, if included in problem P1, will make it a difficult-to-solve integer optimization problem [15], [19], [35]. We therefore optimize only the power control coefficients $\{\boldsymbol{\eta}, \boldsymbol{\Theta}\}$, fixing $\nu$ such that it satisfies the first constraint in (19b) [15], [19], and numerically investigate $\nu$ in Section V. We reformulate P1 as problem P2, whose constraints include (2) and (3).
The objective in P2 is a sum of ratios, each of which is a PC function (concave-over-linear) of the power control coefficients $\{\eta, \Theta\}$. The sum itself is, however, not guaranteed to be a PC function, and Dinkelbach's algorithm cannot be applied to maximize it [22]. This makes the WSEE a much harder objective to optimize than the more commonly studied GEE metric, which is a PC function [22] and has been investigated for CF mMIMO systems [18]-[21]. We now maximize the WSEE centrally and decentrally using a two-layered approach. The first layer comprises an SCA framework, which formulates a GCP by approximating the non-convex objective and constraints in P2 as convex. In the second layer, the approximate GCP formed in the nth SCA iteration is solved either centrally or decentrally using ADMM.
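The difference between the two metrics can be illustrated with a minimal numerical sketch. The function names and the two-UE numbers below are hypothetical and are not taken from the system model; the bandwidth $B$ is omitted, as in the objective above. The WSEE is a weighted sum of per-UE ratios, whereas the GEE is a single ratio of total SE to total power.

```python
from typing import Sequence


def wsee(weights: Sequence[float], se: Sequence[float], power: Sequence[float]) -> float:
    """Weighted sum of per-UE energy efficiencies (bandwidth B omitted)."""
    return sum(w * s / p for w, s, p in zip(weights, se, power))


def gee(se: Sequence[float], power: Sequence[float]) -> float:
    """Global energy efficiency: total SE over total power (a single ratio)."""
    return sum(se) / sum(power)


# Hypothetical two-UE snapshot: SE in bits/s/Hz, consumed power in watts.
se = [2.0, 1.0]
power = [1.0, 2.0]

wsee_equal = wsee([0.5, 0.5], se, power)   # both UEs weighted equally
wsee_skewed = wsee([0.1, 0.9], se, power)  # second UE prioritized
gee_value = gee(se, power)

print(wsee_equal, wsee_skewed, gee_value)
```

Changing the weights changes the objective value, and hence the operating point an optimizer would favour, while the GEE has no per-UE handle of this kind.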
Since the approximate GCP obtained in the first layer is, due to coupled optimization variables, not in the standard ADMM form, we introduce local and global versions of these variables. The subproblems to update the local variables are solved independently, and the local variables are coordinated to calculate the global solution [27], [36]. The variable updates and the coordination continue until ADMM converges. The obtained solution is then used to formulate the GCP for the (n + 1)th SCA iteration.
We next provide a centralized SCA procedure to solve P6 in the second layer in Algorithm 3.

Algorithm 3: Centralized WSEE maximization algorithm
1) Solve P6 for the nth SCA iteration to obtain the optimal variables.
2) Assign the SCA iterates for the (n + 1)th iteration.

The SCA procedure converges when $r_{\mathrm{SCA}} \le \epsilon_{\mathrm{SCA}}$, where $\epsilon_{\mathrm{SCA}}$ is the convergence threshold.

Remark 2. Convergence of centralized algorithm: At the nth SCA iteration, P6 is obtained from P5 by applying first-order Taylor approximations to the constraints (22a) and (23a)-(23b). These approximations are of the form Λ(x) . It is easy to show that P6 is the inner-approximation problem for P5, where we replace each of the constraints (22a) and (23a)-(23b), denoted here as $g_i(x) \le 0$, $i = 1, 2, 3$, with a convex approximation of the form $\bar{g}_i(x, x^{(n)}) \le 0$, $i = 1, 2, 3$. For each of the approximations, it can be easily shown that the following properties hold [37]: i) $g_i(x) \le \bar{g}_i(x, x^{(n)})$ for all feasible $x$; ii) $g_i(x^{(n)}) = \bar{g}_i(x^{(n)}, x^{(n)})$; and iii) $\frac{\partial g_i(x^{(n)})}{\partial x_j} = \frac{\partial \bar{g}_i(x^{(n)}, x^{(n)})}{\partial x_j}$, $j = 1, 2$. The constraints in P6 also satisfy Slater's condition [35].
This implies that Algorithm 3, by solving the inner-approximation problem, always converges to a KKT point of P2 due to [37]. It must be noted here that even though Algorithm 3 solves the approximate problem P6 in each SCA iteration, it is provably optimal after a sufficient number of iterations. This is due to the fact that it provably converges to a KKT point of P2, which is an optimal solution [35].
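The three surrogate properties in Remark 2 can be seen at work on a toy problem. The sketch below is not P5 or P6: it minimizes the hypothetical scalar function $f(x) = x^4 - 2x^2$ for $x > 0$, which is a difference of two convex functions. Linearizing the subtracted term around the current iterate gives a convex surrogate that upper-bounds $f$ and matches its value and slope at the iterate, so the iterates move towards a stationary point of $f$. The function name, the threshold `eps_sca` and the starting point are illustrative assumptions.

```python
from typing import Tuple


def sca_minimize(x0: float, eps_sca: float = 1e-9, max_iter: int = 100) -> Tuple[float, int]:
    """Toy SCA: minimize f(x) = x**4 - 2*x**2 over x > 0.

    f = u - v with u(x) = x**4 and v(x) = 2*x**2, both convex.
    Linearizing v around the iterate x_n gives the convex surrogate
        f_bar(x, x_n) = x**4 - (2*x_n**2 + 4*x_n*(x - x_n)),
    which upper-bounds f and matches its value and slope at x_n.
    """
    x = x0
    for n in range(1, max_iter + 1):
        # Closed-form surrogate minimizer: d/dx f_bar = 4*x**3 - 4*x_n = 0.
        x_next = x ** (1.0 / 3.0)
        # Stop when successive iterates differ by at most the threshold.
        if abs(x_next - x) <= eps_sca:
            return x_next, n
        x = x_next
    return x, max_iter


x_star, n_iter = sca_minimize(0.5)
gradient = 4.0 * x_star ** 3 - 4.0 * x_star  # f'(x) at the returned point
print(x_star, n_iter, gradient)
```

The returned point satisfies $f'(x) \approx 0$, i.e., it is a stationary point of the original (non-convex) toy objective even though only convex surrogates were solved.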

B. Decentralized ADMM approach
We now use ADMM to decentrally solve P6 in the second layer, an approach well-suited for CPUs with multiple distributed D-servers connected via a central C-server [25], [26]. The ADMM method decomposes a central problem into multiple sub-problems, each of which is solved by a D-server locally and independently. The C-server combines the local solutions to obtain a global solution. We observe that the constraints in (24a)-(24b) couple the power control coefficients of different uplink and downlink UEs. We therefore introduce global variables for the power control coefficients at the C-server, with local copies at the D-servers, to decouple P6 into sub-problems for each UE. We observe that the constraints in P6 for the downlink and uplink UEs can be divided between downlink and uplink D-servers, respectively. The D-servers solve sub-problems defined for each downlink and uplink UE. We first define the local feasible sets for them at the nth SCA iteration, which are denoted as $\mathcal{S}^{d,(n)}_{k}$ and $\mathcal{S}^{u,(n)}_{l}$, respectively. These sets are defined by the constraints in (25).

Here $C^{d}_{k}, C^{u}_{l} \in \mathbb{C}^{M \times K_d}$ and $\Theta^{d}_{k}, \Theta^{u}_{l} \in \mathbb{C}^{K_u \times 1}$ are local copies at the D-servers of the corresponding global variables at the C-server, which are denoted as $C \in \mathbb{C}^{M \times K_d}$ and $\Theta \in \mathbb{C}^{K_u \times 1}$, respectively, and represent the downlink and uplink power control coefficients, $C$ and $\Theta$, in P6. We note that each D-server has its own local power control variables, and hence the constraints in (25), which are all convex, are independent for each D-server. This ensures that the sets $\mathcal{S}^{d,(n)}_{k}$ and $\mathcal{S}^{u,(n)}_{l}$ are convex. We define the sets of local variables for the D-servers corresponding to the downlink and uplink UEs as $\Omega^{d}_{k}$ and $\Omega^{u}_{l} \triangleq [C^{u}_{l}, \Theta^{u}_{l}, f^{u}_{l}, \Psi^{u}_{l}, \lambda^{u}_{l}, \zeta^{u}_{l}]$, respectively. We now reformulate P6 as P7. To ensure that the global variables at the C-server have identical local copies maintained at the D-servers, we introduce the consensus constraints (26b)-(26c).
The ADMM algorithm can now be readily applied to P7 as it is in the global consensus form [36].
We use $\varepsilon \in \{d, u\}$ to denote the downlink and uplink, respectively, and $\phi \in \{k, l\}$ to denote the kth downlink UE and the lth uplink UE, respectively. The sub-problems of the individual D-servers can now be written as P7b. We define auxiliary functions for the objective in P7b in (28). We write, using (28), the augmented Lagrangian function for P7 as in (29), where $\rho_C, \rho_\theta > 0$ are the penalty parameters corresponding to the global variables $C$ and $\Theta$, respectively, and $\chi^{\varepsilon}_{\phi} \in \mathbb{C}^{M \times K_d}$, $\xi^{\varepsilon}_{\phi} \in \mathbb{C}^{K_u \times 1}$ are the Lagrangian variables associated with the equality constraints (26b) and (26c), respectively. The quadratic penalty terms are added to the objective to penalise violations of the equality constraints, and to enable ADMM to converge without requiring finiteness and strict convexity of the objective [36].
We note that the augmented Lagrangian in (29) is not decomposable in general for the problem formulation in P7b [35]. The auxiliary functions defined in (28) enable us to decompose it and formulate sub-problems for the D-servers. In the ADMM method, the D-servers independently solve the subproblems and update the local variables, which are collected by the C-server to update the global variables [36]. In the (p + 1)th iteration, the following steps are executed in succession.
1) Local computation: The D-servers for each UE solve P8 to update the local variables.
2) Lagrangian multipliers update: The D-servers now update the Lagrangian multipliers.
3) Global aggregation and computation: The C-server now collects the updated local variables and Lagrangian multipliers from the D-servers and updates the global variables $\{C, \Theta\}$. Using (29) and maximizing w.r.t. each global variable, we obtain a closed-form solution. The updated global variables in (33)-(34) are broadcast by the C-server to all the D-servers.
4) Residue calculation and penalty parameter updates: The C-server calculates the squared magnitudes of the primal and dual residuals, denoted as $r^{(p+1)}_{\mathrm{ADMM}}$ and $s^{(p+1)}_{\mathrm{ADMM}}$, respectively, as in [36]. The C-server now compares the primal and dual residual norms obtained in (35)-(36). To accelerate convergence, it updates the penalty parameters $\rho_C$ and $\rho_\theta$ for the (p + 1)th ADMM iteration as in [38]. The parameters $\mu > 1$, $\vartheta_{\mathrm{incr}} > 1$ and $\vartheta_{\mathrm{decr}} > 1$ are tuned to obtain good convergence [38].
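The interplay of the four steps can be sketched on a toy global-consensus problem. The code below is not P7 or P8: it is a minimal sketch that assumes scalar local objectives $\frac{1}{2}(x_i - a_i)^2$, is written as a minimization, and follows the common ordering in which the global variable is updated before the multipliers. The names `consensus_admm`, `tau` (playing the role of a single penalty update parameter), and the data vector `a` are hypothetical. The residual-balancing rule increases the penalty when the primal residual dominates and decreases it when the dual residual dominates.

```python
import math
from typing import List, Tuple


def consensus_admm(a: List[float], rho: float = 0.1, mu: float = 10.0,
                   tau: float = 1.2, eps: float = 1e-6,
                   max_iter: int = 500) -> Tuple[float, int]:
    """Global-consensus ADMM for min sum_i 0.5*(x_i - a_i)**2 s.t. x_i = z."""
    n = len(a)
    z = 0.0              # global variable (kept by the coordinating server)
    y = [0.0] * n        # one multiplier per consensus constraint x_i = z
    for it in range(1, max_iter + 1):
        # Local computation: closed-form minimizer of each local augmented Lagrangian.
        x = [(a_i - y_i + rho * z) / (1.0 + rho) for a_i, y_i in zip(a, y)]
        # Global aggregation: average the local copies and scaled multipliers.
        z_old = z
        z = sum(x_i + y_i / rho for x_i, y_i in zip(x, y)) / n
        # Multiplier update for every consensus constraint.
        y = [y_i + rho * (x_i - z) for x_i, y_i in zip(x, y)]
        # Primal and dual residual norms.
        r = math.sqrt(sum((x_i - z) ** 2 for x_i in x))
        s = rho * math.sqrt(n) * abs(z - z_old)
        if r <= eps and s <= eps:
            return z, it
        # Residual balancing: keep the two residuals within a factor mu of each other.
        if r > mu * s:
            rho *= tau
        elif s > mu * r:
            rho /= tau
    return z, max_iter


z_star, iterations = consensus_admm([1.0, 2.0, 6.0])
print(z_star, iterations)
```

For this toy problem the consensus value is the average of the entries of `a`, which makes the result easy to check by hand.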
to converge if both SCA and ADMM converge. It must be noted here that Algorithm 4, despite solving an approximate problem P7 in each ADMM iteration, indeed converges to an optimal solution of the original problem P2. This is explained as follows. For a given SCA iteration, the convergence of ADMM is guaranteed and investigated in detail in [36]. Hence, every SCA iteration converges to an optimal solution of the approximate problem P6. As discussed in Remark 2, the SCA iterative procedure provably converges to a KKT point of P2 which is an optimal solution [35].
Remark 4. Implementability: The maximal ratio combiner/beamformer considered herein is the simplest receiver/transmitter for a distributed cell-free mMIMO system [2]. Further, the power optimization algorithms require only the long-term fading channel coefficients, which remain constant for hundreds of coherence intervals [28]. This is in contrast to the existing work on SE-GEE maximization of FD cell-free massive MIMO systems in [21], which requires instantaneous channel knowledge. The current optimization problem, whose reduced complexity is discussed below, therefore needs to be solved only over a relaxed time frame, which makes it easily implementable.

C. Computational complexity of centralized and decentralized algorithms
Before beginning this study, it is worth noting that both the centralized Algorithm 3 and the decentralized Algorithm 4 comprise multiple steps that involve evaluating simple closed-form expressions. These steps consume much less time than the ones which solve a GCP, typically using interior-point methods [35]. We therefore compare the per-iteration complexity of the centralized and decentralized algorithms by calculating the complexity of solving the respective GCPs.
• Algorithm 3 solves P6 in step-1 of each SCA iteration, which has $4(K_u + K_d) + K_u + MK_d$ real variables and $6(K_u + K_d) + M + MK_d$ linear constraints. It has a worst-case computational complexity O((10( .
• Algorithm 4, in step-2 of each ADMM iteration, solves P8 at the D-servers in parallel to update the local variables. We, therefore, need to analyse the computational complexity at any one of the D-servers. Since the downlink has an additional constraint (the second one in (25d)), we consider a downlink D-server for the worst-case complexity analysis, which in P8 has $MK_d + K_u + 4$ real variables and $MK_d + M + K_u + 6$ linear constraints. Its worst-case computational complexity is obtained as in [39].
We consider $K_d = K_u = K/2$ uplink and downlink UEs for this analysis. We observe that for a large $K$, Algorithm 4 has a much lower computational complexity than Algorithm 3.
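The problem sizes quoted in the two bullets can be tabulated with a short sketch. The variable and constraint counts are taken from the text; the cost model `ipm_cost` is an assumption, namely the frequently quoted interior-point bound $(n_{\mathrm{var}} + n_{\mathrm{con}})^{1.5} n_{\mathrm{var}}^{2}$, and may differ from the expression used in [39]. The function names and the chosen values of $M$ and $K$ are illustrative.

```python
from typing import Tuple


def size_centralized(m: int, k_d: int, k_u: int) -> Tuple[int, int]:
    """(real variables, linear constraints) of P6, as counted in the text."""
    n_var = 4 * (k_u + k_d) + k_u + m * k_d
    n_con = 6 * (k_u + k_d) + m + m * k_d
    return n_var, n_con


def size_decentralized(m: int, k_d: int, k_u: int) -> Tuple[int, int]:
    """(real variables, linear constraints) of P8 at one downlink D-server."""
    n_var = m * k_d + k_u + 4
    n_con = m * k_d + m + k_u + 6
    return n_var, n_con


def ipm_cost(n_var: int, n_con: int) -> float:
    """ASSUMED interior-point cost model: (n_var + n_con)**1.5 * n_var**2."""
    return (n_var + n_con) ** 1.5 * n_var ** 2


m = 32
for k in (20, 40, 80):
    k_d = k_u = k // 2
    central = ipm_cost(*size_centralized(m, k_d, k_u))
    local = ipm_cost(*size_decentralized(m, k_d, k_u))
    print(f"K = {k:3d}: centralized / per-D-server cost ratio = {central / local:.2f}")
```

The ratio compares one centralized solve against the solve at a single D-server, since the D-servers work in parallel.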

V. SIMULATION RESULTS
We now numerically investigate the SE and WSEE of a FD CF mMIMO system with limited-capacity fronthaul links. We assume a realistic system model wherein the $M$ APs, $K_d$ downlink UEs and $K_u$ uplink UEs are all scattered randomly in a square of size $D$ km × $D$ km. To avoid the boundary effects [3], we wrap the APs and UEs around the edges [12]. We use $\varepsilon \in \{d, u\}$ to denote the downlink and uplink, respectively, and $\phi \in \{k, l\}$ to denote the kth downlink UE and the lth uplink UE, respectively. The large-scale fading coefficients, $\beta^{\varepsilon}_{m\phi}$, are modeled as [18]

$\beta^{\varepsilon}_{m\phi} = 10^{\frac{\mathrm{PL}^{\varepsilon}_{m\phi}}{10}} \, 10^{\frac{\sigma_{\mathrm{sd}} z^{\varepsilon}_{m\phi}}{10}}$. (39)

Here $10^{\frac{\sigma_{\mathrm{sd}} z^{\varepsilon}_{m\phi}}{10}}$ is the log-normal shadowing factor with a standard deviation $\sigma_{\mathrm{sd}}$ (in dB), and $z^{\varepsilon}_{m\phi}$ follows a two-component correlated model [3]. The path loss $\mathrm{PL}^{\varepsilon}_{m\phi}$ (in dB) follows a three-slope model [3], [12].
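A minimal sketch of (39), together with the wrap-around distance and a three-slope path loss, is given below. The breakpoints `D0_KM`, `D1_KM` and the reference loss `L_DB` are assumed values of the kind commonly used with this model; they are not read from Table I. The shadowing term `z` is passed in directly instead of being generated from the two-component correlated model, and all function names are hypothetical.

```python
import math
from typing import Tuple

# ASSUMED three-slope parameters (illustrative, not taken from Table I).
L_DB = 140.7     # reference loss in dB
D0_KM = 0.01     # first breakpoint in km
D1_KM = 0.05     # second breakpoint in km


def wrapped_distance_km(p: Tuple[float, float], q: Tuple[float, float], side_km: float) -> float:
    """Distance between two points on a square of side `side_km` with wrap-around edges."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, side_km - dx)
    dy = min(dy, side_km - dy)
    return math.hypot(dx, dy)


def path_loss_db(d_km: float) -> float:
    """Three-slope path loss in dB (negative values mean attenuation)."""
    if d_km > D1_KM:
        return -L_DB - 35.0 * math.log10(d_km)
    if d_km > D0_KM:
        return -L_DB - 15.0 * math.log10(D1_KM) - 20.0 * math.log10(d_km)
    return -L_DB - 15.0 * math.log10(D1_KM) - 20.0 * math.log10(D0_KM)


def large_scale_fading(d_km: float, sigma_sd_db: float, z: float) -> float:
    """beta = 10**(PL/10) * 10**(sigma_sd * z / 10), following the structure of (39)."""
    return 10.0 ** (path_loss_db(d_km) / 10.0) * 10.0 ** (sigma_sd_db * z / 10.0)


# An AP and a UE near opposite edges of a 1 km x 1 km area are close after wrapping.
d = wrapped_distance_km((0.05, 0.5), (0.95, 0.5), 1.0)
beta = large_scale_fading(d, sigma_sd_db=8.0, z=0.0)
print(d, 10.0 * math.log10(beta))
```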
We, similar to [12], model the large-scale fading coefficients for the inter-AP RI channels, i.e., $\beta_{RI,mi}$, $\forall i \neq m$, as in (39), and assume that the large-scale fading coefficients for the intra-AP RI channels, which do not experience shadowing, are modeled as $\beta_{RI,mm} = 10^{\frac{\mathrm{PL}_{RI}\,(\mathrm{dB})}{10}}$. The inter-UE large-scale fading coefficients, $\tilde{\beta}_{kl}$, are also modeled similar to (39). We consider, for brevity, the same number of quantization bits $\nu$ and the same fronthaul capacity $C_{\mathrm{fh}}$ for all links. We, henceforth, denote the transmit powers on the downlink and uplink as $p_d$ $(= \rho_d N_0)$ and $p_u$ $(= \rho_u N_0)$, respectively, and the pilot transmit power as $p_t$ $(= \rho_t N_0)$. We fix the system model values and power consumption model parameters, unless mentioned otherwise, as given in Table I. These values are commonly used in the literature, e.g., [3], [12], [15].
Validation of SE expressions: We consider an FD CF mMIMO system with i) $M = \{16, 32\}$ APs, each having $N_t = N_r = 8$ transmit and receive antennas, $K_d = 12$ downlink UEs and $K_u = 8$ uplink UEs; and ii) unequal uplink and downlink transmit power, i.e., $p_d = 2p_u = p$. We verify in Fig. 2 the tightness of the SE lower bound derived in (14)-(15), labeled as LB, by comparing it with the numerically obtained ergodic SE in (10), labeled as upper bound (UB) as it requires instantaneous CSI. The large-scale fading coefficients are set according to a practical FD CF channel model with parameters specified in Table I. We, similar to [3], [18], allocate equal power to all downlink UEs and full power to all uplink UEs, i.e., $\eta_{mk} = \big(\tilde{b} N_t \sum_{k \in \kappa_{dm}} \gamma^{d}_{mk}\big)^{-1}$, $\forall k \in \kappa_{dm}$, and $\theta_l = 1$.
We see that the derived lower bound is tight for both values of $M$.

RI suppression factor $\gamma_{RI}$ and an equivalent HD system which serves uplink and downlink UEs in time-division duplex mode. For the HD system, we i) set $\gamma_{RI} = 0$ and the inter-UE channel gains $\tilde{\beta}_{kl} = 0$; ii) use all AP antennas, i.e., $N = (N_t + N_r)$, during uplink and downlink transmission; and iii) multiply the sum SE by a factor of 1/2. We see that the FD system has a significantly higher sum SE than an equivalent HD system, provided the RI suppression is good, i.e., $\gamma_{RI} \le -10$ dB. The gains in sum SE achieved by the FD transmissions vanish completely with poor RI suppression, i.e., $\gamma_{RI} > -10$ dB. Moreover, we note that, contrary to intuitive expectations, the sum SE does not double even with significant RI suppression, $\gamma_{RI} \le -40$ dB. This is due to the UDI experienced by the downlink UEs in a FD CF mMIMO system, as shown in Fig. 1, which cannot be mitigated by RI suppression at the APs.

Sum SE - variation with quantization bits: We plot in Fig. 3b the sum SE by varying the number of fronthaul quantization bits $\nu$. We consider $M = 32$ APs, $K_d = 12$ downlink UEs, $K_u = 8$ uplink UEs, $p_d = 2p_u = 30$ dBm power for downlink and uplink, $N_t = N_r = \{8, 16\}$ transmit and receive antennas on each AP, and fronthaul capacities $C_{\mathrm{fh}} = \{10, 100\}$ Mbps. We observe that for both antenna configurations, the sum SE initially increases with $\nu$ and then saturates. Increasing $\nu$ reduces the quantization distortion and attenuation, which improves the sum SE. This effect, however, saturates, because beyond a certain number of bits most of the information is already retrieved. We observe that reducing the fronthaul capacity from $C_{\mathrm{fh}} = 100$ Mbps to $C_{\mathrm{fh}} = 10$ Mbps reduces the sum SE only slightly, as the procedure outlined in Section II-D fairly retains the AP-UE links with the highest channel gains and helps maintain the sum SE.
Sum SE - impact of channel estimation error: We know that the channel estimation error is a function of the pilot transmit power $p_t$. We now vary $p_t$ and evaluate its impact on the sum SE of a FD CF mMIMO system in Fig. 3c. For this study, we consider $M = 32$ APs, $K_d = 12$ downlink UEs, $K_u = 8$ uplink UEs and transmit power $p_d = 2p_u = 30$ dBm. We see that the sum SE increases for $p_t \le -10$ dB but saturates beyond that. This is because the channel estimation error reduces with an increase in pilot power until $p_t = -10$ dB. Any further increase in $p_t$ only marginally reduces the channel estimation error, which does not affect the sum SE. Our choice of $p_t = 0.2$ W in the numerical studies is, therefore, practical.
WSEE metric -influence of weights: We now demonstrate that the WSEE metric can accommodate the heterogeneous EE requirements of both uplink and downlink UEs. For this study, we consider a particular realization of a FD CF mMIMO system with a transmit power p d = 2p u = 30 dBm, M = 32 APs, K d = K u = K/2 = 2 uplink and downlink UEs and N t = N r = N = 2 transmit and receive antennas on each AP, with QoS constraints S ok = S ol = 0.1 bits/s/Hz. We plot the individual EEs of the uplink (UL) and downlink (DL) UEs versus the SCA iteration index for centralized WSEE maximization, using Algorithm 3, for two different combinations of UE weights. Weights w 1 and w 2 are associated with DL UE 1 and DL UE 2, while weights w 3 and w 4 are associated with UL UE 1 and UL UE 2, respectively.
We plot in Fig. 4a and Fig. 4b the individual EEs of the UL and DL UEs, with: i) equal weights ($w_1 = w_2 = w_3 = w_4 = 0.25$), and ii) $w_1 = 0.08$, $w_2 = 0.02$, $w_3 = 0.5$, $w_4 = 0.4$, respectively. In Fig. 4a, with equal weights, the UEs attain an EE depending on their relative channel conditions, which clearly indicates that, in terms of channel conditions, DL UE 2 ≫ DL UE 1 > UL UE 2 > UL UE 1. In Fig. 4b, the weights are chosen in an order which is opposite to the channel conditions. The EEs of the UL UEs now dominate the EE of DL UE 1, while reversing their relative order. DL UE 2, with an excellent channel, still attains a high EE, although lower than in Fig. 4a.

Convergence of decentralized ADMM algorithm: We plot in Fig. 4c the WSEE obtained using the decentralized Algorithm 4 versus the SCA iteration index. We consider $M = 10$ APs, $K_u = K_d = K/2 = 2$ uplink and downlink UEs and $N_t = N_r = \{1, 2\}$ transmit and receive antennas on each AP at transmit power $p_d = 2p_u = p = 30$ dBm. We assume the following: i) penalty parameters $\rho_C = \rho_\theta = 0.1$; ii) penalty parameter update threshold factor $\mu = 10$; iii) ADMM convergence threshold $\epsilon_{\mathrm{ADMM}} = 0.01$; and iv) SCA convergence threshold $\epsilon_{\mathrm{SCA}} = 0.001$. We consider two values of the penalty update parameter: $\vartheta = \{1.2, 1.8\}$. We note that, for both antenna configurations, the algorithm converges marginally quicker with $\vartheta = 1.2$. A smaller penalty update parameter is therefore beneficial: the changes in the penalty parameters are then not too abrupt, and the algorithm does not overreact to a bad ADMM iteration that causes the primal and dual residues to diverge [38]. We therefore fix $\vartheta = 1.2$ for the rest of the simulations.
Comparison with existing schemes: We now compare our proposed FD CF mMIMO WSEE optimization strategy with some existing approaches. In particular, we compare the
• proposed fair AP selection algorithm, Algorithm 1, with the optimal AP selection scheme proposed in [29];
• maximum-ratio combining (MRC)/maximal ratio transmission (MRT) considered herein with zero-forcing reception (ZFR)/zero-forcing transmission (ZFT) [40].
We observe from Fig. 5a that the proposed fair AP selection approach performs almost as well as the optimal one in [29]. The proposed procedure efficiently eliminates the AP-UE links that do not have sufficient channel gain and thus contribute little to the system throughput while consuming a significant amount of power. Turning off APs according to the optimal AP selection procedure in [29] thus provides only a marginally better WSEE.
MRC/MRT and ZFR/ZFT comparison: For this study, we consider a FD CF mMIMO system with $M = 32$ multi-antenna APs having $N_t = N_r = 8$ transmit and receive antennas each, $K_d = 12$ downlink UEs and $K_u = 8$ uplink UEs. We consider two fronthaul cases: i) perfect high-capacity fronthaul with $\tilde{a} = \tilde{b} = 1$, and ii) limited capacity $C_{\mathrm{fh}} = 10$ Mbps with $\nu = 2$ quantization bits. We see from Fig. 5b that, for both fronthaul capacities, the MRC/MRT transceiver for the scenario considered herein, although slightly inferior at high transmit power, performs reasonably well when compared with the computationally intensive ZFR/ZFT transceiver.
WSEE variation with parameters: We now study the variation of the WSEE with important system parameters and obtain crucial insights into energy-efficient FD CF mMIMO system design. We consider $M = 32$ APs, $N_t = N_r = N = 8$ AP transmit and receive antennas, $K_d = 12$ downlink UEs, $K_u = 8$ uplink UEs and QoS constraints $S_{ok} = S_{ol} = 0.1$ bits/s/Hz, unless mentioned otherwise.
We plot in Fig. 6a the WSEE by simultaneously varying the downlink and uplink transmit power as $p_d = 2p_u = p$. We consider the centralized and decentralized optimal power allocation (OPA) approaches from Algorithm 3 and Algorithm 4, respectively. We compare them with three sub-optimal power allocation schemes: i) equal power allocation of type 1, labeled as "EPA 1", where $\eta_{mk} = \big(\tilde{b} N_t \sum_{k \in \kappa_{dm}} \gamma^{d}_{mk}\big)^{-1}$, $\forall k \in \kappa_{dm}$, and $\theta_l = 1$ [18], [19]; ii) equal power allocation of type 2, labeled as "EPA 2", where $\eta_{mk} = \big(\tilde{b} N_t K_{dm} \gamma^{d}_{mk}\big)^{-1}$, $\forall k \in \kappa_{dm}$, and $\theta_l = 1$ [18]; and iii) random power allocation, labeled as "RPA", where the power control coefficients are chosen randomly from a uniform distribution between 0 and the "EPA 1" value. We note that the existing literature has not yet optimized the WSEE metric for CF mMIMO systems, and hence we can only compare with the above sub-optimal schemes. Further, the decentralized ADMM approach, with its lower computational complexity, attains the same WSEE as the centralized one. Also, both the decentralized and centralized approaches far outperform the baseline schemes.
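The two equal-power baselines can be written out for a single AP as a short sketch. The function names and the example values of $\gamma^{d}_{mk}$ are hypothetical. The helper `normalised_load` evaluates $\tilde{b} N_t \sum_{k} \eta_{mk} \gamma^{d}_{mk}$, which equals one for both schemes; interpreting this quantity as the per-AP power budget is an assumption, since the constraint itself is stated elsewhere in the paper.

```python
from typing import List


def epa1(gamma_m: List[float], n_t: int, b_tilde: float = 1.0) -> List[float]:
    """EPA 1: one common coefficient for every UE served by AP m."""
    eta = 1.0 / (b_tilde * n_t * sum(gamma_m))
    return [eta for _ in gamma_m]


def epa2(gamma_m: List[float], n_t: int, b_tilde: float = 1.0) -> List[float]:
    """EPA 2: coefficient inversely proportional to each UE's own gamma."""
    k_dm = len(gamma_m)  # number of downlink UEs served by AP m
    return [1.0 / (b_tilde * n_t * k_dm * g) for g in gamma_m]


def normalised_load(eta_m: List[float], gamma_m: List[float], n_t: int,
                    b_tilde: float = 1.0) -> float:
    """b_tilde * N_t * sum_k eta_mk * gamma_mk for one AP."""
    return b_tilde * n_t * sum(e * g for e, g in zip(eta_m, gamma_m))


# Hypothetical channel-estimate variances of the three UEs served by one AP.
gamma_m = [0.5, 0.25, 0.25]
n_t = 8

eta_1 = epa1(gamma_m, n_t)
eta_2 = epa2(gamma_m, n_t)
print(eta_1, normalised_load(eta_1, gamma_m, n_t))
print(eta_2, normalised_load(eta_2, gamma_m, n_t))
```

Under EPA 1 the product $\eta_{mk}\gamma^{d}_{mk}$ differs across UEs, whereas EPA 2 equalizes it, which is the practical difference between the two baselines.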
We next characterize in Fig. 6b the joint variation of the WSEE and the sum SE with the number of quantization bits $\nu$ in the fronthaul links. The WSEE is obtained using the decentralized Algorithm 4. We consider transmit power $p_d = 2p_u = p = 30$ dBm and take two different cases: i) high fronthaul capacity, $C_{\mathrm{fh}} = 100$ Mbps, which is sufficiently high to support all the UEs, and ii) limited fronthaul capacity, $C_{\mathrm{fh}} = 10$ Mbps, which limits the number of UEs a single AP can serve. We observe that for $C_{\mathrm{fh}} = 100$ Mbps, the WSEE falls with an increase in $\nu$, even though the corresponding sum SE increases. For $C_{\mathrm{fh}} = 10$ Mbps, both the sum SE and the WSEE increase with $\nu$. To explain this behavior, we note from Fig. 3b that increasing $\nu$ improves the sum SE for both $C_{\mathrm{fh}} = 100$ Mbps and $C_{\mathrm{fh}} = 10$ Mbps. For $C_{\mathrm{fh}} = 100$ Mbps, the APs serve all the UEs, i.e., $K_{dm} = K_d$ and $K_{um} = K_u$, so increasing $\nu$ linearly increases the fronthaul data rate $R_{\mathrm{fh}}$ (see (7)). This, as seen from (18), increases the traffic-dependent fronthaul power consumption. Using a lower number (1-2) of quantization bits is therefore more energy-efficient, as it provides a sufficiently good SE with a low energy consumption. However, for $C_{\mathrm{fh}} = 10$ Mbps, $K_{um}$ and $K_{dm}$ have an upper limit, given by (9), which is inversely related to $\nu$. The product $\nu(K_{um} + K_{dm})$ remains nearly constant for all values of $\nu$. Thus, $R_{\mathrm{fh}}$ (see (7)) does not increase with $\nu$ and remains close to the capacity $C_{\mathrm{fh}}$. The traffic-dependent fronthaul power consumption, given in (18), hence remains close to $P_{\mathrm{ft}}$. A higher number of quantization bits (3-4) therefore provides a higher sum SE and hence also maximizes the WSEE.
Latency: The per-iteration complexity of the decentralized Algorithm 4, as observed earlier in Section IV-C, is lower than that of the centralized Algorithm 3. We now demonstrate the same by comparing their per-iteration runtime. For this simulation, as shown in Fig. 6c, we consider an FD CF mMIMO system with $M = 32$ APs, each having $N_t = N_r = 8$ transmit and receive antennas, and plot the average runtime of each iteration by varying the total number of UEs, $K$, with $K_d = K_u = K/2$. We note that the decentralized algorithm has a significantly lower per-iteration runtime, particularly for large $K$. Both these algorithms require only large-scale channel coefficients and hence need to be executed only once in hundreds of coherence intervals.

VI. CONCLUSION
We derived an SE lower bound for a FD CF mMIMO wireless system with optimal uniform fronthaul quantization. Using a two-layered approach, we optimized the WSEE using an SCA framework which, in each iteration, solves a GCP either centrally or decentrally using ADMM. We showed how the WSEE incorporates the EE requirements of different UEs. We analytically and numerically demonstrated the convergence of the decentralized algorithm. We showed that it achieves the same WSEE as the centralized approach with a much reduced computational complexity.

APPENDIX A
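We use the optimal uniform quantization model from [15], [19]. Using the Bussgang decomposition [41], the quantization function is decomposed into a signal component and a distortion component, where $\varsigma_d$ is the normalized distortion whose power is given as $E\{\varsigma_d^2\} = \tilde{b} - \tilde{a}^2$. Here $h(x)$ is the midrise uniform quantizer with $L = 2^{\nu}$ quantization levels rising in steps of size $\tilde{\Delta}$, with $\nu$ being the number of quantization bits. The signal-to-distortion ratio (SDR) is the ratio of the signal power $E\{(\tilde{a}x)^2\}$ to the distortion power. The optimal step size $\tilde{\Delta}_{\mathrm{opt}}$ maximizes the SDR for a given $\nu$. The optimal $\tilde{a}$ and $\tilde{b}$ values are calculated using the optimal $\tilde{\Delta}_{\mathrm{opt}}$ for each value of $\nu$, and are given in Table II [15].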
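The gains of a midrise uniform quantizer can be evaluated numerically with the following sketch. It assumes a zero-mean, unit-variance Gaussian input, for which $\tilde{a} = E\{x\,h(x)\}$ and $\tilde{b} = E\{h^2(x)\}$, and it takes the SDR as $\tilde{a}^2/(\tilde{b} - \tilde{a}^2)$; both are assumptions consistent with the distortion power stated above. The step sizes used in the example are the classical values for Gaussian inputs and are not read from Table II. The function name `bussgang_gains` is hypothetical.

```python
import math
from typing import Tuple


def _pdf(t: float) -> float:
    """Standard normal density; zero at +/- infinity."""
    return 0.0 if math.isinf(t) else math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def bussgang_gains(nu: int, step: float) -> Tuple[float, float, float]:
    """Return (a_tilde, b_tilde, SDR) of a midrise uniform quantizer with 2**nu levels."""
    levels = 2 ** nu
    a_tilde = 0.0
    b_tilde = 0.0
    for i in range(levels):
        # Decision cell [lo, hi); the two outer cells extend to infinity.
        lo = -math.inf if i == 0 else (i - levels / 2) * step
        hi = math.inf if i == levels - 1 else (i + 1 - levels / 2) * step
        y = (i - levels / 2 + 0.5) * step            # midrise output level
        a_tilde += y * (_pdf(lo) - _pdf(hi))         # contribution to E{x h(x)}
        b_tilde += y * y * (_cdf(hi) - _cdf(lo))     # contribution to E{h(x)^2}
    sdr = a_tilde ** 2 / (b_tilde - a_tilde ** 2)
    return a_tilde, b_tilde, sdr


# ASSUMED step sizes: classical values for 1-bit and 2-bit uniform quantization.
a1, b1, sdr1 = bussgang_gains(1, 1.596)
a2, b2, sdr2 = bussgang_gains(2, 0.9957)
print(a1, b1, sdr1)
print(a2, b2, sdr2)
```

A finer quantizer yields a larger SDR, which is the mechanism behind the improvement of the sum SE with $\nu$ reported in the simulations.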
We now calculate the beamforming uncertainty for the kth downlink UE as follows. Equality (a) is because i) $\hat{g}^{d}_{mk}$ are zero-mean and uncorrelated; and ii) $E\{\|\hat{g}^{d}_{mk}\|^4\} = N_t(N_t + 1)(\gamma^{d}_{mk})^2$ [12] and E{ e d .
We now simplify MUI for the kth downlink UE: Equality (a) is because: i) $\hat{g}^{d}_{mq}$ and $g^{d}_{mk}$ are mutually independent; and ii) E{|(g d mk ) We next calculate UDI for the kth downlink UE: $\rho_u \sum_{l=1}^{K_u} E\{|h_{kl}|^2\}\theta_l = \rho_u \sum_{l=1}^{K_u} \tilde{\beta}_{kl}\theta_l$.
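We express the total quantization distortion (TQD) for the kth downlink UE as follows. The result in (14) follows from the expression for the achievable SE lower bound.

We now derive the achievable SE expression for the lth uplink UE in (15). We know from Section II-B that $g^{u}_{ml} = \hat{g}^{u}_{ml} + e^{u}_{ml}$, where $\hat{g}^{u}_{ml}$ and $e^{u}_{ml}$ are independent and $E\{\|\hat{g}^{u}_{ml}\|^2\} = N_r\gamma^{u}_{ml}$. We can express the desired signal for the lth uplink UE as given next:

$E\{|\mathrm{DS}^{u}_{l}|^2\} = E\Big\{\Big|\tilde{a}\sum_{m \in \mathcal{M}^{u}_{l}} \sqrt{\rho_u}\, E\big\{\sqrt{\theta_l}\,(\hat{g}^{u}_{ml})^{H}(\hat{g}^{u}_{ml} + e^{u}_{ml})\big\}\, s^{u}_{l}\Big|^2\Big\}$.

The beamforming uncertainty for the lth uplink UE is

$E\{|\mathrm{BU}^{u}_{l}|^2\} = \tilde{a}^2\rho_u\theta_l \sum_{m \in \mathcal{M}^{u}_{l}} E\Big\{\big|(\hat{g}^{u}_{ml})^{H} g^{u}_{ml} - E\{(\hat{g}^{u}_{ml})^{H} g^{u}_{ml}\}\big|^2\Big\}$
$\overset{(a)}{=} \tilde{a}^2\rho_u\theta_l \sum_{m \in \mathcal{M}^{u}_{l}} \Big(E\{\|\hat{g}^{u}_{ml}\|^4\} + E\{|(\hat{g}^{u}_{ml})^{H} e^{u}_{ml}|^2\} - N_r^2(\gamma^{u}_{ml})^2\Big)$
$\overset{(b)}{=} \tilde{a}^2\rho_u N_r\theta_l \sum_{m \in \mathcal{M}^{u}_{l}} \gamma^{u}_{ml}\beta^{u}_{ml}$.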
We assume, similar to [19], that the quantization distortion is uncorrelated across the fronthaul links. The TQD power for the lth uplink UE is accordingly expressed as

Rohit Budhiraja received the M.S. degree in electrical engineering and the Ph.D. degree from IIT Madras in 2004 and 2015, respectively. From 2004 to 2011, he worked for two start-ups where he designed both hardware and software algorithms, from scratch, for physical layer processing of WiMAX- and LTE-based cellular systems. He is currently an Assistant Professor with IIT Kanpur, where he is also leading an effort to design a 5G research testbed. His current research interests include design of energy-efficient transceiver algorithms for 5G massive MIMO and full-duplex systems, robust precoder design for wireless relaying, machine learning methods for channel estimation in mm-wave systems, and spatial modulation system design. His paper was shortlisted as one of the finalists for the Best Student Paper Awards at the IEEE International Conference on Signal Processing and Communications, Bangalore, India, in 2014. He also received IIT Madras Research Award for the quality and quantity of research work done in the Ph.D., Early Career Research Award, and Teaching Excellence Certificate at IIT Kanpur.