Clustering and Beamforming for User-Centric Cell-Free Massive MIMO With Backhaul Capacity Limitation

This paper addresses the joint design of beamforming and user-centric clustering for scalable cell-free massive multiple-input multiple-output (CF-mMIMO) systems under severe quantization noise generated in the backhaul links used for cooperation among central processing units (CPUs). In the system model, multiple CPUs exchange physical-layer data over bandwidth-limited links to enhance performance, and clusters may span multiple CPUs. We formulate the joint optimization of the minimum mean-square error (MMSE) beamformer and user-centric clustering under quantization noise, and propose a low-complexity design. The advantage of the proposed design is demonstrated by comparing its performance with that of cellular distributed MIMO systems. In doing so, we answer, for the first time, the question of how much gain CF-mMIMO systems with multi-CPU cooperation can obtain.


I. INTRODUCTION
Technical investigations of sixth generation (6G) systems, also called Beyond the fifth generation (5G), which are expected to be commercialized around 2030, are analyzing key performance indicators (KPIs) [1], [2], [3], [4], [5]. According to published papers, 6G systems will have new indicators that do not appear in 5G systems and will improve each KPI of 5G systems. In 6G systems, the growth in demand for data rates per user equipment (UE) will exceed that of system peak data rates [6], [7], [8]. Although the number of antennas will increase to enhance the data rate per cell, as in massive multiple-input multiple-output (mMIMO), the performance will remain limited due to spatial correlation. In addition, from the fourth generation (4G) wireless communication systems to 5G systems, cells have been deployed more densely to handle the increasing required network capacity. However, with densely deployed cells, cell-edge UEs experience larger intercell interference, which results in worse performance than that of cell-center UEs and makes UE performance unfair. In multi-cell mMIMO systems, a sub-optimal pilot power allocation scheme has been investigated to improve the channel estimation performance using appropriate user groupings [9]. The performance of a simple least squares scheme can approach that of the MMSE scheme with small inter-/intra-cell interference and sufficient pilot power.
The associate editor coordinating the review of this manuscript and approving it for publication was Chen Chen.
To solve these problems, CF-mMIMO systems appeared in 2015 [10], [11], [12], [13]. CF-mMIMO systems eliminate the concept of cells, and a massive number of access points (APs) are distributed in a network coverage area. In [12], a CF-mMIMO system with a single CPU is introduced, as depicted in Fig. 1, in which all APs cooperate to communicate with all UEs to reduce interference and make UE performance fairer. Additionally, in CF-mMIMO systems, spatial correlation decreases, interference can be suppressed by spatial filtering based on the large-system limit, and performance is determined from a large-scale perspective. Furthermore, every AP connects via wired fronthaul links to the sole CPU, which processes all the signals. CF-mMIMO systems have also been investigated from various perspectives. For example, transmit power control methods have been proposed for CF-mMIMO systems, such as maximizing the minimum spectral efficiency (SE), maximizing the sum-SE, and maximizing the total energy efficiency (EE) [10], [14], [15], [16]. Specifically, [14] proposes an optimization algorithm to maximize the minimum signal-to-interference-plus-noise ratio (SINR). Ref. [15] considers a system with quantization noise on the fronthaul links between the APs and the sole CPU, and maximizes the minimum data rate by optimizing the receiver filter coefficients and power allocation. In addition, the performance of uplink CF-mMIMO systems with a single CPU, taking into account the quantization caused by capacity-limited fronthaul between APs and the CPU, is investigated in [17]. An analysis of three distinct signal processing schemes clarified that limited-fronthaul systems with a few quantization bits can achieve almost the same performance as ideal-fronthaul systems. Ref. [18] proposes an algorithm to jointly perform UE-AP association and decoding to reduce the fronthaul load. The proposed method in [18] achieves almost the same bit error rate performance as the conventional CF-mMIMO system, where all APs serve all UEs, while reducing fronthaul signaling. Moreover, a beamforming scheme and several power allocation strategies for CF-mMIMO systems where both APs and UEs are equipped with multiple antennas have been reported [19].
However, it is impractical for only one CPU to process all UE signals located in a wide area, e.g., a state or a country.Specifically, in such a situation, the length of optical fibers connecting the APs and the CPU can exceed the limit to guarantee fibers' communication quality, and a longer distance between the APs and the CPU causes signal processing delays and degrades throughput.Therefore, even though CF-mMIMO systems with a single CPU realize theoretically optimal networks, they also confront scalability problems in terms of the increasing numbers of UEs and APs [20], [21], [22], [23].
The first approach to realizing scalability in CF-mMIMO systems is user-centric clustering [19], [24], [25], [26], [27]. With this approach, the system forms clusters of APs that are tailored to each UE's channel condition, hence the name "user-centric." There are several methods for forming clusters: one common approach is to use the power difference of the estimated large-scale coefficients, which include path loss and shadowing [25]. In general, only the APs surrounding a UE join the UE's cluster, while each AP may belong to multiple clusters. This method also limits the number of UEs by allocating different pilot resources to each UE. This is beneficial for making systems scalable because network operators can determine the number of pilot resources at the time of system design. Another method is to use the Hungarian algorithm, which forms clusters based on network connectivity and is described in [28]. However, the signal processing delay mentioned above remains as long as there is only one CPU. Radio stripes have emerged as another approach to scalability in CF-mMIMO, where APs are connected sequentially and only one AP connects to the CPU directly [29], [30]. The APs process the signals locally, and a reduced amount of information is sent to the CPU. Nevertheless, due to the structure of radio stripes, forwarding the data from the APs to the CPU takes longer than in general CF-mMIMO networks [29]. The application range of current radio stripes is therefore still limited, especially in terms of the latency caused by the increasing data rates in Beyond 5G. In addition, radio stripes can be deployed in specific environments, e.g., stadiums and train stations.
Therefore, the other approach is multi-CPU CF-mMIMO systems depicted in Fig. 2, where the area is virtually divided into smaller subareas and each CPU takes charge of each subarea [20], [25], [27], [28], [31] in combination with user-centric clustering.This approach enables the system to reduce the computational complexity on every CPU and to reduce the required pilot resources, which generally must be orthogonal with each other to prevent pilot contamination.Each CPU manages a subset of APs, which are directly connected to the CPU via fronthaul links.In multi-CPU CF-mMIMO systems, the CPUs must exchange channel state information (CSI) and signals via backhaul links to process the signals of UEs, whose clusters consist of APs belonging to different CPUs.Distributed signal processing is not distinctive to CF-mMIMO systems; similar concepts have been investigated in cellular systems.One example is cellular distributed antenna systems (DAS), which is illustrated in Fig. 3 [32], [33].However, cellular DAS still has cell boundaries, and intercell interference remains.On the other hand, multi-CPU CF-mMIMO systems can process signals not statistically but conclusively thanks to backhaul connections among CPUs.
However, although only higher-layer information is forwarded via backhaul links in current systems such as 5G, realizing multi-CPU CF-mMIMO systems requires forwarding the received I/Q data and the estimated channel via backhaul links. Signals of various layers, i.e., the physical layer and higher layers, are forwarded to handle a wide range of communication requirements [34], [35], [36], [37], [38]. With this approach, we can forward the received I/Q data and the estimated channel via backhaul links, but we must also quantize the signals because the available capacity is sometimes insufficient when demand from connected UEs is high. For example, the performance of cloud radio access networks with limited backhaul links among cloud processors has been investigated [39], [40], [41]. In those works, limited backhaul links are modeled as quantization noise caused by low-bit analog-to-digital converters. However, those papers consider such systems in cellular networks, i.e., cell-edge problems remain, and there is no user-centric clustering. Hence, a natural question is how much network cooperation gain can be obtained among CPUs connected by bandwidth-limited backhaul links. The answer to this question will reveal the practicability and the design of scalable CF-mMIMO systems with multiple CPUs.
In this paper, we propose a method of user-centric clustering and beamforming for multi-CPU CF-mMIMO systems that considers quantization noise, under the assumption that a certain capacity is guaranteed for exchanging I/Q data among CPUs via the wired backhaul links explained above. Moreover, we analyze the performance of multi-CPU CF-mMIMO systems, where signals are processed at the CPUs, under the effect of backhaul quantization noise on the signals. In particular, we clarify the superiority of the proposed method compared to single-CPU CF-mMIMO systems and cellular DAS by means of Monte Carlo simulations. Here, as mentioned in the fourth paragraph of this section, we focus not on radio stripes but on a common topology of CF-mMIMO for general performance investigation. Our contributions in this paper are summarized as follows:
• We present backhaul quantization noise using a mathematical model in multi-CPU CF-mMIMO systems.
• We newly formulate an optimization problem to derive optimal solutions as an upper bound of multi-CPU CF-mMIMO systems with quantization noise.
• We design a new method of user-centric clustering and beamforming based on dynamic cooperation clustering (DCC) and MMSE for the presented model.
• We discuss the computational complexity of the systems to show the benefit of using our proposed method with multi-CPU CF-mMIMO systems.
• We present an answer to the unsolved problem of how much network cooperation gain can be obtained among CPUs connected by noisy backhaul links by comparing its performance with cellular DAS.
The remainder of this paper is organized as follows: The system model, including the channel model and backhaul quantization noise, is described in Section II. In Section III, the joint optimization problem of user-centric clustering and beamforming is formulated to derive an upper bound of performance, and the realistic design method of user-centric clustering and beamforming for multi-CPU CF-mMIMO systems is introduced. Numerical results of multi-CPU CF-mMIMO systems are compared with those of single-CPU CF-mMIMO systems, as well as cellular DAS, in Section IV, where the computational complexity of the methods is also discussed. Finally, we conclude this paper in Section V.
Notation: Upper and lower boldface letters X and x denote a matrix and a column vector, respectively. In particular, I_L denotes an identity matrix of size L × L. The operator • denotes an elementwise product, or Hadamard product, of vectors or matrices. ∥•∥_F, (•)†, (•)^H, and (•)* denote the Frobenius norm, pseudoinverse, Hermitian transpose, and complex conjugate, respectively. diag(x) denotes the transformation of a vector into a diagonal matrix, and diag(X) denotes the transformation of a diagonal matrix into a column vector. Finally, E[•] denotes an expectation.

II. SYSTEM MODEL
In this section, we introduce the system model for user-centric multi-CPU CF-mMIMO systems, where we focus on uplink transmission. There are L APs and K UEs distributed over a certain coverage area, and each AP and UE is equipped with a single antenna. Every AP connects to one of C CPUs located in the area via a wired fronthaul link. Furthermore, all CPUs are connected with each other via wired backhaul links with capacity limitations. Note that the assumption of single-antenna UEs is common in CF-mMIMO papers thanks to the degrees of freedom at multi-antenna APs [42], [43], [44]. In addition, to guarantee scalability, the system adopts user-centric clustering, as illustrated in Fig. 2.
Although there are multiple CPUs in the area, the signals of the kth UE are processed at its sole host CPU.In this paper, a host CPU is a CPU directly connected to the AP that has the largest channel gain of a certain UE among all APs.Therefore, the beamformer for the kth UE is designed at the host CPU to extract its signals.At this time, the received signals at the APs, which connect to the other CPUs, and the estimated channels at the other CPUs are gathered at the host CPU via fronthaul links and capacity-limited backhaul links.As discussed in Section I, signals of various layers, i.e., the physical layer and higher layers, are forwarded to address a wide range of communication requirements [34], [35], [36], [37], [38].With this approach, we can forward received I/Q data and estimated channel via backhaul links, but we must also quantize signals because sometimes insufficient capacity is available with high demand from connected UEs.
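The host-CPU rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `select_host_cpu` and the inputs `gains` (per-AP channel gains of one UE, linear scale) and `ap_to_cpu` (fronthaul wiring) are hypothetical.

```python
from typing import Sequence, Tuple


def select_host_cpu(gains: Sequence[float], ap_to_cpu: Sequence[int]) -> Tuple[int, int]:
    """Return (main_ap, host_cpu) for one UE.

    gains[l]     : channel gain between the UE and AP l (assumed known, linear scale)
    ap_to_cpu[l] : index of the CPU that AP l is wired to via its fronthaul link
    """
    if len(gains) != len(ap_to_cpu):
        raise ValueError("gains and ap_to_cpu need one entry per AP")
    # The AP with the largest channel gain determines the host CPU.
    main_ap = max(range(len(gains)), key=lambda l: gains[l])
    return main_ap, ap_to_cpu[main_ap]
```

For instance, with four APs split evenly between two CPUs, a UE whose strongest AP is AP 1 is hosted by the CPU wired to AP 1.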

A. MODELING OF CHANNEL INFORMATION SHARING
In general, CF-mMIMO systems must estimate CSI using uplink pilot signals [12], [27], [45], and the channel model takes the impact of channel estimation into account.
For simplicity, in this paper, we define the estimated channel between the kth UE and all APs as follows:
$$\hat{\mathbf{h}}_k = \mathbf{h}_k + \mathbf{e}_k, \tag{1}$$
where $\mathbf{h}_k \in \mathbb{C}^{L \times 1}$ is the channel realization vector following a complex Gaussian distribution with mean $\mathbf{0}$ and covariance $\mathbf{R}_k$. We assume that MMSE is used for channel estimation and $\mathbf{e}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_e^2 \mathbf{I}_L) \in \mathbb{C}^{L \times 1}$ is the channel estimation error vector. Furthermore, in this paper, we assume that reference signals are sent from APs in coherence intervals to prevent channel aging effects.
The estimated channel is affected by backhaul quantization noise due to capacity limitations when forwarding the CSI among CPUs. This means that the CSI of the same UE can differ depending on the host CPU at which it is gathered from the nonhost CPUs. Therefore, the estimated channel of the k′th UE gathered at the host CPU of the kth UE is expressed as in (2), where $\mathbf{p}_k \in \{0, 1\}^{L \times 1}$ denotes the direct connectivity between the host CPU of the kth UE and all APs and is given by (3). Accordingly, quantization noise is added to the estimated channel from the nonhost CPUs. Here, $\mathbf{w}'_{k'} \sim \mathcal{CN}(\mathbf{0}, \sigma_{wp,k'}^2 \mathbf{I}_L) \in \mathbb{C}^{L \times 1}$ is the backhaul quantization noise vector, whose variance is defined in (4), where $C_b$ (bit/s/Hz) is the backhaul link capacity.
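The effect of the connectivity vector can be illustrated with a short sketch. It rests on two assumptions that are not spelled out in the text above: the mask uses 0 for APs wired directly to the host CPU and 1 otherwise (inferred from the later remark that D k uses the opposite convention), and the noise variance is passed in as a plain parameter `noise_var` rather than computed from the backhaul capacity. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def connectivity_mask(host_cpu: int, ap_to_cpu: Sequence[int]) -> List[int]:
    """Assumed convention: 0 for APs wired to the host CPU, 1 for all other APs."""
    return [0 if cpu == host_cpu else 1 for cpu in ap_to_cpu]


def gather_csi(h_hat: Sequence[complex], mask: Sequence[int],
               noise_var: float, rng: random.Random) -> List[complex]:
    """Add complex Gaussian noise of variance noise_var only to the entries
    that have to cross a backhaul link (mask entry equal to 1)."""
    std = (noise_var / 2.0) ** 0.5  # standard deviation per real/imaginary part
    gathered = []
    for h, p in zip(h_hat, mask):
        w = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        gathered.append(h + p * w)
    return gathered
```

Entries belonging to directly connected APs pass through unchanged, so the host CPU sees clean CSI for its own APs and noisy CSI for the rest.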

B. CLUSTERING MATRIX
In this subsection, we introduce a model of user-centric clustering, where all APs are grouped into a subset of APs based on the channel condition between the UEs and APs.Note that each AP can join multiple clusters, while every UE has only one cluster.The binary diagonal matrix denoting the clustering status of the kth UE is expressed by where d c k,ℓ is an indicator of the clustering defined as In other words, the (ℓ, ℓ)-th element is 1 if the ℓth AP joins in the kth UE's cluster and 0 otherwise.Notably, in contrast to p k , 1 denotes the connected status in D k .The detailed process of user-centric clustering is explained in Section III.

C. SIGNAL MODEL
In this paper, we assume flat fading channels, and one possible enabler to apply the proposed method to frequency-selective fading channels is orthogonal frequencydivision multiplexing (OFDM).In such scenarios, the related signal processing can be executed for each subcarrier.
The received signals at all APs sent from all UEs are formulated as follows: where s k ∈ C 1×1 is the data symbol transmitted by the kth UE, and n ∼ CN (0, σ 2 n I L ) ∈ C L×1 is the additive white Gaussian noise (AWGN) vector at APs.
Similar to the channel model, the gathered received signals at the host CPU from other CPUs are affected by backhaul quantization noise.Therefore, the gathered signals at the host CPU of the kth UE are expressed as is the quantization noise vector for the received signals, whose variance is given by The data symbol sent from the kth UE is estimated by the host CPU using an arbitrary beamformer u k , as follows: Finally, we define the network throughput of the kth UE as where W is the system bandwidth, and Γ k is the SINR of the kth UE expressed as ( 12), shown at the bottom of the next page [46].Note that for a single-CPU CF-mMIMO system, hk is equivalent to ĥk because there is only one CPU and no backhaul quantization noise exists.
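As a rough illustration of how a beamformer maps to throughput, the sketch below evaluates a textbook SINR and the Shannon-type rate W log2(1 + SINR). It is a simplification under stated assumptions: unit-power symbols, perfectly known channels, and a single noise variance, so it omits the channel estimation error and quantization terms that the SINR in (12) accounts for. The function names are hypothetical.

```python
import math
from typing import Sequence


def inner(u: Sequence[complex], h: Sequence[complex]) -> complex:
    """Inner product u^H h."""
    return sum(a.conjugate() * b for a, b in zip(u, h))


def sinr(u: Sequence[complex], channels: Sequence[Sequence[complex]],
         k: int, noise_var: float) -> float:
    """Simplified SINR of UE k for combiner u (unit-power symbols assumed)."""
    signal = abs(inner(u, channels[k])) ** 2
    interference = sum(abs(inner(u, h)) ** 2
                       for j, h in enumerate(channels) if j != k)
    noise = noise_var * sum(abs(a) ** 2 for a in u)
    return signal / (interference + noise)


def throughput(bandwidth_hz: float, sinr_linear: float) -> float:
    """Rate in bit/s, assuming the form W * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)
```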

III. PROPOSED METHODS
In this section, we propose a new design method for user-centric clustering and beamforming taking into account backhaul quantization noise for multi-CPU CF-mMIMO systems.As indicated by ( 2) and (8), due to backhaul quantization noise, the selection of the host CPU is important to improve UE performance.Therefore, we select a host CPU that directly connects to the AP that has the largest channel gain of a certain UE, as mentioned in Section II.Furthermore, it is beneficial to take quantization noise into account when designing beamformers.This section is divided into two subsections.First, we derive the joint design of user-centric clustering and beamforming with a total throughput maximization problem.In this paper, we pursue the achievable throughput of our proposed algorithms.This is why we solve the max-total problem instead of max-min problems which appear in [14] because it degrades performance of UEs in good condition for the sake of the UEs in bad condition.Since it is a nonconvex optimization problem including a logarithmic fractional function of throughput and combinational constraints, the problem is reformulated by fractional programming (FP) [47], [48] and convex-concave procedure (CCP) [49], and the combinational constraints are relaxed as an entropy penalty method [50].
Second, to reduce the computational complexity of solving the above problem, a simplified design method of user-centric clustering and beamforming is proposed.In this method, clusters are based on the DCC method presented in [25], and quantization noise is considered.Moreover, beamforming is derived as a closed form of MMSE that takes the effect of quantization noise into account.

A. JOINT DESIGN OF CLUSTERING AND BEAMFORMING FOR TOTAL THROUGHPUT MAXIMIZATION
The joint optimized design of the AP access configuration and beamformers has already been discussed in [51], which seeks the access configuration of CF-mMIMO systems with dynamic time division duplex. The joint optimization methods in [51] assume perfect knowledge of CSI, whereas this paper considers the channel estimation error and the quantization noise. To optimize under such noise, a design that is robust to imperfect knowledge of CSI is required, which means that the solution to the optimized design problem must search for the worst-case SINR based on the CSI held by the CPU. A robust design algorithm based on [51] would have to search for the worst-case SINR in an outer loop, so this approach is not realistic from the viewpoint of computational complexity. In this paper, we therefore implement a design that treats the CSI, including the noise from the channel estimation and the backhaul communication, as the true CSI. Based on the above considerations, we derive the total throughput maximization problem as Problem (13), where [•]_ℓ denotes the ℓth element of a vector. The constraints (13b) and (13c) are applied to AP selection for user-centric clustering to exploit the power of the beamformer. Specifically, when the equality holds, the power of the combiner is also 0, which means the ℓth AP does not serve the kth UE. In addition, the maximum power is normalized to 1 because the range of the penalty function is given as [0, 1]. The constraint (13d) is a bound on the number of APs in each cluster. In the objective function (13a), Γk is an achievable SINR using a beamformer ūk, an optimization variable of (13), and is given as (14). Moreover, to circumvent the combinatorial constraints on d^c_{k,ℓ}, Problem (13) can be relaxed by replacing (13b) with its convex hull over d^c_{k,ℓ} and by introducing a penalizing term into the objective based on a negative entropy function, as described in [50]. In particular, let P(d^c_{k,ℓ}) be the negative entropy function and let λ be a given weight. Problem (13) can then be rewritten, i.e., relaxed, as Problem (15a), subject to (15b), (13c), and (13d), where P(d^c_{k,ℓ}) is the penalty function, and λ ≥ 0 is a hyperparameter for adjusting the strength of the penalty.
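To see why an entropy-based penalty pushes relaxed clustering variables back toward binary values, consider the sketch below. The exact penalty used in (15) is not reproduced in the text above, so the binary entropy in bits is assumed here; it takes values in [0, 1], vanishes at d = 0 and d = 1, and peaks at d = 0.5. The names `binary_entropy` and `penalized_objective` are hypothetical.

```python
import math
from typing import Iterable


def binary_entropy(d: float) -> float:
    """Binary entropy in bits: 0 at d = 0 or 1, maximal (1) at d = 0.5."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("relaxed clustering variable must lie in [0, 1]")
    if d in (0.0, 1.0):
        return 0.0
    return -d * math.log2(d) - (1.0 - d) * math.log2(1.0 - d)


def penalized_objective(sum_throughput: float, d_values: Iterable[float],
                        lam: float) -> float:
    """Sum throughput minus lam times the total entropy of the relaxed variables
    (an assumed form; subtracting entropy equals adding negative entropy)."""
    return sum_throughput - lam * sum(binary_entropy(d) for d in d_values)
```

With a larger weight `lam`, fractional values of the clustering variables cost more, so a maximizer is driven toward 0 or 1.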
Although Problem (15a) has removed the combinatorial constraints on d^c_{k,ℓ}, it is still intractable because the objective function is nonconvex in the other optimization variable ūk. To resolve this challenge, we make Problem (15a) convex by FP. The quadratic transform (QT) for the first step of FP is applied to Problem (15a), which yields Problem (16), subject to (13c), (13d), and (15b), where f_k(ūk) is the approximated throughput of the first step of FP and is expressed as (17), in which Γ^ldt_k is given by (18), γ_k = Γk, and γ ≜ [γ_1, . . ., γ_K]. In the next step of FP, we apply QT to (17). At this point, the objective function in (16) after applying QT becomes the difference between two concave functions, which motivates us to utilize the difference of concave programming technique. Therefore, we adopt CCP to find a solution of the problem, and Problem (16) is further modified as Problem (19) [49], subject to (13c), (13d), and (15b), where (•)^{t−1} denotes the solution obtained in the (t − 1)-th iteration and s ≜ [s_1, . . ., s_K]. After (19) is optimized, solving the beamformer design again with quantized d^c_{k,ℓ} reduces the effect of rounding errors. Finally, (19) is solved by numerical convex optimization solvers such as SeDuMi and SDPT3 [52], [53]. For convenience, we summarize the step-by-step procedure of the proposed joint design of clustering and beamforming as pseudocode in Algorithm 1.
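The quadratic transform used in FP can be illustrated on a single scalar ratio. For A ≥ 0 and B > 0, the ratio A/B is replaced by the surrogate 2y√A − y²B with an auxiliary variable y; the surrogate is maximized at y = √A/B, where it equals A/B, and it never exceeds the ratio for any other y. The sketch below shows only this scalar identity, not the multi-user objective in (17) and (18); the function names are hypothetical.

```python
import math


def qt_optimal_y(a: float, b: float) -> float:
    """Auxiliary variable that maximizes the surrogate for fixed a >= 0, b > 0."""
    return math.sqrt(a) / b


def qt_value(a: float, b: float, y: float) -> float:
    """Quadratic-transform surrogate 2*y*sqrt(a) - y^2*b of the ratio a/b."""
    return 2.0 * y * math.sqrt(a) - y * y * b
```

In an FP iteration, y is updated in closed form with the other variables fixed, and the other variables are then updated with y fixed, which is what makes each subproblem tractable.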

B. LOW-COMPLEXITY DESIGN METHOD OF CLUSTERING AND BEAMFORMING

1) CLUSTERING METHOD FOR MULTI-CPU CF-MMIMO SYSTEMS
The proposed clustering method in this subsection uses DCC, which has been investigated in CF-mMIMO systems, as well as cellular DAS [54], [55].DCC is a method of user-centric clustering initially proposed for MIMO networks.It operates as if there exists a single CPU that controls multiple base stations (BSs).Since multiple CPUs and backhaul quantization noise exist, the DCC method must be modified.
The proposed clustering method is given as follows:
1) We assume that the CPUs experimentally determine the quantization noise depending on the number of bits.
2) Based on the large-scale coefficients obtained during the channel estimation phase, the AP that has the largest received signal strength (RSS) or the statistical value of the estimated channel is selected as the main AP for a certain UE. In addition, the CPU that connects to the main AP directly becomes the host CPU for the UE.
3) The host CPU calculates the signal-to-noise ratio (SNR) of all APs, taking into account the quantization noise on the backhaul links.
4) The APs whose SNR, including quantization noise, differs from that of the main AP by less than the threshold, or a fixed number of APs, join the cluster of the UE.
5) Steps 2-4 are executed for all UEs.
By means of the above steps, clusters are formed using the CSI with a penalty for backhaul quantization noise. As a result, the proposed clustering method can effectively suppress the quantization noise with a feasible process.
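The steps above can be sketched for a single UE as follows. This is a minimal illustration, not the authors' implementation. It assumes that the quantization noise simply adds to the receiver noise power for APs that are not wired to the host CPU, and that either a threshold in dB or a fixed cluster size (`num_aps`) is used. The function name and all parameter names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def form_cluster(gains: Sequence[float], ap_to_cpu: Sequence[int],
                 noise_var: float, quant_var: float, threshold_db: float,
                 num_aps: Optional[int] = None) -> Tuple[int, List[int]]:
    """Return (host_cpu, sorted AP indices in the cluster) for one UE."""
    # Step 2: main AP = strongest large-scale gain; its CPU is the host CPU.
    main_ap = max(range(len(gains)), key=lambda l: gains[l])
    host_cpu = ap_to_cpu[main_ap]
    # Step 3: SNR per AP; APs behind other CPUs pay a quantization penalty.
    snr_db = []
    for l, gain in enumerate(gains):
        penalty = 0.0 if ap_to_cpu[l] == host_cpu else quant_var
        snr_db.append(10.0 * math.log10(gain / (noise_var + penalty)))
    # Step 4: keep APs within the threshold of the main AP, or a fixed number.
    ranked = sorted(range(len(gains)), key=lambda l: snr_db[l], reverse=True)
    if num_aps is not None:
        cluster = ranked[:num_aps]
    else:
        cluster = [l for l in ranked if snr_db[main_ap] - snr_db[l] <= threshold_db]
    return host_cpu, sorted(cluster)
```

Calling this function once per UE corresponds to step 5. With a nonzero `quant_var`, an AP behind another CPU needs a larger gain to enter the cluster than an equally strong AP wired to the host CPU.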
In this paper, we propose a design method for a beamformer based on MMSE to suppress quantization noise. The estimated transmitted data symbol ŝk from the kth UE is expressed using an arbitrary beamformer, and the MMSE beamformer is defined by the minimization problem (23). Since (23) is a convex minimization problem, the global optimal solution for the proposed MMSE beamformer can be found at the zero of the Wirtinger derivative of the expectation. Therefore, by substituting (10) into (23), the Lagrangian L is obtained, where ψ k′ denotes the actual CSI h k′ replaced with the estimated CSI in (2). Finally, the stationary point of (23) is obtained by solving the equation ∂L/∂u*_k = 0, and the proposed MMSE beamformer is represented by (26).
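A generic regularized MMSE combiner conveys the idea of noise-aware beamforming. The closed form (26) is not reproduced in the text above, so the sketch below assumes the standard structure u = (Σ_k' ψ_k' ψ_k'^H + N)^(-1) ψ_k with unit-power symbols, where N is a diagonal matrix holding the receiver noise plus an extra quantization variance on APs behind other CPUs. The small linear solver and all names are hypothetical helpers written with the standard library only.

```python
from typing import List, Sequence

Vector = List[complex]
Matrix = List[List[complex]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def mmse_beamformer(channels: Sequence[Sequence[complex]], k: int,
                    mask: Sequence[int], noise_var: float,
                    quant_var: float) -> Vector:
    """Assumed form: (sum of h h^H + diagonal noise)^(-1) h_k.

    mask[l] is 1 when AP l sits behind a non-host CPU and therefore
    carries additional quantization noise of variance quant_var.
    """
    n = len(channels[k])
    cov = [[sum(h[i] * h[j].conjugate() for h in channels) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += noise_var + mask[i] * quant_var
    return solve(cov, list(channels[k]))
```

In this form, an AP whose signal crosses a noisy backhaul link receives a smaller combining weight than an equally strong AP wired to the host CPU, which is the qualitative behavior the noise-aware design aims for.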

IV. NUMERICAL ANALYSES
In this section, we first discuss the computational complexity of the proposed methods followed by the introduction of the conventional MMSE beamformer to investigate the performance.Finally, we evaluate the throughput of the proposed methods.
Although sufficient size reference signals are still required since there is a larger number of APs in CF-mMIMO systems compared to conventional cellular systems, it is important to control the network size by clustering appropriately to reduce the pilot sequence length.This is why user-centric clustering is investigated in CF-mMIMO papers.
A. COMPLEXITY ANALYSIS
Before proceeding to the numerical performance evaluation of the proposed methods compared with the conventional and benchmark methods, we discuss their computational complexity. In particular, the proposed low-complexity methods in Section III-B have lower computational complexity than the joint optimization method in Section III-A.
Current OFDM systems also divide channels into multiple subcarriers [58], where the major signal processing techniques often possess the same complexity order as that of the proposed method.As for the related complexity reduction, various techniques have been proposed in the literature, such as [59], some of which are also applicable to our proposed scheme without loss of generality.

1) DESIGN METHOD OF CONVEX OPTIMIZATION USING FP
The most complicated operation associated with solving (19) is the computation of the ε-solution of the corresponding quadratically constrained program (QCP), whose canonical arithmetic complexity C can be upper-bounded as in (27) [60], [61], where M, Ñ, and Q_m denote the number of constraints in the problem, the size, i.e., vector dimension, of the real-valued multidimensional variable, and the size of the mth constraint space, respectively. The quantity digit(ε) is the order of precision of the ε-solution in terms of its distance to the optimum [61]. After (19) is transformed into the QCP canonical form described by (44a) in Appendix A, we obtain (28). The total complexity of Algorithm 1 can then be estimated using (27) and (28a) as (29), where only the highest-order term is kept in the last step.
VOLUME 12, 2024

2) PROPOSED LOW-COMPLEXITY METHOD
The computational complexity of the MMSE beamformer with clustering has already been discussed in [25].The complexity can be represented as where M k is the set of APs that join in the cluster of the kth UE; |M k | denotes its cardinality.Note that we use singleantenna APs.
Although the actual number of APs varies for each UE/AP allocation, unless we apply a fixed number of APs per cluster, the maximum is L. With the assumption that K ≪ L, the total complexity of the proposed low-complexity method can be estimated as follows: By comparing ( 29) and (31), it is clear that the latter method is superior to the optimized method in terms of computational complexity.Note that (29) is the complexity to solve the maximization problem (19), which appears at Lines 6 and 15 of Algorithm 1.As written at Lines 9 and 17, solving (19) can be repeated up to t max times.Therefore, the total complexity of Algorithm 1 is given as O(t max K 3 L 3 √ K + L), and the gap between ( 29) and ( 31) becomes wider when Algorithm 1 repeats more.

B. CONVENTIONAL METHODS AND IDEAL NETWORKS
In this subsection, we introduce the conventional clustering method and beamforming designs for performance comparison with the proposed methods.
First, we discuss beamforming design methods without considering backhaul quantization noise.These methods are not tolerant to quantization noise and thus will be a lower bound of the performance of the proposed methods.
Next, design methods of clustering and beamforming using an ideal CF-mMIMO system model are introduced.These provide an upper bound of throughput performance with the assumption that the model has infinite-capacity backhaul links.

1) CONVENTIONAL BEAMFORMER DESIGNS
First, an MMSE-based beamformer, which we call the benchmark MMSE, is given by excluding the quantization noise term of the proposed MMSE beamformer (26).Therefore, the benchmark MMSE is provided by (32), as shown at the bottom of the next page.This method represents the lower bound of throughput performance in this paper.

2) SINGLE-CPU CF-MMIMO SYSTEMS
Since the performance limitation of multi-CPU CF-mMIMO systems is caused by quantization noise, an ideal multi-CPU CF-mMIMO system with infinite-capacity backhaul links gives an upper bound of throughput performance.
First, the received signals without quantization noise are expressed as (33), and the network throughput for the kth UE is defined from the corresponding SINR. As seen from (33), the ideal multi-CPU CF-mMIMO system is equivalent to the conventional single-CPU CF-mMIMO system with user-centric clustering. To fairly compare its performance with the proposed methods, the user-centric clustering scheme for the ideal CF-mMIMO system uses the original DCC framework [25], which is summarized as follows:
1) The AP with the strongest RSS becomes the main AP of the UE based on the large-scale coefficients obtained during the channel estimation phase.
2) The CPU calculates the RSS of all APs.
3) The APs whose RSS differs from that of the main AP by less than the threshold, or a fixed number of APs, join the cluster of the UE.
4) Steps 1-3 are executed for all UEs.
Finally, the MMSE beamformer for the ideal CF-mMIMO system is provided by (36).

C. SIMULATION RESULTS
Based on the discussion above, we evaluate the performance of multi-CPU CF-mMIMO systems using Monte Carlo simulations.The parameter specification is shown in Table 1.The APs are allocated in grid shapes, where the distances between adjacent APs are equal in the row and column directions.The CPUs are allocated in the same way as the APs, and all APs connect to the geographically nearest CPU via fronthaul links.The large-scale coefficients are given as −35.3 − 37.6 log 10 (d) + z, where d denotes the distance and z represents shadowing with a standard deviation of 10 dB.Other parameters are specified according to [25], and the channels are assumed to be uncorrelated because all APs are equipped with a single antenna.
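The deployment and large-scale fading model described above can be sketched as follows. Several details are assumptions because Table 1 is not reproduced here: the square coverage area and its side length, the placement of nodes at the centers of grid cells, and the interpretation of d in meters. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def grid_positions(n_per_side: int, area_side_m: float) -> List[Point]:
    """Place n_per_side^2 nodes at the centers of equal square cells (assumed layout)."""
    step = area_side_m / n_per_side
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(n_per_side) for j in range(n_per_side)]


def nearest(point: Point, candidates: Sequence[Point]) -> int:
    """Index of the geographically nearest candidate, e.g. the CPU serving an AP."""
    return min(range(len(candidates)),
               key=lambda c: math.hypot(point[0] - candidates[c][0],
                                        point[1] - candidates[c][1]))


def large_scale_db(distance: float, rng: random.Random,
                   shadow_std_db: float = 10.0) -> float:
    """Large-scale coefficient in dB: -35.3 - 37.6 log10(d) + z."""
    shadowing = rng.gauss(0.0, shadow_std_db)
    return -35.3 - 37.6 * math.log10(distance) + shadowing
```

A usage example: `aps = grid_positions(4, 1000.0)` and `cpus = grid_positions(2, 1000.0)` give 16 APs and 4 CPUs, and `[nearest(p, cpus) for p in aps]` yields the fronthaul wiring.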
Finally, we discuss some optimization details.The initial values for the beamformer are given by MRC, and the initial values for clustering are set as d c k,ℓ = 1, ∀ℓ, k.The optional loop in Algorithm 1 is executed.The initial value of the penalty hyperparameter is λ = 0, and the increase in each iteration is assumed to be λ + = 1, except in the first five iterations, where λ + = 0.The maximum number of APs in clusters (13d) is L c k = 50.
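The penalty schedule described above can be written down explicitly. The sketch follows one reading of the text: the weight stays at its initial value of 0 during the first five iterations and then grows by 1 per iteration. Whether the increment is applied before or after an iteration is not stated, so the indexing is an assumption, and the function name is hypothetical.

```python
from typing import List


def penalty_schedule(num_iterations: int, warmup: int = 5,
                     step: float = 1.0) -> List[float]:
    """Penalty weight used in each iteration: 0 during warm-up, then +step each time."""
    lam = 0.0
    schedule = []
    for t in range(num_iterations):
        if t >= warmup:
            lam += step
        schedule.append(lam)
    return schedule
```

Keeping the weight at zero early on lets the relaxed clustering variables settle according to throughput alone before the penalty starts pushing them toward binary values.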

1) IMPACT OF BACKHAUL CAPACITY LIMITATION
First, we investigate the impact of backhaul capacity limitation. Fig. 4 compares the cumulative distribution functions (CDFs) of the throughput for the benchmark MMSE beamformer with four values of backhaul capacity. "Single-CPU" in the figures is the throughput of the single-CPU CF-mMIMO system, which serves as the upper bound of performance, i.e., without backhaul quantization noise. As the backhaul capacity changes, the throughput varies exponentially for $C = 4$ and linearly for $C = 25$. This is because when the number of CPUs is smaller, a greater number of APs belong to a specific CPU, and the probability of transmission via backhaul links decreases. When $C_b = 16$, the performance of the benchmark MMSE beamformer with $C = 4$ approaches that of the single-CPU system.
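Throughput CDFs such as those compared above are typically obtained as empirical CDFs over the per-UE throughput samples collected in the Monte Carlo runs. The following sketch shows one common way to compute them. The function names and sample values are hypothetical, and the paper's own plotting procedure is not specified.

```python
def empirical_cdf(samples):
    """Return (value, probability) pairs of the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    # The i-th smallest sample (0-based) has CDF value (i + 1) / n.
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def throughput_at(samples, q):
    """Smallest sample whose empirical CDF value is at least q (0 < q <= 1)."""
    for x, p in empirical_cdf(samples):
        if p >= q:
            return x
    raise ValueError("q must lie in (0, 1]")


# Illustrative per-UE throughput samples (arbitrary units).
samples = [3.0, 1.0, 2.0, 5.0, 4.0]
cdf = empirical_cdf(samples)
median = throughput_at(samples, 0.5)
```

Reading a low quantile from such a curve gives the throughput of the worst-served UEs, which is where the effect of backhaul quantization noise is usually most visible.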

2) PERFORMANCE SUPERIORITY OF THE PROPOSED METHODS
Here, we investigate the performance of the proposed methods, taking quantization noise into account. Note that the backhaul capacity is set as $C_b = 4$ to analyze the performance characteristics when the noise effect is significant. To make the received signal power fair among simulations, we allocate a fixed number of APs to each cluster. In particular, this number is the average number of allocated APs over all channel realizations. Fig. 5 compares the throughput performance of our proposed methods and the conventional methods. "MRC" and "BM" use the proposed low-complexity clustering method together with the conventional MRC and MMSE beamformers, respectively. "MM" uses the proposed low-complexity design method of clustering and beamforming. "OPT" uses the proposed joint optimization method of clustering and beamforming. In addition, "OPT_h" uses the proposed joint optimization of clustering and beamforming without quantization noise of the estimated CSI.
Fig. 5a shows the CDF of throughput when $C = 4$. The proposed methods suppress the effect of backhaul quantization noise, which results in the performance of "OPT_h" asymptotically approaching that of the ideal system. However, "OPT_h" uses ideal channel state information, which is not a realistic assumption. Therefore, "OPT" shows the quasi-upper-bound with estimated channel state information. When the number of CPUs is small, "MM" asymptotically approaches "OPT," indicating that suboptimal results can be obtained with lower computational complexity. The result of "BM" also shows that the performance improves with the proposed clustering method alone. Furthermore, "MRC" performs slightly better than "DCC-BM" due to the proposed clustering method. Although the complexity is decreased by applying MRC, the performance is degraded compared to MMSE. In addition, MMSE is commonly used in CF-mMIMO papers.
Fig. 5b shows the CDF of the throughput when $C = 25$. The performance degradation of "BM" is significant because quantization noise is added to more signals via backhaul links. On the other hand, the performance degradation of "MM" is small due to the noise suppression of the proposed beamforming. In particular, "OPT_h" asymptotically approaches the ideal system regardless of the number of CPUs. However, "OPT_h" does not consider the backhaul quantization noise of the channel estimation in (14) for simplicity of transformation. This is why "OPT" has degraded performance in more realistic setups. These results indicate that the proposed methods effectively suppress the effect of quantization noise. In particular, "MM" can suppress the noise effect while keeping the computational complexity at a practical level.

3) PERFORMANCE COMPARISON WITH CELLULAR DAS
Fig. 6 shows the average throughput versus the number of APs per cluster. Similar to Fig. 5, "MM" uses the proposed clustering and beamforming methods, and "single-CPU" denotes the ideal CF-mMIMO system with the MMSE beamformer (36). Note that "DAS" shows constant throughput regardless of the number of APs per cluster because DAS does not use clustering. As seen from Fig. 6, the average throughput of the proposed methods approaches that of the ideal CF-mMIMO system as the backhaul capacity increases. Furthermore, the performance improves as the number of APs increases because the proposed methods can mitigate the backhaul quantization noise.
In Fig. 6a, the average throughput of CF-mMIMO systems is smaller than that of DAS when the number of APs in the cluster is less than 20. CF-mMIMO is at a disadvantage in terms of received power because the number of APs connected to each CPU is fixed at 25. Since CF-mMIMO can also utilize APs connected to non-host CPUs, it can achieve higher throughput than DAS when the number of APs in a cluster is 25.
On the other hand, Fig. 6b shows the advantage of CF-mMIMO systems over DAS regarding the average throughput with 10 or more APs per cluster. Under this setting, DAS cannot fully utilize the spatial degrees of freedom due to the reduced number of APs per cell. In contrast, multi-CPU CF-mMIMO can use a large number of APs to increase the degrees of freedom and the received power. In addition, these results indicate that the proposed methods are effective in suppressing the backhaul quantization noise, whose effect grows as the number of CPUs increases.

V. CONCLUSION
In this paper, we analyzed the effect of quantization noise caused by capacity-limited backhaul links for CF-mMIMO systems with multiple CPUs. First, we revealed the impact of quantization noise on performance with a conventional MMSE beamformer. Second, to obtain better UE performance under such conditions, we proposed a new MMSE beamformer and clustering methods using an optimization problem and a low-complexity approach. Finally, we compared the performance of our proposed methods with cellular DAS.
The proposed methods outperform the conventional MMSE beamformer in all simulation cases.Multi-CPU CF-mMIMO systems also outperform conventional cellular DAS, which shows the merits of transforming current mobile networks into CF-mMIMO systems.

APPENDIX A CANONICAL FORM OF QCP FORMULATION
In this section, we describe how Problem (19) is transformed into the QCP canonical form to calculate its computational complexity. To that end, we first consider the canonical form of a real-valued conic QCP [61]. To put (38b) into the form of (37c), each term in (39) must be expressed with real variables. In particular, the quadratic term in (39) can be rewritten using the real-valued stacked vector

$$\bar{\mathbf{u}}^{r}_{k} \triangleq \left[ \Re(\bar{\mathbf{u}}_{k})^{T} \;\; \Im(\bar{\mathbf{u}}_{k})^{T} \right]^{T}. \tag{41a}$$

The second term in (39) can also be written with real variables. Finally, (38a) can be inserted into the QCP canonical form.

FIGURE 1. Model of a single-CPU CF-mMIMO system. APs are distributed in the area and connect to the CPU via wired fronthaul links.

FIGURE 2. Model of a multi-CPU CF-mMIMO system. Each AP and CPU are connected via a fronthaul link, and received uplink signals are forwarded to the CPUs.

FIGURE 3. Model of DAS for cellular networks.Antennas are distributed in the cells, each of which is controlled by one CPU.There is no interaction among CPUs.

FIGURE 4. Comparison of throughput performance for the benchmark MMSE beamformer with four values of backhaul capacity.

FIGURE 5. Performance comparison of the proposed methods and conventional methods.

FIGURE 6. Comparison of average throughput with various numbers of APs per cluster using the proposed methods.