Bayesian Receiver Design via Bilinear Inference for Cell-Free Massive MIMO With Low-Resolution ADCs

We propose a novel joint channel and data estimation (JCDE) scheme to combat the rate limitation in fronthaul links of cell-free massive MIMO (CF-mMIMO) systems introduced by the use of analog-to-digital converters (ADCs) at access points (APs), which makes channel estimation and multi-user detection at the central AP (CAP) challenging. The latter problem is solved here via the new JCDE scheme which differs from state-of-the-art (SotA) alternatives due to two contributions. The first is the design and incorporation of de-quantization (DQ) step which relies only on scalar Gaussian approximation (SGA) assumptions in conformity with mild central limit theorem (CLT), in contrast to the much harder asymptotic conditions required by the classic bilinear generalized approximate message passing (BiGAMP) algorithm. The second is a modification of bilinear Gaussian belief propagation (BiGaBP), whereby quantized outputs are linearized via the Bussgang decomposition enabling tractable signal processing. The resulting DQ-aided JCDE method achieves both low-complexity and high-accuracy by exploiting both the spatial degrees of freedom (DoF) obtained from, and the observations at the CAP to compensate for the low-resolution distortion introduced by, the distributed APs. The efficacy of the proposed method over the SotA is confirmed via computer simulations.

increasing demand in data rates at initial standardization stages of fifth-generation (5G) systems, which aim to to achieve besteffort services focusing on faster downlink speeds. In beyond 5G and future sixth-generation (6G) networks, however, massive multiple-input multiple-output (mMIMO) is also expected to be exploited in the realization of user-centric communication systems, which seek to always satisfy wireless link quality both in the downlink (service) and the uplink (access) connectivities.
Cell-free massive MIMO (CF-mMIMO) has been recently proposed [1] as a promising new concept of mMIMO architectures that enables achieving this goal. The CF-mMIMO architecture virtually configures a mMIMO system using a large number of single-or multiple-antenna APs that are geographically distributed and connected through wired fronthaul links to a common high-performance central processing unit (CPU), while simultaneously serving multiple user equipment (UE). This renders the benefits of lower average path-loss, smaller spatial correlation, and larger degrees of freedom (DoFs) simultaneously possible. In addition, the centralized processing at the CPU of signals collected from local APs enables the alleviation of the cell boundary problem typical of cellular architectures by effective interference suppression, resulting in a more user-centric system, whereby the system can dynamically adjust to serve the needs of users, whether than users having to conditions of the system.
Early CF-mMIMO systems proposed in seminal works such as [1], [2], and [3] are, however, not scalable in practice due to the assumption that all APs are ideally connected to one CPU, responsible for coordinating and processing the signals of all UEs. In order to address this issue, a usercentric decentralized approach was proposed in [4], [5], and [6], which suggested the formation of local clusters [7], [8] consisting of a local CPU -also referred to as CAP -and associated APs, such that each UE is not served by all APs simultaneously, but by a subset of the latter that provides the best channel conditions. In particular, in [5] and [6], the dynamic cooperation clustering (DCC) method of [9] was applied to dynamically change the CAP-AP connections so as to achieve a scalable CF-mMIMO system, considering the time variability of the network structure due to the mobility of UEs.
Next to scalability, the energy consumption and deployment cost of the network are further challenges of the CF-mMIMO architecture, as these parameters increase rapidly with the number of APs in the system. These challenges are mitigated by the use of low-cost hardware components, which in turn translates to the requirement of robustness against hardware impairments [10], [11], [12], [13]. In particular, rate limitations in wired fronthaul links and the constrained computational capabilities of APs are two factors now considered unavoidable in practical CF-mMIMO system [14], [15], which require appropriate data compression and demodulation techniques at the CAPs and APs to be circumvented.
In light of the above, we contribute in this article with an appropriate receiver design to address the aforementioned inherent problems of scalable CF-mMIMO system, including channel estimation (CE) and multi-user detection (MUD) in the uplink. We highlight that while preceding work such as [10], [11], [12], [13], [14], and [15] focused on metrics such as achievable data rate, spectral efficiency, and/or energy efficiency in their system performance evaluation, our contribution is fundamentally algorithmic, such that our numerical results are presented in terms of normalized mean square error (NMSE) of estimated quantities and bit error rate (BER), computed from actual constellation points. Besides the contribution of the newly proposed receiver, the performance evaluation here presented is therefore complementary to the aforementioned work in demonstrating the practical feasibility and the relevance of robust receiver designs for CF-mMIMO systems.
When considering the SotA literature on techniques to combat the rate-limited fronthaul link problem, one finds that a common approach is to perform data compression at the APs, which amounts to quantizing the signals using a small number of bits in order to reduce the load on the fronthaul links [10], [11], [12], [13], [14], [15]. A practical solution to that end is to employ low-resolution ADCs with 1-3 bits instead of the typical 10+ bits at APs, which has the further advantages of enabling low power consumption and reduced hardware cost. Other possible solutions are compute-and-forward schemes based on the cardinality reduction of symbols [16], [17], and singular value decomposition (SVD)-based latent semantic analysis (LSA) techniques [18], [19], which can efficiently reduce the fronthaul load, but require higher performance of the APs.
We highlight that among the many low-resolution ADC solutions to the rate-limited fronthaul problem are also clever estimate-compress-forward (ECF) methods such as those proposed e.g. in [14] and [15], in which the APs performs, besides data compression, other signal processing operations such as channel estimation and data filtering. Indeed, such an approach has been shown via numerical results to outperform the simpler "compress-only" counterparts, but their performance and implementation cost are in a trade-off relationship, which ultimately compromises the scalability of the system.
Keeping our aim sharply at the objective of reducing deployment costs in order to add feasible scalability to CF-mMIMO systems, we therefore focus our contribution on the previously mentioned paradigm, whereby APs only compress their received signals using low-resolution ADCs, before forwarding the resulting quantized signals to the CAP. Under such a premise, it is the CAP that is required to employ sophisticated (and low-complexity) signal demodulation techniques in order to mitigate the severe distortion in the signals forwarded by the APs as a consequence of the nonlinear quantization noise induced by low-bit ADCs at fronthaul links. In summary, we assume that only downconversion and quantization are performed at the APs, while baseband CE and MUD are performed at the CAP.
Having clarified our basic assumption, we also note that several methods have been proposed for recovering and estimating signals from quantized observations, including the optimal maximum likelihood (ML) estimation, as well as sub-optimal variation of the latter based on sphere decoding (SD) [20], whose exponential complexities are too high to be feasible for scaled CF-mMIMO systems. In comparison, the near-ML estimator based on convex optimization [21] and spatial filtering based on Bussgang's theorem [22], [23], [23], [24], [25], can be leveraged as the basis of effective low-complexity estimation alternatives, which can achieve high estimation performance by exploiting the abundant DoF provided by CF-mMIMO systems.
The challenge in the latter approach is, however, the high communication overhead implied by the underlying requirement that channel state information (CSI) acquisition is performed via conventional coherent two-stage estimation, where CE is based on predefined pilot sequences, followed by MUD based on the estimated CSI. Indeed, very long orthogonal pilot sequences must be used in order to obtain accurate CSI knowledge from quantized observations, because the use of coarse quantization not only breaks the orthogonality of pilot sequences, but also greatly reduces the number of effective measurements available for CSI estimation [26], [27], [28], [29].
A solution to the latter challenge is JCDE via Bayesian bilinear inference, which exploits estimated data symbols as a soft pilot sequence, thus enabling a significant reduction in the length of pilot sequences without sacrificing estimation performance. Taking into account the inherent large-system setting of the CF-mMIMO architecture, whereby the number of APs is assumed to be larger than that of UEs, the BiGAMP algorithm proposed in [30] emerges as a particularly attractive framework for the design of JCDE schemes for CF-mMIMO systems, as it enables low-complexity bilinear inference by exploiting the law of large numbers.
Indeed, the BiGAMP methodology has been extended to bilinear inference based on quantized observations by means of a unified generalized approximate message passing (GAMP) framework [31], [32], [33], and it has been empirically shown [27], [28] that Bayesian JCDE schemes outperform the conventional two-step estimation and can asymptotically approach the Genie-aided ideal performance provided a massive number of observation dimensions and sufficient time resources can be assumed. The working premise of BiGAMP is, however, heavily dependent on the large-system limit approximation based on zero-mean independent and identically distributed (i.i.d.) random observations [34], [35], which renders the approach vulnerable to the lack of DoF caused by spatial correlation among fading coefficients for instance. In addition, the convergence behavior of iterative estimation based on BiGAMP is not stable, which may result in serious performance degradation, especially in the presence of nonorthogonal pilot sequences [36].
A recently-proposed approach to tackle the latter issue is the BiGaBP algorithm [37], which aims to generalize BiGAMP by relaxing its dependence on the large-system limit approximation and leveraging the belief propagation (BP) concept [38], which ultimately allows JCDE to be performed robustly even with non-orthogonal pilot sequences. Subsequently to [37], the BiGaBP inference framework was recently employed [39] in the receiver design of grant-free (GF) access schemes, where the approach was shown to outperform earlier linear Bayesian counterparts. In light of all the above, higher-complexity alternatives based on vector AMP (VAMP) [40], [41], [42], [43] and the expectation propagation (EP) algorithm [44] will not be considered in this article, as they do not offer the same scalability potential of the technique proposed here.
The contributions of the article are summarized as follows: • New Scalable CF-mMIMO Receiver: A novel JCDE algorithm, dubbed DQ-aided BiGaBP, is presented, wherein message passing DQ (MPDQ) is performed via bilinear inference based on a spatial-temporal representation of the received signal given through a linearization using the Bussgang decomposition. A major challenge overcome by the proposed receiver design is extending BiGaBP to multi-dimensional observations with different quantization resolutions under consistent statistical assumptions and approximation accuracy. 1 To the best of our knowledge, such an extension of the BiGaBP algorithm to incorporate arbitrary distributions on the output has not yet been reported in the literature. The proposed extension is accomplished by integrating a novel DQ step here-derived, and a new purpose-designed JCDE step here obtained via a modification of BiGaBP which incorporates a linearization of quantized outputs by means of the Bussgang decomposition. We further emphasize that in addition to these contributions, both of which are novelties compared to the scheme proposed in [37], our proposed design is suitable for low-resolution CF-mMIMO system, as it is easily applicable to complex inference problems in which signals with different resolutions are observed at different intensities. • New Insight on DoF-ADCs Resolution Trade-off: Our receiver design takes maximum advantage of spatial DoF via a Bayesian framework that enables performance improvement in terms of both CE and MUD when processing signals quantized by low-resolution ADCs at APs and forwarded to a CAP. Thanks to the feature, the simulation results here presented reveal an original tradeoff relationship between increased DoF due to spatially distributed APs and reduced signal resolution due to low-resolution ADCs, which contributes new insight on 1 To this end, we take advantage of the fact that the BiGaBP framework relies on the SGA in conformity with CLT, whose underlying assumptions are much softer than the large-system limit assumption on which the BiGAMP algorithm heavily relies. the design of scalable CF-mMIMO system, by indicating that the ratio of the number of antennas concentrated in the CAP to the number of antennas distributed at the APs should be designed depending on ADC quality, in order for system performance to be optimized. Counter-intuitively, it is also shown that the performance of spatial filtering may actually be severely degraded by equipping APs with multiple antennas, although the proposed DQ-aided BiGaBP method proves somewhat robust to the up-scaling of AP antennas, thanks to the JCDE mechanism. In addition to the aforementioned contributions, we highlight that, to the best of our knowledge, there is no previous work in the literature offering a systematic bilinear Bayesian receiver design for scalable CF-mMIMO systems for highly correlated channels and considering realistic constraints such as limited-capacity fronthaul links to the CAP and quantized signal distortion induced by the use of low-resolution ADCs.
Notation: Throughout the article, the following notation is used unless otherwise specified. Vectors and matrices are denoted by lower-and upper-case bold-face letters, respectively. The conjugate, transpose, and conjugate transpose operators are denoted by (·) * , (·) T , and (·) H , respectively. The probability density function (PDF) of the outcome a of a random variable a given the occurrence b of a random variable b is denoted by p a|b (a|b). The expectation of a given random variable is denoted by E[·], while E a|b [·|b] denotes the conditional expectation of the variable a, given the outcome b of the variable b. The real and imaginary parts of a complex quantity are respectively denoted by {·} and {·}. The imaginary unit is denoted by j √ −1. The a × a square identity matrix is denoted by I a , while the (i, j)-th element of a matrix A is denoted by a ij . The diagonal matrix constructed by placing the elements of the vector a on its main diagonal is denoted by diag [a], while diag [A] denotes a diagonal matrix obtained by replacing the off-diagonal elements of A with 0's. The block diagonal matrix created by aligning the matrices A 1 , . . . , A D along the main diagonal is denoted by blkdiag [A 1 · · · A D ]. The real and complex Gaussian distributions with mean a and variance b are respectively denoted by N (a, b) and CN (a, b), while the PDF and the cumulative distribution function (CDF) of the standard normal distribution are denoted by φ( dυ, respectively. The notation a ∼ p a (a) indicates that a variable a obeys p a (a). Finally, the symbol ∝ is used to denote proportionality.

II. SYSTEM MODEL
Consider an uplink CE and MUD problem in a scalable CF-mMIMO system that adopts user-centric DCC [5], where the CPU uses the knowledge of the large-scale fading coefficients of all UEs to cluster a group of APs in order to achieve scalability and mitigate inter-cluster interference. As a consequence of such an architecture, each local cluster consisting of a CAP, its surrounding APs, and the corresponding UEs, as illustrated in Fig. 1, can be considered individually for the purpose of system design, without loss of generality, as it is free of interference from other clusters. In what follows, it shall be assumed that the cluster contains U spatially distributed single-or multiple-antenna APs connected by wired fronthaul links to a common CAP, serving M synchronized single-antenna UEs. We shall also assume that the signal received at the APs is individually quantized using low-resolution ADCs and subsequently forwarded to the CAP through fronthaul links. In contrast, the signal received at the CAP is completely forwarded to the CPU free of distortion.
For future convenience, let U {0, 1, . . . , U} be a set of receiver indices, reserving the index u = 0 to the CAP, such that the other indices u ∈ U with u ≥ 1 correspond to the remaining APs. Each u-th AP is assumed to have N u receive antennas, while the M UEs served by the system, with m ∈ M {1, . . . , M}, are assumed to be single-antenna devices, distributed randomly following a Poisson point process (PPP) with intensity μ UE within the coverage area of the cluster.
Following [5], [19], the mean power of channels between every UE and a receiver (i.e., CAP or AP) will be determined according to the urban micro cell path-loss model of [45], namely δ(d m (u)) = 22.7+36.7 log 10 (d m (u))+26 log 10 (f c ), ( where f c [GHz] denotes the carrier frequency, and d m (u) [m] is the distance between the u-th receiver and the m-th UE, with u ∈ U and m ∈ M.
Using equation (1), and considering that spatial correlation exists among the N u receive (RX) antennas of the u-th AP, the associated channel vector can be expressed as [46] where ρ is the transmit (TX) power assumed identical to all UEs, γ m (u) = 10 −δ(dm(u))/10 is the path-loss of the channel between the m-th UE and the u-th AP, h ∼ CN (0, I Nu ) is a complex standard normal vector corresponding to N u uncorrelated small-scale channels, and the RX fading correlation matrix Θ m (u) is expressed as with a(ω m (u)) 1, e jπ cos(ωm(u)) , . . . , e jπ cos(ωm(u))(Nu−1) denoting the antenna steering vector at the u-th AP, pointing to the angle of arrival ω m (u) corresponding to the m-th UE.
At the k-th discrete time instance, with k ∈ K {1, . . . , K}, the complex baseband signal y k (u) ∈ C Nu×1 at the u-th receiver can be written as the following linear regression model [5] where vector stacking the scalar symbols from all UEs, and z(u) is an additive white Gaussian noise (AWGN) vector at the u-th receiver.
Assuming that the channel matrix H(u) is constant during K successive transmissions, and concatenating K successive RX vectors as given by equation (4) yields the following compact spatial-temporal RX signal representation where We highlight that each row of the TX symbol matrix X corresponds to a given UE, carrying K p pilot symbols with Consequently, the spatial-temporal matrices can be sub-divided into pilot and data parts as follows:

A. Quantization With Low-Resolution ADCs at APs
Let the in-phase and quadrature components of the signal received at each antenna of each AP be quantized separately using an ADC with b-bit resolution according to the sets of 2 b + 1 quantization thresholds The joint operation of the 2N u b-bit ADCs at the u-th receiver can be represented by the quantization function Q Nu u,b (·) : R Nu → L Nu b (u) that maps the RX vector y k (u) ∈ C Nu×1 onto the corresponding quantized output vector r k (u), defined by where the n u -th output is obtained from the corresponding quantization label as [25] {r nuk (u) It will be assumed hereafter that the optimal quantization labels are determined via the Lloyd-Max algorithm [47]. To that end, let the RX symbols from all users at the u-th AP be modeled as a complex Gaussian random variable with mean 0 and variance ψ y (u) approximated by where σ 2 is the noise variance and γ m (u) are path-losses already defined after equation (2). Then, the labels employed by the u-th AP are given by [25] is a set of labels that minimize the mean square error (MSE) between y k (u) and r k (u), while and the coefficients α(u) are computed so as to ensure that E |r nuk (u)| 2 = ψ y (u), namely with Φ(x) denoting the standard normal CDF, as defined at the end of Section I.

B. Linearization Based on Bussgang's Theorem
The correlated distortion introduced by the finite-resolution ADC process makes subsequent stochastic signal processing of quantized signals very challenging. Under the assumption that the quantizer input follows a Gaussian distribution, however, the Bussgang's theorem [22] can be leveraged to circumvent this issue. To that end, let the quantized output r k (u) be modeled [22] by the following linear relationship with the received signal y k (u), where G(u) ∈ C Nu×Nu is the Bussgang's gain, and The uncorrelated nature of d k (u) with respect to y k (u) enables to address the quantization distortion of the ADC output, and the input signal itself, separately. Consequently, the generation of spatial filters and the statistical processing described in Section III can be significantly simplified. For further details, we refer the reader to [23] and [25].
Regarding the RX vector y k (u) ∼ CN (0 Nu×1 , C yy (u)), it follows from equation (11) that with where the n u -th diagonal element of G(u) is given by diag [C yy (u)] = ψ y (u)I Nu , yielding [25] g nu (u) = such that we have G(u) = g(u)I Nu .
Notice that in the ideal case of ADCs with infinite resolution (b = ∞), equation (14) yields g(u) = 1, and consequently G(u) = I Nu , which in turn if substituted into equation (11) results in r k (u) = y k (u), indicating the consistency of the Bussgang's decomposition.

C. Effective Signal Model at a CAP
Thanks to the Bussgang's linearization described by equation (11) and detailed above, the uncompressed signals received directly by the CAP and the low-resolution compressed signals gathered from the surrounding APs can be combined into a single effective RX vector, which can be compactly expressed as (15), shown at the bottom of the next page, where N = U u=0 N u , and we have implicitly defined the vectors r k , y k , d k and z k , and matrices G and H.
In view of equation (15), it will prove convenient to define a set of indices related directly to the entries of the vector where Finally, a complete spatial-temporal effective receive signal model concatenating K consecutive instances of r k can be built by extending equation (15) in similarity to equation (5), yielding The goal of the article can now be stated concisely: design a novel JCDE algorithm that enables the CAP to simultaneously detect the intended symbol matrix X d and accurately estimate the effective channel matrix H, out of the spatial-temporal effective receive signal matrix R given in equation (17), which thanks to the Bussgang linearization approach utilized contains both its own undistorted RX signal matrix Y d (0) and the signals obtained from the APs, that are subjected to distortion due to the coarse quantization performed by low-resolution ADCs. In this section, a MPDQ process based on a DQ-aided BiGaBP mechanism to design Bayesian JCDE receivers for scalable CF-mMIMO architectures is described. As illustrated in Fig. 2, the estimation procedure of the proposed method is divided into two steps: the DQ process and the JCDE procedure. By iterating between these steps alternately, the estimation accuracy of the unknown data matrix X d and effective channel matrix H can be gradually improved, given the knowledge of the effective signal matrix R and pilot symbols X p .
To elaborate, in the DQ step, the RX signal Y Y (0) T , . . . , Y (U ) T T is first recovered and, subsequently, estimates of the signal distortion D are obtained using the knowledge of R, X p , and of the tentative estimates of X d and H obtained in the JCDE step. In turn, in the JCDE step, estimates X d and H are computed employing the BiGaBP approach, using the knowledge of R, X p , and of the tentative estimates of D obtained in the DQ step. We highlight that the bilinear inference in the JCDE step is performed by exchanging beliefs (i.e., likelihood information reflecting estimation reliability) and soft replicas (i.e., tentative estimates) on the tripartite factor graph (FG) consisting of factor nodes (FNs) and two variable nodes (VNs), which correspond to the channel coefficients and TX symbols, respectively.
In what follows, we provide detailed descriptions of the DQ and JCDE steps of the proposed technique briefly explained above, leaving tedious derivations of certain equations to Appendices A and B for the sake of readability. To that end, and for future convenience, let us define the soft replicas of x mk and h nm asx n,mk , ∀n, andĥ k,nm , ∀k, respectively, such that their MSEs can be respectively expressed aŝ where the quantitiesx n,mk , ∀n andh k,nm , ∀k, denote the estimation errors corresponding tox n,mk x mk −x n,mk , ∀n, andh k,nm h nm −ĥ k,nm , ∀k, respectively. For simplicity, in what follows we assume that quadrature phase-shift keying (QPSK) modulation with unit power is employed, where the constellation set is expressed as

A. DQ Step
Let us start with the DQ step, which processes the RX symbols with indices ∀n ∈ N AP , forwarded from the u-th AP via the fronthaul link at the k-th discrete time, outputting therefore r nk , ∀ n ∈ N AP . In the sequel, we refrain from explicitly including the qualifier "∀ n ∈ N AP " after each equation, in order to avoid being repetitive. In addition, in this subsection the real and imaginary parts of a complex number c will be abbreviated as c (0) = {c} and c (1) = {c}, respectively, i.e., c = c (0) + jc (1) .
With that clarified, since soft replicas are not available at the first iteration (t = 1), the initial conditional expectation of y nk , given r nk , is calculated aŝ y m,nk y nk p y nk |r nk (y nk |r nk ) dy nk = y where θ ∈ {0, 1} indicates a binary index that distinguishes between real and imaginary parts, and we highlight that y (0) nk and y (1) nk are independent of each other. Owing to the CLT, valid for the large-system regime (M 1), and making use of the variance approximation given in equation (8), it follows that y . . .
that the conditional PDF of y (θ) nk , given r (θ) nk , appearing in the integral on the last line of equation (19) can be expressed via a truncated Gaussian distribution as where the denominator is introduced to normalize the integral of the conditional PDF to 1.
The relationship between p y (θ) Fig. 3a, where we highlight that the prior information of r nk such that the gray area under the normal distribution corresponds to the probability mass of r (θ) nk , which after normalization yields the truncated Gaussian model given in equation (20) and shown with a dotted line in Fig. 3a.
Utilizing the mean and variance obtained from equation (20) for θ = {0, 1}, the soft replica of y nk and its corresponding MSE can be obtained aŝ where η (s) and ζ (s) are functions yielding, respectively, the mean and variance of a random variable s obeying the truncated Gaussian distribution, which are respectively defined as [48] η (s; μ s , σ s , a, b) ζ (s; μ s , σ s , a, b) with α a−μs σs and β b−μs σs . Owing to the availability of the soft replicas as prior information at the second and subsequent iterations (t = 1), the conditional PDF of y nk can be expressed as For the sake of readability, the detailed derivation of equation (23) is given in Appendix A. Here, suffice it to emphasize that the expression on the right-hand side of equation (23) is simply the truncated Gaussian distribution. In order to determine the original distribution to be truncated in (23), we use the knowledge of the soft replicas to express the RX symbol as equation (24), shown at the bottom of the next page. Under an SGA of the effective noise κ y m,nk , we obtain of the soft replicas and their MSEs obtained in the JCDE step enables the correction of the mean and the variance of the conditional PDF of (25), allowing for more accurate DQ processing. In particular, using equations (23) and (25), the soft replica of y nk and its MSE are expressed aŝ Utilizing the estimates given by the equations (21) and (27), and thanks to the relationship obtained from the Bussgang decomposition given in equation (11), the soft replica of the signal distortion and its MSE can be calculated asd One DQ step is completed when the process described above is carried out for all u ∈ U \ {0}, that is, for all signals except the RX signals that are received by the CAP directly from users, namely, r nk , ∀n ∈ N 0 . In light of that, we set botĥ d m,nk = 0 andψ d m,nk = 0, ∀n ∈ N 0 .

B. JCDE Step
Next, let us describe the JCDE step for estimating x mk , ∀ m, k ∈ K d , and h nm , ∀ n, m. For starters, we emphasize that in the JCDE step, all belief distributions propagated from FNs to VNs are approximated by a scalar Gaussian distribution based on the CLT, so that only information on the mean and variance of the estimated belief distribution is actually required [38], [49].
The iterative process, which is carried out for all RX indices in parallel, starts at the FNs with soft interference cancellation (soft IC) performed on y nk with utilizing all the soft replicas obtained in the previous JCDE iteration, which for future convenience will be compactly denoted by Eĥ ,x ∀m ĥ k,nm ,x n,mk ∀m . At the first iteration (t = 1), the soft replicas are appropriately initialized, e.g., X d = 0. Subsequently, in the detection of an arbitrary TX symbol x mk , the cancellation process is expressed as equation (29), shown at the bottom of the page, where g n is the n-th diagonal element of G.
Here it is worth emphasizing that, in contrast to [37], the cancellation process performed according to equation (29) does not resume to the subtraction of inter-UE interference, but instead incorporates the removal of signal distortion as well. Then, under the SGA of the effective noise κ r m,nk whose variance is given by equation (30), shown at the bottom of the page, the conditional PDF ofr m,nk , given x mk , can be expressed as where the simplicity of equation (30) owes to the uncorrelated nature of the quantization distortion following the Bussgang model of equation (11). The estimation of channel coefficients h nm can be obtained similarly, such that under the SGA, the conditional PDF of where ξ nm ργ m (u) s.t. n ∈ N u . The subsequent VN process can be performed in a manner similar to the JCDE algorithm described in [37], such that further details are omitted and offered only in a summarized form in the pseudo-code given in the next subsection.

C. Algorithm Description
The pseudo-code of the DQ-aided BiGaBP scheme described above is given in Algorithm 1. Besides the effective RX signal matrix R and the pilot matrix X p , the algorithm requires only system parameters such as the quantization thresholds and labels {T b (u)} Notice that although the large-system approximation assumption is relaxed in BiGaBP schemes compared to the conventional BiGAMP, the operating principle of BiGaBP still depends on an accurate SGA of residual components, as can be inferred from Subsections III-A and III-B. In CF-mMIMO systems, adopting single-antenna APs can provide independent fading between distributed antennas; however, the spatial correlation between co-located antennas, mounted on the CAP and multiple-antenna APs, may affect the performance of the proposed algorithm.
In particular, mismatches between the SGA and the stochastic behavior of actual effective noise may result in belief outliers, degrading the accuracy of soft replicas and causing performance deterioration due to error propagation [49], especially during earlier iterations of the algorithms and/or in systems with short pilot sequences [37]. In order to mitigate such potential issues, belief damping [37], [50], which prevents the algorithm from converging to local minima and belief scaling [39], [49], which controls convergence speed, are introduced into Algorithm 1. Specifically, belief scaling is applied in line 15 through the parameter γ s (t) = ( t tmax ) 2 [49], which is designed to be a function of the number of iterations, while belief damping is introduced in lines 11, 12, 14, and 15, as described in [50] and summarized as follows.
Let a quantity z be calculated by a function f z . Then the t-th damped value of z, here denoted z (t) , is computed as the weighted average of z (t−1) and f z , with weights set by the damping factor η d ∈ [0, 1], i.e.

D. Introduction of Local MMSE Filters
Before we move forward to assessing the performance of the proposed DQ-aided BiGaBP algorithm, let us introduce a further modification of the latter aimed at suppressing the harmful effect of spatial correlation between the antennas mounted on the CAP, which can be achieved by introducing a local minimum mean square error (MMSE) filter to whiten the signals y k (0) received by the CAP and exploited in the data estimation process described in Section III-B. In particular, to that end, the following processing is performed by the FNs for the signals with indices ∀n ∈ N 0 .
Then, under a vector Gaussian approximation (VGA) of κ y k,m , the conditional PDF of (35), given x mk , becomes pỹ m,k (0)|x mk (ỹ m,k (0)|x mk ) with the covariance matrix given by N0m . The subsequent VN process is similar to that of the JCDE algorithm of [37], modified by considering (36) under the VGA, such that only the calculations actually required are summarized in Algorithm 2, with details on the derivation of the message update rule offered in Appendix B. By replacing the JCDE step of Algorithm 1, i.e., lines 9 to 15, with Algorithm 2, the pseudo-code of the DQ-aided BiGaBP with local MMSE filters is obtained.

IV. PERFORMANCE ASSESSMENT
Computer simulations were conducted to validate the performance of the proposed Bayesian JCDE receiver under various system conditions. The coverage area was set to a square of side 400 [m], with the CAP located at the center and the U APs distributed over an evenly spaced square mesh, while, at each realization, a random number M of UEs were randomly distributed inside the area following a PPP with intensity μ UE = 10.
Simulated results are offered both in terms of BER and NMSE, averaged over 1000 different random UEs positions and channel realizations. The frame format (i.e., pilot and payload symbol sequences) was set to follow [51], with K p = 14 and K d = 126 such that K = 140. Gray-coded QPSK-modulated signals were used for X d , but since the number of UEs is random at each realization, with μ UE = 10, both over-and under-loaded conditions arise. Therefore, in realizations where K p ≥ M , orthogonal pilot sequences are used, while in cases when K p < M, the codebook of pilot sequences is given by the rows of a matrix constructed from K p random columns of an M × M discrete Fourier transform (DFT) matrix.
The propagation model follows the urban macro cell scenario [52], with the height of the CAP and the APs set In addition, we defined the antenna decentralized ratio (ADR) as the ratio between the total number of antennas allocated to the APs |N AP | = N − N 0 and the number of RX antennas N ; thus, η r (N − N 0 )/N = 1 − N 0 /N . As for algorithmic parameters, the damping factor η d was set to 0.5 and the number of iterations was set constant to t max = 32, although other criteria, based on convergence could also be employed.

A. Remarks on Complexity
Before proceeding to the presentation of simulation results on the proposed DQ-aided JCDE method, let us offer a brief analysis of the computational complexities of the algorithms introduced in Section III. Starting with the MPDQ mechanism embedded in Algorithm 1, notice that the corresponding DQ step is performed only over the |N AP |K RX symbols collected from APs. The associated complexity is therefore governed by equations (26) and (28), which consist of only scalar-by-scalar calculations, whose number of multiplications, divisions, additions, and subtractions is of order O(|N AP |M K). Similarly, the complexity associated with evaluating the mean and variance of the truncated Gaussian distribution in equation (27)   It follows from the above that the complexity of Algorithm 1 is dominated by the JCDE step, which is of order O(N M K) per iteration and therefore similar to those of existing linear/bilinear Bayesian inference schemes found in literature, e.g. [28], [29], [31], [32], [33], [34], and [39].
In turn, the complexity of Algorithm 2 is dominated by the inverse operation of the N 0 × N 0 matrix Ω k in equation (37), which is of order O(N 2 0 M K + N 3 0 K + |N AP |M K) per iteration. Given the much higher complexity compared to Algorithm 1, we emphasize that Algorithm 2 is offered for the sake of completeness and as an possible (but not mandatory) add-on to Algorithm 1, at the choice of the system designer, to improve performance. It shall be shown in the sequel, however, that Algorithm 1 already outperforms the SotA without the inclusion of Algorithm 2 2 .

B. BER Performance
Our first set of results is given in Fig. 4, where the performances in terms of BER as a function of the SNR, with , of the following CF-mMIMO systems adopting single-antenna APs (i.e., N u = 1, ∀u ∈ U \ {0}, and thus U = |N AP | = N − N 0 ) are compared: • Cent. MMSE: Baseline linear MMSE receiver with a fully centralized setup; i.e., all N RX antennas are concentrated at the CAP (η r = 0), where MUD and CE are sequentially performed using the linear MMSE filters. • Cent. BiGaBP: SotA BiGaBP-based JCDE receiver [37] with a fully centralized set up; i.e., η r = 0, providing a reference performance to verify the gain achieved by distributed antenna placement with APs. • BMMSE: SotA linear receiver based on BMMSE [23], [25], where MUD and CE are sequentially performed. Provides a performance reference to two-step CF-mMIMO estimation schemes, due to the near-optimal MUD performance for a given CE accuracy level, achieved due to the large spatial DoF. The results in Fig. 4 clearly demonstrate the efficacy of the antenna dispersion strategy of CF-mMIMO systems compared to centralized architectures, even with coarse quantization at the APs. It is also confirmed, furthermore, that the proposed methods, i.e., DQ-BiGaBP (Alg. 1) and DQ-BiGaBP (Alg. 2), significantly improve over all SotA methods, which suffer from much higher error floors as a result of their poorer CE performance in presence of non-orthogonal pilots caused either by system overloading or quantization errors. In fact, as shown in Figs. 4a and 4b even in the scenario with N = 50, the systems running DQ-BiGaBP (Alg. 1) exhibit error floors due to the lack of spatial DoF. However, it is found that the latter issue is mitigated by the improved DQ-BiGaBP (Alg. 2) scheme, owing to the local MMSE filter which suppresses the harmful effects of strong spatial correlation between antenna elements mounted on the CAP, at the expense of a small performance degradation in the low SNR region caused by noise enhancement when the determinant of the RX covariance matrix is small.
For the results above, it is fair to infer that a simple but effective way to remedy the issue of spatial correlation in CF-mMIMO systems is to boost effective DoF by increasing the numbers of RX antennas and/or the APs. In order to verify that hypothesis, we study in Figs. 4c and 4d the performance of CF-mMIMO systems with N = 100. In that case, it is found that indeed, the lower-complexity DQ-BiGaBP (Alg. 1) scheme not only already exhibits an error floor-free water-fall BER curve, approaching that of the ideal Genie-aided JCDE reference, but also outperforms DQ-BiGaBP (Alg. 2) in the low SNR region, which is affected by the noise enhancement. From these results, one can conclude that: a) systems with a large number of power-saving and low-cost APs running the low-complexity DQ-BiGaBP (Alg. 1) is generally the setup of choice; but b) if large-scale systems are not feasible due e.g. to deployment limitations, then the DQ-BiGaBP (Alg. 2) with a local MMSE filter can be used to compensate for the lack of spatial DoF. Similar discussion can be made for other system configurations except for extremely small or large η r . The impact of η r on system performance, including the extreme cases where η r is close to either 0 or 1, will be studied in Subsection IV-D.

C. MSE Performance
Next, we evaluate the CE performance of the proposed lowcomplexity method, DQ-BiGaBP (Alg. 1), in terms of the NMSE of the channel estimates. In view of Fig. 4, results for the Cent. MMSE and Cent. BiGaBP methods are omitted as these SotA alternatives have shown to perform poorly, while results for DQ-BiGaBP (Alg. 2) are omitted since the Genieaided JCDE method has already provided a lower-bounding reference. With that clarified, the NMSE performances of the remaining three algorithms with the same settings as in Fig. 4 are compared in Fig. 5. To gain deeper insight, the NMSE performance of the estimated channels between APs and UEs, "(APs-UEs)," and between CAPs and UEs, "(CAP-UEs)," were evaluated separately and plotted for each of the methods considered.
As expected, the performances of "(APs-UEs)" are found to be much worse than those of "(CAP-UEs)" owing to the coarse quantization at the APs. In particular, in the high SNR regime, error floors due to uncertainty caused by quantization noise are unavoidable, and their levels can only be reduced by increasing the number of quantization bits. In contrast, the proposed JCDE mechanism improves SNR, achieving lower NMSEs by exploiting the quasi-orthogonality of the data structure, 3 thus significantly outperforming the BMMSE-based scheme and approaching the Genie-aided JCDE method.
In particular, it is found that the gain in SNR achieved by the proposed method increases with the number b of quantization bits, reaching approximately 10 dB for NMSE = 10 −2 in the case of "(CAP-UEs)", i.e., b → ∞. In addition, since in the Genie-aided JCDE scheme CE is performed under perfect knowledge of the TX symbols X [X p , X d ], the asymptotic proximity of the proposed method to that reference indicates that no performance improvement over the proposed DQ-BiGaBP scheme is to be expected from an increase of the pilot sequence length beyond K p = 14 (≤ K = 140). In addition, the importance of having a sufficient number of antennas at the CAP is evidenced by the large gap in the performance of "(APs-UEs)."

D. Insights on Optimal Configuration of CF-mMIMO Systems
To gain insight on the optimal setup of CF-mMIMO systems in terms of antenna allocation among the CAP and APs, we compare in Fig. 6 the BER performances of systems with a total number of RX antennas set to N ∈ {50, 100}, a TX power of 20 [dBm], and different numbers of quantization bits b ∈ {1, 2, 3}, as a function of the ADR η r , which indicates a more centralized (cellular-like) system for η r → 0, or a more distributed system for larger η r , respectively. Again, and as expected, the performance of the proposed DQ-BiGaBP method is found to with growing η r and N , approaching the Genie-aided performance as η r and N increases, due to the reduction of correlation between RX antennas and increase of the DoF that follows from larger b and N .
this is referred to as "quasi-orthogonality of the data structure." In turn, the performance of all methods, including the ideal Genie-aided, degrades with decreasing b, which emphasizes the crucial role of the double-precision observations at the CAP in mitigating the distortion of the severe information compression carried out at the low-cost APs, in enabling effective information recovery via JCDE. This is also indicated by the poor NMSE performance of "(APs-UEs)" shown in Fig. 5. In other words, it is found that the CF-mMIMO systems should not have all the antennas distributed, and that incorporating both sufficient spatial DoF obtained by the distributed APs, and resolution compensation by doubleprecision observations at the CAP, is what enables the scaling CF-mMIMO systems, by allowing both low-complexity and high-accuracy estimation performance to be achieved simultaneously.

E. Multiple-Antenna APs
Finally, we study the effect of increasing the number of antennas at APs. In particular, we compare in Fig. 7 the BER performances of CF-mMIMO systems adopting multipleantenna APs against those of systems with single-antenna APs having the same (or approximately the same) ADR.
Quite counter-intuitively, it is found that the systems with single-antenna APs outperform those with four-antenna (Fig. 7a) and five-antenna (Fig. 7b) APs, respectively. These results can be understood based on those of the preceding Figs. 4 and 5, which indicate that the spatial correlation that occurs at each multiple-antenna AP contributes to reducing effective spatial DoF that causes the observed BER deterioration compared to single-antenna AP systems.
It is also found that the BMMSE-based method suffers from higher error floors in the multiple-antenna case, which can be credited to noise enhancements induced by the required inversion of ill-conditioned (i.e., nearly underdetermined) channel matrices, due to spatial correlation among fading coefficients. In contrast, the proposed method approaches the Genie-aided JCDE performance and achieves BER = 10 −5 , because it exploits the spatial fading independence between the APs in the iterative process to enhance the reliability of the soft IC, thereby enabling the transformation of the problem into an overdetermined condition by digging out the hidden DoF (which is the very aim of the JCDE mechanism). All in all, this result suggests that the Bayesian JCDE receiver is robust to changes in the configuration of the CF-mMIMO system, which is always assumed to enjoy macro-diversity.

V. CONCLUSION
We proposed a novel JCDE receiver based on Bayesian bilinear inference for uplink scalable CF-mMIMO systems adopting power-saving and low-cost APs with low-resolution ADCs. To consider the CE and MUD at a CAP, first, the effective RX signal model observed at the CAP in the form of tractable spatial-temporal linear representation was reformulated using a linearization based on the Bussgang's theorem. Based on the above signal model, a novel MPDQ algorithm was then proposed considering the BiGaBP framework under consistent assumptions and approximation accuracy. The proposed algorithm possesses linear computational complexity with respect to the number of RX antennas, the number of UEs, and the frame length of transmitted signal, respectively. Finally, computer simulations were conducted to demonstrate the validity of our proposed method in terms of BER and NMSE performances in various CF-mMIMO system configurations. The results revealed the effectiveness of the CF-mMIMO system over the conventional centralized system. Furthermore, comparative study with the SotA alternatives and the case studies covering various scenarios provided an insight into the optimal system design and the robustness to changes in the system configuration.

APPENDIX A DERIVATION OF EQUATION (23)
Straightforwardly, using Bayes' rule and the product rule, equation (23)