Bilinear Expectation Propagation for Distributed Semi-Blind Joint Channel Estimation and Data Detection in Cell-Free Massive MIMO

We consider a cell-free massive multiple-input multiple-output (CF-MaMIMO) communication system in the uplink transmission and propose a novel algorithm for blind or semi-blind joint channel estimation and data detection (JCD). We formulate the problem in the framework of bilinear inference and develop a solution based on the expectation propagation (EP) method for both channel estimation and data detection. We propose a new approximation of the joint a posteriori distribution of the channel and data whose representation as a factor graph enables the application of the EP approach using the message-passing technique, local low-complexity computations at the nodes, and an effective modeling of channel-data interplay. The derived algorithm, called bilinear-EP JCD, allows for a distributed implementation among access points (APs) and the central processing unit (CPU) and has polynomial complexity. Our simulation results show that it outperforms other EP-based state-of-the-art polynomial time algorithms.


I. INTRODUCTION
Cell-free massive multiple-input multiple-output (CF-MaMIMO) networks enable primary goals for sixth generation (6G) wireless communication systems, such as ubiquitous coverage with uniform quality of service (QoS) and ultra-high-rate, energy-efficient data transmission [1]-[4]. In CF-MaMIMO systems, a large number of geographically distributed APs jointly serve a much smaller number of user equipments (UEs). The joint processing is coordinated by one or multiple CPUs, which are connected to the APs via fronthaul links. The geographically distributed nature of CF-MaMIMO networks enhances the attractive properties of centralized MaMIMO systems by reducing the average minimum distance between APs and UEs. This allows CF-MaMIMO networks to provide uniformly high data rates over the coverage area and high energy efficiency. However, in contrast to centralized MaMIMO, channel hardening and favorable propagation typically do not hold [5]-[8]. Thus, the low-complexity matched filter, which provides near-optimal detection performance in MaMIMO systems [9], is not effective for CF-MaMIMO systems, and optimal joint signal processing at the centralized CPU is not computationally affordable. Similarly, existing pilot decontamination solutions for centralized MaMIMO [10]-[14] are not effective. Thus, the quest for low-complexity detection techniques with performance approaching that of centralized joint optimal processing remains an open challenge, as does the search for effective pilot decontamination methods.
Distributed processing at the APs can efficiently reduce the computational complexity at the CPU, see, e.g., the extensive analyses in [15], [16] and [17], whereas semi-blind channel estimation has been shown to effectively combat pilot contamination [12]-[14]. Therefore, in this work, we propose a distributed semi-blind JCD algorithm based on EP which exhibits low complexity.
EP is an approximate inference technique which iteratively finds a tractable approximation of factorized probability distributions by projecting each factor onto an exponential family [18], [19]. EP has already been applied to data detection in previous works. In [20], the authors proposed a low-complexity EP-based MIMO detector while assuming perfect channel state information (CSI) and a Gaussian approximation for the posterior distribution of the data symbols. In [21], the authors extended the EP-detection algorithm proposed in [20] to imperfect CSI by embedding the channel estimation errors in the EP formulation.
The EP method was also applied to blind channel estimation and noncoherent detection. In [22], the authors presented a blind channel estimation algorithm for multi-cell MaMIMO systems. In [23], a noncoherent multi-user detection scheme was proposed for the single-input multiple-output (SIMO) multiple access channel (MAC). In both schemes, the approximated channel and symbol distributions were chosen to be multivariate Gaussian and multinoulli distributions, respectively. In [22], the proposed approximate joint posterior distribution of channels and transmitted symbols resulted in two decoupled EP-based schemes for channel estimation and data detection but, because of the high complexity of the EP-based symbol detection, only the EP-based channel estimation was retained for practical system implementations, followed by conventional linear symbol detectors. The decoupling between channel estimation and symbol detection implied that the EP-based channel estimation could only exploit prior statistical knowledge on the transmitted symbols but not their deterministic imperfect knowledge. In [23], the proposed approximate posterior distribution factorization yielded a factor tree with a branch per user. The detection scheme was derived by applying the message-passing rules for EP on this tree. The resulting algorithm could be applied to both pilot-assisted and pilot-free communications. However, the complexity of the proposed algorithm was exponential in the product of the number of channel uses and the logarithm of the symbol constellation set cardinality and, hence, was not affordable for practical high-data-rate systems.
The EP framework was also used to develop decentralized detection schemes for MaMIMO systems in [17], [24]-[26]. In these works, the computational complexity was reduced compared to centralized schemes by processing the signals received by antenna sub-arrays locally via EP message passing and then combining the messages from the sub-arrays at the CPU. The posterior data symbol distributions were approximated by multivariate Gaussians and perfect CSI knowledge was assumed. In [17], the authors reduced the message sizes by utilizing averaging. In [24], the decentralized MaMIMO receiver embedded both detection and decoding, and the extrinsic information from the decoder was utilized as a priori information for the EP-based detector. In [25], the authors introduced a pre-processing based on QR-factorization and a variance compensation scheme in the decentralized detector of [24]. In [26], two decentralized EP detection approaches were proposed, the first based on user grouping and, thus, group-wise joint detection, and the second on a daisy-chain architecture.
EP-based receivers were also applied in CF-MaMIMO systems. The authors of [27] utilized a centralized EP-based detector with Gaussian data approximations which incorporates channel estimation errors as in [21]. The channel estimation error accounted for pilot contamination and general estimation errors due to noise. In [28], the authors proposed a distributed EP detector for CF-MaMIMO based on the decentralized subarray-based detector in [17]. This approach was extended to an iterative channel estimation and data detection (ICD) scheme in [29]. There, the data detection was based on EP and the iterative algorithm took into account the channel estimation errors. The channel estimation was based on minimum mean squared error (MMSE) estimation with detected data symbols used as additional pilots.
In this work, we propose a new distributed algorithm for JCD in CF-MaMIMO systems. Our contributions can be summarized as follows:
• We develop a novel message-passing algorithm for a bilinear inference problem arising in JCD. The inference is based on the approximate EP method and assumes general multivariate Gaussian and multinoulli distributions for the posterior distributions of the AP channels and data symbols, respectively, as in [23]. In contrast to the approximate posterior distributions of [22], [23], the factorization of the proposed approximate joint posterior distribution for channels and data symbols allows for an alternating refinement of channel and data estimates and a distributed implementation of the algorithm with local processing at the APs. Furthermore, the inclusion of single-user soft-input soft-output (SISO) decoders is straightforward. Finally, the algorithm can automatically take into account the effect of pilot contamination.
• We show that the proposed bilinear-EP JCD algorithm has polynomial complexity in the system parameters and bridges the complexity gap between algorithms based on EP that approximate the posterior distribution of data with a Gaussian distribution, e.g., [17], [20], [22], [24]-[30], and those that assume more precise categorical distributions [23].
• We consider four baseline schemes, namely a centralized linear MMSE detector, the detector in [28] assuming perfect CSI, the ICD algorithm in [29], and a modified version of the proposed JCD algorithm with perfect channel knowledge which provides a lower bound to the symbol error rate (SER). Our simulation results show the superior performance of our approach compared to the first three baseline algorithms.
It is worth noting that the appealing features of the proposed algorithm stem from the choice of the factorization of the approximate joint posterior distribution and the way the interplay between the two sets of variables, i.e., channel coefficients and data symbols, is taken into account. This method can be applied to other bilinear inference problems such as gain calibration, matrix factorization, and compressed matrix sensing [31].
This paper is organized as follows. In Section II, we introduce the CF-MaMIMO system model. In Section III, we review the EP algorithm and its application on graphs. Based on this theoretical framework, we develop the bilinear-EP JCD algorithm in Section IV. Then, in Section V, we present numerical results which show the superior performance of the proposed algorithm. Finally, some conclusions are drawn.
Notation: Lower case, bold lower case, and bold upper case letters, e.g., x, x, X, represent scalars, vectors, and matrices, respectively. I_N is the identity matrix of dimension N. {x_{l,k,t}} denotes the collection of all variables x_{l,k,t} obtained by varying the possible indices. A \ B is the set difference between two sets A and B. δ(·) is the Dirac delta function. The indicator function 1_{x∈S} takes value 1 if the condition in the subscript is satisfied, e.g., element x is in the set S, and zero otherwise. The operator vec(X) maps the matrix X onto the vector obtained by stacking the columns of X on top of one another. ⊗ is the Kronecker product between two matrices. E_{p(x)}{·} stands for the expectation operator with respect to the probability distribution p(x) and Ê{·} stands for the empirical expectation operator. We denote by π(·) the categorical distribution and by N(x; μ, C) the circularly symmetric Gaussian distribution of a complex-valued vector x with mean vector μ and covariance matrix C. The notation x ∼ p indicates that the random variable x follows the distribution p.

II. SYSTEM MODEL
We consider a communication system in the uplink transmission that consists of L APs and K single-antenna UEs. Each AP is equipped with N co-located antennas. Each user k sends a signal vector x̄_k ∈ C^{P+T} consisting of a P-dimensional pilot vector x_{p;k} ∈ C^P and a T-dimensional data vector x_k ∈ S^T, where S is the employed constellation set. We combine the pilot and data vectors of all users in the signal matrix X̄ = [x̄_1, x̄_2, ..., x̄_K]^T ∈ C^{K×(P+T)}. The matrix X̄ can also be expressed as X̄ = [X_p, X], where X_p and X are the matrices consisting of the pilot and data vectors of all users, respectively. The equivalent complex baseband received signal at AP l ∈ {1, ..., L} is given by

Ȳ_l = H_l X̄ + N_l,  (1)

where Ȳ_l ∈ C^{N×(P+T)} denotes the matrix whose element ȳ_{l,i,j} is the received signal at antenna i of AP l during the j-th channel use. Here, Y_{p;l} and Y_l are the components of the matrix Ȳ_l corresponding to the pilot matrix X_p and the data matrix X, respectively. H_l ∈ C^{N×K} is the block Rayleigh fading channel matrix whose element h_{l,n,k} is the coefficient of the channel between user k and antenna n at AP l, which is assumed to be constant during P+T consecutive channel uses, and N_l ∈ C^{N×(P+T)} is the matrix whose element n_{l,i,j} is the additive white Gaussian noise (AWGN) at antenna i of AP l during the j-th channel use. Both channel and additive noise are zero-mean complex Gaussian vectors with covariance matrices Ξ_l ∈ C^{NK×NK} and σ² I_{N(P+T)}, respectively, i.e., vec(H_l) ∼ N(0, Ξ_l) and vec(N_l) ∼ N(0, σ² I_{N(P+T)}). Furthermore, we assume that the channels of different users are uncorrelated. Thus, the covariance matrix Ξ_l is a block-diagonal matrix with K blocks of size N×N.
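As an illustration, the block-fading uplink model in (1) can be sketched numerically as follows. This is a minimal sketch assuming i.i.d. Rayleigh fading (i.e., Ξ_l equal to the identity) and 4-QAM data; all dimensions and the noise variance are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L_APS, N, K, P, T = 4, 2, 3, 3, 10           # illustrative dimensions
sigma2 = 0.01                                # illustrative noise variance
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

X_p = np.eye(K, P)                           # orthogonal pilots (P = K here)
X_d = qam4[rng.integers(0, 4, size=(K, T))]  # data symbols from S^T
X_bar = np.concatenate([X_p, X_d], axis=1)   # K x (P+T) combined signal matrix

# Received signal at each AP l: Y_l = H_l @ X_bar + N_l
H, Y = [], []
for l in range(L_APS):
    H_l = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    N_l = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, P + T))
                                 + 1j * rng.standard_normal((N, P + T)))
    H.append(H_l)
    Y.append(H_l @ X_bar + N_l)

print(Y[0].shape)  # (2, 13), i.e., N x (P+T)
```

The first P columns of each Y[l] correspond to Y_{p;l} and are used for pilot-based channel estimation, while the remaining T columns carry the data.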

III. EXPECTATION PROPAGATION ON GRAPHS
In the following, we associate a factor graph to the factorized true distribution and describe a message-passing scheme that results from the local computation of the approximate factors at the factor graph nodes. More details on the derivation of the message-passing update rules can be found in [23], [32], [33].
We aim at computing an approximation of a joint probability distribution which is assumed to be factorizable as

p(x) ∝ ∏_α Ψ_α(x_α),  (2)

where x_α is a sub-vector of x, i.e., x_α ⊆ x, and ∪_α x_α = x. The approximation p̂(x) of the joint distribution reflects the same factorized form,

p̂(x) = (1/Z) ∏_α p̂_α(x_α),  (3)

where Z is a normalization constant and p̂ is constrained to lie within an exponential family F. Furthermore, we assume that the approximation can be fully factorized as

p̂(x) = (1/Z) ∏_β p̂_β(x_β),  (4)

where the sub-vectors x_β contain elements which always occur together in a factor. The sub-vectors x_β are non-overlapping, i.e., x_β ∩ x_β′ = ∅ ∀β ≠ β′, and satisfy ∪_β x_β = x. Therefore, the sub-vectors x_β define a partition of x. Additionally, for all sub-vectors x_β and x_α, the sub-vector x_β either lies fully within x_α, i.e., x_β ⊆ x_α, or they are completely disjoint, i.e., x_β ∩ x_α = ∅. The factorization described above induces a representation of the joint distribution by a factor graph with factor nodes associated to the functions Ψ_α and variable nodes associated to the sub-vectors x_β. An edge exists between factor node Ψ_α and variable node x_β if x_β ⊆ x_α. We define the neighbor set N_α of factor node Ψ_α as the set of indices β of all variable nodes x_β that are connected to Ψ_α, i.e., N_α = {β | x_β ⊆ x_α}. Similarly, we define the neighbor set N_β of variable node x_β as the set of indices α of all factor nodes Ψ_α that are connected to x_β, i.e., N_β = {α | x_β ⊆ x_α}. Since the approximation p̂ lies within the exponential family F, the approximate factors p̂_α and p̂_β also belong to F.
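The neighbor sets N_α and N_β can be built mechanically from the subset relation x_β ⊆ x_α. A minimal sketch, where factor scopes and variable blocks are represented as Python frozensets of variable names; the small graph shown is hypothetical, not the JCD factor graph of Section IV.

```python
# Factor scopes x_alpha and variable blocks x_beta as sets of variable names.
factors = {"Psi_a": frozenset({"h", "x", "z"}), "Psi_b": frozenset({"z", "y"})}
blocks = {"b1": frozenset({"h"}), "b2": frozenset({"x"}),
          "b3": frozenset({"z"}), "b4": frozenset({"y"})}

# N_alpha = {beta | x_beta subset of x_alpha}; each pair defines a graph edge.
N_alpha = {a: {b for b, xb in blocks.items() if xb <= xa}
           for a, xa in factors.items()}
# N_beta = {alpha | x_beta subset of x_alpha}
N_beta = {b: {a for a, xa in factors.items() if xb <= xa}
          for b, xb in blocks.items()}

print(sorted(N_alpha["Psi_a"]))  # ['b1', 'b2', 'b3']
print(sorted(N_beta["b3"]))      # ['Psi_a', 'Psi_b']
```

Note that block "b3" has two factor neighbors, so messages about it flow through both factors, which is exactly what makes the message exchange of EP non-trivial.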
The assumption of the fully-factorized approximation in (4) yields the following factorizations for the approximate factors:

p̂_α(x_α) = ∏_{β∈N_α} m_{Ψ_α;x_β}(x_β),  (5)

p̂_β(x_β) = (1/Z_β) ∏_{α∈N_β} m_{Ψ_α;x_β}(x_β),  (6)

where Z_β is a normalization constant and m_{Ψ_α;x_β}(x_β) is interpreted as a message from the factor node Ψ_α to the variable node x_β, also belonging to the specified exponential family. Thus, we can update the approximate factors, and therefore also the approximate joint distribution p̂(x), by exchanging and updating messages on a factor graph. The EP algorithm is an iterative algorithm; hence, we denote the messages during iteration i as m^{(i)}. At first, we initialize all the factor-to-variable messages m^{(0)}_{Ψ_α;x_β}(x_β). For the initialization, prior statistical knowledge or, if not available, uninformative distributions can be used. A particular example is discussed in Section IV. Then, the variable-to-factor messages and the factor-to-variable messages are updated, respectively, as

m^{(i)}_{x_β;Ψ_α}(x_β) ∝ ∏_{α′∈N_β\{α}} m^{(i−1)}_{Ψ_α′;x_β}(x_β),  (7)

m^{(i)}_{Ψ_α;x_β}(x_β) ∝ proj{q^{(i)}_{Ψ_α;x_β}(x_β)} / m^{(i)}_{x_β;Ψ_α}(x_β),  (8)

where the distribution q^{(i)}_{Ψ_α;x_β} in (8) is given by

q^{(i)}_{Ψ_α;x_β}(x_β) = (1/Z̃_α) ∫ Ψ_α(x_α) ∏_{β′∈N_α} m^{(i)}_{x_β′;Ψ_α}(x_β′) dx_α\x_β.  (9)

Here, Z̃_α is a normalization constant and proj{·} denotes the projection operator defined as

proj{f(x)} = arg min_{g∈F} D_KL(f(x)‖g(x)),

where D_KL(f‖g) is the Kullback-Leibler (KL) divergence between f and g and F is the exponential family. The message-passing update rules (7) and (8) are illustrated in Fig. 1.
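For Gaussian messages, the product in (7) and the division in (8) reduce to addition and subtraction of natural parameters (precision Λ and precision-adjusted mean γ = Λμ). A minimal scalar sketch of these two operations under this parameterization; the function names and message values are illustrative, not part of the algorithm's notation.

```python
# A scalar Gaussian message in natural parameters: (precision, precision * mean).
def msg_product(msgs):
    """Rule (7)-style product of Gaussian messages = sum of natural parameters."""
    lam = sum(m[0] for m in msgs)
    gam = sum(m[1] for m in msgs)
    return (lam, gam)

def msg_divide(num, den):
    """Division step in rule (8): subtract natural parameters."""
    return (num[0] - den[0], num[1] - den[1])

# Two incoming factor-to-variable messages, both with precision 1,
# with means 1.0 and 3.0 (so gamma = precision * mean).
m1, m2 = (1.0, 1.0), (1.0, 3.0)
cavity = msg_product([m1, m2])   # variable-to-factor message
lam, gam = cavity
print(gam / lam)                 # combined mean: 2.0

# Dividing m1 back out recovers m2 exactly.
print(msg_divide(cavity, m1))    # (1.0, 3.0)
```

This additivity is why the implementation in Section IV parameterizes all Gaussian messages by (Λ, γ) rather than by (μ, C).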

IV. BILINEAR-EP JCD ALGORITHM

A. Factorization and Messages
In the following, we derive an EP-based algorithm for semi-blind JCD which we call bilinear-EP JCD. In this section, we focus on the data signals since the received pilot signals are utilized to characterize the a priori channel distribution. Inspired by [23], we propose a novel factorization of the joint posterior distribution of data and channel. We decompose (1) into T vector relations, one for each channel use t, as

y_lt = ∑_{k=1}^{K} h_lk x_kt + n_lt,  (10)

where h_lk is the k-th column of H_l, y_lt and n_lt are the t-th columns of Y_l and N_l, respectively, and x_kt is the element in row k and column t of X. Furthermore, we define the auxiliary variables

z_lkt := x_kt h_lk,  (11)

so that (10) can be rewritten as

y_lt = ∑_{k=1}^{K} z_lkt + n_lt.  (12)

We can factorize the posterior density function using Bayes' theorem as

p({h_lk}, {x_kt} | {y_lt}) ∝ p({y_lt} | {h_lk}, {x_kt}) p({h_lk}) p({x_kt}) = p({y_lt} | {z_lkt}) p({h_lk}) p({x_kt}),  (13)

where the last equality follows from (11) and (12). We further assume that the a priori channel distributions are independent across users k and APs l. Then, exploiting the independence of the noise across time t and APs l, the independence of the data symbols {x_kt}, and (12), we can factorize (13) as

p({h_lk}, {x_kt} | {y_lt}) ∝ ∏_{l,t} p(y_lt | z_l1t, ..., z_lKt) ∏_{l,k,t} p(z_lkt | h_lk, x_kt) ∏_{l,k} p(h_lk) ∏_{k,t} p(x_kt).  (14)

For the probability distributions in (14) we adopt the compact notation Ψ_0,lt := p(y_lt | z_l1t, ..., z_lKt), Ψ_1,lkt := p(z_lkt | h_lk, x_kt), Ψ_2,lk := p(h_lk), and Ψ_3,kt := p(x_kt). The factor graph induced by the factorization in (14) is illustrated in Fig. 2 and contains cycles. According to the system model in (1), the true factors are given by

Ψ_0,lt = N(y_lt; ∑_{k=1}^{K} z_lkt, σ² I_N),  Ψ_1,lkt = δ(z_lkt − x_kt h_lk),

where we assume uniform prior distributions p(x_kt) for the data symbols, and Ψ_2,lk is given by the pilot-based Bayesian MMSE channel estimate of the channel H_l as

Ψ_2,lk = N(h_lk; μ_lk;MMSE, C_lk;MMSE),

with μ_lk;MMSE and C_lk;MMSE as derived in the Appendix. Please note that the dimensions of the parameters of all factor-to-variable messages are given by the type of distribution and the dimensions of the variable represented. The variable-to-factor messages are just multiplications of factor-to-variable messages and are used to compute the updates of the factor-to-variable messages. Due to space limitations, we present the derivation only for some of the messages in the Appendix. The other messages, which are not explicitly derived, can be obtained by applying the rules presented in Section III.

B. Initialization and Scheduling
All messages, with the exception of the constant messages representing a priori knowledge, are initialized as uninformative distributions, i.e., for Gaussian messages the diagonal entries of the covariance matrices are set to infinity and uniform probability mass functions (PMFs) are utilized for categorical messages. In our implementation, all Gaussian messages are parameterized by the precision matrix Λ = C^{−1} and the precision-adjusted mean vector γ = Λμ. Uninformative Gaussian messages are thus initialized by all-zero precision matrices.
The scheduling is given in Algorithm 1. The first message to be computed in each iteration is m_{Ψ_1,lkt;z_lkt}. In the first iteration, it is computed based on the CSI from the Bayesian MMSE estimation together with the constant data prior to generate a first estimate of z_lkt. During this step, the initially uninformative message m_{Ψ_1,lkt;z_lkt} is updated to contain information. Note that the messages m_{Ψ_2,lk;h_lk} ≡ Ψ_2,lk and m_{Ψ_3,kt;x_kt} ≡ Ψ_3,kt represent the Bayesian MMSE channel estimates and the a priori distributions on the data, respectively, and are not modified by the message-passing algorithm. Hence, they are not included in the scheduling.
For our simulations, we used parallel scheduling, i.e., the updates of the messages in the innermost for-loops are performed independently of each other, which is suitable for efficient implementations.

C. Update Rules
In this section, we provide expressions to update the messages that appear in Algorithm 1. According to (8), the inputs to update a factor-to-variable message are always variable-to-factor messages. Each variable-to-factor message is computed by applying (7). As an example, we derive below the variable-to-factor message forwarded from variable node h_lk to factor node Ψ_1,lkt.
Since m_{h_lk;Ψ_1,lkt} is a Gaussian distribution, its computation reduces to determining its covariance and mean. By applying the Gaussian multiplication lemma [23], we obtain the covariance and mean given in (15) and (16). All other variable-to-factor messages can be computed analogously. Their derivation is omitted due to space constraints.
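The Gaussian multiplication lemma invoked here states that a product of Gaussian densities is, up to normalization, again Gaussian, with precision equal to the sum of the individual precisions and precision-adjusted mean equal to the sum of the individual ones. A small numerical sketch of this identity; the 2x2 matrices are illustrative, not messages from the algorithm.

```python
import numpy as np

def gaussian_product(params):
    """Multiply Gaussian factors given as (mean, covariance) pairs.

    Returns the (mean, covariance) of the product density, obtained by
    summing precision matrices and precision-adjusted means.
    """
    lam = sum(np.linalg.inv(c) for _, c in params)
    gam = sum(np.linalg.inv(c) @ m for m, c in params)
    cov = np.linalg.inv(lam)
    return cov @ gam, cov

m1, c1 = np.array([1.0, 0.0]), np.eye(2)
m2, c2 = np.array([3.0, 2.0]), np.eye(2)
mean, cov = gaussian_product([(m1, c1), (m2, c2)])
print(mean)  # equal-covariance case: midpoint of the two means, [2., 1.]
print(cov)   # halved covariance, 0.5 * I
```

In the equal-covariance case shown, the product mean is the midpoint of the two means and the covariance is halved, which matches the intuition that combining two independent observations doubles the precision.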
In the following, we provide the rules to update the factor-to-variable messages based on the incoming variable-to-factor messages.
Update of m_{Ψ_1,lkt;z_lkt} = N(z_lkt; μ_{Ψ_1,lkt;z_lkt}, C_{Ψ_1,lkt;z_lkt}). The covariance and mean are given in (17) and (18), with ω(x_kt) defined in (30).
Update of m_{Ψ_0,lt;z_lkt} = N(z_lkt; μ_{Ψ_0,lt;z_lkt}, C_{Ψ_0,lt;z_lkt}). The covariance and mean are given by the corresponding update equations.
Update of m_{Ψ_1,lkt;h_lk}. The covariance and mean are computed analogously; the normalized factor ω̄(x_kt) is defined in (21).
Update of m_{Ψ_1,lkt;x_kt} = π_{1,lkt}(x_kt). The probabilities of the categorical distribution are given in (24).

D. Inference
The desired estimates can be obtained by computing the distributions p̂_{x_kt} and p̂_{h_lk}. The data symbol estimates are given by x̂_kt = arg max_{x_kt∈S} p̂_{x_kt}(x_kt), while the parameters of the Gaussian distributions p̂_{h_lk}, and thus the channel estimates ĥ_lk, are obtained by applying the Gaussian multiplication lemma [23]. Interestingly, the availability of the posterior distributions allows us to determine the reliability of the estimates x̂_kt and ĥ_lk.
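A minimal sketch of the inference step for a single data symbol: given a categorical posterior over the constellation, the MAP estimate is the argmax, and the probability of the chosen symbol serves as a reliability measure. The posterior values below are illustrative, not outputs of the algorithm.

```python
import numpy as np

constellation = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # 4-QAM
posterior = np.array([0.05, 0.80, 0.10, 0.05])  # categorical posterior p_hat

idx = int(np.argmax(posterior))
x_hat = constellation[idx]      # MAP symbol estimate
reliability = posterior[idx]    # confidence in the decision

print(idx, reliability)         # 1 0.8
```

A low reliability value flags symbols whose decisions are close calls, which is the kind of soft information a downstream SISO decoder could exploit.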

E. Algorithm Stability
Each parameter of a distribution is iteratively computed by a soft update [20], i.e., given the old parameters θ^{(i−1)}, the new parameters θ^{(i)} are computed according to the message-passing rules and then updated as

θ^{(i)} ← η θ^{(i)} + (1 − η) θ^{(i−1)},

where η ∈ [0, 1] is the soft update parameter. Additionally, the computation of some messages, e.g., (17), may lead to precision matrices with negative eigenvalues. Inspired by [23], we set the negative eigenvalues to zero to ensure the positive semi-definiteness required for Gaussian distributions.
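The two stabilization devices above, damped (soft) updates and clipping of negative precision eigenvalues, can be sketched as follows. The parameter values and the indefinite example matrix are illustrative.

```python
import numpy as np

def soft_update(theta_new, theta_old, eta=0.7):
    """Damped update: convex combination of new and previous parameters."""
    return eta * theta_new + (1.0 - eta) * theta_old

def clip_to_psd(mat):
    """Set negative eigenvalues of a Hermitian matrix to zero."""
    w, v = np.linalg.eigh(mat)
    return (v * np.maximum(w, 0.0)) @ v.conj().T

theta = soft_update(np.array([2.0]), np.array([1.0]), eta=0.7)
print(theta)  # [1.7]

bad = np.array([[1.0, 0.0], [0.0, -0.5]])    # indefinite "precision" matrix
fixed = clip_to_psd(bad)
print(np.linalg.eigvalsh(fixed).min() >= 0)  # True: now positive semi-definite
```

Eigenvalue clipping is the orthogonal projection onto the PSD cone in the Frobenius norm, so it perturbs the matrix as little as possible while restoring a valid Gaussian parameterization.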

F. Analysis of Fronthaul Load
In this section we discuss a distributed implementation of the bilinear-EP JCD algorithm with special attention to the messages exchanged through the fronthaul. As apparent from (26) in the Appendix, AP l requires the messages m_{Ψ_1,l′kt;x_kt} from the other APs l′ ≠ l. The data prior m_{Ψ_3,kt;x_kt} ≡ Ψ_3,kt is assumed to be uninformative and is thus not needed. The messages are exchanged via the CPU, which computes, based on the incoming messages from all APs, the aggregate message

m_{x_kt;tot}(x_kt) ∝ ∏_{l=1}^{L} m_{Ψ_1,lkt;x_kt}(x_kt),  (25)

and then forwards it to the APs. Then, AP l can remove its own message from the incoming message from the CPU to obtain the desired message as in (26), i.e.,

m_{x_kt;tot}(x_kt) / m_{Ψ_1,lkt;x_kt}(x_kt) ∝ ∏_{l′≠l} m_{Ψ_1,l′kt;x_kt}(x_kt).

In the following, we quantify the fronthaul load for the communication between APs and CPU in terms of the number of messages per iteration. Each AP transmits KT messages m_{Ψ_1,lkt;x_kt} to the CPU, totalling LKT messages. Additionally, the KT messages m_{x_kt;tot} are transmitted to each of the L APs, again totalling LKT messages to be transmitted.
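The CPU-side aggregation and the per-AP removal of the own message can be sketched in the log domain, where the product of categorical messages becomes a sum and the division a subtraction. The per-AP message values below are illustrative.

```python
import numpy as np

def normalize(logp):
    """Turn unnormalized log-probabilities into a proper PMF."""
    p = np.exp(logp - logp.max())
    return p / p.sum()

# Per-AP categorical messages over a 4-symbol constellation (L = 3 APs here).
log_msgs = np.log(np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
]))

log_tot = log_msgs.sum(axis=0)           # CPU: product of all AP messages
own = 0                                  # AP 0 removes its own contribution
log_extrinsic = log_tot - log_msgs[own]  # product over all APs l' != 0

expected = normalize(log_msgs[1] + log_msgs[2])
print(np.allclose(normalize(log_extrinsic), expected))  # True
```

Working in the log domain also avoids numerical underflow when many AP messages are multiplied, which matters for large L.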

G. Computational Complexity
The order of the computational complexity at the APs is mainly determined by the computation of inverses of matrices of dimension N. Hence, we focus on the messages which require matrix inversion and present the highest computational complexity, namely, the messages m_{Ψ_1,lkt;z_lkt} and m_{Ψ_1,lkt;h_lk}. Their order of complexity can be obtained from (22) and (30). We observe that KT|A| weighted sums of covariance matrices per AP need to be inverted, where A := {a ∈ R_+ | a = |x|², x ∈ S} denotes the set of distinct squared amplitudes in the signal constellation set S. Therefore, at each AP, the complexity order per iteration is O(KT|A|N³). The computation of the message m_{x_kt;tot} in (25) at the CPU requires multiplications of real-valued scalars of order O(LKT|S|). Therefore, the order of the computational complexity at the CPU is O(LKT|S|) per iteration of the bilinear-EP JCD algorithm. The total computational complexity, which takes into account the processing at each AP and at the CPU, is then O(LKT(|S| + |A|N³)).

V. NUMERICAL RESULTS

A. Setting
In this section we present the setting that we utilized for our simulations. We consider a square area with 400 m side length and position L = 16 APs on a rectangular grid, i.e., on the points of the set {(i · 400/3 m, j · 400/3 m) | i, j ∈ {0, 1, 2, 3}}, as in [15]. Each AP is equipped with N = 1 antenna. The K = 8 UEs are uniformly randomly distributed over the square area. Each UE transmits with a power of p = 14 dBm and at each AP the received signal is impaired by additive white noise with power v = −96 dBm. The diagonal elements of the channel covariance matrix Ξ_l are determined by the distance between UE k and AP l using the fading model in [15], equal for all co-located antennas n at AP l, where d_kl denotes the distance between UE k and AP l. The pilot sequences are orthogonal; specifically, X_p = I and P = K. The variability of the scenario is captured by sampling 300 realizations of the positions of the UEs. For each position we perform 10^4 transmissions with different small-scale fading realizations. In each transmission, T ∈ {10, 100} symbols are sent per UE. We use 4-QAM as modulation scheme. Our numerical results are obtained with I = 10 iterations and we use η = 0.7 as soft update parameter.
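The simulation geometry can be reproduced as follows: a minimal sketch generating only the 4x4 AP grid with spacing 400/3 m and uniformly random UE positions, from which the distances d_kl follow.

```python
import numpy as np

side, grid = 400.0, 4                 # 400 m square, 4x4 AP grid
step = side / (grid - 1)              # grid spacing: 400/3 m
ap_pos = np.array([(i * step, j * step)
                   for i in range(grid) for j in range(grid)])
print(len(ap_pos))                    # 16 APs

# Uniformly random UE positions and the UE-AP distance matrix d_kl.
rng = np.random.default_rng(0)
ue_pos = rng.uniform(0.0, side, size=(8, 2))
d = np.linalg.norm(ue_pos[:, None, :] - ap_pos[None, :, :], axis=-1)
print(d.shape)                        # (8, 16), one distance per UE-AP pair
```

The distance matrix d would then feed the large-scale fading model of [15] to produce the diagonal entries of Ξ_l; that pathloss formula is not reproduced here.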

B. Performance evaluation and complexity analysis
In this section we present our simulation results and discuss the performance-complexity trade-off. As is customary in the investigation of CF-MaMIMO systems, in order to analyze the QoS distribution we study the performance of the proposed algorithm in terms of the empirical cumulative distribution functions (CDFs) of the symbol error rate SER := Ê{1_{x̂≠x}} for detection and of the normalized mean squared error NMSE := Ê{∥ĥ − h∥²/∥h∥²} for channel estimation.
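The two performance metrics can be computed empirically as sketched below; the sample arrays are illustrative.

```python
import numpy as np

def ser(x_hat, x):
    """Empirical symbol error rate: fraction of mis-detected symbols."""
    return np.mean(x_hat != x)

def nmse(h_hat, h):
    """Normalized mean squared error of a channel estimate."""
    return np.linalg.norm(h_hat - h) ** 2 / np.linalg.norm(h) ** 2

x = np.array([1, -1, 1, 1, -1])
x_hat = np.array([1, -1, -1, 1, -1])   # one symbol error out of five
print(ser(x_hat, x))                   # 0.2

h = np.array([1.0 + 0j, 2.0 + 0j])
h_hat = np.array([1.0 + 0j, 1.0 + 0j])
print(nmse(h_hat, h))
```

Averaging these instantaneous values over small-scale fading realizations yields one CDF sample per large-scale realization, as described below.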
The samples for the empirical CDFs of the SER and the NMSE are obtained per large-scale channel fading and UE realization and per large-scale channel fading, AP, and UE realization, respectively, by averaging the instantaneous values corresponding to different small-scale realizations. Additionally, in the CDF of the NMSE we omit weak channels, i.e., channels whose received power is lower than the noise variance, in order to avoid accounting for errors on weak and insignificant channels. In our simulations, we consider four baseline schemes, namely, the detector in [28] assuming perfect CSI, the ICD algorithm in [29], which is also a semi-blind algorithm based on EP with polynomial complexity order, a modified version of our proposed bilinear-EP JCD algorithm with perfect channel knowledge which provides a lower bound to the SER, and a centralized conventional receiver based on Bayesian MMSE channel estimation and a subsequent linear MMSE filter for data detection.
Next, we compare the computational complexity order of the proposed bilinear-EP algorithm with the baseline EP algorithm in [29], the non-coherent centralized EP algorithm proposed in [23], and the baseline centralized approach based on linear MMSE filters for both channel estimation and detection. The results are summarized in Table I. Notably, the proposed bilinear-EP JCD algorithm and the algorithm in [29] have linear and cubic complexity order in T, respectively, and linear and quadratic complexity order in K, respectively. Thus, the proposed algorithm is more efficient for long data sequences and a large number of UEs. However, in our approach, the highest-order term N³ scales linearly with the number of UEs K. Like the bilinear-EP JCD algorithm, the non-coherent centralized EP algorithm in [23] adopts an exact categorical distribution for the data. When it is used for uncoded transmission, as in the proposed approach, its complexity order is O(K^6 |S|^T). Due to its exponential complexity in the symbol sequence length T, the required computational effort is prohibitive for a comparative study of the algorithm performance. Essentially, the algorithm in [23] approximates a maximum-a-posteriori (MAP) decoder whereas our algorithm approximates a MAP detector. It is worth noting that the extension of the proposed bilinear-EP JCD algorithm with a decoder is straightforward through the use of loopy belief propagation, which is known to have polynomial complexity as well. In contrast to the other algorithms, the conventional centralized MMSE algorithm is not iterative in nature and, thus, a fair comparison with the other receivers, which require multiple iterations, is not straightforward.
In the following, we analyze the performance. Fig. 3 shows the empirical CDF of the SER of the proposed bilinear-EP JCD for T ∈ {10, 100} and the above-mentioned baselines. For both values of T, the proposed bilinear-EP algorithm outperforms the ICD algorithm in [29] by more than one order of magnitude, while the complexity of bilinear-EP is only linear in T and not cubic. Our approach also outperforms the EP-based detector in [28] with perfect CSI and the centralized MMSE approach. The performance gap with respect to the second baseline approach indicates that the Gaussian approximation of the data symbol distribution and the averaging of the messages in [28], [29] cause a degradation of performance. We observe an improvement of the CDF of the SER obtained with our bilinear-EP JCD algorithm when the length of the transmitted data sequence increases from T = 10 to T = 100. The performance of our modified algorithm obtained assuming perfect CSI shows that there is room for further improvement. Fig. 4 shows the CDF of the NMSE. We observe an improvement of the NMSE by more than an order of magnitude of our proposed algorithm compared to the ICD scheme in [29].
Additionally, the performance of our channel estimation also improves when T increases.

VI. CONCLUSION
In this paper, we considered a CF-MaMIMO system and addressed the quest for low-complexity JCD with near-optimal performance and robustness to pilot contamination. We derived a blind or semi-blind distributed JCD algorithm by formulating the problem in the framework of bilinear inference and obtaining the solution as an unfolding of a message-passing algorithm incorporating EP rules over a factor graph.
The appealing features of the proposed algorithm stem from our choice of the approximate posterior joint distribution of data symbols and channels. Our simulation results show that the proposed scheme significantly outperforms the selected baseline schemes based on the detector in [28], which assumes perfect CSI, and the ICD algorithm in [29]. Additionally, bilinear-EP JCD has polynomial computational complexity and allows for the straightforward embedding of state-of-the-art SISO decoders, such that the decoding complexity can also be kept of polynomial order. This enables further investigations on the rate that can be achieved using our proposed JCD algorithm.

APPENDIX
In this section, we derive some of the factor-to-variable messages. When it does not cause ambiguity, we adopt the abbreviated notation m_{Ψ_1,lkt;z_lkt} = m_{Ψ_1,lkt;z_lkt}(z_lkt).
Derivation of message m_{Ψ_1,lkt;x_kt}. Message m_{Ψ_1,lkt;x_kt} is obtained by applying (8) and (9). This computation requires the knowledge of the messages from variable nodes to factor nodes, namely, m_{h_lk;Ψ_1,lkt}, m_{z_lkt;Ψ_1,lkt}, and m_{x_kt;Ψ_1,lkt}, as we can evince from the factor graph in Fig. 2. In the following, we derive these messages by applying (7). Thus, we obtain

m_{h_lk;Ψ_1,lkt}(h_lk) ∝ m_{Ψ_2,lk;h_lk}(h_lk) ∏_{t′≠t} m_{Ψ_1,lkt′;h_lk}(h_lk),

which is a Gaussian distribution with covariance matrix and mean given by (15) and (16), respectively. As a product of Gaussian distributions, the mean and covariance matrix above are obtained by applying the Gaussian multiplication lemma, see, e.g., [23]. Similarly, we derive the other variable-to-factor messages as m_{z_lkt;Ψ_1,lkt} = m_{Ψ_0,lt;z_lkt} and

m_{x_kt;Ψ_1,lkt}(x_kt) ∝ m_{Ψ_3,kt;x_kt}(x_kt) ∏_{l′≠l} m_{Ψ_1,l′kt;x_kt}(x_kt),  (26)

which is a categorical distribution, and we normalize it accordingly. By utilizing the variable-to-factor messages impinging on the factor node Ψ_1,lkt computed above, we apply (9) to obtain q_{Ψ_1,lkt;x_kt}, where the integral over z_lkt is resolved by applying the sifting property of the Dirac delta function, see, e.g., [34]. Then, the resulting factor in (28) can be rewritten as shown in (29) at the top of the page. Therefore, using the Gaussian multiplication lemma we obtain

q_{Ψ_1,lkt;x_kt}(x_kt) ∝ m_{x_kt;Ψ_1,lkt}(x_kt) N(0; μ_{z_lkt;Ψ_1,lkt} − x_kt μ_{h_lk;Ψ_1,lkt}, C_{z_lkt;Ψ_1,lkt} + |x_kt|² C_{h_lk;Ψ_1,lkt}).

Next, we observe that q_{Ψ_1,lkt;x_kt} is already a categorical distribution in the exponential family. Then, the projection operator in (8) leaves its argument unchanged, i.e., proj{q_{Ψ_1,lkt;x_kt}} = q_{Ψ_1,lkt;x_kt}. Finally, by applying (8), we obtain the message m_{Ψ_1,lkt;x_kt} given in (24).
Derivation of message m_{Ψ_1,lkt;z_lkt}. For the derivation of message m_{Ψ_1,lkt;z_lkt} we consider again factor node Ψ_1,lkt in Fig. 2. Thus, the same variable-to-factor messages computed above are necessary to determine q_{Ψ_1,lkt;z_lkt}. Analogously, we compute the distribution before projection onto the exponential family. We observe that q_{Ψ_1,lkt;z_lkt} is a Gaussian mixture with |S| components, with parameters C̃(x_kt) and μ̃(x_kt) which depend on the specific value x_kt and are defined in (22) and (23), respectively; this is obtained by applying again (29) and then the Gaussian multiplication lemma. Additionally, each component of the Gaussian mixture distribution is weighted by the unnormalized factor ω(x_kt) given in (30) at the top of the page. Then, as in (9), we have to project the Gaussian mixture distribution onto the family of Gaussian distributions, i.e., we need to determine the Gaussian distribution N(z_lkt; ẑ_lkt, Σ_lkt) := proj{q_{Ψ_1,lkt;z_lkt}} whose moments are matched to the distribution q_{Ψ_1,lkt;z_lkt}. Denoting by ω̄(x_kt) the normalized weights obtained from ω(x_kt) as given in (21), the parameters ẑ_lkt and Σ_lkt of the moment-matched distribution are shown in (19) and (20), respectively. Finally, the parameters C_{Ψ_1,lkt;z_lkt} and μ_{Ψ_1,lkt;z_lkt} of the updated message are given by the Gaussian multiplication lemma as shown in (17) and (18), respectively.
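The projection of the Gaussian mixture q_{Ψ_1,lkt;z_lkt} onto a single Gaussian matches its first two moments. A scalar sketch of this moment matching; the weights and component parameters are illustrative.

```python
import numpy as np

def moment_match(weights, means, variances):
    """Project a 1-D Gaussian mixture onto a single Gaussian.

    Returns the mean and variance matching the mixture's first two
    moments (law of total variance: within- plus between-component).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the weights
    mu = np.sum(w * means)
    var = np.sum(w * (variances + (means - mu) ** 2))
    return mu, var

mu, var = moment_match([1.0, 1.0],
                       means=np.array([-1.0, 1.0]),
                       variances=np.array([0.5, 0.5]))
print(mu, var)  # 0.0 1.5
```

Note how the matched variance (1.5) exceeds each component variance (0.5): the spread of the component means contributes the between-component term, which is exactly what the projection must preserve.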
where C := (σ^{-2} A_p^H A_p + Ξ_l^{-1})^{-1} and A_p := (X_p^T ⊗ I_N). Due to the independence of the channels among different users, we have μ_lk;MMSE = (1/σ²) [C A_p^H vec(Y_p;l)]_{(k−1)N+1:kN} and C_lk;MMSE = [C]_{(k−1)N+1:kN, (k−1)N+1:kN}. We choose the approximate factor-to-variable messages to be distributed as
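The identity vec(H_l X_p) = (X_p^T ⊗ I_N) vec(H_l), which underlies the definition of A_p, can be verified numerically. A minimal sketch with random real matrices of illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, P = 2, 3, 3
H = rng.standard_normal((N, K))
X_p = rng.standard_normal((K, P))

# vec() stacks columns; NumPy flattens row-major by default, so use order='F'.
vec = lambda M: M.flatten(order="F")
A_p = np.kron(X_p.T, np.eye(N))

print(np.allclose(vec(H @ X_p), A_p @ vec(H)))  # True
```

This is the special case B = X_p, A = I_N of the general identity vec(A X B) = (B^T ⊗ A) vec(X), which turns the pilot observation into a linear model in vec(H_l) amenable to standard Bayesian MMSE estimation.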