Expectation Propagation for Flat-Fading Channels

This letter addresses the problem of signal detection in flat-fading channels. In this context, receivers based on the expectation propagation framework appear to be very promising although presenting some critical issues. We develop a new algorithm based on this framework where, unlike previous works, convergence is achieved after a single forward-backward pass, without additional inner detector iterations. The proposed message scheduling, together with novel adjustments of the approximating distributions’ parameters, allows to obtain significant performance advantages with respect to the state-of-the-art solution. Simulation results show the applicability of this algorithm when sparser pilot configurations have to be adopted and a considerable gain compared to the current available strategies.


I. INTRODUCTION
D ETECTORS for flat Rayleigh fading channels have been widely studied in the literature (e.g., see [1], [2], [3] and references therein).This kind of channels is commonly adopted because it properly represents real scenarios, such as narrowband communications over multipath wireless channels.The framework based on factor graphs (FGs) and the sum-product algorithm (SPA) provides a powerful tool for the accomplishment of both detection and decoding [4].However, when this framework is applied jointly to detection and channel estimation, discrete and continuous random variables appear in the FG and the SPA becomes impractical.
A common approach foresees the use of canonical distributions [4], e.g., the most suitable for the analyzed scenario is the Gaussian one.A possibility is to directly reduce complicated functions to a unique Gaussian through a message projection, as performed in the Kalman smoother [5].Although this approach is effective in pilot-aided transmission, it eventually fails when the spacing between pilot blocks is longer than a maximum accepted distance, as we will demonstrate in the numerical results.
In order to overcome this limitation, in this letter we focus on a different kind of receiver, which is based on the expectation propagation (EP) framework [6].This message-passing algorithm is based on approximating the distributions of the The authors are with the Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy (e-mail: elisa.conti@unipr.it;amina.piemontese@unipr.it;giulio.colavolpe@unipr.it;armando.vannucci@unipr.it).
Digital Object Identifier 10.1109/LWC.2023.3295952variables of interest, rather than those of the exchanged messages.This way, a prior belief coming from the rest of the FG drives the approximation of intractable messages.The combination of the SPA rules with the approximation deriving from EP leads to a hybrid framework which, in the literature, is commonly referred to as EP-belief propagation (EP-BP).This framework has been deeply studied and analyzed in different scenarios, including, e.g., the transmission over phase noise channels [7], the MIMO sparse codes multiple access detection [8] or joint channel estimation and decoding for faster-than-Nyquist signaling [9].As it is inherent to the EP update strategy, improper distributions, e.g., a Gaussian with a negative variance, could appear.In [2], where, unlike other approaches [10], [11], the presence of improper distributions is accepted, the EP framework has been applied to a flat-fading channel, modeled as a complex autoregressive moving-average process.The authors proposed an EP smoothing approach where inner detector iterations are required to achieve convergence and it is tested using differential encoding.However, when using a more powerful encoder, hence working in a lower signal-to-noise ratio (SNR) regime, the approach in [2] is not equally effective and this is the focus of our investigation.
The main contributions of this letter can be summarized as follows.
• An implementation of EP-BP that overcomes its limitations and reduces the algorithmic complexity is proposed.• An improved scheduling which allows to achieve convergence without the aid of multiple inner detector iterations (iterative refinement [6]) is adopted.• A new technique for balancing the variance of the involved Gaussian distributions is introduced.• The problem of improper distributions is addressed, the scenarios where they can compromise the algorithm operations are identified and a novel strategy to handle these events is presented.Finally, we demonstrate through simulations that the proposed techniques present a superior performance, in terms of biterror rate (BER) and computational complexity, with respect to state-of-the-art EP implementations [2].Moreover, the proposed solution ensures a higher robustness against concentrated pilot distributions, laying the foundation for overcoming the limitations related to the classical and well-established Kalman filter.

II. SYSTEM MODEL
We consider the transmission of a sequence of K complex coded symbols c = (c 0 , c 1 , . . ., c K −1 ) over a flat fading channel.The sequence c is the result of the binary encoding, through the code C, and modulation over an M-ary constellation (X = {x 1 , . . ., x M } ⊂ C) of the information bit sequence b.Assuming linear modulation with Nyquist shaping pulses, matched filtering, and channel variations slow enough so that no intersymbol interference arises, the discrete-time received signal can be expressed as where n k is the AWGN and g k is the fading process.The vector of noise samples, n = (n 0 , n 1 , . . ., n K −1 ), 1 has independent and identically distributed (i.i.d) complex circularly symmetric components with n k ∼ N C (0, N 0 ). 2  The fading channel follows Clarke's isotropic scattering model and the fading coefficients g = (g 0 , g 1 , . . ., g K −1 ) form a sequence of zero-mean complex Gaussian random variables with unit variance, statistically independent of both c and n, with autocorrelation sequence where J 0 (•) is the zero-th order Bessel function, T is the symbol period, and f D is the Doppler spread.At the receiver, the adopted model for the fading is an autoregressive process of order N, AR(N), so that g is approximated by the process h = (h 0 , h 1 , . . ., h K −1 ), which obeys 3 where Clearly, there is a channel mismatch between the AR(N) process {h k } and the one modelled by (2), hence it is critical to properly set the values of the coefficients ρ in (3) and of the random increment's variance 2σ 2 ν .A classical approach in the literature [1], [12] consists in obtaining the coefficients ρ by solving the Yule-Walker (YW) equations R h (m) = R g (m), for m = 1, 2, . . ., N , and computing σ 2 ν as a function of ρ through the constraint R h (0) = R g (0).However, in the present context, it turns out that this solution is not the best choice to approximate the actual fading statistics [3].Therefore, we will select the value of σ 2 ν that optimizes the performance, as detailed in Section V.

III. THE EXPECTATION PROPAGATION FRAMEWORK
Solving the fading estimation problem through messagepassing algorithms requires the factorization of the joint posterior pdf of the information bits and the fading coefficients, given the received samples.From the system model described above, it can be expressed as 1 Bold lower-case letters (a) denote vectors, bold upper-case letters (A) denote matrices.The superscripts (•) T and (•) H denote the transpose and the Hermitian of a matrix or vector, respectively.Given a square matrix B with dimension N B × N B , the generic i, j element is denoted by B(i, j ), i, j = 1, . . ., N B . 2 A complex circularly symmetric Gaussian random variable v with mean μ and variance 2σ 2 is denoted by v ∼ N C (μ, 2σ 2 ).The corresponding probability density function (pdf), with argument x, is denoted as g C (x , μ, 2σ 2 ).The pdf of a Gaussian vector is written as g C (x , μ, C ), where the inverse of the covariance matrix, Σ = C −1 , is called precision. 3We assume that h l = 0 for l < 0.
where the information bits are uniformly and identically dis-  3) and from the fact that the AWGN channel is memoryless, given h.In (4), we defined as the functions associated with the factor nodes in the FG representation of the joint pdf (4), which is reported in Fig. 1.Due to the presence of continuous random variables in the FG, we follow the commonly adopted technique based on canonical distributions [4].In this way, only the parameters characterizing a specific pdf within the family are exchanged in the FG.From (3), the distributions that better approximate those of the fading samples are the Gaussian ones.According to the SPA rules [4], which we assume the reader to be familiar with, the observation message, i.e., the message from factor node f k to variable node h k is where the definition of P d (c k ), as well as of the subsequent messages, can be found in Fig. 1.Being a mixture of M Gaussian distributions, the exact propagation of the observation message is infeasible.Depending on how it is treated, different algorithms can be derived.A straightforward projection of the mixture message (7) onto a single Gaussian yields the Kalman filtering approach of [1], [3].Once the projection is performed, all the subsequent messages that are sent along the Markov chain at the bottom of Fig. 1 belong to the same approximating family because of the closure of the Gaussian family under both the multiplication and the convolution operations [1], [4].
A different possibility consists in approximating the distributions of variable nodes, rather than the messages, and this is the key idea on which the EP framework is based.The approximation of p d (h k ) in ( 7) is thus performed in EP, under the influence of a temporary prior distribution p (p) (h k ) which is the product of the incoming messages to the variable node h k from the lower part of the FG, i.e., p (f ) 4 EP is a message-passing algorithm whose approximations are generated by projecting the variable distribution on a fully-factorized exponential family F, where the projection is performed by minimizing the Kullback-Leibler (KL) divergence.These two features allow to compute the projection by solving a set of moment matching equations [6].In the present case of a Gaussian approximating family, the distribution resulting from the projection is thus the unique Gaussian whose mean, variance and scale factor equal those of the distribution to be approximated.
The specific algorithmic steps of EP can be found in [13].The aim is to approximate the marginal distribution of h k which, deriving from the combination of the prior and observation messages [13], is still multimodal.Denoting with η MIX ,m and C MIX ,m the mean and covariance matrix of the m-th mode, respectively, its expression is given by [4], [13] where 5 P m = P d (c k = x m )/|x m | 2 and w m is the symbol-dependent coefficient that weights the m-th mixture component.The approximation of the mixture (8) with a Gaussian p (EP ) (h k ) = proj KL [p(h k )] is performed, as usual in EP, by matching the moments of the distributions.The zero-th order moment is the mass of the distribution, i.e., Z = M m=1 P m w m .However, the mass of the messages is not significant in message-passing algorithms like SPA or EP, and it is replaced with its normalized version p h (h k ) = Z −1 p(h k ).The matching of the first and second order moments is performed by solving respectively.After few calculations, we obtain where W m = Z −1 P m w m .Now, it is possible to compute the parameters of the updated message pd (h k ) where η p,k and Σ p,k are the mean and variance of the prior distribution.The approximated message is then propagated following the SPA rules [4].
IV. PROPOSED ALGORITHM The classical EP implementation [2] presents some critical issues which do not lead to a satisfactory performance in the considered scenario.In the following, we present novel techniques which allow to overcome the limitations related to the standard EP structure.
Damping and Boosting Factors: In order to face the EP instabilities, as discussed in the literature, a number of solutions exist, including damping [10], [11].However, unlike its common use, in the analyzed scenario the introduction of this correction factor is studied specifically to provide a performance improvement through the balancing of overly confident estimates without the involvement of old updates.In fact, in our implementation, the damping factor, ξ d , which belongs to the interval [0, 1], is applied to the precision of the approximating message pd (h k ) only.In this way, the algorithm underestimates the updates reliability as commonly done in the literature when dealing with sub-optimal algorithms.We propose to set this factor to the maximum value of P d (c k ), i.e., the probability mass function of the symbols.The main advantage of this approach is that a lower number of turbo iterations is required for achieving convergence.At the first turbo iteration 7 (which is the only one in the separate detection and decoding case) this quantity is equal to 1/M.On the contrary, from the 2nd iteration ahead this value becomes a clearer and clearer representation of the decoder confidence about the transmitted symbols.It follows that the closer this value is to 1, the higher the weight given to the approximating message after the KL projection.
Moreover, we propose an additional technique which allows to obtain a further performance improvement: the use of a boosting factor for pilot symbols.Similarly to damping, the precision ( 11) is multiplied by a factor that, this time, is greater than 1, so that its effect is the opposite: it increases the confidence of the observation message in correspondence with a pilot symbol, i.e., when p d (h k ) is a unimodal Gaussian distribution and, therefore, no approximation is performed.The selected value for the boosting factor is ξ b = 2.
Negative Variances: Due to the subtraction in (11), it may occur that the algorithm generates negative variances, so that the approximation of the observation message, pd (h k ), is not necessarily a valid pdf [2].This happens when the precision of the prior distribution is larger than the one of the marginal after the KL projection.Since the focus of EP is on the marginal distribution of the channel parameters, the algorithm could accept improper distributions as messages.However, when the scheduling described in the following is adopted for the separate detection and decoding case, the only message which can present such variances is the observation one. 7The detector subgraph operates first by exchanging horizonal messages along the Markov chain in the lower part of the FG shown in Fig. 1.This message passing inside the detector can be iterated with multiple forwardbackward passes (N d ) before sending vertical messages to the upper part of the FG (decoder).An inner message passing procedure can take place also there until convergence or until a maximum number of decoder iterations (Nc ) is reached.The exchange of information between detector and decoder can be iterated for N t times.We will refer to these iterations as turbo iterations.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

Algorithm 1 Proposed EP Algorithm
k |k +1 (h k ) computed at the (n iter ,T − 1)-th turbo iteration, respectively, for any value of k 4: else 5: Set α k (h k ) and β k (h k ) to unitary messages for any value of k 6: end if 7: for k = 0 : K − 2 do 8: Compute the temporary prior (η p,k and Σ p,k ) by multiplying Compute the parameters of the exact marginal distribution (i.e., for each m, compute Wm , η MIX ,m , C MIX ,m in [13]) 10: Perform the moment matching, using ( 9)-( 10) 11: Compute pd (h k ) via ( 11)-( 12) and perform either damping or boosting on Σ d depending on k 12: Perform the observation message projection 14: end if 15: Compute the forward messages p ) through the SPA rules [4] (refer to Fig. 1) 16: end for 17: Then, repeat the steps 9-14 19: Compute the backward messages p [4] 20: end for 21: Multiply the forward message obtained through 7-16 with the backward one computed in 17-20 for each k 22: Compute for each k and each c k ∈ X the probabilities Pu (c k ) to be sent to the decoder When turbo iterations are considered, improper distributions could instead propagate giving rise to numerical problems.Hence, in this case, we propose an alternative approach to deal with negative variances, where the algorithm performs a message projection instead of a variable distribution projection [13].The proposed method leads to a performance improvement with respect to the propagation of improper distributions and consists in approximating the message p d (h k ) locally with a Gaussian distribution whose mean and variance are obtained through moment matching without the influence of the rest of the network [14], [15].
Algorithm Summary: With reference to the generic n iter ,T turbo iteration, the proposed algorithm structure is outlined in Algorithm 1.
The forward and backward recursions lead to two different approximations for p d (h k ).The main advantage of the adopted scheduling is that a satisfying performance is already achieved after a single forward-backward sweep inside the detector.

V. SIMULATION RESULTS
The performance of the proposed algorithm was analyzed in terms of BER versus E b /N 0 , E b being the received signal energy per information bit and N 0 the one-sided noise power spectral density.We considered a (3,6)-regular rate-1/2 LDPC code with length 4000 and QPSK modulation (M = 4).In all simulations, in order to allow the convergence of the algorithm, pilot symbols were inserted in the transmitted codeword. 8The presence of pilots involves a slight decrease of the effective information rate, resulting in an increase in the required SNR.This increase has been artificially introduced in the curve labeled "All Pilots" , inserted as ideal bound, for the sake of comparison.The curves labeled "Kalman filter" in Figs. 2 and 4 refer to the algorithm proposed in [3].The algorithms were tested for both the AR(1) and AR(2) fading models.The values of the parameter ρ were set according to the YW equations, whereas the increment variance σ 2 ν was set to the one minimizing the BER, found through computer simulations.
In Fig. 2, we compared the performance of the proposed algorithm with both the Kalman filter and the "batch-EP" of [2], labeled with "classical EP", when the normalized Doppler bandwidth of the fading process f D T was set to 10 −2 and one pilot symbol was inserted in every block of 20 transmitted symbols.Firstly, we analyzed the practical case of separate detection and decoding. 9Figure 2 shows that the proposed EP algorithm can achieve a performance gain of approximately 0.9 dB with respect to the Kalman filter for both the AR(1) and the AR(2) models.Moreover, a significant gain is observed comparing the proposed technique with the classical EP of [2].Regarding the latter algorithm, the authors proposed an EP smoothing approach where an iterative refinement of the observation approximations is foreseen.Consistently with that used in [2], a maximum number (N d ) of 5 inner detector iterations was allowed, and it resulted to be sufficient for achieving convergence.For this algorithm, the curve shown in Fig. 2 was obtained using the AR(1) model since the AR(2) model did not yield further improvement.The complexity of the considered algorithms was evaluated with reference to one inner detector iteration for a single time epoch k and it is reported in Fig. 3.They were compared considering the total number of required operations, including two-terms multiplications and additions.It can be noticed that the additional operations involved by the EP framework lead inevitably to a complexity increase which, relying on our implementation, is still limited and reasonably comparable to the one of the Kalman filter.On the other hand, a steeper increase is registered for the classical strategy [2].In Fig. 2, the BER curves related to the joint iterative detection and decoding, where a maximum of 200 turbo iterations is allowed, are also reported with reference to the AR(1) fading model.It is possible to notice that the curves related to the proposed EP algorithm and the Kalman filter (optimized according to [3]) are approximately overlapped.In fact, when a sufficient number of turbo iterations is performed, the decoder provides symbol probabilities approaching the ones passed in correspondence with a known symbol.Therefore, the observation message is composed by a single Gaussian and the operations performed by the two detectors coincide.In Fig. 4, keeping fixed the effective information rate, different pilot distributions were considered in order to show the higher robustness of EP with respect to the Kalman filter against pilot symbols arrangements in sequences separated by longer blocks of code symbols.The normalized Doppler bandwidth was set to 5•10 −3 .The AR(1) fading model was adopted for the sake of simplicity, however, similar conclusions can be drawn for AR (2).It is possible to notice that both the proposed algorithm and the Kalman filter are practically insensitive to a change in the pilots blocks distance from 20 to 40 symbols.Differently from the EP-based algorithm, increasing the gap to 80 symbols leads to a significant loss for the Kalman filtering approach.Finally, considering a pilots distance of 160 code symbols, the EP detector is still able to provide reasonable fading estimates and, with the aid of 6 turbo iterations, it loses approximately 2 dB from the case where the 1/20 pilots distribution is adopted.On the contrary, in the same conditions, the Kalman filter does not work because of its lower robustness against longer pilots spacing.

VI. CONCLUSION
In this letter, we proposed a new solution for signal detection over channels with flat-fading through EP.The presented method allows to solve the numerical instabilities related to the traditional procedure provided in the literature.We showed through numerical results the advantages in terms of BER that can be achieved in the separate detection and decoding case with respect to the classical Kalman filter approach and we demonstrated the higher robustness of this algorithm against more concentrated pilots distributions.
The advantages of this framework offer a variety of possibilities for new research directions, such as its application to the multiple-input multiple-output channel model and the study of more severe fading conditions, for instance frequency selectivity.Moreover, we could extend our analysis to other kinds of divergence measures for the involved approximations.

Manuscript received 1
June 2023; accepted 7 July 2023.Date of publication 17 July 2023; date of current version 9 November 2023.This work was supported in part by the Italian Ministry of Universities and Research (MUR) under the PRIN Liquid_Edge Contract and in part by the University of Parma through the Action Bando di Ateneo 2022 per la ricerca co-funded by MUR D.M. 737/2021 PNR PNRR NextGenerationEU.The associate editor coordinating the review of this article and approving it for publication was N. Wu. (Corresponding author: Elisa Conti.)

Fig. 1 .
Fig. 1.FG corresponding to (4).P (b, c, ) ∝ p(r |c, h)p(h)P (c|b)P (b) tributed, μ C is the function that maps binary information messages b into the codewords c, and I [ • ] is an indicator function equal to 1 when c = μ C (b) and to zero otherwise.The factorizations of the pdfs p(h) and p(r |c, h) in (4) come, respectively, from the assumption that the fading process follows the AR(N) model (
c 2023 The Authors.This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.