Efficient Decoder Design for Low-Density Lattice Codes From the Lattice Viewpoint

Low-density lattice codes (LDLCs) achieve near-capacity performance on additive white Gaussian noise (AWGN) channels. The M-Gaussian decoder is the state-of-the-art message passing decoder for LDLCs in terms of error performance. However, this decoder has complexity O(M^{d−1}) with messages represented by Gaussian mixtures, where d is the degree of an LDLC and M is the number of Gaussian functions for approximating each check node message. In this paper, we establish the correspondence between the Gaussian functions approximating a variable node message and the points of a certain lattice. Based on this lattice viewpoint, the problem of approximating a variable node message is formulated as a lattice point enumeration (LPE) problem. Then, an LPE decoder with linear complexity O(d) is proposed. Our simulation results validate that the LPE decoder achieves almost the same error performance as the M-Gaussian decoder.

Compared to the GMR decoders, the M-Gaussian decoder proposed in [15] only requires the selection of one integer parameter M. Each check node message is approximated with a mixture of M Gaussian functions. Computing each variable node message involves a product of d − 1 check node messages. As a result, the complexity of the M-Gaussian decoder is O(n · t · M^{d−1}). The best-known error performance is achieved by the 2-Gaussian decoder for low-to-moderate LDLC dimensions, while the 3-Gaussian decoder is required for moderate-to-high dimensions (see [15, Fig. 9]). Later, a shuffled version of the M-Gaussian decoder was proposed in [16] with average complexity O(n · t · 1.4^{d−1}). Whereas parallel message passing decoders update all variable node messages in parallel, shuffled message passing decoders update the variable node messages in a specified sequence, so that messages updated later benefit from the information in messages updated earlier. Thus, shuffled decoders enjoy faster decoding convergence than their parallel counterparts. The shuffled design could likely be applied to many message passing decoders; in this paper, however, we focus on the design of a parallel message passing decoder for LDLCs, and the shuffled version is out of the scope of this paper. To further reduce decoding complexity, a faster decoder was proposed in [17] with complexity O(n · t · d). Each variable node message is approximated with a mixture containing at most two Gaussian functions. However, compared to the 2-Gaussian decoder, the decoder in [17] has a performance loss of 0.2 dB and 0.3 dB in the waterfall region for n = 10^3 and n = 10^4, respectively (see [17, Fig. 3]). The complexity and error performance of Gaussian-mixture-based decoders are summarized in Table 1. The error performance is evaluated by the distance from the channel capacity at a symbol error rate of 10^{−5}.
There exists a clear trade-off between the error performance and the complexity of existing decoders [15], [17]. Approximating each variable node message with more Gaussian functions leads to better error performance but higher complexity, and vice versa. However, the M-Gaussian decoder may consider many unimportant Gaussian functions for approximating messages. An intriguing question is then: can a decoder with polynomial (or even linear) complexity in d match the best-known performance of the M-Gaussian decoder? The difficulty in answering this question lies in the lack of a flexible scheme for choosing the Gaussian functions that approximate variable node messages. Our work in this paper provides an affirmative answer. We first mathematically establish the correspondence between the Gaussian functions used to approximate a variable node message and the points of a certain lattice. This lattice viewpoint converts the problem of approximating messages into a lattice point enumeration (LPE) problem, which can be solved by the list sphere decoding (LSD) algorithm [18], [19]. As a result, the number of Gaussian functions for approximating messages can be flexibly chosen by adjusting the radius of the sphere. The lattice viewpoint serves as the foundation for our decoder design. The contributions of this work are summarized as follows:
• We formulate the problem of approximating a variable node message as an LPE problem. Whether a Gaussian function is essential for approximating the variable node message depends on the distance between its corresponding lattice point and a query point. From this lattice viewpoint, we establish a unified framework for interpreting some existing decoders [15], [17].
By geometrically interpreting the Gaussian functions used for approximating a variable node message in various decoders, it can be deduced that the error performance of the M-Gaussian decoder for any value of M can be approached by a decoder that enumerates the lattice points within a sufficiently large sphere.
• The lattice viewpoint makes it possible to consider only the essential Gaussian functions for approximating a variable node message. To efficiently address the LPE problem, a simplified LSD algorithm is derived by exploiting the relationship between the diagonal and off-diagonal entries of the generator matrix of the underlying lattice. Based on this LSD algorithm, an LPE decoder is proposed for decoding LDLCs.
• The complexity of the LPE decoder is dominated by that of the LSD algorithm. With a judiciously chosen search sphere, we show that the LPE decoder achieves complexity O(n · t · d) with almost no performance loss compared to the M-Gaussian decoder. In addition, a comparison of the LPE decoder and the M-Gaussian decoder in terms of the required number of floating-point operations is provided to demonstrate the superior efficiency of the LPE decoder.
Part of this work was presented in a conference version [1]. In this paper, different from [1], a rigorous derivation of the lattice viewpoint for the decoder design is presented. Besides, a geometrical interpretation of different decoders is demonstrated. Further, performance analysis and numerical evaluation of the proposed LPE decoder are provided.
The rest of this paper is organized as follows. Section II introduces the preliminaries of LDLCs, basic operations on Gaussian functions and Gaussian mixtures, and the LSD algorithm. In Section III, we present a detailed derivation of the lattice viewpoint for LDLC decoding and a geometrical interpretation of different decoders. A simplified LSD algorithm and the LPE decoder are presented in Section IV. The analysis of the complexity and the required number of floating-point operations of the LPE decoder is provided in Section V. Numerical results are shown in Section VI. Finally, Section VII concludes the paper.
Notation: Non-bold italic letters denote scalars. Boldface lowercase and uppercase letters denote vectors and matrices, respectively. The function sgn(a) is 1 if the scalar a is positive and −1 if it is negative. The Hadamard (element-wise) product of two vectors a and b is represented by a ∘ b. The Hadamard inverse of the vector a is denoted by a^{∘−1} [20]. Let A^T and A^{−1} denote the transpose and inverse of the matrix A, respectively. Let diag(a) denote the square diagonal matrix with the elements of vector a on the main diagonal. We use N(x; m, v) to denote the Gaussian function of the random variable x with mean m and variance v. The convolution of two functions f(x) and g(x) is denoted by f(x) ∗ g(x).

II. PRELIMINARIES
This section presents necessary background knowledge.

A. LATTICES AND LATTICE CODES
An n-dimensional lattice Λ, defined by an n × n full-rank generator matrix G, is a discrete additive subgroup of R^n. A lattice point x = Gb ∈ Λ is an integral linear combination of the columns of G, where b ∈ Z^n is the information vector. If a lattice point is transmitted as a codeword over the AWGN channel, the channel observation is y = x + n, where n is the additive Gaussian noise vector with zero mean and covariance matrix σ²I. For simplicity, in this paper, we consider the power-unconstrained AWGN channel. Thus, any lattice point could be a codeword. Since the definition of the signal-to-noise ratio (SNR) becomes meaningless without a power constraint, the volume-to-noise ratio (VNR) [21], measured by the lattice constellation density and σ², is considered for the power-unconstrained channel. The VNR generalizes the concept of channel capacity by measuring the maximum lattice point density that can be recovered: the unconstrained AWGN channel capacity is achieved if the error probability can be made arbitrarily small for any VNR greater than 1.
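For concreteness, the standard form of the VNR for a lattice with generator matrix G (the paper's own equation is omitted in this excerpt, and its normalization may differ) is:

```latex
\mathrm{VNR} \;=\; \frac{\left|\det \mathbf{G}\right|^{2/n}}{2\pi e\,\sigma^{2}},
```

where |det G|^{2/n} is the normalized volume of the lattice's fundamental region. In Poltyrev's setting, reliable decoding is possible as n → ∞ for any VNR > 1.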

B. LOW-DENSITY LATTICE CODES
A low-density lattice code (LDLC) [4] is characterized by the inverse of its generator matrix, H = G^{−1}. The matrix H is sparse and is named the check matrix of the LDLC. More explicitly, every row and column of H has only d nonzero entries, where d ≪ n is named the degree of the LDLC. All nonzero entries are assigned random signs. Besides, the absolute values of the d nonzero entries in each row or column are given by a generating sequence {1, w, . . . , w} of d elements, where w = α/√(d − 1) with a constant α satisfying 0 < α < 1 [13], [14], [15], [16], [17].
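As an illustration, a random LDLC-style check matrix can be sketched by superposing d signed, scaled permutation matrices. This is a hypothetical helper, not the paper's construction: a production construction would additionally enforce a Latin-square structure and reject colliding permutations.

```python
import numpy as np

def make_ldlc_check_matrix(n, d, alpha=0.8, seed=0):
    """Sketch of a random LDLC-like check matrix H (hypothetical helper).

    Each row/column receives up to d nonzeros whose magnitudes come from
    the generating sequence {1, w, ..., w} with random signs, built by
    summing d scaled random permutation matrices. Colliding permutations
    may merge entries; a real construction would prevent that.
    """
    rng = np.random.default_rng(seed)
    w = alpha / np.sqrt(d - 1)          # keeps (d - 1) * w**2 = alpha**2 < 1
    seq = [1.0] + [w] * (d - 1)
    H = np.zeros((n, n))
    for value in seq:
        perm = rng.permutation(n)       # one nonzero per row and per column
        signs = rng.choice([-1.0, 1.0], size=n)
        H[np.arange(n), perm] += signs * value
    return H
```

Each of the d permutations places exactly one entry per row and column, so every row and column ends up with at most d nonzeros.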

1) LDLC MESSAGE PASSING DECODER
Since H = G^{−1}, the check equations for an LDLC are naturally represented by Hx = b. Based on H, a Tanner graph is constructed with check nodes representing the check equations and variable nodes representing the coordinates of x. An edge between the i-th check node and the j-th variable node of the Tanner graph exists if H_{i,j} ≠ 0.
The LDLC message passing decoder derived in [4] is shown below. During one iteration, the following steps are performed:
• Initialization: the k-th variable node sends its connected check nodes the channel message, for k = 1, . . . , n.
• Check node messages: after receiving the messages from its d connected variable nodes, the j-th check node calculates the message sent to its i-th connected variable node in three steps, for j = 1, . . . , n and i = 1, . . . , d:
1) Convolution step, where j_i is the index of the i-th variable node connected with the j-th check node, and f_{j_i,j}(x) is the message sent from the j_i-th variable node to the j-th check node.
2) Stretching step.
3) Periodic extension. Then, the j-th check node sends the message p_{j,j_i}(x) to its i-th connected variable node.
• Variable node messages: once the messages from its d connected check nodes are obtained, the k-th variable node calculates the message sent to its i-th connected check node in two steps, for k = 1, . . . , n and i = 1, . . . , d:
1) Product step, where k_l is the index of the l-th check node connected with the k-th variable node.
2) Normalization.
• Final decision: when the maximum number of iterations is reached, the final step is performed at every variable node: the codeword is estimated as x̂, coordinate by coordinate, and the integer message vector is detected as b̂ = ⌊Hx̂⌉ (rounding elementwise).
For the derivation of the above decoder, interested readers are referred to [4].
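To make the two node operations concrete, here is a minimal single-Gaussian caricature (means and variances only). This is a sketch, not the actual decoder: the real updates propagate Gaussian mixtures, and the periodic extension over all integers b of the check equation is omitted here.

```python
def check_to_var(h, means, variances, target, b=0.0):
    """Single-Gaussian caricature of a check node update for one edge.

    The check equation is sum_l h[l] * x_l = b for one fixed integer b;
    the decoder's periodic extension over all integers is omitted.
    """
    idx = [l for l in range(len(h)) if l != target]
    m = (b - sum(h[l] * means[l] for l in idx)) / h[target]
    v = sum(h[l] ** 2 * variances[l] for l in idx) / h[target] ** 2
    return m, v

def var_to_check(y, sigma2, means, variances):
    """Variable node update: product of the channel Gaussian N(y, sigma2)
    with the incoming check messages (precisions add)."""
    prec = 1.0 / sigma2 + sum(1.0 / v for v in variances)
    mean = (y / sigma2 + sum(m / v for m, v in zip(means, variances))) / prec
    return mean, 1.0 / prec
```

The convolution/stretching steps correspond to `check_to_var` (sums of means and variances, then division by the target entry), and the product step to `var_to_check`.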

2) CONVERGENCE OF THE VARIANCES OF VARIABLE NODE MESSAGES
As the iterations proceed, the variable node messages tend to single Gaussian functions. Let V_i denote the variance of the variable node message sent from the k-th variable node to its i-th connected check node. The limit of V_i as the number of iterations grows is characterized in [4]; in particular, V_i converges to zero on the edge whose entry of H has absolute value 1, while it remains nonzero on the edges with absolute value w.

C. OPERATIONS ON GAUSSIAN FUNCTIONS AND GAUSSIAN MIXTURES
To calculate variable node messages, the product of two Gaussian functions and the moment matching approximation are described below.

2) MOMENT MATCHING APPROXIMATION
A Gaussian mixture is a weighted sum of Gaussian functions. The moment matching approximation is commonly used [15], [16], [17] to approximate a Gaussian mixture by a single Gaussian function, which minimizes the Kullback-Leibler divergence between them [15]. With normalized weights c_s, means m_s, and variances v_s, the mean and variance of the approximating Gaussian function are m̄ = Σ_s c_s m_s and v̄ = Σ_s c_s (v_s + m_s²) − m̄², respectively.
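The moment matching collapse of a mixture to a single Gaussian can be sketched directly from the standard first- and second-moment formulas:

```python
import numpy as np

def moment_match(weights, means, variances):
    """Collapse a Gaussian mixture to a single Gaussian by matching the
    first two moments (which minimizes the KL divergence from the
    mixture to a single Gaussian)."""
    c = np.asarray(weights, dtype=float)
    c = c / c.sum()                       # normalize scaling coefficients
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    mean = np.dot(c, m)                   # E[x]
    var = np.dot(c, v + m ** 2) - mean ** 2   # E[x^2] - E[x]^2
    return mean, var
```

For instance, an equal mixture of N(−1, 1) and N(1, 1) is matched by a single Gaussian with mean 0 and variance 2.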

D. LIST SPHERE DECODING ALGORITHM
Given an arbitrary point q ∈ R^n and a lattice with generator matrix G, the LPE problem is to find the list of lattice points satisfying ‖q − Gz‖ < β. (19) The list sphere decoding (LSD) algorithm, a variant of SD [18], [19] with a list output, is an efficient tree search for solving the LPE problem. Suppose that G is an upper triangular matrix; then the search can be performed layer by layer using successive interference cancellation. The Euclidean distance between the query point q and the currently searched point is used as the metric. Let D_k² denote the squared partial Euclidean distance at the k-th layer. The metric is accumulated layer by layer, with k = n, n − 1, . . . , 1 and D_{n+1} = 0. In this paper, we consider the depth-first SD. The search starts from the n-th layer (root layer) and proceeds toward the first layer (leaf layer). If D_k ≥ β for k = n − 1, . . . , 1, we move up to the (k + 1)-th layer and update z_{k+1} following the Schnorr-Euchner ordering [22]. If D_k < β for k = n, . . . , 2, we move down to the (k − 1)-th layer. If the algorithm reaches the first layer (leaf layer) and D_1 < β, the currently searched lattice point is stored and the algorithm continues to seek another point satisfying (19). The algorithm stops when D_n ≥ β and outputs a list of integer vectors z.
In case G is not upper triangular, one can apply the QR decomposition G = UR, where U is an orthogonal matrix and R is an upper triangular matrix. Then, the constraint (19) becomes ‖U^T q − Rz‖² < β².
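A compact sketch of the enumeration is given below. It uses Pohst-style interval search per layer rather than the Schnorr-Euchner zig-zag ordering described above; both visit exactly the lattice points inside the sphere, the ordering only affects speed.

```python
import numpy as np

def list_sphere_decode(R, q, beta):
    """Enumerate all z in Z^n with ||q - R z|| < beta, for R upper
    triangular with positive diagonal. Depth-first search from the root
    layer (index n-1) to the leaf layer (index 0)."""
    n = R.shape[0]
    found = []

    def search(k, z, dist2):
        # residual target at layer k, given the fixed deeper coordinates
        r = q[k] - sum(R[k, j] * z[j] for j in range(k + 1, n))
        s = np.sqrt(beta ** 2 - dist2)          # remaining radius budget
        lo = int(np.ceil((r - s) / R[k, k]))
        hi = int(np.floor((r + s) / R[k, k]))
        for zk in range(lo, hi + 1):
            d2 = dist2 + (r - R[k, k] * zk) ** 2
            if d2 >= beta ** 2:
                continue                        # partial distance too large
            z[k] = zk
            if k == 0:
                found.append((list(z), d2))     # reached a leaf: store point
            else:
                search(k - 1, z, d2)
        z[k] = 0

    search(n - 1, [0] * n, 0.0)
    return found
```

For R = I this reduces to enumerating integer points in a Euclidean ball around q.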

III. LATTICE VIEWPOINT FOR LDLC DECODING
In this section, we first derive the expression of the variable node messages in more detail. Based on this derivation, we then elaborate on the lattice viewpoint for LDLC decoding. As in [15], [23], we apply the moment matching approximation to the variable node messages so that the messages exchanged between nodes are single Gaussian functions.

A. VARIABLE NODE MESSAGES WITH THE MOMENT MATCHING APPROXIMATION
As shown in (8), each variable node message is obtained by calculating the product of d − 1 check node messages and the channel message. Without loss of generality, we focus on the message sent from the k-th variable node to its d-th connected check node, where a_l and v_l are the mean and variance of p̃_{k_l,k}(x), the check node message before the periodic extension. For ease of presentation, in the sequel, we simplify the notation in (21) by letting p_l(x) = p_{k_l,k}(x) and h_l = H_{k_l,k}. Due to the product step, f̃_{k,k_d}(x) is a Gaussian mixture containing infinitely many Gaussian functions. For practical implementation, only a finite number of Gaussian functions can be considered. Assume that f̃_{k,k_d}(x) is approximated by N Gaussian functions, indexed by N integer vectors z_s as in (23), where C is a normalization factor such that Σ_{s=1}^N c_s = 1.
The mean (25) and scaling coefficient (26) can be written in the vectorized forms (27) and (28); the derivation of (28) from (26) is given in the Appendix. We can see that the s-th Gaussian function for approximating the variable node message corresponds to an integer vector z_s.
Next, to simplify the messages passed between nodes [15], [23], the moment matching approximation is used so that the variable node message is further approximated by a single Gaussian function. According to (17) and (18), to achieve a good approximation by moment matching, we seek a list of z_s giving sufficiently large c_s|m_s| and c_s m_s², as formalized in (31), where ρ_1 and ρ_2 are two scalar parameters that should be judiciously chosen to achieve the desired trade-off between approximation accuracy and search complexity. According to (27) and (30), it is intriguing to note the identity (32), from which bounds on c_s|m_s| and c_s m_s² follow as in (33) and (34).

B. LATTICE VIEWPOINT
The straightforward search for (31) is infeasible due to the complicated expressions of c_s|m_s| and c_s m_s². For tractability, we instead search for the list of z_s defined by (35). The search for (35) finds a list of z_s corresponding to Gaussian functions with sufficiently large scaling coefficients c_s > C e^{−β²/2}. Nevertheless, in what follows, we show that the list of z_s defined by (31) can still be found by the search for (35). Based on (32), the searched integer vectors in (35) correspond to Gaussian functions with bounded c_s|m_s| and c_s m_s², as in (37) and (38). Note that the search region associated with (35) is a sphere of radius β. Then, the list of z_s defined by (31) is a subset of that of (35) as long as the sphere of radius β covers the search region associated with (31). By applying the Cholesky factorization to Q, the constraint in (35) becomes (39), where the subscript s of z is omitted for ease of presentation and R is an upper triangular matrix satisfying Q = R^T R. Note that each lattice point Rz corresponds to a Gaussian function for approximating a variable node message. We now have a lattice viewpoint for the problem of finding the essential Gaussian functions: searching the list of z under the constraint (39) is an LPE problem with query point q = −R(h ∘ a). The LSD algorithm can thus be utilized to enumerate the desired integer vectors.

C. GEOMETRICAL INTERPRETATION OF DIFFERENT DECODERS
From the lattice viewpoint, we can geometrically compare the number of Gaussian functions used for approximating each variable node message in the M-Gaussian decoder [15] and the low complexity decoder [17]. Each Gaussian function corresponds to a lattice point. A 2-dimensional example is depicted in Fig. 1, where calculating each variable node message needs the product of two check node messages. For the 2-Gaussian decoder, the four lattice points on the blue solid edges, determined by taking the two closest lattice points to the query point in each lattice dimension, are used for approximating each variable node message. Similarly, the nine lattice points within the parallelogram shaped by the red dashed edges are considered by the 3-Gaussian decoder.
The decoder in [17] focuses on the parallelogram shaped by the blue solid edges, i.e., the region considered by the 2-Gaussian decoder. However, at most two lattice points (marked in purple in Fig. 1) within this region are considered by the decoder in [17]; these are, approximately, the lattice points nearest to the query point. Clearly, this decoder sacrifices approximation accuracy for the variable node messages.
A trade-off between decoding performance and complexity exists among the different decoders. The M-Gaussian decoder greedily considers all lattice points within a parallelogram, while the decoder in [17] only considers a fixed number of them. Consequently, the M-Gaussian decoder has better error performance but higher complexity than the latter. Besides, for approximating variable node messages, unimportant Gaussian functions may be used if M is too large, and essential Gaussian functions may be ignored if M is too small. To achieve a flexible trade-off, in this paper we consider the lattice points within a sphere centered at the query point, which can be found by applying the LSD algorithm. In the example of Fig. 1, five lattice points within the sphere of radius β are found by the LSD algorithm, which thus considers one more essential Gaussian function than the 2-Gaussian decoder while discarding four unimportant Gaussian functions used by the 3-Gaussian decoder.
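The geometric difference can be sketched on the integer lattice Z², an illustrative stand-in for the skewed lattice Rz of the paper: the M-Gaussian decoder keeps an M × M grid of the nearest points per dimension, while the sphere criterion keeps points by Euclidean distance.

```python
import numpy as np
from itertools import product

def grid_points(q, M):
    """M-Gaussian-style selection: the M nearest integers to q in each
    dimension, combined into an M x M x ... grid."""
    per_dim = []
    for qi in q:
        candidates = range(int(np.floor(qi)) - M, int(np.ceil(qi)) + M + 1)
        per_dim.append(sorted(candidates, key=lambda z: abs(z - qi))[:M])
    return {tuple(p) for p in product(*per_dim)}

def sphere_points(q, beta):
    """Sphere selection: integer points within Euclidean distance beta
    of the query point q."""
    lo = np.floor(q - beta).astype(int)
    hi = np.ceil(q + beta).astype(int)
    pts = set()
    for p in product(*[range(l, h + 1) for l, h in zip(lo, hi)]):
        if np.linalg.norm(np.array(p) - q) < beta:
            pts.add(p)
    return pts
```

Varying beta smoothly interpolates between selections the fixed M × M grids can only approximate.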
The interpretation from the lattice viewpoint indicates that the performance of the M-Gaussian decoder can be approached by choosing lattice points within a sufficiently large sphere.

IV. SIMPLIFIED LSD ALGORITHM AND THE PROPOSED LPE DECODER
In this section, based on the specific relationship between diagonal and off-diagonal entries of R, a simplified LSD algorithm is derived first. Then, the LPE decoder is introduced.

A. GENERATOR MATRIX OF THE LATTICE FOR THE LPE PROBLEM
This subsection provides an explicit expression for R, the generator matrix of the lattice involved in the LPE problem of Section III-B. Thanks to the specific structure of Q (see (29) and (30)), R can be obtained from the Cholesky factor R̃ of I − tt^T, which can be computed efficiently by the method proposed in [24]; one more scaling step (41) then yields R. The entries of R are given in (42), where 1 ≤ i ≤ d and the empty sum Σ_{l=1}^{0} is defined to be 0. It is notable that R_{dd} = 0 since Σ_{l=1}^{d} t_l² = 1, which means that the last row of R is all zeros. However, since z_d is fixed to zero according to our formulation in (23), the first d − 1 rows of R are sufficient for applying the LSD algorithm.
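The rank deficiency claimed above is easy to check numerically: for a unit-norm t, the matrix I − tt^T has rank d − 1, while its leading (d − 1) × (d − 1) block remains positive definite and admits a Cholesky factorization. (The paper's R additionally includes the scaling step (41), which is omitted in this sketch.)

```python
import numpy as np

rng = np.random.default_rng(1)
d = 7
t = rng.normal(size=d)
t /= np.linalg.norm(t)              # unit norm, so the t_l^2 sum to 1

A = np.eye(d) - np.outer(t, t)      # singular: rank is only d - 1
A_lead = A[: d - 1, : d - 1]        # leading block is positive definite
R = np.linalg.cholesky(A_lead).T    # upper triangular, A_lead = R^T R
```

This mirrors the statement in the text that the first d − 1 rows of R suffice once z_d is fixed to zero.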

B. SEARCH RADIUS FOR THE LSD ALGORITHM
The choice of the search radius affects the complexity of the LSD algorithm since it determines the number of visited nodes inside the search space. A heuristic initial choice is an upper bound on the largest basis length of the lattice spanned by R, which involves max_j 1/(h_j² v_j). However, according to (13), v_j converges to zero if |h_j| = 1, which would cause an infinitely large search radius. Thus, we set the initial search radius by only considering the entries with |h_j| = w, as in (44). By considering the right-hand side of (28) as a function of z_s, the radius can further be tightened using the squared partial distance D_B of the Babai point, so that candidates whose scaling coefficients are negligible are excluded; in this paper, the corresponding threshold is set to ε = 10^{−5}. Since we apply the LSD algorithm with the Schnorr-Euchner ordering, the Babai point is always the first candidate to be enumerated, so finding D_B introduces no additional complexity. By simulation, we observe that at the first iteration the LSD algorithm may output an empty list of candidates under the search radius (44), with small probability. In this rare case, the affected variable node message simply remains unchanged and waits to be updated at the next iteration. The probability of obtaining an empty list is presented in Section VI-C.
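The Babai point referred to above is obtained by successive rounding from the root layer down; under the Schnorr-Euchner ordering it is the first leaf the sphere search visits. A sketch for an upper-triangular R:

```python
import numpy as np

def babai_point(R, q):
    """Babai nearest-plane point for an upper-triangular R: round layer
    by layer from the root (last coordinate) to the leaf (first)."""
    n = R.shape[0]
    z = np.zeros(n, dtype=int)
    for k in range(n - 1, -1, -1):
        r = q[k] - R[k, k + 1:] @ z[k + 1:]   # cancel deeper layers
        z[k] = int(np.round(r / R[k, k]))
    return z
```

Its squared distance ‖q − Rz‖² then serves as the D_B used to tighten the radius.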

C. SIMPLIFIED LSD ALGORITHM FOR LDLC DECODING
The LSD algorithm can be simplified by exploiting the specific expressions of R_{ij} and R_{ii} in (42). Denoting p = h ∘ a, the left-hand side of (39) can be expanded as in (46), with the per-layer terms γ_i defined in (47) for i = 1, . . . , d − 1. In this subsection, we mainly focus on simplifying the calculation of γ_i. Define two d-dimensional vectors f and g with elements given by (48) and (49) for i = 1, . . . , d. According to (42), substituting (50) into (47) yields (51). To further simplify the calculation, we introduce the vector u defined in (52); substituting (52) back into (51) gives the simplified expression (53) for i = 1, . . . , d − 1.
By introducing the vectors f, g, and u, we simplify the calculation of γ_i. By (42), (48), and (49), the quantities involved can further be obtained recursively across layers.

D. LPE DECODER
The proposed decoder based on the lattice viewpoint is introduced in this subsection. For computing check node messages, the convolution and stretching steps are the same as (5) and (6), respectively. Since the integers in the periodic extension step (7) are found by the LSD algorithm under our formulation, we can avoid the periodic extension step. Thus, only the steps for obtaining variable node messages are presented.
• Variable node messages: to compute the message sent from the k-th variable node to one of its connected check nodes, 1) obtain the input messages sent from the d − 1 remaining connected check nodes, where l = 1, . . . , d − 1.

Algorithm 1: Simplified List Sphere Decoding
Input: f, g, p, t, u_d, R_{ii}², β (see (45), (48), (49), and (52)). Output: the candidate list L. If L is empty, the variable node message remains unchanged and waits to be updated at the next iteration. Otherwise, for each candidate in L, calculate the corresponding Gaussian function according to (24), (25), and (28), and obtain the Gaussian mixture by taking the sum of these Gaussian functions.
Note that the scaling coefficients in the mixture should be normalized by their sum. Then the variable node message is approximated by applying the moment matching approximation to the Gaussian mixture.

V. COMPLEXITY ANALYSIS FOR THE PROPOSED DECODER

A. COMPLEXITY ORDER ANALYSIS
The complexity order of our proposed decoder is determined by that of computing check node messages and variable node messages. As presented in Section IV-D, the complexity of computing a variable node message is further dominated by that of the LSD algorithm. In this subsection, we first analyze the complexity order of the simplified LSD algorithm. Then, the complexity of the proposed LPE decoder is presented.

1) COMPLEXITY ORDER OF SIMPLIFIED LSD ALGORITHM
We first analyze the complexity of the LSD algorithm, which is the most complicated step in computing the variable node messages. The LSD algorithm traverses d − 1 layers and its complexity depends on the number of visited nodes at each layer. Each node at the k-th layer represents a value of the integer z_k.
Define P_k(β) as the set of searched partial integer vectors z_{k:d−1} ≜ [z_k, . . . , z_{d−1}]^T. Due to the constraint of the search radius β, the visited nodes at the k-th layer consist of the nodes satisfying z_{k:d−1} ∈ P_k(β) and those simultaneously satisfying z_{k:d−1} ∉ P_k(β) and z_{k+1:d−1} ∈ P_{k+1}(β). Thus, the number of visited nodes at the k-th layer is |P_k(β)| + |P_{k+1}(β)|, which gives the complexity of searching that layer (56); summing over all layers gives the overall complexity (57). The cardinality |P_k(β)| can be estimated by the volume ratio between the search sphere and the fundamental region of the lattice spanned by R_{k:d−1,k:d−1} [24], as in (58), where V_{d−k} is the volume of a (d − k)-dimensional ball of unit radius, which is upper bounded by 5.2638 [24]. According to (42), the product of the diagonal entries can be bounded as in (59), which yields (60). For ease of analysis, assume that the variances v_i of the check node messages are approximately equal whenever the corresponding |h_i| are equal; this approximation holds accurately after the first few iterations. Because of the convolution (5) and stretching (6) steps, the variance of the i-th input check node message is given by (61), where V̄_j is the variance of the variable node message under the moment matching approximation from the last iteration. Combining (13) and (61) relates these variances to σ², as in (62). Besides, β ≤ β_1 due to (45), which gives (63). We are now prepared to determine the order of |P_k(β)| from the relationship between σ² and v_i. Combining (58), (60), (62), and (63), two cases are considered, depending on whether one of the check node messages corresponds to |h_i| = 1:
• When |h_k| = · · · = |h_{d−1}| = w, the bound (64) follows, where step (a) uses w = α/√(d − 1) and the fact that α is a constant.
• When |h_k| = · · · = |h_{i−1}| = |h_{i+1}| = · · · = |h_{d−1}| = w and |h_i| = 1, the bound (65) follows.
According to (61), the bound for the second case can be further expressed as (66). By combining (64) and (66), the complexity of the LSD algorithm is given in (67). The complexity of computing a variable node message is dominated by that of the LSD algorithm. As mentioned in Section IV-C, the complexities of computing the inputs to the simplified LSD algorithm are all O(d). According to (67), the search complexity of the LSD algorithm is also O(d). For computing the check node messages, the convolution of several Gaussian functions amounts to summing their means and variances, whose complexity is again O(d).
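The volume-ratio estimate in (58) can be sketched directly; the same code also reproduces the stated bound on the unit-ball volume, which is maximized at dimension 5.

```python
import numpy as np
from math import gamma, pi

def ball_volume(m):
    """Volume of an m-dimensional Euclidean ball of unit radius."""
    return pi ** (m / 2) / gamma(m / 2 + 1)

def expected_points(R_sub, beta):
    """Volume-ratio estimate of |P_k(beta)|: sphere volume divided by the
    volume of the fundamental region of the sublattice spanned by the
    upper-triangular R_sub (the product of its diagonal entries)."""
    m = R_sub.shape[0]
    fund_vol = abs(np.prod(np.diag(R_sub)))
    return ball_volume(m) * beta ** m / fund_vol
```

For the identity lattice in 2 dimensions with beta = 1, the estimate is simply the area of the unit disk, π.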
Thus, we conclude that the complexity order of the LPE decoder is O(n · t · d). It is worth noting that LDLCs achieve slightly better error performance than the multilevel LDPC lattices [27] of the same dimensions. With linear complexity, the proposed LPE decoder further makes LDLCs competitive with the multilevel LDPC lattices.

B. ANALYSIS ON THE NUMBER OF OPERATIONS
Although the complexity order for the LPE decoder has been analyzed in Section V-A, for evaluating the complexity more explicitly, the comparison of floating-point operations (flops) between the LPE decoder and the M-Gaussian decoder [15] is presented in this subsection. A flop is assumed to include a real addition, subtraction, multiplication, or division. Besides, computing a square root needs 6 flops according to the IEEE floating-point representation [28] and 14 flops are needed for approximating an exponential function [29].

1) LPE DECODER
For one particular check node message, in the convolution step, calculating the mean needs d − 1 multiplications and d − 2 additions. In the stretching step, one more division is needed. Similarly, 2(d−1) multiplications and d−2 additions are needed to calculate the variance in the convolution step and two divisions are used in the stretching step. Thus, 5d−4 flops are counted for each check node message. For one particular variable node message, four d-dimensional vectors t, f, g, p, and the squared diagonal entries of R should be calculated first. The elements in these vectors can be either directly computed or iteratively obtained as mentioned before. Overall, computing these vectors needs 23d flops.
For the simplified LSD algorithm, 12 flops (steps 14 to 16 in Algorithm 1) are required whenever a node is visited at one layer. Then, Σ_{k=1}^{d−1} 12N_k flops are needed for the search, where N_k is the number of visited nodes at the k-th layer. Then, for each candidate in L, a Gaussian function is computed as described in (24), (25), and (28). Note that all Gaussian functions have the same variance, which needs 2d flops and should be computed only once for all candidates. Besides, calculating the scaling coefficient is simple since the value of (z + h ∘ a)^T Q(z + h ∘ a) is already given in the list output of Algorithm 1. Taking the normalization of the scaling coefficients into consideration, the number of flops for computing one variable node message is 12 Σ_{k=1}^{d−1} N_k + (4d + 15)L + 2d, with L being the list size. Finally, the moment matching approximation needs 6L flops.
It is critical to emphasize that N_k and L are related to P_k(β), defined in Section V-A. Explicitly, we have N_k = |P_k(β)| + |P_{k+1}(β)| and L ≤ |P_1(β)|. From (64) and (66), N_k and L are O(1). Therefore, the number of flops needed per variable node message is linear in d. In each variable node, the LPE decoder and the M-Gaussian decoder need to store L and M^{d−1} Gaussian functions, respectively. Since L is merely O(1), the LPE decoder needs less storage than the M-Gaussian decoder.

2) M-GAUSSIAN DECODER
Similar to the LPE decoder, for one check node message, the convolution and stretching steps take 5d − 4 flops. An additional 3M + 1 flops are needed in the periodic extension step.
For updating the variable node messages, a forward-backward recursion is utilized in [15] to compute the d messages sent from one variable node simultaneously. For the comparison, we first count the number of flops needed to calculate all d messages and then use the average over d as the computational cost per message. Given d input check node messages p_l(x), the forward-backward recursion first computes two sequences of auxiliary messages: θ_l(x) for l = 1, . . . , d − 1 (with θ_0(x) = 1) and φ_l(x) for l = d, d − 1, . . . , 2 (with φ_{d+1}(x) = 1). Then, each message sent from the k-th variable node is computed by multiplying two auxiliary messages as well as the channel message. Counting the flops needed for these calculations is straightforward from (14)–(16). However, it should be mentioned that each auxiliary message is a Gaussian mixture, and normalization of the scaling coefficients is always needed. Besides, the variances of all Gaussian functions in a mixture are the same and should be computed only once. The numbers of flops needed per message for the LPE decoder and the M-Gaussian decoder [15] are summarized in Table 2.
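The forward-backward trick can be sketched with single Gaussians standing in for the mixtures of [15]: all d leave-one-out products are obtained in O(d) Gaussian products rather than the O(d²) of the naive approach.

```python
def g_product(a, b):
    """Product of two Gaussians (mean, var) -> (mean, var), up to scale:
    precisions add, and the mean is the precision-weighted average."""
    (m1, v1), (m2, v2) = a, b
    v = 1.0 / (1.0 / v1 + 1.0 / v2)
    return (v * (m1 / v1 + m2 / v2), v)

def forward_backward(messages, channel):
    """Leave-one-out Gaussian products via forward/backward partial
    products (requires at least two input messages)."""
    neutral = (0.0, float("inf"))       # zero-precision neutral element
    fwd = [neutral]
    for m in messages:                  # fwd[l] = product of messages[:l]
        fwd.append(g_product(fwd[-1], m))
    bwd = [neutral]
    for m in reversed(messages):        # bwd[l] = product of the last l
        bwd.append(g_product(bwd[-1], m))
    d = len(messages)
    # message k excludes input k: combine prefix, suffix, and channel
    return [g_product(g_product(fwd[k], bwd[d - 1 - k]), channel)
            for k in range(d)]
```

In the actual decoder each θ_l(x) and φ_l(x) is a mixture, so every product also multiplies out scaling coefficients, but the recursion structure is identical.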

VI. NUMERICAL RESULTS
Since the M-Gaussian decoder with a sufficiently large M achieves the best-known performance [15] and can be interpreted from the proposed lattice viewpoint, it is employed as the benchmark for comparison with the LPE decoder in the following aspects.

A. NOISE THRESHOLD
The asymptotic performance of the LPE decoder is first evaluated by considering the noise threshold. However, the true density evolution needs the joint distribution of the means and variances of messages, which is computationally intractable for LDLCs. Alternatively, we perform the Monte Carlo density evolution [15], [30] for acquiring the noise threshold.
We first define two sets of messages, M^(1) and M^(w). Messages in M^(1) correspond to |h_i| = 1 and those in M^(w) to |h_i| = w. The set M^(1) contains 10^5 messages, while M^(w) has (d − 1) · 10^5 messages. All messages are represented by their means and variances. For initialization, all means of messages in both sets are randomly generated from the Gaussian distribution N(0, σ^2) and all variances are set to σ^2.
For the inputs of the check/variable nodes, one message is drawn from M^(1) and d − 1 messages are taken from M^(w), all randomly. The calculation of the check/variable node messages follows the decoding steps of the LPE decoder. The outputs of the check/variable nodes become the updated message sets for the variable/check nodes iteratively. Convergence is declared when the mean of the message variances in M^(w) drops below 0.001 within 50 iterations [15], [30].
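The bookkeeping of this Monte Carlo density evolution can be sketched as follows. The actual LPE node computation is abstracted into a placeholder `update` callable, and the alternating check/variable updates are collapsed into a single step, so this is only a schematic of the sampling and convergence test under our own naming, not the true density evolution:

```python
import random

def monte_carlo_de(update, sigma2, d, pop=1000, max_iter=50, tol=1e-3):
    """Schematic Monte Carlo density evolution: messages are (mean, variance)
    pairs; `update` stands in for the LPE node computation, mapping one
    message drawn from M1 and d-1 messages drawn from Mw to a new message.
    Convergence is declared when the mean variance in Mw drops below tol."""
    m1 = [(random.gauss(0.0, sigma2 ** 0.5), sigma2) for _ in range(pop)]
    mw = [(random.gauss(0.0, sigma2 ** 0.5), sigma2)
          for _ in range((d - 1) * pop)]
    for it in range(1, max_iter + 1):
        mw = [update(random.choice(m1), random.sample(mw, d - 1))
              for _ in range(len(mw))]
        mean_var = sum(v for _, v in mw) / len(mw)
        if mean_var < tol:
            return it        # iteration at which convergence is declared
    return None              # no convergence within max_iter

# Toy update that quarters the variance each iteration (illustration only)
print(monte_carlo_de(lambda m, others: (0.0, others[0][1] * 0.25), 1.0, 7, pop=50))
```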
As reported in [15], increasing M beyond 3 provides no visible improvement in the error performance. We therefore only compare the noise thresholds of LDLCs employing our decoder with those using the 3-Gaussian decoder, as indicated in Fig. 2. Since the value of d is commonly set to 7 [4], we also set d = 7 in the simulation. The noise threshold gives the error-performance limit of a decoder as the code length tends to infinity. As will be shown in Section VI-B, the gaps from capacity of the different decoders approach the noise thresholds indicated in Fig. 2. The lowest noise threshold is 0.64 dB, attained with α = 0.75. The 3-Gaussian decoder and the LPE decoder achieve almost the same noise thresholds.

B. FINITE LENGTH PERFORMANCE SIMULATION
Assuming that no power constraint is considered, the symbol error rate (SER) performance for different code parameters is shown in Fig. 3. Note that a symbol error is declared if b̂_i ≠ b_i for i = 1, . . . , n. The generating sequences are {1, 1/√3, 1/√3} for n = 10^2 with d = 3, and {1, 1/√7, . . . , 1/√7} for n = 10^3 and 10^4 with d = 7, respectively. Simulation results are obtained by assuming that the all-zero codeword is transmitted through the AWGN channel, and the maximum number of iterations is set to 100. As illustrated in Fig. 3, the SER performance of the LPE decoder and the M-Gaussian decoder is almost the same for n = 10^2 and n = 10^3. However, for n = 10^4, the LPE decoder outperforms the 2-Gaussian decoder by 0.15 dB at the SER of 10^{-5}, while still achieving almost the same performance as the 3-Gaussian decoder. The 2-Gaussian decoder may miss some essential Gaussian functions when approximating variable node messages, whereas the LPE decoder achieves a better approximation. As a consequence, the LPE decoder provides better SER performance than the 2-Gaussian decoder.
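Under this error-counting convention, the SER computation itself is a simple tally over all decoded symbols; a minimal sketch (names and example data are ours):

```python
def symbol_error_rate(decoded, transmitted):
    """Count a symbol error for every position i where the decoded symbol
    b_hat_i differs from the transmitted b_i, over all simulated codewords."""
    errors = sum(b_hat != b
                 for cw_hat, cw in zip(decoded, transmitted)
                 for b_hat, b in zip(cw_hat, cw))
    total = sum(len(cw) for cw in transmitted)
    return errors / total

# One wrong symbol out of six transmitted symbols
print(symbol_error_rate([[0, 0, 1], [0, 1, 0]], [[0, 0, 0], [0, 1, 0]]))
```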

C. PROBABILITY EVALUATION FOR EMPTY LIST OUTPUT OF LSD ALGORITHM
As mentioned in Section IV-B, the LSD algorithm may output an empty list with low probability. We evaluate this probability by decoding 100 noisy codewords with parameters n = 10^4 and d = 7. Note that each variable node generates d messages, so there are 7 × 10^6 samples under evaluation in total. An empty list output is only found at the first iteration. The variable node message is not updated if an empty list is output. We provide the probability of an empty list output at the first iteration in Table 3. The probability of an empty list is around 10^{-5}. In other words, for decoding one codeword, the message updating could be delayed by at most one iteration with a probability of 10^{-5}.
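The sample count and the order of magnitude of the probability follow from trivial arithmetic; the count of 70 empty lists below is purely illustrative and not a figure reported in the paper:

```python
# Total number of variable node messages examined: 100 codewords, each with
# n = 10^4 variable nodes emitting d = 7 messages.
codewords, n, d = 100, 10_000, 7
samples = codewords * n * d
print(samples)                 # 7,000,000 samples in total

# Hypothetical count of empty lists (illustrative only, not from the paper):
empty = 70
print(empty / samples)         # an empirical probability of about 1e-05
```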

D. MESSAGE VISUALIZATION
To examine the approximation accuracy, the variable node messages in the LPE decoder are visualized in this subsection, together with those obtained by applying the 3-Gaussian decoder. For different iterations, messages in the 3-Gaussian decoder and the LPE decoder are compared for the same pair of variable and check nodes. Since the LPE decoder may encounter an empty list output with a probability around 10^{-5}, the comparison is shown for both the non-empty list case and the empty list case.
The message visualization for the non-empty list case is depicted in Fig. 4, where messages before and after the moment matching approximation are shown. At the first iteration, the messages in the two decoders are slightly different, but they become almost identical as the iterations proceed. The similarity of the messages indicates that our decoder design incurs almost no degradation in approximation accuracy, which further validates the consistent noise thresholds and error performance of the 3-Gaussian and the LPE decoder reported in the previous subsections.
The message comparison for the empty list case is shown in Fig. 5. At the first iteration, the message in the LPE decoder is set to the channel message since it cannot be updated, as shown in Fig. 5(a). Although the messages in the two decoders differ in their means and variances at early iterations, they still become similar after sufficient iterations. In other words, the effect caused by an empty list at the first iteration can be remedied as long as the number of iterations is sufficiently large.

E. COMPARISON ON THE NUMBER OF FLOPS
As analyzed in Section V-B, the number of flops needed to compute a variable node message depends on the number of visited nodes N_k and the list size L of the simplified LSD algorithm. Since the number of visited nodes at each layer and the list size may vary across variable node messages, their average values are considered [31]. We first evaluate the average number of visited nodes and the average list size per variable node message numerically. Then, we calculate the average number of flops needed per message for both the M-Gaussian decoder and the LPE decoder.
The average list sizes at different iterations are depicted in Fig. 6 for n = 10^3 and 10^4 with different values of VNR. The maximum average list size does not exceed 13 in all considered cases, which indicates that the best-known performance can potentially be achieved with fewer Gaussian functions than used in the M-Gaussian decoder. Besides, the trend of the average list size matches well with the inherent behavior of the message passing decoder. At early iterations, the average list size increases since the decoder tries to approximate every variable node message with a maximum number of Gaussian functions. After reaching its peak, the list size starts decreasing because of the convergence property of the decoder. In particular, the list size becomes one when convergence is declared, which means that only one Gaussian function is dominant in approximating each variable node message. Moreover, the convergence speed of the list size is sensitive to the value of VNR: it takes only 10 iterations to reach convergence at high VNR (2 dB), but more than 40 iterations at low VNR (0.7 dB). The average number of visited nodes at each layer is shown in Fig. 7 by simulation with n = 10^3, 10^4 and d = 7. Since the number of visited nodes may vary with iterations and values of VNR, the first 50 iterations are considered at both low and high VNR. The trend of the average number of visited nodes is consistent with that of the average list size.
For the M-Gaussian decoder, the number of flops needed per variable node message is a constant regardless of the iteration index, as given in Table 2. For the LPE decoder, by contrast, the number of flops depends on the average number of visited nodes and the list size, both of which vary with iterations. For this reason, the maximum average number of visited nodes and list size over all iterations are used to compute the flop count of the LPE decoder. In Fig. 8, we show the average numbers of flops needed per variable node message for the LPE decoder and the M-Gaussian decoders with M = 2 and 3. For d ≥ 4, the LPE decoder needs fewer flops on average than the M-Gaussian decoders. In particular, for d = 7, the number of flops of the LPE decoder is 33.9% of that of the 2-Gaussian decoder and only 3.2% of that of the 3-Gaussian decoder.

F. RUNTIME COMPARISON
The runtime comparison between the M-Gaussian decoder and the LPE decoder is shown in Fig. 9, measured in MATLAB 2017b on a single computer with an Intel Core i5-6500 CPU, 8 GB of RAM, and the Windows 10 Enterprise operating system. By choosing n = 10^3, the exponential and linear complexity orders in d of the two decoders are confirmed in Fig. 9. The LPE decoder always has a shorter runtime than the 3-Gaussian decoder and becomes faster than the 2-Gaussian decoder when d exceeds 5. The result of the runtime comparison is consistent with the comparison of the number of flops in Section VI-E. Specifically, for d = 7, the LPE decoder achieves around (1 − 0.0791/0.4865) × 100% = 83.7% runtime saving compared to the 3-Gaussian decoder.
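The quoted saving follows directly from the two measured runtimes read off Fig. 9; a one-line check:

```python
# Runtimes (in seconds) for d = 7, n = 10^3, as read off Fig. 9
lpe_runtime = 0.0791
gauss3_runtime = 0.4865
saving = (1.0 - lpe_runtime / gauss3_runtime) * 100.0
print(f"{saving:.1f}% runtime saving")   # ~83.7%
```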

VII. CONCLUSION
In this paper, an efficient LPE decoder for LDLCs has been proposed from the lattice viewpoint. For approximating the variable node messages in the form of Gaussian mixtures, each Gaussian function is first related to a lattice point in a specific lattice. A simplified LSD algorithm is then derived for efficiently enumerating the lattice points corresponding to essential Gaussian functions. Compared to the M-Gaussian decoder whose complexity is O(n · t · M^{d−1}), the LPE decoder has only linear complexity O(n · t · d). Nevertheless, the LPE decoder still achieves almost the same noise threshold and error performance as the M-Gaussian decoder. As an example, for n = 1000 and d = 7, the proposed LPE decoder reduces the number of flops and the runtime by 96.8% and 83.7%, respectively, compared to the 3-Gaussian decoder.

APPENDIX DERIVATION OF (28) FROM (26)
Define â_i ≜ a_i + z_{s,i}/h_i. The exponent of (26) then becomes (71). Let â = [â_1, . . . , â_d]^T. The first term on the right-hand side of (71) can be rewritten using (24) in step (b). The second term on the right-hand side of (71) is expanded accordingly.