Low Complexity Iterative Receiver With Lossless Information Transfer for Non-Binary LDPC Coded PDMA System

In this work, we propose a non-binary low-density parity-check (NB-LDPC) coded pattern division multiple access (PDMA) scheme in which the order of the Galois field equals the size of the modulation alphabet. This avoids the symbol-to-bit and bit-to-symbol probability conversions between the detector and the decoder that are required in binary coded systems. Specifically, we consider a 4-ary LDPC code over the Galois field GF(4) (GF(4)-LDPC) in a PDMA system with quadrature phase shift keying (QPSK) modulation. At the receiver side, a Gaussian approximation based message passing (GAMP) detection algorithm is employed instead of standard message passing (SMP) to achieve a tradeoff between computational complexity and detection performance. When the iterative detection and decoding (IDD) algorithm is used, the symbol-wise extrinsic information of the detector and the GF(4)-LDPC decoder can be exchanged without information loss. Finally, we propose a symbol-wise EXIT (S-EXIT) based iterative optimization algorithm to improve the system performance. Both the S-EXIT chart based analysis and the numerical simulation results confirm the validity of the proposed scheme.


I. INTRODUCTION
Currently, non-orthogonal multiple access (NOMA) is a hot research topic for fifth-generation and beyond wireless communication systems. The philosophy of NOMA not only coincides with the information-theoretic approach to achieving the multiple access channel capacity, but also supports more users over limited resources, which is important for communication systems with high spectral efficiency and high throughput. In recent years, a large number of results have been reported on this topic. For example, in [1], a sparse spreading signature based NOMA scheme called low-density signature multiple access (LDSMA) is proposed, where a factor graph based message passing algorithm is used for multiple user detection (MUD) at the receiver side. The proposed sparse signature sequence combined with the message passing detection algorithm can efficiently reduce the computational complexity of the detector [2]. In [3], a modified NOMA scheme called pattern division multiple access (PDMA) is proposed, which employs unequal diversity pattern sequences to accelerate the convergence of the message passing based detector. The above two schemes employ the same kind of message passing based detection algorithm, which is referred to as the standard message passing (SMP) algorithm in this paper. Taking a closer look at the SMP algorithm, we find that the message update at the function node decoder (FND) has exponential computational complexity. When the system overload factor or the constellation size is large, the computational complexity is extremely high and sometimes becomes intolerable.
The associate editor coordinating the review of this manuscript and approving it for publication was Rui Wang.
In [4] and [5], the Gaussian approximation based message passing (GAMP) algorithm has been used in multiple-input multiple-output (MIMO) systems, especially massive MIMO systems, where the SMP algorithm is impractical, and has shown its effectiveness and advantage over traditional linear detection algorithms such as zero-forcing and minimum mean square error detection or their variants. The difference between the GAMP algorithm and the SMP algorithm lies in the FND: the former models the received combined signal as a Gaussian distributed random variable, which leads to linear computational complexity, while the latter exhibits exponential computational complexity since it is a chip-wise maximum a posteriori (MAP) detector.
Low-density parity-check (LDPC) codes constructed over a Galois field (GF) of order q, i.e. GF(q)-LDPC codes, can achieve better performance than their binary counterparts, especially in high-order modulation systems [6], [7]. However, a key obstacle to the wide use of non-binary LDPC codes is their high decoding complexity when the GF order is high [8]. This drawback does not play a leading role in a PDMA system with quadrature phase shift keying (QPSK), since we only consider LDPC codes over a low-order GF, i.e. GF(4)-LDPC codes, which have reasonable computational complexity. The main contributions are summarized as follows.
• We propose a GF(4)-LDPC coded PDMA scheme with a QPSK modulation constellation. Furthermore, a GAMP based detection algorithm is employed to trade off computational complexity against performance. The GAMP algorithm has linear complexity, which makes it suitable for the high overload case.
• The GF(4)-LDPC decoder has reasonable computational complexity and low memory consumption compared with high-order LDPC codes such as q ≥ 16, and can be coupled seamlessly with the GAMP-based MUD without information loss for a QPSK system.
• We propose a symbol-wise EXIT (S-EXIT) chart based iterative optimization algorithm to further improve the receiver performance by optimizing the non-binary LDPC degree distribution.
In this paper, we use the same notation as in reference [5], in which $\mathcal{N}_c(x; m, \sigma^2) = \frac{1}{\pi\sigma^2}\exp\left(-|x - m|^2/\sigma^2\right)$ denotes a complex Gaussian probability density function, i.e. $x$ is a complex Gaussian random variable with mean $m$ and variance $\sigma^2$.
The remainder of this paper is organized as follows. In Section II, the system model of PDMA system is presented. Section III is about the SMP algorithm and low-complexity GAMP algorithm. In Section IV, the proposed symbol-wise iterative optimization algorithm is described in detail. Numerical results are presented in Section V, followed by concluding remarks in Section VI.

II. SYSTEM MODEL
In this work, we assume that orthogonal frequency division multiplexing (OFDM) is available for the uplink system. Consider a GF(4)-LDPC coded PDMA system with QPSK modulation, in which $N_t$ users transmit over $N_r$ resource elements (REs) to communicate with the base station (BS) simultaneously. For a PDMA system, the system overload factor is defined as $\beta = N_t/N_r$ with $\beta > 1$ [9]. For each user, the source bits are encoded with a GF(4)-LDPC encoder, and the coded symbols are directly mapped to the QPSK constellation $\mathcal{A}$ according to the LTE standard [10]. In the next step, each QPSK symbol is spread onto the $N_r$ REs using a user-specific pattern sequence (PS) of length $N_r$ [11]. Let $\mathbf{S} \in \{0, 1\}^{N_r \times N_t}$ denote the pattern matrix (PM) of the PDMA system. As shown in Fig. 1, $\mathbf{S}_{2,3}$ and $\mathbf{S}_{3,6}$ are the PMs corresponding to PDMA systems with 150% and 200% overload, respectively [2]. Finally, the resulting signal is transmitted over the wireless channel. For simplicity, we assume that all users and the BS are equipped with a single antenna. Furthermore, we consider the case in which all users transmit with equal power, which is the worst case from the multiuser detection perspective. At the receiver side, the received signal associated with the $N_r$ REs can be expressed as

$$\mathbf{y} = \sum_{i=1}^{N_t} \mathrm{diag}(\mathbf{h}_i)\,\mathbf{s}_i x_i + \mathbf{n}, \tag{1}$$

where $\mathbf{h}_i = [h_{1,i}, h_{2,i}, \cdots, h_{N_r,i}]^T$ is the $N_r \times 1$ complex-valued vector of channel coefficients from user $i$ to the BS over the $N_r$ REs. More specifically, for $1 \le j \le N_r$, $h_{j,i}$ denotes the channel coefficient from user $i$ to the BS over RE $j$. For a vector $\mathbf{h}_i$, $\mathrm{diag}(\mathbf{h}_i)$ returns a diagonal matrix whose diagonal is given by $\mathbf{h}_i$. The $N_r \times 1$ vector $\mathbf{s}_i = [s_{1,i}, s_{2,i}, \cdots, s_{N_r,i}]^T$ is the PS of user $i$ mentioned above. Let $x_i$ denote the transmitted modulation symbol of user $i$, and let $\mathbf{n}$ be the $N_r$-dimensional Gaussian noise vector, i.e. $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_{N_r})$. For the $j$-th RE, the received baseband signal $y_j$ can also be expressed as

$$y_j = \sum_{i=1}^{N_t} h_{j,i}\, s_{j,i}\, x_i + n_j, \tag{2}$$

where $n_j$ is the $j$-th element of the Gaussian noise vector $\mathbf{n}$ in (1), i.e. $n_j \sim \mathcal{CN}(0, \sigma_n^2)$.
VOLUME 8, 2020
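As a rough illustration of the uplink model in this section, the following sketch generates the received samples on each RE for one symbol interval. The 2 x 3 pattern matrix, the QPSK labeling and the helper names `pdma_receive` and `overload_factor` are assumptions made for this example; the actual patterns are those of Fig. 1 and the actual mapping is the LTE one.

```python
import math
import random

# Unit-energy QPSK alphabet (labeling is illustrative, not the LTE mapping).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]

# Hypothetical 2 x 3 pattern matrix: 3 users on 2 REs, i.e. 150% overload.
S_2_3 = [[1, 1, 0],
         [1, 0, 1]]


def pdma_receive(symbols, pattern, channel, noise_var, rng):
    """Return y_j = sum_i h[j][i] * s[j][i] * x[i] + n_j for every RE j."""
    n_r, n_t = len(pattern), len(pattern[0])
    sigma = math.sqrt(noise_var / 2.0)  # standard deviation per real dimension
    received = []
    for j in range(n_r):
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        signal = sum(channel[j][i] * pattern[j][i] * symbols[i]
                     for i in range(n_t))
        received.append(signal + noise)
    return received


def overload_factor(pattern):
    """beta = N_t / N_r for an N_r x N_t pattern matrix."""
    return len(pattern[0]) / len(pattern)
```

With a unit channel and zero noise, the first RE simply carries the sum of the symbols of users 1 and 2, which makes the collision structure of the pattern matrix easy to inspect.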

III. SYMBOL-WISE GAUSSIAN APPROXIMATION MESSAGE PASSING BASED MUD
In this section, we first give a review of the SMP algorithm and GAMP algorithm. It can be shown that for QPSK modulation and PDMA system, the computational complexity of the MUD can be significantly reduced by the Gaussian approximation.

A. QUASI-OPTIMAL STANDARD MESSAGE PASSING (SMP) BASED DETECTION ALGORITHM
Under the factor graph framework, as shown in Fig. 2, the detector can be partitioned into two kinds of nodes: one is the function node decoder (FND), which is associated with the received signal $y_j$ of each resource element (RE), $1 \le j \le N_r$; the other is the variable node decoder (VND), which corresponds to the transmitted signal $x_i$, $1 \le i \le N_t$. The message passing algorithm operates by iteratively exchanging extrinsic information between these two kinds of nodes along the edges. The SMP algorithm can be described by formulas (3) and (4) [12], where $p_{x_i \to y_j}(x_i)$ denotes the extrinsic probability information from variable node $x_i$ to function node $y_j$, $p_{y_j \to x_i}(x_i)$ is the extrinsic probability information in the opposite direction, and, consistent with (2), $f(y_j|\mathbf{x}) = \frac{1}{\pi\sigma_n^2}\exp\left(-\frac{1}{\sigma_n^2}\left|y_j - \sum_{i} h_{j,i} s_{j,i} x_i\right|^2\right)$. In the above two formulas, $\chi(i)\backslash j$ denotes the set of function nodes neighboring variable node $i$ except function node $j$. In principle, if the factor graph of the MUD is cycle-free, SMP can achieve the same performance as the MAP algorithm. However, for a large modulation alphabet or a highly overloaded PDMA system, the SMP algorithm becomes impractical.
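To make the source of the exponential cost concrete, the following is a minimal sketch of a sum-product FND update that enumerates every joint hypothesis of the colliding users. It follows the generic sum-product rule rather than the exact form of formulas (3) and (4); the function name `smp_fnd_update` and the QPSK labeling are assumptions for illustration.

```python
import math
from itertools import product

# Unit-energy QPSK alphabet (labeling is illustrative).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def smp_fnd_update(y_j, h, priors, noise_var, constellation=QPSK):
    """Brute-force FND update on one RE.

    h[i]       : channel coefficient of the i-th user colliding on this RE
    priors[i]  : incoming message p_{x_i -> y_j} as a list over the alphabet
    returns    : normalized outgoing messages p_{y_j -> x_i}, one per user
    """
    d_c, q = len(h), len(constellation)
    out = [[0.0] * q for _ in range(d_c)]
    # The loop below visits all |A|^{d_c} joint symbol hypotheses.
    for combo in product(range(q), repeat=d_c):
        mean = sum(h[i] * constellation[a] for i, a in enumerate(combo))
        like = math.exp(-abs(y_j - mean) ** 2 / noise_var)
        for i in range(d_c):
            weight = like
            for k in range(d_c):
                if k != i:  # extrinsic: skip the prior of the target user
                    weight *= priors[k][combo[k]]
            out[i][combo[i]] += weight
    return [[v / sum(row) for v in row] for row in out]
```

For QPSK and four colliding users the outer loop already runs 256 times per RE, and the count multiplies by the alphabet size with each additional user.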

B. GAMP-BASED DETECTION FOR PDMA
1) MESSAGE PASSING FROM FND TO VND
As shown in formula (4), computing the extrinsic information from FND $y_j$ to VND $x_i$ has exponential computational complexity, since the joint distribution must be marginalized. The key idea of the GAMP algorithm is to model the input a priori message of the FND, $p_{x_i \to y_j}$ (the outgoing message of the VND), as a continuous Gaussian random variable. Based on this idea, the extrinsic message passed from VND $x_i$ to FND $y_j$ is approximated as a Gaussian random variable, denoted as $p_{x_i \to y_j}$, with mean $m_{x_i \to y_j}$ and variance $\sigma^2_{x_i \to y_j}$ [5]. As a result, the extrinsic information in probability form can be expressed as in (5), where $m_{y_j \to x_i}$ and $\sigma^2_{y_j \to x_i}$ denote the mean and variance of the Gaussian distributed extrinsic information $p_{y_j \to x_i}(x_i)$ from FND $y_j$ to VND $x_i$, respectively. Both quantities follow from (2) and (5).

2) MESSAGE PASSING FROM VND TO FND
The incoming messages of VND $x_i$ contain two types of probability information: one is the information fed back from the channel decoder, denoted as $p_{e_i \to x_i}(x_i)$, and the other is the set of messages transferred along the $d_v$ connected edges. To compute the extrinsic information delivered from variable node $x_i$ to function node $y_j$, i.e. $p_{x_i \to y_j}(x_i)$, we perform the following three steps.
• Step 1: Calculate the distribution of the product of the $d_v - 1$ input Gaussian distributed messages, excluding the one on the $j$-th edge. Let $\hat{m}_{x_i}$ and $\hat{\sigma}^2_{x_i}$ denote the mean and variance of this combined Gaussian distribution $\mathcal{N}_c(x_i; \hat{m}_{x_i}, \hat{\sigma}^2_{x_i})$, respectively. Both can be computed according to the rule for the product of Gaussian distributions [13].
• Step 2: Calculate the likelihood probability conditioned on the combined Gaussian distribution $\mathcal{N}_c(x_i; \hat{m}_{x_i}, \hat{\sigma}^2_{x_i})$ and the discrete a priori distribution $p_{e_i \to x_i}(x_i)$ fed back from the channel decoder, as given in (10).
• Step 3: Approximate the discrete distribution in (10) with a continuous Gaussian distribution. To achieve this goal, we resort to moment matching as shown in [14] and [4]. The resulting continuous Gaussian distribution, denoted as $p_{x_i \to y_j}(x_i) = \mathcal{N}_c(x_i; m_{x_i \to y_j}, \sigma^2_{x_i \to y_j})$, has its mean $m_{x_i \to y_j}$ and variance $\sigma^2_{x_i \to y_j}$ evaluated over the constellation $\mathcal{A}$.
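The two message updates of this subsection can be sketched as follows. This is a generic Gaussian-approximation message passing step written under stated assumptions, not a transcription of equations (5) onward: each FND message is taken to describe $h_{j,i} x_i$ as complex Gaussian, the interference of the other users is summarized by its mean and variance, and the function names are hypothetical.

```python
import math

# Unit-energy QPSK alphabet (labeling is illustrative).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def gamp_fnd_update(y_j, h, m_in, v_in, noise_var):
    """FND -> VND: treat the other users' contribution plus noise as Gaussian.

    Assumed convention: the message to user i says h[i]*x_i ~ N_c(m_out[i], v_out[i]).
    Cost is linear in the number of colliding users.
    """
    total_m = sum(hk * mk for hk, mk in zip(h, m_in))
    total_v = sum(abs(hk) ** 2 * vk for hk, vk in zip(h, v_in))
    m_out = [y_j - (total_m - hk * mk) for hk, mk in zip(h, m_in)]
    v_out = [noise_var + total_v - abs(hk) ** 2 * vk for hk, vk in zip(h, v_in)]
    return m_out, v_out


def gamp_vnd_update(h_edges, m_in, v_in, prior, exclude, constellation=QPSK):
    """VND -> FND on edge `exclude`, following Steps 1 to 3."""
    # Step 1: product of the Gaussian messages on all other edges, in terms of x_i.
    precision, weighted = 0.0, 0j
    for k, (hk, mk, vk) in enumerate(zip(h_edges, m_in, v_in)):
        if k == exclude:
            continue
        precision += abs(hk) ** 2 / vk
        weighted += complex(hk).conjugate() * mk / vk
    # Step 2: combine with the discrete prior fed back from the decoder.
    if precision > 0.0:
        m_hat = weighted / precision
        expo = [-abs(a - m_hat) ** 2 * precision for a in constellation]
    else:  # no other edge: only the decoder prior is available
        expo = [0.0] * len(constellation)
    top = max(expo)  # subtract the maximum exponent for numerical stability
    w = [p * math.exp(e - top) for p, e in zip(prior, expo)]
    total = sum(w)
    probs = [x / total for x in w]
    # Step 3: moment matching to a continuous Gaussian.
    mean = sum(p * a for p, a in zip(probs, constellation))
    var = sum(p * abs(a - mean) ** 2 for p, a in zip(probs, constellation))
    return mean, max(var, 1e-12), probs  # clamp to keep later divisions finite
```

The variance clamp is a practical safeguard added for this sketch; when the messages become very reliable the matched variance would otherwise reach zero and break the next FND update.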

IV. PROPOSED JOINT FACTOR GRAPH BASED OPTIMIZATION
Since both the GAMP detector and the non-binary LDPC decoder can be depicted by factor graphs, it is reasonable to represent the IDD receiver as a joint factor graph, as shown in Fig. 3. A summary of the notations used is presented as follows.
$y_{i,j}$: The $j$-th FND of the $i$-th subblock of MUD-FND.
$x_{i,j}$: The $j$-th VND of the $i$-th subblock of MUD-FND.
$v_{i,j}$: The VND of the LDPC code, i.e. LDPC-VND, with $v_{i,j} = x_{i,j}$ for analysis purposes.
$c_i$: The CND of the LDPC code, i.e. LDPC-CND.
In addition, four average mutual information (AMI) quantities are tracked on this graph: the AMI from LDPC-VND to MUD-VND, the AMI from MUD-VND to LDPC-VND, the AMI from LDPC-VND to LDPC-CND, and the AMI from LDPC-CND to LDPC-VND.
The analysis and optimization of the non-binary LDPC coded PDMA system can be carried out under the joint factor graph framework with powerful tools such as the symbol-wise EXIT chart.

A. SYMBOL-WISE EXIT (S-EXIT) CHART BASED ANALYSIS
The extrinsic information transfer (EXIT) chart is widely used in the analysis and design of iterative detection and decoding (IDD) systems. A large body of research has demonstrated its effectiveness in predicting the threshold of IDD systems [15], [16]. To obtain the EXIT chart of the iterative receiver, we first partition the receiver into component detectors/decoders. Then we evaluate the output AMI of each component detector/decoder with respect to its input a priori AMI. The resulting functional relationship between the output and the input AMI is referred to as the component-EXIT chart. Once all component-EXIT charts are obtained, we can visualize the IDD system by a joint EXIT chart in which the component-EXIT charts are coupled according to their input and output relationships. Since it is difficult to track the messages actually exchanged between the component decoders accurately, we resort to Monte-Carlo simulation aided symbol-wise EXIT analysis. This method can be outlined as follows.
For each $I_A \in [0, 1]$, we model the a priori message according to formula (14). At the output of the detector/decoder, we obtain the extrinsic information in LLR or probability form, and then use formula (15) to evaluate the output AMI.
Let $\mathcal{M}(\cdot)$ denote the mapping function from code symbol to constellation point. According to [16], the a priori LLR can be modeled as a $(q-1)$-dimensional Gaussian vector with mean $\mathbf{m} = [-\sigma^2/2, \cdots, -\sigma^2/2]^T$ and covariance matrix $\mathbf{C}$. Starting from a $(q-1)$-dimensional Gaussian vector distributed as $\mathcal{CN}(\mathbf{0}, \mathbf{I}_{q-1})$, the a priori LLR vector $\mathbf{W}$ is generated with this mean and covariance, and the a priori LLR corresponding to $x_i = \mathcal{M}(\alpha)$ is then obtained from $\mathbf{W}$. The last equality is obtained by eliminating the first element of the $q$-dimensional vector $\mathbf{L}$, since $w_{0-\alpha} - w_{-\alpha} = 0$.
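A possible way to draw such a priori LLR vectors is sketched below. Two points are assumptions rather than statements taken from the text: the covariance is taken to have $\sigma^2$ on the diagonal and $\sigma^2/2$ off the diagonal, which is the structure commonly used for symmetric non-binary LLR models, and the field is assumed to have characteristic 2 so that subtraction of symbol indices is a bitwise XOR. The function name `sample_apriori_llr` is hypothetical.

```python
import math
import random


def sample_apriori_llr(alpha, sigma, q, rng):
    """Draw one (q-1)-dimensional a priori LLR vector for transmitted symbol alpha.

    Assumed model for the all-zero reference symbol:
      w[0] = 0, w[k] ~ N(-sigma^2/2, sigma^2), Cov(w[k], w[l]) = sigma^2/2 for k != l.
    For a transmitted symbol alpha the entries are L[k] = w[k - alpha] - w[-alpha],
    with field subtraction realized as XOR (valid for GF(2^m), e.g. GF(4)).
    """
    shared = rng.gauss(0.0, 1.0)  # common term that creates the correlation
    w = [0.0]
    for _ in range(q - 1):
        own = rng.gauss(0.0, 1.0)
        w.append(-sigma ** 2 / 2.0 + sigma / math.sqrt(2.0) * (shared + own))
    # k = 0 always gives w[alpha] - w[alpha] = 0, so that entry is dropped.
    return [w[k ^ alpha] - w[alpha] for k in range(1, q)]
```

Under this model the entry belonging to the transmitted symbol has a positive mean of $\sigma^2/2$, so larger $\sigma$ corresponds to more reliable a priori information.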

2) THE CALCULATION OF THE OUTPUT AVERAGE MUTUAL INFORMATION
Instead of integrating the $q$-dimensional distribution, we resort to a numerical method that exploits the ergodic characteristic of the transmitted code symbols. According to Theorem 2 in [17], we evaluate the output extrinsic AMI as in formula (15), where $H(\cdot)$ denotes the $q$-ary entropy function with $0 \le H(x) \le 1$, and $q$ is the order of the GF. With this definition, $H(x) = 1$ when $x$ takes values in $\{0, 1, \cdots, q - 1\}$ with equal probabilities. $p(x_k | \mathbf{y}, \mathbf{L}_{\setminus k})$ is the extrinsic information in probability form at the output of the detector or decoder.
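The time-average estimate described here can be sketched in a few lines. The code assumes the common form in which the AMI is one minus the average $q$-ary entropy of the extrinsic symbol distributions; whether formula (15) uses exactly this normalization is an assumption, and the function names are illustrative.

```python
import math


def q_ary_entropy(p):
    """Entropy of a distribution over q symbols, measured in base-q units."""
    q = len(p)
    return -sum(v * math.log(v, q) for v in p if v > 0.0)


def output_ami(extrinsic):
    """Average mutual information estimated from a block of extrinsic distributions.

    extrinsic: list of per-symbol probability vectors p(x_k | y, L_{\\k}).
    """
    avg_entropy = sum(q_ary_entropy(p) for p in extrinsic) / len(extrinsic)
    return 1.0 - avg_entropy
```

A block of uniform distributions gives an AMI of 0, and a block of one-hot distributions gives 1, which matches the two end points of an EXIT curve.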

B. PROPOSED SYMBOL-WISE EXIT BASED ITERATIVE OPTIMIZATION ALGORITHM
Although the GF(4)-LDPC code is employed, there is still ample room for improvement when the iterative detection and decoding algorithm is used [18]. To facilitate the optimization, we combine the FND module and the VND module into module I, and model the CND individually as module II, as shown in Fig. 4. In the following, we show in detail how to obtain the component-EXIT charts corresponding to these two modules. Let $\lambda = (\lambda_2, \lambda_3, \cdots, \lambda_{D_v})$ and $\rho = (\rho_3, \rho_4, \cdots, \rho_{D_c})$ denote the variable node degree distribution and the check node degree distribution, respectively, where $D_v$ and $D_c$ denote the maximum degrees of the VND and CND, respectively. The degree distribution pair $(\lambda, \rho)$ then describes an ensemble of non-binary LDPC codes, and the code rate of the LDPC code is determined by this pair [19].
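For reference, the design rate of an LDPC ensemble can be computed from the degree distribution pair as sketched below. The sketch assumes that $\lambda$ and $\rho$ are edge-perspective distributions, which is the usual convention in EXIT-based optimization; the dictionary representation and the name `design_rate` are choices made for this example.

```python
def design_rate(lam, rho):
    """Design rate R = 1 - (sum_j rho_j / j) / (sum_i lambda_i / i).

    lam, rho: dicts mapping a node degree to the fraction of edges
    attached to nodes of that degree (edge perspective, assumed).
    """
    check_term = sum(frac / deg for deg, frac in rho.items())
    var_term = sum(frac / deg for deg, frac in lam.items())
    return 1.0 - check_term / var_term
```

For the regular (3,6) ensemble, `design_rate({3: 1.0}, {6: 1.0})` evaluates to 0.5.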

1) CALCULATION OF THE EXIT CHART OF MODULE I
• The input $I_A^{(\mathrm{I})}$ and the output $I_v$ of the VND are related by the VND transfer function.
• The relationship between the output AMI $I_s$ and the input AMI $I_v$ of the FND is given by formula (18). It can be seen from formula (18) that $I_s$ depends on the channel parameter $E_b/N_0$. This function can be obtained by Monte Carlo simulation. Fig. 5 shows the relationship between $I_v$ and $I_s$ as $E_b/N_0$ varies from 3.5 dB to 5.5 dB in steps of 0.1 dB for the $\mathbf{S}_{2,3}$ PDMA system.
• The relationship between the output AMI I [...] (14). For a specific check node degree $j$, the output extrinsic LLR can be obtained by numerical simulation, and the output AMI can then be evaluated using formula (15). By curve fitting, we obtain the CND transfer function in terms of its a priori AMI $I_A^{(\mathrm{II})}$ for each considered check node degree $j$. Fig. 6 gives an example of the CND curves with check node degrees from $j = 3$ to 10 for the GF(4)-LDPC code. Thus, over all check node degrees $j \in [3, D_c]$, the combined EXIT chart is obtained by combining the per-degree curves according to the check node degree distribution $\rho$. The symbol-wise EXIT chart based iterative optimization algorithm is then summarized as Algorithm 1. The optimization process can be implemented with a simple linear programming algorithm [20]. It should be pointed out that the algorithm finds a near-optimal result, since it searches from high $E_b/N_0$ and low code rate toward low $E_b/N_0$ and high code rate.
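The way per-degree curves are combined and then checked for an open decoding tunnel can be sketched as follows. The weighted sum over $\rho_j$ is the standard way to mix component curves and is assumed here; the closed-form curves used in the usage example are toy erasure-channel-style placeholders, not the fitted GF(4) curves of Fig. 5 and Fig. 6, and all function names are hypothetical.

```python
def combined_cnd_exit(rho, cnd_curves, i_a):
    """Mix per-degree CND curves: sum_j rho_j * f_j(i_a) (assumed weighting)."""
    return sum(frac * cnd_curves[j](i_a) for j, frac in rho.items())


def tunnel_is_open(module_one, module_two, steps=1000, target=0.999):
    """Follow the decoding trajectory between two transfer functions.

    module_one: maps the a priori AMI from module II to the AMI sent to module II
    module_two: maps that AMI back to the a priori AMI of module I
    Returns True when the trajectory reaches `target`, False when it stalls.
    """
    i_a = 0.0
    for _ in range(steps):
        i_next = module_two(module_one(i_a))
        if i_next >= target:
            return True
        if i_next <= i_a + 1e-9:  # curves intersect: no further progress
            return False
        i_a = i_next
    return False


def toy_cnd_curve(degree):
    """Placeholder CND curve of erasure-channel form, for illustration only."""
    return lambda i_a: i_a ** (degree - 1)


def toy_module_one(channel_info, degree):
    """Placeholder module I curve of erasure-channel form, for illustration only."""
    return lambda i_a: 1.0 - (1.0 - channel_info) * (1.0 - i_a) ** (degree - 1)
```

With the toy curves, a regular (3,6) profile converges when `channel_info` is 0.6 and stalls when it is 0.5; an optimization loop in the spirit of Algorithm 1 would reshape $\lambda$ and $\rho$ until the tunnel just stays open at the lowest possible $E_b/N_0$.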

Algorithm 1 Symbol-Wise EXIT Chart Based Two-Stage Optimization Algorithm for NB-LDPC Coded PDMA
Input: target code rate $R_T$; sufficiently low initial code rate $R_c$ with $0 < R_c < R_T < 1$; sufficiently high $E_b/N_0$; maximum variable node degree $D_v$; maximum check node degree $D_c$; preset number of optimization iterations $N_{iter} = 3$.
Output: degree distribution pair $(\lambda, \rho)$, code rate $R$.
for n = 1 : N_iter
Step 1: With the check node degree profile fixed, optimize the variable node degree profile.
Step 2: With the variable node degree profile fixed, optimize the check node degree profile.
Step 3: Calculate the code rate $R$. If $|R - R_T| \le 0.05$, jump to Return; otherwise set $E_b/N_0 = E_b/N_0 - 0.1$ and jump to Step 1.
end for
Return: degree profile pair $(\lambda, \rho)$, code rate $R$ and threshold SNR $E_b/N_0$.

A. OPTIMIZATION RESULTS AND BER COMPARISON
In this section, we obtain two irregular GF(4)-LDPC codes, denoted as code 1 and code 2, whose key parameters are shown in Table 1, using the proposed optimization algorithm for the 150% and 200% overload cases, respectively. In Table 1, code 3 denotes the irregular GF(4)-LDPC code that has the same degree distribution as the Worldwide Interoperability for Microwave Access (WiMax) LDPC code, but with a code length of $N_s = 9600$ and random construction [21], [22]. Fig. 7 shows the EXIT charts of the PDMA systems with the regular-(3,6) GF(4)-LDPC code, the irregular GF(4)-LDPC code with the same degree distribution as the WiMax LDPC code (code 3 in Table 1), and the optimized irregular LDPC code (code 1), respectively. In Fig. 7(a), with the regular-(3,6) GF(4)-LDPC code, the threshold predicted by S-EXIT in terms of $E_b/N_0$ is 4.9 dB. In Fig. 7(b), the threshold is $E_b/N_0 = 4.2$ dB with code 3. In Fig. 7(c), the optimized irregular LDPC code (code 1) proposed in this work has a threshold of $E_b/N_0 = 3.3$ dB as predicted by S-EXIT.
In order to verify the effectiveness of the proposed optimization algorithm, we constructed three GF(4)-LDPC codes according to the corresponding degree distributions shown in Table 1, as well as a regular-(3,6) GF(4)-LDPC code. All codes have a length of $N_s = 9600$ symbols over GF(4) and a code rate of $R = 0.5$. The nonzero elements of their parity-check matrices are chosen randomly from the nonzero elements of GF(4). The WiMax LDPC code (code 3) has a maximum variable node degree of $D_v = 6$, while the optimized irregular LDPC codes, code 1 and code 2, have $D_v = 10$ and $D_v = 17$, respectively. In all simulation settings, QPSK modulation is employed and an independent and identically distributed fading channel model is used. Meanwhile, we assume that the channel fading coefficients are perfectly known at the receiver side but not known at the transmitters. In both Fig. 8 and Fig. 9, the number of outer iterations is Out_Iter = 15, the number of iterations of the GAMP/SMP detector is In_iter = 6, and the number of iterations of the LDPC decoder is set to 30.
The outer iteration denotes the message exchange between the detector and the LDPC decoder, while the inner iteration denotes the message exchange between the VNDs and FNDs within the SMP/GAMP detector. Fig. 8 shows the numerical simulation results for the 150% overload PDMA system with PM $\mathbf{S}_{2,3}$. We find that the optimized irregular GF(4)-LDPC code (code 1) outperforms the regular-(3,6) GF(4)-LDPC code by about 1.4 dB at a bit error ratio (BER) of 1e-5. Furthermore, the optimized irregular GF(4)-LDPC code has a performance gain of about 0.5 dB over code 3. Fig. 9 shows the BER simulation results for the 200% overload PDMA system with PM $\mathbf{S}_{3,6}$. The system coded with the proposed irregular GF(4)-LDPC code (code 2) outperforms the one coded with the regular-(3,6) GF(4)-LDPC code by 3 dB, and outperforms the one coded with code 3 by about 1 dB, at a BER of 1e-5. As the system overload becomes larger, the performance gain introduced by the proposed algorithm becomes more prominent.
We also compare the GF(4)-LDPC coded scheme with the GF(2)-LDPC coded system, as shown in Fig. 10. The threshold $E_b/N_0$ of the WiMax LDPC (binary code in [22]) coded PDMA with PM $\mathbf{S}_{2,3}$ and QPSK is 3.9 dB. When we constrain $D_v = 10$ and $D_c = 10$ for comparison purposes, the threshold $E_b/N_0$ of the optimized GF(2)-LDPC code (predicted by the EXIT chart as in reference [18]) is also 3.9 dB. In Fig. 10, the green solid line with square markers denotes the BER of the optimized GF(2)-LDPC code (degree distribution as obtained by the optimization) with a code length of $N_b = 2304$ bits. We find that the optimized GF(2)-LDPC code exhibits about the same BER performance as the WiMax code with the same code length. For the optimized GF(4)-LDPC code (code 1) with a code length of $N_s = 1152$ symbols and the GAMP-based detector (marked with GA in Fig. 10), the system outperforms both the WiMax LDPC coded system and the optimized GF(2)-LDPC coded system with GAMP by about 1 dB, and achieves about the same performance as the WiMax coded PDMA with the SMP-based detector. Furthermore, when the code length of the GF(4)-LDPC code increases to $N_s = 9600$ symbols, it performs about 0.9 dB better than the optimized GF(2)-LDPC code of length $N_b = 19200$ bits at a BER of 1e-4.

B. COMPLEXITY COMPARISON
In this subsection, we give a simple comparison between the GF(4)-LDPC coded PDMA with the GAMP detector and the GF(2)-LDPC coded PDMA with the SMP detector. The bottleneck of the SMP algorithm is the computational complexity of the underlying chip-wise MAP detection at the FND, which needs to compute $|\mathcal{A}|^{d_c}$ Euclidean distances of the form $|y_j - h_{j,i} x_i - \sum_{i' \neq i} h_{j,i'} x_{i'}|^2$ with all $x_i$ fixed, $1 \le i \le d_c$. Thus the computational complexity is $8(d_c + 3)|\mathcal{A}|^{d_c}$ floating point operations (FLOPs) for each FND, whereas the computational complexity of the GAMP algorithm at the FND, according to [5], is $17|\mathcal{A}|d_c$. Fig. 11 shows the relationship between the number of FLOPs and the number of users ($d_c$) colliding on a specific RE. When $d_c = 4$, as in the PM $\mathbf{S}_{3,6}$ case, the complexity of GAMP is only about 1/32 of that of the SMP algorithm. For the LDPC code, when the forward-backward log-domain belief propagation decoding algorithm is employed, the computational complexity of the CND is proportional to $q^2$ and $3d_c - 2$. So the computational complexity of the GF(4)-LDPC code is about 4 times that of the GF(2)-LDPC code for the same $D_c$. Meanwhile, the GF(4)-LDPC decoder needs about 50% more memory than the GF(2)-LDPC decoder. Code 1, code 2 and code 3 in Table 1 have similar decoding complexity since they have about the same maximum check node degree.
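The operation counts quoted in this subsection can be tabulated with a short script. The functions below simply evaluate the expressions as stated in the text; the function names are chosen for this example, and the CND cost is only a proportionality measure, not an exact FLOP count.

```python
def smp_fnd_flops(alphabet_size, d_c):
    """FLOPs per FND for SMP: 8 * (d_c + 3) * |A|^{d_c}."""
    return 8 * (d_c + 3) * alphabet_size ** d_c


def gamp_fnd_flops(alphabet_size, d_c):
    """FLOPs per FND for GAMP: 17 * |A| * d_c."""
    return 17 * alphabet_size * d_c


def cnd_relative_cost(q, d_c):
    """Quantity the CND cost is proportional to: q^2 * (3 * d_c - 2)."""
    return q ** 2 * (3 * d_c - 2)


if __name__ == "__main__":
    for d_c in range(2, 7):
        print(d_c, smp_fnd_flops(4, d_c), gamp_fnd_flops(4, d_c))
```

Evaluated literally for QPSK and $d_c = 4$, the two expressions give 14336 and 272 FLOPs, so the exact ratio depends on how the operations are counted; the figure quoted above refers to the curves of Fig. 11. For equal check node degree, `cnd_relative_cost(4, d) / cnd_relative_cost(2, d)` is 4, in line with the stated factor between the GF(4) and GF(2) decoders.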

VI. CONCLUSION
In this work, we have proposed a non-binary LDPC coded PDMA scheme. By combining the symbol-wise mapping from the non-binary code symbols to the modulation constellation with a symbol-wise multiuser detection algorithm, an IDD receiver with seamless information transfer is achieved at the receiver side. Compared with existing work, the proposed scheme offers good performance and relatively low front-end detection complexity while keeping the computational complexity of the channel decoder at a reasonable level. The proposed GF(4)-LDPC coded PDMA scheme is of practical importance for future wireless applications.

XUEYAN CHEN received the B.S. degree in electronic and information engineering from PLA Information Engineering University, in 2009, and the M.S. degree from Shenyang Ligong University and the Ph.D. degree from the Beijing University of Posts and Telecommunications (BUPT), in 2012 and 2018, respectively. She is currently a Lecturer with the School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, China. Her research interests include cognitive radio, relay systems, physical layer security, and energy harvesting.