Optimization of Non-Binary LDPC Coded Massive MIMO Systems With Partial Mapping and REP Detection

In this work, a non-binary low density parity check (LDPC) coded high dimensional multiple input multiple output (MIMO) scheme with partial mapping for high order modulation is proposed. For the proposed scheme, when <inline-formula> <tex-math notation="LaTeX">$M$ </tex-math></inline-formula>-ary quadrature amplitude modulation (QAM) is employed, then non-binary LDPC code constructed over Galois field with order <inline-formula> <tex-math notation="LaTeX">$\sqrt {M}$ </tex-math></inline-formula> is used for partial mapping, where <inline-formula> <tex-math notation="LaTeX">$\sqrt {M}$ </tex-math></inline-formula> is an integer. At the receiver side, a real-valued expectation propagation (REP) based detection algorithm is used to couple with the non-binary decoder seamlessly. Meanwhile, the symbol-wise <italic>a-priori</italic> information returned by the non-binary decoder is used to aid the REP detection. Furthermore, a symbol-wise extrinsic information transfer (SEXIT) chart based iterative optimization algorithm is used to optimize the concatenated non-binary LDPC code. This work proposed to model the <italic>a-priori</italic> information of non-zero codes/symbols of detector/decoder as the output of a cyclic-symmetric additive white Gaussian noise (AWGN) channel and a simplified method based on interpolation is proposed to calculate the component-EXIT chart of the REP-detector for massive MIMO. The proposed method can facilitate the optimization and avoid a large amount of simulations. Both SEXIT chart based analysis and numerical simulation results demonstrate the validity of the above idea.


I. INTRODUCTION
In fifth generation (5G) and beyond mobile communication systems, massive multiple-input multiple-output (MIMO) and high order modulation are considered as two necessary technologies to improve the system throughput and spectral efficiency in enhance mobile broadband (eMBB) scenario [1], [2]. The key issue in the design of open loop MIMO system is to develop a low complexity and high performance detection algorithm especially for high dimensional MIMO and/or high order modulation [3].
Non-binary low density parity check (NB-LDPC) codes show great advantages over their binary counterparts due The associate editor coordinating the review of this manuscript and approving it for publication was Li Zhang. to their better performance under non-binary channel and easier to decouple with high order modulation. In [4], 256-ary LDPC over Galois field (GF(256)-LDPC) is employed in MIMO system with two antennas and 16-ary quadrature amplitude modulation (16QAM). It achieves near capacity performance with joint detection algorithm but the computational complexity of both detector and decoder are extremely high. As a result, this scheme is impractical for more than four antennas since the Galois field (GF) becomes very large. In [5], standard belief propagation (SBP) based detection algorithm under factor graph framework is proposed and exhibits near optimal performance for MIMO system with up to eight antennas and binary phase shift keying (BPSK) modulation. However, it is also impractical for massive MIMO system since it is essentially a high complexity chip-wise VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ maximum a-posteriori (MAP) detection. In [6], Gaussian approximation-based message passing (GAMP) algorithm is proposed as a simplified version of SBP. By approximating the exchanged message with Gaussian distribution, the information updating of GAMP is operated with linear complexity. Moreover, in [7] and [8] non-binary LDPC code is combined with the GAMP algorithm to enhance the front-end detector in iterative detection and decoding (IDD) manner. But there is still room for improvement in terms of performance and complexity.
In [9], expectation propagation (EP) algorithm is proposed to further reduce the computational complexity of the MIMO variable node (VN) decoder and exhibits great advantage over classical minimum mean square error with successive interference cancelation (MMSE-SIC) detection algorithm [10] and Gaussian tree approximation with successive interference cancelation (GTA-SIC) algorithm [11]. Furthermore, some modified EP algorithms are applied to massive MIMO systems and achieve either performance gain or complexity reduction [12]- [17]. The state-of-theart scheme in [18] employs non-uniform priors distributed according to the channel decoder output to improve the EP-based iterative receiver for high order modulation system. However, only a regular-(3,6) LDPC code over GF (2) is used.
In this paper, we proposed a low-order non-binary LDPC coded massive MIMO scheme with partial mapping for high order modulation. At the receiver side, a real-valued EP (REP) based IDD scheme is employed to couple with non-binary LDPC decoder seamlessly [19]. The difference between our scheme and that in [4] lies in that the later aims to obtain capacity approaching performance at the cost of higher complexity, while the former to achieve a trade-off between complexity and performance. Compared with [7], [8] and [18], we employ the REP-based detection algorithm instead of GAMP/EP at the receiver side. Furthermore, symbol-wise extrinsic information transfer (SEXIT) chart based iterative optimization algorithm is used to optimize the concatenated non-binary LDPC code. In our previous work [19], it is proposed to model the a-priori loglikelihood-ratio (LLR) information of non-zero code-symbol as the output of a cyclic-symmetric additive white Gaussian noise (AWGN) channel and a simplified method based on interpolation is proposed to calculate the component-EXIT chart of the REP-detector for massive MIMO. This full paper gives a full exposition of the method to model the a-priori LLRs of non-zero codes/symbols. In addition, the bit error ratio (BER) performance comparison between EP-IDD and proposed REP-IDD is also considered. Furthermore, the detailed method to carry out the iterative optimization algorithm omitted in [19] is also given. For notation clarity, let I n denotes a n × n identity matrix. Let N R (x i ; m, σ 2 ) 1 √ 2πσ exp − 0.5(x i − m) 2 /σ 2 denote a Gaussian probability density function (PDF), which implies that x i is a realvalued Gaussian random variable with mean-value m and variance σ 2 .
The paper is organized as follows. Section II is about the real-valued system model. Section III introduces the REP-based detection algorithm and the PDF evolution of the output LLRs of REP-detector's VNs. Section IV is about the system optimization with SEXIT and the proposed method to calculate the component EXIT. Section V is the simulation results. We conclude in Section VI.

II. SYSTEM MODEL A. SYMBOL-INTERLEAVED CODED MODULATION WITH PARTIAL MAPPING FOR SPATIAL MULTIPLEXING MIMO
Consider a non-binary LDPC coded massive MIMO system with N t transmit antennas and with N r receive antennas. Orthogonal frequency division multiplexing (OFDM) is used for data transmission. It is assumed that the transmitter and the receiver are perfectly synchronized and all channel coefficients are perfectly known at the receiver side but not known at the transmitter. As shown in Fig. 1, input binary information bits are converted to q-ary information sequence u = [u 1 , u 2 , . . . , u K s ], u i ∈ GF(q) each representing p = log 2 q input bits. The output code word of rate R = K s /N s GF(q)-LDPC encoder is denoted as where v i ∈ GF(q). After interleaved, the output sequence is expressed as v = [v 1 , v 2 , . . . , v N s ]. Define a real-valued symbol set, i.e., q-ary pulse amplitude modulation (q-PAM) X is defined as the mapping from GF(q) to A, i.e., X : GF(q) → A. M -ary square QAM constellation is employed, = {x l |x l = x I + jx Q }, where x I ∈ A and x Q ∈ A denote the in-phase component and quadrature respectively. In this work, partial mapping that maps two GF(q)-LDPC code symbols onto a QAM constellation point is employed [21]. This is expressed as : GF(q), GF(q) → (x I , x Q ). The layer mapping and OFDM following partial mapping are used as convention although not shown in Fig. 1. The frequency domain symbol at the k-th transmit antenna is denoted as x k , where x k ∈ , and is the QAM constellation. Let x c ∈ N t ×1 denote the transmitted symbol vector over all transmit antennas. At the receiver side, the corresponding received baseband signal vector y c ∈ C N r ×1 is where H c ∈ C N r ×N t denotes the complex-valued channel coefficient matrix, whose elements are obtained from an independent identically distributed (i.i.d) complex Gaussian distribution with zero mean and unit variance, n c ∈ C N r ×1 is the noise vector whose entries are modeled as i.i.d CN (0, N 0 I N r ), and E s = E[ x c 2 ] is the average energy of transmitted symbol-vector. The average bit signal-to-noise-ratio E b /N 0 is expressed as follows, where R c denotes the channel coding rate and m = log 2 M denotes the number of bits corresponding to each constellation point. N 0 = 2σ 2 n is the single side power spectral density of noise.

B. REAL-VALUED EXPRESSION FOR MASSIVE MIMO SYSTEM
To perform the real-valued expectation propagation based detection, the complex-valued model (1) should be converted into a real-valued expression as follows, where , and n = Re(n c ) Im(n c ) .

C. REP-IDD STRUCTURED RECEIVER
At the receiver side, as shown in Fig. 1, the REP-based MIMO detector computes the a-posteriori probability information with respect to the transmitted code-word v based on the channel receive vector y and channel matrix H . After interleaved, this information is used for non-binary LDPC decoding. In contrast to [18], non-binary LDPC instead of binary code is used to couple with the REP-detector seamlessly. The output extrinsic LLR information of non-binary LDPC decoder is delivered to the REP-detector as a-priori information to further improve its performance, which is referred to as REP-IDD. REP-IDD structured receiver contains three types of iterations. The first one is the iteration within the REP-detector, denoted as inner-iteration (or REP-iteration), the second one is the iteration within the non-binary LDPC decoder, denoted as LDPC-iteration, and the third one is the iteration between REP-detector and non-binary LDPC decoder, denoted as outer-iteration. In order to analyze and optimize the IDD system, the a-priori LLR of REP-detector or non-binary LDPC decoder is modeled as a multivariate Gaussian distribution, which is the output of a cyclicsymmetric AWGN channel, where the mean-value vector and variance matrix of that multivariate Gaussian distribution evolved as the iteration proceed.

III. PROPOSED REAL-VALUED EXPECTATION PROPAGATION BASED DETECTION
For the real-valued system model in (3), the a-posteriori probability of the transmitted signal vector is expressed as where R n = σ 2 n I N r . For non-IDD system, a-priori p a (x i ) conforms to non-informative uniform distribution [20]. For IDD system in this work, a-priori p a (x i ) can improve the detection performance when the system works above an critical signal-to-noise-ratio (SNR), which is called threshold SNR. The discrete a-priori returned by the LDPC decoder can be expressed as p , where superscript denotes the -th outer iteration and i p i = 1. However, the computation of marginal p(x i |y, H) of (4) directly is impractical when N t and/or the size of A is large since it is needed to sum over all possible values of all variables except x i . EP and the proposed REP algorithm in this work, are aim at finding a feasible Gaussian approximation q(x) to the a-posteriori distribution (4) by replacing each non-Gaussian distributed p a (x i ) with a continuous Gaussian distribution t i are updated iteratively within the EP inner iteration. As a result, the multivariate Gaussian distribution q(x) is expressed as, where As a result, the mean value vector µ q and covariance matrix q are given by, and EP algorithm tries to find the closest multivariate Gaussian distribution q(x) to the a-posteriori distribution p(x|H, y) where the closest is in terms of the Kullback-Leibler divergence, The optimization solution of (8) is obtained by matching the first and second moments of p(x|H, y) and q(x) respectively. It is performed as Var EP algorithm is used to find the best factors t i (x i ) of q(x) in (5) one by one and to refine them through successive iterations. The IDD structured receiver with EP as front-end detector, i.e., EP-IDD, is summarized as follows.
(1) Compute the first and second moments of the cav- (2) Compute the mean value µ new (x i ) and variance (9) and (10), as follows Step 3: Compute the extrinsic information q e (x i ) from EP-detector to non-binary LDPC decoder. Indeed, q e (x i ) = q ∼i (x i ), the output extrinsic information of x i with Gaussian distribution with mean m e and variance γ e .
Step 4: With q e (x i ) as a-priori, non-binary decoder computes extrinsic information, which is delivered to the EP-detector as a-priori information p ( +1) a (x i ) for the next round out-iteration. step 5: Repeat Step 1 to Step 4 for a certain number of iterations.
As can be seen from Step 1 of the Algorithm above, to obtain the initial marginal distribution of all x i , it is necessary to perform matrix inverse operations as in (6) and (7). However, for massive MIMO system, this is unexpected due to complexity consideration. To overcome this problem, expectation propagation algorithm based on graphical model, referred to as REP-IDD, can be used instead of the above EP-IDD version.

A. PROPOSED REAL-VALUED EP ALGORITHM BASED ON FACTOR GRAPH
The REP-based detector, as shown in Fig. 2, has 2N r function nodes (FNs) corresponding to the received component-wise real-valued signals, and 2N t variable nodes (VNs) corresponding to the transmitted component-wise modulation symbols. The edge between y j and x i , 1 ≤ i ≤ 2N t , 1 ≤ j ≤ 2N r , indicates that the j-th FN receives all the transmitted signals of all users due to MIMO system is full connected factor graph. The reliability information exchanged between VN x i and FN y j are modeled as continuous Gaussian distributions, denoted as x i →y j ), respectively. The REP-IDD algorithm is summarized as follows.
Step 1: Message passing from VNs to FNs (1) Calculate the a-posteriori probability of all VN x i for 1 ≤ i ≤ 2N t , denoted asβ x i , this is carried out as the following procedures. Fig. 2(a). According to the product rule of Gaussian distributions [22], we have where N (i) denotes the set of FNs neighboring to VN x i .
• Calculate the a-posteriori probability of VN x i for all i.
We resort to the moment matching as in [23], we have (2) Calculate the extrinsic information from VN to FN. According to expectation propagation principle, the extrinsic information in terms of probability from VN x i to FN y j , denoted as p And we have Step 2: Message passing from FNs to VNs Similarly, as shown in Fig. 2(b), the extrinsic information in terms of probability from FN y j to VN x i can also be approximated as Gaussian distribution, denoted as where N (j)\i denotes the set of VNs neighboring to FN y j with x i excluded.
Step 3: Calculate the extrinsic information from REP-detector to non-binary LDPC decoder For α ∈ GF(q) and x i ∈ A, the extrinsic probability information is calculated as Step 1 and Step 2 are performed for a prefixed number of iterations, referred to as inner iteration, and then jump to Step 3. For non-binary LDPC coded system, the exponential operation in (21) can be avoided since we need to convert the probability information to LLR information delivered to LDPC decoder as a-priori information. The rest of REP-IDD is the same as the aforementioned EP-IDD.

B. THE PROBABILITY DENSITY FUNCTION OF THE OUTPUT LLR-VECTOR RANDOM VARIABLE OF REP DETECTOR
For discrete random variable x i ∈ A defined in Section II-A and α ∈ GF(q) = {0, 1, . . . , q − 1}, X is bijective mapping from GF(q) to A. The LLR value of x i is defined as in (21). Then the corresponding LLR-vector random variable can be expressed as since L x i = X (0) = 0 by definition. For example, assume that 16QAM and GF(4)-LDPC are employed, N t = N r = 4, we set the first of the N t layer-data to all-zero codeword, while the remaining N t − 1 layers with random selected codeword over GF(4). The output 3-dimensional LLR-vector random variable of VN x i is denoted as Fig. 3 shows that the first dimensional PDF f 1 (L x i ) is approximately Gaussian distribution as the number of inner iterations increases under REP-based detector. The second and third dimensional PDFs have the similar shape as f 1 (L x i ), so we omit it. Meanwhile, it implies that 5 iterations are sufficient for the REP-detector to converge for 4 × 4 MIMO system.

IV. SEXIT CHART BASED ANALYSIS AND OPTIMIZATION
Extrinsic information transfer chart (EXIT) tool has attracted great attention in the design of LDPC codes in wireless systems [25]- [28]. In this section, symbol-wise extrinsic information transfer (SEXIT) chart [24], [29]- [32] is employed to optimized the concatenated non-binary LDPC code to further improve the system performance when IDD-aided receiver is used. Before we proceed to the computation of SEXIT, it is needed to give a description of the method to model the a-priori information and the method to evaluation the average mutual information (AMI) [33].
A. MODEL THE a-priori LLR AS q − 1 DIMENSIONAL GAUSSIAN DISTRIBUTION Let e denote a q−1 dimensional column vector whose entries are all 1. As pointed out by [30], [34], if v i = 0 ∈ GF(q), the a-priori LLR L a ∈ R q−1 of v i is modeled as a q − 1 dimensional Gaussian distribution with only one parameter σ . The mean value vector m L ∈ R q−1 and covariance matrix R L ∈ R (q−1)×(q−1) of L a are written as and Therefore, the a-priori LLR L a is generated as where z ∈ R q−1 is a Gaussian random vector with z ∼ N (0, I q−1 ). In this section, the above method is extended from non-binary code symbol to modulation symbol. Furthermore, since the transmitted codes/symbols are random selected from GF(q) or q-PAM, not always equal to zero. We proposed to model the a-priori LLR of v i = α ∈ GF(q) and x i = X (α) according to Proposition 1 and Proposition 2 respectively.
. If x i = α is transmitted, the corresponding a-priori LLR L can be expressed as where the subscripts of l in (25) are operated in GF(q). Proof: As shown in Lemma 17 of [30], the output LLR l = [l 1 , . . . , l q−1 ] T of cyclic-symmetric AWGN channel is permutation-invariant. we have Furthermore, since in this work the LLR is defined as . .
where LLR function denotes the conversion from LLR-vector to probability-vector and LLR −1 is the inverse operation. After eliminating the first row of (27), we get (25). Therefore, we can model the a-priori LLR of v i = 0 as the output of a cyclic-symmetric AWGN channel according to Proposition 1 since they have the same AMI as v i = 0. Proposition 2: Let L a = [l 1 , l 2 , . . . , l q−1 ] T denote the a-priori LLR of x i = X (0), when v i = α ∈ GF(q), the corresponding a-priori LLR of x i = X (v i ) can also be modeled as (25).
Proof: Since the mapping X from GF(q) to A is bijective, then the mutual information I (L a , v i ) = I (L a , x i ) and So it is reasonable to model the a-priori LLR of x i as (25).

B. EVALUATE THE SYMBOL-WISE AMI VIA SAMPLING AVERAGE
It is seems difficult to obtain the symbol-wise AMI between the transmitted signal and the corresponding output extrinsic information analytically since a underlying q−1 dimensional integration is a must [35]. Fortunately, this can be solved by an approximate method borrowed from [30], [36]. Theorem 1 in [36] and Theorem 2 in [30] give some explanations of how to evaluate the AMI for arbitrary transmitted signal x i ∈ A\X (0).
Theorem 1: For an APP (a-posteriori probability) detector, let y denote the channel observation, L a \i and L e i denote the input a-priori with L a i excluded and output extrinsic LLR of a component detector/decoder respectively. Then the extrinsic LLR L e i at the output of a component detector/decoder is a sufficient statistic of the channel observation y and the a-priori L a \i . It can also be expressed as, for an element g ∈ GF(q) and a random variable x i ∈ A, p x i = X (g)|L e i = p x i = X (g)|y, L a \i . Theorem 2: The symbol-wise AMI between transmitted symbol X ∈ A and its corresponding extrinsic LLR L is where The methods described above are checked by numerical simulations as shown in Fig. 4. For σ ∈ [0 : 0.01 : 7], Gaussian distributed samples of L a is generated according to (24). Then formula (28) is used to evaluate the AMI. The obtained AMI I = J M (σ ) is a monotonic increasing function of σ . As a result, σ = J −1 M (I ) is well defined. For non-zero codes/symbols, we proposed to model the a-priori LLR of x i = X (0) according to (25) and to evaluate the AMI according to (28). In this method, the obtained AMI is referred to as rebuilded AMI, denoted as T = (I ). Fig. 4 shows the difference between I and T . Furthermore, we found the gap between T and I is negligible as the length N s of codeword increases and N s > 960 symbols is sufficient for both GF(4) and GF (8) codes. In the rest of this paper, such as subsection E, we employ J M function to build the relationship between parameter σ and the corresponding AMI.

C. JOINT FACTOR GRAPH OF NON-BINARY LDPC CODED MIMO
As shown in Fig. 5(a), y j and x i denote the FNs and VNs of the REP-detector respectively, for 1 ≤ i ≤ 2N t and 1 ≤ j ≤ 2N r . v i and c j , 1 ≤ i ≤ N s , 1 ≤ j ≤ N s (1 − R), denote the variable node decoders (VNDs) and check node decoders (CNDs) of the non-binary LDPC decoder. Note that VOLUME 10, 2022 e i , 1 ≤ i ≤ 2N t , denote equality nodes (ENs) that force all variable nodes involved to be equal, by which the analysis will be more convenient. For MIMO system, there are 2N t FNs ( 2N r VNs ) connected to each VN (FN). For non-binary LDPC code, the definition of degree profile is the same as its binary counterpart. Let λ = [λ 2 , λ 3 , . . . , λ D v ] and ρ = [ρ 2 , ρ 3 , . . . , ρ D c ] denote the variable node and check node degree distributions respectively, where D v and D c denote the maximum degree of VND and CND respectively. Then the degree distribution pair (λ, ρ) gives a description of an ensemble of non-binary LDPC codes, and the code rate is [37] As shown in Fig. 5(b), the joint factor graph of the IDD receiver is partitioned into two modules. Module I contains the REP-detector (FNs and VNs), ENs and LDPC's VNDs while module II contains LDPC's CNDs only. Let I A , E b /N 0 ), we must compute the functional relationship of I S with respect to I V . This key step was carried out by numerical simulations for each E b /N 0 with interval of 0.1dB within the considered SNR region as in [38]. Take a closer look at the dashed lines in Fig. 6 (also the Fig. 5   FIGURE 6. Comparison of the proposed method and traditional method in [38], where the blue dash curves are the traditional method while the red solid curves denote the proposed one. Both methods under N t = N r = 64 MIMO and 16QAM. in [38]), we found that those AMI curves are parallel with each other approximately and increase with E b /N 0 . In this work, we proposed a simplified method to obtain that but with the lower complexity. For example, for a reasonable low E b /N 0 = γ 1 chosen empirically, the AMI curve of I S with respect to I V is obtained by numerical simulation, and denoted as a close-form polynomial function I S = φ 1 (I V , γ 1 ) by curve fitting. Similarly, the AMI curve for a sufficient high E b /N 0 = γ N is obtained the same as above, and denoted as I S = φ N (I V , γ N ). As a result, the average gap between these two curves is defined as 17940 VOLUME 10, 2022 where L denotes the number of samples of I V , and we choose L = 100 in our simulation. I V i = i/L denote the discrete samples of I V for 1 ≤ i ≤ L. Therefore, for 2 ≤ ≤ N − 1, is an integer, we have the following functional relationship of I S with respective to I v by interpolation, denoted as Fig. 6 shows the comparison results between the proposed method (solid lines) and the existing method (dash lines) in [38] for GF(4)-LDPC coded MIMO system with N t = N r = 64 and 16QAM, where γ 1 = 4.5dB, γ 33 = 7.7dB and = 0.0035. It is shown that the proposed simplified method gives a good approximation of the traditional numerical simulation results, and thus has about the same performance. Meanwhile, this method can avoid massive numerical simulations.

E. COMPUTATION OF THE SEXIT OF MODEL I
After the functional relationship between I S and I V is obtained, the computation of the SEXIT of model I becomes feasible. The followings give a detailed descriptions of the computation procedures according to Fig. 7.
(a) Along each edge of LDPC VND v i , the input a-priori LLR is modeled as a Gaussian distributed vector according to (25) The output LLR of LDPC VND v i with degree p is modeled as a Gaussian distributed vector with parameter σ = √ pσ , and the corresponding mutual information is (c) e i is equality node which implies I v = I . Therefore, The relationship between I S and I V can be obtained as (31).
(e) LDPC VND v i with degree p receives p − 1 LLRs corresponding to I (I ) A and extrinsic LLR corresponding to I S from e i . Therefore, the AMI between the transmitted symbol x i and the output LLR is expressed as where According to the semantics of factor graph as in [39], by averaging the output mutual information of LDPC-VND with degree distribution λ p in edge perspective, the relationship between I (I)

E and I (I)
A is written as

F. COMPUTATION OF SEXIT OF MODEL II
In Fig. 8, LDPC CND c j with degree h receives h − 1 inputs LLR, each of which is modeled as q−1 Gaussian distribution according to (25) A ). Then the output LLRs is obtain by numerical simulation. The conversion from LLR-vector to probability-vector can be found in [30], [41]. The corresponding AMI is calculated by (28), denoted as By averaging the output mutual information of LDPC-CND with degree distribution ρ h in edge perspective, the relationship between I (II)

E and I (II)
A is written as

G. ITERATIVE OPTIMIZATION
The aim of the optimization is to find a near-optimal LDPC degree profile, denoted as (λ, ρ) pair, conforms to f 1 (I A ∈ [0, 1) with the lowest E b /N 0 . Maximize the code rate in (29) is not easy, we resort to an iterative optimization algorithm similar to [38] to obtain our results.

1) OPTIMIZE THE VARIABLE NODE DEGREE DISTRIBUTION WITH FIXED CHECK NODE DEGREE DISTRIBUTION
As shown in Fig. 9(a), for a specific I a ∈ [0, 1), then r = f 1 (I a ) and t = f −1 2 (I a ). To meet the convergence condition, the SEXIT of model I must be positioned above the SEXIT of model II, i.e., r > t. As a result, the optimization of the VOLUME 10, 2022 variable node degree distribution λ with ρ fixed is converted to the following linear programming problem, Note that p (·) in (32) is monotonic increase function and the check node degree distribution ρ is fixed, then f −1 2 (I ) is well defined and calculated [40].

2) OPTIMIZE THE CHECK NODE DEGREE DISTRIBUTION WITH FIXED VARIABLE NODE DEGREE DISTRIBUTION
As shown in Fig. 9 Similarly, h (·) in (36) are monotonic increase functions and the variable node degree distribution λ is fixed, then f −1 1 (I ) is well defined and calculated.
The iterative optimization method in this paper is summarized as Algorithm 1.

A. BER SIMULATION RESULTS
According to the simplified method and iterative optimization algorithm mentioned above, we obtain code 1 and code 2 over GF(4) for 16QAM systems respectively, and code 3 over GF(8) for 64QAM system. The parameters of these three codes are listed in Table 1. For all codes in this work, the nonzero elements of their parity check matrices are selected randomly from the corresponding non-zero elements of GF(q).

Algorithm 1: SEXIT Chart Based Iterative Optimization Algorithm for NB-LDPC Coded MIMO
Input: Target code rate R T , sufficiently low initial code rate 0 < R c < R T < 1, sufficient high E b /N 0 , maximum variable node degree D v , maximum check node degree D c , the prefix number of optimization iterations N _iter = 3; Output: degree distribution pair (λ, ρ), code rate R; Step 1: With ρ = [ρ 2 , ρ 3 , . . . , ρ D c ] fixed, optimize λ = [λ 2 , λ 3 , . . . , λ D v ] using (38). For comparison purpose, an irregular GF(2)-LDPC code with the same degree distribution as World Interoperability for Microwave Access (WiMax) LPDC code, i.e., λ(x) = 0.2895x + 0.3158x 2 + 0.3947x 5 and ρ(x) = 0.6316x 5 + 0.3684x 6 , is constructed with code-length N b = 19200 bits and code-rate 0.5. In Fig. 10, the BER simulation results of GF(4)-LDPC coded MIMO with N r = N t = 4 and 16QAM are shown. With fixed number of inner iterations, i.e., Inner = 5, when the number of out iterations increase, i.e., from Out = 5 to Out = 10, the BER performance of both EP-IDD and REP-IDD are improved. There is about 0.2dB performance gap between the REP-IDD and standard EP-IDD at low E b /N 0 . While this performance gap vanishes at high E b /N 0 . In this perspective, the proposed REP-IDD has negligible performance loss over standard EP-IDD. In addition, the proposed REP-IDD outperforms the state-of-art CLSIC-LMMSE in [41] about 2dB in terms of BER performance when both schemes with regular-(3,6) GF(4)-LDPC, and  outperforms the GF(2)-LDPC (WiMax degree profile) coded one with EP-IDD about 1.3dB. Furthermore, when the optimized irregular GF(4)-LDPC code (code 1) instead of regular-(3,6) GF(4)-LDPC code is employed, the REP-IDD obtains 0.5dB performance gain over standard EP-IDD with regular-(3,6) GF(4)-LDPC code. Fig. 11 is about the simulation results of non-binary LDPC coded MIMO with N r = N t = 64, where GF(4) and GF(8)-LDPC code are employed for 16QAM and 64QAM respectively. For 16QAM system, with Out = 8 and Inner = 10, at the BER level of 1e-4, REP-IDD performs about 0.3dB worse than EP-IDD when both cases employ regular-(3,6) GF(4)-LDPC. When the optimized irregular GF(4)-LDPC code (code 2) is employed instead of regular-(3,6) G(4)-LDPC, the REP-IDD system has further 1.1dB performance improvement, and it performs about 0.8dB better than EP-IDD which employs regular-(3,6) GF(8)-LDPC code. For 64QAM system, the BER performance gap between REP-IDD and EP-IDD becomes larger, about 1.4dB when both cases with regular-(3,6) GF(8)-LDPC. When the optimized irregular GF(8)-LDPC code (code 3) is employed, the REP-IDD has further 3.6dB performance improvement and it outperforms the EP-IDD with regular-(3,6) GF(8)-LDPC code about 2.2dB. Since CLSIC-LMMSE is unfeasible for massive MIMO system, so we omit the comparison with it. In summary, optimizing the concatenated non-binary LDPC code, not only compensates the performance loss between the REP-IDD and EP-IDD, but also brings additional performance gain. All the above simulation results verified the effective of the proposed idea.

B. COMPLEXITY COMPARISON
The proposed REP has about the same complexity as its complex-valued version in [23]. The main difference between the proposed REP-detector based on factor graph and EP-detector lies in the method to calculate the a-posteriori probabilities of VNs. For EP-detector, all symbol-wise a-posteriori probabilities are calculated at once according to (6) and (7) with computational complexity O(N 3 t ). While for REP-detector, the computational complexity is dominated by (11) and (12) with complexity of O(N t ). Therefore, for all N t VNs, the computational complexity is O(N 2 t ). In this perspective, REP-detector has a great advantage over EP-detector. When binary code is used in high order QAM system with soft demodulation, the conversion of symbolwise LLR to bit-wise LLR is very expensive with maximum likelihood algorithm, which also leads to information loss. The proposed scheme in this work can avoid this problem. With forward-backward algorithm, both the non-binary LDPC and binary LDPC need (3D c − 2)q(q − 1) addition and (3D c − 2)q 2 multiplication operations at each CND, which dominate the computational complexity. q is the order of GF, and a GF(q) code symbol corresponding to log 2 q bits. Fig. 12 shows the number of addition and multiplication operations needed for GF(4) and GF(8)-LDPC CND respectively. If more advanced non-binary LDPC decoding algorithm is employed [42]- [46], the decoding complexity of GF(8)-LDPC can be further reduced but is out of the scope of this work.

VI. CONCLUSION
In this work, we proposed a non-binary LDPC coded massive MIMO scheme with high order modulation and partial VOLUME 10, 2022 mapping which is combined with REP algorithm seamlessly. For REP-detector and non-binary LDPC decoder, the a-priori LLR of non-zero code is modeled as an transformation of multivariate Gaussian distributed random vector, which is the output of cyclic-symmetric AWGN channel with allzero codeword input. This method is robust and effective according to the simulation results in this work. A simplified method borrowed from [36] (Theorem 2) is employed to evaluate the output AMI, which is obtained by averaging the samples of corresponding extrinsic probability information of component-detector/decoder. Furthermore, the proposed curve-fitting and interpolation based analysis method facilitate the computation of the component EXIT greatly. All above methods pave the way for optimization of the IDD receiver based on SEXIT. Both semi-analytical method based on SEXIT and numerical results illustrated the validity of the proposed iterative optimization algorithm and prominent performance gain achieved.