Belief Propagation Decoder With Multiple Bit-Flipping Sets and Stopping Criteria for Polar Codes

Compared with successive cancellation list (SCL) decoders, belief propagation (BP)-based decoders suffer performance loss in the medium- and high-signal-to-noise ratio (SNR) regions. By analyzing the incorrect decoding results of the bit-flipping BP decoder with a critical set of order $\omega$ (BFBP-CS$^\omega$), we find that undetected errors are the main contributors to the error floor. Based on this observation, we propose a belief propagation decoder with multiple bit-flipping sets (BFSs) and stopping criteria (BP-MF-MC). We use multiple stopping criteria to identify undetected errors and a small BFS to find an additional estimated codeword given by the bit-flipping BP (BFBP) function. For uncorrected errors, we use multiple BFSs to find estimated codewords with the BFBP function. Furthermore, we propose a method to dynamically generate a BFS based on the submatrix check. This method can remove unnecessary bit-flipping positions and increase the order of the critical set. Then, the best codeword is selected from all estimated codewords according to the maximum likelihood principle. Numerical results show that BP-MF-MC performs similarly to the cyclic redundancy check-aided SCL (CA-SCL) decoder with list size 16 and is slightly worse than CA-SCL with list size 32.


I. INTRODUCTION
Polar codes are well known for their ability to achieve Shannon capacity and their low encoding and decoding complexity [1]. The successive-cancellation (SC) decoding algorithm proposed by Arıkan [1] is one of the common decoding methods for polar codes. To improve the performance of SC, Tal and Vardy [2] introduced a successive cancellation list (SCL) decoding algorithm whose performance is very close to that of maximum-likelihood decoding.
On the other hand, the belief propagation (BP) algorithm [1], [3] is theoretically more parallel than SC-based algorithms. However, the BP algorithm often has a much higher computational complexity than SC. To lower the complexity, early-stopping criteria [4]-[6] were proposed to reduce the number of iterations of the BP algorithm. The authors of [7] proposed a subfactor graph-freezing technique to reduce the average number of computations as well as the average number of iterations required by the BP algorithm. A stage-combined BP decoding algorithm [8] was introduced to reduce the decoding latency and memory requirement. Moreover, a generalized BP algorithm [9], [10] based on modified factor graphs was proposed to further improve the performance of the BP algorithm.
The associate editor coordinating the review of this manuscript and approving it for publication was Oussama Habachi.
Additionally, the error-correction performance of the BP algorithm is worse than that of the SCL decoder. Several BP-based algorithms that outperform the conventional BP algorithm have been proposed. The BP list decoder (BPL) was proposed based on a permuted factor graph [11], [12]. The performance of the BPL decoder is close to that of the SCL decoder, whereas it is inferior to that of the cyclic redundancy check (CRC)-aided SCL (CA-SCL) decoder. In [13], the proposed BP bit-strengthening (BPBS) decoder performs similarly to the SCL decoder in medium- and high-SNR regions. However, BPL and BPBS benefit only slightly or not at all from an outer CRC code.
In [14], a parity-check matrix was introduced to improve the performance of the BP decoder. In [15], CRC-polar BP (CPBP) and neural CPBP (NCPBP) decoders, which achieve significant error-correction performance improvement compared to conventional CRC-aided BP decoders, were proposed. Inspired by the SC-based bit-flipping decoders, a bit-flipping BP decoder using a critical set of order ω (BFBP-CS ω ) was proposed in [16]. BFBP-CS ω uses a CRC check to detect errors and exhaustively sets the a priori knowledge of the unreliable bits to find a codeword that passes the CRC check. However, the BFBP-CS ω decoder exhibits an error floor in high signal-to-noise ratio (SNR) regions.
In this paper, we aim to lower the error floor of the bit-flipping BP (BFBP) decoder and design a BP decoder whose performance is similar to that of CA-SCL decoders. The main contributions of this paper are summarized as follows:
1) We propose a belief propagation decoder with multiple bit-flipping sets (BFSs) and stopping criteria (BP-MF-MC), which is a generalization of BFBP-CS ω . We analyze the incorrect decoding results of the BFBP-CS ω decoder and find that undetected errors contribute to the error floor in high-SNR regions. Therefore, the proposed decoder uses multiple stopping criteria to identify undetected errors and multiple BFSs to find more estimated codewords. Finally, the best codeword is selected from the estimated codewords according to the maximum likelihood principle.
2) We present several types of BFSs. In particular, we propose a method to dynamically generate a BFS based on the submatrix check. This method uses the submatrix check to remove unnecessary bit-flipping positions and can thus increase the order of the critical set. The numerical results show that configurations with a dynamically generated BFS improve performance.
The rest of this paper is organized as follows. In Section II, we introduce the BP and BFBP-CS ω algorithms. Section III analyzes the decoding error properties of BFBP-CS ω . Section IV proposes the BP-MF-MC algorithm. In Section V, we present the numerical results of BP-MF-MC for the polar codes (2048,1024+24), (1024,256+24), and (1024,768+24). We also evaluate the computational complexity of BP-MF-MC in this section. Concluding remarks are given in Section VI.

A. NOTATION
In this work, we use letters W in standard font to denote scalars and boldface letters W to denote vectors and matrices.

B. POLAR CODES
Polar codes are linear block codes based on the phenomenon of channel polarization, in which individual channels are recursively combined and split such that their mutual information tends toward either 1 or 0. In other words, some of these channels become completely noise-free, while the others become completely noisy. Furthermore, the fraction of noiseless channels tends toward the capacity of the underlying binary symmetric channels [1]. Polar codes are specified by a generator matrix $G_N$, where $N = 2^n$ is the code length. A polar code $(N, r)$ can be generated in two steps. Let $A$ and $A^c$ be the sets of reliable and unreliable positions, which carry the information and frozen bits, respectively. First, an $N$-bit message $u$ is constructed by assigning $r$ information bits to the positions in $A$ and 0 to the positions in $A^c$. Then, $u$ is multiplied by the generator matrix $G_N$ to generate an $N$-bit transmitted codeword $x = uG_N$.
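The two encoding steps can be illustrated with a minimal Python sketch. It assumes that $G_N$ is the $n$-fold Kronecker power of the kernel [[1, 0], [1, 1]] without a bit-reversal permutation, which is one common convention and may differ from the one used in this work. The function names and the example information set are hypothetical.

```python
def polar_transform(u):
    """Return x = u * G_N over GF(2), assuming G_N = F^(kron n), F = [[1, 0], [1, 1]].

    Assumption: no bit-reversal permutation is applied.
    """
    x = list(u)
    n_len = len(x)
    assert n_len > 0 and n_len & (n_len - 1) == 0, "length must be a power of two"
    step = 1
    while step < n_len:
        # One butterfly stage: the upper branch absorbs the lower branch (XOR).
        for start in range(0, n_len, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


def polar_encode(info_bits, info_set, n_len):
    """Place info_bits on the positions in info_set (A), zeros elsewhere (A^c), then encode."""
    u = [0] * n_len
    for pos, bit in zip(sorted(info_set), info_bits):
        u[pos] = bit
    return polar_transform(u)


# Hypothetical toy example: N = 4 with information positions {1, 3}.
codeword = polar_encode([1, 1], {1, 3}, 4)
```

Under this convention the transform is its own inverse over GF(2), so applying `polar_transform` to a codeword recovers the message vector $u$.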

C. BELIEF PROPAGATION DECODER
The process of polar coding (encoding and decoding) can be represented by a factor graph [1]. Figure 1 shows the factor graph of polar codes with $N = 8$, which is divided into $n = \log_2 N$ stages. Each stage has $N/2$ processing elements (PEs), and each PE has two input and two output variable nodes. The BP decoding of polar codes is the process of passing log-likelihood ratios (LLRs) iteratively through the factor graph. Node $(i, j)$ is associated with two types of LLR: left-to-right $R_{i,j}$ and right-to-left $L_{i,j}$, where $i$ is the row index on the factor graph at stage $j$. Each PE computes the R and L messages according to (1)-(4), where $g(x, y) = 0.9375 \cdot \mathrm{sign}(x)\,\mathrm{sign}(y) \min(|x|, |y|)$. The messages R and L are initialized by (5) and (6), where $\mathrm{llr}_i$ is the LLR of the $i$-th received bit. In this work, BP decoding uses the CRC check as the early stopping criterion to reduce the number of iterations.
VOLUME 8, 2020
Algorithm 1 BP Algorithm Using CS ω
1: Input: $\mathrm{llr}_1^N$, $A$, CS ω
2: Output: $\hat{u}_1^N$
3: Initialize L and R by using (5) and (6)
4: $\hat{u}_1^N \leftarrow \mathrm{BP}(\mathrm{llr}_1^N, A, L, R)$
5: if $\hat{u}_1^N$ does not satisfy CRC then
6:   for all (j ω …
7:     Initialize R by (5)
8:     for l = 1 to ω do
9:       …
10:    …
11:    …
12:    if $\hat{u}_1^N$ satisfies CRC then
13:      return $\hat{u}_1^N$
14:    end if
15:  end for
16: …
The critical set (CS) contains the bit positions that tend to be unreliable, and its construction is given in [16]. BFBP-CS ω is summarized in Algorithm 1. First, the conventional BP algorithm is performed. If the BP decoder fails the CRC test, BFBP-CS ω exhaustively enumerates all the possible values b ω … Finally, BFBP-CS ω terminates the iteration once the estimated codeword passes the CRC check.
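To make the role of $g(x, y)$ concrete, the following Python sketch implements the scaled min-sum function with the factor 0.9375 stated above, together with one PE update. The four update rules are the ones commonly used for BP decoding of polar codes and are an assumption here, since equations (1)-(4) are not reproduced in this text; the argument names and their ordering are likewise hypothetical.

```python
import math


def g(x, y, scale=0.9375):
    """Scaled min-sum approximation: scale * sign(x) * sign(y) * min(|x|, |y|)."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return scale * sign * min(abs(x), abs(y))


def pe_update(l_up, l_low, r_up, r_low):
    """One processing element (assumed standard form, not necessarily (1)-(4)).

    l_up, l_low: right-to-left messages entering the PE from the right.
    r_up, r_low: left-to-right messages entering the PE from the left.
    Returns the two outgoing L messages and the two outgoing R messages.
    """
    l_out_up = g(l_up, l_low + r_low)
    l_out_low = g(r_up, l_up) + l_low
    r_out_up = g(r_up, l_low + r_low)
    r_out_low = g(r_up, l_up) + r_low
    return l_out_up, l_out_low, r_out_up, r_out_low
```

Each PE evaluates `g` a small, fixed number of times per update, which is why the complexity of BP decoding scales with the number of PEs and the number of iterations.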

III. ANALYSIS OF THE PERFORMANCE OF BFBP-CS ω
In the first part of this section, we present the simulation conditions for this work and analyze the performance limits of BFBP-CS ω using oracle-assisted BP-CS ω (OABP-CS ω ). We find that OABP-CS ω has an error floor in the high-SNR region and that it is difficult to further improve the performance simply by increasing ω. In the second part, we analyze the error types of OABP-CS ω . We find that undetected errors are the main contributors to the error floor in the high-SNR region.

A. SIMULATION CONDITION
In this work, the modulation format is binary phase-shift keying, and the channel noise model is additive white Gaussian noise. The maximum number of iterations for the BP decoders is 100. The CRC checks used in this work have lengths of 11, 16 and 24 bits. Their polynomials [17] are $x^{11} + x^{10} + x^9 + x^5 + 1$, $x^{16} + x^{12} + x^5 + 1$ and $x^{24} + x^{23} + x^6 + x^5 + x + 1$, respectively.
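The CRC check used as a stopping criterion can be sketched in Python as plain polynomial division over GF(2). The three generator polynomials are the ones listed above, given by their exponents. The zero initial register, the bit ordering (highest-degree coefficient first) and the function names are assumptions for illustration only.

```python
CRC11 = (11, 10, 9, 5, 0)
CRC16 = (16, 12, 5, 0)
CRC24 = (24, 23, 6, 5, 1, 0)


def poly_mod(bits, poly_exponents):
    """Remainder of bits(x) divided by the generator polynomial over GF(2)."""
    r = max(poly_exponents)
    # poly[0] is the coefficient of x^r, poly[r] the constant term.
    poly = [1 if (r - k) in poly_exponents else 0 for k in range(r + 1)]
    reg = list(bits)
    for i in range(len(reg) - r):
        if reg[i]:
            for k in range(r + 1):
                reg[i + k] ^= poly[k]
    return reg[-r:]


def crc_append(info_bits, poly_exponents):
    """Append the r CRC bits to the information bits."""
    r = max(poly_exponents)
    return list(info_bits) + poly_mod(list(info_bits) + [0] * r, poly_exponents)


def crc_ok(word_bits, poly_exponents):
    """True if the word (information bits followed by CRC bits) has zero remainder."""
    return not any(poly_mod(word_bits, poly_exponents))
```

A decoder would call `crc_ok` on the estimated information bits after each BP iteration and stop as soon as it returns true.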

B. FER LOWER BOUNDS OF THE BFBP-CS ω
According to [16], both the performance and the complexity of BFBP-CS ω increase with ω. If ω ≥ 6, the size of CS ω , which is approximately $2^\omega \times |\mathrm{CS}|$, is too large for its performance to be simulated. Therefore, we use OABP-CS ω to predict the theoretical optimal performance of BFBP-CS ω , which serves as a lower bound on the frame error rate (FER) results.
OABP-CS ω is almost the same as BFBP-CS ω except that line 9 of Algorithm 1 is replaced by an oracle-assisted assignment, where $j_l$ is the $l$-th position of CS ω and $u_{j_l}$ is the $j_l$-th message bit.
The performance of the OABP-CS ω decoder for the polar code (1024,768+24) is shown in Figure 2. The code is constructed from the reliability table of [17] and has 24 CRC bits. As ω increases, the FER of OABP-CS ω decreases. However, the performance difference between two adjacent curves decreases as ω increases. Compared with CA-SCL, OABP-CS ω exhibits an error floor at high SNR. For example, OABP-CS 9 performs similarly to or better than CA-SCL with list size 8 (CA-SCL8) in the range of [3, 4] dB. However, the performance of OABP-CS 9 is worse than that of CA-SCL8 if SNR > 4 dB.

C. ANALYSIS OF THE ERRORS OF OABP-CS ω
We divide the errors of OABP-CS ω into three types. The first type, error type I, includes all undetected errors that satisfy the CRC check. The second type, error type II, consists of errors that pass the CRC check during conventional BP decoding. The third type, error type III, is a decoding failure of OABP-CS ω .
The percentages of the three error types are shown in Figure 3. As shown in the figure, the percentage of type III decreases, whereas the percentages of types I and II increase, with the SNR. The percentage of type II is almost the same as that of type I. Moreover, the percentage of type II does not decrease as ω increases. Therefore, we can conclude that error type II is the dominant factor accounting for the error floor of OABP-CS ω and BFBP-CS ω in the high-SNR region. Furthermore, the performance of OABP-CS ω in the high-SNR region cannot be further improved by simply increasing ω due to error type II. For the low-SNR region, error type III is the main factor in the failure of the decoder.
Algorithm 2
…
13: … (5) and (6)
14: for all $\Psi_i \in \Psi$ do
15:   $U_i \leftarrow \mathrm{BFBP}(\mathrm{llr}_1^N, A, \Psi_i)$
16: end for
17: end if
18: Select the best codeword $\hat{u}_1^N$ in $U$ according to the maximum likelihood principle using (8)

IV. BELIEF PROPAGATION DECODERS WITH MULTIPLE BIT-FLIPPING SETS AND STOPPING CRITERIA
Based on the above observations, we generalize BFBP-CS ω and propose the BP-MF-MC decoder. We use multiple BFSs and stopping criteria to lower the rate of error types II and III simultaneously. The scheme of BP-MF-MC is given in Algorithms 2 and 3.
Let S be an independent stopping criterion other than the CRC check, such as the G-matrix [5], the worst of information bits (WIB) [4] or the best frozen bits (BFB) [6]. If the estimated codeword $\hat{u}_1^N$ satisfies S, it is more likely to be a correct codeword and is output by the algorithm. Let $\Phi = \{\Phi_1, \ldots, \Phi_{n_\phi}\}$ be a sequence containing $n_\phi$ BFSs. The BFSs in $\Phi$ can be constructed by various methods.
The scheme of BFBP(·) is shown in Algorithm 3. It is a generalized BFBP-CS ω decoder over the flipping sets in $\Phi$. The function BFBP(·) differs from the BFBP-CS ω decoder in two ways. The first is in line 5 of Algorithm 3: the BFBP decoder reinitializes both messages R and L at the beginning of its flipping process, whereas BFBP-CS ω only reinitializes R in line 7 of Algorithm 1. Therefore, BFBP(·) can be implemented in parallel. Second, the BFSs in $\Phi$ do not require all the bit-flipping positions $j_1^\omega$ to be of the same length. Thus, $\Phi$ can consist of several disjoint BFSs. Let $\Psi = \{\Psi_1, \ldots, \Psi_{n_\psi}\}$ be a sequence containing $n_\psi$ BFSs. If $\hat{u}_1^N$ does not satisfy the CRC check, BFBP(·) is also used to find the decoding results $U_i$ for each $\Psi_i$.
Algorithm 3
…
5: Initialize L and R by using (5) and (6)
6: for l = 1 to ω do
7: …
On line 18 of Algorithm 2, the best codeword $\hat{u}_1^N$ is selected from the set $U = \{U_1; \ldots; U_{n_u}\}$ according to the maximum likelihood principle [11] given in (8). Of course, the sequences $\Phi$ and $\Psi$ should be carefully chosen to balance the performance and complexity of BP-MF-MC.
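The selection step can be sketched in Python as follows. Equation (8) is not reproduced in this text, so the metric below is an assumption: for binary phase-shift keying over additive white Gaussian noise, maximum likelihood selection among a finite set of candidates is equivalent to maximizing the correlation between the re-encoded, modulated candidate and the channel LLRs. The LLR sign convention (positive LLR favors bit 0), the generator matrix convention and the function names are also assumptions.

```python
def polar_transform(u):
    """x = u * G_N over GF(2), assuming G_N = F^(kron n) without bit reversal."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


def select_ml(candidates, llr):
    """Return the candidate message whose codeword correlates best with the LLRs."""
    best, best_metric = None, float("-inf")
    for u_hat in candidates:
        x = polar_transform(u_hat)
        # BPSK mapping 0 -> +1, 1 -> -1; a larger correlation means more likely.
        metric = sum((1 - 2 * bit) * value for bit, value in zip(x, llr))
        if metric > best_metric:
            best, best_metric = u_hat, metric
    return best


# Hypothetical toy example with N = 4 and three estimated messages.
llr = [2.0, -1.5, 0.5, -3.0]
best = select_ml([[0, 0, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]], llr)
```

Because every candidate in $U$ is scored against the same received LLRs, an undetected error returned by the conventional BP can be replaced by a more likely codeword found by BFBP(·).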
To efficiently find the error bit positions, we propose a method $D(\lambda_{th}, F, A^c)$ to dynamically generate the BFS with a submatrix check. This method is inspired by the BP bit-strengthening method [13]. The scheme of $D(\lambda_{th}, F, A^c)$ is shown in Algorithm 4. Let $G_{2^m}$ be a generator matrix of size $2^m \times 2^m$. Let $\hat{u}_k$ be a row vector of length $2^m$, which is the part of $\hat{u}_1^N$ associated with the $k$-th subfactor graph. Let $\lambda_{th}$ be the last stage that performs the submatrix check. Let $\hat{x}_k$ be the row vector corresponding to $\hat{u}_k$, which is computed from $R_k$ and $L_k$, the $k$-th row vectors of length $2^m$ at the $m$-th stage.
Algorithm 4
…
5: for k = 1 : $2^{n-m}$ do
6:   if $\hat{u}_k G_{2^m} = \hat{x}_k$ then
7:   …
8:   …
9: end for
10: end for
11: generate the critical set CS using $A_f$
12: $\omega \leftarrow \mathrm{floor}(\log_2(F/|\mathrm{CS}|))$
13: generate the critical set CS ω using $A_f$
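The submatrix check and the choice of ω in Algorithm 4 can be sketched in Python. The sketch assumes that $\hat{x}_k$ is the hard decision of $R_k + L_k$ (with a nonnegative LLR mapped to bit 0) and that $G_{2^m}$ is the Kronecker power of [[1, 0], [1, 1]] without bit reversal; neither detail is confirmed by the text above, and the function names and numeric values are hypothetical.

```python
import math


def polar_transform(u):
    """x = u * G over GF(2), assuming G = F^(kron m) without bit reversal."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


def submatrix_check(u_hat_k, r_k, l_k):
    """True if u_hat_k * G_{2^m} equals the hard decision of R_k + L_k (assumed form)."""
    x_hat_k = [0 if (r + l) >= 0 else 1 for r, l in zip(r_k, l_k)]
    return polar_transform(u_hat_k) == x_hat_k


def critical_set_order(f_budget, cs_size):
    """Line 12 of Algorithm 4: omega = floor(log2(F / |CS|))."""
    return math.floor(math.log2(f_budget / cs_size))


# Hypothetical values: a budget F = 3000 and a critical set with 220 positions.
omega = critical_set_order(3000, 220)
```

Sub-blocks that pass the check are treated as correct, which shrinks the critical set; for a fixed budget $F$, a smaller $|\mathrm{CS}|$ yields a larger order ω.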
First, the set of frozen bit positions $A_f$ is initialized by $A^c$. If the submatrix check in line 7 of Algorithm 4 is satisfied, we take these information bits to be correct and update $A_f$. Then, we use the updated $A_f$ to generate the CS with the algorithm given in [16]. Finally, Algorithm 4 outputs CS ω using $A_f$. In this way, we increase the order ω by eliminating the unnecessary bit-flipping positions. According to the performance of OABP-CS ω , this procedure is helpful in decreasing the number of errors of type III. Remarks:

V. SIMULATION RESULTS
In this section, we introduce several configurations for BP-MF-MC and evaluate the FER performance for three polar codes (1024, 256+24), (1024, 768+24) and (2048, 1024+24). The code (2048, 1024+24) is designed by the Gaussian approximation (GA) method [18]. The design SNR is 2.5 dB, which is optimized according to the simulation results of CA-SCL decoders with various list sizes over a wide range. The polar codes (1024,256+24) and (1024,768+24) are designed by the reliability table given in [17].

A. CONFIGURATIONS
According to Algorithm 2, the function BFBP(·) of Algorithm 3 uses two sequences. The sequence $\Phi$ is used on line 8 to provide an additional estimated codeword that passes the CRC check, which lowers the number of errors of type II. Thus, we want $\Phi$ to be as small as possible. In contrast, $\Psi$ is used to find a codeword that can pass the CRC check. We require $\Psi$ to be as large as possible because it is the only means of finding such a codeword when the conventional BP fails. In this work, $\Phi$ and $\Psi$ are mainly constructed from five types of BFSs. The first two are the critical sets CS 1 and CS 3 . According to the BPBS decoder, bit strengthening is also helpful to improve the performance of the BP decoder. Thus, the size of CS 1 for BP-MF-MC is twice the number of critical bit-flipping positions. The third, $A \setminus$ CS 3 , is the information set excluding CS 3 . Interestingly, we find that the frozen bits $A^c$ can also be used as a flipping set, which gives the fourth type. The last set is the BFS dynamically generated by Algorithm 4. For the polar codes (2048,1024+24) and (1024,768+24), $\lambda_{th}$ is 9 and $F = 3000$, while $\lambda_{th} = 6$ for the polar code (1024,256+24).
In this work, the configuration of BP-MF-MC refers to the way $\Phi$ and $\Psi$ are constructed. In the following, we use the tuple $(|\Phi|, |\Psi \setminus D(\lambda_{th}, F, A^c)|, F)$ to denote a configuration of $\Phi$ and $\Psi$, where $|\cdot|$ is the number of elements of a sequence. The configurations used in this work for BP-MF-MC are summarized in Table 1.
Finally, there are two stopping criteria used in this work. The first is the CRC check used in BFBP (·) and on line 5 of Algorithm 2. The second S is the G-matrix used on line 6 of Algorithm 2.
However, the configurations (0, 2760, ∞) and (0, 4412, ∞) perform similarly to BFBP-CS 3 with increasing $E_b/N_0$, especially at $E_b/N_0 = 3$ dB. The main reason is that these two configurations cannot reduce the number of errors of type II because $|\Phi| = 0$. Therefore, configurations with $|\Phi| > 0$ can lower the error floor at high SNRs. The figure shows that there is a performance improvement of (440, 1760, ∞) over BFBP-…
The performance of the configurations (440, 4412, ∞), (440, 4412, 9), (1760, 4412, 9), and (0, 3412, 9) is shown in Figure 5. The figure shows that (440, 4412, 9) and (1760, 4412, 9) have the best performance among all the configurations of BP-MF-MC. These two configurations outperform CA-SCL with list size 16 (CA-SCL16) and are even close to CA-SCL with list size 32 (CA-SCL32) when $E_b/N_0 > 1.75$ dB and 24 CRC bits are used. Finally, (440, 4412, 9) and (1760, 4412, 9) do not show any error floor, and their performance is similar. Thus, we can select the smaller $\Phi$ to reduce the complexity of BP-MF-MC. Therefore, we choose $\Phi = \{\mathrm{CS}^1\}$ in the following numerical examples.

C. COMPLEXITY
The average number of iterations for various decoders is shown in Figure 8. The target code is (2048,1024+24). It is observed that the configuration (1760, 4412, 9) uses the highest average number of iterations among all the configurations. The configurations (440,4412,9), (440, 4412, ∞) and (440, 1760, ∞) use more iterations than BFBP-CS 3 while using fewer iterations than (1760,4412,9). These three configurations overlap each other in high-SNR regions. (0,3412,9) and (0, 4412, ∞) converge to BFBP-CS 3 and require slightly more iterations than the conventional BP decoder. Furthermore, the complexity of (0, 3412, 9) is much higher than that of (0, 4412, ∞) if E b /N 0 ≤ 2.5 dB. Therefore, although the dynamic flipping set can improve performance, it increases computational complexity, which may decrease the throughput of hardware implementation due to the large resource requirement.
Next, we compare the complexity of the proposed BP-MF-MC with that of conventional decoders based on the simulation results. Let $\bar{I}$ be the average number of iterations for BP-based decoders, such as BP-MF-MC and BFBP-CS ω . Here, for the BP decoder, one computation by a PE given in (1)-(4) is taken to have unit complexity. Thus, the computational complexity of a BP-based decoder is calculated as $2Nn\bar{I}$. The complexity of the CA-SCL decoder is taken from [16]. The comparison results are given in Table 2. Note that the complexity for the configurations given in Table 2 is the average number of PE computations when all the decoders are implemented in software. The table shows that the complexity of BP-MF-MC with the configuration (440, 4412, 9) decreases with SNR. At $E_b/N_0 = 3$ dB, the complexity of (440, 4412, 9) is approximately twice that of CA-SCL32 or three times that of BFBP-CS 3 . The complexity of (0, 3412, 9) is close to that of CA-SCL32 or BFBP-CS 3 if $E_b/N_0 \geq 2.75$ dB. Furthermore, the average complexity of the configurations given in Table 2 is much higher than that of the CA-SCL decoders if $E_b/N_0 \leq 2.5$ dB.
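The complexity measure $2Nn\bar{I}$ is simple enough to evaluate directly. The short Python sketch below does so for a given code length and average iteration count; the function name and the example iteration count are hypothetical and are not taken from Table 2.

```python
def bp_complexity(n_len, avg_iterations):
    """Complexity 2 * N * n * I_bar of a BP-based decoder, with n = log2(N)."""
    assert n_len > 0 and n_len & (n_len - 1) == 0, "N must be a power of two"
    n = n_len.bit_length() - 1  # exact log2 for a power of two
    return 2 * n_len * n * avg_iterations


# Hypothetical example: N = 2048 and an average of 10 iterations.
cost = bp_complexity(2048, 10.0)
```

For $N = 2048$ the factor $2Nn$ equals 45056, so every additional average iteration adds 45056 unit PE computations under this measure.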

VI. CONCLUSION
In this paper, we propose the BP-MF-MC algorithm for polar codes and present the numerical results for three polar codes. The simulation results show that BP-MF-MC performs similarly to the CA-SCL16 algorithm. Although the complexity of the proposed algorithm can approach that of the CA-SCL32 decoder in high-SNR regions, it is important for future studies to lower the complexity of BP-MF-MC in low-SNR regions so that it can be used in practice.