Dynamic Multi-Symbol Flipping Decoding of Non-Binary LDPC Codes

A dynamic multi-symbol (DM) flipping scheme is proposed for various symbol flipping decoding algorithms of non-binary low-density parity-check (NB-LDPC) codes. The approach divides the decoding process into multiple stages according to the iteration number and allows a different maximum number of symbols to be flipped in each stage. Numerical analysis reveals that, with a dynamic flipping threshold, the proposed multi-symbol flipping scheme yields a higher probability of correct flipping. The DM scheme can be highly parallelized, substantially improving the decoder throughput over existing symbol flipping decoding algorithms based on prediction (SFDP). Numerical results show that the DM scheme significantly enhances the error correction capability and achieves faster convergence with little increase in average computational complexity.


I. INTRODUCTION
Low-density parity-check (LDPC) codes are now widely employed in broadcasting (DVB and ATSC) and wireless communication systems (5G NR and WiFi) because of their strong error correction capability. Soft-information-based algorithms for binary LDPC codes, such as the traditional belief propagation (BP) algorithm and the min-sum (MS) algorithm [1], are widely applied in storage applications because of their excellent error-correcting performance. In [2], bit-serial circuits are proposed to determine the minimum and sub-minimum values, keeping the hardware cost of bit-serial check-node architectures affordable. To further reduce the complexity and fully utilize the decoder hardware, MS-based decoding algorithms [3] adopt a simplified check-node unit to perform the row operation. Comprehensive studies have also been conducted to remove the computationally intensive hyperbolic-function computations of the traditional BP algorithm in hardware-efficient decoders [4]. As a result, LDPC codes are hardware-friendly for high-throughput realization [5], [6].
Non-binary coding schemes have attracted considerable attention because of their larger coding gain [7], [8]. Regular non-binary LDPC (NB-LDPC) codes show superior performance in the error floor region and outperform irregular binary LDPC codes at short-to-moderate block lengths. They have been adopted in the BeiDou Navigation Satellite System (BDS) [9] and are further being considered for flash storage systems [10] and optical fiber communications [11], [12]. There are three main classes of decoding algorithms for NB-LDPC codes: message-passing (MP) based algorithms [13], [14], [15], majority-logic decoding (MLgD) algorithms [16], [17] and symbol flipping decoding (SFD) algorithms [18], [19], [20]. The computational complexity of an MP-based algorithm is dominated by the order $q$ of the Galois field GF($q$) and the check-node row weight, and it grows significantly as $q$ and the code rate increase. The simplified MP decoding [14] based on the fast Fourier transform (FFT) reduces the computational complexity of the check-node update from $O(q^2)$ to $O(q \log_2 q)$. The extended min-sum (EMS) algorithm [21] further reduces the computational burden by using only the $n_m$ most reliable values for the check-node update, decreasing the complexity to $O(n_m \log_2 n_m)$ with $n_m \ll q$. Although the trellis-EMS (T-EMS) algorithm [22] considers a larger degree of hardware parallelism to lower the decoding latency, the large latency of its soft-output update makes it unsuitable for high-speed applications such as reads and writes in NAND flash memory.
For NB-LDPC codes, SFD algorithms, particularly hard-decision ones, are very attractive for flash storage systems because they offer a reasonable trade-off between performance and complexity. These symbol flipping algorithms have lower complexity than bit-reliability-based algorithms such as the weighted bit-reliability-based algorithm [23] and the full bit-reliability-based algorithm [24]. Recently, SFD algorithms with a prediction mechanism (SFDP) have shown excellent performance, e.g., the SFDP algorithm aided by plurality logic (P-SFDP) and the SFDP algorithm aided by the binary Hamming distance (D-SFDP) [18], which have been further improved with a self-adjustment schedule (D-SA-SFDP) [19] and randomly penalized flipping metrics (N-D-SFDP) [20]. However, all these SFDP algorithms flip only one symbol per iteration under a serial schedule, which results in high decoding latency.
Multiple bit-flipping (BF) algorithms for binary codes usually flip all bits whose reliabilities fall below a designated threshold. The multi-bit noise-aided gradient-descent BF (M-NGDBF) algorithm in [25] flips every bit that satisfies the inversion threshold. Similarly, the M-P-SFDP algorithm for NB-LDPC codes in [26] flips all symbols that satisfy a predetermined threshold. Such parallel flipping allows faster convergence but suffers from severe performance degradation due to over-flipping. Two other multi-symbol flipping decoding algorithms aimed at reducing the number of iterations are also proposed in [26]: the F-P-SFDP algorithm flips a fixed number of symbols, and the A-P-SFDP algorithm adaptively flips all symbols satisfying a fixed flipping threshold. The F-P-SFDP and A-P-SFDP algorithms accelerate convergence at the expense of additional comparison operations, because a global sort is required to determine which symbols to flip. Meanwhile, they provide no significant improvement in error correction performance, because the relaxed thresholds cause performance degradation through over-flipping.
To better balance decoding delay and error correction capability, we propose a dynamic multi-symbol (DM) flipping scheme for the existing x-SFDP (x: P, D, D-SA, N-D) algorithms, which dynamically flips multiple symbols during iterative decoding. The DM scheme adopts a dynamic flipping threshold, which significantly accelerates convergence and lowers the decoding delay, together with a limit on the maximum number of flipped symbols, which avoids excessive flipping and thereby greatly improves the BER performance for the same maximum number of iterations. For instance, for a (96, 48) regular NB-LDPC code with column weight γ = 2 and row weight ρ = 4 over GF(64), the proposed DM-D-SFDP and DM-D-SA-SFDP algorithms achieve performance gains of approximately 0.5 dB and 0.6 dB, respectively, at a bit error rate (BER) of $10^{-7}$. The average number of iterations with the proposed DM scheme is less than half that of the existing algorithms. Moreover, our proposed DM-P-SFDP algorithm outperforms the existing multi-symbol flipping algorithms in BER performance. For a (204, 102) regular NB-LDPC code over GF(16), the proposed DM-P-SFDP algorithm achieves gains of about 0.3 dB and 0.25 dB over the A-P-SFDP and F-P-SFDP algorithms, respectively, at a BER of $10^{-6}$.
The remainder of the paper is organized as follows. In Section II, we give the concepts and notation of NB-LDPC codes, and then review the existing single symbol flipping algorithms in [18], [19], [20]. In Section III, we propose the dynamic multi-symbol flipping scheme and provide a detailed description of the parameter selection. In Section IV, the comparisons with the existing multi-symbol flipping algorithms in [26] are also presented to provide a benchmark for our proposed DM algorithms. In Section V, experimental evaluations and performance analysis are used to demonstrate the improvement of the error correction performance and the convergence rate of our proposed DM scheme. Finally, the concluding statements are given in Section VI.

II. PRELIMINARIES
Suppose a regular (γ, ρ) NB-LDPC code over GF($q$), $q = 2^d$, is used for error correction over an additive white Gaussian noise channel with zero mean and variance $\sigma^2$. The $M \times N$ regular parity-check matrix of the NB-LDPC code with column weight γ and row weight ρ is $\mathbf{H} = [h_{m,n}]_{M \times N}$, $h_{m,n} \in \mathrm{GF}(q)$. If $h_{m,n}$ is nonzero, the $m$-th check node (CN) and the $n$-th variable node (VN) are connected in the Tanner graph. The hard-decision symbol vector $\mathbf{z}^{(1)} = \{z^{(1)}_1, \ldots, z^{(1)}_N\}$ is obtained from the received signals, where each symbol has a $d$-bit binary image representation.

In the $k$-th iteration, the syndrome vector $\mathbf{s}^{(k)} = \mathbf{z}^{(k)} \mathbf{H}^T$ is first calculated to determine whether the current hard-decision vector $\mathbf{z}^{(k)}$ satisfies all the parity-check constraints. If the parity check fails, the hard-decision symbol $z^{(k)}_j$ is passed from the $j$-th VN to its adjacent CNs. Then, the extrinsic information-sum (EXI) passed from the $i$-th CN to the $j$-th VN is computed from the hard decisions of the other VNs in $\mathcal{N}_i = \{j : 1 \le j \le N, h_{i,j} \neq 0\}$, the set of VNs connected to the $i$-th CN. When the $j$-th symbol is selected to be flipped in this iteration, it can take any of the $q - 1$ values of GF($q$) other than $z^{(k)}_j$; a candidate flipped value is denoted by $\tilde{z}^{(k)}_j$. The P-SFDP algorithm incorporates plurality logic into the flipping metric through the number of times $\tilde{z}^{(k)}_j$ occurs among the EXIs passed from all CNs connected to the $j$-th VN, and η is the weighting factor.
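The GF($2^d$) arithmetic, syndrome, and EXI computations described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the field is built from log/antilog tables of a primitive polynomial (the polynomials used here are illustrative choices, not taken from the paper), and the EXI is assumed to be the value of $z_j$ that would satisfy the $i$-th check given the current values of the other symbols connected to that check. The names `GF`, `syndrome`, `exi`, and `plurality` are hypothetical.

```python
class GF:
    """GF(2^d) arithmetic via exp/log tables built from a primitive polynomial (given as an int)."""

    def __init__(self, d, prim_poly):
        self.q = q = 1 << d
        self.exp = [0] * (2 * q)
        self.log = [0] * q
        x = 1
        for i in range(q - 1):
            self.exp[i], self.log[x] = x, i
            x <<= 1
            if x & q:  # reduce modulo the primitive polynomial
                x ^= prim_poly
        for i in range(q - 1, 2 * q):  # duplicate so exp[log a + log b] never overflows
            self.exp[i] = self.exp[i - (q - 1)]

    def mul(self, a, b):
        if a == 0 or b == 0:
            return 0
        return self.exp[self.log[a] + self.log[b]]

    def inv(self, a):
        if a == 0:
            raise ZeroDivisionError("0 has no inverse in GF(q)")
        return self.exp[(self.q - 1) - self.log[a]]


def syndrome(gf, H, z):
    """s = z H^T over GF(2^d); addition in characteristic 2 is bitwise XOR."""
    s = []
    for row in H:
        acc = 0
        for h, zj in zip(row, z):
            acc ^= gf.mul(h, zj)
        s.append(acc)
    return s


def exi(gf, H, z, s, i, j):
    """Assumed EXI from CN i to VN j: h_ij^{-1} * sum over j' != j of h_ij' z_j'.

    The extrinsic sum equals s_i minus h_ij z_j, and minus is XOR in GF(2^d).
    """
    h = H[i][j]
    return gf.mul(gf.inv(h), s[i] ^ gf.mul(h, z[j]))


def plurality(gf, H, z, s, j, v):
    """Number of CNs connected to VN j whose EXI equals the candidate value v."""
    return sum(1 for i in range(len(H)) if H[i][j] != 0 and exi(gf, H, z, s, i, j) == v)


# Usage: GF(4) from x^2 + x + 1 (0b111); GF(16) would use x^4 + x + 1 (0b10011).
gf4 = GF(2, 0b111)
H = [[1, 2, 3],
     [1, 1, 0]]
z = [0, 1, 1]
s = syndrome(gf4, H, z)  # [1, 1]: both checks fail
```

Replacing $z_j$ by its EXI from check $i$ makes that check satisfied, which is why the EXIs act as "predictions" for the flipped value.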
The D-SFDP algorithm introduces the reliability of the binary Hamming distance $d(\cdot, \cdot)$ into the flipping metric, with corresponding weighting factors. However, as a class of gradient-descent-like algorithms, the SFDP algorithms can become trapped in a local maximum. When this happens, the flipped VNs may fall into oscillation. Such cycle oscillation, in which several VNs are changed circularly in subsequent iterations, always leads to a decoding failure. To break the cycle oscillation in the P/D-SFDP algorithms, the flipping metric was modified with the self-adjustment strategy and the random penalty items introduced in [19] and [20], respectively. We use the D-SFDP algorithm to illustrate these improvements of the flipping metric in detail.
The D-SA-SFDP algorithm considers the statistics of symbol flipping and develops a new flipping metric with the help of a self-adjustment schedule, where t and $t^{(k)}_{\tilde{z}_j}$ represent a fixed weighting factor and the number of times that the selected symbol has been flipped to $\tilde{z}_j$, respectively. Moreover, the N-D-SFDP algorithm uses random penalty items to help the D-SFDP algorithm escape from undesirable local optima. In its modified flipping metric, $\bar{g}^{(k)}_{i,j}$ and $g^{(k)}_{i,j}$ denote the random variables that penalize the weighted hard reliabilities of the candidate values $\tilde{z}^{(k)}_j$.

The aforementioned x-SFDP algorithms prevent the same symbol from being flipped in two consecutive iterations to avoid oscillation in a close region of the local maximum. The maximum of the $q - 1$ flipping metrics of the $j$-th symbol is defined as the flipping function $E^{(k)}_j$ in (7), and the candidate value attaining this maximum is the flipping value $\mu^{(k)}_j$ in (8).
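Once the $q - 1$ flipping metrics of each symbol are available, the flipping function and the flipping value reduce to a max and an argmax per symbol. The sketch below assumes the metrics are supplied as a dictionary from candidate value to metric (a hypothetical interface, since the exact metric of each x-SFDP variant is algorithm-specific) and, as one simple way to enforce the no-consecutive-flip rule, masks symbols flipped in the previous iteration with $-\infty$.

```python
import math


def flipping_functions(metrics, f=None):
    """Return (E, mu): E[j] = max metric of symbol j, mu[j] = the candidate attaining it.

    metrics[j] maps each of the q - 1 candidate values of symbol j to its flipping metric.
    If f[j] == 1 (symbol j was flipped in the previous iteration), E[j] is set to -inf
    so that j cannot be selected in this iteration.
    """
    E, mu = [], []
    for j, cand in enumerate(metrics):
        best = max(cand, key=cand.get)  # argmax over candidate values
        blocked = f is not None and f[j] == 1
        E.append(-math.inf if blocked else cand[best])
        mu.append(best)
    return E, mu


# Usage with GF(4) symbols (candidates exclude the current value of each symbol).
metrics = [{1: 0.2, 2: 0.9, 3: -0.1},   # current z_0 = 0
           {0: 1.5, 2: 1.4, 3: 0.0}]    # current z_1 = 1
E, mu = flipping_functions(metrics)
```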

III. THE PROPOSED DM SCHEME
Our proposed DM scheme is superior to existing multi-symbol flipping algorithms. Unlike the fixed flipping threshold of the A/F-P-SFDP algorithms in [26], the DM scheme adopts a dynamic flipping threshold, which effectively limits the over-flipping caused by a relaxed threshold. In addition, we dynamically adjust the maximum number of flipped symbols under explicit numerical limits to accelerate convergence.
In what follows, we use the threshold adjustment factor, the vector of flipping functions $\mathbf{E}^{(k)} = \{E^{(k)}_1, \ldots, E^{(k)}_N\}$, the vector of symbol flipping values $\boldsymbol{\mu}^{(k)} = \{\mu^{(k)}_1, \ldots, \mu^{(k)}_N\}$, and the indicator vector $\mathbf{f}^{(k)} = \{f^{(k)}_1, \ldots, f^{(k)}_N\}$, which records the positions flipped in the previous iteration. To avoid cycle oscillation, we prevent symbols from being flipped in two consecutive iterations: $f^{(k)}_j = 1$, $j = 1, \ldots, N$, indicates that the $j$-th symbol was flipped in the $(k-1)$-th iteration and therefore cannot be flipped in the $k$-th iteration.

We set the maximum number of iterations to $k_T$ and divide the entire decoding process into $T$ stages. In the $t$-th stage, $1 \le t \le T$, up to $w_t$ symbols can be flipped per iteration during the iteration period $(k_{t-1}, k_t]$, with $k_0 = 0$; that is, the maximum number of symbols that may be flipped in iteration $k$ is $w_t$ whenever $k \in (k_{t-1}, k_t]$, $t = 1, \ldots, T$. Hence, we must determine the endpoints $k_1, k_2, \ldots, k_T$ of the $T$ stages and the maximum numbers $w_1, w_2, \ldots, w_T$ of symbols to be flipped in each stage. We find that the DM scheme with $T = 3$ stages already provides a noticeable performance improvement over the original single-symbol flipping algorithms, so a more complex division of decoding stages is unnecessary. More specifically, $w_1$ is related to the code length and is usually set to a relatively large value (e.g., 6, 10, or 15), whereas $w_2$ and $w_3$ are limited to small values, usually 2 and 1, respectively, to avoid over-flipping. Taking $T = 3$ as an example, $k_T = k_3$ equals the maximum number of iterations: we first set $k_3 = 100$ and then optimize $k_1$ and $k_2$ experimentally. Empirical reference values of $k_1, k_2, k_3$ and $w_1, w_2, w_3$ for different codes, finally determined through simulation, are provided in Section V.
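The stage division amounts to a lookup from the iteration index $k$ to the per-iteration cap $w_t$. The following is a minimal sketch assuming $k_0 = 0$; in the usage lines, $k_3 = 100$ and $(w_1, w_2, w_3) = (15, 2, 1)$ follow the typical values quoted in the text, while $k_1 = 10$ and $k_2 = 30$ are hypothetical placeholders, not the optimized values of Table 5.

```python
from bisect import bisect_left


def max_flips(k, stage_ends, w):
    """Return w_t for iteration k, where k lies in (k_{t-1}, k_t] and k_0 = 0.

    stage_ends = (k_1, ..., k_T) must be strictly increasing; w = (w_1, ..., w_T).
    """
    if not 1 <= k <= stage_ends[-1]:
        raise ValueError("iteration index must lie in (0, k_T]")
    # bisect_left returns the first t with k <= k_t, i.e. the stage containing k
    return w[bisect_left(stage_ends, k)]


stage_ends = (10, 30, 100)  # hypothetical k_1, k_2; k_3 = 100 as in the text
w = (15, 2, 1)              # w_1 large, w_2 = 2, w_3 = 1
caps = [max_flips(k, stage_ends, w) for k in (1, 10, 11, 30, 31, 100)]
```

Using `bisect_left` on the stage endpoints keeps the half-open intervals $(k_{t-1}, k_t]$ exact at the boundaries.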
The dynamic threshold plays a key role in accelerating convergence and reducing the decoding delay. The predetermined threshold adjustment factor can only be found by simulation and is directly correlated with the signal-to-noise ratio, the degree distribution, and the decoding algorithm used. Let $E_\alpha$ and $E_\beta$ denote the largest and the second-largest values of $\mathbf{E}^{(k)}$, respectively. When the threshold is obtained by subtracting the adjustment factor from $E_\alpha$, different optimized values of the factor (e.g., 1, 2, 3) must be set for different settings. When the factor is instead subtracted from $E_\beta$, it can be fixed at an empirically determined constant, typically 1, which provides good performance for typical codes; the threshold based on $E_\beta$ is therefore more robust. In decoding simulations, applying the adjustment factor to the largest or to the second-largest flipping function yields similar performance, and the only difference lies in the value of the predefined factor. In summary, the threshold adjustment factor is applied to the second-largest flipping function $E_\beta$ rather than the largest one $E_\alpha$ in the following simulations for ease of implementation.
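The two threshold variants can be written as follows. In this sketch, `delta` stands for the threshold adjustment factor (the name is ours), defaulting to 1 as suggested for the $E_\beta$-based threshold; `heapq.nlargest` finds $E_\alpha$ and $E_\beta$ in a single pass without fully sorting the flipping functions.

```python
import heapq


def dynamic_threshold(E, delta=1.0, use_second_largest=True):
    """Dynamic flipping threshold: E_beta - delta (default) or E_alpha - delta.

    E_alpha and E_beta are the largest and second-largest entries of E.
    """
    if len(E) < 2:
        raise ValueError("need at least two flipping functions")
    e_alpha, e_beta = heapq.nlargest(2, E)  # one pass, no full sort
    return (e_beta if use_second_largest else e_alpha) - delta


E = [3.0, 7.5, 5.0, 6.0]
th_beta = dynamic_threshold(E)                                        # 6.0 - 1.0
th_alpha = dynamic_threshold(E, delta=2.0, use_second_largest=False)  # 7.5 - 2.0
```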
In the $k$-th iteration, we first determine the decoding stage $t$ satisfying $k \in (k_{t-1}, k_t]$. Then, we select the two largest flipping functions $E_\alpha$ and $E_\beta$, and the other symbols whose flipping functions satisfy the dynamic threshold, $j \in [1, N]$, $j \neq \alpha, \beta$, are also flipped until at most $w_t$ symbols are flipped. We update the symbol sequence $\mathbf{z}^{(k+1)}$ and then set $\mathbf{f}^{(k+1)}$ to record the flipped positions.

In the DM-x-SFDP algorithms, multiple symbols are selected for flipping according to the flipping functions calculated by (7). We first set the iteration counter to $k = 1$, initialize the indicator $\mathbf{f}^{(1)} = \mathbf{0}_N$, and calculate all $M$ syndromes with $\mathbf{s}^{(k)} = \mathbf{z}^{(k)} \mathbf{H}^T$. If $\mathbf{s}^{(k)} = \mathbf{0}_M$, we terminate the decoding process and output the current hard-decision sequence $\mathbf{z}^{(k)}$. Otherwise, we calculate the γ EXIs and the $q - 1$ flipping metrics of each symbol by (1), obtain the vector of flipping functions $\mathbf{E}^{(k)}$ by (7) and the vector of symbol flipping values $\boldsymbol{\mu}^{(k)}$ by (8), and flip symbols according to the threshold adjustment factor and the indicator vector $\mathbf{f}^{(k)}$. Finally, we update the hard-decision symbol vector $\mathbf{z}^{(k)}$, increment the iteration counter, and repeat the above operations until the syndrome is zero or the iteration counter reaches $k_T$. The proposed DM scheme can be adapted to all the x-SFDP algorithms in Section II. The detailed decoding process of the proposed DM-x-SFDP algorithm is shown in Algorithm 2.
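The per-iteration selection and the outer loop can be sketched as follows. The selection rule is our reading of the description above and should be treated as an assumption: the two largest eligible flipping functions ($E_\alpha$, $E_\beta$) are taken first, and the remaining eligible symbols are then scanned in index order and flipped if they exceed $E_\beta$ minus the adjustment factor `delta` (our name), until $w_t$ symbols have been flipped. Because the metric computation depends on the x-SFDP variant, `flipping_functions` and `syndrome_is_zero` are hypothetical callbacks; the toy usage decodes toward the all-zero word only to exercise the control flow.

```python
import heapq
from bisect import bisect_left


def dm_flip_step(z, E, mu, f, w_t, delta=1.0):
    """Flip up to w_t symbols; return (z_next, f_next, flipped_positions)."""
    eligible = [j for j in range(len(z)) if f[j] == 0]  # no flip in consecutive iterations
    top = heapq.nlargest(2, eligible, key=lambda j: E[j])  # positions alpha, beta
    chosen = top[:w_t]
    if len(top) == 2:
        threshold = E[top[1]] - delta  # dynamic threshold based on E_beta
        for j in eligible:
            if len(chosen) >= w_t:
                break
            if j not in top and E[j] > threshold:
                chosen.append(j)
    z_next, f_next = list(z), [0] * len(z)
    for j in chosen:
        z_next[j] = mu[j]
        f_next[j] = 1  # record for the next iteration
    return z_next, f_next, chosen


def dm_decode(z, syndrome_is_zero, flipping_functions, stage_ends, w, delta=1.0):
    """Multi-stage DM loop: stage t allows up to w[t] flips per iteration in (k_{t-1}, k_t]."""
    z, f = list(z), [0] * len(z)
    for k in range(1, stage_ends[-1] + 1):
        if syndrome_is_zero(z):
            return z, True
        w_t = w[bisect_left(stage_ends, k)]
        E, mu = flipping_functions(z)
        z, f, _ = dm_flip_step(z, E, mu, f, w_t, delta)
    return z, syndrome_is_zero(z)


# Single step: symbol 1 is blocked by f, so symbols 0, 4 (alpha, beta) and 3 are flipped.
z_step, f_step, flipped = dm_flip_step(
    z=[0, 0, 0, 0, 0], E=[5.0, 4.5, 1.0, 4.2, 4.4], mu=[1, 2, 3, 4, 5],
    f=[0, 1, 0, 0, 0], w_t=3)

# Toy loop: the "codeword" is all-zero; nonzero symbols get E = 1 and candidate value 0.
decoded, ok = dm_decode(
    [3, 0, 2, 0, 1, 0],
    syndrome_is_zero=lambda z: all(v == 0 for v in z),
    flipping_functions=lambda z: ([1.0 if v else -1.0 for v in z], [0] * len(z)),
    stage_ends=(2, 4, 10), w=(2, 1, 1))
```

Scanning the eligible symbols in index order, rather than sorting them, reflects the text's point that the DM scheme avoids a global sort; only the two largest flipping functions are located.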
We consider two flipping rules. The first rule prevents the variable nodes flipped in the previous iteration from being flipped in the current iteration, whereas the second rule does not consider whether a variable node was flipped in the previous iteration. Specifically, we consider a (204, 102) NB-LDPC code $C_1$ over GF(16). Fig. 1 shows the values of $E_\alpha$ versus the number of iterations under the two flipping rules for the D-SFDP algorithm and our proposed DM-D-SFDP algorithm for $C_1$ at $E_b/N_0 = 4.5$ dB, where the maximum number of iterations is set to 100 for all cases and decoding terminates early once all parity-check constraints are satisfied. In Fig. 1, the SFDP algorithm without re-flipping avoidance reaches the maximum number of iterations, implying that a decoding failure occurs for the chosen frame. In contrast, the other three curves terminate before 100 iterations (e.g., at 11, 15, and 30 iterations), which means that the chosen frame is successfully decoded and the decoding process terminates early. When the iteration number exceeds 25, the red line shows that two flipping values reappear periodically: the decoding process is trapped in an infinite loop that eventually leads to a decoding failure.

FIGURE 1. The values of $E_\alpha$ versus the number of iterations for $C_1$ decoded by the D-SFDP algorithm and the DM-D-SFDP algorithm (* Our proposed DM-D-SFDP algorithm).
When a loop occurs, the decoder produces oscillating flipping function values or stays in one of several local states. A simple way to escape from these local states is to perform random moves. Significant efforts have been made to avoid oscillations during decoding, such as introducing random penalty items (the N-D-SFDP algorithm) and the self-adjustment scheme (the D-SA-SFDP algorithm). Nevertheless, cycle oscillation remains unavoidable because larger loops are almost impossible to break. For example, the randomly penalized SFDP algorithms cannot completely avoid oscillation, and some symbols may still be trapped in cycle oscillation: such symbols are not flipped in two successive iterations but are flipped again over a longer interval, with several VNs changing circularly over subsequent iterations.
The multi-symbol flipping operation in our proposed DM scheme helps perform random moves in the decoding process.

A. PERFORMANCE COMPARISONS
We compare the P-SFDP algorithm with our proposed DM scheme applied (i.e., the DM-P-SFDP algorithm) against the existing multi-symbol flipping algorithms in [26]. The DM scheme can also be applied to the other x-SFDP (x: D, D-SA, N-D) algorithms, and detailed simulation results are provided in Section V.
Fig. 2 compares the BER performance of the original P-SFDP algorithm, our proposed DM-P-SFDP algorithm, and the existing M/F/A-P-SFDP algorithms. The DM scheme greatly improves the BER performance over the existing multi-symbol flipping algorithms with a maximum of 100 iterations. The average numbers of iterations are presented in Fig. 3. Although the A-P-SFDP algorithm converges quickly, its error performance is not competitive with that of our proposed DM-P-SFDP algorithm, which achieves a gain of about 0.3 dB over the A-P-SFDP algorithm at a BER of $10^{-5}$. Relative to the original single-symbol flipping algorithm, our proposed DM-P-SFDP algorithm achieves a more noticeable performance gain than the F-P-SFDP and A-P-SFDP algorithms.
A detailed comparison between the existing multi-symbol flipping algorithms and our proposed DM-P-SFDP algorithm is summarized in Table 1. The corresponding parameters of those algorithms and the coefficients for $C_1$ are provided in Table 2. In the next two subsections, we explain why our proposed DM-P-SFDP algorithm is superior to the existing multi-symbol flipping algorithms.

Table 1 shows that the M-P-SFDP, A-P-SFDP and F-P-SFDP algorithms in [26] adopt the fixed flipping rule $E^{(k)}_j > 0$, which is a loose limit for flipping symbols and too relaxed to guarantee flipping reliability, whereas our proposed DM-P-SFDP algorithm applies a dynamic flipping threshold to avoid excessive flipping. Although the fixed rule accelerates convergence, these algorithms suffer severe BER performance degradation because of excessive flipping in the first few iterations. For the A-P-SFDP algorithm, the relaxed flipping threshold combined with the absence of any limit on the number of flipped symbols further lowers the reliability of flipping during iterative decoding, so the algorithm suffers BER performance degradation caused by over-flipping. Fig. 2 shows that the A-P-SFDP and F-P-SFDP multi-symbol flipping algorithms yield almost no BER improvement over the original single-symbol flipping algorithm, and the M-P-SFDP algorithm even suffers BER deterioration due to excessive flipping. By contrast, the DM-P-SFDP algorithm achieves a more noticeable performance gain than the F-P-SFDP and A-P-SFDP algorithms.

C. LIMITATION ON THE NUMBER OF FLIPPED SYMBOLS
We provide statistical results showing that the proposed DM scheme is advantageous in terms of BER performance. For $C_1$, Fig. 4 shows the proportion of iterations in which a given number of symbols is actually flipped during decoding, comparing the existing multi-symbol flipping algorithms (F/A-P-SFDP) with our proposed DM-P-SFDP algorithm. A logarithmic scale is adopted on the y-axis for a fair comparison, and $w$ in the legend of Fig. 4 denotes the number of symbols actually flipped in one iteration. The M-P-SFDP algorithm is not considered here because of its significant performance degradation compared with the original single-symbol flipping algorithm. Fig. 4(a) clearly shows that our proposed DM-P-SFDP algorithm imposes an explicit numerical limit on the maximum number of flipped symbols in each iteration to avoid excessive flipping. The DM-P-SFDP algorithm converges faster than the F-P-SFDP algorithm in Fig. 4(b) because it can effectively flip multiple symbols, with the actual number of flipped symbols evenly distributed, as depicted in Fig. 4(a). In Fig. 4(b), which also uses a logarithmic y-axis, only one symbol is flipped in most cases, accounting for more than 70% of all iterations. At the same time, the number of dynamically flipped symbols in our proposed DM scheme is not allowed to exceed the predefined value (e.g., 15), so over-flipping is suppressed. The A-P-SFDP algorithm in [26] flips all symbols satisfying $E^{(k)}_j > 0$ without limitation; as seen in Fig. 4(c), the number of symbols flipped in a single iteration can even exceed 50, which leads to serious over-flipping. The shaded area in Fig. 4(c) highlights the cases where more than 20 symbols are actually flipped, which can lead to severe over-flipping.
As the authors of [26] note, the A-P-SFDP algorithm provides effective trade-offs between convergence speed and computational complexity but shows no improvement in BER performance. These statistical results demonstrate that the dynamic flipping threshold, combined with the restriction on the number of flipped symbols in our proposed DM scheme, reduces the probability of over-flipping.
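The difference between the fixed rule $E^{(k)}_j > 0$ and the DM rule can be illustrated by counting how many symbols each would flip for the same vector of flipping functions. In this sketch the vector is made up for illustration, the indicator vector is ignored, and the DM count assumes that every symbol above the threshold $E_\beta$ minus the adjustment factor `delta` (our name) may be flipped, subject to the cap $w_t$.

```python
import heapq


def count_fixed_rule(E):
    """Fixed threshold of [26] as described in the text: flip every symbol with E_j > 0."""
    return sum(1 for e in E if e > 0)


def count_dm_rule(E, w_t, delta=1.0):
    """Dynamic threshold E_beta - delta with an explicit cap of w_t flips per iteration."""
    if len(E) < 2:
        return min(len(E), w_t)
    _, e_beta = heapq.nlargest(2, E)
    above = sum(1 for e in E if e > e_beta - delta)  # includes E_alpha, E_beta when delta > 0
    return min(above, w_t)


E = [0.5, 2.0, 1.2, 3.1, 0.1, 2.8, -0.4, 1.9]  # illustrative values only
n_fixed = count_fixed_rule(E)            # every positive entry
n_dm_stage1 = count_dm_rule(E, w_t=15)   # limited by the dynamic threshold
n_dm_stage2 = count_dm_rule(E, w_t=2)    # limited by the cap
```

In this toy vector, the fixed rule flips 7 of 8 symbols, while the DM rule flips 4 in an early stage and 2 once the cap tightens, mirroring the two mechanisms (threshold and cap) that suppress over-flipping.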

D. COMPUTATIONAL COMPLEXITY COMPARISON
The detailed computational complexities of the aforementioned multi-symbol flipping algorithms are compared in Table 3, and the parameters of the NB-LDPC codes in the SFDP algorithms are defined in Table 4. Our proposed DM-P-SFDP algorithm requires the same numbers of Galois field additions/multiplications (GA/GM) and integer/real additions (IA/RA) as the F-P-SFDP and A-P-SFDP algorithms, and $2n - 4$ more integer/real comparisons (IC/RC) than the A-P-SFDP algorithm. Although the comparison complexity of our proposed DM-P-SFDP algorithm is slightly increased, it effectively avoids over-flipping and significantly improves the BER performance. The F-P-SFDP algorithm in [26] defines

The A-P-SFDP algorithm suffers from over-flipping because the number of flipped symbols is not limited. Although the F-P-SFDP algorithm limits the number of flipped symbols at the initial stage of decoding, the high complexity of full sorting still increases the decoding delay in hardware implementation. Our proposed DM scheme avoids this complexity increase and achieves faster convergence as well as a significant performance gain with the help of a dynamic flipping threshold and an upper limit on the number of flipped symbols.

V. SIMULATION RESULTS AND COMPLEXITY ANALYSIS
The proposed DM scheme can be applied to existing single-symbol flipping algorithms, and we compare the decoding performance and convergence rate of different flipping algorithms with and without the DM scheme. Three regular NB-LDPC codes, denoted by $C_1$, $C_2$ and $C_3$, are selected for testing. The code parameters and the DM coefficients used in this section are provided in Table 5, where $T = 3$ stages are set for a total of 100 iterations. The threshold adjustment factor is chosen to improve flipping parallelism while avoiding excessive flipping, and it is optimized by experimental trials.

B. CONVERGENCE SPEED
The DM-N-D-SFDP algorithm, which almost always has the best performance, also achieves a significantly faster convergence rate for all three codes. We define the average number of iterations as $\bar{k}$ and give the statistical results of $\bar{k}$ for the DM-x-SFDP (x: D, D-SA, N-D) algorithms in Fig. 6: $\bar{k}$ drops to less than half that of the original x-SFDP algorithms, significantly reducing the decoding delay.
The comprehensive simulation results reveal that applying the DM scheme yields better FER performance and faster convergence. By dynamically adjusting the number of flipped symbols, the proposed DM-x-SFDP algorithms improve decoding reliability while reducing the processing latency.

Table 6 presents the computational complexity per iteration of various SFDP algorithms. The numbers of Galois field additions/multiplications $\phi_g$, integer/real additions $\phi_a$, and integer/real comparisons $\phi_c$ per iteration of the x-SFDP algorithms are constants, since only one symbol is flipped per iteration. Thus, the average computational complexity of each operation type in the x-SFDP algorithms can be calculated directly by multiplying the per-iteration count by the average iteration number (e.g., $\bar{k}\phi_g$ for GA/GM). The corresponding operation numbers $\hat{\phi}_g$, $\hat{\phi}_a$ and $\hat{\phi}_c$ in the DM-x-SFDP algorithms are functions of the actual number $w_k$ of symbols flipped in the $k$-th iteration, where $w_k \le w_t$ in the $t$-th stage, $k \in (k_{t-1}, k_t]$, $t = 1, \ldots, T$. Here, the numbers of GA/GM and IA/RA operations with the DM scheme depend on $w_k$. The IC/RC operation counts per iteration of the x-SFDP and DM-x-SFDP algorithms are $\phi_c = 2(N - 1) + \delta_c$ and $\hat{\phi}_c = 3N - 4 + w_k \delta_c$, respectively, where $\delta_c = (\gamma + d - 1)(\gamma\rho + 1 - \gamma)$. We collect statistics over a total of $F$ correctly decoded frames and denote the actual iteration number of the DM-x-SFDP algorithms for the $i$-th frame by $k_i$, $i = 1, \ldots, F$; the average computational complexity of the DM-x-SFDP algorithms is then obtained statistically over these frames. Fig. 7 and Fig. 8 compare the average computational complexity of the D-SFDP algorithm and our proposed DM-D-SFDP algorithm for $C_2$ and $C_3$, respectively.
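The IC/RC counts above can be evaluated directly. The sketch below implements $\phi_c$ and $\hat{\phi}_c$ as given and assumes that the frame-averaged complexity sums the per-iteration counts over the $k_i$ iterations of each correctly decoded frame and divides by $F$; the per-frame flip counts and iteration counts in the usage lines are made up. The code parameters are those of the (96, 48) GF(64) code with γ = 2 and ρ = 4 mentioned in Section I.

```python
def delta_c(gamma, rho, d):
    """delta_c = (gamma + d - 1)(gamma*rho + 1 - gamma)."""
    return (gamma + d - 1) * (gamma * rho + 1 - gamma)


def phi_c_sfdp(N, gamma, rho, d):
    """IC/RC operations per iteration of an x-SFDP algorithm (one flip per iteration)."""
    return 2 * (N - 1) + delta_c(gamma, rho, d)


def phi_c_dm(N, gamma, rho, d, w_k):
    """IC/RC operations per iteration of a DM-x-SFDP algorithm with w_k actual flips."""
    return 3 * N - 4 + w_k * delta_c(gamma, rho, d)


def avg_phi_c_sfdp(N, gamma, rho, d, iters):
    """Average over frames: mean iteration count times the constant per-iteration cost."""
    return sum(iters) / len(iters) * phi_c_sfdp(N, gamma, rho, d)


def avg_phi_c_dm(N, gamma, rho, d, flips_per_frame):
    """flips_per_frame[i] = [w_1, ..., w_{k_i}] for the i-th correctly decoded frame."""
    total = sum(phi_c_dm(N, gamma, rho, d, w_k)
                for frame in flips_per_frame for w_k in frame)
    return total / len(flips_per_frame)


N, gamma, rho, d = 96, 2, 4, 6  # (96, 48) code over GF(64)
per_iter_sfdp = phi_c_sfdp(N, gamma, rho, d)
per_iter_dm = phi_c_dm(N, gamma, rho, d, w_k=1)
avg_dm = avg_phi_c_dm(N, gamma, rho, d, [[3, 1], [1]])  # made-up flip counts
avg_sfdp = avg_phi_c_sfdp(N, gamma, rho, d, [10, 6])    # made-up iteration counts
```

With these parameters a single DM iteration costs more comparisons than a single x-SFDP iteration (333 versus 239), so any average saving has to come from the reduced number of iterations, which is the mechanism the text describes.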
Since the number of flipped symbols $w_k$ in the $k$-th iteration is usually much smaller than $w_t$ during the $t$-th stage, $k \in (k_{t-1}, k_t]$, the actual computational complexity per iteration decreases rapidly as the iteration number grows. At the same time, the average number of iterations $\bar{k}$ is almost halved with the help of the highly parallelized DM scheme. As a result, the average computational complexity of the DM-x-SFDP algorithms is close to that of the x-SFDP algorithms, and the average IC/RC complexity of the DM-D-SFDP algorithm is even lower than that of the D-SFDP algorithm, as depicted in Fig. 7 and Fig. 8.

As a typical case, we consider the NB-LDPC code over GF(64) in the BeiDou standard and use a histogram in Fig. 9 to visualize the distribution of the actual number of iterations. We record the actual number of iterations of the D-SFDP algorithm with and without the DM scheme over $F = 10^5$ correctly decoded frames. The computational complexities of the D-SFDP and DM-D-SFDP algorithms are also listed in the table in Fig. 9. The DM scheme significantly accelerates convergence and reduces $\bar{k}$ to 47% of that of the D-SFDP algorithm, which explains why the DM scheme can significantly lower the decoding delay by reducing the average iteration number to less than half. The average computational complexity, statistically analyzed over $10^5$ successfully decoded frames with (10), shows that the GA/GM and IA/RA complexities of the DM-D-SFDP algorithm increase by at most 15%, while its IC/RC complexity decreases by 29%, compared with those of the D-SFDP algorithm for $C_3$ at $E_b/N_0 = 6$ dB.

C. COMPUTATIONAL COMPLEXITY ANALYSIS
The average IC/RC complexity of the DM-D-SFDP algorithm is strongly affected by the parameter $N$; for $C_2$, it also decreases by 10% to 40% with DM-D-SFDP decoding. Although the per-iteration GA/GM and IA/RA complexities $\hat{\phi}_g$ and $\hat{\phi}_a$ of the DM-x-SFDP algorithms grow with $w_k$, which is relatively large in the initial decoding period, the number of iterations drops to less than half because the DM scheme speeds up convergence. Consequently, the actual decoding execution time is significantly reduced, since the number of iterations is more than halved while the increase in per-iteration computational complexity is minimal.

VI. CONCLUSION
This paper proposes a DM scheme for the SFDP decoding algorithms of regular NB-LDPC codes, which dynamically flips multiple symbols in parallel to accelerate the decoding process. By dynamically modifying the number of flipped symbols in each iteration, the DM scheme can be highly parallelized in hardware implementations, which substantially lowers the decoding latency and improves the throughput over existing SFDP algorithms. The DM scheme outperforms the single-symbol flipping algorithms because it significantly enhances the error correction capability and achieves faster convergence with little increase in average computational complexity. Numerical results reveal that the proposed DM-x-SFDP algorithms exhibit better BER performance than existing multi-symbol flipping algorithms while reducing the use of global search or sort operations. Applying the DM scheme to SFDP algorithms achieves better trade-offs between computational complexity and error correction performance, providing a promising new approach for practical applications of NB-LDPC codes. Extensive simulations demonstrate that better BER and FER performance can be achieved with a slight increase in average computational complexity. Moreover, the DM scheme converges very quickly and can be highly parallelized for practical hardware implementations.