Low-density parity-check (LDPC) codes have attracted tremendous attention for their near-capacity performance and their potential for parallel decoder implementation. Quasi-cyclic (QC) LDPC codes are LDPC codes with a regular structure. An advantage of QC-LDPC codes is that the efficient encoding methods in [15] can be used. The *M* × *N* parity-check matrix (PCM) **H** of a QC-LDPC code *C* can be written in block-matrix form. There are *M*_{b} · *N*_{b} circulants in **H**, and each circulant is a *Z* × *Z* square matrix. Therefore, *M* = *M*_{b} *Z* and *N* = *N*_{b} *Z*. A Block-LDPC code [1] is a QC-LDPC code for which the weight of each circulant in its PCM is either 1 or 0. Block-LDPC codes can achieve good error-correcting performance and are suitable for VLSI implementation. The LDPC codes specified in the WiMAX standard [16] are Block-LDPC codes.
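As an illustration of the block-matrix form, the following minimal sketch expands an *M*_{b} × *N*_{b} array of circulant shifts into the binary PCM **H**. The function name `expand_base_matrix`, the toy shift values, and the convention that −1 marks an all-zero (weight-0) circulant are assumptions made for this example, not definitions taken from the text.

```python
def expand_base_matrix(shifts, Z):
    """Expand an Mb x Nb array of shifts into an (Mb*Z) x (Nb*Z) binary PCM.

    Assumed convention: shifts[i][j] = -1 is the Z x Z all-zero block
    (weight-0 circulant); shifts[i][j] = s >= 0 is the Z x Z identity
    matrix cyclically shifted by s columns (weight-1 circulant).
    """
    Mb, Nb = len(shifts), len(shifts[0])
    H = [[0] * (Nb * Z) for _ in range(Mb * Z)]
    for i in range(Mb):
        for j in range(Nb):
            s = shifts[i][j]
            if s < 0:
                continue  # zero block: nothing to set
            for r in range(Z):
                H[i * Z + r][j * Z + (r + s) % Z] = 1
    return H


# Toy example (hypothetical values): Mb = 2, Nb = 3, Z = 3
shifts = [[0, 1, -1],
          [2, -1, 0]]
H = expand_base_matrix(shifts, 3)
for row in H:
    print(row)
```

The resulting matrix has *M* = *M*_{b} *Z* = 6 rows and *N* = *N*_{b} *Z* = 9 columns, and every *Z* × *Z* block has weight 1 or 0, as required of a Block-LDPC code.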

Message-passing decoding (MPD) algorithms, including the two-phase message-passing (TPMP) algorithm [2] and layered MPD (LMPD) [3], [4], can be used to decode LDPC codes. LMPD converges faster than TPMP. In LMPD, **H** of *C* is partitioned into several layers, and these layers are processed sequentially. Owing to the quasi-cyclic structure of Block-LDPC codes, the partitioned layers are regular and can be decoded by using an identical core matrix **H**^{c}, where the dimension of **H**^{c} is much smaller than that of **H** [5]. This algorithm is called layered MPD using identical core matrix (ICM) and is denoted as LMPD-ICM; it also converges faster than TPMP. In addition, an LMPD-ICM based decoder has the advantage of low interconnection complexity [5].

Conventionally, the MPD updates all of the messages in every iteration. This update is either simultaneous (such as TPMP) or sequential (such as LMPD and LMPD-ICM). In fact, we can select the message-passing schedule according to the observed rate of change of messages [6]. The scheduling of MPD affects the performance of the LDPC decoder significantly [6]. On the other hand, early termination is an important technique for iterative decoders since it can reduce the computational complexity, decoding latency, and power consumption significantly in hardware implementation by stopping unnecessary decoding iterations [7], [8], [9], [10], [11]. In this paper, we propose to reduce the complexity of LMPD-ICM by using dynamic scheduling with the consideration of early termination. We dynamically skip or redo the decoding operations for some layers based on appropriate criteria. In addition, we propose a strategy of early termination which can be combined with the proposed dynamic scheduling and is suitable for hardware implementation.

Various algorithms, such as TPMP, LMPD, and LMPD-ICM, have been proposed for decoding LDPC codes. The decoding operations in these algorithms consist of variable-node operations and check-node operations. Both the sum-product algorithm (SPA) and the offset min-sum algorithm (OMSA) [12] can be used in the check-node and variable-node operations. Since OMSA is suitable for hardware implementation, we consider only OMSA in this paper. Since we will propose dynamic scheduling and early termination for LMPD-ICM, which is a variant of LMPD, we review both LMPD and LMPD-ICM in the following.
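For readers unfamiliar with OMSA, the check-node update in its commonly used form can be sketched as follows: each outgoing magnitude is the minimum of the other incoming magnitudes reduced by an offset and clipped at zero, and each outgoing sign is the product of the other incoming signs. The function name `omsa_check_node` and the offset value `beta` are assumptions for illustration; the text does not give the offset used in the simulations.

```python
def omsa_check_node(inputs, beta=0.5):
    """Offset min-sum check-node update (standard textbook form).

    inputs: variable-to-check messages (LLRs) on the edges of one check node.
    beta:   offset; 0.5 is a placeholder, not a value taken from the paper.
    Returns the check-to-variable message for every edge.
    """
    outputs = []
    for i in range(len(inputs)):
        others = inputs[:i] + inputs[i + 1:]   # exclude the target edge
        sign = 1
        for v in others:
            if v < 0:
                sign = -sign                    # product of the other signs
        magnitude = max(min(abs(v) for v in others) - beta, 0.0)
        outputs.append(sign * magnitude)
    return outputs


print(omsa_check_node([2.0, -1.0, 3.0], beta=0.5))
```

A hardware implementation would typically track only the two smallest magnitudes per check node rather than recomputing the minimum for every edge; the loop above favors clarity over efficiency.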

### A. Layered Message Passing Decoding: LMPD

To perform LMPD on the code *C*, we have to partition the PCM **H** of *C* into *P* layers, where each layer consists of *M*/*P* rows of **H**. These *P* layers can be viewed as the PCMs of *P* block codes. The intersection of these *P* block codes is exactly the code *C*. LMPD is implemented by sequentially decoding these *P* block codes, and passing the extrinsic values from one layer to the other layers. For Block-LDPC codes, we can select one block row as one layer. Hence, *P* = *M*_{b}. LMPD based on this kind of definition of layers is called conventional LMPD.

### B. LMPD Using Identical Core Matrix: LMPD-ICM

It was shown in [5] that an *M* × *N* Block-LDPC code can be partitioned into layers from a different point of view. The PCM **H** is partitioned into *P* = *Z* layers, denoted as **H**_{0}, **H**_{1}, …, **H**_{P−1}, where **H**_{p} is an *M*_{b} × *N* sub-matrix containing the *p*-th, (*Z*+*p*)-th, …, [(*M*_{b}−1)*Z*+*p*]-th rows of **H** for 0 ≤ *p* < *P*. For 0 ≤ *p* < *P*, let **H**′_{p} be the matrix which consists of all the non-zero columns of **H**_{p}. **H**′_{p} can be viewed as the PCM of a block code *C*_{p} whose code bits are taken from the code bits of *C*. The code length of *C*_{p} is much smaller than that of *C*. From the quasi-cyclic property of **H**, it was demonstrated in [5] that all matrices **H**′_{p}, 0 ≤ *p* < *P*, are column-permuted versions of a core matrix denoted as **H**^{c}. With the core matrix, the decoding of *C* can be accomplished by sequentially decoding *C*_{0}, *C*_{1}, …, *C*_{P−1}, and passing the extrinsic values from one layer to the other layers. The TPMP algorithm is employed to decode each *C*_{p}, 0 ≤ *p* < *P*. This decoding is called layered MPD using identical core matrix and is denoted as LMPD-ICM.
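The layer partition can be checked numerically on a small example. The sketch below builds a toy Block-LDPC PCM, extracts **H**′_{p} for every layer, and verifies that all **H**′_{p} contain the same multiset of columns, i.e., that they are column-permuted versions of one matrix. The helper names `expand` and `layer_core`, the toy shifts, and the use of −1 for an all-zero circulant are assumptions for illustration only.

```python
def expand(shifts, Z):
    """Build a toy PCM from circulant shifts (-1 = all-zero block, assumed)."""
    Mb, Nb = len(shifts), len(shifts[0])
    H = [[0] * (Nb * Z) for _ in range(Mb * Z)]
    for i in range(Mb):
        for j in range(Nb):
            if shifts[i][j] >= 0:
                for r in range(Z):
                    H[i * Z + r][j * Z + (r + shifts[i][j]) % Z] = 1
    return H


def layer_core(H, p, Mb, Z):
    """Return (H'_p, cols) for layer p.

    H_p collects rows p, Z+p, ..., (Mb-1)Z+p of H; H'_p keeps only the
    non-zero columns of H_p, whose indices in H are returned as cols.
    """
    Hp = [H[i * Z + p] for i in range(Mb)]
    cols = [c for c in range(len(H[0])) if any(row[c] for row in Hp)]
    Hp_prime = [[row[c] for c in cols] for row in Hp]
    return Hp_prime, cols


shifts = [[0, 1, -1],
          [2, -1, 0]]            # hypothetical toy values
Z, Mb = 3, len(shifts)
H = expand(shifts, Z)

signatures = []
for p in range(Z):               # P = Z layers
    Hp_prime, cols = layer_core(H, p, Mb, Z)
    signatures.append(sorted(zip(*Hp_prime)))   # multiset of columns
    print("layer", p, "uses code bits", cols)

# Equal column multisets <=> the H'_p differ only by a column permutation.
print(all(sig == signatures[0] for sig in signatures))
```

Each layer involves only a few code bits of *C*, which is why the code length of *C*_{p} is much smaller than *N*.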

SECTION III

## DYNAMICALLY SCHEDULED LMPD-ICM

The dynamic scheduling used in [13] continually monitors the status of the variable nodes. Variable nodes that are reliable enough are put to sleep, and they are woken up if they become unreliable again. In this approach, a large number of comparisons are needed to decide whether each variable node is reliable enough.

In this paper, we investigate dynamic scheduling for LMPD-ICM from the point of view of check nodes instead of the variable nodes used in [13]. Let **ĉ**_{p} be a row vector which contains the decoded bits associated with code *C*_{p} for *p* = 0, 1, …, *P*−1. The number of times layer *p* successively passes the condition **H**′_{p} **ĉ**_{p}^{T} = **0** can be used to decide whether layer *p* is reliable enough or not. Layer *p* is viewed as reliable enough if it successively passes this condition *N*_{at} times. First, we investigate an approach called dynamic skipping (DS), which is realized by skipping the decoding processes of layers which are reliable enough until these layers become invalid again, where layer *p* is said to be invalid if **H**′_{p} **ĉ**_{p}^{T} ≠ **0**. Like the approach used in [13], DS results in an error floor. Thus, we investigate another approach called dynamic enhancement (DE), which is implemented by redoing the decoding processes for those layers which are not reliable enough.

### A. Algorithm

Instead of skipping some layers, DE enhances the operation of some layers according to the result of **H** **ĉ**^{T} computed at the end of each round of decoding *C*_{0}, *C*_{1}, …, *C*_{P−1}, where the 1 × *N* vector **ĉ** is the decoded codeword. We mark those layers for which the associated parity-check constraints are not satisfied and then re-process these marked layers. We also use the result of **H** **ĉ**^{T} to determine whether to stop the decoding process. Let *N*_{R} denote a predetermined number of reprocessing cycles. A feature of this algorithm is that layers which are not marked are still processed at least once in every round of decoding *C*_{0}, *C*_{1}, …, *C*_{P−1}. Let Ω be the set which contains the indices of the layers that should be re-processed. The content of Ω is dynamically updated based on the result of **H** **ĉ**^{T}. Let *L*_{M} = *I*_{M} *P* denote the maximum number of layers that may be processed to decode one codeword, where *I*_{M} is the maximum number of iterations. In addition, *t* denotes the counter of reprocessing cycles remaining after one round of decoding *C*_{0}, *C*_{1}, …, *C*_{P−1}, and *l* denotes the total number of processed layers. The proposed algorithm is described as follows.

Step 1: [Initialization] Set *l* = 0.

Step 2: Set *t* = *N*_{R}.

Step 3: [Layered decoding] Decode *C*_{k}, 0 ≤ *k* ≤ *P*−1, sequentially. Increase *l* by one after decoding every layer. If *l* = *L*_{M}, go to Step 9.

Step 4: Set Ω = ϕ, where ϕ is the null set.

Step 5: [Determine invalid layers] Evaluate the parity checks of **H** on the current hard decisions **ĉ** and add to Ω the index *p* of every layer whose parity-check constraints are not all satisfied, i.e., Ω = {*p* : **H**_{p} **ĉ**^{T} ≠ **0**, 0 ≤ *p* < *P*}.

Step 6: [Early termination] If Ω = ϕ, go to Step 9.

Step 7: [Re-process] Decode *C*_{k} for all *k* ∊ Ω sequentially. Increase *l* by one after decoding every layer. If *l* = *L*_{M}, go to Step 9.

Step 8: Decrease *t* by 1. If *t* = 0, go to Step 2. Otherwise, go to Step 4.

Step 9: [Decision] Make hard decisions based on the signs of the APP values. Then terminate the decoder.

Note that the layers processed in Step 7 are not necessarily the same for each reprocessing cycle after one round of decoding *C*_{0}, *C*_{1}, …, *C*_{P−1} if *N*_{R} > 1.
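The control flow of Steps 1 to 9 can be sketched as below. This is only a scheduling skeleton under stated assumptions: `decode_layer` and `invalid_layers` are hypothetical callbacks standing in for the actual layer decoder and the parity check of **H**, `make_toy` is an artificial model in which layer *k* becomes valid after a fixed number of decodings, and *N*_{R} ≥ 1 is assumed.

```python
def decode_with_de(P, N_R, L_M, decode_layer, invalid_layers):
    """Schedule layers according to dynamic enhancement (DE).

    decode_layer(k):  hypothetical callback that decodes layer C_k.
    invalid_layers(): hypothetical callback returning the set of layer
                      indices whose parity checks currently fail.
    Returns l, the total number of processed layers.
    """
    l = 0                                      # Step 1
    while True:
        t = N_R                                # Step 2
        for k in range(P):                     # Step 3
            decode_layer(k)
            l += 1
            if l == L_M:
                return l                       # Step 9
        while True:
            omega = sorted(invalid_layers())   # Steps 4 and 5
            if not omega:                      # Step 6: early termination
                return l
            for k in omega:                    # Step 7
                decode_layer(k)
                l += 1
                if l == L_M:
                    return l
            t -= 1                             # Step 8
            if t == 0:
                break                          # back to Step 2


def make_toy(needs):
    """Toy model (assumption): layer k is valid after needs[k] decodings."""
    visits = [0] * len(needs)

    def decode_layer(k):
        visits[k] += 1

    def invalid_layers():
        return {k for k in range(len(needs)) if visits[k] < needs[k]}

    return decode_layer, invalid_layers


dec, inv = make_toy([1, 3, 1, 2])
print(decode_with_de(P=4, N_R=1, L_M=100, decode_layer=dec, invalid_layers=inv))
```

In this toy model the decoder stops after 10 processed layers, whereas three plain rounds over all four layers would process 12; real savings depend on the channel and on how the layers actually interact.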

### B. Simulation Results

Throughout this paper, we use the (2304, 1152) Block-LDPC code specified in WiMAX to demonstrate the proposed techniques. The encoding methods in [15], [16] can be used to encode this LDPC code. Since the computational complexity of each round of decoding *C*_{0}, *C*_{1}, …, *C*_{P−1} is not the same for LMPD-ICM using dynamic scheduling, we use the average number of processed layers before the decoding procedure is terminated, denoted as ANPL, as a measure of complexity. Fig. 1, Fig. 2, and Table I respectively show the BER results, ANPL results, and reduction of ANPL of *C* using LMPD-ICM and dynamic scheduling. Both DS and DE are considered. We see that using DS can reduce the complexity at the cost of BER degradation at high SNR. In contrast, using DE can reduce the complexity without BER degradation at high SNR. Note that, in the SNR region below 1.8 dB, using DS with *N*_{at} = 4 provides better ANPL performance than DE at a similar BER.

SECTION IV

## DYNAMICALLY SCHEDULED LMPD-ICM USING HARDWARE-EFFICIENT EARLY TERMINATION

In Section III, we stop iterative decoding precisely by checking whether the result of **H** **ĉ**^{T} is zero or not. This stopping criterion is denoted as PC-**H**. It is executed at Step 5 and should be completed within a short time interval so that the throughput decreases only slightly. Hence, it requires a large amount of hardware resources to perform this operation. Stopping criteria using *a posteriori* probability (APP) values of variable nodes and check equations of check nodes were investigated in [7], [8] and [9], [10], respectively. These techniques are interesting but involve massive mathematical computations in examining the stopping conditions. We now propose an early-termination technique which is suitable for hardware implementation.

### A. Stopping Criterion for LMPD-ICM

In LMPD-ICM, we can use the core matrix **H**^{c} to decode *C*_{0}, *C*_{1}, …, *C*_{P−1} sequentially. In the proposed stopping criterion, at the end of decoding *C*_{p} we check whether the code bits of *C*_{p} satisfy the parity-check constraints by using **H**′_{p}, which is a column-permuted version of **H**^{c}. In other words, we perform the parity checks of *C* at *P* different time instants. As compared to PC-**H**, the proposed early termination can reduce the hardware resources by a factor of *P*, although the computational complexity is the same. For the code used in this paper, *P* = 96. When all the parity constraints defined by **H**^{c} for a layer are satisfied, the layer is viewed as a valid layer. The proposed criterion is that the decoder is terminated if there are λ consecutive distinct valid layers. In order to check all the constraints in the PCM, the minimum value of λ is *P*. Since the checks are performed at distinct time instants, previously valid layers might become invalid during the decoding of later layers. Hence, λ should not be smaller than *P*. Increasing the value of λ decreases the probability of undetected errors at the cost of a higher ANPL. In the implementation, we need a counter, initially set to zero, to accumulate the number of consecutive valid layers. If an invalid layer occurs before the counter reaches λ, the counter is reset to zero and starts accumulating valid layers again. This early termination can be realized with little hardware since the dimension of **H**^{c} is much smaller than that of **H**.
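The counter logic of the proposed criterion can be sketched in a few lines. The function name `layers_until_stop` and the representation of the decoder as a sequence of per-layer validity flags are assumptions for illustration; in a real decoder each flag would come from checking the code bits of *C*_{p} against **H**′_{p} right after layer *p* is decoded.

```python
def layers_until_stop(validity, lam):
    """Return how many layers are processed before early termination.

    validity: iterable of booleans, one per processed layer in decoding
              order (True = all parity checks of that layer are satisfied).
    lam:      required number of consecutive valid layers (lambda).
    Returns the 1-based index of the layer at which the counter reaches
    lam, or None if the criterion is never met.
    """
    c = 0                              # counter of consecutive valid layers
    for n, ok in enumerate(validity, start=1):
        c = c + 1 if ok else 0         # an invalid layer resets the counter
        if c == lam:
            return n
    return None


# Toy trace with lambda = 3: the invalid third layer resets the counter.
print(layers_until_stop([True, True, False, True, True, True], lam=3))
```

For the code considered here, *P* = 96, so the smallest admissible setting λ = *P* corresponds to `lam=96`.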

Fig. 3 shows the simulation results of *C* using LMPD-ICM and the proposed early termination. Also included in this figure are the results of LMPD-ICM, conventional LMPD, and TPMP using the hard-decision aided (HDA) criterion [11], which was used in the chip implementation of the WiMAX decoder in [14]. The HDA criterion terminates the decoding process if there is no difference between the decoded message words produced at two consecutive iterations. From Fig. 3, we see that using the HDA criterion as the stopping criterion for LMPD-ICM, conventional LMPD, or TPMP results in degradation of error performance at high SNR. Using the proposed criterion with λ = 1.5 × *P* = 144, the BER is similar to that obtained using PC-**H**. Note that the BER of *C* using LMPD-ICM with a maximum iteration number of 100, i.e., *L*_{M} = 9600, is the same as “curve (C7)” in Fig. 3.

### B. Stopping Criterion for LMPD-ICM With DE
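For comparison, the HDA rule described above reduces to comparing the hard decisions of consecutive iterations. The sketch below assumes the decoder exposes its hard-decision word after every iteration as a list of bits; the function name `hda_stop_iteration` is hypothetical.

```python
def hda_stop_iteration(hard_decisions):
    """Return the iteration at which the HDA criterion fires.

    hard_decisions: iterable of hard-decision words, one per iteration.
    The criterion fires at the first iteration whose word equals the word
    of the previous iteration; None is returned if that never happens.
    """
    previous = None
    for iteration, bits in enumerate(hard_decisions, start=1):
        if previous is not None and bits == previous:
            return iteration
        previous = bits
    return None


# Toy trace: the word stops changing between iterations 3 and 4.
trace = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(hda_stop_iteration(trace))
```

Note that HDA only detects that the decisions have stopped changing; it does not verify any parity check, so it can also fire on a word that is not a codeword.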

In order to check whether the decoded codeword is correct or not, all the check constraints in the PCM **H** should be examined at least once. Hence, DS is not suitable for the proposed stopping criterion since it skips some layers in one round of decoding *C*. On the other hand, DE guarantees that every layer is examined at least once in one round of decoding *C*. Thus, DE is suitable to be combined with the proposed stopping criterion. In the following, we investigate the performance of LMPD-ICM using the proposed stopping criterion and DE (*N*_{R} = 1). Let *c* denote the number of consecutive valid layers. The proposed algorithm is described as follows.

Step 1: [Initialization] Set *l* = 0 and *c* = 0.

Step 2: Set Ω = ϕ and *k* = 0.

Step 3: Decode *C*_{k} to obtain its decoded bits **ĉ**_{k}. If **H**′_{k} **ĉ**_{k}^{T} ≠ **0** (i.e., layer *k* is invalid), add *k* to Ω and reset *c* to 0. Otherwise, increase *c* by 1.

Step 4: Increase *l* by 1. If *l* = *L*_{M}, go to Step 8.

Step 5: [Early termination] If *c* = λ, go to Step 8.

Step 6: Increase *k* by 1. If *k* = *P*, go to Step 7. Otherwise, go to Step 3.

Step 7: [Re-process] Decode *C*_{k} for all *k* ∊ Ω. Increase *l* by one after decoding every layer. If *l* = *L*_{M}, go to Step 8. After all the layers indexed by Ω are processed and *l* remains smaller than *L*_{M}, go back to Step 2.

Step 8: [Decision] Make hard decisions based on the signs of the APP values. Then terminate the decoder.

The variable *c* counts the consecutive valid layers only in the normal operation stage, i.e., Step 3. When the decoder is in the re-process stage, i.e., Step 7, *c* remains unchanged since there is no checking procedure used here. Hence, early termination never occurs in Step 7.
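The interaction between DE and the counter *c* can be sketched as follows. As before, this is a scheduling skeleton under assumptions: `decode_layer` is a hypothetical callback that decodes *C*_{k} and reports whether layer *k* is valid, and `make_toy` is an artificial model in which layer *k* becomes valid after a fixed number of decodings.

```python
def decode_de_early_stop(P, lam, L_M, decode_layer):
    """DE (N_R = 1) combined with the consecutive-valid-layer criterion.

    decode_layer(k) -> bool: hypothetical callback that decodes C_k and
    returns True if layer k satisfies all its parity checks afterwards.
    Returns (l, reason) with l the number of processed layers.
    """
    l, c = 0, 0                                # Step 1
    while True:
        omega = []                             # Step 2
        for k in range(P):                     # Steps 3 to 6
            if decode_layer(k):                # Step 3
                c += 1
            else:
                omega.append(k)
                c = 0
            l += 1                             # Step 4
            if l == L_M:
                return l, "max layers"
            if c == lam:                       # Step 5
                return l, "early termination"
        for k in omega:                        # Step 7: no check, c unchanged
            decode_layer(k)
            l += 1
            if l == L_M:
                return l, "max layers"


def make_toy(needs):
    """Toy model (assumption): layer k is valid after needs[k] decodings."""
    visits = [0] * len(needs)

    def decode_layer(k):
        visits[k] += 1
        return visits[k] >= needs[k]

    return decode_layer


print(decode_de_early_stop(P=4, lam=4, L_M=100, decode_layer=make_toy([1, 3, 1, 2])))
print(decode_de_early_stop(P=4, lam=6, L_M=100, decode_layer=make_toy([1, 3, 1, 2])))
```

Raising `lam` from *P* to 1.5 × *P* delays termination in this toy run, which mirrors the trade-off between ANPL and the probability of undetected errors.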

Fig. 3 and Fig. 4 respectively show the BER and ANPL results of *C* using LMPD-ICM with or without DE. We see that, for the proposed stopping criterion, a trade-off between the ANPL performance and the BER performance can be achieved by varying the value of λ. In addition, DE can reduce the ANPL and improve the BER performance.

Both dynamic scheduling and early termination are proposed for LMPD-ICM, which can be realized by using a core matrix. The modified MPD using these two techniques simultaneously can reduce the computational complexity while achieving error performance similar to that obtained without them. The proposed early termination is suitable for hardware implementation. In addition, LMPD-ICM with the proposed stopping criterion can achieve better error performance than TPMP and LMPD using a conventional stopping criterion such as the HDA criterion. The proposed techniques can also be applied to conventional LMPD, for which a similar reduction in computational complexity is expected.