High-Throughput Multi-Frame Decoding of QC-LDPC Codes With Modified Rejection-Based Minimum Finding

The key computation in the min-sum decoding algorithm of a Low-Density Parity-Check (LDPC) code is finding the first two minima, together with the location of the first minimum, among a set of messages passed from Variable Nodes (VNs) to Check Nodes (CNs) in a Tanner graph. In this paper, we propose a modified rejection-based scheme for this task that finds the one-hot sequence of the minimum location instead of its index. We show that this modification effectively reduces the complexity of the min-sum decoding algorithm. Additionally, we reveal a pipelining potential in such a rejection-based architecture that facilitates multi-frame decoding of LDPC codes and therefore improves decoding throughput with bearable hardware overhead. Synthesis and floorplanning in an industrial 28 nm CMOS technology show improved results in terms of throughput, power, and chip area.


I. INTRODUCTION
Low-Density Parity-Check (LDPC) codes are among the selected forward-error-correction candidates for next-generation wireless communication systems such as 5G and IEEE 802.11ax. However, despite their promising performance, the decoding complexity of LDPC codes is still a barrier to high-throughput applications of these codes. To address this drawback, a great deal of effort has been expended on different aspects of LDPC codes, such as construction, encoding, and decoding.
The major decoding method for LDPC codes is the iterative Belief Propagation (BP) algorithm run over the Tanner graph representation of these codes. In its primary form, the BP algorithm is conducted as a sum-product algorithm [1], in which a series of messages are successively exchanged between the nodes of the Tanner graph. The min-sum decoding algorithm [2], [3] is a simplified alternative to the sum-product scheme for BP decoding of LDPC codes, replacing complex hyperbolic tangent functions with simpler minimum-finding computations. More specifically, the min-sum decoding algorithm targets the computation of the messages passed from CNs to VNs in a Tanner graph: a CN finds the minimum value among the messages received from its neighbor VNs instead of computing complicated hyperbolic tangent functions.
The associate editor coordinating the review of this manuscript and approving it for publication was Faissal El Bouanani.
After a more detailed introduction of min-sum decoding in Section II, it becomes evident that the essential computation of this decoding scheme is finding the first and second minima, together with the index (or location) of the first minimum, among a set of binary values. Although simpler than the original computations in the sum-product algorithm, this computation should still be performed with the simplest possible approach in order to keep the overall decoding complexity low. To this end, different approaches have been proposed specifically for min-sum decoding; [4] gives an overview of them.
In this regard, the authors of [5] proposed a hardware-friendly rejection-based technique that accomplishes this task faster than earlier techniques, with only an insubstantial increase in circuit area. Additionally, in [4] we proposed a modification to this method that uses a one-hot sequence representation for the location of the minimum instead of its binary index, claiming that this modification has implementation benefits. More specifically, the rejection-based scheme of [5] finds the index of the first minimum, while our modified approach finds the one-hot sequence of the first minimum. In this paper, we first re-investigate this modification and show how the rejection-based technique can be modified to output the one-hot sequence of the minimum. In addition, we reveal an important pipelining capacity of this method, which boosts decoding throughput while keeping the overall hardware overhead acceptable and without increasing the consumed energy per bit. The proposed idea of multi-frame decoding with the modified rejection-based scheme is synthesized and floorplanned in an industrial 28 nm CMOS technology. The synthesis results reveal a considerable improvement in throughput, power consumption, and chip area of the min-sum decoder.
The paper is organized as follows. Section II provides the necessary preliminaries, specifically the min-sum decoding algorithm and the rejection-based method for finding the first two minima. Section III introduces the modified rejection-based scheme, and Section IV reveals its pipelining capacity for boosting decoding throughput. Section V is devoted to simulation, synthesis, and floorplan results, and final conclusions are drawn in Section VI.

A. QC-LDPC CODES AND LD SCHEDULE
In a nutshell, a Quasi-Cyclic LDPC (QC-LDPC) code is defined as the null space of a sparse Row-Column (RC)-constrained [6] matrix composed only of Circulant-Permutation Matrices (CPMs) and zero matrices. A CPM can be viewed as an identity matrix whose rows have all been cyclically shifted by the same amount. Fig. 1-(a) shows an example Parity-Check Matrix (PCM) of a QC-LDPC code.
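As a minimal illustration of the CPM definition above, the following Python sketch builds a CPM by cyclically shifting the rows of an identity matrix. The function name `cpm`, the matrix size, and the shift direction (to the right) are assumptions made for this example and are not taken from the paper.

```python
def cpm(size, shift):
    """Return a size x size circulant-permutation matrix as a list of rows.

    Row r has its single 1 in column (r + shift) mod size, i.e. every row
    of the identity matrix is cyclically shifted to the right by `shift`
    (the shift direction is an assumption of this sketch).
    """
    return [[1 if c == (r + shift) % size else 0 for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    for row in cpm(4, 1):
        print(row)
```

Every row and every column of such a matrix has weight one, which is what gives each row of CPMs in a QC-LDPC PCM only single- or zero-weight columns.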
BP is the major decoding algorithm of LDPC codes and is performed on their Tanner graph representation. The Tanner graph of an example LDPC code is shown in Fig. 2; it is composed of two sets of nodes and a number of connections between them. The gray circles, called VNs, represent the code bits or columns of the PCM, while the white circles, called CNs, denote the check sums or rows of the PCM. During the BP algorithm, reliability messages are successively passed between VNs and CNs in an attempt to obtain probabilities expressing whether a given symbol in a received codeword is '1' or '0'. The resulting sequence will then have the maximum probability according to the received soft-decision sequence. We denote by $Z_{j,l}$ a Variable-to-Check (VTC) message passed from the $l$-th VN to the $j$-th CN and by $L_{j,l}$ a Check-to-Variable (CTV) message passed from the $j$-th CN to the $l$-th VN.
The rules governing the order in which messages are exchanged between the nodes of a Tanner graph are referred to as the schedule of the BP algorithm. One basic possibility is the flood schedule, in which, in each iteration, all the VNs first update their messages and send them to the CNs, and then the CNs update their own messages and send them back to the VNs. The flood schedule facilitates a fully parallel decoding architecture, yet at the cost of high interconnect complexity. Layered Decoding (LD), in contrast, is a partially parallel schedule that offers a faster convergence rate, improved coding gain, and reduced hardware overhead compared with the flood schedule, at lower decoding complexity [7]. In the LD schedule, the rows of the PCM are split into layers, and the BP algorithm runs over the layers in successive order. In other words, each iteration of the algorithm is split into several sub-iterations, during each of which reliability messages are exchanged between the CNs of that layer and their neighbor VNs. At the end of each sub-iteration, the updated reliability messages are handed down to the next layer. Accordingly, only a subset of CNs and VNs participates in each sub-iteration, and layers are processed successively from the top to the bottom of the PCM.
The complexity of the LD schedule is lowered if the layers in the PCM have only single- or zero-weight columns. QC-LDPC codes inherently have this property if each row of CPMs is considered as a layer. Moreover, the shuffling idea proposed in [8], [9] further simplifies the implementation of LD, since it suffices to define only the connections of the first layer of the shuffled PCM for decoding. The shuffling only interleaves the row order and thus does not degrade the Bit-Error Rate (BER) performance. More details on the principles of shuffling, how it is performed, and how it is beneficial are found in [9]. Fig. 1-(b) shows the shuffled version of the example PCM.

B. MIN-SUM DECODING
Min-sum decoding is a simplified variant of the BP algorithm in which the magnitude of a CTV message $|L_{j,l}|$ is approximated by the minimum of the magnitudes of all the VTC messages arriving at $s_j$ from all its neighbors except $v_l$. For example, in Fig. 2, the CTV messages sent from $s_1$ to its neighbor VNs $v_0$, $v_1$, and $v_3$ have the magnitudes $|L_{1,0}| = \min(|Z_{1,1}|, |Z_{1,3}|)$, $|L_{1,1}| = \min(|Z_{1,0}|, |Z_{1,3}|)$, and $|L_{1,3}| = \min(|Z_{1,0}|, |Z_{1,1}|)$. This implies that the CTV messages sent from a CN to its neighbor VNs are either the first or the second minimum among the VTC messages that the CN has just received. For instance, if the relation $|Z_{1,0}| < |Z_{1,1}| < |Z_{1,3}|$ holds in the previous example, then $|L_{1,0}| = |Z_{1,1}|$, $|L_{1,1}| = |Z_{1,0}|$, and $|L_{1,3}| = |Z_{1,0}|$. Accordingly, the task of a CN processing unit in a min-sum decoding algorithm reduces to finding the first and second minimum among the incoming VTC messages together with the location of the first minimum. The location is needed because the CTV message at that location is simply assigned the second minimum, and all the other CTV messages are assigned the first minimum.
Several approaches have been proposed for finding the minimum exclusively for min-sum LDPC decoders [5], [10]-[18]. The sorting-based approach in [10] is a radix-2 tree structure that makes $\rho - 2 + \lceil \log_2 \rho \rceil$ comparisons between the $\rho$ binary numbers in order to find the first two minima $\min_1$ and $\min_2$. The other architecture proposed in [10] constructs a $2^k$-input minimum-value generator from two $2^{k-1}$-input minimum-value generators in a tree form. In [11], a two-input module is implemented with one adder and one multiplexer, and a three-input module with three adders, five multiplexers, and several simple logic gates (refer to Figures 13-a and 13-b of [11]). Based on these two modules, bigger multi-input designs can be realized in hierarchical form.
The authors of [12] modify the sorting-based approach of [10] to perform better in area and speed. However, their modification only affects how $\min_2$ is found; the remaining procedures, i.e., determining $\min_1$ and its index, remain unchanged. The work in [13] further generalizes the approach of [10], aiming at mixed-radix architectures that improve the latency. The modified tree structure of [14] requires fewer comparisons than [10] to find $\min_1$ and $\min_2$, and [16] achieves the same goal by reusing intermediate comparison results calculated for $\min_1$ to collect the candidates for $\min_2$. The work in [17] presents an algorithm called exMin that reduces the required hardware by using an estimate of the second minimum as $\min_2$.
In [15] and [18], a different strategy is adopted in which the first two minima are found by scanning each input from the Most Significant Bit (MSB) to the Least Significant Bit (LSB). This strategy is referred to as bit-serial architecture and is regarded as an alternative to the tree structure.
Among these, tree-based architectures have been of greater importance, and their latest variant is the rejection-based method proposed by the authors of [5]. Let $x_0, \ldots, x_{\rho-1}$ be the $\rho$ binary fixed-point numbers, each $K$ bits long, whose first two minima $\min_1$ and $\min_2$ are to be found. In this method, the 3 × 2-MIN and 4 × 2-MIN modules shown in Fig. 3 serve as the fundamental building blocks for constructing larger modules with an arbitrary number of inputs. In these modules, all pairwise combinations of the $\rho = 3$ or $4$ inputs are first compared by the MIN units, yielding the flags $a$, $b$, $c$ in the 3 × 2-MIN module or $a$, $b$, $c$, $d$, $e$, $f$ in the 4 × 2-MIN module. These flags are then used to form the select bits of the multiplexers, according to the relations stated at the bottom of the two modules. In these relations, the symbols ''.'', ''+'', and an overbar represent the logical operations AND, OR, and complement, respectively. Note further that each MIN unit in these modules outputs ''1'' if its upper input is bigger than its lower input and ''0'' otherwise.

III. MODIFIED REJECTION-BASED SCHEME
Besides the binary representation, there is an alternative way of specifying the location of $\min_1$, referred to as the one-hot sequence. A one-hot sequence is a $\rho$-bit binary sequence in which all the bits except one are '0'. The only '1' in the sequence resides at the position given by the index of $\min_1$. For example, for $\rho = 4$, the one-hot sequence may be ''1000'', ''0100'', ''0010'', or ''0001''; in each case the location of the '1' specifies the index of $\min_1$.
We claim that the one-hot sequence is a better way of locating $\min_1$ than its binary representation, since it simplifies the overall min-sum decoding algorithm. To support this claim, the relevant part of the min-sum decoding scheme is sketched in Fig. 5. This diagram illustrates how CTV messages are computed from the received VTC messages at a CN of degree 4. Among the four received VTC messages, $\min_1$, $\min_2$, and the location of $\min_1$ must be determined. Then, the CTV message at the location of the first minimum takes the value of $\min_2$, and all the other CTV messages take the value of $\min_1$. In Fig. 5-a, the index output is the location of $\min_1$ specified as a one-hot sequence by the modified rejection-based technique described shortly. As shown, as many multiplexers as the weight of the processing CN, in this example 4, are needed, and the bits of the index output, namely $index_i$, are used directly as the select bits of the multiplexers. If a select bit is '0', $S_1$, which is $\min_1$, is selected; otherwise $S_2$, which is $\min_2$, is selected. Hence, only one of the multiplexers outputs $\min_2$ and all the others output $\min_1$.
In contrast, in Fig. 5-b the index is the binary representation of the $\min_1$ location provided by the rejection-based technique of [5]. As a result, a set of additional comparators, which are boldfaced in the figure, is required to produce the select bits of the multiplexers. A comparator outputs '1' only if its two inputs are equal. The first inputs of the comparators are, respectively, the different possible values of the index, in this example ''00'', ''01'', ''10'', and ''11''. Accordingly, only one of the comparators outputs '1', which makes $\min_2$ appear at the output of the multiplexer connected to that comparator. The other comparators output '0', which makes $\min_1$ appear at the outputs of their multiplexers. Based on this argument, the one-hot sequence is the preferred representation of the location of $\min_1$ in min-sum decoding, since the need for extra comparators is eliminated.
FIGURE 5. Computation of CTV messages in a min-sum decoding algorithm, when the index of $\min_1$ is specified by (a) one-hot sequence according to the proposed modified rejection-based technique, and (b) binary representation according to the rejection-based technique of [5].
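The difference between the two variants of Fig. 5 can be sketched behaviorally in Python. This is only an illustration under stated assumptions: the function names are hypothetical, position k of the one-hot list is assumed to correspond to input k, and the magnitudes are made-up example values.

```python
def ctv_from_onehot(min1, min2, onehot):
    """One-hot variant: each bit drives one multiplexer directly.

    A select bit of 1 picks min2, a select bit of 0 picks min1.
    """
    return [min2 if bit else min1 for bit in onehot]


def ctv_from_binary(min1, min2, index, degree):
    """Binary-index variant: one equality comparator per multiplexer
    is needed first to derive the select bits from the index."""
    selects = [1 if index == k else 0 for k in range(degree)]
    return [min2 if s else min1 for s in selects]


# Example VTC magnitudes at a CN of degree 4 (illustrative values).
vtc_mags = [5, 2, 7, 3]
min1, min2 = sorted(vtc_mags)[:2]
pos = vtc_mags.index(min1)
onehot = [1 if k == pos else 0 for k in range(len(vtc_mags))]

print(ctv_from_onehot(min1, min2, onehot))          # [2, 3, 2, 2]
print(ctv_from_binary(min1, min2, pos, len(vtc_mags)))  # [2, 3, 2, 2]
```

Both functions produce the same CTV magnitudes; the one-hot variant simply skips the comparator stage.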
The 3 × 2-MIN and 4 × 2-MIN circuits in Fig. 3 can be modified to output the one-hot sequence of the location of $\min_1$ instead of its binary index. Fig. 6 illustrates the modified circuits, in which the outputs $I_0$, $I_1$, $I_2$, and $I_3$ are the bits of the one-hot sequence. The logic relations for these outputs are expressed in an optimized form, which is straightforward to obtain with basic methods for the optimized implementation of logic functions, such as the Karnaugh map [19], or with other advanced methods.
VOLUME 10, 2022
In our modified rejection-based scheme only the fundamental 3 × 2-MIN and 4 × 2-MIN circuits are modified to output the location of min 1 as a one-hot sequence. However, building larger modules with arbitrary number of inputs follows the same procedure as exemplified for the original scheme in Fig. 4.
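The hierarchical construction with a one-hot output can be modeled behaviorally as follows. This is a sketch under assumptions, not the gate-level circuit of Fig. 6: the pairing of inputs to MIN units, the rejection rule (an input is dropped once it loses two comparisons), and the tracking of source positions through the levels are all choices made for this example, and the function names are hypothetical.

```python
from itertools import combinations


def two_smallest(items):
    """Behavioral stand-in for a 3x2-MIN / 4x2-MIN module.

    items: up to four (magnitude, source_index) pairs.
    Returns the two surviving pairs; which one is min1 is decided later.
    """
    n = len(items)
    losses = [0] * n
    for i, j in combinations(range(n), 2):
        # MIN unit: flag is 1 if the upper input (i) is bigger than the lower (j).
        if items[i][0] > items[j][0]:
            losses[i] += 1
        else:
            losses[j] += 1
    # Reject every input that lost two or more pairwise comparisons.
    return [items[k] for k in range(n) if losses[k] < 2]


def find_min2_onehot(mags):
    """Return (min1, min2, one_hot) for a list of at least two magnitudes."""
    items = [(m, k) for k, m in enumerate(mags)]
    while len(items) > 2:
        # One level of fundamental modules, each fed with up to four inputs.
        groups = [items[k:k + 4] for k in range(0, len(items), 4)]
        items = [s for g in groups for s in two_smallest(g)]
    (min1, pos), (min2, _) = sorted(items)  # final sorting stage
    one_hot = [1 if k == pos else 0 for k in range(len(mags))]
    return min1, min2, one_hot


print(find_min2_onehot([5, 2, 7, 3]))  # (2, 3, [0, 1, 0, 0])
```

For 16 inputs the loop runs three times (four, two, and one module per level), followed by the final sorting stage.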

IV. MULTI-FRAME DECODER ARCHITECTURE
In this section, we reveal a potential for improving the decoding throughput of LDPC codes. The flowchart of the decoding procedure is shown in Fig. 7. Decoding begins with fetching a new sequence whose corresponding codeword is sought. This sequence is first checked against the parity-check equations. If it satisfies them, it is already an error-free codeword and a new sequence can be taken in for decoding. Otherwise, it enters the decoding loop until it is decoded successfully or the preset maximum number of iterations, denoted by $J_{max}$, is reached and decoding fails. Each round of the decoding loop is in fact one sub-iteration performed on a layer of the PCM. The processing steps in the decoding loop are as follows:
1) Initialization/Update VTC messages: For a new sequence at its first iteration and first sub-iteration, the A Posteriori Probability (APP) values $Y_l$, $0 \le l < n$, are initialized with $y_l$, the soft-decision sequence at the output of the channel, and the CTV messages are initialized with 0, i.e., $L_{j,l} = 0$, $1 \le j \le E$, $0 \le l < n$. Here, $n$ is the code length and $E$ is the number of rows in each layer of the PCM. VTC messages are initialized at each iteration with the APP values, i.e., $Z_{j,l} = Y_l$, $1 \le j \le E$, $0 \le l < n$.
2) Update CTV messages: CTV messages are updated according to (1), with $A_l = \{j : h_{j,l} = 1, 1 \le j \le E\}$ representing the set of CNs in layer 1 connected to $v_l$.
Fig. 8 shows the implemented decoding architecture for the example (12,4)-QC-LDPC code whose PCM was shown earlier. The block shown in Fig. 9 is comprised of two main parts, each specified by a dashed contour. The upper part is a circuit similar to Fig. 5-(a), responsible for calculating the magnitude of the CTV messages according to part II of (1). The VTC messages input to this circuit are first converted from 2's complement format to sign-magnitude format. The lower circuit in Fig. 9 is responsible for calculating the sign of the CTV messages according to part I of (1). The inputs of this circuit are the signs, i.e., the MSBs, of the VTC messages. At the last stage, the sign of a CTV message is combined with its magnitude, yielding the corresponding CTV message in 2's complement format.
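The two-part check-node computation described above (magnitude from the first two minima, sign from the MSBs) can be sketched in Python as follows. This is a behavioral illustration under assumptions: messages are plain signed integers, the function names are hypothetical, and the sign is formed by XOR-ing all sign bits and then removing the receiving VN's own sign, which is a common realization and not necessarily the exact circuit of Fig. 9.

```python
def to_sign_magnitude(z):
    """Split a signed integer message into (sign bit, magnitude)."""
    return (1 if z < 0 else 0, abs(z))


def check_node_update(vtc):
    """Min-sum CTV messages for one CN, given its incoming VTC messages."""
    signs, mags = zip(*(to_sign_magnitude(z) for z in vtc))

    # Magnitude part: first two minima and the location of the first one.
    order = sorted(range(len(vtc)), key=lambda k: mags[k])
    pos, min1, min2 = order[0], mags[order[0]], mags[order[1]]

    # Sign part: XOR of all sign bits (assumed realization).
    total_sign = 0
    for s in signs:
        total_sign ^= s

    ctv = []
    for k in range(len(vtc)):
        mag = min2 if k == pos else min1
        sign = total_sign ^ signs[k]  # sign product of all *other* inputs
        ctv.append(-mag if sign else mag)
    return ctv


print(check_node_update([-5, 2, 7, -3]))  # [-2, 3, 2, -2]
```

In the example, the message returned to the VN that supplied the smallest magnitude carries the second minimum, and every other message carries the first minimum.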
Each processing step in the decoding loop can be implemented in one clock cycle, except for the step of updating CTV messages, which is equivalent to finding the first two minima and the location of the first minimum among multiple VTC messages. Neither the rejection-based approach outlined in Section II-C nor its modified version proposed in Section III carries out its task in one step. Therefore, they need more than one clock cycle if one clock cycle is reserved for each step.
By examining the structure of the 3 × 2-MIN and 4 × 2-MIN modules in Fig. 3 or Fig. 6, it is deduced that they need two clock cycles for their operation. In the first clock cycle, the pairwise comparisons of the inputs $A$, $B$, $C$, and $D$ are carried out and the flags $a$, $b$, $c$, $d$, $e$, and $f$ are formed. In the second clock cycle, the outputs of the module are generated. When building modular circuits with 5-8 inputs, as exemplified by Fig. 4, two levels of fundamental units are needed, which results in a latency of $2 \times 2$ clock cycles. Likewise, larger modular circuits with 9-16 inputs need 3 levels of fundamental modules and hence $3 \times 2$ clock cycles. By extension, the number of levels of fundamental modules needed for building a configuration with $\rho$ inputs is $\lceil \log_2 \rho \rceil - 1$. This yields a latency of $2(\lceil \log_2 \rho \rceil - 1) + 1$ clock cycles. The one extra clock cycle is required to perform the final sorting between the two outputs of the module in order to determine which one is $\min_1$ and which one is $\min_2$.
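The level and latency counts above can be checked with a few lines of Python. The sketch assumes the ceiling of the base-2 logarithm, which is consistent with the quoted cases (two levels for 5-8 inputs, three levels for 9-16 inputs); the function names are hypothetical.

```python
def num_levels(rho):
    """Levels of fundamental 3x2-MIN / 4x2-MIN modules for rho >= 3 inputs."""
    ceil_log2 = (rho - 1).bit_length()  # integer form of ceil(log2(rho))
    return ceil_log2 - 1


def latency_cycles(rho):
    """Two clock cycles per level plus one cycle for the final sorting."""
    return 2 * num_levels(rho) + 1


for rho in (4, 8, 16):
    print(rho, num_levels(rho), latency_cycles(rho))
```

For 16 inputs this gives three levels and 7 clock cycles.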
The multi-level structure of the (modified) rejection-based scheme can be exploited to increase decoding throughput without considerable hardware overhead. In this multi-level structure, when one level receives its input, the preceding levels have already finished and are idle. Consider, e.g., the configuration in Fig. 10, i.e., a 16 × 2-MIN module requiring 7 clock cycles to finish its work. During the first two clock cycles, the four 4 × 2-MIN modules at level-1 are active. During clock cycles 3 and 4, these modules stand by and the two 4 × 2-MIN modules at level-2 operate. The last three clock cycles then belong to the single 4 × 2-MIN module at level-3. Hence, in this architecture the flow of inputs can become continuous, and the module can take new inputs every clock cycle.
The functional (also known as Register-Transfer Level (RTL)) simulation of the 16-input modified rejection-based module of Fig. 10, conducted with ModelSim, is illustrated in Fig. 11. The figure shows that the module initially needs 7 clock cycles to output the result of the first inputs. Afterward, it accepts new inputs every clock cycle and delivers an output every clock cycle as well. In this figure, ''clk'' is the clock signal and the 16 inputs are denoted by ''d1'' to ''d16''. The signals ''O1'' and ''O2'' are $\min_1$ and $\min_2$, respectively, and ''min_index'' is the location of $\min_1$ as a one-hot sequence.
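The benefit of feeding the module every clock cycle can be quantified with an idealized cycle count. The sketch below assumes a fixed latency of 7 cycles and one new input set per cycle, as described for the 16-input module; the function name and the non-pipelined baseline are illustrative assumptions.

```python
def total_cycles(num_inputs, latency=7, pipelined=True):
    """Clock cycles until the last of `num_inputs` results is available.

    pipelined=True : a new input set enters on every clock cycle.
    pipelined=False: a new input set enters only after the previous
                     result has been produced.
    """
    if pipelined:
        return latency + (num_inputs - 1)
    return latency * num_inputs


print(total_cycles(8))                   # 14
print(total_cycles(8, pipelined=False))  # 56
```

After the initial fill of 7 cycles, the pipelined module delivers one result per cycle, which is the behavior visible in the RTL simulation.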
Based on this fact, the decoder architecture in Fig. 8 can accept several sequences for decoding consecutively. The downside of this multi-frame processing idea is the additional overhead. When several sequences are taken in for decoding, it is not known which one will converge to a valid codeword and leave the decoder sooner than the others. Therefore, the decoded sequences will most likely leave the decoder out of the order in which they entered, and numbering the sequences and tracking their order becomes inevitable. On the one hand, this introduces additional hardware overhead for numbering and tracking the sequences. On the other hand, it restricts the maximum number of sequences that can be decoded at the same time, because of the memory needed to reorder the decoded codewords. We refer to this maximum as the multiplicity factor, a positive integer that determines the maximum number of sequences that can be decoded simultaneously. When the multiplicity factor is one, no frames are processed concurrently and a sequence enters the decoder only after the previous sequence has left it. For a multiplicity factor bigger than one, the decoding throughput is expected to increase, however at the price of hardware overhead.
In addition, the flip-flops storing APP, CTV, and VTC values must be replicated, with each copy storing the values specific to one particular sequence. Fig. 12 shows the effect of this change in the corresponding sections of Fig. 8. The number of replications equals the multiplicity factor of the design.
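The numbering and tracking of frames can be illustrated with a small reorder buffer. This is a hypothetical software model, not the hardware of the paper: the class name, its methods, and the policy of releasing codewords strictly in arrival order are assumptions chosen to show why the multiplicity factor bounds the number of frames in flight.

```python
class ReorderBuffer:
    """Tracks frames in flight and releases decoded frames in arrival order."""

    def __init__(self, multiplicity):
        self.multiplicity = multiplicity  # max frames decoded simultaneously
        self.next_in = 0                  # tag given to the next admitted frame
        self.next_out = 0                 # tag of the next frame to release
        self.done = {}                    # finished frames waiting for release

    def can_admit(self):
        return self.next_in - self.next_out < self.multiplicity

    def admit(self):
        if not self.can_admit():
            raise RuntimeError("decoder is full")
        tag = self.next_in
        self.next_in += 1
        return tag

    def finish(self, tag, codeword):
        """Record a decoded frame and return all frames that can now leave."""
        self.done[tag] = codeword
        released = []
        while self.next_out in self.done:
            released.append(self.done.pop(self.next_out))
            self.next_out += 1
        return released


rb = ReorderBuffer(multiplicity=4)
tags = [rb.admit() for _ in range(4)]
print(rb.finish(2, "c2"))  # [] because frames 0 and 1 are still decoding
print(rb.finish(0, "c0"))  # ['c0']
print(rb.finish(1, "c1"))  # ['c1', 'c2']
```

With a multiplicity factor of one, `can_admit` only becomes true again after the single frame in flight has been released.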

V. PERFORMANCE ANALYSIS AND COMPARISON RESULTS
The proposed LDPC decoder has been synthesized and floorplanned in a 28 nm CMOS process with a 1.0 V supply. In order to analyze the decoder in terms of chip area, maximum allowed clock frequency, power consumption, and achieved throughput, the experimental setup shown in Fig. 13 has been utilized. All the steps of generating messages, encoding, Binary Phase-Shift Keying (BPSK) modulation, transmission over an Additive White Gaussian Noise (AWGN) channel, and demodulation were carried out with MATLAB. The resulting soft-decision sequences are quantized uniformly, converted to fixed-point values in 2's complement format, and saved in a testbench file. This testbench file is utilized during netlist simulation for estimating the decoding throughput. Synopsys Design Compiler was used to generate the gate-level netlist of our design. Post-synthesis simulations were carried out with the Cadence NCSim simulator, and static-timing analysis as well as power analysis of the netlist was conducted with Synopsys PrimeTime. For ease of reference, all the parameters used, with their preset values, are presented in Table 3.
Our experiments and simulations have been conducted with four QC-LDPC codes (four code rates) from the IEEE 802.11 standard, although any QC-LDPC code from other communication standards such as 5G and WiMAX can also be examined. Synthesis results are given in Table 4. The synthesis has been conducted with multiplicity factors of 1, 4, and 8 for each code. This parameter, as discussed in Section IV, indicates the maximum number of frames that can be decoded simultaneously.
All syntheses have been conducted with (6,2) quantization bits, meaning that floating-point values are converted to 6-bit fixed-point values in which 2 bits are dedicated to the fractional part and the remaining 4 bits to the integer part. This specific choice is based on the MATLAB simulation results depicted in Fig. 14, which show that the BER performance degradation with (6,2)-bit fixed-point values compared with floating-point values is negligible. In simple words, allocating more bits during quantization increases the precision, resulting in a performance closer to that of an ideal LDPC decoder with floating-point values. However, this increases the hardware area and complexity. The $E_b/N_0$ values in Table 4 for which the BER is 1e-7 are also derived from the simulations in Fig. 14. The loss in gain at a BER of 1e-7 when converting from floating-point to fixed-point is nearly 0.1 dB. The average number of iterations needed by the decoder to reach a specific BER performance is also important when examining the effect of quantization. The average numbers of iterations related to the BER curves of Fig. 14 are shown in Fig. 15, which also shows that the increase in the number of iterations is negligible when quantized messages are used during decoding. It is also worth mentioning that the MATLAB model for acquiring the BER curves with fixed-point messages is an exact translation of the implemented decoding architecture shown in Figs. 8 and 12.
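The (6,2) fixed-point format can be illustrated with a short Python sketch. The rounding mode (Python's built-in `round`, which rounds halves to even) and the saturation at the ends of the range are assumptions of this example; the paper only states that the quantization is uniform and that the values are stored in 2's complement format.

```python
def quantize(value, total_bits=6, frac_bits=2):
    """Map a real value to a signed integer code with `frac_bits` fractional bits."""
    scale = 1 << frac_bits                # 4 codes per unit for 2 fractional bits
    lo = -(1 << (total_bits - 1))         # -32 for 6 bits
    hi = (1 << (total_bits - 1)) - 1      # +31 for 6 bits
    code = round(value * scale)           # assumed rounding mode
    return max(lo, min(hi, code))         # assumed saturation


def to_twos_complement_bits(code, total_bits=6):
    """Bit string of the code in 2's complement format."""
    return format(code & ((1 << total_bits) - 1), f"0{total_bits}b")


for v in (1.3, -0.75, 100.0):
    c = quantize(v)
    print(v, c, c / 4, to_twos_complement_bits(c))
```

With these settings the representable values run from -8.0 to 7.75 in steps of 0.25.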
As the multiplicity factor rises from 1 to 4 and then to 8, throughput increases considerably, while the energy efficiency remains nearly unchanged. Moreover, the chip area grows at a smaller rate than the throughput. To show this clearly, Fig. 16 depicts bar graphs of the increase rates of both throughput and chip area for the four examples. In these graphs, the multiplicity factor of 1 is the baseline against which the increase rates of throughput and chip area for multiplicity factors of 4 and 8 are compared. The graphs confirm that multi-frame decoding effectively improves the overall decoding throughput with a bearable hardware overhead.
The clock frequency values in Table 4 (and also Table 5) are the maximum values at which the design can work, and they are estimated based on the Worst Negative Slack (WNS) reported by the synthesis tool. The clock frequency may be increased so long as WNS is positive. When WNS becomes negative, the clock is too fast for data to travel through the data paths of the chip within one clock cycle, and therefore $f_{clk}$ must be lowered. Among the four LDPC codes studied, the PCM becomes larger as the code rate decreases. In summary, decoding complexity is expected to rise as the code rate decreases and the multiplicity factor increases. This explains the smaller achievable clock frequency for the half-rate code when the multiplicity factor is 4 or 8.
The further step of floorplanning was performed only for the two codes of minimum and maximum rate, i.e., 1/2 and 5/6. The critical path delays in the two cases are 0.4 and 0.3 ns, respectively, yielding maximum clock frequencies of 2.5 and 3.33 GHz. The verified netlist is imported into the Cadence Innovus tool for the backend design process with eight metal layers. Standard cells, power grids, and stripes are placed on the core area. Afterward, clock-tree synthesis and timing optimization are performed. The further steps are power routing, signal routing, and insertion of leaf cells, followed finally by post-route static-timing analysis and optimization. The chip layout of the implemented LDPC decoder for the (648,324) code is shown in Fig. 17.
In order to provide a comparison with the state of the art, Table 5 presents our results together with the results of some previous works in the field. The present work clearly outperforms the state of the art with regard to chip area and clock frequency. The clock frequency at which the proposed design can work is at least 3 times that of the other works. The core area of our design is smaller than that of all other designs, even the design of [21], which implements a partially parallel architecture. This is a clear indication that the proposed architecture is of lower complexity, despite the fact that we have considered a higher quantization precision than the others. For a fairer comparison of the decoding throughput, the last row of the table holds the normalized throughput in each case, in which the achieved throughput has been normalized with respect to chip area. These results show a considerable improvement in the normalized throughput as a result of our multi-frame decoding idea. In terms of power and energy efficiency, however, our design is slightly inferior. This is likely because of the circuitry needed for tracking frame numbers, storing APP, CTV, and VTC messages for multiple frames, and circulating the APP values during decoding.
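The relation between critical path delay and maximum clock frequency quoted above is simply the reciprocal, as the following Python check shows. The function name is hypothetical; the two delay values are the ones reported for the rate-1/2 and rate-5/6 codes.

```python
def max_clock_ghz(critical_path_ns):
    """Maximum clock frequency in GHz for a critical path delay in ns."""
    return 1.0 / critical_path_ns


for delay_ns in (0.4, 0.3):
    print(f"{delay_ns} ns -> {max_clock_ghz(delay_ns):.2f} GHz")
```

This reproduces the 2.5 GHz and 3.33 GHz figures stated for the two floorplanned codes.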

VI. CONCLUSION
A modified rejection-based scheme for finding the first two minima and the location of the first minimum in the min-sum decoding algorithm of an LDPC code was proposed. In this modified method, the location of the minimum is derived as a one-hot sequence instead of an index, which simplifies min-sum decoding. In addition, the rejection-based scheme allows for further pipelining of the decoding procedure and thus a multi-frame decoding architecture. This idea can effectively increase decoding throughput without prohibitive hardware overhead and is therefore practical. Our synthesis and post-layout simulation results in an industrial 28 nm CMOS technology confirm the effectiveness of multi-frame processing in increasing throughput with reasonable hardware overhead.