Error-Aware SCFlip Decoding of Polar Codes

The successive-cancellation flip (SCFlip) decoder and its variants provide a significant coding gain with an average complexity practically identical to that of the successive cancellation (SC) decoder over a wide range of signal-to-noise ratios (SNRs). However, they suffer from high complexity and long latency when the SNR decreases, since the average number of extra decoding attempts inevitably becomes large. To mitigate this problem, we propose a novel SCFlip decoder, called an error-aware SCFlip (EA-SCFlip) decoder, for distributed cyclic-redundancy-check (CRC) polar codes. Based on the distributed CRC bits, it employs early termination at each extra decoding attempt so that it can reduce the decoding complexity and latency on average. It also reduces the search space of candidate bit-flips in the dynamic building of the bit-flip list by exploiting the parity-check relationship (PCR) of the first error-detected CRC bit at each extra decoding attempt. Furthermore, we propose a greedy algorithm to design a distributed CRC code such that the obtained PCRs make the early-error-detection capability of the EA-SCFlip decoder as high as possible. Numerical results demonstrate that the EA-SCFlip decoder can indeed achieve an early termination gain as well as a complexity reduction when a polar code is concatenated with the distributed CRC code designed by the proposed algorithm.


I. INTRODUCTION
Polar codes, introduced by Arikan [1], are the first class of structured error-correcting codes that are proved to achieve the capacity of an arbitrary binary-input discrete memoryless channel asymptotically with an exponent of 1/2 under successive cancellation (SC) decoding. However, they have quite poor finite-length performance due to the presence of imperfectly polarized bit-channels and the suboptimality of SC decoding. To improve their performance, several decoding methods including SC list (SCL) decoding [2], [3], SC stack (SCS) decoding [4], and SC flip (SCFlip) decoding [5] have been developed, and various code constructions [6], [7] have been suggested. As a result, polar codes were recently adopted as a channel coding scheme for the control channel in 5G New Radio (NR) [8].
Cyclic redundancy check (CRC) codes were introduced by Tal and Vardy [2], [3] in order to improve the performance of polar codes under SCL decoding. Conventionally, the CRC bits are placed at the end of the information block. On the other hand, Chen et al. [9] proposed distributed CRC codes by introducing an interleaver between the CRC encoder and the polar encoder in order to make CRC bits scattered over the information block. These distributed CRC bits can be used in the early termination process to reduce the decoding latency or in the tree pruning process to improve the performance of SCL decoding. Later, some greedy algorithms to design an interleaver for distributed CRC codes were presented in [10], [11] so that the CRC bits are placed at the earlier positions of the information block. For short notation, a polar code concatenated with a (distributed) CRC code will be referred to as a (distributed) CRC-polar code in this article.
The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan.
Although the SCL decoder provides outstanding performance for a CRC-polar code, it suffers from high complexity and long latency. As an alternative, the SCFlip decoder [5] performs standard SC decoding, possibly followed by at most T extra decoding attempts until the CRC constraints are satisfied. Later, several methods to achieve a complexity reduction or a performance improvement were investigated [12]-[16]. In particular, dynamic SCFlip (D-SCFlip) decoding [17] is a novel method to boost the code performance by introducing a new metric for a bit-flip, allowing multi-bit flipping, and updating a bit-flip list dynamically. Recently, some modifications of the D-SCFlip decoder were proposed. A simplified D-SCFlip decoder was proposed in [18] to improve the throughput by exploiting easily-decoded subcodes such as rate-0, rate-1 and repetition nodes. In [19], the bit error rates estimated via Gaussian approximation were used to reduce the search space for updating the bit-flip list.
The SCFlip decoder and its variants only perform standard SC decoding for most of the received frames in a middle-to-high signal-to-noise ratio (SNR) region. That is, they provide a significant coding gain with an average complexity practically identical to that of the SC decoder over a wide range of SNRs. However, they require a number of extra decoding attempts in a low SNR region. This phenomenon gets worse when the maximum number of extra decoding attempts is allowed to be sufficiently large for a performance improvement. Consequently, they suffer from high complexity and long latency as the SNR decreases.
In order to mitigate the complexity and latency problem, we propose a novel SCFlip decoder, called an error-aware SCFlip (EA-SCFlip) decoder, for a distributed CRC-polar code. Whenever an error is detected by one of the distributed CRC bits at an extra decoding attempt, the proposed EA-SCFlip decoder terminates the corresponding decoding procedure early, which reduces the decoding complexity and latency on average. Given an error-detected CRC bit at a decoding attempt, it also makes use of the corresponding parity-check relationship (PCR) to limit the search space of candidate bit-flips. This reduces the complexity of updating a bit-flip list at each extra decoding attempt. Furthermore, in order to maximize the effect of early error detection in the EA-SCFlip decoder, we propose a greedy algorithm to design a distributed CRC code having a high early-error-detection capability. The proposed algorithm tries to find the parity-check matrix (PCM) for a new equivalent CRC code by applying a number of elementary row operations and column permutations to the PCM for a CRC code with a given generator polynomial. Numerical results demonstrate that the EA-SCFlip decoder can indeed achieve an early termination gain as well as a complexity reduction when a polar code is concatenated with the distributed CRC code designed by the proposed algorithm.
The rest of the paper is organized as follows. Section II provides a background on polar codes, the SC decoder and the SCFlip decoder. In Section III, we propose a greedy algorithm to design a distributed CRC code having a high early-error-detection capability. The proposed EA-SCFlip decoder is presented in Section IV. In Section V, we evaluate the early termination gain of the proposed EA-SCFlip decoder and analyze its computational complexity. Finally, we give concluding remarks in Section VI.
Notation: We use calligraphic letters (e.g., $\mathcal{A}$) to denote sets. Conventionally, we use $\mathcal{A}^c$ to denote the complementary set of $\mathcal{A}$. We write boldface lowercase letters (e.g., $\mathbf{a}$) to stand for vectors, and boldface uppercase letters (e.g., $\mathbf{A}$) to denote matrices.

II. PRELIMINARIES
A polar code is characterized by a 2-tuple $(N, \mathcal{I})$, where $N = 2^n$ is the code length and $\mathcal{I} = \{i_1, \ldots, i_{K_I}\} \subset \{1, \ldots, N\}$ is an information set of cardinality $K_I = |\mathcal{I}|$. A polar codeword $\mathbf{c}$ is obtained by

$$\mathbf{c} = \mathbf{u}\,\mathbf{G}_N \tag{1}$$

where $\mathbf{G}_N$ is the generator matrix and $\mathbf{u}$ is an encoder input vector which has $K_I$ information bits at the indices in $\mathcal{I}$ and $N - K_I$ frozen bits at the indices in $\mathcal{I}^c$. For classical polar codes [1], $\mathbf{G}_N$ is given by the $n$-th Kronecker power of

$$\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$

Given the received signals $y_1^N$ and the previous estimates $\hat{u}_1^{i-1}$, the SC decoder for a polar code computes the estimate of $u_i$ as

$$\hat{u}_i = \begin{cases} u_i, & i \in \mathcal{I}^c \\ \dfrac{1 - \mathrm{sign}(L_i)}{2}, & i \in \mathcal{I} \end{cases}$$

where $L_i$ is the decision log-likelihood ratio (LLR) of $u_i$, defined as

$$L_i = \log \frac{\Pr\left(y_1^N, \hat{u}_1^{i-1} \mid u_i = 0\right)}{\Pr\left(y_1^N, \hat{u}_1^{i-1} \mid u_i = 1\right)}. \tag{2}$$

The decision LLRs in (2) can be efficiently calculated through a polar encoding graph. Recursive formulas for computing them are available in [20]. In the SC decoder, an estimation error in $\hat{u}_i$ results from channel noise or error propagation due to the previous estimation errors. It was reported in [5], [15] that the errors induced by channel noise are mainly concentrated within three bits, and even within two bits when the SNR increases. This tendency is observed over a wide range of rates.
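The encoding step of this section can be illustrated with a small sketch. It assumes the generator matrix is the plain n-th Kronecker power of F with no bit-reversal permutation, as stated in the text; the function names `kron`, `polar_generator` and `polar_encode` are hypothetical helpers, and the dense matrix product is only meant for small N.

```python
def kron(a, b):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[x & y for x in ra for y in rb] for ra in a for rb in b]


def polar_generator(n):
    """Return G_N as the n-th Kronecker power of F, with N = 2**n."""
    F = [[1, 0], [1, 1]]
    G = [[1]]
    for _ in range(n):
        G = kron(G, F)
    return G


def polar_encode(u):
    """Compute c = u G_N over GF(2); len(u) must be a power of two."""
    N = len(u)
    n = N.bit_length() - 1
    if 1 << n != N:
        raise ValueError("input length must be a power of two")
    G = polar_generator(n)
    # Column j of the product is the XOR of the rows of G selected by u.
    return [sum(u[i] & G[i][j] for i in range(N)) % 2 for j in range(N)]


# N = 4 with ones at the (1-based) positions 2 and 4 of u.
codeword = polar_encode([0, 1, 0, 1])
```

A practical encoder would use the butterfly structure of the encoding graph instead of a dense matrix; the dense form is used here only because it mirrors (1) directly.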
The SCFlip decoder and its variants are a class of decoders which perform SC decoding and bit flipping. They aim to correct channel-induced errors, and eliminate the errors caused by error propagation in turn. Typically, they perform standard SC decoding, possibly followed by at most T extra decoding attempts until the CRC constraints are satisfied. In the SCFlip decoder [5], the absolute values of the decision LLRs obtained by the initial SC decoding are used for selecting a set of T unfrozen bit indices to be flipped. Each extra decoding attempt is done with flipping only one bit in this set.
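The selection rule of the original SCFlip decoder [5] described above can be sketched as follows. The function name `scflip_candidates` and the LLR values are illustrative assumptions; only the rule itself (pick the T unfrozen indices with the smallest absolute decision LLRs from the initial SC pass) comes from the text.

```python
def scflip_candidates(llr, info_set, T):
    """Return the T unfrozen indices with the smallest |LLR|, least reliable first.

    llr maps a bit index to its decision LLR from the initial SC decoding.
    Ties are broken by the smaller index (an arbitrary choice for this sketch).
    """
    ranked = sorted(info_set, key=lambda i: (abs(llr[i]), i))
    return ranked[:T]


# Hypothetical decision LLRs for four unfrozen bits.
llr = {4: 3.1, 6: -0.4, 7: 1.2, 8: -5.0}
flips = scflip_candidates(llr, info_set=[4, 6, 7, 8], T=2)
```

Each extra decoding attempt would then rerun SC decoding with exactly one of the returned indices flipped.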
The D-SCFlip decoder, proposed by Chandesris et al. [17], is a generalized SCFlip decoder which allows multi-bit flipping. Here, the notion of a bit-flip of order $\omega$ was introduced. It represents a set of $\omega$ indices, $\mathcal{E} = \{i_1, \ldots, i_\omega\} \subset \mathcal{I}$ such that $i_1 < \cdots < i_\omega$, to be flipped. The new metric $M_\alpha(\mathcal{E})$ associated with a bit-flip $\mathcal{E} = \{i_1, \ldots, i_\omega\}$ of order $\omega$, which is defined in [17] and referred to as (3) in this article, was also proposed, where $\alpha$ is a perturbation parameter. The overall procedure of D-SCFlip decoding is summarized as follows. Whenever the initial standard SC decoding fails, the decoder builds a list of $T$ candidate bit-flips, $\mathcal{L}_{flip} = \{\mathcal{E}_1, \ldots, \mathcal{E}_T\}$, referred to as a bit-flip list, based on the metrics defined in (3). At the initial step, each bit-flip consists of only one unfrozen bit index, i.e., $|\mathcal{E}_t| = \omega_t = 1$ for all $t$. The metric set corresponding to $\mathcal{L}_{flip}$ is denoted by $\mathcal{M}_{flip} = \{M_\alpha(\mathcal{E}_1), \ldots, M_\alpha(\mathcal{E}_T)\}$. If the $t$-th decoding attempt with a bit-flip $\mathcal{E}_t = \{i_1, \ldots, i_{\omega_t}\}$ still fails and $\omega_t$ is less than a predetermined maximum bit-flip order $\omega$, the decoder computes $M_\alpha(\mathcal{E})$ for all possible bit-flips $\mathcal{E} = \mathcal{E}_t \cup \{i\}$ such that $i \in \mathcal{I}$ and $i > i_{\omega_t}$, and updates both $\mathcal{L}_{flip}$ and $\mathcal{M}_{flip}$ by selecting more likely bit-flips.
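The dynamic update described above can be sketched generically. The functions `expand_flip` and `update_flip_list` are hypothetical, and `score` is only a stand-in for the metric of [17]: this sketch assumes a score where lower means more likely, and uses a toy sum of absolute LLRs rather than the actual metric.

```python
def expand_flip(failed_flip, info_set):
    """All bit-flips E_t ∪ {i} with i in the information set and i > max(E_t)."""
    last = max(failed_flip)
    return [tuple(failed_flip) + (i,) for i in sorted(info_set) if i > last]


def update_flip_list(pending, candidates, score, T):
    """Merge the not-yet-tried bit-flips with new candidates and keep the T best.

    `score` is a placeholder for the bit-flip metric; lower is assumed better.
    """
    merged = sorted(set(pending) | set(candidates), key=lambda e: (score(e), e))
    return merged[:T]


# Toy data: hypothetical LLRs and a toy score (NOT the metric of [17]).
llr = {4: 3.1, 6: -0.4, 7: 1.2, 8: -5.0}
toy_score = lambda flip: sum(abs(llr[i]) for i in flip)

new_candidates = expand_flip((6,), info_set=[4, 6, 7, 8])
flip_list = update_flip_list([(7,), (8,)], new_candidates, toy_score, T=3)
```

Bit-flips are stored as sorted tuples so that they can be placed in a set and compared; any ordered, hashable representation would serve equally well.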

III. DESIGN OF DISTRIBUTED CRC CODES FOR EARLY ERROR DETECTION
CRC bits are commonly placed at the end of the information block before polar encoding, but can be arranged within the information block by properly employing an interleaver [9]. This kind of CRC bits will be referred to as 'distributed CRC bits' in this article. Since they can be used for both early detection of decision errors and early termination of decoding, the decoding latency can be reduced. In this section, we consider how to design a distributed CRC code minimizing the decoding latency. More specifically, we propose an algorithm to construct its parity-check matrix (PCM) maximizing the latency-reduction effect for a given CRC generator polynomial. We also evaluate the early-error-detection capability of the designed code.

A. DESIGN PROBLEM OF A DISTRIBUTED CRC CODE
Given an $M$-bit CRC generator polynomial $g(x)$, the systematic generator matrix and PCM of a $[K + M, K]$ CRC code $\mathcal{C}$ can be written as

$$\mathbf{G} = [\,\mathbf{I}_K \mid \mathbf{P}\,] \quad \text{and} \quad \mathbf{H} = [\,\mathbf{P}^T \mid \mathbf{I}_M\,] \tag{4}$$

respectively, where $\mathbf{I}_K$ is the $K \times K$ identity matrix and $\mathbf{P}$ is the $K \times M$ matrix determined by $g(x)$. The non-zero entries in each column of $\mathbf{P}$ represent the information bits participating in the parity-check equation (PCE) of the corresponding CRC bit. Each row of $\mathbf{H}$ corresponds to the PCE associated with one single CRC bit. Note that $\mathcal{C}$ has a PCM of a cyclic form, denoted by H. Applying column/row permutations to $\mathbf{G}$ [9], [10] or $\mathbf{H}$ [11], it is possible to get an equivalent CRC code whose parity bits are distributed such that their indices are higher than all the associated information bit indices. This property gives the CRC code an early-error-detection capability. Since there are many equivalent CRC codes, it is very valuable to construct a CRC code maximizing the early-error-detection capability.
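As an illustration of how P is determined by g(x), the following sketch builds the systematic matrices from polynomial remainders. It assumes that bits are ordered most-significant first, that the K information bits precede the M parity bits, and that polynomials are stored as integer bit masks (so `0x633` stands for x^10 + x^9 + x^5 + x^4 + x + 1); the names `poly_mod` and `crc_matrices` are hypothetical.

```python
def poly_mod(value, g):
    """Remainder of the GF(2) polynomial `value` modulo `g` (ints as bit masks)."""
    deg_g = g.bit_length() - 1
    while value.bit_length() - 1 >= deg_g:
        # Cancel the leading term of value with a shifted copy of g.
        value ^= g << (value.bit_length() - 1 - deg_g)
    return value


def crc_matrices(g, K):
    """Return (G, H) with G = [I_K | P] and H = [P^T | I_M] for generator g."""
    M = g.bit_length() - 1
    P = []
    for k in range(K):
        # Information bit k carries x^(M+K-1-k); its parity is that monomial mod g.
        rem = poly_mod(1 << (M + K - 1 - k), g)
        P.append([(rem >> (M - 1 - b)) & 1 for b in range(M)])
    G = [[int(i == k) for i in range(K)] + P[k] for k in range(K)]
    H = [[P[k][j] for k in range(K)] + [int(i == j) for i in range(M)]
         for j in range(M)]
    return G, H


# The [30 + 10, 30] CRC code used as the design example in this section.
G, H = crc_matrices(0x633, 30)
```

Every row of G, read as a polynomial, is a multiple of g(x), and every row of G is orthogonal to every row of H over GF(2).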
In order to do so, we first formulate the problem. Consider a PCM $\mathbf{H}_{new}$ of a $[K + M, K]$ distributed CRC code. Let $h_j$ be the index of the last nonzero component of the $j$-th row in $\mathbf{H}_{new}$. We assume that $h_1 < h_2 < \cdots < h_M = K + M$, that is, $\mathbf{H}_{new}$ is of the staircase form. Intuitively, lowering $h_j$ for all $j$ can be expected to improve the early-error-detection capability. Therefore, our problem of designing a good CRC code is to construct $\mathbf{H}_{new}$ with $h_j$ as low as possible for all $j$. As a measure of early-error-detection capability, the last index sum given by

$$h(\mathbf{H}_{new}) = \sum_{j=1}^{M} h_j$$

can be considered.
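The last index sum is straightforward to evaluate for a given PCM. The sketch below uses a small made-up staircase matrix `H_toy` (not a real CRC code) and hypothetical helper names.

```python
def last_indices(H):
    """1-based index h_j of the last nonzero entry in each row of H."""
    return [max(c + 1 for c, v in enumerate(row) if v) for row in H]


def last_index_sum(H):
    """The measure h(H) = h_1 + ... + h_M."""
    return sum(last_indices(H))


# Toy 3 x 5 staircase-form matrix, for illustration only.
H_toy = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 1],
]
h = last_indices(H_toy)
total = last_index_sum(H_toy)
```

Two candidate PCMs of the same code can be compared by this single number, with a smaller sum indicating earlier CRC bit positions overall.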

B. PROPOSED DESIGN ALGORITHM
To solve our problem, we start with a PCM $\mathbf{H}$ such as the PCM in (4) and apply a number of elementary row operations and column permutations to it. The main difference between our method and the conventional methods in [9]-[11] is that ours also allows the addition of one row to another row. This increases the possibility that $h_j$ of a new PCM $\mathbf{H}_{new}$ becomes smaller for all $j$.
The proposed design algorithm can be inductively described as follows. In order to determine the first row of $\mathbf{H}_{new}$, we randomly generate linear combinations of the rows of $\mathbf{H}$ in (4). More specifically, $L$ nonsingular matrices $\mathbf{U}_1$ of size $M \times M$ are randomly generated and are multiplied by $\mathbf{H}$. We then choose $\mathbf{U}_1^*$ such that the minimum row weight of $\mathbf{U}_1 \mathbf{H}$ is minimized over the $L$ nonsingular matrices. As a next step, we permute the rows of $\mathbf{U}_1^* \mathbf{H}$ such that the row of minimum weight is placed on the first row and then permute the columns such that the nonzero components of the first row are located to the left. The resultant matrix is denoted by $\mathbf{H}^{(1)}$.

Now we assume that $\mathbf{H}^{(j-1)}$ is fixed, that is, the first $j - 1$ rows of $\mathbf{H}_{new}$ have already been determined. The next step is to determine the $j$-th row of $\mathbf{H}_{new}$. Note that $h_{j-1}$ is the column index of the rightmost nonzero component in the $(j-1)$-th row of $\mathbf{H}^{(j-1)}$. Similarly to the previous steps, we again randomly generate $L$ nonsingular matrices $\mathbf{U}_j$ of size $M \times M$, given by

$$\mathbf{U}_j = \begin{bmatrix} \mathbf{I}_{j-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{V} \end{bmatrix} \tag{5}$$

where $\mathbf{V}$ is an $(M - j + 1) \times (M - j + 1)$ nonsingular matrix. We then choose $\mathbf{U}_j^*$ such that the minimum nonzero row weight of the submatrix of $\mathbf{U}_j \mathbf{H}^{(j-1)}$ formed by the columns with the indices in $\{h_{j-1} + 1, \ldots, K + M\}$ is minimized over the $L$ nonsingular matrices. We then permute the rows of $\mathbf{U}_j^* \mathbf{H}^{(j-1)}$ such that the first $j - 1$ rows are fixed and the newly obtained row is placed on the $j$-th row, and then permute the columns such that the nonzero components within the last part with $K + M - h_{j-1}$ components in the $j$-th row are located to the left of this part. The resultant matrix is denoted by $\mathbf{H}^{(j)}$. We repeat this step until $j = M$ and get $\mathbf{H}_{new}$.
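The first induction step of the procedure above can be sketched as follows. This is only an illustration of the choice of the first row, not the complete design algorithm: the rejection sampling of nonsingular matrices, the bit-mask row representation, and the names `gf2_rank`, `random_nonsingular`, `combine` and `first_design_step` are all assumptions made for this sketch, and the input PCM is assumed to have full row rank.

```python
import random


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are ints used as bit masks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears the leading bit of b in r if it is set
        if r:
            basis.append(r)
    return len(basis)


def random_nonsingular(M, rng):
    """Draw random M x M binary matrices until one has full rank."""
    while True:
        U = [rng.getrandbits(M) for _ in range(M)]
        if gf2_rank(U) == M:
            return U


def combine(mask, rows):
    """XOR of the rows selected by the bits of mask (one row of U times H)."""
    acc = 0
    for k, r in enumerate(rows):
        if (mask >> k) & 1:
            acc ^= r
    return acc


def first_design_step(H, L, seed=0):
    """Return (H1, h1): H1 has a low-weight first row whose ones are leftmost."""
    rng = random.Random(seed)
    M, n = len(H), len(H[0])
    H_int = [int("".join(map(str, r)), 2) for r in H]
    best = None
    for _ in range(L):
        U = random_nonsingular(M, rng)
        UH = [combine(u, H_int) for u in U]
        w, k = min((bin(x).count("1"), i) for i, x in enumerate(UH))
        if best is None or w < best[0]:
            best = (w, k, UH)
    w, k, UH = best
    UH[0], UH[k] = UH[k], UH[0]  # move the minimum-weight row to the top
    rows = [[(x >> (n - 1 - c)) & 1 for c in range(n)] for x in UH]
    # Stable column permutation: columns where the first row is 1 come first.
    order = sorted(range(n), key=lambda c: 1 - rows[0][c])
    return [[r[c] for c in order] for r in rows], w


# Toy PCM (a [7, 4] Hamming code), for illustration only.
H_toy = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
H1, h1 = first_design_step(H_toy, L=8, seed=1)
```

For this toy matrix every nonzero combination of rows has weight 4, so the step always yields h1 = 4 regardless of the random draws; for a real CRC PCM the outcome depends on L and on the seed.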
The procedure of the proposed algorithm is summarized in Algorithm 1. In order to get an optimal matrix $\mathbf{H}_{new}$, an exhaustive search is needed, and its number of trials increases too fast as $M$ increases. For this reason, $L$ is introduced as the parameter to control the tradeoff between complexity and goodness in our algorithm. A larger $L$ gives better performance with respect to the early-error-detection capability.

Algorithm 1
…
5: for $s = 1, \ldots, L$ do
6:   Randomly generate an $M \times M$ nonsingular matrix $\mathbf{U}_j$ in (5).
…
19: Swap the $j$-th row and the $k^*$-th row of $\mathbf{H}^{(j)}$.
20: Perform proper column permutations to obtain a PCM $\mathbf{H}^{(j)}$ in staircase form.
21: $h_j \leftarrow h_{j-1} + $ … ;
22: end for
…
As a design example, consider the $[30 + 10, 30]$ CRC code with generator polynomial $g(x) = x^{10} + x^9 + x^5 + x^4 + x + 1$. The PCM $\mathbf{H}$ of the standard form, the PCM obtained by swapping the rows and columns of $\mathbf{H}$, and the PCM $\mathbf{H}_{new}$ obtained by applying Algorithm 1 to $\mathbf{H}$ are graphically shown in Fig. 1. The indices of the distributed CRC bits obtained from $\mathbf{H}_{new}$ are lower than those from the matrix in Fig. 1(b). Therefore, a good distributed CRC code to detect decision errors earlier can be efficiently constructed by Algorithm 1.
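Before verifying the validity of Algorithm 1, we first discuss some theoretical background on the generalized Hamming weights (GHWs) of a code [21]-[23] and their relation to $\{h_1, \ldots, h_M\}$. Given a $[K + M, K]$ CRC code $\mathcal{C}$ with a PCM $\mathbf{H}$, let $\mathcal{C}^\perp$ be the dual code of $\mathcal{C}$, given by

$$\mathcal{C}^\perp = \left\{\mathbf{v} \in \mathbb{F}_2^{K+M} : \mathbf{v}\,\mathbf{c}^T = 0 \text{ for all } \mathbf{c} \in \mathcal{C}\right\}.$$

Clearly, $\mathbf{H}$ is a generator matrix for $\mathcal{C}^\perp$. For $1 \le j \le M$, the $j$-th GHW of $\mathcal{C}^\perp$ is defined as the minimum support size of $j$-dimensional subcodes of $\mathcal{C}^\perp$, that is,

$$d_j = d_j(\mathcal{C}^\perp) = \min\left\{|\chi(\mathcal{D})| : \mathcal{D} \text{ is a } j\text{-dimensional subcode of } \mathcal{C}^\perp\right\}$$

where

$$\chi(\mathcal{D}) = \left\{i : x_i \ne 0 \text{ for some } (x_1, \ldots, x_{K+M}) \in \mathcal{D}\right\}.$$

In particular, $d_1$ is the minimum distance of $\mathcal{C}^\perp$. However, it is generally a very difficult problem to completely determine the GHWs of a code, although the GHWs of several special codes are completely known [21]-[23]. For the values of $j$ at which $d_j$ is not known exactly, the estimate $\hat{d}_j$ satisfies $\hat{d}_j \ge d_j$, but the difference between them is made as small as possible by an extensive search.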
In order to verify the validity of Algorithm 1, we show the differences between $E(h_j)$ and $\hat{d}_j$ for various values of $L$ in Fig. 2. Here, $E(h_j)$ is the expected value of $h_j$ taken over the PCMs designed by Algorithm 1, which depends on the parameter $L$ controlling the tradeoff between complexity and goodness in the algorithm. The cumulative sum of these differences decreases as $L$ increases. When this sum gets smaller, the early-error-detection capability of the designed PCM for the distributed CRC code becomes better on average. The parameter $L$ is properly chosen so that the sum converges sufficiently, say $L = 10^2$ for $M = 10$ and $L = 10^4$ for $M = 16$, as shown in Fig. 3.

C. EARLY-ERROR-DETECTION CAPABILITY
When a distributed CRC code is concatenated with a polar code, its PCM may be designed by Algorithm 1. Based on it, the decoder for this distributed CRC-polar code can become aware of the presence of decision errors before estimating all the unfrozen bits. It is worth noting that the positions of the first few distributed CRC bits may be more crucial to the effect of early error detection.
The distributed CRC codes designed by the proposed algorithm are compared with those constructed by the methods in [10] and [11], when their generator polynomials are given by $g_1(x) = x^{10} + x^9 + x^5 + x^4 + x + 1$ for CRC-10 and $g_2(x) = x^{16} + x^{12} + x^5 + 1$ for CRC-16. These comparisons are shown in Fig. 4 in terms of the early-error-detection capability. Here, the x-axis represents the information length of the considered CRC code, while the y-axis represents the ratio of the number of information bits appearing before the $j$-th CRC bit to the information length, where $j \in \{1, 2, 3\}$. Numerical results show that the CRC codes designed by Algorithm 1 have higher early-error-detection capability than those constructed by the conventional methods [10], [11] in both the CRC-10 and the CRC-16 cases. From a practical viewpoint, the early termination gain of a decoder for a CRC-polar code will be defined and extensively discussed in Section V.

IV. ERROR-AWARE SCFlip DECODER
In this section, we propose a novel SCFlip decoder, called an EA-SCFlip decoder, for distributed CRC-polar codes. Based on the distributed CRC bits, it employs early termination at each extra decoding attempt so that it can reduce the average decoding complexity and latency. It also reduces the search space of candidate bit-flips in the dynamic building of the bit-flip list by exploiting the PCR of the error-detected CRC bit.

A. UTILIZATION OF DISTRIBUTED CRC BITS
In conventional SCFlip decoding of a CRC-polar code, each extra decoding attempt is performed if the CRC test fails. As the CRC bits are appended at the end of the information block, all the decision LLRs and the corresponding estimates need to be computed before the CRC test. This causes a decoding complexity and latency problem in a low-to-middle SNR region, together with the sequential nature of SCFlip decoding.
To mitigate this problem, the proposed EA-SCFlip decoder for distributed CRC-polar codes reduces the search space of candidate bit-flips in the dynamic building of the bit-flip list, in addition to early termination. This is done by exploiting the PCR of the first error-detected CRC bit at each extra decoding attempt. For a given information set $\mathcal{I} = \{i_1, \ldots, i_{K+M}\} \subset \{1, 2, \ldots, N\}$ with $i_1 < \cdots < i_{K+M}$, a polar codeword is obtained by the encoding process in (1), where the CRC bits are assigned to the components of $\mathbf{u}$ whose indices are in $\mathcal{I}$. For $1 \le j \le M$, we denote by $\mathcal{P}_j$ the $j$-th PCR set corresponding to the $j$-th CRC bit, defined by

$$\mathcal{P}_j = \left\{i_k \in \mathcal{I} : h_{j,k} = 1,\ 1 \le k \le K + M\right\}$$

where $i_k$ is the $k$-th information bit index in $\mathcal{I}$ and $h_{j,k}$ is the $(j, k)$-entry of the PCM for the employed CRC code. In addition, we let $r_j$ be the $j$-th CRC bit index in $\mathcal{I}$, given by

$$r_j = \max_{i \in \mathcal{P}_j} i.$$

The structure of a distributed CRC code is shown in Fig. 5. Assume that a decision error is detected by the $j$-th CRC bit for the first time during the $t$-th extra decoding attempt with a bit-flip $\mathcal{E}_t = \{i_1, \ldots, i_{\omega_t}\}$ of order $\omega_t$. Then, we conclude that an odd number of decision errors have occurred among the unfrozen bits with the indices in $\mathcal{P}_j$, and that at least one more bit must be flipped to make the $j$-th PCE satisfied. Therefore, it suffices to search for candidate bit-flips over the reduced set given by

$$\left\{\mathcal{E}_t \cup \{i\} : i \in \mathcal{I},\ i_{\omega_t} < i \le r_j\right\} \tag{6}$$

instead of

$$\left\{\mathcal{E}_t \cup \{i\} : i \in \mathcal{I},\ i > i_{\omega_t}\right\}.$$

Conventional flipping-set reduction techniques, including the method in [12], are usually based on the selection of a subset of the information set for a polar code. Note that this subset is predetermined off-line and is chosen to include almost all the error-prone unfrozen bit indices so as to avoid a performance loss. On the other hand, the proposed search-space reduction in (6) is adaptively applied on the basis of the distributed CRC bits so that the set of candidate bit-flips depends on the received signal vector. Since it excludes only the candidate bit-flips that would definitely induce a decoding failure, it incurs no performance loss in the process of updating the bit-flip list.
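The PCR sets and the effect of the search-space reduction can be illustrated on toy data. The sketch follows the description given later in Section V.C, namely that the EA-SCFlip decoder considers the unfrozen indices above the last flipped index and not beyond the error-detected CRC bit index; the information set, the 3 x 5 staircase matrix and the function names are all made up for illustration.

```python
def pcr_set(H_row, info_indices):
    """P_j: information-set indices i_k whose entry h_{j,k} is one."""
    return [i for i, h in zip(info_indices, H_row) if h]


def full_candidates(failed_flip, info_indices):
    """D-SCFlip search space: unfrozen indices above the last flipped index."""
    last = max(failed_flip)
    return [i for i in info_indices if i > last]


def reduced_candidates(failed_flip, info_indices, r_j):
    """EA-SCFlip search space: additionally bounded by the CRC bit index r_j."""
    last = max(failed_flip)
    return [i for i in info_indices if last < i <= r_j]


# Toy setting with K + M = 5 unfrozen positions inside a length-16 polar code.
info_indices = [8, 12, 14, 15, 16]
H_toy = [
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
P = [pcr_set(row, info_indices) for row in H_toy]
r = [max(p) for p in P]

# Suppose the attempt with bit-flip {8} fails first at CRC bit j = 1.
wide = full_candidates((8,), info_indices)
narrow = reduced_candidates((8,), info_indices, r[0])
```

In this toy case the search shrinks from four candidate indices to two; the actual saving depends on where the first failing CRC bit sits in the information block.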

B. PROPOSED DECODING ALGORITHM
A high-level description of the proposed EA-SCFlip decoder is shown in Algorithm 2. Based on the structure of a distributed CRC code, the main operations of the EA-SCFlip decoder are described as follows:
• Standard SC decoding
• SC flip decoding with early termination (Algorithm 3)
• Update of the bit-flip list over a reduced search space (Algorithm 4).

Algorithm 2 A High-Level Description of the EA-SCFlip
…
6:   return $\{\hat{u}_i\}_{i \in \mathcal{I}}$;
7: else
8:   Initial($\mathcal{L}_{flip}$, $\mathcal{M}_{flip}$, $\{L_i\}_{i \in \mathcal{I}}$);
9:   for $t = 1, \ldots, T$ do
10:    …
11:    if $s_1^M = 0^M$ then
12:      return $\{\hat{u}_i\}_{i \in \mathcal{I}}$;
13:    else if $|\mathcal{E}_t| < \omega$ then
14:      $l \leftarrow$ MinIndex($s_1^M$);
15:      Update($\mathcal{I}$, $\mathcal{L}_{flip}$, $\mathcal{M}_{flip}$, $\{L_i\}_{i \in \mathcal{I}}$, $\mathcal{E}_t$, $l$);
16:    end if
17:  end for
18: end if

The EA-SCFlip decoder first performs standard SC decoding. When standard SC decoding fails, it builds a list of $T$ candidate bit-flips, and then performs extra SC flip decoding with early termination at most $T$ times until all the PCEs are satisfied. Like the D-SCFlip decoder in [17], it updates the bit-flip list by evaluating candidate bit-flips whenever it fails.
The proposed decoder is similar to the D-SCFlip decoder, except for the CRC test and the utilization of the information on the PCR sets of the error-detected CRC bits. In Algorithm 2, we denote by '⊕' the modulo-2 addition operation. The function Initial(·) in Line 8 computes the initial bit-flip list $\mathcal{L}_{flip}$ and the corresponding metric set $\mathcal{M}_{flip}$ for the following decoding attempts. The function MinIndex(·) in Line 14 outputs the minimum index among the indices of the CRC bits not satisfying their corresponding PCEs.
The function ET_SCFlip(·) as a subroutine of Algorithm 2 is described in Algorithm 3, which runs an SCFlip decoding attempt with early termination. Here, the function $h_{\mathcal{E}}(i, L)$ is given by

$$h_{\mathcal{E}}(i, L) = \begin{cases} u_i, & i \in \mathcal{I}^c \\ \dfrac{1 - \mathrm{sign}(L)}{2}, & i \in \mathcal{I} \setminus \mathcal{E} \\ \dfrac{1 + \mathrm{sign}(L)}{2}, & i \in \mathcal{E}. \end{cases}$$

The current decoding attempt is terminated as soon as the decoder detects the failure of a PCE. Finally, Algorithm 3 returns the estimated unfrozen bits $\{\hat{u}_i\}_{i \in \mathcal{I}}$, the decision LLRs $\{L_i\}_{i \in \mathcal{I}}$ and the syndromes $s_1^M$ as its output.

Algorithm 3
…
3:   For given $y_1^N$ and $\hat{u}_1^{i-1}$, calculate $L_i$ in a recursive manner.
4:   $\hat{u}_i \leftarrow h_{\mathcal{E}}(i, L_i)$;
5:   if $i = r_j$ then
6:     $s_j \leftarrow \bigoplus_{k \in \mathcal{P}_j} \hat{u}_k$
…
12:  end if
13: end for
14: …

The function Update(·) as a subroutine of Algorithm 2 is presented in Algorithm 4, which updates the bit-flip list and the corresponding metric set by expanding the just failed bit-flip. The candidate bit-flips in (6), together with the unchecked bit-flips in the bit-flip list, are compared and only a part of them comprises the newly updated list. Since the proposed EA-SCFlip decoder has a smaller search space for candidate bit-flips than the D-SCFlip decoder, it has lower computational complexity in the update of a bit-flip list. A more detailed comparison will be made in Section V.
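The early-termination check performed during a decoding attempt can be sketched in isolation. To stay short, the sketch takes the hard decisions as given instead of running an SC decoder, and it assumes a staircase-form PCM so that each CRC bit is the last member of its PCR set; the function name, the toy matrix and the toy decisions are hypothetical.

```python
def check_with_early_termination(decisions, info_indices, H):
    """Scan unfrozen bits in decoding order and stop at the first failed PCE.

    decisions maps a bit index to its hard decision. Returns (syndromes, count),
    where syndromes[j] is None if the j-th PCE was never evaluated and count is
    the number of unfrozen bits processed before stopping.
    """
    pcr = [[i for i, h in zip(info_indices, row) if h] for row in H]
    crc_pos = {max(p): j for j, p in enumerate(pcr)}  # r_j -> j
    syndromes = [None] * len(H)
    for count, i in enumerate(info_indices, start=1):
        # A real decoder would compute the LLR and the decision for bit i here.
        if i in crc_pos:
            j = crc_pos[i]
            s = 0
            for k in pcr[j]:
                s ^= decisions[k]  # modulo-2 sum over the PCR set
            syndromes[j] = s
            if s:
                return syndromes, count  # early termination
    return syndromes, len(info_indices)


info_indices = [8, 12, 14, 15, 16]
H_toy = [
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
failed = check_with_early_termination(
    {8: 1, 12: 0, 14: 0, 15: 1, 16: 1}, info_indices, H_toy)
passed = check_with_early_termination(
    {8: 1, 12: 1, 14: 0, 15: 1, 16: 1}, info_indices, H_toy)
```

In the failing case the scan stops after three of the five unfrozen bits, which is the saving that the early termination gain of Section V quantifies.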

V. NUMERICAL RESULTS
The early termination gain and the computational complexity of the proposed coding scheme over the binary-input additive white Gaussian noise (BI-AWGN) channel are numerically discussed in this section. For this purpose, we consider polar codes concatenated with a distributed CRC code with the generator polynomial $g(x) = x^{10} + x^9 + x^5 + x^4 + x + 1$. Here, two types of polar codes are considered: one is Arikan's polar code (APC) and the other is a convolutional polar code (CPC), regarded as a natural extension of APCs [24]-[27]. The information set for the APC is designed according to density evolution [28] at a specific SNR, while that for the CPC is chosen by using the Monte Carlo method at the same SNR. For example, the design SNR is set to 2.5 dB for the codes of rate 1/2 and length 512.

A. EVALUATION OF THE EA-SCFlip DECODER
Given a bit-flip at a decoding attempt, the proposed EA-SCFlip decoder and the D-SCFlip decoder for a distributed CRC-polar code output the same result (a success or a failure). When the output does not satisfy the CRC constraints, these two decoders update the bit-flip list by taking more likely bit-flips in the search space and inserting them into the bit-flip list. If newly inserted bit-flips for the D-SCFlip decoder are not included in the reduced search space given by (6), the corresponding decoding attempts are definitely unnecessary since they would fail eventually. Hence, when the maximum number T of extra decoding attempts is limited, the EA-SCFlip decoder performs better than the D-SCFlip decoder, although the frame error rate (FER) performance gain is negligible.
In Fig. 6, the FER performance of the proposed EA-SCFlip decoder for both the APC and the CPC of rate 1/2 and length 512 is shown with respect to T . Here, the bit-flip order ω is chosen as 1, 2 or 3, and the perturbation parameter α is set to 0.4 for the APC and 0.3 for the CPC. The CPC has better performance than the APC, when we fix ω and T . Furthermore, for each ω, their FER performances are saturated as T increases. Based on this observation, we choose T = 8 for ω = 1, T = 64 for ω = 2, and T = 256 for ω = 3 for these codes.
The average number of decoding attempts for the EA-SCFlip decoder, denoted by T avg , is shown in Fig. 7. Note that T avg ≥ 1, since the initial decoding attempt is counted. The CPC has smaller T avg than the APC in a low SNR region,  say, for SNR ≤ 2.0 dB. As the SNR increases, T avg for both codes converges to one, which corresponds to the initial decoding attempt by the standard SC decoder. In particular, T avg increases when the SNR decreases or T gets large. In this case, we are faced with the computational complexity and latency problem.
The relative frequency of the first error detection at a CRC bit is given in Fig. 8, when the SNR is set to 1.5 dB. Here, $10^3$ frame errors are detected in total by the Monte Carlo simulation. We observe that most of them can be detected by the first few CRC bits, regardless of the choice of $\omega$ and $T$. This suggests employing early error detection, whose effect will be discussed in the following subsection.

B. EARLY TERMINATION GAIN
The proposed EA-SCFlip decoder terminates each extra decoding attempt whenever a decision error is detected by any one distributed CRC bit, while conventional SCFlip decoders complete every decoding attempt until all the unfrozen bits are estimated. To evaluate the effect of early termination, we define the early termination gain $\eta = N_u / N_t$ as a complexity measure. Here, $N_t$ is the number of estimated unfrozen bits at one extra decoding attempt of a conventional SCFlip decoder and $N_u$ is the average number of unfrozen bits which are not estimated due to early termination. Note that $N_t = K + M$ and the average is taken over all extra decoding attempts.
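The definition of the early termination gain translates directly into a few lines of code. The function name and the sample counts below are made up; each count is the number of unfrozen bits that were actually estimated in one extra decoding attempt.

```python
def early_termination_gain(estimated_counts, n_total):
    """eta = N_u / N_t, with N_u the mean number of skipped unfrozen bits."""
    skipped = [n_total - c for c in estimated_counts]
    n_u = sum(skipped) / len(skipped)
    return n_u / n_total


# Four hypothetical extra attempts for a code with N_t = K + M = 40.
eta = early_termination_gain([10, 40, 25, 17], n_total=40)
```

Here the attempts skip 30, 0, 15 and 23 unfrozen bits, so N_u = 17 and the gain is 17/40 = 0.425.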
The early termination gain of the proposed EA-SCFlip decoder for the APC is shown in Fig. 9 (a), where distributed CRC codes are constructed by Algorithm 1 and the methods in [10], [11], respectively. Numerical results demonstrate that the proposed EA-SCFlip decoder terminates unnecessary decoding attempts very early, especially at a low SNR region. As expected in Subsection III.C, it achieves a larger gain when combined with the distributed CRC code designed by  Algorithm 1. The early termination gain for the CPC has a similar tendency, as shown in Fig. 9 (b). The gain is almost 0.4 at a low SNR region, when ω = 3 and the distributed CRC code designed by Algorithm 1 is employed.
In order to further evaluate the early termination gain of the proposed EA-SCFlip decoder for other APCs and CPCs, we consider rate-1/3 APCs and CPCs of length 256, 512, and 1024. They are concatenated with the distributed CRC codes with 10 parity bits, constructed by Algorithm 1. The EA-SCFlip decoder has ω = 3 as the bit-flip order, while it has α = 0.4 for the APCs and α = 0.3 for the CPCs as the perturbation parameter. The maximum number of extra decoding attempts is set to 64, 128, and 256 for the code lengths 256, 512, and 1024, respectively. The FER performance and the early termination gain of the EA-SCFlip decoder for these codes are shown in Fig. 10 and Fig. 11, respectively. The EA-SCFlip decoder achieves a large early termination gain at a low-to-middle SNR region, similarly to that for half-rate codes. It is also noteworthy that a shorter  distributed CRC-polar code has a larger gain. This is because a distributed CRC code with a shorter information length has a higher early-error-detection capability, as suggested in Fig. 4.

C. COMPUTATIONAL COMPLEXITY
Like the D-SCFlip decoder, the proposed EA-SCFlip decoder requires a number of SCFlip decoding attempts and dynamically builds a bit-flip list. The average number of decoding attempts for the EA-SCFlip decoder is almost the same as that for the D-SCFlip decoder. For each extra decoding attempt, the EA-SCFlip decoder saves approximately a fraction $\eta$ of the computations required by the D-SCFlip decoder by employing early termination. This is because the D-SCFlip decoder estimates all the unfrozen bits at every decoding attempt.
Whenever the current decoding attempt fails and the corresponding bit-flip order is less than the maximum bit-flip order $\omega$, the D-SCFlip decoder updates the bit-flip list by searching over all the unfrozen bits with indices higher than the maximum index belonging to the just failed bit-flip. On the other hand, the proposed EA-SCFlip decoder investigates all the unfrozen bits whose indices are higher than the maximum index in the currently failed bit-flip and are less than or equal to the index of the just error-detected CRC bit.
The average number of unfrozen bits to be considered in the update of the bit-flip list for these two decoders is shown in Fig. 12. Numerical results demonstrate that the proposed EA-SCFlip decoder considers about 40 % fewer unfrozen bits than the D-SCFlip decoder. This leads to an approximately 40 % reduction in the complexity of updating the bit-flip list over a wide range of SNRs, compared with the D-SCFlip decoder, as shown in Fig. 13.
A similar phenomenon is observed for other APCs and CPCs. As an example, Fig. 14 shows the average number of unfrozen bits to be considered in the update of the bit-flip list for the rate-1/3 APCs and CPCs of length 256, 512, and 1024, considered in Subsection V.B. It is shown in Fig. 15 that the EA-SCFlip decoder has about 40 % reduction in the complexity of updating the bit-flip list for the codes of length 1024 over a wide range of SNRs, compared with the D-SCFlip decoder, while it has about 50 % reduction for the codes of length 256. In summary, the EA-SCFlip decoder is very effective in terms of complexity as well as latency, especially for distributed CRC-polar codes of short length.

VI. CONCLUSION
The EA-SCFlip decoder for a distributed CRC-polar code was proposed to mitigate the decoding complexity and latency problem in conventional SCFlip decoders. A greedy algorithm to design a distributed CRC code was also proposed to further maximize the effect of early error detection in the EA-SCFlip decoder. Numerical results show that the early termination gain of the EA-SCFlip decoder over the D-SCFlip decoder is significant when an APC or a CPC is concatenated with the distributed CRC code designed by the proposed algorithm. As future work, it would be interesting to develop a design method that systematically constructs a single distributed CRC code supporting various parameters.