BER Evaluation Based SCFlip Algorithm for Polar Codes Decoding

Successive cancellation (SC) decoding of polar codes may suffer from error propagation, which needs to be mitigated. In this paper, we present a new SC Flipping (SCFlip) decoder, named bit error rate (BER) evaluation based SCFlip (BER-SCFlip), which can accurately target the first error bit and correct it with a high probability. Thus, a high error-correction capability and a low decoding complexity can be achieved. First, we propose a new criterion to find the most suspicious error bits. Those non-frozen bits whose decoding BERs, derived from log-likelihood ratios (LLRs) after SC decoding, are higher than the corresponding expected BERs estimated via Gaussian Approximation (GA) are collected into the flip-bits set. These candidate bits are flipped one by one according to their SC decoding order in extra decoding attempts until the decoded codeword passes the cyclic redundancy check (CRC) or a predetermined maximum number of extra attempts is reached. We then propose an extended version of BER-SCFlip, named BER-SCFlip-ω, with the capability to correct up to ω error bits in each extra decoding attempt. By combining our criterion for the flip-bits selection with that of Dynamic SCFlip (D-SCFlip), the proposed BER-SCFlip-ω significantly reduces decoding complexity and latency while maintaining error-correction performance close to that of D-SCFlip-ω.
The simulation results show that the proposed schemes are competitive among existing SCFlip algorithms and can achieve error-correction performance approaching that of CRC-aided SCL decoding with list size L = 16 while maintaining low complexity.


I. INTRODUCTION
Polar codes are the first family of channel codes theoretically proved to achieve the capacity of binary-input discrete memoryless channels (B-DMCs) under low-complexity successive cancellation (SC) decoding as the code length goes to infinity [1]. However, for polar codes of short and moderate lengths, the error-correction performance under SC decoding is not satisfactory. Two factors may contribute to the weakness of finite-length polar codes: partial polarization and the sub-optimality of SC decoding. Accordingly, two categories of approaches have been proposed in the literature. One employs modified kernels in the code construction to speed up the polarization rate [2]-[5]. The other focuses on enhancing the performance of SC decoding, e.g., the SC list (SCL) decoder proposed in [6], [7], which significantly improves the error-correction performance of polar codes at finite lengths. To further enhance the performance of the SCL decoder, the authors in [6], [8] proposed a CRC-polar coding scheme, where outer CRC codes are concatenated with inner polar codes to help identify the correct path from the list of decoding paths. CRC-polar codes under SCL decoding not only compete with LDPC and Turbo codes in terms of error-correction performance, but also have a lower encoding and decoding complexity thanks to their recursive operations [8]. Thus, CRC-polar codes have been chosen by 3GPP as the error-correction codes in the control channel for the 5th generation of mobile communication standards (5G) [9]. Since the invention of CRC-polar codes and their SCL decoding, much research has been conducted to further improve the error-correction performance [10], [11], or to reduce the decoding complexity, latency and storage cost [12]-[16].
The associate editor coordinating the review of this manuscript and approving it for publication was Mingjun Dai.
In addition to the SCL decoder, an SC Flipping (SCFlip) decoder was proposed as an alternative to improve the performance of SC decoding [17]. By trying to identify the first error bit based on the magnitudes of the log-likelihood-ratio (LLR) values in SC decoding and flipping it in extra decoding attempts, the SCFlip decoder achieves a better error-correction performance than the SC decoder at the cost of a minor increase in decoding complexity. However, it cannot compete with SCL decoding under large list sizes in the low SNR regime, because it cannot accurately target the error bits based solely on the LLRs.
To boost the performance of conventional SCFlip decoders, many approaches have been reported in the literature, either to reduce the computational complexity or to enhance the error-correction performance. To reduce the complexity, a fast SCFlip decoder was proposed in [18] with a narrowed critical set for flipping (flip-bits set) such that the efficiency of the flipping and re-decoding procedures was improved and its hardware implementation was simplified. The authors in [19] further reduced the search zone in constructing the critical set by exploiting the code tree structure. Moreover, by adjusting the critical set progressively during the decoding, the proposed Progressive SCFlip could rival SCL decoding with a lower average complexity. It was observed from simulations in [20] that the first error is distributed non-uniformly, and an adaptive threshold was employed to screen possible flipping candidates. Thus, a good trade-off between performance and complexity was achieved. The same authors proposed a partitioned SCFlip decoder to detect and correct more errors in SC decoding [21]. By dividing a codeword into many subcodes and protecting them with outer CRC codes, the partitioned SCFlip decoder achieves a better performance with a low average complexity. The authors in [22] combined bit flipping with the high-performance SCL decoding such that a performance gain was achieved at the cost of increased complexity. Another SCL-based flipping (SCLFlip) decoding algorithm was proposed in [23]. With a superior bit-flip criterion, the SCLFlip decoder in [23] not only outperforms existing SCLFlip decoders but also achieves a better performance than the SCL decoder while maintaining a similar complexity. In [24] and [25], Chandesris et al. presented a Dynamic SCFlip (D-SCFlip) decoder in which a novel metric was designed to select the possibly erroneous non-frozen bits (flip-bits) by taking into account the sequential nature of the SC decoder.
By dynamically updating the flip-bits set and flipping one or multiple error bits per decoding attempt, D-SCFlip could achieve a good error-correcting performance close to that of CRC-aided SCL decoder with the list size L = 16.
The key to successful SCFlip decoding is to accurately pinpoint the error bits of SC decoding for the extra decoding attempts. The existing criteria for choosing the flip-bits are mainly based on a reliability comparison among different non-frozen bits, e.g., using LLRs or other metrics. Considering that the reliability of polar codes varies from bit to bit due to polarization, it is more reasonable to take into account the inherent reliability difference among non-frozen bits when selecting the flip-bits.
Based on this analysis, we propose a new criterion for finding out possible erroneous bits in SC decoding. For the AWGN channel, we first calculate the bit error rate (BER) expectations for all non-frozen bits via Gaussian Approximation (GA) algorithm and then compute the estimated BERs from LLRs after SC decoding. Those non-frozen bits, whose estimated BERs after SC decoding are higher than their corresponding BER expectations, are chosen as suspicious bits and their subchannel indexes are collected as the flip-bits set. The earlier an error occurs, the more serious harm it will impose on the following decoding. Thus, we sort the indexes according to their orderings in the standard SC decoding. Based on this criterion, we propose a BER evaluation based SCFlip (BER-SCFlip) decoder that can achieve a good error-correction performance. Compared to the original SCFlip using absolute LLR values for the flip-bits selection, the scope of the flip-bits can be narrowed in our proposed BER-SCFlip such that the decoding complexity is reduced. To enhance the performance of BER-SCFlip, an extended version, named BER-SCFlip-ω, is proposed with a capability to correct up to ω errors for each extra decoding attempt. In BER-SCFlip-ω, the proposed BER evaluation criterion is combined with that of D-SCFlip in [25]. This combined strategy can significantly narrow the scope of candidate bits such that the complexity can be efficiently reduced.
To evaluate the performance of our proposed schemes, we derive lower and upper bounds of the word error rate (WER) performance. The lower bound of the WER performance is calculated using Oracle-assisted SC (OASC) decoder proposed in [17], which assumed the error bits are known by the receiver and thus enabled the accurate bit flipping. Under the CRC error-detection, the upper bound of the WER performance is also derived. Additionally, the influence of the CRC length and the maximum number of new decoding attempts on the WER performance is analyzed to optimize the design of our proposed scheme.
The rest of the paper is organized as follows. Section II introduces the notation and the basic principle of polar codes. Their SC and SCFlip decodings are also described in this section. The proposed BER-SCFlip decoder is presented in Section III. In Section IV, BER-SCFlip-ω decoder is proposed and its upper and lower bounds of the WER performance are analyzed in Section V. Simulation results are provided in Section VI and finally, conclusions are drawn in Section VII.

II. PRELIMINARY
A. POLAR CODES
Polar codes are derived from channel polarization, which is realized by performing channel combining and splitting operations on $N = 2^n$ independent and identical B-DMCs [1]. The $k$ polarized subchannels $\mathcal{A} \subset \{1, 2, \cdots, N\}$ with higher reliability are assigned to transmit information bits. The remaining subchannels $\mathcal{A}^c$ are frozen to transmit a fixed value, e.g., 0, known to both the encoder and the decoder. The reliability of subchannels can be estimated by several techniques, such as the Bhattacharyya parameter [1], density evolution [26], GA of density evolution [27], [28], and efficient degrading and upgrading methods [29].
The encoded codeword, denoted by $X$, is obtained by $X = U G_N$, where $G_N$ is the generator matrix and $U$ is the data vector. The data vector $U$ consists of $k$ information bits on subchannels $\mathcal{A}$ and $N - k$ frozen bits on subchannels $\mathcal{A}^c$. The codeword $X$ is then modulated and sent through the channel. The vector $Y$ is received at the decoder and used as the input to estimate the transmitted data $\hat{u}$.
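The encoding step $X = U G_N$ can be illustrated with a minimal sketch. It assumes $G_N = F^{\otimes n}$ with $F = [[1, 0], [1, 1]]$ in natural order (no bit-reversal permutation), which the text does not specify, and it uses 0-based subchannel indexes. The function names `polar_encode` and `build_data_vector` are hypothetical.

```python
def polar_encode(u):
    """Return x = u * F^(kron n) over GF(2), F = [[1, 0], [1, 1]].

    Assumption of this sketch: natural bit order, no bit-reversal.
    """
    x = [b & 1 for b in u]
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                # Butterfly: the upper branch absorbs the lower branch (XOR).
                x[j] ^= x[j + step]
        step *= 2
    return x


def build_data_vector(info_bits, info_set, n):
    """Place information bits on the subchannels in info_set; freeze the rest to 0."""
    u = [0] * n
    for pos, bit in zip(sorted(info_set), info_bits):
        u[pos] = bit
    return u
```

Because $F^{\otimes n}$ is its own inverse over GF(2), applying `polar_encode` twice returns the original data vector, which gives a quick sanity check.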

B. SC AND SCFLIP DECODING
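In SC decoding, a pair of probabilities $P(u_i = 0 \mid \hat{u}_1^{i-1}, Y)$ and $P(u_i = 1 \mid \hat{u}_1^{i-1}, Y)$ is computed for each bit, where $\hat{u}_1^{i-1}$ denotes the sequence $(\hat{u}_1, \hat{u}_2, \hat{u}_3, \ldots, \hat{u}_{i-1})$. For each bit $u_i$, its estimate $\hat{u}_i$ depends on both $Y$ and the previous estimates $\hat{u}_1^{i-1}$ and can be obtained by hard decisions as
$$\hat{u}_i = \begin{cases} u_i, & i \in \mathcal{A}^c \\ 0, & i \in \mathcal{A} \text{ and } L_N^i \ge 0 \\ 1, & i \in \mathcal{A} \text{ and } L_N^i < 0 \end{cases}$$
where $L_N^i$ is the LLR of decoded symbol $i$ computed as
$$L_N^i = \ln \frac{P(u_i = 0 \mid \hat{u}_1^{i-1}, Y)}{P(u_i = 1 \mid \hat{u}_1^{i-1}, Y)}.$$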
To deal with the error propagation problem, the SCFlip decoder was proposed in [17] to find the suspicious channel-incurred error bits and flip them in extra decoding attempts if the standard SC decoding fails. Each new decoding attempt performs the same SC decoding except that one of the candidate bits, i.e., those with the lowest absolute LLR values, is flipped. The computational complexity of SCFlip increases only slightly compared with the SC decoder in the medium and high SNR regimes, and its error-correction performance can compete with CRC-aided SCL with L = 2. Following the SCFlip decoder proposed in [17], many studies have been reported in the literature on improving the error-correction performance or reducing the computational complexity [18]-[25].
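As a rough illustration of SC decoding with a bit flip, the following sketch decodes recursively and inverts the hard decision of any non-frozen index listed in `flip`. Several choices here are assumptions and are not taken from the paper: the min-sum approximation for the check-node update, natural bit order (matching $G_N = F^{\otimes n}$ without bit reversal), 0-based indexes, and the hypothetical name `sc_decode`.

```python
import math


def _f(a, b):
    """Check-node LLR update (min-sum approximation, an assumption of this sketch)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def sc_decode(llr, frozen, flip=frozenset(), offset=0):
    """Recursive SC decoding.

    llr    : channel LLRs of the current sub-block
    frozen : list of booleans for the whole code (True = frozen to 0)
    flip   : indexes of non-frozen bits whose hard decision is inverted
    Returns (u_hat, x_hat, bit_llrs) for the current sub-block.
    """
    n = len(llr)
    if n == 1:
        if frozen[offset]:
            bit = 0
        else:
            bit = 0 if llr[0] >= 0 else 1   # hard decision on the LLR sign
            if offset in flip:
                bit ^= 1                    # SCFlip: invert this decision
        return [bit], [bit], [llr[0]]
    half = n // 2
    a, b = llr[:half], llr[half:]
    u1, x1, l1 = sc_decode([_f(p, q) for p, q in zip(a, b)], frozen, flip, offset)
    # Variable-node update uses the re-encoded partial sums of the first half.
    g = [q + (1 - 2 * s) * p for p, q, s in zip(a, b, x1)]
    u2, x2, l2 = sc_decode(g, frozen, flip, offset + half)
    x = [s ^ t for s, t in zip(x1, x2)] + x2
    return u1 + u2, x, l1 + l2
```

The returned `bit_llrs` are the decision LLRs $L_N^i$ of the attempt, which is the quantity a flip-bit selection criterion would inspect after a failed CRC check.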

III. BER-SCFLIP DECODER
It is essential for SCFlip decoder to find out the first error bit of SC decoder and correct it in the following decoding attempt. In this section, we present our proposed SCFlip decoder, BER-SCFlip decoder, with our designed method for the flip-bits selection.

A. PROPOSED CRITERION FOR A FLIP-BITS SELECTION
The flip-bits set denotes the set of unreliable bits that will be flipped in the extra decoding attempts. Instead of selecting flipping candidates based on a metric comparison among different non-frozen bits as in existing approaches, we perform this selection by comparing the reliability of non-frozen bits in SC decoding with their own reliability expectations. The indexes of those non-frozen bits whose estimated BERs derived from LLRs in SC decoding are greater than their BER expectations will be placed into the flip-bits set. The indexes collected in the set are then sorted by their SC decoding orderings, for the error occurring at an earlier stage of decoding has a more detrimental impact on SC decoding.
GA is a low-complexity algorithm and has found applications in coding and information theory [27], [28]. We employ GA to calculate the BER expectations for non-frozen bits here. Assume that an all-zero codeword is transmitted using BPSK modulation over an AWGN channel. The channel SNR can be estimated by SNR estimation schemes at the receiver [30], [31]. We also assume that the Gaussian noise in the AWGN channel has zero mean and variance $\sigma^2$, which can be derived from the estimated channel SNR. If the average coded symbol energy is normalized to unity, the noise variance $\sigma^2$ can be obtained from the channel SNR $\gamma$ (in dB) as $\sigma^2 = \frac{1}{2R \cdot 10^{\gamma/10}}$, where $R$ is the code rate. After the encoded codeword $x$ is transmitted through the AWGN channel with zero-mean noise of variance $\sigma^2$, the corresponding LLR value of received $y$ conditioned on transmitted $x$ is
$$L(y) = \ln \frac{P(y \mid x = 0)}{P(y \mid x = 1)} = \frac{2y}{\sigma^2}.$$
For the AWGN channel, the received LLR message $L_N^i$ can be approximated as a Gaussian variable with mean $\mu_y = \frac{2}{\sigma^2}$ and variance $\delta_y^2 = \frac{4}{\sigma^2} = 2\mu_y$. According to [27], [28], the expectation of LLR, $E[L_N^i]$, can be updated as
$$E[L_N^{2i-1}] = \phi^{-1}\left(1 - \left[1 - \phi\left(E[L_{N/2}^{i}]\right)\right]^2\right), \qquad E[L_N^{2i}] = 2E[L_{N/2}^{i}],$$
where
$$\phi(x) = \begin{cases} 1 - \frac{1}{\sqrt{4\pi x}} \int_{-\infty}^{\infty} \tanh\frac{u}{2} \exp\left(-\frac{(u - x)^2}{4x}\right) du, & x > 0 \\ 1, & x = 0. \end{cases}$$
Hence, we can obtain the BER expectation via GA as [27], [28]
$$P_E(i) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{1}{2}\sqrt{E[L_N^i]}\right),$$
where erfc is the complementary error function defined as
$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-\eta^2} d\eta.$$
Meanwhile, the BER of non-frozen bit $u_i$ in SC decoding can be derived from the decoded LLR $L_N^i$ as [12]
$$P_{SC}(i) = \frac{1}{1 + \exp\left(|L_N^i|\right)}.$$
Note that $P_E(i)$ and $P_{SC}(i)$ are quite different. $P_E(i)$ is the metric for measuring the reliability of non-frozen bit $u_i$ under the AWGN channel, while $P_{SC}(i)$ depicts the reliability of non-frozen bit $u_i$ under SC decoding. If $P_{SC}(i) > P_E(i)$, we deduce that an error may occur in the SC decoding of $u_i$ with a high probability. Our later simulation results also validate this hypothesis. According to this analysis, we construct the flip-bits set as $\varsigma = \{i \in \mathcal{A} \mid P_{SC}(i) > P_E(i)\}$. For example, Fig. 1 shows the distributions of a flip-bits set $\varsigma$ for polar code (1024, 512) concatenated with a 16-bit outer CRC under SC decoding.
The blue line denotes $P_E$ of all non-frozen bits and the red dots denote $P_{SC}$ of the bits whose indexes are selected in the flip-bits set. It is observed that the first error bit, marked with a red star, falls into the flip-bits set and that the size of the set decreases when the SNR increases.
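The selection rule can be sketched end to end as follows. This is only an illustration under stated assumptions: the GA function $\phi$ is replaced by the widely used two-piece closed-form approximation and inverted by bisection (the paper does not say how $\phi$ is evaluated), $P_E(i) = \frac{1}{2}\mathrm{erfc}(\frac{1}{2}\sqrt{E[L_N^i]})$ and $P_{SC}(i) = 1/(1 + e^{|L_N^i|})$ are taken as the BER expressions, indexes are 0-based, and all function names are hypothetical.

```python
import math


def phi(x):
    """Two-piece approximation of the GA function phi (assumption of this sketch).

    The two pieces do not meet exactly at x = 10; this is tolerated here.
    """
    if x <= 0.0:
        return 1.0
    if x < 10.0:
        return min(1.0, math.exp(-0.4527 * x ** 0.86 + 0.0218))
    return math.sqrt(math.pi / x) * math.exp(-x / 4.0) * (1.0 - 10.0 / (7.0 * x))


def phi_inv(y, hi=1e4):
    """Invert phi by bisection, using that phi decreases with x."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ga_ber_expectation(n_log2, snr_db, rate):
    """BER expectations P_E for all N = 2**n_log2 subchannels."""
    sigma2 = 1.0 / (2.0 * rate * 10.0 ** (snr_db / 10.0))
    means = [2.0 / sigma2]                   # mean of the channel LLR
    for _ in range(n_log2):
        nxt = []
        for m in means:
            p = phi(m)
            nxt.append(phi_inv(p * (2.0 - p)))   # 1 - (1 - phi)^2, check-node branch
            nxt.append(2.0 * m)                  # variable-node branch
        means = nxt
    return [0.5 * math.erfc(0.5 * math.sqrt(m)) for m in means]


def p_sc(llr):
    """Estimated BER of a hard decision with decision LLR llr (overflow-safe form)."""
    e = math.exp(-abs(llr))
    return e / (1.0 + e)


def flip_bits_set(bit_llrs, p_e, info_set):
    """Indexes of non-frozen bits with P_SC > P_E, in SC decoding order."""
    return [i for i in sorted(info_set) if p_sc(bit_llrs[i]) > p_e[i]]
```

For instance, with $R = 1/2$ and $\gamma = 1$ dB the noise variance is $\sigma^2 = 1/10^{0.1} \approx 0.794$, so the channel LLR mean is $2/\sigma^2 \approx 2.52$. Because `flip_bits_set` iterates over sorted indexes, no separate sorting pass is needed.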
To further validate the efficiency of the proposed criterion for the flip-bits selection, the frequency of error occurrences for non-frozen bits obtained via Monte-Carlo simulations is compared with their probabilities of being selected in the flip-bits set. For our proposed scheme, the probability of a bit being selected in our flip-bits set is $P(i \in \varsigma)$, i.e., the probability that $P_{SC}(i) > P_E(i)$. The simulation results for the AWGN channel are depicted in Fig. 2, where (1024, 512+16) and (2048, 1024+16) CRC-polar codes at SNR $\gamma = 1$ dB are decoded by SC decoding. The number of samples is $10^5$. It is observed in Fig. 2 that the probability $P(i \in \varsigma)$ follows a trend similar to the frequency of channel-induced errors, i.e., the index of a bit with a higher frequency of error occurrences has a higher probability of being chosen into the flip-bits set. Similar results can be observed for simulations under other SNR values. Note that on the horizontal axis, non-frozen bits are re-sorted according to their frequency of channel-induced errors in descending order, and some of the bits without any error occurrence are omitted for lack of space.
The proposed BER-SCFlip decoder performs the same decoding procedure as the conventional SCFlip presented in [17] except that a different criterion is employed for selection and sorting of the flip-bits. The BER-SCFlip decoder first performs SC decoding. If the decoded codeword passes the CRC check, output the decoded result and continue for the next received codeword. Otherwise, BER-SCFlip selects the flip-bits and sorts them by their SC decoding orderings. New decoding attempts proceed and each decoding attempt is performed with one bit flipped starting from the first element in the flip-bits set until the CRC check is satisfied or the maximum number of attempts is reached.
The detailed description of the algorithm is given in Algorithm 1. The function GA_est first provides the BER expectations $P_E$ via the GA algorithm according to the channel SNR $\gamma$ (in dB), and the function List_Con then generates the flip-bits set according to the comparison between $P_{SC}$ and $P_E$. The elements in the set are sorted by the SC decoding order. The function SCFdecoding denotes SC decoding with the bit $\varsigma(i)$ flipped. To constrain the scope of the flip-bits and avoid unnecessary iterations, we set the maximum number of new decoding attempts to $T$, which is investigated in a later section.
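The control flow described for Algorithm 1 can be sketched as below. The interface is hypothetical and chosen only to keep the example self-contained: `sc_decode(channel_llr, flip)` is assumed to return the decoded bits and the per-bit decision LLRs, `crc_ok(u_hat)` is assumed to return whether the CRC passes, and `p_e` is assumed to hold the precomputed GA BER expectations. None of these names come from the paper.

```python
import math


def ber_scflip(channel_llr, sc_decode, crc_ok, p_e, info_set, max_attempts):
    """BER-SCFlip control loop (sketch).

    Returns (decoded_bits, number_of_extra_attempts_used).
    """
    # Step 1: plain SC decoding.
    u_hat, bit_llrs = sc_decode(channel_llr, frozenset())
    if crc_ok(u_hat):
        return u_hat, 0

    # Step 2: build the flip-bits set from the BER comparison.
    def p_sc(llr):
        e = math.exp(-abs(llr))
        return e / (1.0 + e)          # equals 1 / (1 + exp(|llr|))

    # Iterating over sorted indexes keeps the set in SC decoding order.
    flip_set = [i for i in sorted(info_set) if p_sc(bit_llrs[i]) > p_e[i]]

    # Step 3: one extra attempt per candidate, at most max_attempts.
    attempts = 0
    for idx in flip_set[:max_attempts]:
        attempts += 1
        candidate, _ = sc_decode(channel_llr, frozenset([idx]))
        if crc_ok(candidate):
            return candidate, attempts

    # All attempts failed: fall back to the initial SC estimate.
    return u_hat, attempts
```

Passing the decoder and the CRC check as callables is a design choice of this sketch; it lets the loop be exercised with simple stand-ins before a real SC decoder is plugged in.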

IV. BER-SCFLIP-ω DECODING

A. ALGORITHM DESCRIPTION
The BER-SCFlip decoder presented in the previous section has a limited error-correction capability, because only one suspicious bit is flipped in each decoding attempt. To remove this limitation, we extend the BER-SCFlip decoder to a generalized scheme (BER-SCFlip-ω) with the capability to correct up to ω error bits per extra decoding attempt. In BER-SCFlip-ω, our proposed BER comparison criterion is combined with that of D-SCFlip presented in [25] to construct and update the flip-bits set. BER-SCFlip-ω reduces the search scope of the flip-bits and allows more than one bit to be flipped per extra attempt. Thus, it can achieve a performance close to that of D-SCFlip-ω and SCL decoding with reduced decoding latency and complexity.
In BER-SCFlip-ω, the flip-bits set $\xi = \{\varepsilon_1, \varepsilon_2, \cdots\}$ is generalized, as in D-SCFlip-ω [25], so that each element $\varepsilon_k$ includes one or multiple indexes of the flip-bits that will be flipped in an extra attempt, e.g., the $k$-th element $\varepsilon_k = \{i_1, \ldots, i_{\omega_k}\}$ contains $\omega_k$ indexes of the flip-bits. In D-SCFlip-ω [25], a novel metric $M(\varepsilon_k)$, given in (9), was proposed to measure the decoding reliability if the bits in $\varepsilon_k$ were flipped, where $\alpha$ is a coefficient adjusting the weight between decoding order and LLR values. Note that (9) is derived from (15) in [25] by taking the $-\ln(\cdot)$ operation. Different from D-SCFlip-ω, we employ a selection criterion for the flip-bits that combines (9) with our proposed BER evaluation condition. Based on this criterion, we develop the BER-SCFlip-ω algorithm described in Algorithm 2. The description is similar to the D-SCFlip-ω algorithm, except for the function Init() and the function Update() used to construct and update the flip-bits set $\xi$, respectively. The functions in Algorithm 2 are described as follows.
-SCdecoding: A standard SC decoding. If the CRC check is satisfied, output the decoded bits $u_1^N$ and continue decoding the next codeword. Otherwise, extra decoding attempts proceed with flipped bits.
-Init(): Constructs the initial flip-bits set. The metric $M_E(i)$ denotes the estimated reliability of bit $i$ via the GA algorithm. Meanwhile, the metric $M_{SC}(i)$ denotes the decoding reliability after SC decoding if bit $i$ was flipped. Note that (11) is simplified from (…). The candidate set $\xi_0$ is used to generate an initialized set $\xi_1 = \{\varepsilon_1, \varepsilon_2, \ldots\} = \{i_1, i_2, \cdots\}$ containing $\min(|\xi_0|, T)$ elements. Each element in $\xi_1$ contains the index of one bit, i.e., $\varepsilon_k = \{i_k\}$. The corresponding metric values are collected into a set $\psi_1$.
-SC($\varepsilon_t$): The $t$-th extra SC decoding, which performs the same operations as conventional SC decoding except that the decoded bits with indexes in the set $\varepsilon_t$ are flipped. If the CRC check is satisfied, output the decoded codeword $u_1^N$ and continue with the decoding of the next codeword. Otherwise, update the flip-bits set $\xi_t$ to $\xi_{t+1}$ and the corresponding metrics set $\psi_t$ to $\psi_{t+1}$. Then continue with the following decoding attempt SC($\varepsilon_{t+1}$).
-update($\xi_t$, $\psi_t$, $P_E$): The update function, described in Algorithm 4, calculates $M(\varepsilon')$ using the LLR values after the unsuccessful SC($\varepsilon_t$) decoding for flipping elements $\varepsilon' = \varepsilon_t \cup \{i\}$, where $i \in \varsigma(i_{\omega_t})$ and $\varsigma(i_{\omega_t}) = \{i \in \mathcal{A} \mid P_{SC}(i) > P_E(i), i > i_{\omega_t}\}$ (recall that $i_{\omega_t}$ is the last bit in the flipping element $\varepsilon_t$). If $M(\varepsilon') < M(\varepsilon_T)$, where $\varepsilon_T$ is the last element in $\xi_t$, insert $\varepsilon'$ and $M(\varepsilon')$ into the proper positions of $\xi_t$ and $\psi_t$, respectively, according to their metric values. Since $M(\varepsilon_t) < M(\varepsilon')$, $\varepsilon'$ will be inserted into a position behind $t$. Otherwise, $\xi_{t+1} = \xi_t$ and $\psi_{t+1} = \psi_t$. In this way, the updated sets $\xi_{t+1}$ and $\psi_{t+1}$ are obtained.
The main difference between the proposed BER-SCFlip-ω and D-SCFlip-ω lies in the strategies to initialize and update the flip-bits set. The initial construction of the flip-bits set in D-SCFlip-ω needs to sort all information bits by their metrics. In our proposed scheme, we only need to sort those bits satisfying $M_{SC}(i) < M_E(i)$. Thus, the number of bits to be sorted can be dramatically reduced. The update function in D-SCFlip-ω calculates $M(\varepsilon')$ for all flipping elements $\varepsilon' = \varepsilon_t \cup \{i\}$, where $i \in \mathcal{A}$ and $i > i_{\omega_t}$, which is a heavy computational burden in practice. In contrast, the update function in BER-SCFlip-ω calculates $M(\varepsilon')$ for a limited number of flipping elements $\varepsilon' = \varepsilon_t \cup \{i\}$, where $i \in \varsigma(i_{\omega_t}) = \{i \in \mathcal{A} \mid P_{SC}(i) > P_E(i), i > i_{\omega_t}\}$. By introducing the constraint $P_{SC}(i) > P_E(i)$, the number of new elements whose metrics must be computed and sorted at each update is significantly reduced, so that the computational burden is mitigated as well.
Therefore, the proposed BER-SCFlip-ω has a much lower sorting and metric computation complexity compared to D-SCFlip-ω, which will be analyzed in the following section.
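The restricted update step can be sketched as follows. This is an illustration only: the metric of the flipping elements is passed in as a hypothetical callable `metric(element, bit_llrs)` because its exact form is not reproduced here, $P_{SC}(i) = 1/(1 + e^{|L_N^i|})$ is assumed, indexes are 0-based, and filling the list while it holds fewer than `max_size` elements is an assumption added for this sketch.

```python
import bisect
import math


def p_sc(llr):
    """Estimated BER of a hard decision with decision LLR llr."""
    e = math.exp(-abs(llr))
    return e / (1.0 + e)


def update_flip_set(xi, psi, eps_t, bit_llrs, p_e, info_set, metric, max_size):
    """Insert extensions of eps_t into the sorted flip-bits list (sketch).

    xi   : list of flipping elements (tuples of indexes), sorted by metric
    psi  : list of the matching metric values, ascending
    eps_t: the element that was just tried and failed the CRC
    Only indexes after the last bit of eps_t with P_SC > P_E are considered.
    """
    last = eps_t[-1]
    for i in sorted(info_set):
        if i <= last or p_sc(bit_llrs[i]) <= p_e[i]:
            continue                      # outside the restricted candidate set
        candidate = eps_t + (i,)
        m = metric(candidate, bit_llrs)
        # Assumption: also accept candidates while the list is not yet full.
        if len(xi) < max_size or m < psi[-1]:
            pos = bisect.bisect_right(psi, m)
            psi.insert(pos, m)
            xi.insert(pos, candidate)
            if len(xi) > max_size:        # keep at most max_size elements
                xi.pop()
                psi.pop()
    return xi, psi
```

Compared with scanning every non-frozen index after the last flipped bit, the extra `p_sc(...) <= p_e[i]` test skips the metric evaluation and the sorted insertion for bits that already look reliable, which is where the saving described above comes from.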

B. THE OPTIMIZATION OF PARAMETER α
The parameter α in (9) and (11) is used to adjust the weight between the decoding ordering and LLR values in the criterion for selecting the flip-bits. When α = 1, the selection of the flip-bits set mainly depends on LLR values. When α = 0, the selection of the flip-bits set only depends on the position of the bit in the decoding ordering.
In [25], the optimal parameter α opt was first obtained via Monte-Carlo simulations. For simplification, α opt was then calculated by modelling it approximately as a quadratic function of iWER-0.
The optimal parameter $\alpha_{opt}$ in our scheme, determined by Monte-Carlo simulations, is different from that obtained in [25], because the different selection criteria affect the optimal value. In this section, we also obtain $\alpha_{opt}$ by modelling it as a function of some key parameters determining iWER-0, such as code length, rate and SNR.
Assume that the code rate of polar codes remains the same, e.g., 1/2, the influence of code lengths on α opt is first investigated. As Fig. 3(a) shows, for codes with different lengths (N = 512, 1024, 2048), there does not exist a big gap among the optimized α opt . Similar results can be observed from simulations under other rates. Thus, we can neglect the impact of code lengths on the parameter α opt . When the code lengths are fixed, e.g., N = 1024, the influence of different rates on α opt can be visualized in Fig. 3(b).
It is observed that higher-rate codes tend to have a relatively higher optimized parameter $\alpha_{opt}$. Meanwhile, the optimized parameter $\alpha_{opt}$ tends to decrease with increased SNR and converges to a floor value in an exponential manner. Thus, we present an approximate formula for the optimized $\alpha_{opt}(R, \gamma)$ in (13). We obtain many data points depicting the correlation of $\alpha_{opt}$ with these parameters. Equation (13) can be used to facilitate the parameter optimization and the algorithm design.

V. THE PERFORMANCE ANALYSIS
In this section, we evaluate the WER performance of the proposed SCFlip for the decoding of CRC-polar codes by deriving its lower and upper bounds. The lower bound of SCFlip decoders has been obtained in [25]. Let $P_{SC}$ denote the probability of SC decoding failure, and let $t^*$ be the number of additional decoding attempts before outputting a correct codeword. We write $T$ to denote the maximum number of extra decoding attempts. The WER of the proposed SCFlip, $P_{BSCF}$, can be approximately expressed in terms of $P(t^* = t)$, $P_{FA}$ and $P_{UE}$, where $P(t^* = t)$, the probability of correct decoding at the $t$-th additional attempt, can be obtained by Monte-Carlo simulations, while $P_{FA}$ and $P_{UE}$, the probabilities related to the CRC, can be approximately estimated by the method in [32]. Assume that the error probability of the information bit is $p \in [0, 0.5]$; the AWGN channel with $p$ can then be approximately converted into a discrete binary symmetric channel (BSC). The CRC check has a false alarm only when any bit in the $r$ CRC redundancy bits is decoded incorrectly yet all information bits are correctly estimated, which determines $P_{FA}$. According to [32], the probability of an undetected error, $P_{UE}$, is determined by the weight distribution $A_i = |\{c \mid c \in C, w(c) = i\}|$, which denotes the number of codewords $c$ with weight $i$ in the codebook $C$, and by $k$, the information length of the CRC-polar codes. The weight distribution $A_i$ is obtained by enumerating all codewords for short lengths, or by SCL decoding with an extremely large list for relatively long codewords [32]. Using $P_{UE} \le (2^k - 1)\,2^{-k-r} \le 2^{-r}$, we get the upper bound of $P_{BSCF}$ according to [32], [33].
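To give a feel for the size of the undetected-error bound, the following worked step rewrites it and evaluates $2^{-r}$ for the CRC lengths mentioned in this section. The numerical values are plain arithmetic added here for illustration and are not results from the paper.

$$(2^k - 1)\,2^{-k-r} = 2^{-r}\left(1 - 2^{-k}\right) \le 2^{-r}$$

$$2^{-4} = 6.25 \times 10^{-2}, \qquad 2^{-12} \approx 2.44 \times 10^{-4}, \qquad 2^{-16} \approx 1.53 \times 10^{-5}, \qquad 2^{-24} \approx 5.96 \times 10^{-8}$$

Each additional CRC bit therefore halves the bound on $P_{UE}$, which is why a longer CRC lowers the upper bound of the WER.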
Thus, we can get the lower bound and upper bound of the WER for our proposed SCFlip decoding in (18), where $P_{OASC}$ is the error probability of the perfect OASC decoding. From (17), we can observe that a long CRC code may lead to a low upper bound of the WER for our proposed scheme. However, an excessively long CRC causes a rate loss. Thus, to evaluate the effect of the CRC check (redundancy) length $r$ on the decoding performance, we perform simulations of BER-SCFlip-ω ($\omega \to \infty$) decoding for various numbers of CRC redundancy bits under (1024, 512) CRC-polar codes. The maximum number of new decoding attempts $T$ is 100. Note that both the CRC code rate $R_c = \frac{k}{k+r}$ and the polar code rate $R_p = \frac{k+r}{N}$ change with $r$, but the overall code rate $R$ remains fixed. The generator polynomials of the different CRC encoders are listed in Table 1 and the simulation result is shown in Fig. 4. It is observed that protecting polar codes with a relatively longer CRC code can provide a significant improvement in the WER performance, e.g., polar codes concatenated with CRC-12 achieve a 0.4 dB gain at a WER of $10^{-3}$ over those with CRC-4. However, excessively long redundancy, e.g., $r = 24$, gives rise to a degraded WER performance. The reason is that the high polar code rate $R_p$ degrades the error-correction capability of polar codes. Thus, a moderate CRC code length is recommended, e.g., CRC-12 or CRC-16 for CRC-polar codes of length 1024 and rate 1/2.
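The rate trade-off can be made concrete with a short worked example. It assumes $N = 1024$ and $k = 512$ information bits (CRC bits excluded), which is how the (1024, 512) code above is read here; the numbers are plain arithmetic added for illustration.

$$R = R_c \cdot R_p = \frac{k}{k+r} \cdot \frac{k+r}{N} = \frac{k}{N} = \frac{512}{1024} = 0.5$$

$$r = 16: \quad R_c = \frac{512}{528} \approx 0.970, \qquad R_p = \frac{528}{1024} \approx 0.516$$

$$r = 24: \quad R_c = \frac{512}{536} \approx 0.955, \qquad R_p = \frac{536}{1024} \approx 0.523$$

The overall rate stays at 1/2, but each extra CRC bit turns one more subchannel into a non-frozen one, so the inner polar code has to operate at a higher rate $R_p$.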

VI. THE SIMULATIONS AND ANALYSIS
In this section, our proposed schemes are compared via simulations with some existing SCFlip decoding algorithms in terms of the WER performance and the computational complexity to validate the advantages of the proposed BER-SCFlip decoder and BER-SCFlip-ω decoder.

A. EVALUATION OF BER-SCFLIP DECODER
First, we compare BER-SCFlip decoder with other existing SCFlip decoding algorithms where only one bit will be flipped in one decoding attempt, including original LLR-SCFlip decoder in [17], D-SCFlip decoder in [25], Progressive SCFlip decoder in [19] and Improved SCFlip decoding in [20]. The comparisons are given in terms of the accuracy of targeting the first error bit, the WER performance and the computational complexity. For the fairness of comparison, the simulations are performed under the same code lengths and rates for all decoders: (1024,512) and (2048,1024) polar codes with CRC-16. The maximum number of additional decoding attempts is set to be T = 10.
We first perform extensive Monte Carlo simulations with $10^6$ samples to obtain statistics on the probability of correctly targeting the first error bit and on the average position of the estimated first error bit in the flip-bits set. The results are listed in Table 2. It is observed that all SCFlip decoders have good error-correction performance with fewer additional decoding attempts as the SNR increases. It is also shown that the proposed SCFlip decoder can target the first error bit more precisely and earlier than all other SCFlip decoders at low SNR ($\gamma < 2$ dB). At high SNR ($\gamma \ge 2$ dB), the proposed decoder performs better than the other decoders except for D-SCFlip, which can target the first error bit slightly earlier than the proposed decoder.
We then give a WER comparison among advanced SCFlip decoding algorithms in Fig. 5. It is observed that the proposed BER-SCFlip and D-SCFlip outperform the others and can approach the performance of OASC decoding. At low SNR ($\gamma \le 1$ dB), where more than one error occurs with a high probability due to the strong channel noise, the performance gain is minor when only one bit can be flipped in additional decoding attempts.
To evaluate the computational burden incurred by additional decoding attempts, we calculate the average number of new attempts before successful decoding, which is listed for different SCFlip decoders in Table 3. It is observed that the proposed scheme has fewer additional decoding attempts and thus a lower complexity except for D-SCFlip, which shows a minor advantage over the proposed decoder under high SNR (γ > 2dB). Considering that the proposed BER-SCFlip employs a lower complexity criterion for targeting the error bits than D-SCFlip, the overall computational complexity of the proposed BER-SCFlip is lower than D-SCFlip, which will be elaborated in the later complexity analysis.

B. COMPLEXITY ANALYSIS FOR BER-SCFLIP DECODER
The computational burden includes both the additional decoding attempts and the computational complexity of selecting unreliable bits in the construction of the flip-bits set. In Table 3, it is shown that the complexity difference in the number of extra attempts is minor since the probabilities of starting a new attempt are close. Thus, we turn to the complexity analysis in terms of the different strategies for flip-bits selection in the following.
VOLUME 8, 2020
Denote the number of non-frozen bits by $K$. In the conventional SCFlip, the absolute values of the LLRs are used as metrics for the flip-bits selection, thus no additional computation is needed for the metric calculation and the sorting complexity is $O(K \log K)$. In D-SCFlip, in addition to the sorting complexity $O(K \log K)$, $K$ multiplications and $\frac{(K+3)K}{2}$ additions are needed to compute the metrics for each new decoding attempt, excluding the logarithm and exponential computations, which can be implemented as low-complexity look-up table operations. In contrast, Progressive SCFlip decoding has a much narrower critical set, so the sorting complexity can be reduced to $O(|S| \log |S|)$, where $|S|$ denotes the size of the critical set $S$. Besides, it needs to build a tree structure and search the tree to construct the critical set $S$ with at least $O(N)$ complexity. Improved SCFlip decoding needs massive simulations to construct the critical set before the decoding. Thus, its pre-decoding complexity is $O(MN \log N)$, where $M$ is the number of samples in the pre-experiment. Also, the comparison of LLR values with the determined threshold costs $O(K)$ computational complexity. The proposed BER-SCFlip needs to compute BERs from decoded LLR values for the flip-bits selection, resulting in $K$ multiplications and $K$ additions. The complexity of calculating the BER expectation via the GA algorithm is $O(N)$ [28]. Also, the BER comparison of non-frozen bits costs $O(K)$ complexity. Besides, the sorting complexity can be omitted, since the set is sorted by the decoding order and no extra operation is needed.
In summary, Table 4 lists the complexity of constructing the flip-bits set for each scheme, from which we can observe that the proposed BER-SCFlip has a comparatively low complexity in this step.

C. THE EVALUATION OF BER-SCFLIP-ω ALGORITHM
BER-SCFlip-ω allows more than one channel-induced error to be corrected in a single new decoding attempt, with dynamic updating of the flip-bits set; it is obtained by incorporating the proposed criterion into D-SCFlip-ω. Thus, BER-SCFlip-ω can achieve strong error-correction performance with reduced complexity. We provide simulations to compare the WER performance of the proposed BER-SCFlip-ω, D-SCFlip-ω [25], and the CRC-SCL decoder [6]. All simulations are performed over an AWGN channel. The CRC-polar codes (1024,512+16) and (2048,1024+16) are used, with the CRC-16 given in Table 1. The polar codes are constructed using the GA method presented in [28]. The maximum number of new decoding attempts T is set to 100 for ω = 2 and 300 for ω = 3.

1) THE PERFORMANCE COMPARISON
The WER performance comparison is presented in Fig. 6. For ω = 2, the proposed BER-SCFlip-ω achieves a better WER performance than D-SCFlip-ω for both CRC-polar codes (1024,512+16) and (2048,1024+16) at high SNR (γ > 2.5dB). Moreover, they both surpass the CRC-SCL decoder with L = 8. When ω increases to 3, their performance approaches that of CRC-SCL with L = 16 for N = 1024 and even surpasses it for N = 2048 at high SNR (γ > 2.5dB). In addition, compared with D-SCFlip-ω, the performance gain obtained by increasing ω is less pronounced for our scheme when N = 1024, whereas for N = 2048 this gap is closed.

2) THE COMPLEXITY COMPARISON
The computational complexity of BER-SCFlip-ω and D-SCFlip-ω includes not only the additional decoding complexity but also the computational complexity of constructing the flip-bits set.
The average number of additional decoding attempts, T_ave, is listed in Table 5 for the BER-SCFlip-ω and D-SCFlip-ω decoders. The proposed BER-SCFlip-ω decoder achieves successful decoding with far fewer additional attempts on average than D-SCFlip-ω, especially at low SNR (γ ≤ 2dB); e.g., the gap is over 10 for ω = 2 and over 40 for ω = 3 at SNR γ = 1dB. At high SNR (γ > 2dB), the average number of extra attempts needed for successful decoding is similar for both decoders.
The computational complexity of constructing the flip-bits set mainly consists of the metric computation and the sorting complexity. Denote the number of non-frozen bits by K. In the initialization of the flip-bits set, the number of elements whose metrics must be calculated is K for both decoders. Thus, the metric calculation complexity is O(K^2) for both, as Table 6 shows. Before sorting, the metric comparison in BER-SCFlip-ω costs an extra O(K) complexity. However, with this comparison the number of elements to be sorted is reduced to |ξ_0|, whereas D-SCFlip-ω needs to sort K elements in the initial step. Therefore, the sorting complexities of BER-SCFlip-ω and D-SCFlip-ω are O(|ξ_0| log |ξ_0|) and O(K log K), respectively.
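The filter-then-sort idea behind the reduced sorting cost can be sketched as follows. This is only an illustration under stated assumptions: the per-bit metrics, decoded BERs, and expected BERs are taken as precomputed inputs, a smaller metric is assumed to mark a more suspicious bit, and the names `initial_flip_set`, `decoded_ber`, `expected_ber`, and `metric` are hypothetical.

```python
def initial_flip_set(info_indices, decoded_ber, expected_ber, metric):
    """Filter candidates by the BER comparison, then sort only the survivors.

    info_indices: the K non-frozen positions
    decoded_ber:  mapping index -> BER derived from the decoded LLR
    expected_ber: mapping index -> GA-estimated BER
    metric:       mapping index -> flip metric (assumed: smaller = more suspicious)
    """
    # O(K) comparison pass: keep bits whose decoded BER exceeds the expectation.
    candidates = [i for i in info_indices if decoded_ber[i] > expected_ber[i]]
    # The sort now handles len(candidates) elements rather than all K.
    return sorted(candidates, key=lambda i: metric[i])
```

The extra O(K) pass is paid once, after which the O(n log n) sort runs on the reduced candidate list only.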
Denote by τ the number of non-frozen bits decoded after bit i_ω^t, where i_ω^t is the last bit in ε_t in Algorithm 4. When the flip-bits set is updated in our algorithm, the BER-based comparison, with metric computation complexity O(τ) and comparison complexity O(τ), further reduces the number of elements to be sorted from τ to ν. In contrast, the metric computation and sorting in D-SCFlip-ω operate on all τ elements. The complexity comparison for updating the flip-bits set is given in Table 6, from which it can be observed that the proposed BER-SCFlip-ω has a lower computational burden for constructing and updating the flip-bits set than D-SCFlip-ω.
To further verify the efficiency of the metric and BER-based comparison, the average values of |ξ_0| and ν_i are collected, where i denotes the number of bits in ε_t. Table 7 presents |ξ_0| and ν_i for both CRC-polar codes (1024,512+16) and (2048,1024+16) at SNR γ = 2dB with ω = 3. The average value of τ_i (i = |ε_t|) is also given for comparison. As Table 7 shows, introducing the metric and BER-based comparison significantly reduces the number of elements to be calculated and sorted. Similar results are observed in simulations at other SNR values.
Taking into account that the proposed BER-SCFlip-ω also requires fewer additional decoding attempts than D-SCFlip-ω, the former has a lower overall complexity and latency than the latter without much loss in error-correction performance.
To evaluate the effect of the constraint T, the maximum number of new decoding attempts, on the WER performance, we conduct simulations comparing the WER of BER-SCFlip-ω (ω → ∞) with that of the CRC-SCL decoding algorithm with L = 4, 8, 16 as a reference. Fig. 7 shows the WER curves for the (1024,512+16) and (2048,1024+16) CRC-polar codes using BER-SCFlip-ω decoding with different T values and CRC-SCL decoding with various list sizes. We can observe that BER-SCFlip-ω decoding with T = 10 outperforms CRC-SCL decoding with L = 4, while BER-SCFlip-ω with T = 50 surpasses CRC-SCL decoding with L = 8. The performance of BER-SCFlip-ω decoding with T = 300 approaches that of CRC-SCL with L = 16 for N = 1024 and is competitive with it for N = 2048.
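How the cap T bounds the decoding effort can be seen in a sketch of the basic single-flip loop, in which candidates are tried one by one until the CRC passes or T extra attempts have been used. This is a simplified illustration rather than the multi-bit BER-SCFlip-ω procedure; `sc_decode` and `crc_ok` are hypothetical callables standing in for an SC decoder and a CRC check.

```python
def scflip_decode(sc_decode, crc_ok, flip_set, max_attempts):
    """Try flips in order until the CRC passes or the cap T is reached.

    sc_decode(pos): hypothetical callable; runs SC decoding with the hard
                    decision at position `pos` flipped (None = plain SC)
    crc_ok(bits):   hypothetical callable; True if the CRC check passes
    flip_set:       candidate positions, already ordered
    max_attempts:   the cap T on extra decoding attempts

    Returns (decoded bits, number of extra attempts used).
    """
    bits = sc_decode(None)  # initial SC decoding attempt
    if crc_ok(bits):
        return bits, 0
    attempts = 0
    for pos in flip_set[:max_attempts]:  # at most T extra attempts
        attempts += 1
        candidate = sc_decode(pos)
        if crc_ok(candidate):
            return candidate, attempts
    return bits, attempts  # failure: fall back to the initial SC output
```

A larger T lets the loop reach candidates further down the list, which improves the chance of passing the CRC at the cost of more SC decoding passes in the worst case.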
Compared with the SCL decoding complexity O(LN log N), the proposed BER-SCFlip-ω has an approximate complexity of O(T_ave N log N), since the computational complexity of constructing the flip-bits set is negligible with respect to the SC decoding complexity. From Fig. 8, we observe that T_ave, the average number of extra decoding attempts, adapts automatically to SNR variations. At very low SNR (γ < 1dB), BER-SCFlip-ω has a higher complexity than SCL decoding. However, at high SNR (γ > 2dB), T_ave approaches 1 and BER-SCFlip-ω has nearly the same complexity as SC decoding, which is much lower than that of SCL decoding. The proposed criterion could also be applied to path selection in SCL decoding, which may speed up the decoding by narrowing the selection scope. We will investigate this issue in our future research.
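The order-of-magnitude comparison above can be made concrete with a small calculation. The sketch below ignores constant factors and the flip-bits set construction cost, as the text does; the function names are hypothetical, and any T_ave value passed in is an input chosen for illustration, not a simulation result.

```python
import math

def sc_ops(n):
    """Order-of-magnitude SC decoding cost N log2 N (constants ignored)."""
    return n * math.log2(n)

def scl_cost(list_size, n):
    """Approximate SCL cost, L * N log2 N."""
    return list_size * sc_ops(n)

def scflip_cost(t_ave, n):
    """Approximate BER-SCFlip-omega cost, T_ave * N log2 N."""
    return t_ave * sc_ops(n)

def cost_ratio(t_ave, list_size, n):
    """Ratio of flip-decoder cost to SCL cost; below 1 favors flipping."""
    return scflip_cost(t_ave, n) / scl_cost(list_size, n)
```

Under this model the ratio reduces to T_ave / L, so the flip decoder is cheaper than SCL whenever the average number of attempts stays below the list size; for instance, `cost_ratio(1.0, 16, 1024)` evaluates to 0.0625.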

VII. CONCLUSION
A BER evaluation based SCFlip decoder for polar codes is presented. By estimating the BER expectations of the non-frozen bits via GA and comparing them with the corresponding BERs derived from the LLR values in SC decoding, the error bits of SC decoding can be pinpointed accurately. Additionally, by combining our proposed criterion with that of D-SCFlip [25] for the initialization and update of the flip-bits set, an extended BER-SCFlip-ω is proposed to correct multiple bits per decoding attempt, so that a better error-correction performance is achieved. Thanks to the constraint we introduce, the scope of candidate bits in constructing and updating the flip-bits set is significantly reduced, and thus low complexity, storage cost, and decoding latency can be achieved.