Two Novel Semi-/Auto-Adaptive SNR Algorithms to Efficiently Train Deep Neural SPA Decoders

In the past few years, deep learning has been widely used in various fields due to its outstanding progress. One of the latest applications of deep learning is to use a neural network (NN) with trainable multiplicative weights to design decoders for error-correcting codes. High quality data are essential for deep learning to train robust NN models. In this study, two novel semi-/auto-adaptive SNR algorithms are proposed to efficiently train the neural decoders based on the Sum-Product Algorithm (SPA). For illustration, several neural SPA decoders for the Bose-Chaudhuri-Hocquenghem (BCH) code and low-density parity-check (LDPC) code have been constructed as examples. Simulation results show that, compared with the original neural decoders, the performance of these neural decoders trained by the proposed algorithms can be improved in the range of 0.2 to 0.6 dB. Moreover, the training time required for these decoders to achieve convergence can be reduced by up to 28.8% for the BCH code, and up to 35.6% for the LDPC code, without increasing decoding complexity.


I. INTRODUCTION
Owing to noise interference in digital message transmission, the received message may not be exactly the same as what was sent. To achieve reliable communication, it is necessary to detect and correct the errors by using error-correcting codes (ECCs). In the past few decades, several near-capacity ECCs have been proposed for error-free data transmission over noisy channels, such as turbo codes [1], low-density parity-check (LDPC) codes [2], and polar codes [3]. Shao et al. provided an overview and comparison of the turbo code, LDPC code, and polar code in [4]. Detailed descriptions of the capabilities of these three codes are given for meeting different requirements associated with the enhanced Mobile Broadband (eMBB), Ultra-Reliable Low-Latency Communication (URLLC), and massive Machine-Type Communication (mMTC) applications of 5G, as well as the application-specific integrated circuit (ASIC) implementation of the decoders. Among these codes, LDPC codes have attracted widespread attention through the excellent performance of the iterative belief propagation (BP) decoding algorithm [2]. Because of their widespread popularity, adaptability, and parallelism in cost-effective hardware implementations, LDPC codes have been used to improve data reliability in various communication applications [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Massimo Cafaro.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The BP decoding algorithms are widely preferred in many wireless communication standards because of their excellent error rate performance. However, because they involve a great number of logarithmic and multiplicative operations when updating the check nodes, their high computational complexity burdens the hardware. In an effort to reduce the computational complexity, Fossorier et al. introduced the min-sum algorithm (MSA) for fast iterative decoding of LDPC codes [6]. However, the MSA suffers significant performance degradation, because the computation of the check-to-variable message is simplified to a minimum operation instead of a hyperbolic tangent calculation. Chen and Fossorier [7] introduced the normalized min-sum (NMS) and offset min-sum (OMS) algorithms, which modify the MSA to achieve better performance. Numerical results showed that, with one properly chosen parameter for each of these two algorithms, the performance can be close to that of the BP algorithm. In [8], a simple method with less complex arithmetic operations for finding the offset correction factor was demonstrated. Chang et al. [9] proposed a conditional variable node (VN) selecting metric to realize informed dynamic scheduling (IDS) LDPC decoding schedules. The goal is to find VNs with probable incorrect decisions and correct the errors by updating them. To reduce decoding latency, multi-edge updating versions of the algorithms are realized by increasing the degree of parallelism of the update. In [10], Roberts proposed to sample the received signals and generate the corresponding asynchronous amplitude histograms (AAHs) [11] and two-dimensional asynchronously sampled in-phase-quadrature amplitudes' histograms (2D-ASIQHs)-based images [12].
Owing to the unique statistical properties of the AAHs, they can be used to obtain information about different signal parameters (i.e., modulation type and SNR). Next, the significant features of these AAHs (or 2D-ASIQH images) are extracted through principal component analysis (PCA) and used to train a support vector machine (SVM)-based classifier and regressor. Simulation results show that this SVM-based system identifies and predicts the modulation type and SNR value of the received signal with good accuracy. Obtaining the modulation type and SNR information in advance enables the receiver to make corresponding adjustments to improve system performance, which is crucial for receiver design in future wireless communication systems.
Nachmani et al. [13], [14] proposed a deep neural network (DNN) architecture for decoding high-density parity-check (HDPC) codes. Because the edges of the Tanner graph are assigned trainable weights and biases, this DNN decoder can be regarded as a weighted-BP (WBP) decoder. The simulations show that the performance can be improved by optimally training the neural network parameters. Gruber et al. [15] proposed using a fully connected (FC) neural network (NN) to decode random and structured codes. When the code length was very short, the simulation results showed that this NN decoder performed as well as the maximum a posteriori (MAP) decoder. Kim et al. [16] demonstrated that trained recurrent neural network (RNN) architectures can decode convolutional and turbo codes with close-to-optimal performance under the additive white Gaussian noise (AWGN) channel. Lugosch and Gross [17] proposed applying deep learning to the offset min-sum (OMS) algorithm to achieve decoding performance similar to that in [13], while requiring fewer trainable parameters and lower hardware complexity. Wang et al. [18] adopted model-driven deep learning and proposed shared neural normalized min-sum (SNNMS) decoding to reduce the number of correction factors and lower the complexity. Vasić et al. [19] applied DNNs to the MSA for decoding LDPC codes. The simulation results showed that the neural decoder performs no worse than the conventional MSA. Chu et al. [20] proposed a neural-network optimized low-resolution decoding (NOLD) algorithm to mitigate the performance degradation of the MSA incurred by low-resolution quantization. In [21], we applied the DNN architecture to the Sum-Product Algorithm (SPA) decoder to improve the performance of the optical code-division multiple-access (OCDMA) system with LDPC codes.
Generating a suitable training dataset over various signal-to-noise ratio (SNR) ranges to train a robust neural network model is an interesting and important issue. In [14], Nachmani et al. demonstrated that different decoding performances on the same validation set were obtained when training WBP models with varying SNR ranges. Gruber et al. [15] introduced a metric to compare the performance of an NN decoder trained at a particular SNR with that of an MAP decoder over a range of SNR values. If the training set includes the entire codebook, for both structured and random codes, an NN decoder trained at a certain SNR value can generalize to inputs with arbitrary SNR. Otherwise, only the NN decoder used for structured codes is capable of generalizing to unseen codewords. In [22], Lian et al. utilized parameter adapter networks (PANs) to establish the relationship between the SNR and the WBP parameters. Several shallow NNs were used to estimate the SNR dependency of each parameter in the decoding algorithm. In [23], Be'Ery et al. proposed two supervised learning methods, which actively sampled the training data through the Hamming distance and reliability parameters, respectively. It was shown that this active sampling method provides an efficient way to select useful training data.
As the method of finding useful data is a key factor in training the NN, we propose two novel semi-/auto-adaptive SNR algorithms that do not increase the complexity of the decoder. These algorithms can be applied to the aforementioned DNN decoders, such as the neural SPA (NSPA), neural OMS (NOMS), and SNNMS. For clarity of exposition, we take the NSPA used in [13], [14], [21] as an example in this work. Accordingly, the BCH codes in [13], [14] and the LDPC codes in [21] are also used to verify the effectiveness of the proposed algorithms. In addition, inspired by [13], [14], two modified NSPA (MNSPA) decoders are investigated and demonstrated. From the simulation results, we can observe that the performance of these neural decoders can be further improved using the proposed adaptive SNR algorithms.
The remainder of this paper is organized as follows. Section II introduces the necessary background for the neural decoder. In Section III, two novel semi-/auto-adaptive SNR algorithms are described in detail. Section IV presents the simulation results, and Section V concludes the paper.

II. PRELIMINARIES
Consider an $(N, K)$ linear block code with a parity-check matrix $H$, where $N$ is the code length and $K$ is the message length. Let $x = (x_1, x_2, \ldots, x_N)$ be the codeword transmitted over the AWGN channel with common binary phase-shift keying (BPSK) mapping to $\{\pm 1\}$, and let $y = (y_1, y_2, \ldots, y_N)$ be the corresponding received output vector, where $y_v = (-1)^{x_v} + n_v$ for $1 \le v \le N$, and $n_v$ is Gaussian noise with standard deviation $\sigma$. The signal-to-noise ratio per bit used in this work can be represented as $\mathrm{SNR} = E_s/(R N_0) = 1/(2\sigma^2)$, where $E_s$ denotes the signal energy, $R$ is the code rate of the linear block code, and $N_0$ is the power spectral density of the noise.
Thus, the initial log-likelihood ratio (LLR) of the $v$th received value, denoted by $L_v$, can be calculated as in (1).

A. SUM-PRODUCT ALGORITHM
First, we review the sum-product algorithm (SPA), which is an iterative algorithm for decoding the LDPC code. In each iteration, the posterior LLRs of the received vector are computed by exchanging messages between processing nodes. The symbols and notations used in the SPA are listed as follows:
• $E$: the number of edges in the Tanner graph.
• $T$: the maximum iteration number in the SPA.
• $M(c)$: the set of variable nodes connected to check node $c$.
• $M(c) \backslash v$: the set of variable nodes connected to check node $c$, not including variable node $v$.
• $N(v)$: the set of check nodes connected to variable node $v$.
• $N(v) \backslash c$: the set of check nodes connected to variable node $v$, not including check node $c$.
• $L^k_{(a,b)}$: the LLR message from node $a$ to node $b$ in hidden layer $k$.
At the $i$-th iteration, the messages from variable nodes to check nodes are computed by (2). Notice that $L^0_{(c',v)} = 0$ because no information is present at the check nodes initially.
The messages from check nodes to variable nodes are updated by (3), the final output value of each variable node is calculated by (4), and, at the end of each iteration, the decoded codeword $\hat{y}$ is obtained by (5). The procedure stops if $\hat{y} H^T = 0$ or the maximum number of iterations has been reached.
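The message-passing steps reviewed above can be illustrated with a minimal pure-Python sketch. It assumes the standard textbook form of the SPA (channel LLR $2y_v/\sigma^2$, a tanh-product check-node update, and a hard decision of bit 1 for a negative total LLR); the function names, the clipping bound, and the (7, 4) Hamming parity-check matrix are illustrative choices, not taken from this paper.

```python
import math

def channel_llr(y, sigma):
    # Assumed standard form for BPSK over AWGN: L_v = 2 * y_v / sigma^2
    return [2.0 * yv / sigma ** 2 for yv in y]

def spa_decode(H, llr, max_iter=5, clip=0.999999):
    """Textbook sum-product decoding on the Tanner graph of H (hard-decision output)."""
    m, n = len(H), len(H[0])
    edges = [(c, v) for c in range(m) for v in range(n) if H[c][v]]
    c2v = {e: 0.0 for e in edges}  # no information at the check nodes initially
    bits = [0 if l >= 0 else 1 for l in llr]
    for _ in range(max_iter):
        # Variable-to-check: channel LLR plus messages from all *other* checks.
        v2c = {}
        for (c, v) in edges:
            v2c[(c, v)] = llr[v] + sum(
                c2v[(c2, v2)] for (c2, v2) in edges if v2 == v and c2 != c)
        # Check-to-variable: tanh product over all *other* variables of the check.
        for (c, v) in edges:
            prod = 1.0
            for (c2, v2) in edges:
                if c2 == c and v2 != v:
                    prod *= math.tanh(v2c[(c2, v2)] / 2.0)
            prod = max(-clip, min(clip, prod))  # keep atanh finite
            c2v[(c, v)] = 2.0 * math.atanh(prod)
        # Marginalization and hard decision (negative LLR -> bit 1).
        total = [llr[v] + sum(c2v[(c, v2)] for (c, v2) in edges if v2 == v)
                 for v in range(n)]
        bits = [0 if t >= 0 else 1 for t in total]
        # Stop as soon as every parity check is satisfied.
        if all(sum(H[c][v] * bits[v] for v in range(n)) % 2 == 0 for c in range(m)):
            break
    return bits

# Illustrative (7, 4) Hamming parity-check matrix (not a code used in this paper).
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
# All-zero codeword (+1 after BPSK) with the first sample pushed slightly negative.
y = [-0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
llr = channel_llr(y, sigma=1.0)
raw_bits = [0 if l >= 0 else 1 for l in llr]
decoded = spa_decode(H, llr)
```

In this toy case the raw hard decision has one bit error, and the first SPA iteration already restores the all-zero codeword, so the syndrome check ends the loop early.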

B. DEEP NEURAL NETWORK
Next, we introduce the structure of these DNN decoders, whose network architectures are non-fully connected neural networks. Essentially, the network can be viewed as a trellis representation of the Tanner graph, in which each iteration of the SPA is unfolded into two hidden layers.
The decoder is composed of one input layer of size $N$, $2T$ hidden layers of size $E$, and one output layer of size $N$. In each hidden layer, each neuron represents the message transmitted over one edge of the Tanner graph. For illustration, one $9 \times 9$ parity-check matrix $H$ is constructed as in (6). By Gaussian elimination, $H$ is found to have a rank of 7, which means that it can be regarded as the parity-check matrix of a (9, 2) linear block code. The code length $N$ is 9, and the column and row weights are both 3. Assuming that the maximum iteration number $T$ is 3, the corresponding decoder has one input layer, six hidden layers, and one output layer. The input (output) layer has 9 nodes, and all six hidden layers have 27 neurons corresponding to the 27 edges of the Tanner graph, as shown in Fig. 1.
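The layer dimensions above follow directly from the parity-check matrix, as the short sketch below illustrates. Because the matrix in (6) is not reproduced here, the sketch uses a hypothetical $9 \times 9$ circulant matrix that happens to share the stated properties (row and column weight 3, rank 7 over GF(2)); it is a stand-in for illustration and should not be read as the matrix used in the paper. The helper names are likewise assumptions.

```python
def gf2_rank(H):
    """Rank of a binary matrix over GF(2), using rows packed into integers."""
    rows = [int("".join(str(b) for b in row), 2) for row in H]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            # Eliminate that bit from every remaining row.
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def layer_sizes(H, T):
    """Input layer (N), 2T hidden layers (one neuron per edge), output layer (N)."""
    N = len(H[0])
    E = sum(sum(row) for row in H)  # number of edges = number of ones in H
    return [N] + [E] * (2 * T) + [N]

# Hypothetical stand-in for (6): circulant with ones at columns i, i+1, i+2 (mod 9).
H = [[1 if (j - i) % 9 in (0, 1, 2) else 0 for j in range(9)] for i in range(9)]

rank = gf2_rank(H)
K = len(H[0]) - rank          # message length implied by the rank
sizes = layer_sizes(H, T=3)   # [9, 27, 27, 27, 27, 27, 27, 9]
```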
The connections among these layers are determined as follows:
1. For the first (last) hidden layer, neuron $t$ is connected to a single node $x_v$ in the input (output) layer if $x_v$ is incident to edge $t$, where $1 \le v \le 9$ and $1 \le t \le 27$.
2. For the remaining odd (or even) hidden layers, neuron $t$ in this layer is connected to the neurons in the previous layer whose corresponding edges in the Tanner graph are incident to $N(v) \backslash c$ (or $M(c) \backslash v$).
Note that since there is no information at the check nodes after the initialization, the first and second hidden layers can be merged. The self LLR messages $L_v$ are plotted as small arrows in Fig. 1.

C. NSPA DECODER
For the NSPA decoder [13], only the variable-to-check messages (i.e., odd hidden layers) and the output nodes are assigned trainable weights. For odd hidden layer $i$, the messages are calculated as in (7), where $w^i_v$ and $w^i_{c',v,c}$ are trainable weights assigned to the LLR messages $L_v$ and $L^{i-1}_{(c',v)}$, respectively. As stated previously, $L^0_{(c',v)} = 0$ because no information is present at the check nodes initially.
For even hidden layer $i$, the check-to-variable messages are updated as in (8). The output marginalization is computed as in (9), where $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoid function, which maps the value to the range [0, 1]. If the output value is in the range [0, 0.5], it is determined to be bit 1; otherwise, it is determined to be bit 0.
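The output decision rule can be stated compactly in code. The following sketch only illustrates the sigmoid mapping and the thresholding described above; the function names and the sample inputs are illustrative.

```python
import math

def sigmoid(x):
    # sigma(x) = (1 + e^(-x))^(-1), mapping a real value into [0, 1]
    return 1.0 / (1.0 + math.exp(-x))

def hard_decision(outputs):
    # Output in [0, 0.5] -> bit 1; output in (0.5, 1] -> bit 0
    return [1 if o <= 0.5 else 0 for o in outputs]

# Illustrative pre-activation values at three output nodes.
soft = [sigmoid(x) for x in (3.0, -2.0, 0.0)]
bits = hard_decision(soft)
```

Note that a pre-activation value of exactly zero gives an output of 0.5, which falls in the closed range [0, 0.5] and is therefore decided as bit 1.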

D. MNSPA DECODER
To reduce the number of trainable weights, two MNSPA decoders are investigated as follows.

1) TYPE I
Inspired by [14], one simple method to reduce the trainable weights of the NSPA can be easily implemented as follows.
For odd hidden layer $i$, the variable-to-check messages are updated as in (10). Notice that the main difference between (7) and (10) is that the trainable weights are removed in (10). Similarly, $L^0_{(c',v)} = 0$ as no information is present at the check nodes in the beginning.
For even hidden layer $i$, the check-to-variable messages are calculated as in (11), where $w^i_{c,v}$ is the learnable weight parameter for the edge connecting check node $c$ to variable node $v$. Note that the input of the $\tanh^{-1}$ function is clipped to the range of −0.999 to 0.999 to stabilize the operation of the decoder.
Finally, the output marginalization is calculated as in (12), where $W_{v,out}(v)$ and $W_{cv,out}(v)$ are the learnable weights for the self LLR message $L_v$ and the messages from the check nodes to the variable node $x_v$, respectively. The output bit is determined to be 0 if the value is in the range (0.5, 1]; otherwise, it is determined to be 1.

2) TYPE II
Similar to [22], the MNSPA Type II decoder can be viewed as the NSPA decoder with simple scaling trainable weights. The equations used to calculate the variable-to-check messages for odd hidden layer $i$ and the output marginalization are the same as (10) and (12), respectively. For even hidden layer $i$, the equation for updating the check-to-variable messages is modified as in (13), where $w^i$ is the learnable weight parameter for the edges connecting to hidden layer $i$.
For brevity, the two types of MNSPA decoders are named the MNSPA-I and MNSPA-II decoders for the remainder of this paper. The main differences between the MNSPA decoders and the NSPA decoder lie in the trainable weights. For the MNSPA-II decoder, as in (12) and (13), two types of weight matrices $\{W_{v,out}, W_{cv,out}\}$ of size $\{1 \times N, 1 \times N\}$ and one trainable weight $w^i$ are assigned to the output layer and each even hidden layer $i$, respectively. Only $T + 2N$ weights need to be trained; thus, the training effort and computational complexity of the MNSPA-II decoder are the lowest among the three decoders.
The performance of the decoders is independent of the transmitted codeword because the channel output is symmetric. Thus, the training database is constructed by using the zero codeword with noise. The goal is to train the decoder to produce an output as close as possible to the zero codeword.
The neural network is trained by using an optimization process, which requires a loss function to calculate the model error. In this study, the binary cross-entropy (BCE) loss function is used for training as in (14), where $x_v$ and $\hat{x}_v$ are the $v$th elements of the transmitted codeword and the neural decoder output, respectively.

III. SEMI-/AUTO-ADAPTIVE SNR ALGORITHMS
The NSPA, MNSPA-I, and MNSPA-II decoders are trained and optimized by minimizing the BCE loss function (14) using mini-batch stochastic gradient descent (SGD). Assuming that the mini-batch size is $N_b$, the size of each mini-batch training dataset is $N_b \times N$. Therefore, without loss of generality, the average mini-batch BCE (MBBCE) loss function can be derived as in (15), where $L_{BCE}(j)$ represents the BCE value calculated from row $j$ of the mini-batch training dataset.
The mini-batch training dataset covers a range of different SNRs. The noise variance increases (or decreases) as the SNR decreases (or increases). If the SNR is close to zero, the noise is too large for the NN to learn the code structure. Conversely, if the SNR is very high, the samples are almost noise-free and are therefore of little use for training the NN to handle noise. Generating a suitable dataset over various SNR ranges is thus a key factor in training the NN. This inspired us to determine how to adjust the number of codewords at each SNR in one epoch. First, the symbols and notations used in the algorithms are defined as follows:
• $a$: the number of SNR values used for generating the training dataset; $S = \{s_k, 1 \le k \le a\}$ denotes the set of training SNR values.
• $N_s = \{n_s(k), 1 \le k \le a\}$: the set of the numbers of codewords for each SNR.
• $P = \{p_k = n_s(k)/N_b, 1 \le k \le a\}$: the set of SNR ratios.
• $N_e$: the maximum number of epochs.
• $N_f$: the samples for each SNR.
Fig. 2 illustrates the flowchart of the DNN decoders. Generally, the mini-batch training dataset is constructed by generating codewords with uniformly distributed SNRs (i.e., $N_s = \{n_s(k) = N_b/a, 1 \le k \le a\}$). For clarity, assume that the mini-batch size is 128 and the number of training SNRs is 8 (i.e., from 1 dB to 8 dB). Then, the number of codewords for each SNR, $n_s(k)$, is 16, and $p_k$ is 1/8 for $1 \le k \le 8$.
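As a minimal sketch of how such a uniform mini-batch could be assembled, the code below splits $N_b = 128$ rows evenly over eight SNR values and generates noisy all-zero codewords. The mapping from SNR in dB to the noise standard deviation ($\sigma^2 = 1/(2R \cdot 10^{\mathrm{SNR}/10})$ for unit-energy BPSK) and the channel LLR $2y_v/\sigma^2$ are common conventions assumed here for illustration; the function names and the use of Python's `random` module are likewise assumptions, not the authors' implementation.

```python
import math
import random

def uniform_counts(n_b, a):
    """n_s(k) = N_b / a for every SNR; N_b is assumed to be divisible by a."""
    if n_b % a:
        raise ValueError("mini-batch size must be divisible by the number of SNRs")
    return [n_b // a] * a

def noise_std(snr_db, rate):
    # Assumed convention: sigma^2 = 1 / (2 * R * 10^(SNR_dB / 10))
    return math.sqrt(1.0 / (2.0 * rate * 10.0 ** (snr_db / 10.0)))

def zero_codeword_batch(snrs_db, counts, n, rate, rng):
    """Rows of (SNR, channel LLRs) for the all-zero codeword sent as +1 over AWGN."""
    batch = []
    for snr_db, count in zip(snrs_db, counts):
        sigma = noise_std(snr_db, rate)
        for _ in range(count):
            y = [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]
            batch.append((snr_db, [2.0 * yv / sigma ** 2 for yv in y]))
    return batch

N_B, A = 128, 8
snrs_db = [float(k) for k in range(1, A + 1)]   # 1 dB to 8 dB
counts = uniform_counts(N_B, A)                 # 16 codewords per SNR
ratios = [c / N_B for c in counts]              # p_k = 1/8
batch = zero_codeword_batch(snrs_db, counts, n=127, rate=106 / 127,
                            rng=random.Random(0))
```

Keeping the SNR label with each row makes it straightforward to group the per-row losses by SNR later on.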
Because these decoders are optimized by minimizing the loss function, one intuitive way to realize the SNR adaptation is to modify the MBBCE equation as in (16), where $L_{SNR}(k)$ represents the MBBCE value when the SNR is $k$. From (16), it can be observed that the MBBCE is modified to calculate the value for each SNR separately, rather than the sum over all SNRs as in (15). Thus, we can utilize the set of SNR ratios $P$ to adjust the number of codewords of different SNRs in one epoch.
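The difference between a single batch-wide average and a per-SNR average can be illustrated as follows. The sketch assumes that $L_{SNR}(k)$ is the mean BCE over the rows generated at the $k$-th SNR, which is one natural reading of the description above; the function names and the toy loss values are purely illustrative.

```python
def mbbce(bce_rows):
    """Average BCE over all rows of the mini-batch."""
    return sum(bce_rows) / len(bce_rows)

def per_snr_mbbce(bce_rows, snr_index, a):
    """Average BCE computed separately for each of the a SNR groups (assumed form)."""
    sums = [0.0] * a
    counts = [0] * a
    for loss, k in zip(bce_rows, snr_index):
        sums[k] += loss
        counts[k] += 1
    # An SNR with no rows in the batch contributes a loss of zero.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Toy mini-batch: four rows, two SNR groups (index 0 = lower SNR).
bce_rows = [0.8, 0.6, 0.2, 0.4]
snr_index = [0, 0, 1, 1]

overall = mbbce(bce_rows)                       # about 0.5
grouped = per_snr_mbbce(bce_rows, snr_index, 2) # about [0.7, 0.3]
ratios = [snr_index.count(k) / len(bce_rows) for k in range(2)]
recombined = sum(p * l for p, l in zip(ratios, grouped))
```

With $p_k = n_s(k)/N_b$, weighting the per-SNR values by the SNR ratios recovers the batch-wide average, which is why changing $P$ changes how much each SNR contributes.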
The flowchart of the proposed algorithms is shown in Fig. 3. The processes for initialization and input are the same as the original flowchart in Fig. 2. The main difference is that the proposed algorithms will determine whether to adjust the set P in each epoch based on the optimization of MBBCE. Next, these two SNR algorithms are described in detail.

A. SEMI-ADAPTIVE SNR ALGORITHM
In the following, the Semi-Adaptive SNR algorithm is described in detail:

1) INITIALIZATION
• Set the values of $a$ and $N_b$; then, the sets $S$, $N_s$, and $P$ can be determined.
• Set the values of $f_{optimal}$ and $f_{att}$, where the former denotes the desired optimal factor for the MBBCE value, and the latter represents the desired attenuation rate for SNR adaptation. Note that $0 < f_{optimal}, f_{att} \le 1.0$.
• Let $L_{pre} = 0$, where $L_{pre}$ denotes the MBBCE value for the previous epoch.
• Let $L_{cur} = 0$, where $L_{cur}$ denotes the MBBCE value for the current epoch.
• Set $T_{att} = 0$, where $T_{att}$ is the attenuation time of the SNR.

2) TRAINING
As shown in Figs. 2 and 3, the value of $N_s$ used for generating the training data remains the same within one epoch (i.e., while loop < $N_f$). The value of the MBBCE is calculated by using (16) and then used in Algorithm 1 to process the SNR adaptation. If the MBBCE value of the current epoch does not reach the desired optimal value (line 2), Algorithm 1 sets the counter as a trigger condition to adaptively adjust the SNR ratios and the number of codewords for each SNR (lines 5-7). The detailed adaptive adjustment process is summarized in Algorithm 2. Because it is almost impossible to recover the signal if the noise is large, we propose to sequentially attenuate the number of codewords of small SNRs. If the value of $T_{att}$ is larger than zero, we multiply the SNR ratio $p_i$ by $(1 - f_{att})^{(T_{att} - i)}$, where $1 \le i \le a$ (lines 6-13). Next, the set of SNR ratios is normalized and then used to adjust the number of codewords of each SNR through Algorithm 3.
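The trigger, attenuation, and conversion of ratios into codeword counts can be sketched as follows. Since Algorithms 1 to 3 are not reproduced here, several details are assumptions made only for illustration: indices are zero-based and only the first $T_{att}$ ratios are scaled (so that the exponent $T_{att} - i$ is always positive), $T_{att}$ is capped at $a - 1$ so that at least one SNR remains, and the random correction of the rounded counts is restricted to SNRs with a nonzero ratio. All function names are hypothetical.

```python
import random

def needs_adaptation(l_cur, l_pre, f_optimal):
    """Trigger when the current MBBCE has not dropped to f_optimal * previous MBBCE."""
    return l_cur > f_optimal * l_pre

def attenuate_ratios(p, t_att, f_att):
    """Scale the lowest-SNR ratios by (1 - f_att)^(t_att - i), then normalize."""
    t_att = min(t_att, len(p) - 1)  # assumption: always keep the highest SNR
    scaled = [pk * (1.0 - f_att) ** (t_att - i) if i < t_att else pk
              for i, pk in enumerate(p)]
    total = sum(scaled)
    return [s / total for s in scaled]

def counts_from_ratios(p, n_b, rng):
    """Round N_b * p_k to integers, then randomly add or remove rows to sum to N_b."""
    counts = [round(n_b * pk) for pk in p]
    active = [i for i, pk in enumerate(p) if pk > 0]
    while sum(counts) < n_b:
        counts[rng.choice(active)] += 1
    while sum(counts) > n_b:
        counts[rng.choice([j for j in active if counts[j] > 0])] -= 1
    return counts

# Example with a = 8, N_b = 128, f_optimal = 0.9, f_att = 1.0.
p = [1.0 / 8] * 8
triggered = needs_adaptation(l_cur=0.95, l_pre=1.0, f_optimal=0.9)
p_new = attenuate_ratios(p, t_att=2, f_att=1.0)      # two lowest SNRs removed
n_s = counts_from_ratios(p_new, 128, random.Random(0))
```

With $f_{att} = 1.0$ the scaled ratios become exactly zero, so the remaining six SNRs share the mini-batch; rounding $128/6$ gives 21 rows each, and the two missing rows are assigned at random.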
It can be observed that $N_s$ is obtained by rounding the product of the mini-batch size $N_b$ and the set of SNR ratios to the nearest integers (line 2). As the sum of $N_s$ should equal $N_b$, we may need to randomly adjust (add to or subtract from) the values of $N_s$ (lines 5-15).
Example: Let $a = 8$, the mini-batch size $N_b = 128$, and the SNR training set $S = \{s_k = k, 1 \le k \le a\}$. Then, the sets $P$ and $N_s$ can be determined as $P = \{p_k = 1/8, 1 \le k \le 8\}$ and $N_s = \{n_s(k) = 16, 1 \le k \le a\}$. In this example, we set the desired $f_{optimal} = 0.9$ and $f_{att} = 1.0$. This means that if the decoder cannot reduce the MBBCE value of the previous epoch by 10% (line 2 in Algorithm 1), the SNR adaptation will be triggered (lines 6-13 in Algorithm 2), and the set $P$ is updated accordingly.

B. AUTO-ADAPTIVE SNR ALGORITHM
Furthermore, one question comes to mind: is it possible for NNs to automatically adjust the number of codewords for each SNR? Reviewing the semi-adaptive SNR algorithm, we can conclude that the SNR adaptation is achieved by varying the set of SNR ratios $P$, which in turn changes $N_s$. This shows us how to let NNs learn to automatically adjust $N_s$. In the following, the Auto-Adaptive SNR algorithm is described in detail:

1) INITIALIZATION
• Set the values of $a$ and $N_b$; then, the sets $S$ and $N_s$ can be determined.
• Initialize the set of SNR ratios $P = \{p_k = n_s(k)/N_b, 1 \le k \le a\}$. It is important to change the status of $P$ to trainable when programming the algorithm. This is the key to enabling the NN to automatically adjust the values of $P$.

2) TRAINING
Algorithm 4 presents the realization of the auto-adaptive SNR algorithm. The code is simple and easy to implement. Because the set $P$ is set to be trainable, the NN can automatically adjust these weights in order to optimize the average MBBCE value. As in the semi-adaptive case, the set of SNR ratios $P$ is normalized and used in Algorithm 3 to adjust the number of codewords for each SNR.
Example: Let $a = 8$, the mini-batch size $N_b = 128$, and the training SNR set $S = \{s_k = k, 1 \le k \le a\}$. Then, the sets $P$ and $N_s$ can be determined as $P = \{p_k = 1/8, 1 \le k \le 8\}$ and $N_s = \{n_s(k) = 16, 1 \le k \le a\}$. Note that the status of the set $P$ must be set to trainable.
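To give a feel for what making $P$ trainable can lead to, the toy sketch below applies plain gradient steps to the ratios. It rests on a strong assumption that is not confirmed by the text: that the quantity being minimized is the $P$-weighted sum $\sum_k p_k L_{SNR}(k)$, so that the gradient with respect to $p_k$ is simply $L_{SNR}(k)$. The per-SNR loss values, the learning rate, and the function name are arbitrary illustrative choices, not results or settings from the paper, and the sketch is not a reproduction of Algorithm 4.

```python
def update_ratios(p, per_snr_loss, lr):
    """One gradient step on sum_k p_k * L_SNR(k), clipped at zero and renormalized."""
    # Under the assumed loss, d(loss)/d(p_k) = L_SNR(k).
    stepped = [max(pk - lr * lk, 0.0) for pk, lk in zip(p, per_snr_loss)]
    total = sum(stepped)
    if total == 0.0:
        return list(p)  # degenerate step: keep the previous ratios
    return [s / total for s in stepped]

# Arbitrary illustrative losses: larger at low SNR, smaller at high SNR.
per_snr_loss = [0.30, 0.25, 0.20, 0.15, 0.10, 0.06, 0.03, 0.01]

ratios = [1.0 / 8] * 8
for _ in range(5):
    ratios = update_ratios(ratios, per_snr_loss, lr=0.1)
```

Under this assumption, SNRs with a large loss lose weight at every step, so the normalized ratios drift toward the higher SNRs while several SNR values remain represented in the mini-batch.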

IV. RESULTS AND DISCUSSION
We present the results of training and applying the proposed algorithms to two different codes, the BCH(127, 106) code [24] and the (155, 64) LDPC code [25]. The decoders were built in Python 3.7 with the TensorFlow library. The training is performed by using mini-batch SGD, and the optimizer for training the neural network is set to RMSProp [26]. All other training-relevant hyperparameters are summarized in Table 1. Let the number of iterations $T$ of the SPA decoder and all neural decoders be 5. Therefore, all neural decoders have 10 hidden layers. The training dataset is generated by sending all-zero codewords with varying SNRs ranging from 1 dB to 8 dB. The testing dataset is constructed in the same way, except that the codewords are not all zero. The input value needs to be limited to the range of [−10, 10] to make the tanh calculation in (7) and (10) stable, as with the SPA decoder.
During the training stage, the decoders are trained on 30,000 batches and validated on 10,000 batches in each epoch. The early stopping function is used to monitor the average MBBCE calculated by (16) at the end of each epoch. Once the model performance measure stops improving for 5 consecutive epochs, training stops. Only the model with the best performance is saved. Then, this model is used to decode the non-all-zero codewords of various noise variances at the testing stage. For each SNR, the simulation stops once at least 100 frame errors have been observed or 4,000,000 frames have been simulated. If the performance is low, the model is reloaded and retrained.
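The early stopping rule described above can be sketched in a few lines. The function below is a generic patience-based rule written for illustration (the name and the sample loss values are hypothetical); it returns the epoch whose model would be kept and the epoch at which training stops.

```python
def early_stopping(epoch_losses, patience=5):
    """Return (best_epoch, stop_epoch) for a patience-based early stopping rule."""
    best_loss = float("inf")
    best_epoch = None
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(epoch_losses):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(epoch_losses) - 1

# Illustrative monitored values (not results from the paper).
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.30]
best_epoch, stop_epoch = early_stopping(losses, patience=5)
```

In this toy run the best value occurs at epoch 2, five epochs pass without improvement, and training stops at epoch 7, so the later value at epoch 8 is never reached.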
Figs. 4 and 5 depict the bit error rate for the BCH(127, 106) code and the (155, 64) LDPC code with and without the semi-/auto-adaptive SNR algorithms, respectively. For comparison, the performances of these decoders trained by using pure high SNR are also included. First, the decoders trained by different types of datasets are abbreviated as follows:
1. TypeU: The dataset is composed of codewords with uniform SNR (i.e., from 1 dB to 8 dB).
2. TypeH: The dataset is composed of codewords of pure high SNR with a value of 8 dB.
3. TypeS: The dataset is adjusted by the semi-adaptive SNR algorithm.
4. TypeA: The dataset is adjusted by the auto-adaptive SNR algorithm.
The distributions of the dataset $N_s$ for these TypeS decoders are very similar. This is because the distribution is directly influenced by $f_{att}$. In this study, the value of $f_{att}$ is set to 1, which means that if the average MBBCE value of the current epoch does not reach the desired target (i.e., $f_{optimal} \times L_{pre}$), the number of codewords with a small SNR (from 1 dB to 7 dB) will be set to zero in sequence. It is worth noting that different combinations of $f_{optimal}$ and $L_{pre}$ generate different distributions of the dataset $N_s$, which may cause slightly different training results. However, this is beyond the scope of the study.
For the TypeA decoders of the BCH(127, 106) code, the dataset $N_s$ consists of a large number of codewords with an SNR of 7 and 8 dB and partial codewords of 6 dB. For the (155, 64) LDPC code, the dataset $N_s$ consists of a large number of codewords with an SNR of 6-8 dB and partial codewords of 5 dB.
The average MBBCE values for these two codes are presented in Figs. 4(g) to (i) and Figs. 5(g) to (i), respectively. The TypeU decoder clearly has the worst MBBCE value. Because its dataset contains many codewords with a small SNR, the noise is too large to be useful for the NN to learn and recover the original codewords. The MBBCE value of the TypeS decoder shows sharp improvements in specific epochs, namely those in which the number of codewords of small SNR in the dataset $N_s$ has been set to zero in sequence. The MBBCE values of the TypeH and TypeA decoders improve rapidly at the second epoch. This is because the datasets for these two types of decoders are constructed from codewords with higher SNRs, so the noise is lower than that for the TypeU and TypeS decoders in the initial training epochs. Furthermore, we can observe that the TypeA decoder has better performance than the TypeH decoder. The main reason is that the dataset for the former consists of codewords with multiple SNRs, which is more helpful for the NN to learn and optimize the weights.
From the results in Figs. 4(j) to (l), we can observe that, for the BCH(127, 106) code, the TypeS decoder outperforms the other decoders. Compared with the TypeU decoder, the improvements of the TypeS decoder and the TypeA decoder vary from 0.3 dB to 0.6 dB. For the (155, 64) LDPC code, the performance of the TypeA decoder is better than that of the others, as shown in Figs. 5(j) to (l). Likewise, compared with the TypeU decoder, the improvements of the TypeS decoder and the TypeA decoder range from 0.2 dB to 0.6 dB. However, these neural decoders are still unable to compete with the existing excellent algorithms for the following possible reasons.
1. BCH code: Since the BCH(127, 106) code has a large number of short cycles in its graph, the SPA is essentially not suitable for decoding this code. Although these neural decoders outperform the original SPA decoders, they still have a long way to go before they can compete with the existing excellent algorithms [27], [28].
2. (155, 64) LDPC code: Generally, as the number of iterations increases, a higher coding gain can be obtained. As shown in Figs. 5(j) to (l), the SPA decoder with 50 iterations outperforms all neural decoders by at least 1.3 dB. For the neural SPA decoders, if the iteration number is set to 50, the NN will have 100 hidden layers, and such a deep network is hard to train because of the notorious vanishing gradient problem.
To build a deeper NN, our future work will apply the residual network (ResNet) [29] to these neural networks.
The results of these decoders are summarized in Table 2. As shown in Table 2, for the BCH(127, 106) code, the result is just the opposite of that for the (155, 64) LDPC code. This is because the (155, 64) LDPC code has fewer short cycles than the BCH(127, 106) code, so the performance improvement of the former requires more training time and high-quality training data to optimize the weights of the NN. Since the TypeS decoders sequentially attenuate the number of codewords of small SNRs during the training stage, the dataset for the TypeS decoders is composed of codewords of pure SNR with a value of 8 dB in the last few training epochs. Meanwhile, the dataset for the TypeA decoders consists of codewords with multiple SNRs, which is more helpful for the NN to learn and optimize the weights. Moreover, these simulation results show that the proposed semi-/auto-adaptive SNR algorithms can not only train the neural network to achieve better performance, but also reduce the convergence time.

V. CONCLUSION
As the training data is an important factor for training a neural network, two novel semi-/auto-adaptive SNR algorithms were proposed to adjust the composition of the training dataset. The codeword number for each SNR is changed according to the optimization of the average mini-batch BCE in each epoch. For exemplification, three types of DNN decoders for the BCH(127, 106) code and (155, 64) LDPC code were trained with/without the semi-/auto-adaptive SNR algorithms, respectively. Compared with the original neural decoders, the simulation results showed that the performance of these neural decoders for the BCH(127, 106) code can be improved in the range of 0.3 to 0.6 dB and that for the (155, 64) LDPC code can be improved by 0.2 to 0.6 dB. The proposed algorithms can be easily implemented without additional decoding complexity. Moreover, the training time for these decoders to achieve convergence could be reduced by up to 28.8% for the BCH(127, 106) code and up to 35.6% for the (155, 64) LDPC code.
However, these neural decoders trained with the BCH(127, 106) code and the (155, 64) LDPC code may not be sufficient to directly decode signals under poor communication conditions (i.e., in small SNR regions), because the noise is too large for the NN to recover the original codewords. An efficient way to overcome this problem is to increase the code length of the adopted ECCs. However, the main limitation is that the GPU hardware requirement increases significantly as the code length of the ECCs increases. Therefore, our future work will focus on how to integrate the ResNet into these neural decoders to build a deeper network with fewer weights and lower complexity. In addition, since the SVM-based system can predict the SNR value of the received signal with good accuracy, it is helpful for the design and training of the neural decoders. We can split the desired SNR values into parts and train these neural decoders with different SNR values, respectively. Then, these well-trained neural decoders can be deployed at the receiver end. Once the predicted SNR value is obtained, the receiver can immediately switch to the suitable neural decoder to recover the received signal.