SICNN: Soft Interference Cancellation Inspired Neural Network Equalizers

In recent years, data-driven machine learning approaches have been extensively studied to replace or enhance traditional model-based processing in digital communication systems. In this work, we focus on equalization and propose a novel neural network (NN)-based approach, referred to as SICNN. SICNN is designed by deep unfolding a model-based iterative soft interference cancellation (SIC) method, and it eliminates the main disadvantages of its model-based counterpart, which suffers from high computational complexity and performance degradation due to required approximations. We present different variants of SICNN. SICNNv1 is specifically tailored to single carrier frequency domain equalization (SC-FDE) systems, the communication system mainly regarded in this work. SICNNv2 is more universal and is applicable as an equalizer in any communication system with a block-based data transmission scheme. Moreover, for both SICNNv1 and SICNNv2, we present versions with highly reduced numbers of learnable parameters. Another contribution of this work is a novel approach for generating training datasets for NN-based equalizers, which significantly improves their performance at high signal-to-noise ratios. We compare the bit error ratio performance of the proposed NN-based equalizers with state-of-the-art model-based and NN-based approaches, highlighting the superiority of SICNNv1 over all other methods for SC-FDE. To emphasize its universality, SICNNv2 is additionally applied to a unique word orthogonal frequency division multiplexing (UW-OFDM) system, where it achieves state-of-the-art performance. Furthermore, we present a thorough complexity analysis of the proposed NN-based equalization approaches, and we investigate the influence of the training set size on the performance of NN-based equalizers.


I. INTRODUCTION
Digital communications at the physical layer is traditionally a largely model-based discipline.
That is, especially for the receiver processing blocks of digital communication systems, most algorithms have been developed based on physical and statistical models of the communication chain. With this established approach, well-interpretable methods can be obtained, their performance bounds can often be specified, and algorithms achieving optimal performance for the given models can usually be derived. Besides these advantageous properties, model-based approaches also have some downsides. Performance-optimal methods can in some cases exhibit an infeasible computational complexity, requiring the application of suboptimal algorithms in practice. Further, modeling errors, wrong (or oversimplified) assumptions, or insufficient model knowledge may lead to a considerable performance degradation. Since data-driven machine learning methods can resolve many of the drawbacks of model-based approaches, intensive research is currently conducted on machine learning approaches for several applications in communications engineering. This includes possible future scenarios like communications assisted by reconfigurable intelligent surfaces (RISs) [2], molecular communications [3], or integrated sensing and communication [4]. However, promising results can also be achieved by means of machine learning in traditional wireless communication systems. This involves completely abandoning the block-based paradigm of current digital communication system design with the help of end-to-end learning [5], [6], or replacing/enhancing individual blocks of a standard communication chain [7]-[9]. The latter includes machine learning approaches for channel estimation [10], [11], channel decoding [12], and self-interference cancellation [13]-[15]. In this work, we regard another important processing block at the receiver, namely equalization. Equalization, also referred to as data estimation, is the task of reconstructing transmitted data, distorted during transmission over a channel, at the
receiver side of a communication system. Typically, equalization is conducted by model-based methods. However, auspicious results have already been demonstrated with machine learning methods as well [16]-[25]. In current publications on machine learning approaches for data estimation, mainly neural networks (NNs) are employed. NNs are known to be universal function approximators [26] and thus are expected to approximate the optimal data estimators. Many of the presented results are promising, but some new challenges arise as well. More specifically, standard NNs like a fully-connected feedforward NN (FCNN) are black-box approaches, i.e., their inference is not interpretable, performance bounds can hardly be derived, and domain knowledge is not exploited. Especially due to the latter fact, most NNs suffer from requiring large amounts of training data and a high inference complexity. Optimally, one fuses model-based and data-driven approaches by, e.g., incorporating existing model knowledge into NNs, which is expected to lead to less complex and better performing NNs than standard black-box NNs. One possibility of incorporating model knowledge into NNs is to design their layer structure accordingly, which leads to NNs we refer to as model-inspired NNs. Currently, one of the most promising and most popular approaches for obtaining NNs with a model-inspired layer structure is deep unfolding [27]. The idea of deep unfolding is to take a model-based iterative algorithm, which is conceived for finding the solution of an optimization problem, fix its number of iterations, and unfold every iteration to a layer of an NN. Depending on the aspired abstraction level of the NN (i.e., the similarity between the model-based algorithm and the NN), only a few parameters of the model-based iterative algorithm (e.g., its step size) or even whole parts are replaced by learnable parameters or modules, respectively. Those can then be optimized with tools known from NN optimization by
utilizing available training data. A number of NN-based data estimators, e.g., the NNs in [18], [20], [22]-[25], are designed by employing deep unfolding. In this work, we also apply deep unfolding for the design of our proposed NN-based equalizers.
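As a toy illustration of the deep unfolding principle described above (not the SICNN construction itself), the following sketch unrolls a fixed number of gradient descent iterations for a least squares problem into "layers", each with its own step size, which would be the learnable parameter in an actual unfolded NN; all names are illustrative:

```python
import numpy as np

def unfolded_gd(H, y, step_sizes):
    """Forward pass of a toy 'deep-unfolded' estimator.

    Each of the fixed number of gradient descent iterations on
    ||y - H d||^2 becomes one 'layer'; the per-layer step sizes play
    the role of the learnable parameters (here simply given, not trained).
    """
    d = np.zeros(H.shape[1])
    for mu in step_sizes:                 # one loop pass == one NN layer
        d = d - mu * H.T @ (H @ d - y)    # gradient step with layer-specific mu
    return d
```

In a trained unfolded network, the `step_sizes` (and possibly whole sub-modules) would be optimized on data instead of being fixed a priori.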
Most NN-based data estimators are currently proposed for equalization in multiple-input multiple-output (MIMO) communication systems, often assuming data transmission over an uncorrelated Rayleigh fading channel. In this work, the developed NN-based equalizers are mainly evaluated for single carrier frequency domain equalization (SC-FDE) systems [28], [29]. In an SC-FDE system, a single carrier transmission scheme is utilized, but the payload data is transmitted in a block-wise manner with guard intervals between successive blocks, as is the case in orthogonal frequency division multiplexing (OFDM) systems. The received blocks are transformed to frequency domain before conducting matched filtering, downsampling, and equalization. This allows an efficient receiver implementation [29] and results in a system model similar to that of an OFDM system. In this work, we regard employing both a cyclic prefix (CP) and a so-called unique word (UW), which is a known deterministic sequence, as guard interval. The UW can advantageously be utilized, e.g., for synchronization purposes [30], however, at the cost of equalization complexity. For a CP guard interval, the optimal linear equalizer is a low-complexity single-tap equalizer, while for a UW guard interval, in turn, the optimal linear equalizer is more complex. In contrast to CP-OFDM systems, for SC-FDE optimal performance can only be obtained with computationally highly demanding nonlinear equalizers. This motivates developing NN-based data estimators for SC-FDE systems. Employing NN-based equalizers for SC-FDE systems necessitates, in contrast to MIMO systems over uncorrelated Rayleigh fading channels, an additional pre-processing step. As extensively described in [31], for the application of NN-based equalizers in SC-FDE systems a data normalization scheme is required for a well-behaved NN training and thus a satisfying performance of the NN equalizers. We also briefly review the necessary data normalization scheme in this work.

Contribution
In this work, we propose the NN-based data estimators SICNNv1 and SICNNv2, which are designed by unfolding an iterative soft interference cancellation (SIC) method [32]. The main idea of iterative SIC is that in each iteration every single data symbol in the transmitted data vector is estimated on its own, by considering the influence of all other data symbols in the data vector as interference. This interference can be mitigated by incorporating estimates of the data symbols from the previous iteration into the current one. By that, the data symbol estimates are refined from iteration to iteration. Although DeepSIC [23] is also inspired by the same iterative SIC method, SICNNv1 and SICNNv2 are fundamentally different from this NN. DeepSIC adopts the idea of SIC concerning its structure. That is, DeepSIC consists of multiple stages, where each stage is comprised of as many sub-FCNNs as there are data symbols in the transmitted data vector. Each of the sub-FCNNs is utilized to estimate one data symbol, whereby the input data of a sub-FCNN is made up of the received vector as well as of the estimates provided by the sub-FCNNs for the remaining data symbols from the last stage. All estimates are refined stage by stage. Hence, DeepSIC resembles the model-based SIC method only in refining the estimates of the posterior data symbol probabilities; neither is interference cancellation conducted within a stage, nor is model knowledge utilized. In contrast, our proposed NN-based equalizers are far more similar to the underlying model-based method. More specifically, with SICNNv1, an adapted version of an NN-based equalizer called SICNN proposed in our previous work [1], we try to closely resemble the model-based iterative SIC method. However, we replace numerically demanding, computationally intensive operations, for which approximations also have to be made in the model-based approach, by low-complexity NNs. This NN-based approach achieves significantly better performance than
the corresponding model-based method, while exhibiting lower complexity. We tailor SICNNv1 for being employed as an NN-based equalizer in an SC-FDE system by exploiting some properties of this communication system in the NN architecture design. SICNNv2, in turn, is more abstracted from the model-based iterative SIC method, i.e., less model knowledge is utilized for the NN architecture design. However, it is more universal and can be applied as an equalizer in any communication system with a block-based data transmission scheme. A further difference between DeepSIC and the proposed SICNNv1 and SICNNv2 is their generalization ability regarding different channels. As is the case for, e.g., MMNet [21] or ViterbiNet [33], DeepSIC is trained for one specific channel. This generally allows less complex NNs, but requires retraining as soon as the channel changes. SICNNv1 and SICNNv2 belong, like DetNet [18] or OAMP-Net [20], to the group of NN-based data estimators which are trained with different channels sampled from a statistical channel model, and use the actual channel realization as an input. These NNs generally require extensive offline training and exhibit a higher computational inference complexity, but they do not have to be retrained as long as the specified statistical channel model is valid for the operating environment.
Since in every stage of SICNNv1/SICNNv2 the same task has to be fulfilled, namely to refine estimated posterior data symbol probabilities, we additionally introduce two modified versions of SICNNv1 and SICNNv2. While in SICNNv1 and SICNNv2 different sub-NNs are utilized in every stage to estimate posterior data symbol probabilities, in SICNNv1Red and SICNNv2Red every stage uses the same sub-NNs, which drastically reduces the number of parameters to be trained.
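The effect of sharing sub-NNs across stages on the parameter count can be sketched as follows; the layer sizes and the counting helpers are hypothetical and only illustrate the Q-fold reduction, not the actual SICNN architectures:

```python
def fcnn_param_count(layer_sizes):
    """Number of weights and biases of a fully-connected NN."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def sicnn_param_count(Q, layer_sizes, shared):
    """Toy count for Q stages with one sub-NN per stage: shared=False mimics
    per-stage sub-NNs, shared=True mimics the 'Red' weight-tied variants."""
    per_stage = fcnn_param_count(layer_sizes)
    return per_stage if shared else Q * per_stage
```

With, e.g., Q = 8 stages, the weight-tied variant needs exactly one eighth of the learnable parameters of the untied one.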
We compare the proposed NN-based equalizers with state-of-the-art model-based and NN-based data estimators concerning the achieved bit error ratio (BER) performance, and regarding their computational complexity during inference. The evaluation is conducted for SC-FDE systems, either employing a UW or a CP as a guard interval, for both quadrature phase shift keying (QPSK) and 16-QAM (quadrature amplitude modulation) alphabets, and with perfect and imperfect channel knowledge at the receiver. We investigate the required amount of training data for achieving a satisfying performance of selected NN-based equalizers, pointing out the advantage of reducing the number of learnable parameters of an NN.
Further, we demonstrate the universal applicability of SICNNv2 by presenting its achieved performance for a communication system employing the so-called UW-OFDM signaling scheme.
As another important contribution of this paper, we present a novel approach to generate training sets for NN-based equalizers. In this approach, only those sample data transmissions are included in the training set for which the number of data symbol estimation errors made by a baseline equalizer exceeds a specified quantity. This greatly enhances the performance of NN-based data estimators at high signal-to-noise ratios (SNRs).
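The dataset-generation idea can be sketched in a few lines; the helper `build_error_focused_set` and the threshold name `min_errors` are illustrative, not from the paper:

```python
import numpy as np

def build_error_focused_set(samples, baseline_errors, min_errors=1):
    """Keep only those sample transmissions for which a baseline equalizer
    made at least `min_errors` data symbol estimation errors."""
    baseline_errors = np.asarray(baseline_errors)
    return [s for s, n in zip(samples, baseline_errors) if n >= min_errors]
```

At high SNRs most randomly drawn transmissions are error-free for the baseline, so this filtering concentrates the training set on the difficult channel/noise realizations.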
The remainder of this paper is structured as follows. In Sec. II, we review the SC-FDE signaling scheme and data transmission model, and we discuss state-of-the-art model-based equalizers, including in particular the iterative SIC method our NN-based equalizers are inspired by. In Sec. III, the novel NN-based equalizers are introduced and discussed in detail. Further, we propose a novel approach for generating training datasets for NN-based data estimators in Sec. IV. We present BER performance results, and an in-depth analysis of the computational complexity of the regarded model-based and NN-based equalizers in Sec. V.

Notation
Throughout this paper, we use lower-case bold face letters x for vectors and upper-case bold face letters X for matrices, x_k for the kth element of x, [X]_{kj} for the element of X in row k and column j, and [X]_{k*} for the kth row of X. Further, (·)^T, (·)^H, and (·)^* indicate transposition, conjugate transposition, and conjugation, respectively, while |X| is the determinant of the matrix X. We denote the probability density function (PDF) of a continuous random variable as p(·), the probability mass function (PMF) of a discrete random variable as p[·], a conditional PMF of a random variable a given b as p[a|b], and a PMF evaluated at the value ã as p[a = ã]. We describe the expectation operator averaging over the PDF/PMF of a random variable a as E_a[·], where the subscript of the expectation operator is omitted when the averaging PDF/PMF is clear from context.

II. PRELIMINARIES
In this section, we describe the system model for SC-FDE and state-of-the-art model-based equalization approaches. Further, we discuss an iterative SIC approach for data estimation, and we highlight some properties of this method.

A. Single Carrier Frequency Domain Equalization
In an SC-FDE communication system [28], [29], [34], [35], a single carrier modulation scheme is employed for data transmission. At the transmitter, the data symbols to be transmitted, which are drawn from a modulation alphabet S (in this work, we mainly use QPSK as a modulation alphabet), are grouped into blocks of length N_d. These blocks of data symbols, which we refer to as data vectors d ∈ S^{N_d}, are strung together to generate a transmit data burst, whereby they are separated by guard intervals of length N_g. As a guard interval, in this work we consider using either a CP or a UW, which is a deterministic sequence known by the receiver. Depending on the employed guard interval, some processing steps in the receiver differ, as described later in this section. The transmit data burst is upsampled and pulse shaped with a root-raised-cosine (RRC) filter, followed by transmitting the resulting signal over a multipath channel, which is additionally disturbed by additive white Gaussian noise (AWGN). At the receiver, the first processing step depends on the employed guard interval. While for a UW guard interval every received data vector including its succeeding guard interval is transformed individually to frequency domain, for a CP guard interval the CPs are removed first before transforming the remaining received data vectors individually to frequency domain. In frequency domain, the further processing steps matched filtering, downsampling, and equalization are conducted. Independent of the employed guard interval, the general model of the transmission of a data vector up to the input of the equalizer in the equivalent complex baseband can be written as [36]

$$\mathbf{y}_r = \mathbf{H}\mathbf{F}_{N'}\mathbf{x} + \mathbf{w}. \quad (1)$$
Here, y_r ∈ C^{N'} is the received vector after matched filtering and downsampling in frequency domain, where N' depends on the employed guard interval and is specified later in this section. H ∈ R^{N'×N'} is a diagonal matrix containing the sampled frequency response of the cascade of upsampler, pulse shaping filter, multipath channel, matched filter, and downsampler on its main diagonal. Note that H is a real-valued matrix since we conduct optimal matched filtering in frequency domain, i.e., the filter is matched to the channel-distorted transmit pulse (for further details on optimal matched filtering in SC-FDE systems, we refer to [37], [38]). Furthermore, $\mathbf{F}_{N'} \in \mathbb{C}^{N'\times N'}$ is the N'-point discrete Fourier transform (DFT) matrix and $\mathbf{w}\sim\mathcal{CN}(\mathbf{0}, N'\sigma_n^2\mathbf{H})$ is circularly symmetric complex AWGN, with σ_n² being the variance of the AWGN in time domain. The structure of the transmitted vector x ∈ C^{N'} as well as the final system model differ for UW and CP guard intervals, which we further detail in the following.
1) Unique Word Guard Interval: As already mentioned, in case of a UW guard interval [34], [35], at the receiver both a received data vector and its succeeding UW are transformed to frequency domain for the further processing steps. Hence, the vector x ∈ C^{N'} in (1) has the form $\mathbf{x} = [\mathbf{d}^T, \mathbf{u}^T]^T$, where d ∈ S^{N_d} is the transmitted data vector to be estimated, u ∈ C^{N_g} is the UW, and N' = N = N_d + N_g, i.e.,

$$\mathbf{y}_r = \mathbf{H}\mathbf{F}_N\mathbf{x} + \mathbf{w} = \mathbf{H}(\mathbf{M}_{\mathrm{uw}}\mathbf{d} + \mathbf{M}'\mathbf{u}) + \mathbf{w}. \quad (2)$$

By assuming perfect channel knowledge on receiver side, the influence of the known UW u on the received vector y_r can be removed according to

$$\mathbf{y} = \mathbf{y}_r - \mathbf{H}\mathbf{M}'\mathbf{u} = \mathbf{H}\mathbf{M}_{\mathrm{uw}}\mathbf{d} + \mathbf{w}, \quad (3)$$

where $\mathbf{M}_{\mathrm{uw}} \in \mathbb{C}^{N\times N_d}$ and $\mathbf{M}' \in \mathbb{C}^{N\times N_g}$ are built by the first N_d columns and the remaining N_g columns of F_N, respectively.
2) Cyclic Prefix Guard Interval: In case of a CP guard interval [28], [29], the guard intervals are removed at the receiver before transforming the received blocks of data to frequency domain, which means that x in (1) is realized as the transmitted data vector d ∈ S^{N_d}, and N' = N_d. Consequently, for a CP guard interval, the data transmission is modeled as

$$\mathbf{y} = \mathbf{H}\mathbf{F}_{N_d}\mathbf{d} + \mathbf{w}. \quad (4)$$

3) System Model for Single Carrier Frequency Domain Equalization: As elaborated above, the model for data transmission in an SC-FDE system is given by (3) for a UW guard interval and by (4) for a CP. For ease of notation, in the remainder of this work the SC-FDE system model is given for both guard intervals by

$$\mathbf{y} = \tilde{\mathbf{H}}\mathbf{d} + \mathbf{w}, \quad (5)$$

with y ∈ C^{N'}, M ∈ C^{N'×N_d}, and $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{M}$, where we employ for a

• UW guard interval: $\mathbf{M} = \mathbf{M}_{\mathrm{uw}}$ and N' = N,
• CP guard interval: $\mathbf{M} = \mathbf{F}_{N_d}$ and N' = N_d.

4) Model-Based Equalization: Based on (5), data estimation can be conducted for a given received vector y and a channel matrix $\tilde{\mathbf{H}}$. As thoroughly elucidated in [39], depending on the optimality criterion, there exist different optimal equalizers. The bit-wise maximum a-posteriori (MAP) estimator yields for every transmitted bit the bit value featuring the highest posterior probability. It is known to be the optimal estimator regarding the bit error probability. The vector MAP, in turn, is optimal regarding the error probability of the data vector estimate. However, the computational complexity of both of the aforementioned estimators grows exponentially with the data vector length N_d, which makes them in general prohibitive for practical applications. Hence, one usually has to resort to suboptimal linear or nonlinear estimation methods.
The best linear estimator in the Bayesian sense is the linear minimum mean square error (LMMSE) estimator, which is given by [40]

$$\hat{\mathbf{d}}_{\mathrm{LMMSE}} = \Big(\tilde{\mathbf{H}}^H \big(N'\sigma_n^2\mathbf{H}\big)^{-1}\tilde{\mathbf{H}} + \tfrac{1}{\sigma_d^2}\mathbf{I}\Big)^{-1}\tilde{\mathbf{H}}^H \big(N'\sigma_n^2\mathbf{H}\big)^{-1}\mathbf{y}, \quad (6)$$

with I and σ_d² being the identity matrix with appropriate dimensions and the variance of the symbol alphabet, respectively. In case of a CP guard interval (M = F_{N_d}), (6) can be simplified to [35]

$$\hat{\mathbf{d}}_{\mathrm{LMMSE}} = \tfrac{1}{N'}\mathbf{M}^H\Big(\mathbf{H} + \tfrac{\sigma_n^2}{\sigma_d^2}\mathbf{I}\Big)^{-1}\mathbf{y}, \quad (7)$$

where $\mathbf{H} + \frac{\sigma_n^2}{\sigma_d^2}\mathbf{I}$ is a diagonal matrix, and thus also the inversion required to compute this estimator matrix can be realized efficiently. Typically, instead of multiplying by $\frac{1}{N'}\mathbf{M}^H$, an inverse DFT is conducted. Note that this low-complexity equalizer can also be employed for a UW guard interval when applying LMMSE estimation to (2) instead of (3), i.e., the known UW u is not removed before data estimation, but is estimated as well. Compared to the LMMSE estimator (6), this approximate LMMSE estimator allows a lower-complexity equalization, however, at the cost of a performance degradation [35].
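The CP-case simplification can be verified numerically. The following sketch (our own reconstruction with illustrative dimensions, not code from the paper) compares the general Bayesian LMMSE solution with the single-tap frequency-domain scaling followed by an IDFT:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                   # illustrative block length
sigma_d2, sigma_n2 = 1.0, 0.1

# N-point DFT matrix and a (real, positive) sampled frequency response
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
h = rng.uniform(0.5, 1.5, N)
Ht = np.diag(h) @ F                     # effective channel for the CP case
Cw_diag = N * sigma_n2 * h              # AWGN covariance (diagonal entries)

d = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)
w = np.sqrt(Cw_diag / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = Ht @ d + w

# general LMMSE solution
A = Ht.conj().T @ (Ht / Cw_diag[:, None]) + np.eye(N) / sigma_d2
d_lmmse = np.linalg.solve(A, Ht.conj().T @ (y / Cw_diag))

# CP simplification: per-bin (single-tap) scaling followed by an IDFT
d_singletap = F.conj().T @ (y / (h + sigma_n2 / sigma_d2)) / N
```

Both estimates coincide up to numerical precision, while the single-tap form avoids any full matrix inversion.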
A popular suboptimal nonlinear estimator is the decision feedback equalizer (DFE), which is an iterative method. There, in every iteration, LMMSE estimation of the data symbol with the smallest error variance is conducted, followed by removing the influence of the hard decision data symbol estimate on the received vector. However, in case of wrong data symbol estimates, this method suffers from error propagation deteriorating the estimation performance. For more details, we refer to [39], where the DFE is elaborated for a so-called unique word orthogonal frequency division multiplexing (UW-OFDM) system. In the following, we address another suboptimal nonlinear method, namely iterative soft interference cancellation (SIC), in more detail, since the proposed NN-based equalizers are deduced from this model-based approach.
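A strongly simplified DFE sketch (our own illustration for a generic model y = H̃d + w with QPSK, not the UW-OFDM variant of [39]) conveys the order of operations: estimate, pick the most reliable symbol, hard-decide, and feed the decision back:

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def dfe(y, Ht, sigma_d2, Cw):
    """Per iteration: LMMSE-estimate the remaining symbols, hard-decide the
    one with the smallest error variance, subtract (feed back) its
    contribution, and repeat until all symbols are decided."""
    y = y.astype(complex).copy()
    Cw_inv = np.linalg.inv(Cw)
    remaining = list(range(Ht.shape[1]))
    d_hat = np.zeros(Ht.shape[1], dtype=complex)
    while remaining:
        Hr = Ht[:, remaining]
        # LMMSE error covariance and estimate of the remaining symbols
        G = np.linalg.inv(Hr.conj().T @ Cw_inv @ Hr
                          + np.eye(len(remaining)) / sigma_d2)
        est = G @ Hr.conj().T @ Cw_inv @ y
        i = int(np.argmin(np.real(np.diag(G))))            # most reliable symbol
        k = remaining.pop(i)
        d_hat[k] = QPSK[np.argmin(np.abs(QPSK - est[i]))]  # hard decision
        y -= Ht[:, k] * d_hat[k]                           # feedback cancellation
    return d_hat
```

The hard decision in the feedback path is exactly what makes the DFE vulnerable to error propagation, which the soft cancellation of the SIC method avoids.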

B. Iterative Soft Interference Cancellation
The idea of the iterative SIC method proposed in [32] is to estimate each data symbol d_k, k ∈ {0, ..., N_d − 1}, in the data vector d separately, and to refine the estimates from iteration to iteration. For the estimation of the kth data symbol d_k, all other data symbols d_l, l ≠ k, are treated as interference, and thus their influence on the received vector y is cancelled as far as possible. Since the data symbols d_l are unknown, their currently available estimates d̂_l are utilized for interference cancellation. Interference cancellation reduces the unknown variable to be estimated from an N_d-dimensional data vector d to a single data symbol d_k, and thus nonlinear minimum mean square error (MMSE) estimation, which is generally computationally infeasible for estimating d, can easily be applied for estimating d_k. In order to prevent error propagation, instead of using hard decision data symbol estimates for interference cancellation, soft information of every data symbol estimate from the previous iteration is utilized in form of the MMSE estimate, which is the posterior mean, and the corresponding conditional mean square error (MSE). The soft estimates are refined iteratively.

Algorithm 1 Model-based iterative SIC for SC-FDE.
 3: for q = 0, ..., Q − 1 do
 4:   for k = 0, ..., N_d − 1 do
 5:     Compute y_ic,k^(q) according to (11)
 6:     Compute C̃_vv,k^(q) following (12) for q = 0, or (14) for q > 0
 7:     Update soft information: d̂_k^(q) using (18) and e_k^(q) via (19)
 9:   end for
10: end for
11: return d̂^(Q−1)
12: end function

In [32], this approach is proposed for a MIMO system where all entries of the channel matrix H are independent of each other (all entries of the channel matrix are modeled as independent random variables following a normal distribution). This allows for simplifications in the iterative SIC method that cannot be applied in general. In the following, we present the iterative SIC method following the approach proposed in [32]. However, we adapt this method, which is also summarized in Algorithm 1, for an SC-FDE system, where the assumption of independent elements of H is not fulfilled.
Let us regard the qth iteration, q = 0, ..., Q − 1, of Q total iterations of the iterative SIC method. We assume that for every data symbol d_k, k ∈ {0, ..., N_d − 1}, a soft estimate from the previous iteration (q − 1) is available, namely the MMSE data symbol estimate $\hat d_k^{(q-1)}$, which is the posterior mean, and the corresponding MSE $e_k^{(q-1)}$, given $\mathbf y_{\mathrm{ic},k}^{(q-1)}$. Here, $\mathbf y_{\mathrm{ic},k}^{(q-1)}$ is the received vector without the interference of all but the kth data symbol estimates in iteration (q − 1). For the estimation of a data symbol d_k, one can reformulate the system model (5) to

$$\mathbf y = \tilde{\mathbf h}_k d_k + \tilde{\mathbf H}_{\bar k}\mathbf d_{\bar k} + \mathbf w, \quad (10)$$

where $\tilde{\mathbf H}_{\bar k}$ is $\tilde{\mathbf H}$ after removing the kth column $\tilde{\mathbf h}_k$, and $\mathbf d_{\bar k}$ is the data vector without the kth data symbol d_k. The term $\tilde{\mathbf H}_{\bar k}\mathbf d_{\bar k}$ denotes the interference caused by all but the kth data symbol in the data vector, which should ideally be removed from y for the estimation of d_k. SIC can be conducted by removing $\tilde{\mathbf H}_{\bar k}\hat{\mathbf d}_{\bar k}^{(q-1)}$ from y, leading to

$$\mathbf y_{\mathrm{ic},k}^{(q)} = \mathbf y - \tilde{\mathbf H}_{\bar k}\hat{\mathbf d}_{\bar k}^{(q-1)} = \tilde{\mathbf h}_k d_k - \tilde{\mathbf H}_{\bar k}\boldsymbol\delta_{\bar k}^{(q-1)} + \mathbf w = \tilde{\mathbf h}_k d_k + \mathbf v_k^{(q)}, \quad (11)$$

where $\hat{\mathbf d}_{\bar k}^{(q-1)}$ consists of all but the kth data symbol estimates from iteration (q − 1), and $\boldsymbol\delta_{\bar k}^{(q-1)} = \hat{\mathbf d}_{\bar k}^{(q-1)} - \mathbf d_{\bar k}$ contains the (unknown) data symbol estimation errors from the previous iteration step. For estimating d_k based on (11), the statistics of the total noise vector $\mathbf v_k^{(q)} = \mathbf w - \tilde{\mathbf H}_{\bar k}\boldsymbol\delta_{\bar k}^{(q-1)}$, which is composed of the Gaussian noise vector w and the noise due to data symbol estimation errors, have to be specified. We start by considering the noise statistics for the first iteration (q = 0). As we will elaborate later in this section, initializing the data symbol estimates with the mean of the symbol alphabet is a rational choice, i.e., $\hat{\mathbf d}^{(-1)} = \mathbf 0$, such that $\mathbf y_{\mathrm{ic},k}^{(0)} = \mathbf y$ and $\boldsymbol\delta_{\bar k}^{(-1)} = -\mathbf d_{\bar k}$. Assuming independent and identically distributed (i.i.d.)
data symbols with uniform prior probability and reasonably large N_d, following central limit theorem arguments, $\tilde{\mathbf H}_{\bar k}\mathbf d_{\bar k}$ can be considered to follow a circularly symmetric complex Gaussian distribution with zero mean, and to be independent of w. Hence, $\mathbf v_k^{(0)}$ approximately also follows a circularly symmetric complex Gaussian distribution with zero mean and a covariance matrix

$$\mathbf C_{vv,k}^{(0)} = \sigma_d^2\,\mathbf H\mathbf M_{\bar k}\mathbf M_{\bar k}^H\mathbf H + N'\sigma_n^2\mathbf H, \quad (12)$$

where $\mathbf M_{\bar k}$ is the matrix M without the kth column. For all further iterations (q > 0), we start by specifying the type of the statistical distribution of the vector $\boldsymbol\delta_{\bar k}^{(q-1)}$. Based on central limit theorem arguments and unbiased MMSE estimates (cf. Appendix A), $\boldsymbol\delta_{\bar k}^{(q-1)}$ can be approximated to follow a circularly symmetric complex Gaussian distribution with zero mean, and thus the same assumption holds for $\mathbf v_k^{(q)}$. As shown in Appendix A, the noise covariance matrix $\mathbf C_{vv,k}^{(q)}$ is given by

$$\mathbf C_{vv,k}^{(q)} = \tilde{\mathbf H}_{\bar k}\mathbf E_{\bar k}^{(q-1)}\tilde{\mathbf H}_{\bar k}^H + N'\sigma_n^2\mathbf H - \tilde{\mathbf H}_{\bar k}\mathrm E\big[\boldsymbol\delta_{\bar k}^{(q-1)}\mathbf w^H\big] - \mathrm E\big[\mathbf w\,\boldsymbol\delta_{\bar k}^{(q-1)H}\big]\tilde{\mathbf H}_{\bar k}^H, \quad (13)$$

where $\mathbf E_{\bar k}^{(q-1)} = \mathrm E\big[\boldsymbol\delta_{\bar k}^{(q-1)}\boldsymbol\delta_{\bar k}^{(q-1)H}\big]$ is the conditioned error covariance matrix. For the off-diagonal entries of $\mathbf E_{\bar k}^{(q-1)}$ and for the third and the fourth term in (13), no exact closed form solution is available. A possible workaround is to employ the approximation

$$\tilde{\mathbf C}_{vv,k}^{(q)} = \tilde{\mathbf H}_{\bar k}\tilde{\mathbf E}_{\bar k}^{(q-1)}\tilde{\mathbf H}_{\bar k}^H + N'\sigma_n^2\mathbf H, \quad (14)$$

where $\tilde{\mathbf E}_{\bar k}^{(q-1)} = \mathrm{diag}\big(e_0^{(q-1)}, ..., e_{k-1}^{(q-1)}, e_{k+1}^{(q-1)}, ..., e_{N_d-1}^{(q-1)}\big)$, i.e., correlations between the data symbol estimates as well as correlations between the estimation errors and the AWGN noise are neglected. However, this may lead to inaccuracies in the estimation process. Especially at high SNRs, for deep fading channels, and when q is increasing (the $e_k^{(q-1)}$ are then usually becoming smaller), $\tilde{\mathbf C}_{vv,k}^{(q)}$ can become ill-conditioned, which is, in combination with the occurring approximation errors, an issue for computing its inverse required for the next steps in the estimation process.
For computing a data symbol estimate $\hat d_k^{(q)}$, the posterior PMF $p[d_k\,|\,\mathbf y_{\mathrm{ic},k}^{(q)}]$ is needed, which can be obtained via the Bayesian rule by utilizing the likelihood function $p(\mathbf y_{\mathrm{ic},k}^{(q)}\,|\,d_k)$. Due to the Gaussian assumption on $\mathbf v_k^{(q)}$, the likelihood function can be split into a factor which is independent of d_k, and a function depending on d_k. With the above stated results at hand, the MMSE data symbol estimate $\hat d_k^{(q)}$ and its corresponding MSE $e_k^{(q)}$ can then be computed as

$$\hat d_k^{(q)} = \sum_{s\in\mathcal S} s\; p\big[d_k = s\,\big|\,\mathbf y_{\mathrm{ic},k}^{(q)}\big] \quad (18)$$

and

$$e_k^{(q)} = \sum_{s\in\mathcal S} \big|s - \hat d_k^{(q)}\big|^2\, p\big[d_k = s\,\big|\,\mathbf y_{\mathrm{ic},k}^{(q)}\big], \quad (19)$$

where a uniform prior PMF p[d_k] is assumed.
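The per-symbol soft estimation step, i.e., evaluating the Gaussian likelihood on the alphabet and forming the posterior mean and conditional MSE under a uniform prior, can be sketched as follows (illustrative NumPy, assuming the precision matrix is given):

```python
import numpy as np

def soft_symbol_estimate(y_ic, h_k, C_inv, alphabet):
    """MMSE estimate and conditional MSE of one data symbol after SIC,
    assuming p(y_ic | d_k = s) ~ exp(-(y_ic - h_k s)^H C^{-1} (y_ic - h_k s))
    and a uniform prior over the alphabet."""
    log_lik = np.array([
        -np.real((y_ic - h_k * s).conj() @ C_inv @ (y_ic - h_k * s))
        for s in alphabet])
    post = np.exp(log_lik - log_lik.max())   # subtract max for stability
    post /= post.sum()                       # posterior PMF of d_k
    d_hat = np.sum(alphabet * post)          # posterior mean
    e_k = np.sum(np.abs(alphabet - d_hat) ** 2 * post)  # conditional MSE
    return d_hat, e_k, post
```

Subtracting the maximum log-likelihood before exponentiation avoids numerical underflow at high SNRs, where the exponents become strongly negative.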
Updating the data symbol soft estimates concludes one iteration. The succeeding iteration starts with conducting SIC following (11).
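Putting the pieces together, one full run of the iterative SIC loop with the diagonal error covariance approximation can be sketched as follows (illustrative NumPy for a generic y = H̃d + w; the dimensions and the QPSK alphabet are assumptions):

```python
import numpy as np

def iterative_sic(y, Ht, Cw, alphabet, Q=3):
    """Sketch of the iterative SIC loop: soft interference cancellation,
    Gaussian posterior over the alphabet with the diagonal error covariance
    approximation, and update of the soft information, for Q iterations."""
    Nd = Ht.shape[1]
    sigma_d2 = np.mean(np.abs(alphabet) ** 2)
    d_hat = np.zeros(Nd, dtype=complex)       # prior mean initialization
    e = np.full(Nd, sigma_d2)                 # prior symbol variances
    for _ in range(Q):
        d_new, e_new = d_hat.copy(), e.copy()
        for k in range(Nd):
            mask = np.arange(Nd) != k
            Hk = Ht[:, mask]
            y_ic = y - Hk @ d_hat[mask]                   # soft IC
            C = Hk @ (e[mask, None] * Hk.conj().T) + Cw   # approx. noise cov.
            C_inv = np.linalg.inv(C)
            ll = np.array([-np.real((y_ic - Ht[:, k] * s).conj()
                                    @ C_inv @ (y_ic - Ht[:, k] * s))
                           for s in alphabet])
            p = np.exp(ll - ll.max())
            p /= p.sum()                                  # posterior PMF
            d_new[k] = np.sum(alphabet * p)               # posterior mean
            e_new[k] = np.sum(np.abs(alphabet - d_new[k]) ** 2 * p)
        d_hat, e = d_new, e_new
    return d_hat
```

All symbols within one iteration use the estimates of the previous iteration, matching the structure of the loop above; the explicit matrix inversion per symbol and iteration is exactly the cost driver that motivates the NN-based replacement in Sec. III.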
A quite interesting and not obvious result is verified in Appendix B, namely, that when using a zero vector as initialization for the estimated data symbol vector, after one iteration the iterative SIC exhibits the same hard decision bit error probability as the LMMSE estimator. Hence, we employ the zero vector, which is also the prior mean of the data vector (since we assume a symmetric modulation alphabet and uniformly distributed data symbol probabilities), as initialization of the iterative SIC method.

III. SOFT INTERFERENCE CANCELLATION INSPIRED NEURAL NETWORK EQUALIZERS
As already mentioned in Sec. II-B, the model-based SIC method suffers from the issue that the computation of the inverse noise covariance matrix $\mathbf C_{vv,k}^{(q)-1}$, also known as precision matrix, is computationally and numerically demanding, while approximations have to be made in addition. With our proposed NN equalizers SICNNv1 and SICNNv2, whose layer structures are inspired by the iterative SIC method, we aim to overcome the weaknesses of the model-based SIC method. However, the SIC operation from the model-based method is preserved in the developed NNs, which is expected to help SICNNv1 and SICNNv2 to provide reliable soft estimates (required, e.g., to compute log-likelihood ratios in case of channel coded transmission), and allows obtaining interpretable intermediate quantities/variables inside the NN-based equalizers. While the structure of SICNNv1 is very similar to the model-based method and is specifically designed for SC-FDE communication systems, SICNNv2 is more general and can be applied for any communication system where the system model can be formulated as in (5) with any matrix $\tilde{\mathbf H}$. For both SICNNv1 and SICNNv2, we additionally present a version with a reduced number of learnable parameters, since we exploit the knowledge that every stage of SICNNv1/SICNNv2 has to provide estimates of the posterior data symbol probabilities, given the estimates of the previous stage and the received vector, and thus should work with the same set of learnable parameters. The parameter-reduced versions are referred to as SICNNv1Red and SICNNv2Red. Besides, in this section we also briefly review a normalization scheme of the input data, which is required for a satisfying performance of NN-based data estimators in SC-FDE systems.
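One detail worth illustrating is how an NN output can be constrained to always form a valid precision matrix estimate: assembling C⁻¹ = B Bᴴ from estimated band entries of a lower triangular B, as detailed later in this section, yields a Hermitian, positive semidefinite matrix by construction. A sketch of the idea with illustrative inputs:

```python
import numpy as np

def precision_from_band(b_main, b_minor=None):
    """Assemble C^{-1} = B B^H from estimated band entries of a lower
    triangular B (main diagonal b_main, optionally the first minor diagonal
    b_minor). The product is Hermitian and positive semidefinite by
    construction; b_minor=None corresponds to n_md = 0, i.e., a diagonal
    precision matrix with squared entries."""
    B = np.diag(np.asarray(b_main, dtype=complex))
    if b_minor is not None:
        B += np.diag(np.asarray(b_minor, dtype=complex), k=-1)
    return B @ B.conj().T
```

This way the sub-NNs only have to output the (few) band entries of B, and no additional projection onto the set of valid precision matrices is needed.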

A. SICNNv1
The NN architecture is deduced by deep unfolding [27] the iterative SIC method described in Sec. II-B to Q stages of SICNNv1. That is, every iteration of the model-based SIC method (the outer loop of Algorithm 1) corresponds to one stage of SICNNv1. The steps conducted in one stage of SICNNv1, which is schematically shown in Fig. 1, are very similar to those of the model-based method described in Algorithm 1; however, the model-based computations in line 6 and line 7 of Algorithm 1 are accomplished using data-driven FCNNs. Let us describe the structure of stage q of SICNNv1, q ∈ {0, ..., Q − 1}, in more detail, starting at its input. The inputs of stage q are the received vector y, the sampled frequency response diag(H), the noise variance σ_n², and the vectors $\hat{\mathbf p}_{k,\mathrm{Re}}^{(q-1)}, \hat{\mathbf p}_{k,\mathrm{Im}}^{(q-1)}$, k ∈ {0, ..., N_d − 1}, where S' = Re{S} = Im{S} (assuming a symmetric alphabet S). The elements $\hat p_{k,\mathrm{Re},l}^{(q-1)}$ and $\hat p_{k,\mathrm{Im},l}^{(q-1)}$ of the vectors $\hat{\mathbf p}_{k,\mathrm{Re}}^{(q-1)}$ and $\hat{\mathbf p}_{k,\mathrm{Im}}^{(q-1)}$, respectively, are the estimates of stage (q − 1) for the data symbol posterior probabilities p[Re{d_k} = s_l | y] and p[Im{d_k} = s_l | y], respectively, where s_l ∈ S' are the uniquely numbered symbols of S', l ∈ {0, ..., |S'| − 1}. For the inputs of the first stage (q = 0), $\hat{\mathbf p}_{k,\mathrm{Re}}^{(-1)}$ and $\hat{\mathbf p}_{k,\mathrm{Im}}^{(-1)}$, the estimated posterior probabilities are initialized uniformly, which represents the prior data symbol probability distribution. The values of the elements of $\hat{\mathbf p}_{k,\mathrm{Re}}^{(q)}$ and $\hat{\mathbf p}_{k,\mathrm{Im}}^{(q)}$ are updated in every stage following the procedure described below. Similar to the model-based method, in the first step (block 1 in Fig. 1) of the qth stage, the data symbol estimates $\hat d_k^{(q-1)}$ and the estimated MSEs $e_k^{(q-1)}$ are computed, where $e_k^{(q-1)}$ is utilized as a reliability measure of the corresponding data symbol estimate. Note that these quantities can be computed independently for every data symbol index k, and thus the blocks 1 are drawn isolated from each other in Fig. 1.
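The computation in block 1, turning the probability vectors over S' into a soft symbol estimate and its MSE, can be sketched as follows (illustrative NumPy; the formulas are the standard posterior mean and variance over S'):

```python
import numpy as np

def soft_estimate_from_probs(p_re, p_im, s):
    """Turn estimated posterior probabilities of the real/imaginary parts
    (over the values s of S') into a soft symbol estimate and its MSE,
    the latter serving as a reliability measure."""
    d_re, d_im = p_re @ s, p_im @ s                       # posterior means
    e = p_re @ (s - d_re) ** 2 + p_im @ (s - d_im) ** 2   # estimated MSE
    return d_re + 1j * d_im, e
```

Since each symbol index k only needs its own probability vectors, this step parallelizes trivially over k, consistent with the isolated blocks 1 in Fig. 1.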
In a second step (block 2 in Fig. 1), model-based interference cancellation is carried out by computing y^{(q)}_{ic,k} according to (23), using the data symbol estimates d̂^{(q−1)}_k. Instead of computing (an approximation of) the precision matrix C^{(q)−1}_{vv,k} in a model-based fashion, we estimate it by utilizing fully-connected feedforward layers, which we also refer to as sub-NNs. A straightforward approach is to estimate the real and the imaginary parts of the N² elements of C^{(q)−1}_{vv,k}. We exploit two observations to both reduce the number of parameters to be estimated by a sub-NN and ensure that the estimated precision matrix satisfies the properties implied by its definition. Firstly, the covariance matrix C^{(q)}_{vv,k}, and thus also its inverse, has to be a Hermitian, positive definite matrix. That is, C^{(q)−1}_{vv,k} can be decomposed into the matrix product B^{(q)}_k B^{(q)H}_k, where B^{(q)}_k is the matrix to be estimated by the sub-NNs. Secondly, our empirical investigations showed that for the regarded SC-FDE communication system a precision matrix C^{(q)−1}_{vv,k} exhibits significant non-zero values only on the main and the first few minor diagonals, and thus can be approximated as a band matrix. In the initial version of SICNNv1 described in [1] (where it is simply referred to as SICNN), we specify B^{(q)}_k to be a lower triangular matrix containing non-zero values only on the main diagonal and the first n_md minor diagonals, where n_md ∈ N_0 is a hyperparameter of SICNN, which tremendously reduces the number of non-zero elements of B^{(q)}_k to be estimated. The non-zero elements of the complex-valued matrix B^{(q)}_k are estimated by two separate sub-NNs, one for the real part and one for the imaginary part. As stated in [1], hyperparameter optimization revealed that the best equalization performance is achieved with n_md = 0, i.e., the precision matrix is assumed to be a diagonal matrix. This insight motivates an adaptation of the architecture of SICNN, leading to the structure of SICNNv1. As depicted in Fig.
1, a single sub-NN FCNN 1 is employed to estimate the main diagonal of C^{(q)−1}_{vv,k}. To ensure positive definiteness, the final estimates of the main diagonal of C^{(q)−1}_{vv,k} are obtained by squaring the outputs of FCNN 1. FCNN 1 has n_{L,C} hidden layers with n_{H,C} neurons per hidden layer, ReLU activation functions, and a batch norm layer after the input layer. For determining the required inputs of FCNN 1, we reconsider the computation of C^{(q)}_{vv,k} in (13) and the quantities involved there. Besides the terms describing correlations between w and d̂^{(q−1)}_k, only terms consisting of σ²_n, H̃, and M̃_k E^{(q−1)}_k M̃^H_k occur in (13). When replacing E^{(q−1)}_k by its approximation Ê^{(q−1)}_k, the latter term can be expressed as A^{(q)}_k = M̃_k Ê^{(q−1)}_k M̃^H_k, with m_i as the ith column of M. A^{(q)}_k is a Hermitian Toeplitz matrix, consequently it is already fully described by its first row a^{(q)H}_k, where we exploit m_{i,0} = 1 for all i in the last step. The vectors a^{(q)H}_k, which are computed in block 3, are concatenated in block 4 with σ²_n and diag(H̃) to form the input vector of FCNN 1. With a given estimated precision matrix Ĉ^{(q)−1}_{vv,k}, the posterior PMF p(d_k | y^{(q)}_{ic,k}) should be estimated. Experiments where the posterior PMF is computed in a model-based fashion as described in Sec. II-B did not lead to satisfying performance. We assume a major reason for this issue is that the estimate provided by FCNN 1 is not precise enough to be treated as the exact precision matrix. Hence, we utilize another sub-NN (FCNN 2 in Fig.
1), which is trained (jointly with FCNN 1) to estimate the posterior PMF p(d_k | y^{(q)}_{ic,k}) and can cope with inaccuracies in the estimated precision matrix. More specifically, the output of FCNN 2 is the vector p̂^{(q)}_k = [p̂^{(q)T}_{k,Re}, p̂^{(q)T}_{k,Im}]^T (which is also the output of the qth SICNNv1 stage), containing estimates for the data symbol posterior probabilities, as introduced earlier in this section. To specify the required input quantities of FCNN 2 for estimating the posterior PMF p(d_k | y^{(q)}_{ic,k}), let us consider its representation obtained by applying Bayes' rule and assuming a uniform prior data symbol probability, where f^{(q)}_k(·) is defined in (17). Due to the definition of f^{(q)}_k(·), the posterior PMF p(d_k | y^{(q)}_{ic,k}) depends on the received quantities only via h̃^H_k Ĉ^{(q)−1}_{vv,k} y^{(q)}_{ic,k}, its complex conjugate, and h̃^H_k Ĉ^{(q)−1}_{vv,k} h̃_k. Therefore, the input vector s^{(q)}_k of FCNN 2 is chosen as given in (26), where the elements of s^{(q)}_k are computed in block 5. FCNN 2 consists of n_{L,Pr} fully-connected hidden layers with n_{H,Pr} neurons per hidden layer, ReLU activation functions between the hidden layers, and a batch norm layer after the input layer. Further, two independent softmax functions are utilized as output activation functions to obtain p̂^{(q)}_{k,Re} and p̂^{(q)}_{k,Im}, which are concatenated to the stage output p̂^{(q)}_k. An investigation of the inputs of sub-NN FCNN 2 as defined in (26) reveals a large variation of the values of s^{(q)}_k, which depends on both the data symbol index k and the SICNNv1 stage index q. Hence, in the qth SICNNv1 stage we suggest to multiply y^{(q)}_{ic,k} and h̃_k by a normalization factor based on ||y^{(q)}_{ic,k}||, which turns out to lead to a more robust training procedure.
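The positive-definiteness constraint described above, i.e., squaring the FCNN 1 outputs in the diagonal case n_md = 0, or, more generally, parameterizing the precision matrix via the banded lower triangular factor B^{(q)}_k of [1], can be illustrated with a small numpy sketch (the function and the chosen dimensions are illustrative stand-ins, not the actual sub-NNs):

```python
import numpy as np

def precision_from_banded_factor(b_real, b_imag, n, n_md):
    """Assemble C^{-1} = B B^H from raw sub-NN outputs, where B is lower
    triangular and banded: main diagonal plus the first n_md minor
    diagonals. The result is Hermitian and positive semidefinite by
    construction, regardless of the raw sub-NN output values."""
    B = np.zeros((n, n), dtype=complex)
    idx = 0
    for d in range(n_md + 1):            # d = 0 is the main diagonal
        for i in range(n - d):
            B[i + d, i] = b_real[idx] + 1j * b_imag[idx]
            idx += 1
    return B @ B.conj().T

# For n_md = 0 and real outputs, this reduces to a diagonal precision
# matrix whose entries are the squared outputs, exactly as obtained by
# squaring the outputs of FCNN 1.
out = np.array([0.5, -2.0, 1.5])
C_inv = precision_from_banded_factor(out, np.zeros(3), n=3, n_md=0)
```

The factorization thus guarantees a valid precision matrix for any sub-NN output, which is why no additional constraint has to be enforced during training.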
In one stage, we use the same sub-NNs for estimating the precision matrix and the posterior data symbol probabilities for all N_d data symbols to be estimated. However, different sub-NNs are utilized from stage to stage, i.e., their learnable parameters are in general different from those of the sub-NNs of the remaining stages, while the hyperparameters of the sub-NNs are the same for all stages.
We optimize the parameters of SICNNv1 by employing a custom loss function based on the cross entropy loss. More specifically, instead of utilizing only the final output of SICNNv1 for computing the loss value, the custom loss is based on the partial cross entropy losses of all Q stage outputs, which are weighted according to the corresponding stage index q. That is, the employed custom loss function is given by (27), where P := {p̂^{(q)}_k} is the collection of all stage outputs, w_q = (q + 1)/Σ_{q̃=1}^{Q−1} q̃ is the weighting factor of the partial losses, and d_{oh,k,Re} and d_{oh,k,Im} are one-hot vectors corresponding to the real and imaginary part of a data symbol d_k, respectively. This custom loss function is inspired by the loss function employed for training DetNet [18] and by the auxiliary classifiers of GoogLeNet [41], and should lead to faster converging training.
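A stage-weighted cross entropy loss of this kind can be sketched in numpy as follows; as an illustrative simplification, the weights here grow linearly with the stage index and are normalized to sum to one, rather than following the paper's exact normalization:

```python
import numpy as np

def multi_stage_ce_loss(stage_outputs, onehot_targets):
    """Weighted sum of per-stage cross entropy losses.

    stage_outputs  : list of Q arrays (N, L), estimated probabilities per stage
    onehot_targets : array (N, L), one-hot encoded true symbols
    The weight of stage q grows linearly with q, so later (better) stage
    outputs dominate the loss.
    """
    Q = len(stage_outputs)
    w = np.arange(1, Q + 1, dtype=float)
    w /= w.sum()                       # normalize weights to sum to one
    loss = 0.0
    for wq, p in zip(w, stage_outputs):
        # cross entropy of the estimated probabilities vs. the one-hot targets
        loss += wq * -np.mean(np.sum(onehot_targets * np.log(p + 1e-12), axis=1))
    return loss
```

Computing the loss over all stage outputs rather than only the last one gives every stage a direct training signal, which is the intended effect of the auxiliary-classifier idea mentioned above.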

Fig. 2. Schematic structure of one stage of SICNNv2.

B. SICNNv2
The first operations conducted in a stage of SICNNv2 are equivalent to those in a stage of SICNNv1. That is, given the estimated posterior probabilities p̂^{(q−1)}_k from the previous stage (q − 1), the corresponding data symbol estimates d̂^{(q−1)}_{k,Re/Im} and estimated MSEs ê^{(q−1)}_{k,Re/Im} are computed according to (20) and (22), respectively. With the data symbol estimates d̂^{(q−1)}_{k,Re/Im} for all data symbols in the data vector at hand, interference cancellation is conducted as a next step according to (23) to obtain y^{(q)}_{ic,k}. However, as shown in Fig. 2, the remaining structure of a stage of SICNNv2 differs from the stage structure of SICNNv1. Specifically, while the further inference steps conducted in a stage of SICNNv1 to obtain the stage output are similar to the steps of the model-based algorithm, in an SICNNv2 stage an FCNN is employed for directly estimating the posterior data symbol probabilities, using an input vector that comprises y^{(q)}_{ic,k}, the channel information, the noise variance, and the estimates d̂^{(q−1)}_{k,Re/Im} and ê^{(q−1)}_{k,Re/Im}, scaled by a normalization factor ρ^{(q)}_k. The estimated posterior data symbol probabilities are contained in the output vector p̂^{(q)}_k of the FCNN, which is also the output of the qth stage of SICNNv2. The FCNN has n_L hidden layers, n_H neurons per hidden layer, a batch norm layer after the input layer and after every third hidden layer, and ReLU activations. SICNNv2 is trained with the same loss function (27) as SICNNv1.
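The interference cancellation step (23) shared by both SICNN variants can be sketched as follows, assuming the generic block transmission model y = H̄d + w with an effective channel matrix H̄ (names illustrative):

```python
import numpy as np

def interference_cancellation(y, H_eff, d_hat):
    """Soft interference cancellation: for every data symbol index k,
    subtract the estimated contributions of all *other* symbols from y.

    y     : (N,) received vector
    H_eff : (N, Nd) effective channel matrix with columns h_k
    d_hat : (Nd,) soft data symbol estimates from the previous stage
    Returns an (Nd, N) array whose kth row is y_ic,k.
    """
    residual = y - H_eff @ d_hat                  # remove all contributions
    # add back the kth contribution, since d_k itself is not interference
    return residual[None, :] + (H_eff * d_hat[None, :]).T
```

With perfect estimates and no noise, y_ic,k collapses to h_k d_k, i.e., every data symbol effectively sees an interference-free channel; with imperfect estimates, the residual interference is what the subsequent stage has to cope with.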
The architecture of SICNNv2 is solely based on the idea of SIC, and is more universal than that of SICNNv1 since no properties of an SC-FDE system are utilized. Hence, SICNNv2 can be employed for equalization in other communication systems, e.g., general MIMO systems or UW-OFDM systems. However, for SC-FDE we expect a higher equalization complexity and possibly a worse BER performance of SICNNv2 compared to SICNNv1, as less model knowledge is incorporated.

C. Parameter Reduction: SICNNv1Red and SICNNv2Red
For reducing the number of parameters to be trained, we exploit the fact that in every stage of SICNNv1 and SICNNv2 the same task is fulfilled, namely, the posterior data symbol probabilities are to be estimated given the estimated posterior data symbol probabilities from the previous stage, the received vector, the channel matrix, and the noise variance.Hence, we employ the same sub-NNs in every stage of the NN-based equalizers, leading to the corresponding parameter-reduced versions SICNNv1Red and SICNNv2Red.These parameter-reduced NNs can also be viewed as a single stage where its output is fed back Q times.
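Viewed as a single stage fed back Q times, the parameter-reduced forward pass can be sketched as follows (stage_fn stands in for one full SICNN stage with one shared parameter set):

```python
def shared_stage_forward(stage_fn, p0, Q):
    """Parameter-reduced SICNN forward pass: one stage (one shared set of
    learnable parameters) is applied Q times, with every stage output fed
    back as the input of the next stage. All intermediate outputs are
    collected, since the loss is computed over all stage outputs."""
    outputs, p = [], p0
    for _ in range(Q):
        p = stage_fn(p)
        outputs.append(p)
    return outputs
```

In the full equalizer, stage_fn would additionally receive the received vector, the channel information, and the noise variance; only the probability estimates are iterated here for brevity.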
This NN architecture distinctly reduces the number of parameters to be optimized, which reduces the computational effort for training, and is also supposed to lead to a more robust training procedure and a smaller amount of required training data. However, it turns out that the employed loss function for training the parameter-reduced NNs has to be slightly altered to obtain good performance. More specifically, the outputs of the stages with higher stage index q are given a higher importance by changing the weights w_q of the loss function (27) for training SICNNv1Red and SICNNv2Red such that they are proportional to (q + 1)^r, where r is a hyperparameter, which we choose for SC-FDE systems to be r = 4.
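The altered weighting can be sketched as follows; normalizing the weights to unit sum is an illustrative assumption, the essential point being that the weights grow with (q + 1)^r:

```python
import numpy as np

def stage_weights(Q, r):
    """Loss weights emphasizing later stages: w_q proportional to (q+1)^r,
    normalized here to sum to one (an illustrative choice)."""
    w = (np.arange(Q) + 1.0) ** r
    return w / w.sum()

w_lin = stage_weights(7, 1)   # linear weighting, as for SICNNv1/SICNNv2
w_red = stage_weights(7, 4)   # r = 4, as for the parameter-reduced variants
# With r = 4 the last stage receives a much larger share of the total weight
```

Since the parameter-reduced variants reuse the same parameters in every stage, emphasizing the last stage output in this way steers the shared parameters toward good final estimates rather than a compromise across all stages.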
IV. TRAINING SET GENERATION AND DATA NORMALIZATION

In this section, we describe a novel approach for generating training sets for NN-based equalizers. For the regarded SC-FDE systems, this approach considerably improves the performance of NN-based equalizers at high SNRs. Further, we briefly describe a data normalization scheme specifically tailored to SC-FDE, which was already presented in [31].

A. Training Set Generation
The achieved BER performances of model-based and NN-based equalizers are generally evaluated in a specified E_b/N_0 interval, where E_b is the mean bit energy and N_0 is the noise power spectral density at the receiver side, i.e., E_b/N_0 is a measure of the SNR. To generate the training set for NN-based equalizers, typically sample data transmissions over channel realizations drawn from a statistical channel model are conducted, where E_b/N_0 for each data transmission is selected randomly with a uniform distribution within a specified range, in short the training SNR range. The upper and lower limits of the training SNR range are typically hyperparameters, which are to be selected carefully, since they have a significant influence on the performance of trained NN-based equalizers [39], [42], [43]. Despite a careful selection of the training SNR range, the issue of "flattening out" BER curves can occur. That is, although NN-based equalizers perform well over a wide E_b/N_0 range, at higher E_b/N_0 values, and thus low BERs (of, e.g., 10^−5 or 10^−6), their BER curves do not fall as steeply as those of many model-based equalizers. Although this issue occurs for most NN-based equalizers (shown, e.g., in [31], [44]), interestingly, there are only very few works like [43] where proper training of NN-based equalizers is addressed. In this work, we propose a novel approach for the generation of training sets for NN-based equalizers. By training NN-based data estimators with these specifically generated training sets, the issue of flattening out BER curves at high SNRs can be mitigated significantly.
Typically, even low-complexity equalizers like the LMMSE equalizer achieve low BERs at high SNRs (as can be seen, e.g., in Fig. 4, where the LMMSE estimator achieves a BER of 5·10^−5 at E_b/N_0 = 14 dB). In other words, the decision boundaries of the baseline LMMSE equalizer and the optimal bit-wise MAP equalizer differ only slightly. That is, when randomly generating data transmissions for the training set at high SNRs, even with the baseline LMMSE estimator only very few data symbol estimation errors occur for those transmissions. However, the NN-based equalizers are expected to approximate the optimal bit-wise MAP estimator. Hence, for NN training, exactly those few received data symbols are of interest for which the baseline LMMSE estimator yields a wrong estimate of the corresponding transmitted data symbol, while the optimal estimator still achieves a correct data symbol estimate. Since only a few of those important received data symbols are contained in the training set when generating the training data randomly, their influence on the training loss of the NN is small, leading to the aforementioned issue of flattening out BER curves. With these observations in mind, we suggest the following method for generating the training set of NN-based equalizers for SC-FDE systems: instead of randomly selecting an SNR value within the SNR training range for the transmission of every data burst contained in the training set, we define an evenly spaced grid on the SNR training range (on a linear scale). The number of grid points coincides with the number of channels over which data transmissions are to be conducted to generate the training set. For every SNR grid point, a channel realization is drawn from the assumed statistical channel model, and a burst of N_burst data vectors d is transmitted over this channel. The corresponding received vectors y are equalized using a baseline LMMSE estimator. Instead of including all data vectors of the
transmitted burst in the training set, only those are retained for which the baseline equalizer produces at least N_epd errors per data vector. Since, particularly at higher SNRs, the number of retained data vectors is generally far lower than N_burst, another burst of data vectors is generated and transmitted over the same channel, again followed by keeping only the data vectors for which at least N_epd errors per data vector are produced by the baseline estimator. This procedure is repeated until N_burst data vectors are found for the specific channel, which are then included in the training set. However, for flat channels, even with the baseline equalizer no or too few errors occur, such that no data vectors are found which could be included in the training set. Therefore, a stopping criterion has to be introduced, where after N_check burst generations the number of retained data vectors is checked. If the number of retained data vectors is smaller than, e.g., 0.1 N_burst, the current channel realization is discarded. While keeping the SNR value corresponding to the specified SNR grid point, a new channel realization is drawn from the statistical channel model, and the same data vector selection process as described above is carried out. The parameters for the training set generation depend on the communication system setup for which the NN-based equalizers are trained, and thus they are specified in Sec. V-A individually for every setup.
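The selection procedure can be condensed into the following sketch. The channel model and the baseline detector are deliberately simplistic stand-ins (BPSK with a sign detector over an AWGN channel instead of the SC-FDE LMMSE baseline), so only the retain/repeat/discard logic mirrors the proposed method:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_training_set(snr_grid_db, n_burst, n_epd, n_check, nd=20):
    """Retain only data vectors for which the baseline detector makes at
    least n_epd errors; repeat bursts until n_burst such vectors are found,
    and give up on a too benign channel after n_check burst generations
    (in the full method, a new channel realization would then be drawn)."""
    dataset = []
    for snr_db in snr_grid_db:
        sigma = 10.0 ** (-snr_db / 20.0)
        retained, attempts = [], 0
        while len(retained) < n_burst:
            attempts += 1
            d = rng.choice([-1.0, 1.0], size=(n_burst, nd))
            y = d + sigma * rng.standard_normal((n_burst, nd))
            errors = np.sum(np.sign(y) != d, axis=1)  # baseline: sign detector
            retained += [(yi, di) for yi, di, e in zip(y, d, errors)
                         if e >= n_epd]
            if attempts == n_check and len(retained) < 0.1 * n_burst:
                break
        dataset += retained[:n_burst]
    return dataset
```

Every retained sample is, by construction, one on which the baseline detector fails, so the training set concentrates on received vectors close to the decision boundaries.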

B. Data Normalization
Normalizing the input data of NNs is generally considered to be very important for training convergence when optimizing their learnable parameters via backpropagation [45], [46]. While in many current publications on NN-based data estimation in MIMO systems over uncorrelated Rayleigh fading channels no normalization of the NN input data is applied (cf., e.g., [17]-[19]), we showed in [47] and [39] that for a so-called UW-OFDM communication system a proper data normalization is of major importance for the performance of NN-based equalizers (for a visualization of the influence of data normalization on the performance of NN-based equalizers for UW-OFDM, we refer to [47]). With the same idea as for UW-OFDM in mind, namely to apply a normalization scheme leading to variances of the elements of the noise vector that are independent of the multipath channel, we implement a data normalization scheme for SC-FDE systems. This data normalization scheme for SC-FDE is elucidated in [31], and thus we only repeat the result here. To obtain channel-independent noise variances var(w_i), the system model (5) has to be multiplied by K = κ H̃^{−1/2}, where κ = tr{H̃}/tr{H̃MM^H H̃^H}. The normalization of the input data of the NN-based equalizers is implemented by multiplying both y and H̃ by K as part of pre-processing, and is neglected in the remainder of this paper for the sake of readability.
V. RESULTS

In this section, we investigate the proposed versions of SICNN thoroughly by means of simulations of data transmission in an indoor frequency-selective environment. To demonstrate the wide applicability of the proposed NN-based approaches, we evaluate them for a number of different SC-FDE communication system setups. We show simulation results for SC-FDE systems with both UW and CP guard intervals. Besides simulations with a QPSK modulation alphabet, also results for a 16-QAM alphabet are provided. Most of the simulations are conducted assuming perfect channel knowledge at the receiver side. The robustness of the proposed NN-based data estimators in case of imperfect channel knowledge is demonstrated by simulating their performance for estimated channel impulse responses. Further, we highlight the performance improvements of NN-based equalizers when being trained on a training set generated with our proposed approach presented in Sec. IV-A. We compare SICNNv1 and SICNNv2, and their corresponding parameter-reduced versions SICNNv1Red and SICNNv2Red, with state-of-the-art model-based and NN-based equalizers in terms of both their achieved BER performance over a specified SNR range and their computational complexity. More specifically, for comparison with model-based equalizers, we use the LMMSE estimator [35], the iterative DFE (implemented in the same way as described in [47] for UW-OFDM systems), and the iterative SIC method, where the approximation (14) is employed. We compare the proposed NNs with the state-of-the-art NN-based data estimators OAMP-Net2 [19] and DetNet [18], whereby we do not use DetNet as proposed in [18] for MIMO systems, but a better performing version that is adapted for SC-FDE systems [31]. Moreover, we show the BER performance and computational complexity of KAFCNN from [31], which is an FCNN designed for equalization in SC-FDE systems by using a layer conducting an inverse DFT as a last layer, i.e., the knowledge that the data symbols being
defined in time domain are to be estimated given a received vector in frequency domain is incorporated.
Moreover, for an SC-FDE system with a UW guard interval we present the influence of a limited training set size on the BER performance of selected NN-based equalizers to investigate the "data hunger" of an NN depending on its number of learnable parameters.
Finally, we also present performance results for SICNNv2 being employed as an equalizer in a communication system utilizing the so-called UW-OFDM signaling scheme. With these results we want to highlight the wide applicability and the versatility of the proposed NN-based equalizers.

A. Simulation Setup and Neural Network Training
The shown simulation results are obtained by simulating data transmission without channel coding. Apart from Sec. V-D, all simulation settings, results, interpretations, and conclusions in this work are given for SC-FDE systems. The simulation setup and the results for the simulation of SICNNv2 as an equalizer in a UW-OFDM system are detailed in Sec. V-D. For SC-FDE communications with a UW guard interval, simulations are conducted with the SC-FDE system parameters N_d = 20, N_g = 12 (i.e., N = 32), an RRC roll-off factor α = 0.25, and a baseband sampling time T_s = 52 ns. Further, unless noted otherwise, a QPSK modulation alphabet is employed, and perfect channel knowledge is assumed at the receiver side. The parameters of the simulations of SC-FDE systems with a CP guard interval differ from those with a UW guard interval only by the data vector length N_d = 32; all other parameters are maintained.
The achieved BER performances of the different equalizers are plotted over a specified E_b/N_0 interval. The presented BER performances for SC-FDE systems are results averaged over 7000 different multipath channel realizations, which are modeled as described in [48] in the form of tapped delay lines with uniformly distributed phase, Rayleigh distributed magnitude, and an exponentially decaying power profile with a root mean square delay spread of τ_RMS = 100 ns. The data transmission is conducted in the form of data bursts containing 1000 blocks of payload data per burst, where the channel is assumed to be stationary for one burst and changes independently of its previous realizations from burst to burst.
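Such a statistical channel model can be sketched as follows; the number of taps is an illustrative assumption, and complex Gaussian taps yield the Rayleigh distributed magnitude and uniformly distributed phase of [48]:

```python
import numpy as np

def draw_channel(n_taps, ts, tau_rms, rng):
    """Draw one tapped-delay-line channel realization: complex Gaussian taps
    (Rayleigh magnitude, uniform phase) with an exponentially decaying power
    delay profile of RMS delay spread tau_rms; tap powers normalized to
    sum to one."""
    p = np.exp(-np.arange(n_taps) * ts / tau_rms)
    p /= p.sum()
    taps = np.sqrt(p / 2) * (rng.standard_normal(n_taps)
                             + 1j * rng.standard_normal(n_taps))
    return taps

rng = np.random.default_rng(1)
h = draw_channel(n_taps=16, ts=52e-9, tau_rms=100e-9, rng=rng)
# E[sum |h_i|^2] = 1 by construction
```

Drawing an independent realization per burst reproduces the block-stationary behavior described above.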
For training the NN-based equalizers, we generate training sets with the proposed approach described in Sec. IV-A. The selected parameters of the training set generation method N_epd and N_burst, the training E_b/N_0 range, as well as the employed baseline equalizer for selecting the data vectors for the training set are summarized for all SC-FDE system setups in Tab. I. Unless stated otherwise, every training set consists of data transmissions over 30 000 different channels. For training all NNs, early stopping is used. That is, the BER performance on a validation set is evaluated after every epoch, and the set of learnable NN parameters achieving the best validation performance is chosen after training for a pre-defined maximum number of epochs.
The hyperparameters of the NN-based equalizers are found using the hyperparameter optimization framework Optuna [49]. For SICNNv1, the best hyperparameter settings found are given in Tab. II, and we train it for at most 25 epochs. For SICNNv1Red, the same hyperparameters as for SICNNv1 are used apart from the learning rate η_SICNNv1Red. The hyperparameters of SICNNv2 are also given in Tab. II, and we train it for a maximum of 25 epochs. The hyperparameters of SICNNv2Red differ from those of SICNNv2 only by η_SICNNv2Red. For DetNet, OAMP-Net2, and KAFCNN, the best hyperparameters found are shown in Tab. III. Moreover, DetNet and KAFCNN are trained for at most 60 epochs, and OAMP-Net2 for a maximum of 15 epochs.

B. Bit Error Ratio Performance for SC-FDE
1) Unique Word Guard Interval, QPSK: We start with investigations on an SC-FDE system with a UW guard interval and QPSK modulation alphabet. First, we investigate the influence of the number of iterations Q of the model-based iterative SIC method as well as the number of stages Q of SICNNv1 on the achieved BER performance, to highlight similarities and differences between the model-based and the NN-based approach. As shown in Fig. 3, our simulation result validates the proof of the equivalence of the bit error probabilities of the LMMSE hard decision estimates and the estimates of the iterative SIC method after one iteration. Moreover, the BER performance of the iterative SIC method considerably improves when conducting a second iteration (Q = 2), outperforming the DFE over a wide E_b/N_0 range. For Q ≥ 3, two interesting effects are visible. Firstly, the BER performance flattens out at higher E_b/N_0 values. Secondly, although more iterations lead to an improvement of the BER performance at lower E_b/N_0 values (which is, however, rather small), at higher E_b/N_0 values the performance even slightly degrades the more iterations are conducted, which can be explained by the error caused by approximating the covariance matrix C^{(q)}_{vv,k}. For SICNNv1, more stages than iterations of the model-based method are required to obtain a good BER performance; however, with Q = 7 stages, the iterative SIC method is considerably outperformed by SICNNv1.

Fig. 3. BER performance of the iterative SIC method and SICNNv1 for different numbers of iterations / stages Q (SC-FDE with UW guard, QPSK).
Next, we compare the proposed NN-based data estimators SICNNv1, SICNNv2, SICNNv1Red, and SICNNv2Red with the aforementioned state-of-the-art model-based and NN-based equalizers in terms of achieved BER performance. The training and hyperparameter optimization of the latter is conducted with the same training set as used for the proposed NN-based equalizers. As shown in Fig. 4, SICNNv1 is the best performing equalizer over a wide E_b/N_0 range, followed by SICNNv2 and OAMP-Net2. The parameter-reduced variant SICNNv1Red exhibits approximately the same performance as DetNet. All of the aforementioned NN-based equalizers outperform or perform similarly to the model-based equalizers considered for comparison. SICNNv2Red, in turn, is the worst performing among the proposed NN-based equalizers, but still has a far better BER performance than KAFCNN. From this simulation result we can conclude that using the same sub-NNs in all stages of the proposed NN-based equalizers leads to a reduction of the number of learnable parameters at the cost of a performance decrease. However, since fewer parameters have to be optimized, the reduction of learnable parameters decreases the computational effort for training the NNs.
For an SC-FDE system with a UW guard interval and QPSK modulation alphabet, we also show the influence of our proposed approach for training set generation, described in Sec. IV-A, on the BER performance of trained NN-based equalizers. Exemplarily for SICNNv1, SICNNv2, and DetNet, we compare their performance when trained on a dataset generated with our approach to the case where they are trained with randomly generated training data. For the randomly generated training data, the E_b/N_0 values of the sample transmissions contained in the training set are chosen randomly (with uniform distribution on the linear E_b/N_0 scale) in the range [3 dB, 14 dB], which is a state-of-the-art approach for the training of NN-based equalizers. As shown in Fig. 5, particularly at high E_b/N_0 values, the performance of the aforementioned NN-based equalizers can be significantly improved by training them on a training set generated by our proposed method. As elucidated in Sec. IV-A, we assume that the main reason for this performance improvement is that at high SNRs distinctly more training samples lying close to the decision boundaries of the optimal equalizer are available, allowing the NNs to approximate the optimal decision boundaries. For all following results, the NNs are trained with datasets generated with our proposed method.
2) Unique Word Guard Interval, 16-QAM: To show that the proposed NN-based equalizers can also cope with higher-order modulation alphabets, we present BER performance results for an SC-FDE system with a UW guard interval and a 16-QAM modulation alphabet. As shown in Fig. 6, SICNNv1 is also the best performing among all considered equalizers for this system setup. Similarly as for a QPSK modulation alphabet, the best performing equalizers behind SICNNv1 are SICNNv2 and OAMP-Net2, outperforming the model-based DFE in lower E_b/N_0 regions and performing similarly in higher E_b/N_0 regions. SICNNv1Red, SICNNv2Red, DetNet, and KAFCNN, in turn, exhibit a significantly worse performance, where KAFCNN is the worst performing among all considered NN-based equalizers.
3) Imperfect Channel Knowledge: We investigate the influence of imperfect channel knowledge on the performance of NN-based and model-based equalizers.To this end, the channel, which is assumed to be stationary for one transmitted data burst, is estimated as described in [50] using a known preamble.This preamble is transmitted prior to the data burst and contains two identical pilot vectors x p .Based on the two corresponding received pilot vectors y p , the channel frequency response is estimated with the best linear unbiased estimator (BLUE) [40].For further details on the estimation of the channel frequency response, we refer to [50].
For this evaluation, the NN-based equalizers are trained in the same manner as for perfect channel knowledge; however, as an input, a channel matrix is employed which is computed using the estimated channel frequency response. As shown in Fig. 7 for SICNNv1, SICNNv2, and OAMP-Net2, imperfect channel knowledge has a similar influence on the BER performance of the NN-based equalizers as on that of model-based equalizers, demonstrating their robustness with respect to imperfect channel knowledge at the receiver.
4) Cyclic Prefix Guard Interval, QPSK: For all previously shown results we have employed a UW as a guard interval. In this section, we investigate the influence of using a CP as a guard interval on the performance of the regarded NN-based and model-based equalizers. As presented in Fig. 8, SICNNv1 is also the best performing equalizer for this system setup, where its performance is very similar to that of OAMP-Net2. SICNNv2 and SICNNv1Red exhibit a very similar BER performance and clearly outperform DetNet, which achieves similar BER results as SICNNv2Red. The worst performing NN-based equalizer is KAFCNN, still outperforming the model-based DFE. The LMMSE equalizer performs worst; however, as mentioned in Sec. II-A, it stands out due to its very low complexity for SC-FDE communications with CP guard intervals.

C. Influence of a Reduced Training Set Size
In this section, we investigate the influence of a limited training set size on the BER performance of selected NN-based data estimators. That is, while for the BER performance results shown in Sec. V-B the NN-based equalizers are trained utilizing sample data transmissions over 30 000 different multipath channels, here the regarded NNs are trained with training sets consisting of 10 000 or 3 000 channels. We evaluate the influence of a reduced training set size on the BER performance of selected NN-based equalizers for an SC-FDE system with a UW guard interval and QPSK modulation alphabet. The hyperparameters of the NN-based equalizers are the same as stated in Sec. V-A, apart from the number of training epochs, which is adapted appropriately such that the number of update steps of the learnable NN parameters remains the same for all training set sizes.
We regard SICNNv1, its parameter-reduced variant SICNNv1Red, OAMP-Net2, and KAFCNN for performance comparison, whereby these NNs have, for the chosen hyperparameter settings, 135 590, 19 370, 32, and 746 628 learnable parameters, respectively. As shown in Fig. 9, the performance of SICNNv1 and SICNNv1Red slightly degrades in case of a reduced training set size of 10 000 or 3 000 channels. The BER performance of OAMP-Net2 barely changes when reducing the training set size, while the performance of KAFCNN decreases most and is even worse than that of the LMMSE estimator when being trained with 3 000 different channels. That is, the fewer learnable parameters an NN contains, the less it suffers from a limited training set size. This result emphasizes the importance of parameter reduction, e.g., by incorporating model knowledge into the layer architecture of an NN.

Fig. 8. BER performance of NN-based and model-based equalizers for SC-FDE with a CP guard interval and QPSK alphabet.

D. Bit Error Ratio Performance of SICNNv2 for UW-OFDM
Fig. 10. BER performance comparison for UW-OFDM (system I from [39]).

As mentioned in Sec. III-B, the layer architecture of SICNNv2 is inferred by deep unfolding iterative SIC, but no properties of any specific communication system are exploited. Hence, we expect SICNNv2 to be universally applicable to any communication system with a system model similar to the system model (5) of SC-FDE systems. To demonstrate this claim, we apply SICNNv2 as an equalizer in a UW-OFDM system [51]-[53]. The data transmission in UW-OFDM systems can be modeled as y = H̃Gd + w [39], [51], [52], where d ∈ S^{N_d} is the transmitted data vector of length N_d to be estimated, y ∈ C^{N_d+N_u} the received vector at the input of the equalizer, and N_u the length of the UW guard interval. Further, H̃ ∈ C^{(N_d+N_u)×(N_d+N_u)} denotes a diagonal matrix containing the sampled channel frequency response (excluding the positions of OFDM zero-subcarriers) on the main diagonal, G ∈ C^{(N_d+N_u)×N_d} the so-called generator matrix, which is a full, rectangular matrix, and w ∼ N_C(0, (N_d + N_u)σ²_n I), where σ²_n is the variance of the AWGN in the time domain. For further details on UW-OFDM we refer to [51], [52]. That is, the models of UW-OFDM and SC-FDE systems are very similar, allowing to employ SICNNv2 unaltered for UW-OFDM, apart from the used data normalization, which is described in [39]. We train (in exactly the same manner as all other state-of-the-art NNs used for comparison) and evaluate SICNNv2 for the UW-OFDM system referred to as system I in [39], where N_d = 8, N_u = 4, and the modulation alphabet is QPSK. The best hyperparameter combination found is a learning rate η = 5·10^−4, Q = 6 stages, n_L = 2 hidden layers of the sub-NNs, and n_H = 200 neurons per hidden layer of the sub-NNs. SICNNv2 is compared to the state-of-the-art NN-based equalizers OAMP-Net2 [19], RE-MIMO [17], DetNet [18], and an improved version of DetNet that employs a preconditioner in its layers [39]. Due to the small
dimension of system I, even the optimal BER performance can be computed by applying the bit-wise MAP estimator.For all further details on the simulation setup, on the NNs used for comparison, or their training, we refer to [39].As shown in Fig. 10, SICNNv2 can outperform the NN-based equalizers OAMP-Net2, DetNet in its original form [18], and performs similar as RE-MIMO.Further, its performance is very close to that of the improved version of DetNet and to the optimal BER performance achieved by the bit-wise MAP estimator.This result demonstrates the applicability of the proposed SICNN idea for different communication systems.
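The UW-OFDM transmission model above can be illustrated numerically. Below is a minimal numpy sketch under the stated dimensions (N_d = 8, N_u = 4, QPSK); the generator matrix G and the channel H are random stand-ins purely for shape illustration, not the actual constructions of [51], [52].

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions of "system I" from the text
N_d, N_u = 8, 4
N = N_d + N_u

# QPSK alphabet with unit average symbol energy
S = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# Hypothetical stand-ins: the real generator matrix follows the UW-OFDM
# construction of the cited references; random matrices here only
# illustrate the dimensions of the model y = H G d + w.
G = rng.standard_normal((N, N_d)) + 1j * rng.standard_normal((N, N_d))
H = np.diag(rng.standard_normal(N) + 1j * rng.standard_normal(N))  # diagonal channel matrix

sigma2_n = 0.01                      # time-domain AWGN variance
d = rng.choice(S, size=N_d)          # transmitted data vector
# w ~ CN(0, (N_d + N_u) * sigma2_n * I): variance split over real/imag parts
w = np.sqrt((N_d + N_u) * sigma2_n / 2) * (
    rng.standard_normal(N) + 1j * rng.standard_normal(N)
)
y = H @ G @ d + w                    # received vector at the equalizer input
```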

E. Computational Complexity
Besides the BER performance of the regarded equalizers, their computational complexity is an important aspect as well. We compare the inference complexity of the model-based and the NN-based equalizers regarded in this work in terms of the number of real-valued multiplications required for the equalization of a received vector, where a product of two complex values is accounted for with four real-valued multiplications, a product of a real and a complex value with two real-valued multiplications, and divisions are counted as multiplications. Since NN training can be carried out offline, we do not regard the training complexity. We assume that both H and H̃ = HM are already available, and thus the complexity of computing H̃ is not considered in the following complexity analysis. Unless stated otherwise, we derive the computational complexities for a general matrix M and a length N′ of the received vector, where, as described in Sec. II-A3, both have to be replaced by the appropriate quantities M_uw and N, or F_{N_d} and N_d, when using a UW or a CP as a guard interval, respectively.
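As a small illustration of this counting convention, consider the cost of a matrix-vector product; the helper function below is ours, not from the text.

```python
# Counting convention stated above: a complex*complex product costs 4 real
# multiplications, a real*complex product costs 2, and a division counts
# as one multiplication.
def real_mults_matvec(rows: int, cols: int, complex_matrix: bool = True) -> int:
    """Real multiplications for a (rows x cols) matrix times a complex vector."""
    return (4 if complex_matrix else 2) * rows * cols

# A full complex 12x12 matrix applied to a complex received vector:
print(real_mults_matvec(12, 12))                         # 576
# A real-valued matrix applied to the same vector needs half as many:
print(real_mults_matvec(12, 12, complex_matrix=False))   # 288
```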
We start by investigating the complexity of the proposed SICNNv1. Let us first consider the operations conducted in a single stage q for estimating the data symbol d_k. The computation of y^(q)_{ic,k} requires real-valued multiplications, and the inference of FCNN_1 entails a corresponding complexity. Squaring the outputs of FCNN_1 takes another N′ multiplications. For the remaining terms, h^(q)_{vv,k} is computed first and multiplied by y^(q)_{ic,k} and h_k subsequently, leading for these two terms in total to a further complexity. The inference of FCNN_2 and the normalization of y^(q)_{ic,k} and h_k require another 6 + 3n_{H,pr} + (n_{L,pr} − 1) n_{H,pr}^2 + 2 n_{H,pr} |S′| and 8N′ + 1 multiplications, respectively. Consequently, the total number of multiplications for estimating a single data symbol in a stage follows. For data normalization, additional real-valued multiplications are required, leading to the total complexity of SICNNv1 with M_{SICNNv1,kq} as specified in (31).

For SICNNv2, the approach for deriving its complexity is similar. As already stated for SICNNv1, the total complexity for computing e^(q)_{k,Re}, e^(q)_{k,Im}, and y^(q)_{ic,k} is obtained first. For computing the scaling factor ρ^(q)_k, scaling the quantities contained in z^(q)_k (and squaring ρ^(q)_k), and the inference of the FCNN, further multiplications are required, from which the complexity of SICNNv2 follows.

The complexity of DetNet can be derived similarly as for UW-OFDM, which has been done in [39]. For all operations conducted for the SC-FDE-specific pre-processing and for the inference of DetNet, we refer to [31]. Here, we only state the final result for the inference complexity of DetNet. In the qth DetNet layer, the inference of a single-hidden-layer FCNN, the one-hot demapping of the data vector estimate of the qth layer, and the application of weighted residual connections are conducted, entailing real-valued multiplications, where d_h is the number of neurons in the hidden layer of the FCNN, and d_v is the dimension of an auxiliary variable passing unconstrained information from DetNet layer to DetNet layer [18], [31], [39]. In the total inference complexity of DetNet, the subtracted term accounts for the fact that no one-hot demapping is performed in the last DetNet layer, L is the number of DetNet layers, and for the input pre-processing the quantities M^H H^{1/2} M and M^H H^{−1/2} y are computed (cf. [31]).
The KAFCNN [31] with weighted residual connections and a multiplication by a partial inverse DFT matrix in the last layer requires a correspondingly derived number of real-valued multiplications. The OAMP-Net2 layer structure [19] also stems from unfolding an iterative model-based algorithm, such that its inference complexity is determined in a similar fashion as for DetNet or SICNNv1/SICNNv2. For the matrix inverse that has to be computed in every layer of OAMP-Net2, we assume that a Cholesky decomposition [54] is employed. The computational complexity of OAMP-Net2 can thus be specified using the notation from [19]. Let us now consider the complexity of the model-based equalizers. For the LMMSE estimator, one has to distinguish between UW and CP guard intervals. We start with its complexity in case of a UW guard interval. Here, we first regard the complexity of determining the LMMSE estimator matrix E_LMMSE, which is independent of the received vector y and thus has to be computed only once per data burst (the channel is assumed to be stationary for the whole data burst). Assuming that the inverse in (6) is computed utilizing a Cholesky decomposition [54], the corresponding number of real-valued multiplications has to be carried out for computing E_LMMSE, and given E_LMMSE, the complexity of equalizing one received vector follows. In case of a CP guard interval, equalization with the LMMSE estimator becomes far less complex. As given in (7), the matrix whose inverse has to be computed is a diagonal matrix. Hence, obtaining the estimator matrix E_{LMMSE,dg} requires 4N_d real-valued multiplications, which is already the total number of multiplications that have to be carried out per burst. For the equalization of a single received data vector, a multiplication with the diagonal estimator matrix E_{LMMSE,dg} is required, followed by an inverse DFT.
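A sketch of the Cholesky-based inversion assumed in these complexity counts, applied to a generic LMMSE-type expression (H^H H + σ² I)^{-1} H^H; note that the paper's exact estimator expression (6) may differ in normalization, so this only illustrates the numerical technique.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2 = 8, 0.1

# Random complex channel stand-in; A is then Hermitian positive definite.
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = H.conj().T @ H + sigma2 * np.eye(N)

L = np.linalg.cholesky(A)               # A = L L^H, L lower triangular
# Two triangular solves replace the explicit inverse of A:
# first L Z = H^H, then L^H E = Z, so that E = A^{-1} H^H.
Z = np.linalg.solve(L, H.conj().T)
E_lmmse = np.linalg.solve(L.conj().T, Z)
```

Avoiding the explicit inverse in favor of triangular solves is what makes the Cholesky route attractive in the multiplication counts above.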
For the DFE, using a CP guard interval does not reduce the complexity as it does for the LMMSE estimator, since in every iteration the LMMSE error variances (which are the diagonal elements of the LMMSE error covariance matrix) have to be computed. We distinguish between operations to be carried out only once per data burst and those to be accomplished for every received vector. For a derivation of the complexity of the DFE we refer to [39], where an in-depth complexity analysis of the DFE is conducted for a similar system, namely a UW-OFDM system. Here, we only state the final results, namely the number of real-valued multiplications to be conducted once per data burst and the inference complexity per received vector. For the iterative SIC method, the same steps have to be conducted Q times, where, as for the LMMSE estimator, a Cholesky decomposition is utilized for inverting the (approximated) covariance matrices C^(q)_{vv,k} in every iteration. We assume that the estimated data symbols d̂^(q−1)_k are multiplied by the corresponding columns of H only once per iteration, making these products available for the interference cancellation required for estimating any data symbol d_k. From this, the computational complexity of the iterative SIC method with Q iterations follows. The numerical results for the complexities of the equalizers for the SC-FDE system setups specified in Sec. V-A are given in Tab. IV. For all system setups considered, DetNet and KAFCNN are the least complex NN-based equalizers. The inference complexity of SICNNv1 is distinctly lower than that of SICNNv2 and OAMP-Net2, and also lower than that of the model-based iterative SIC method it is deduced from. However, the LMMSE estimator and the DFE exhibit by far the lowest complexity.
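The assumption that each estimated symbol is multiplied by its column of H only once per iteration can be sketched as follows; dimensions and values are toy stand-ins, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_d = 12, 8
H = rng.standard_normal((N, N_d)) + 1j * rng.standard_normal((N, N_d))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
# Some QPSK-like symbol estimates from the previous iteration (toy values).
d_hat = (np.array([1, -1, 1, 1, -1, 1, -1, -1]) + 1j) / np.sqrt(2)

# Multiply each column of H by its symbol estimate only once per iteration,
# as assumed in the complexity count above; column k holds h_k * d_hat_k.
Hd = H * d_hat[None, :]
total = Hd.sum(axis=1)          # reconstructed interference H @ d_hat

# Interference-cancelled observation for symbol k: subtract the contribution
# of all symbols except the kth one.
k = 3
y_ic_k = y - (total - Hd[:, k])
```

Reusing the precomputed column products for every k avoids recomputing N_d matrix-vector products per iteration.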

VI. CONCLUSION
In this work, we proposed novel NN-based equalizers, called SICNNv1 and SICNNv2, inspired by a model-based soft interference cancellation scheme. SICNNv1 is tailored to an SC-FDE communication system, while SICNNv2 is also applicable to other communication systems with block-based data transmission. In addition, we presented a novel approach for generating training sets for NN-based equalizers, which considerably improves their performance at high SNRs. We evaluated the proposed NN-based equalizers for a number of different SC-FDE system setups and investigated their robustness with respect to imperfect channel knowledge at the receiver. In particular, SICNNv1 exhibits a BER performance superior to all regarded state-of-the-art model-based and NN-based equalization approaches for all SC-FDE system setups considered. To highlight the universal applicability of SICNNv2, we exemplarily presented its state-of-the-art performance for a UW-OFDM system. Further, we investigated the influence of the size of the dataset used to train the NN-based equalizers, and we presented an in-depth complexity analysis.

APPENDIX A NOISE STATISTICS IN A SOFT INTERFERENCE CANCELLATION STEP
Let us consider the system model (11) for estimating the kth data symbol d_k, k ∈ {0, ..., N_d − 1}, for any but the first iteration (q = 0), i.e., 0 < q < Q, which we repeat here for readability. Following central limit theorem arguments, the total noise v^(q)_k is assumed to feature a multivariate Gaussian distribution, i.e., p(v^(q)_k) is a multivariate complex Gaussian PDF, whose mean and covariance matrix C^(q)_{vv,k} are to be specified. We start by computing the statistics of δ̂^(q−1)_k, followed by those of r^(q)_k, to finally obtain the distribution of v^(q)_k. It is important to note that for computing the noise statistics in iteration q, the estimation errors from the previous iteration (q−1) have to be specified, whereby the interference-canceled vectors y^(q−1)_{ic,k} are fixed and available, i.e., the PDFs/PMFs of v^(q−1)_k and δ̂^(q−1)_k are not unconditional, but are conditioned on a given y^(q−1)_{ic,k}. A data symbol estimation error δ^(q−1)_k, given y^(q−1)_{ic,k}, can only attain a finite number of different values (as many as the cardinality |S| of the symbol alphabet), and thus its statistics are described by a PMF. In order to compute the statistics of an estimation error, we start by reconsidering the MMSE estimate of the preceding iteration. Since we can reformulate p(d_k | y^(q−1)_{ic,k}) as p(d_k | y, d̂^(q−2)), we consider d̂^(q−2) to be a fixed vector in iteration (q−1), which does not feature a statistical distribution. Hence, the conditional PMF of an estimation error can be written as p(δ^(q−1)_k | y^(q−1)_{ic,k}) = p(δ^(q−1)_k | y, d̂^(q−2)).

Following central limit theorem arguments, we assume the distribution of r^(q)_k to be multivariate Gaussian. Its mean follows from the conditional means of the estimation errors, and its covariance matrix, in turn, is given by

C^(q)_{rr,k} = E[ r^(q)_k (r^(q)_k)^H ] = H̃_k E_{d̃_k | y, d̂^(q−2)}[ δ̃^(q−1)_k (δ̃^(q−1)_k)^H | y, d̂^(q−2) ] H̃_k^H,

where a diagonal element of the inner expectation follows to E_{d_i | y, d̂^(q−2)}[ δ^(q−1)_i (δ^(q−1)_i)^* | y, d̂^(q−2) ] = e^(q−1)_i.

With the above results at hand, v^(q)_k can be specified to be a zero-mean Gaussian distributed vector with a covariance matrix C^(q)_{vv,k} composed of the covariance of w, the covariance matrix C^(q)_{rr,k}, and the cross-covariance terms between w and r^(q)_k.

APPENDIX B ESTIMATES OF ITERATIVE SOFT INTERFERENCE CANCELLATION METHOD AFTER FIRST ITERATION
We show for a QPSK modulation alphabet that the bit error probability of a hard decision estimate produced by the iterative SIC method described in Sec. II-B after the first iteration is equivalent to the bit error probability of an LMMSE hard decision estimate when initializing all data symbol estimates of the iterative SIC method with 0, k ∈ {0, ..., N_d − 1}. To this end, we show that the decision criteria of both estimation methods for the jth bit b_jk of the kth data symbol, k ∈ {0, ..., N_d − 1}, j ∈ {0, ..., log_2(|S|) − 1}, being 0 or 1 are the same, and thus also their bit error probabilities must coincide. The QPSK bit-to-symbol mapping (b_1k b_0k) → d_k is assumed to map b_0k to the real part and b_1k to the imaginary part of d_k. The bit values 0 and 1 are mapped to the symbol values −ρ and ρ, respectively, with ρ = 1/√2 being an energy normalization factor.

Let us start with the LMMSE data estimator. The SC-FDE system model is given by (cf. (5))

y = Hd + w,

where w ∼ CN(0, N σ_n^2 H), and the corresponding LMMSE data estimator follows, which is expressed in a different way than in (6), but can be shown to be mathematically equivalent. Based on the LMMSE estimates d̂_k, the hard decision estimate for bit b_0k follows, where S^(−) and S^(+) are the sets of data symbols containing a bit with value 0 and 1 at the 0th position, respectively. For determining the PDF p(y | d_k = s′), we reformulate (55) as

y = H̃_k d̃_k + h_k d_k + w. (60)

Due to central limit theorem arguments, H̃_k d̃_k can be assumed to be Gaussian distributed, and thus p(y | d_k) is approximated to be a multivariate complex Gaussian PDF, with C_k as defined in (63).

Let us now consider the iterative SIC method, starting with the system model in its first iteration (q = 0), which is given by (66), where v^(0)_k = H̃_k d̃_k + w is assumed to be zero-mean Gaussian noise with a covariance matrix C_k + N σ_n^2 H. The MMSE estimate for the data symbol d_k is the mean of the posterior PMF E_{d_k|y}[d_k | y] for the model (66), which is given by (cf. (18))

d̂_k = Σ_{s′∈S} s′ p(y | d_k = s′) / Σ_{s′∈S} p(y | d_k = s′). (67)

Considering the QPSK bit-to-symbol mapping defined above, the MMSE hard decision estimate for bit b_0k is

b̂_0k = 1 if Re{d̂_k} > 0, and b̂_0k = 0 otherwise. (68)

Since the denominator of the MMSE estimate given in (67) is always positive, b̂_0k is estimated to be 1 if

0 < Re{ Σ_{s′∈S} s′ p(y | d_k = s′) }
  = Σ_{s′∈S} Re{s′} p(y | d_k = s′)
  = Σ_{s′∈S^(+)} Re{s′} p(y | d_k = s′) + Σ_{s′∈S^(−)} Re{s′} p(y | d_k = s′)
  = ρ Σ_{s′∈S^(+)} p(y | d_k = s′) − ρ Σ_{s′∈S^(−)} p(y | d_k = s′).

Fig. 4. BER performance of NN-based and model-based equalizers for SC-FDE with a UW guard interval and QPSK alphabet.

Fig. 5. BER performance comparison of NN-based equalizers for SC-FDE with a UW guard interval and QPSK alphabet, when being trained with randomly generated training data (rand. tr. data), or on a training set generated by the proposed approach.

Fig. 6. BER performance of NN-based and model-based equalizers for SC-FDE with a UW guard interval and 16-QAM alphabet.

Fig. 7. BER performance of NN-based and model-based equalizers for SC-FDE with a UW guard interval, QPSK alphabet, and with perfect and imperfect channel knowledge.
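A minimal sketch of the MMSE estimate (67) and the hard decision rule (68) for the QPSK mapping above; the likelihood values passed in merely stand in for p(y | d_k = s′) evaluated for a given received vector.

```python
import numpy as np

rho = 1 / np.sqrt(2)   # QPSK energy normalization factor from the text
# Symbol alphabet; bit b0 maps to the real part (bit 1 -> +rho, bit 0 -> -rho).
S = np.array([rho + 1j * rho, rho - 1j * rho, -rho + 1j * rho, -rho - 1j * rho])

def mmse_symbol_and_bit0(likelihoods):
    """MMSE symbol estimate per (67) and hard decision for bit b0 per (68).

    likelihoods[i] plays the role of p(y | d_k = S[i]) for a received y."""
    d_hat = np.sum(S * likelihoods) / np.sum(likelihoods)
    b0_hat = 1 if d_hat.real > 0 else 0
    return d_hat, b0_hat

# Likelihood mass concentrated on symbols with positive real part
# yields the hard decision b0 = 1, matching the sign criterion above.
_, b0 = mmse_symbol_and_bit0(np.array([0.7, 0.2, 0.05, 0.05]))
```

Since the denominator in (67) is positive, only the sign of the weighted numerator decides the bit, which is exactly the equivalence argued in this appendix.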

TABLE I
PARAMETERS OF THE TRAINING SET GENERATION APPROACH FOR DIFFERENT SC-FDE SYSTEM SETTINGS.

TABLE II
HYPERPARAMETER SETTINGS OF THE PROPOSED NN EQUALIZERS. THE HYPERPARAMETERS OF THE PARAMETER-REDUCED NNS SICNNV1RED AND SICNNV2RED DIFFER FROM THE HYPERPARAMETERS OF SICNNV1 AND SICNNV2 ONLY BY THE LEARNING RATES ηSICNNV1RED AND ηSICNNV2RED, RESPECTIVELY.

TABLE III
HYPERPARAMETER SETTINGS OF THE STATE-OF-THE-ART NN EQUALIZERS USED FOR COMPARISON. SIMILAR TO THE ORIGINAL PUBLICATION [18], FOR DETNET η DENOTES THE LEARNING RATE, L THE NUMBER OF LAYERS, dH THE NUMBER OF HIDDEN NEURONS IN THE SINGLE-HIDDEN-LAYER FCNN, dV THE DIMENSION OF THE AUXILIARY VARIABLE PASSING UNCONSTRAINED INFORMATION THROUGH THE NETWORK, AND β THE RESIDUAL WEIGHTING FACTOR. FOR OAMP-NET2 [19], η IS THE LEARNING RATE, AND T IS THE NUMBER OF LAYERS. FOR KAFCNN [31], η IS THE LEARNING RATE, L THE NUMBER OF LAYERS, dH THE NUMBER OF NEURONS PER HIDDEN LAYER, AND β THE RESIDUAL WEIGHTING FACTOR.

TABLE IV
NUMBER OF REQUIRED REAL-VALUED MULTIPLICATIONS (ROUNDED TO HUNDREDS) OF THE EVALUATED EQUALIZERS FOR DIFFERENT SC-FDE SYSTEM SETUPS.