Iterative-Detection-Aided Tomlinson-Harashima Precoding for Faster-Than-Nyquist Signaling

Faster-than-Nyquist (FTN) signaling is a promising technique because of its higher spectral efficiency, but it introduces inter-symbol interference (ISI), which usually requires computationally demanding detection algorithms. To reduce the detection complexity at the receiver, this paper proposes a novel Tomlinson-Harashima precoding (THP) scheme for FTN signaling (FTNS). In the conventional THP (CTHP) scheme, only the ISI that time packing introduces from several preceding information symbols can be cancelled, which leaves large residual ISI and results in an unattractive bit-error-rate (BER) performance. Moreover, because its receiver is suboptimal (a simple modulo operation that ignores the correlation between received symbols), CTHP suffers an obvious capacity loss and cannot cancel the interference in FTNS effectively. In this paper, the residual ISI of FTNS under CTHP is formulated in detail and modeled mathematically. Furthermore, an improved THP method is proposed whose optimized receiver combats the residual ISI by means of a soft interference cancellation (SIC) algorithm and an iterative turbo process. Computational complexity analysis and numerical simulation results show that the proposed scheme has a low computational cost and greatly outperforms CTHP and the other cited schemes. Moreover, for moderate time packing, it approaches the ISI-free BER performance bound and is competitive with the maximum a posteriori (MAP) equalization technique.


I. INTRODUCTION
In recent years, to meet the ever-increasing demand for high spectral efficiency, the faster-than-Nyquist (FTN) technique has received considerable attention and has been regarded as one of the candidate modulation schemes for 5G communications [1]-[5]. It has also attracted growing attention in satellite communication and optical network systems [6]-[8], where it can replace high-order modulation in channels with nonlinear effects to enhance spectral efficiency and robustness to nonlinearity, and many scholars have studied FTN signals [9]-[17]. In faster-than-Nyquist signaling (FTNS), information symbols are transmitted at a rate higher than that suggested by the Nyquist criterion, so intersymbol interference (ISI) is introduced. Hence, a proper equalization scheme is required for FTN signal reception. A number of demodulation methods have been proposed for the FTN receiver [18]-[27], but they still suffer from high computational complexity. On the other hand, low-complexity equalization methods usually perform poorly, especially when the ISI due to FTN is severe.
The associate editor coordinating the review of this manuscript and approving it for publication was Jose Saldana.
Recently, considerable attention has been drawn to precoding techniques [28]-[32], since they can effectively relieve the computational burden on the equalizer at the receiver. Instead of using highly complex equalization at the receiver, precoding techniques preprocess signals at the transmitter and hence can greatly reduce computational complexity. Because the ISI of FTNS is deliberately introduced by time packing, it is deterministic and completely known in advance at the transmitter, so precoding methods can be employed without any feedback of channel state information (CSI) from the receiver. By using the properties of the ISI tap coefficient matrix of FTNS, a series of matrix-based precoding techniques [33]-[35], such as singular value decomposition (SVD) precoding, GMD precoding, GTMH precoding and Cholesky precoding, have been studied. These precoding schemes based on matrix factorization can be seen as forms of zero-forcing equalization, and the Cholesky precoding method has the best ISI cancellation performance among them. However, owing to the large dimension of the ISI tap coefficient matrix in FTNS, the computational complexity of these matrix-based precoding schemes is very high. To reduce the complexity of ISI cancellation based on matrix computation, [22] introduced a bidirectional interference cancellation technique (BICT), which is independent of matrix factorization and achieves performance similar to that of Cholesky precoding with lower complexity.
Beyond that, nonlinear precoding methods have also been studied in recent years, such as the well-known Tomlinson-Harashima precoding (THP) method [37], [38], which has lower complexity. THP is an effective precoding scheme for combating ISI and has been studied for FTNS in 5G mobile communications [3], microwave backhaul links [39] and optical communications [40], [41]. However, the conventional THP (CTHP) method usually uses a simple modulo operation at the receiver, which is suboptimal because it degrades the bit-error-rate (BER) performance in the form of the so-called "modulo loss". M. Jane and A. Medra [42] introduced two novel demapping algorithms for an FTN-THP system that compensate for the modulo loss to achieve better BER performance. Their results show that good BER performance is achieved only in scenarios with very low ISI, whereas the presented methods perform poorly in relatively high-ISI scenarios. This is mainly because they ignore the correlation between the received symbols and the difference between the signal model of CTHP and that of FTNS.
In this paper, we propose an optimized TH precoding method in which a soft interference cancellation (SIC) algorithm [36] replaces the original de-precoder and iterative turbo equalization is employed to exchange soft information at the receiver. Firstly, the TH-precoded signal in the FTN system is modeled and formulated, and it is found that large residual ISI remains in FTNS after the TH precoder is applied at the transmitter. Then, the residual ISI is modeled as a Gaussian process and removed with the SIC equalization algorithm and iterative turbo equalization. Through several iterations, the residual ISI is further cancelled and better performance is achieved. The improved THP method with the optimized receiver is called iterative-detection-aided THP (IDA-THP). The main contributions of this paper are summarized as follows:
(1) We formulate and analyze the TH-precoded signal in the FTN system and find that large residual ISI remains in FTNS after the TH precoder is applied at the transmitter.
(2) We model the residual ISI as a Gaussian process and derive its mathematical expression.
(3) To remove the residual ISI effectively, we propose an equalization method based on the SIC principle and an iterative turbo structure to replace the original modulo operation at the receiver.
(4) We analyze the computational complexity of the proposed IDA-THP method and simulate its BER performance. The results show that, when BER performance and computational complexity are considered together, IDA-THP may be a better choice than other detection schemes in practical applications of FTNS.
The main differences from previous work are summarized as follows:
(1) Large residual ISI remains in FTNS after TH precoding. In this paper, we formulate and analyze the residual ISI in detail and derive its mathematical expression, which has not been discussed in the previous literature. Moreover, the residual ISI is modeled mathematically as a Gaussian process to make the subsequent ISI cancellation more specific and accurate.
(2) At the receiver, a soft interference cancellation method with turbo equalization is proposed to replace the original modulo operation and compensate for the BER performance loss; the transmitted signal is estimated and reconstructed to improve ISI cancellation.
The remainder of this paper is organized as follows. Section II briefly introduces the system model and derives the problem formulation. In Section III, we model the residual ISI and present the proposed IDA-THP scheme for FTNS. In Section IV, we validate the proposed transmission scheme through complexity analysis and simulations. Finally, we draw conclusions in Section V.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. SYSTEM MODEL
We consider the communication system shown in Fig. 1, where $x_n$ are the information symbols generated from the coded bits by an $M$-PSK modulator and transmitted every $\tau T$ seconds, $T$ is the symbol period and $\tau$ is the compression factor, $0 < \tau \le 1$. $h(t)$ is a real unit-energy $T$-orthogonal baseband pulse, i.e. $\int_{-\infty}^{\infty} |h(t)|^2\,dt = 1$. When $\tau = 1$, an orthogonal system is obtained and $s(t)$ is an ISI-free signal, referred to as Nyquist signaling. The case $\tau < 1$ is called FTNS. The root raised cosine (RRC) pulse is used as the shaping pulse. Fig. 2 shows the time-domain waveforms of the signal in conventional Nyquist transmission and in FTN transmission, with the same interval $T$ used in both systems. As shown in Fig. 2, when symbols are transmitted at the Nyquist rate, i.e. $\tau = 1$, the optimal sampling points are not disturbed by neighboring symbols, so Nyquist signaling is ISI-free; when $\tau = 0.8$ and $\tau = 0.6$, symbols are transmitted faster than the Nyquist rate and the spectral efficiency is improved, but at the cost of introducing controlled ISI of infinite length. As the compression factor gets smaller, the ISI becomes more severe.
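The ISI-free property at $\tau = 1$ and its loss for $\tau < 1$ can be checked numerically. The following is a minimal sketch, assuming that the RRC pulse followed by its matched filter yields a raised-cosine pulse and using a roll-off factor of 0.3 purely for illustration; the function names `raised_cosine` and `sampled_taps` are hypothetical and do not come from the paper.

```python
import numpy as np

def raised_cosine(t, T=1.0, beta=0.3):
    """Raised-cosine pulse, i.e. an RRC pulse passed through its matched filter."""
    x = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * beta * x) ** 2
    singular = np.isclose(denom, 0.0)
    # Closed form away from the removable singularity at |t| = T / (2 * beta).
    value = np.sinc(x) * np.cos(np.pi * beta * x) / np.where(singular, 1.0, denom)
    # Limit of the closed form at the singular points.
    limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
    return np.where(singular, limit, value)

def sampled_taps(tau, span=4, T=1.0, beta=0.3):
    """Pulse values seen at the sampling instants m * tau * T, m = -span..span."""
    m = np.arange(-span, span + 1)
    return raised_cosine(m * tau * T, T, beta)

# tau = 1 leaves only the centre tap; tau < 1 leaves nonzero neighbours (ISI).
for tau in (1.0, 0.8, 0.6):
    print(tau, np.round(sampled_taps(tau), 3))
```

For $\tau = 1$ every off-centre value vanishes up to rounding, whereas for $\tau = 0.8$ and $\tau = 0.6$ the neighbouring values are clearly nonzero and grow as $\tau$ shrinks.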

B. PROBLEM FORMULATION
We use the linear equivalent structure of the block diagram of the conventional TH precoder [38], as shown in Fig. 3. The nonlinear modulo operation of THP is replaced by the equivalent addition of a unique sequence to the transmitted data symbols. The effect of the modulo reduction can be characterized as follows: a unique sequence $d_k$, $d_k \in 2M\mathbb{Z}$, called the precoding sequence, is added to the data sequence $x_k$ in order to create an effective data sequence $v_k = x_k + d_k$. The $d_k$ are chosen so that the real-valued channel symbols
$$a_k = x_k + d_k - \sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T) \tag{1}$$
lie in the interval $[-M, M)$. The third term of (1), $\sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T)$, corresponds to the intersymbol interference, which is uniquely determined by the preceding transmitted symbols $a_{k-j}$ ($j = 1, 2, \cdots, L-1$). Then the transmitted signal $s(t)$ of FTNS shown in Fig. 1 is written in the form
$$s(t) = \sum_{k} a_k h(t - k\tau T).$$
The observed received signal after the matched filter is expressed as
$$y(t) = \sum_{k} a_k g(t - k\tau T) + n(t),$$
where $g(t) = h(t) * h^*(-t)$, $n(t) = w(t) * h^*(-t)$, $*$ denotes convolution, and $w(t)$ represents real white Gaussian noise with total variance $N_0 = \sigma^2$.

Assuming perfect timing synchronization between the transmitter and the receiver, the received FTN signal $y(t)$ is sampled every $\tau T$ and the $k$th received sample can be written as
$$y_k = a_k A_g(0) + \underbrace{\sum_{m \neq 0} a_{k-m} A_g(m\tau T)}_{\text{ISI from adjacent symbols}} + n(k\tau T), \tag{5}$$
where $A_g(m\tau T)$ is the ambiguity function, defined as $A_g(m\tau T) = \int_{-\infty}^{\infty} h(t)\, h^*(t - m\tau T)\,dt$. The noise samples $n(k\tau T)$ are correlated, so the noise at the receiver is nonwhite. The ambiguity function $A_g(m\tau T)$ depends only on the base pulse $h(t)$ and represents the interference between signals. The variation of $|A_g(m\tau T)|$ with the time interval is shown in Fig. 4. We can see that the ambiguity function decreases dramatically as $|m|$ increases, so usually only the interference from adjacent symbols is considered in practice. Based on this observation, only the ISI from the preceding $(L-1)$ and following $(L-1)$ symbols is considered, and (5) can be written as
$$y_k = a_k A_g(0) + \underbrace{\sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T)}_{\text{ISI from preceding } (L-1) \text{ symbols}} + \underbrace{\sum_{n=1}^{L-1} a_{k+n} A_g(-n\tau T)}_{\text{ISI from following } (L-1) \text{ symbols}} + n(k\tau T). \tag{6}$$
The signal-to-interference ratio (SIR) is defined in (7), where $E_s$ is the symbol energy and $\langle \cdot \rangle$ is the inner product operator.
VOLUME 8, 2020
The variation of the SIR with the compression factor is shown in Fig. 5. Clearly, the interference becomes more and more severe as the compression factor decreases if no signal processing technique is used.
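The trend of the SIR with the compression factor can be reproduced qualitatively with a short numerical experiment. The sketch below is an illustration under stated assumptions, not the paper's exact definition: it takes the SIR to be the power of the desired tap divided by the summed power of all other taps, computes each tap as a numerical inner product of time-shifted RRC pulses, and uses a roll-off factor of 0.3. The helper names `rrc`, `ambiguity` and `sir_db` are hypothetical.

```python
import numpy as np

def rrc(t, T=1.0, beta=0.3):
    """Unit-energy root-raised-cosine pulse h(t) (requires beta > 0)."""
    x = np.asarray(t, dtype=float) / T
    at_zero = np.isclose(x, 0.0)
    at_sing = np.isclose(np.abs(x), 1.0 / (4.0 * beta))
    # Harmless placeholder argument so the closed form stays finite everywhere.
    xs = np.where(at_zero | at_sing, 1.0 / (8.0 * beta), x)
    num = np.sin(np.pi * xs * (1 - beta)) + 4 * beta * xs * np.cos(np.pi * xs * (1 + beta))
    h = num / (np.pi * xs * (1 - (4 * beta * xs) ** 2))
    h = np.where(at_zero, 1 - beta + 4 * beta / np.pi, h)
    sing = (beta / np.sqrt(2)) * ((1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
                                  + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
    return np.where(at_sing, sing, h) / np.sqrt(T)

def ambiguity(m, tau, T=1.0, beta=0.3, half_span=30.0, step=1.0 / 64):
    """Riemann-sum approximation of the inner product <h(t), h(t - m*tau*T)>."""
    t = np.arange(-half_span, half_span, step) * T
    return np.sum(rrc(t, T, beta) * rrc(t - m * tau * T, T, beta)) * step * T

def sir_db(tau, n_side=20, T=1.0, beta=0.3):
    """Assumed SIR: desired-tap power over the summed power of the other taps."""
    main = ambiguity(0, tau, T, beta) ** 2
    side = sum(ambiguity(m, tau, T, beta) ** 2 for m in range(1, n_side + 1))
    return 10 * np.log10(main / (2 * side))   # factor 2: taps are symmetric in m

for tau in (0.9, 0.8, 0.7, 0.6):
    print(f"tau = {tau:.1f}: SIR = {sir_db(tau):.1f} dB")
```

Under these assumptions the printed SIR falls monotonically as $\tau$ decreases, which matches the qualitative behaviour described for Fig. 5.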
Using the THP technique to eliminate the interference, i.e. substituting (1) into (6), we obtain
$$y_k = v_k A_g(0) + \bigl(1 - A_g(0)\bigr) \sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T) + \sum_{n=1}^{L-1} a_{k+n} A_g(-n\tau T) + n(k\tau T), \tag{8}$$
where $v_k$ is the effective data symbol. As shown in (8), the first term $v_k A_g(0)$ is the desired symbol, the sum of the second and third terms, $\bigl(1 - A_g(0)\bigr) \sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T) + \sum_{n=1}^{L-1} a_{k+n} A_g(-n\tau T)$, is the residual interference, and the last term is colored noise. It is seen that only part of the ISI, $\sum_{j=1}^{L-1} a_{k-j} A_g(j\tau T)$, has been cancelled, and this part comes only from the $(L-1)$ preceding symbols. In particular, when the compression factor is larger, $(1 - A_g(0))$ is close to zero and the ISI from the preceding symbols is almost eliminated. However, since the CTHP method was originally proposed to combat the ISI of orthogonal Nyquist signaling, the ISI from the following symbols is not processed at all by the TH precoder, which leaves a large amount of residual ISI at the receiver. Moreover, as the compression factor is reduced, the residual ISI stemming from the preceding symbols becomes more and more serious, which finally leads to more severe total residual ISI. Therefore, CTHP is not effective in eliminating the ISI in FTNS.
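The decomposition in (8) can be verified numerically. The sketch below is a minimal illustration under stated assumptions: real-valued $\pm 1$ data symbols with $M = 2$, symmetric taps $A_g(-m\tau T) = A_g(m\tau T)$, no noise, and tap values chosen only for illustration. The helper names `thp_precode` and `isi_sums` are hypothetical.

```python
import numpy as np

def thp_precode(x, g_side, M):
    """Tomlinson-Harashima precoding with causal taps g_side[j-1] = A_g(j*tau*T)."""
    a = np.zeros(len(x))   # channel symbols a_k in [-M, M)
    v = np.zeros(len(x))   # effective data symbols v_k = x_k + d_k
    for k in range(len(x)):
        # ISI predicted from the L-1 preceding channel symbols.
        isi = sum(g_side[j - 1] * a[k - j]
                  for j in range(1, len(g_side) + 1) if k - j >= 0)
        a[k] = (x[k] - isi + M) % (2 * M) - M   # modulo reduction into [-M, M)
        v[k] = a[k] + isi                        # equals x_k + d_k, d_k in 2M*Z
    return a, v

def isi_sums(a, g_side):
    """ISI from the preceding and from the following L-1 symbols (symmetric taps)."""
    kernel = np.concatenate(([0.0], g_side))
    pre = np.convolve(a, kernel)[:len(a)]
    post = np.convolve(a[::-1], kernel)[:len(a)][::-1]
    return pre, post

rng = np.random.default_rng(1)
M, g0 = 2, 1.0
g_side = np.array([0.22, -0.15, 0.07])      # illustrative tap values only
x = rng.choice([-1.0, 1.0], size=1000)
a, v = thp_precode(x, g_side, M)
pre, post = isi_sums(a, g_side)
y = g0 * a + pre + post                      # noise-free received samples
residual = y - g0 * v                        # interference left after precoding
print("mean residual ISI power:", np.mean(residual ** 2))
```

With the centre tap equal to one, the precoder removes the contribution of the preceding symbols entirely, and the residual printed above is exactly the uncancelled contribution of the following symbols.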
On the other hand, a simple modulo operation is usually used as the conventional de-precoder shown in Fig. 1, which ignores the correlation between received symbols. As a result, inaccurate soft information values are passed to the decoder, which leads to a significant loss in BER performance.
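The conventional de-precoder and the way it loses information can be illustrated in a few lines. This is a minimal sketch, assuming real-valued symbols and a modulo interval $[-M, M)$ with $M = 2$; the function name `modulo_deprecode` is hypothetical.

```python
import numpy as np

def modulo_deprecode(y, M):
    """Fold received samples back into [-M, M), undoing the precoding sequence."""
    return np.mod(np.asarray(y, dtype=float) + M, 2 * M) - M

M = 2
# A data symbol +1 shifted by the precoding sequence (+4 or -4) plus small noise.
print(modulo_deprecode([5.1, -2.9], M))   # both samples fold back close to +1
# The same symbol with larger noise crosses the modulo boundary at +2.
print(modulo_deprecode([2.2], M))         # wraps around to the negative side
```

The second call shows the modulo loss: a sample just beyond the boundary is mapped far away from the transmitted symbol, and because each sample is folded independently, no use is made of the correlation between neighbouring samples.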

III. THE PROPOSED ITERATIVE-DETECTION-AIDED TOMLINSON-HARASHIMA PRECODING (IDA-THP) SCHEME
According to the analysis above, since the noncausality and symmetry of the ISI matrix in FTNS are not taken into account in the CTHP method, a large amount of residual ISI remains. Coupled with the performance loss caused by the modulo operation, the CTHP method cannot achieve the expected ISI cancellation for FTNS.
To improve the performance, in this paper the residual interference is modeled as Gaussian noise and a symbol-by-symbol detection method based on SIC equalization is employed to replace the original modulo operation at the receiver. The corresponding system block diagram is shown in Fig. 6.

A. RESIDUAL ISI MODEL
In order to remove the residual ISI effectively, according to [8], the residual ISI in (8) can be modeled as a zero-mean Gaussian process with power spectral density (PSD) $N_I$ that is independent of the additive noise. Based on this observation, (8) is simply written as
$$y_k = v_k A_g(0) + u_k, \tag{9}$$
where $u_k$ is the equivalent residual-interference-plus-noise term with variance $N_0 + N_I$. The expression for $N_I$ involves $\sigma_c^2 = E\{|v_m|^2\}$ and $h_m = h(t - m\tau T)$.

B. THE PROPOSED IDA-THP SCHEME
The symbol-by-symbol iterative detection method based on SIC equalization in the IDA-THP method is given as follows.
We assume that the proposed iterative detection algorithm based on SIC principles is initialized with the a priori probabilities of the transmitted symbols from the SISO decoder, denoted $\{P(a_n)\}$.
1) When the iteration number is zero, i.e. iter = 0, the extrinsic probabilities $p^{(iter)}(y|a_n)$ are initialized to a constant value, where $y$ denotes the observed sample set of the current code symbol;
2) The posterior mean and variance of each code symbol are estimated according to
$$\mu_n = \sum_i a^{(i)} P(a_n = a^{(i)}|y), \tag{11}$$
$$\sigma_n^2 = \sum_i |a^{(i)}|^2 P(a_n = a^{(i)}|y) - |\mu_n|^2, \tag{12}$$
with $P(a_n = a^{(i)}|y) = P(a_n = a^{(i)})\, p^{(iter)}(y|a_n = a^{(i)}) / p^{(iter)}(y)$;
3) According to the posterior means and variances above, the residual interference is evaluated over the interference set $\mathcal{I}$. Because $A_g(m\tau T)$ rapidly decreases as $|m|$ increases, usually only the symbols adjacent to the considered one contribute to the interference in practice;
4) After the evaluated interference is removed, the received samples are denoted $\hat{y}_n$;
5) Assuming that the $\{\hat{y}_n\}$ are independently distributed, the extrinsic probabilities are evaluated symbol by symbol, where $N_I = N_0 + \sum_{m \in \mathcal{I}} A_g(m\tau T)^2 \sigma_n^2$;
6) Then iter = iter + 1. If iter < S, where the parameter S denotes the overall number of iterations, return to step 2); otherwise continue;
7) Finally, the extrinsic probabilities fed to the SISO decoder are $p^{(S)}(y|a_n)$.
Steps 1) to 5) constitute one self-iteration of the SIC algorithm. After a certain number of self-iterations, the extrinsic probabilities are fed to the convolutional decoder for reliable decoding. By combining SIC equalization with the turbo iterative process, the dependency between the received symbols is used to reconstruct the transmitted signal, which is then employed to correct the received symbols and thereby cancel the residual ISI. Therefore, the proposed IDA-THP scheme not only eliminates the ISI from preceding symbols at the transmitter, but also further cancels the residual ISI through an iterative detection process at the receiver.
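The self-iteration described in the steps above can be sketched as follows. This is an illustrative sketch rather than the paper's implementation, and it rests on several assumptions: symbols are real-valued and detected directly from the constellation (the precoding sequence is ignored), the ISI taps are symmetric, the likelihood in step 5 is Gaussian, and the residual-interference power at each position is accumulated from the variances of the neighbouring symbols. The function name `sic_detect` and all tap values are hypothetical.

```python
import numpy as np

def sic_detect(y, g, constellation, prior, n0, n_iter=3):
    """Symbol-by-symbol SIC detection for real symbols with symmetric ISI taps.

    g[m] plays the role of A_g(m*tau*T) for m = 0..L-1; prior has shape (N, M).
    Returns (N, M) extrinsic likelihoods, scaled so that each row has maximum 1.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    c = np.asarray(constellation, dtype=float)
    prior = np.asarray(prior, dtype=float)
    side = np.concatenate((g[:0:-1], [0.0], g[1:]))    # interferers only, centre zeroed
    ext = np.ones((len(y), len(c)))                     # step 1: constant initialisation
    for _ in range(n_iter):                             # step 6: S self-iterations
        post = prior * ext
        post = post / post.sum(axis=1, keepdims=True)
        mu = post @ c                                   # step 2: posterior mean
        var = np.maximum(post @ c ** 2 - mu ** 2, 0.0)  # step 2: posterior variance
        interference = np.convolve(mu, side, mode="same")    # step 3
        y_hat = y - interference                              # step 4
        n_i = n0 + np.convolve(var, side ** 2, mode="same")   # noise + residual ISI power
        log_ext = -(y_hat[:, None] - g[0] * c[None, :]) ** 2 / (2.0 * n_i[:, None])
        ext = np.exp(log_ext - log_ext.max(axis=1, keepdims=True))   # step 5
    return ext                                          # step 7: passed to the decoder

# Usage with illustrative taps, BPSK symbols and uniform a priori probabilities.
rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=200)
taps = np.array([1.0, 0.2, -0.1])
two_sided = np.concatenate((taps[:0:-1], taps))
rx = np.convolve(symbols, two_sided, mode="same") + 0.05 * rng.standard_normal(200)
likelihoods = sic_detect(rx, taps, [-1.0, 1.0], np.full((200, 2), 0.5), n0=0.05 ** 2)
decisions = np.array([-1.0, 1.0])[likelihoods.argmax(axis=1)]
print("symbol errors:", int(np.sum(decisions != symbols)))
```

The likelihoods are computed in the log domain and rescaled per symbol to avoid numerical underflow when the estimated interference-plus-noise power becomes small in later iterations; only their ratios matter to the decoder.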

IV. COMPLEXITY ANALYSIS AND SIMULATION RESULTS
In this section, firstly, we analyze the computational complexity of the proposed IDA-THP method and compare it with other detection methods. Secondly, we illustrate and verify the BER performance of the proposed IDA-THP scheme by way of computer simulations.

A. COMPLEXITY ANALYSIS
In this section, we first analyze the computational cost of the proposed IDA-THP scheme and then compare it with the CTHP method, the BICT proposed in [22] and maximum a posteriori probability (MAP) equalization, which is well known as one of the optimal detection algorithms. For convenience of analysis and comparison, the complexity of the outer decoder is ignored. In addition, it should be noted that the CTHP and BICT methods are based on the noniterative receiver model of FTNS shown in Fig. 1, whereas the proposed IDA-THP scheme and the MAP method use the turbo-iteration-based system model shown in Fig. 6.
The computational complexity of the proposed IDA-THP scheme is mainly determined by the TH precoder at the transmitter and the SIC equalization at the receiver. Firstly, the complexity of the TH precoder at the transmitter follows from (1) and is $O(NML)$. Secondly, the computational complexity of the SIC algorithm depends linearly on the information sequence length $N$, the number of self-iterations $S$, the constellation size $M$ and the cardinality of the interference set, which equals the number of adjacent interfering symbols $L$. So the computational complexity of SIC equalization is $O(NSML)$. With $K$ outer turbo iterations, the overall complexity of the IDA-THP method is $O(KNSML)$. Meanwhile, the complexity of the CTHP method is $O(NML)$, which mainly depends on the TH precoder at the transmitter and the modulo operation at the receiver. On the other hand, the complexity of BICT is $O(N^2)$, and that of MAP equalization grows exponentially with the channel memory, with the constellation size as the base, and is $O(KNM^L)$. The overall computational complexity of the four detection schemes is summarized in Tab. 1.
By comparison, the CTHP method has the lowest computational complexity, because its complexity depends linearly on the system parameters and no iterative process is required at the receiver. Although it is also based on the noniterative receiver model, BICT still has high complexity, since its complexity is proportional to the square of the information sequence length $N$, which is usually large, e.g. $N \ge 500$. In contrast, because the proposed IDA-THP scheme converges with $S = 3$ and $K = 3$, as shown in Fig. 7, its computational complexity is similar to that of BICT. In particular, for larger $N$, the complexity of the IDA-THP scheme is lower than that of BICT.
Since turbo equalization iterations are employed at the receiver in both the IDA-THP scheme and the MAP method, the computational complexity of the two schemes depends on the number $K$ of outer turbo iterations between the MAP or SIC equalizer and the outer decoder, as shown in Tab. 1. Moreover, in the simulations the same number of outer turbo iterations, $K = 3$, is used in both schemes. For a clearer comparison, the computational complexity per outer turbo iteration is therefore presented in detail in Tab. 2 for different channel memories and signal constellations. According to Fig. 4, only the adjacent symbols contribute to the ISI, so the channel memory length is varied from 3 to 6. Also, because of the performance convergence of IDA-THP, its self-iteration number $S$ is set to 3. As shown in Tab. 2, when low-order BPSK modulation is used and $L \le 5$, IDA-THP has complexity similar to that of MAP. However, with higher-order modulation, e.g. $M \ge 4$, or longer channel memory, $L > 5$, IDA-THP has lower complexity than MAP, and the gap widens considerably as $M$ or $L$ increases.
In conclusion, the CTHP method has the lowest computational complexity, and the proposed IDA-THP scheme has lower complexity than MAP, in particular for higher modulation order $M$ or larger $L$. Moreover, compared with BICT, the proposed IDA-THP scheme has similar computational complexity, but as $N$ becomes larger, the complexity of the IDA-THP scheme becomes lower than that of BICT.
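The comparison above can be made concrete by evaluating the leading-order terms for a given parameter set. The sketch below only evaluates the $O(\cdot)$ expressions stated in this section with constant factors dropped, so the numbers indicate relative growth rather than exact operation counts or the entries of Tab. 1 and Tab. 2; the function names are hypothetical.

```python
def complexity_orders(N, M, L, S=3, K=3):
    """Leading-order operation counts from the complexity analysis (constants dropped)."""
    return {
        "CTHP": N * M * L,
        "BICT": N ** 2,
        "IDA-THP": K * N * S * M * L,
        "MAP": K * N * M ** L,
    }

def bict_crossover(M, L, S=3, K=3):
    """Sequence length N above which K*N*S*M*L falls below N**2."""
    return K * S * M * L

# Example: QPSK (M = 4), channel memory L = 5, sequence length N = 1000.
for name, ops in complexity_orders(N=1000, M=4, L=5).items():
    print(f"{name:8s} {ops:>10d}")
print("IDA-THP cheaper than BICT for N >", bict_crossover(M=4, L=5))
```

For these example parameters the ordering is CTHP, IDA-THP, BICT, MAP from cheapest to most expensive, and the crossover with BICT occurs at a sequence length well below the $N \ge 500$ regime mentioned in the text.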

B. SIMULATION RESULTS
In this section, we illustrate and verify the BER performance of the proposed IDA-THP scheme by computer simulations. For comparison, we also present the performance of CTHP, SIC equalization, BICT and MAP equalization, the last of which uses the BCJR algorithm based on the Ungerboeck model. We assume a data sequence length of 1000 bits, the $(7, 5)_8$ recursive systematic convolutional code, QPSK modulation and an RRC pulse-shaping filter with length $L_g T = 10T$ and roll-off factor $\beta = 0.3$ for FTNS. Moreover, 3 self-iterations are performed in the SIC equalization block, and the number of outer turbo iterations between the MAP or SIC equalizer and the decoder is set to 3.
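As an illustration of the channel code in this setup, the following sketch encodes a bit sequence with a rate-1/2 recursive systematic convolutional encoder. It assumes the common reading of $(7, 5)_8$ in which 7 (i.e. $1 + D + D^2$) is the feedback polynomial and 5 (i.e. $1 + D^2$) is the feedforward polynomial, and it omits trellis termination; the paper does not state these details, and the function name `rsc_encode` is hypothetical.

```python
def rsc_encode(bits):
    """Rate-1/2 RSC encoder: feedback 7 (1+D+D^2), feedforward 5 (1+D^2), octal.

    Assumption: 7 is the feedback polynomial; no trellis termination is applied.
    Returns a list of (systematic, parity) bit pairs.
    """
    s1 = s2 = 0                 # shift-register contents, s1 is the newest
    out = []
    for u in bits:
        fb = u ^ s1 ^ s2        # recursive (feedback) bit
        parity = fb ^ s2        # feedforward taps 1 and D^2
        out.append((u, parity))
        s1, s2 = fb, s1         # shift the register
    return out

# The response to a single 1 does not die out, which shows the recursive structure.
print(rsc_encode([1, 0, 0, 0, 0, 0, 0]))
```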
Firstly, we present the BER performance of the proposed IDA-THP scheme for different channel memory lengths and time packing factors. As shown in Fig. 8, when the ISI is relatively low with $\tau = 0.85$, the BER performance improves as the considered channel memory length increases. When $L = 5$, the BER performance of the IDA-THP scheme has asymptotically converged. This result coincides with the observation from Fig. 4. When the ISI becomes higher with $\tau = 0.7$, as shown in Fig. 9, the channel memory length has only a minimal impact on the BER performance, and the BER performance again converges at $L = 5$.
Secondly, we compare the BER performance of the proposed IDA-THP scheme with that of other methods. From Fig. 10, when the ISI is relatively low with $\tau = 0.85$, the proposed IDA-THP scheme approaches the ISI-free performance bound with only a narrow signal-to-noise ratio (SNR) gap (about 0.2 dB at $P_e = 10^{-5}$). Moreover, IDA-THP has better BER performance than MAP when $E_b/N_0 \ge 5.6$ dB, which indicates that IDA-THP is competitive with MAP for slight ISI and higher SNR. Meanwhile, IDA-THP clearly outperforms BICT, CTHP and SIC. When the ISI becomes serious with $\tau = 0.8$, as shown in Fig. 11, the BER performance of all schemes degrades to a certain extent, but IDA-THP still performs better than BICT, CTHP and SIC, while it is slightly inferior to MAP. When $\tau = 0.7$, i.e. with more severe ISI, a similar behavior can be found in Fig. 12: IDA-THP still achieves better performance than all other detection methods except MAP. The performance deterioration of IDA-THP is mainly due to the mismatch between the Gaussian model and the more serious residual interference. In conclusion, the proposed IDA-THP greatly outperforms CTHP, SIC and BICT. For moderate time compression, IDA-THP approaches the ISI-free BER performance bound and is also competitive with MAP. In a severe-ISI scenario, IDA-THP has worse BER performance than MAP. However, taking into account both BER performance and computational complexity, the IDA-THP scheme may be a better choice than the other schemes.
Finally, to further verify that the proposed IDA-THP scheme is suitable for practical applications of FTNS, we simulate FTNS with the proposed IDA-THP scheme and a low-density parity-check (LDPC) code of rate 1/2. Fig. 13 shows the BER performance of the different methods with $\tau = 0.9$. It can be seen that the proposed IDA-THP achieves BER performance similar to that of MAP and approaches the ISI-free BER performance with an SNR gap of only about 0.1 dB. When the induced ISI becomes higher with $\tau = 0.8$, as shown in Fig. 14, the performance of all schemes degrades, but IDA-THP still shows a performance gain of 1 dB and 0.5 dB over CTHP and BICT, respectively. Moreover, the SNR gap between IDA-THP and MAP is less than 0.3 dB. The BER performance of the different detection schemes for packing factor $\tau = 0.7$, i.e. with more serious ISI, is shown in Fig. 15. IDA-THP clearly outperforms BICT, CTHP and SIC, while MAP outperforms IDA-THP by only 0.2 dB. So we conclude that the proposed IDA-THP still achieves excellent BER performance with an LDPC code.
Therefore, it is concluded that the proposed IDA-THP has lower computational complexity than MAP and greatly outperforms CTHP, SIC and BICT in BER. Its BER performance is also close to, and for moderate time compression even competitive with, that of MAP.

V. CONCLUSION
In this paper, an improved TH precoding scheme named IDA-THP was proposed to solve the problem of ISI elimination in FTNS. Firstly, the TH-precoded signal in the FTN system was derived, and we found that large residual ISI remained after TH precoding at the transmitter. Then we modeled the residual ISI as a Gaussian process and gave its mathematical expression. Furthermore, an optimized receiver based on the SIC principle and iterative turbo equalization was proposed to replace the original modulo operation in CTHP and specifically remove the residual ISI. Finally, complexity analysis and numerical simulations showed that the proposed IDA-THP scheme not only had a low computational cost but also achieved excellent BER performance. It effectively improved on the BER performance of CTHP and greatly outperformed the SIC and BICT schemes. Moreover, for moderate time packing, it approached the ISI-free BER performance and was competitive with the MAP equalization technique. Although its BER performance degraded to some extent in the strong-interference scenario, we think that the IDA-THP scheme may be a preferable choice for FTNS as a compromise between BER performance and computational complexity. Therefore, the proposed IDA-THP scheme has both theoretical and practical significance.