MIMO Time Domain Equalizer Design for Long Reach xDSL MIMO Channel Shortening

In discrete multi-tone (DMT) transmission based digital subscriber line (DSL) systems, a cyclic prefix (CP) is added to each symbol before transmission, where the length of the CP is larger than the estimated channel impulse response (CIR) length. This ensures the elimination of inter-symbol interference (ISI) and inter-carrier interference (ICI) between the carriers of the same symbol, and allows for single tap frequency domain equalizers and crosstalk cancellation at the receiver. Recently, long reach xDSL (LR-xDSL) has been proposed to extend the reach of conventional DSL systems. With the extended loop lengths, the required CP length increases, in order to match the larger CIR length. The longer CP adds a large overhead and results in overall throughput loss. A more efficient way to deal with extended loop lengths is to use a channel shortening filter, commonly referred to as a time domain equalizer (TEQ), to reduce the length of the CIR to the length of the CP. This paper focuses on minimum mean square error (MMSE) based multiple input multiple output (MIMO) TEQ design for LR-xDSL MIMO channel shortening. Constraints are applied to the minimization problem to eliminate the trivial solution. This paper proposes two new constraints for the MMSE based MIMO TEQ design for upstream scenarios, which result in a lower complexity and provide better (or similar) performance compared to existing MMSE based MIMO TEQ design methods. Furthermore, a diagonal MIMO TEQ with lower memory requirement and lower computational complexity is presented based on the proposed constraints, which can be applied in upstream as well as downstream scenarios.


I. INTRODUCTION
Digital subscriber line (DSL) systems offer broadband communication over the existing copper telephone lines. Throughout the various generations of DSL, discrete multi-tone (DMT) is used as the modulation format. DMT is a multi-carrier modulation technique, which divides the available bandwidth into multiple discrete sub-bands, each corresponding to one carrier (also known as a tone) [1]. This allows the input bitstream to be divided into parallel bit streams. In each stream, groups of bits are converted into high-order QAM symbols (up to 16384-QAM), which are subsequently modulated on a discrete carrier by an inverse discrete Fourier transform (IDFT) operation. The IDFT operation provides a time-domain symbol, to which a cyclic prefix (CP) is added. The length of the CP plays a crucial role in the error-free reception of the transmitted QAM symbols. Roughly, if the CP length is larger than the estimated channel impulse response (CIR) length, the transmitted QAM symbols can be recovered at the receiver (after the discrete Fourier transform (DFT)), with single tap frequency domain equalizers and crosstalk cancellation, without inter-symbol interference (ISI) and inter-carrier interference (ICI) between the carriers of the same symbol. Hence, this condition imposes a constraint on the length of the CP.
The associate editor coordinating the review of this manuscript and approving it for publication was Rui Wang.
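The role of the CP length can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a noiseless single-line channel, complex baseband 4-QAM data on every carrier, and hypothetical sizes (64 samples per symbol, a 6-tap CIR). It sends two DMT symbols and measures the worst per-carrier error on the second one after a single-tap frequency domain equalizer, once with a CP longer than the CIR order and once with a CP that is too short.

```python
import numpy as np

def add_cp(block, cp_len):
    """Prepend the last cp_len samples of a time-domain block as its cyclic prefix."""
    return np.concatenate([block[len(block) - cp_len:], block])

def worst_carrier_error(n, cp_len, h, rng):
    """Send two DMT symbols through a noiseless channel h and return the largest
    per-carrier error on the second symbol after a single-tap FEQ."""
    # Random 4-QAM data on every carrier (complex baseband, for simplicity).
    X = [rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n) for _ in range(2)]
    tx = np.concatenate([add_cp(np.fft.ifft(Xi), cp_len) for Xi in X])
    rx = np.convolve(tx, h)
    start = (n + cp_len) + cp_len          # skip symbol 1 and the CP of symbol 2
    Y = np.fft.fft(rx[start:start + n])
    X_hat = Y / np.fft.fft(h, n)           # one complex division per carrier
    return float(np.max(np.abs(X_hat - X[1])))

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])   # hypothetical CIR of order 5
err_long_cp = worst_carrier_error(64, 8, h, rng)    # CP longer than the CIR order
err_short_cp = worst_carrier_error(64, 2, h, rng)   # CP shorter than the CIR order
print(err_long_cp, err_short_cp)
```

With the long CP the recovered symbols match the transmitted ones up to rounding error, whereas the short CP leaves residual ISI and ICI that a single tap per carrier cannot remove.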
The older generations of DSL such as ADSL [2] (and its later versions ADSL2 and ADSL2+), have been characterized by low crosstalk levels on the one hand (allowing for per-line single input single output (SISO) design), but on the other hand by a long CIR, as they commonly allow loop lengths up to 5000 meters. Although the use of a similarly long CP would solve the problem of ISI and ICI, an overhead to the length of the symbol would be introduced, which then consequently decreases the achievable throughput. A more efficient way to deal with this problem has been the use of a channel shortening filter, which is mostly a time domain channel shortening filter, commonly known as a time domain equalizer (TEQ) [3]. The TEQ is placed before the DFT block at the receiver, such that it reduces the length of the CIR to a target value, i.e. the length of the CP, used at the transmitter.
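The throughput cost of a long CP follows from simple arithmetic: only the samples outside the CP carry data. The sketch below uses purely illustrative numbers (a 4096-sample symbol with CP lengths of 320 and 1280 samples, neither taken from the paper) to show how the useful fraction of each transmitted symbol shrinks as the CP grows.

```python
def cp_efficiency(symbol_len: int, cp_len: int) -> float:
    """Fraction of transmitted samples that carry data (CP samples are overhead)."""
    return symbol_len / (symbol_len + cp_len)

# Hypothetical symbol and CP lengths, chosen only for illustration.
for cp_len in (320, 1280):
    print(cp_len, round(cp_efficiency(4096, cp_len), 3))
```

Under these assumed numbers the efficiency drops from roughly 93% to roughly 76%, which is the loss a TEQ avoids by shortening the CIR to fit the shorter CP.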
It is also noteworthy that the TEQ should not be confused with the commonly used classic equalizer (or equalization process) in (e.g. single-carrier) wireless communication systems. The aim of a classic equalizer includes reducing the CIR to a Dirac impulse [4], [5]. Hence, in the absence of noise, a classic equalizer could be a filter whose impulse response is the inverse of the CIR. Unlike the classic equalizer, the goal of a TEQ is not to produce a Dirac impulse, but to reduce the length of the CIR to a predetermined target value, where the actual shape of the shortened CIR is not defined a priori [6].
In later generations of DSL, with the deployment of optical fibre to distribution point units (DPUs) closer to subscriber's premises, loop lengths have effectively been shortened. Hence, channel shortening has not been used in later generations of DSL (e.g. VDSL [7], G.fast [8], G.mgfast [9]), confining the use of a TEQ to ADSL.
Recently, however, long reach VDSL2 (LR-VDSL2) has also been proposed with the purpose of providing high data rates (possibly up to 40 Mbit/s for downstream) [10] and a longer reach than conventional VDSL2, for areas where optical fibre cannot easily be deployed (due to geographical or financial barriers). Hence, the need for transmitting data over longer loops again motivates the use of a TEQ for LR-VDSL2. Moreover, due to the rapid development of DSL standards (within Q4/15 ITU standardization), long reach G.fast (LR-G.fast) is also being considered. Therefore, future VDSL2 and G.fast DPUs can be connected to lines with a wide variety of lengths. All the lines connected to the same DPU should have the same CP length in order to simplify the system and allow for efficient vectoring. In this scenario, the CP length is chosen according to the longest line, and hence shorter lines may suffer a large throughput loss with no improvement in performance. This further motivates the use of a TEQ for LR-xDSL. However, since crosstalk can no longer be neglected in VDSL and later generations of DSL, a multiple input multiple output (MIMO) TEQ design is required instead of a SISO TEQ design as defined for ADSL, for a joint shortening of direct as well as crosstalk channels.
DMT systems bring a specific complication to the TEQ design, as the channel shortening is performed before the demodulation (i.e., before the DFT) while bitrates are defined by the achieved signal-to-noise ratio (SNR) after the demodulation. Therefore, bitrate maximization is a challenging task. Although a SISO bitrate maximizing TEQ (BM-TEQ) has been proposed in [11] to maximize the total bitrate for a given filter order, the optimization problem is non-linear and non-convex, hence it is not considered here. More recently, a low complexity blind adaptive SISO TEQ has been suggested to maximize the total bitrate [12]. Similarly, in [13] a blind channel shortening equalizer structure has been proposed, which uses a genetic algorithm to search for the optimal SISO TEQ coefficients. The genetic algorithm finds the best possible combination of all the TEQ coefficients, but at the cost of a very high computational complexity. For this reason, and because in DSL systems the channel state information (CSI) is almost perfectly known [14], blind channel shortening equalizers are not considered here.
In [15] a generalized framework for the minimum mean square error (MMSE) based MIMO TEQ design has been proposed with an identity tap constraint (ITC) and an orthonormality constraint (ONC) on the target impulse response (TIR) matrix. The paper further compared the performance of both constraints and concluded that the ONC outperforms the ITC. Therefore, in this paper the ONC on the TIR is considered as the reference constraint. In [16] a maximum shortening SNR (MSSNR) based MIMO TEQ design has been proposed, which aims at minimizing the energy of the shortened CIR outside a target window, while maintaining the energy within the target window. In [17] the MSSNR based MIMO TEQ design has been modified and a comparison between the MSSNR and the MMSE based MIMO TEQ design has been made. The results show that the MMSE based MIMO TEQ design outperforms the MSSNR based MIMO TEQ design. In [18] a per-tone equalizer has been proposed which interchanges the position of the DFT and the MIMO TEQ. It allows a separate channel shortening filter for each carrier. Since the MIMO TEQ is placed after the DFT, it can be considered a frequency-domain equalizer and is not considered in this paper. A summary and evaluation of various TEQ design methods has been provided in [19], [20].
This paper focuses on MMSE based MIMO TEQ design. Constraints are applied to the minimization problem to eliminate the trivial solution. The contributions of the paper are as follows:
(i) Two new constraints are proposed for MMSE based MIMO TEQ design for upstream scenarios. The first proposed constraint (UNCDc) allows parallel processing of the TIR for each line. The UNCDc is further modified into a new constraint, namely the UNCDc-Zxc, which maintains the parallel computation (of the TIR for each line) capability, but also reduces its computational complexity and provides better (or similar) performance compared to existing MMSE based MIMO TEQ design methods.
(ii) Furthermore, a diagonal MIMO TEQ is presented, which not only allows parallel processing of the TIR and the TEQ matrix at reduced computational cost, but also significantly reduces the memory requirement and run-time complexity, and can be applied in upstream as well as downstream scenarios.
VOLUME 8, 2020
The paper is organized as follows: In Section II the MMSE based MIMO TEQ design equations for MIMO DSL systems are reviewed with the conventional ONC. To allow parallel processing of the TIR for each line, a new UNCDc based MMSE MIMO TEQ design is proposed. Furthermore, to reduce the computational complexity of the earlier proposed constraint, another constraint, namely the UNCDc-Zxc, based MMSE MIMO TEQ design is presented which allows parallel computation of the TIR for each line at a reduced computational cost. In Section III the diagonal MIMO TEQ is presented, with a lower memory requirement and lower computational complexity. In Section IV a computational complexity analysis is presented and in Section V a comparison of memory requirement is performed between the full MIMO TEQ and the diagonal MIMO TEQ. Simulation results are reported in Section VI and finally Section VII concludes the paper.

II. MIMO TEQ
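The system considered is a cable binder with $M$ lines, corresponding to an $M \times M$ baseband communication system with additive white Gaussian noise and a slowly varying time-dispersive channel of order $L$ [15]. In the time domain, the relation between transmitted and received signals can be described as

$$\mathbf{y}_l = \sum_{k=0}^{L} \mathbf{H}_k\, \mathbf{x}_{l-k} + \mathbf{n}_l, \qquad \mathbf{H}_k = \begin{bmatrix} h^{11}_k & \cdots & h^{1M}_k \\ \vdots & \ddots & \vdots \\ h^{M1}_k & \cdots & h^{MM}_k \end{bmatrix} \tag{1}$$

where $\mathbf{x}_l = [x_1(l)\; x_2(l)\; \cdots\; x_M(l)]^T$, with $\mathbf{y}_l$ and $\mathbf{n}_l$ having a similar structure as $\mathbf{x}_l$, $h^{pq}_k$ is the $k$th sample of the CIR between transmitter $q$ and receiver $p$, and $x_q(l)$ is the $l$th time domain sample transmitted by transmitter $q$. Collecting $T$ consecutive received vectors, (1) can be rewritten as

$$\mathbf{y}[l] = \mathbf{H}\,\mathbf{x}[l] + \mathbf{n}[l] \tag{2}$$

where $\mathbf{y}[l] = [\mathbf{y}_l^T \cdots \mathbf{y}_{l-T+1}^T]^T$, $\mathbf{n}[l]$ is structured likewise, $\mathbf{x}[l] = [\mathbf{x}_l^T \cdots \mathbf{x}_{l-T-L+1}^T]^T$, and $\mathbf{H}$ is the $MT \times M(T+L)$ block Toeplitz channel matrix built from $\mathbf{H}_0, \ldots, \mathbf{H}_L$. The input correlation matrix is $\mathbf{R}_{xx} = E[\mathbf{x}[l]\,\mathbf{x}[l]^H]$ and the noise correlation matrix is $\mathbf{R}_{nn} = E[\mathbf{n}[l]\,\mathbf{n}[l]^H]$, where $E[\cdot]$ denotes the expected value operator and $(\cdot)^H$ represents conjugate transpose. Assuming the input correlation matrix and the noise correlation matrix are non-singular, two more matrices are defined, namely the input-output cross correlation matrix

$$\mathbf{R}_{xy} = E[\mathbf{x}[l]\,\mathbf{y}[l]^H] = \mathbf{R}_{xx}\mathbf{H}^H \tag{3}$$

and the output correlation matrix

$$\mathbf{R}_{yy} = E[\mathbf{y}[l]\,\mathbf{y}[l]^H] = \mathbf{H}\mathbf{R}_{xx}\mathbf{H}^H + \mathbf{R}_{nn} \tag{4}$$

The aim of a MIMO TEQ is to reduce the $M \times M$ channel of order $L$ to a target $M \times M$ channel of order $N_b$, where generally $N_b$ is the CP length, using an $M \times M$ TEQ matrix of order $T-1$ (Fig. 1). In an upstream scenario, where coordination is possible between the receivers, the MIMO TEQ matrix is given by

$$\mathbf{W} = [\mathbf{W}_0^T\; \mathbf{W}_1^T\; \cdots\; \mathbf{W}_{T-1}^T]^T \tag{5}$$

where $\mathbf{W}_l$ is an $M \times M$ matrix given by

$$\mathbf{W}_l = \begin{bmatrix} w^{11}_l & \cdots & w^{1M}_l \\ \vdots & \ddots & \vdots \\ w^{M1}_l & \cdots & w^{MM}_l \end{bmatrix} \tag{6}$$

Similarly, the TIR matrix is defined as

$$\mathbf{B} = [\mathbf{B}_0^T\; \mathbf{B}_1^T\; \cdots\; \mathbf{B}_{N_b}^T]^T \tag{7}$$

with $M \times M$ blocks $\mathbf{B}_l$ structured like $\mathbf{W}_l$. The TEQ matrix $\mathbf{W}$ and TIR matrix $\mathbf{B}$ are designed by minimizing the mean square of the error vector $\mathbf{e}^{(l)}$,

$$\mathbf{e}^{(l)} = \tilde{\mathbf{B}}^H \mathbf{x}[l] - \mathbf{W}^H \mathbf{y}[l] \tag{8}$$

where $\tilde{\mathbf{B}}^H = [\mathbf{0}_{M \times M\Delta}\;\; \mathbf{B}^H\;\; \mathbf{0}_{M \times Ms}]$ with $s = T + L - \Delta - N_b - 1$, which accounts for the so called synchronization delay $\Delta$ (Fig. 1). The mean square error (MSE) is hence defined as

$$\text{MSE} = E\!\left[\|\mathbf{e}^{(l)}\|^2\right] = \mathrm{tr}\!\left(\tilde{\mathbf{B}}^H \mathbf{R}_{xx} \tilde{\mathbf{B}} - \tilde{\mathbf{B}}^H \mathbf{R}_{xy} \mathbf{W} - \mathbf{W}^H \mathbf{R}_{yx} \tilde{\mathbf{B}} + \mathbf{W}^H \mathbf{R}_{yy} \mathbf{W}\right) \tag{9}$$

where $\mathrm{tr}(\cdot)$ represents the trace of a matrix and $\mathbf{R}_{yx} = \mathbf{R}_{xy}^H$. According to the orthogonality principle

$$E\!\left[\mathbf{e}^{(l)}\, \mathbf{y}[l]^H\right] = \mathbf{0} \tag{10}$$

so that $\tilde{\mathbf{B}}^H \mathbf{R}_{xy} = \mathbf{W}^H \mathbf{R}_{yy}$. Hence, $\mathbf{W}^H$ can be written as

$$\mathbf{W}^H = \tilde{\mathbf{B}}^H \mathbf{R}_{xy} \mathbf{R}_{yy}^{-1} \tag{11}$$

Substituting $\mathbf{W}^H$ from (11) in (9), an expression is obtained that is only dependent on $\tilde{\mathbf{B}}$:

$$\text{MSE} = \mathrm{tr}\!\left(\tilde{\mathbf{B}}^H \left(\mathbf{R}_{xx} - \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{R}_{yx}\right) \tilde{\mathbf{B}}\right) = \mathrm{tr}\!\left(\mathbf{B}^H \mathbf{R}_{\text{total}} \mathbf{B}\right) \tag{12}$$

where

$$\mathbf{R}_{\text{total}} = \mathbf{S}_\Delta \left(\mathbf{R}_{xx} - \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{R}_{yx}\right) \mathbf{S}_\Delta^H, \qquad \mathbf{S}_\Delta = [\mathbf{0}_{M(N_b+1) \times M\Delta}\;\; \mathbf{I}_{M(N_b+1)}\;\; \mathbf{0}_{M(N_b+1) \times Ms}] \tag{13}$$

is the $M(N_b+1) \times M(N_b+1)$ block selected by the synchronization delay $\Delta$. To avoid the all zero trivial solution ($\mathbf{W} = \mathbf{0}$, $\mathbf{B} = \mathbf{0}$) when minimizing (9) or (12), a non-triviality constraint is added to the minimization problem. The non-triviality constraint can be applied either on the TEQ ($\mathbf{W}$) or on the TIR ($\mathbf{B}$) matrix.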
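The chain from the channel taps to the TEQ matrix can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: inputs and noise are taken as white (so that the input and noise correlation matrices are scaled identities), the channel taps and the TIR are random placeholders, and all function names are hypothetical. It builds the block Toeplitz channel matrix, forms the correlation matrices, computes the TEQ from a given TIR through the orthogonality principle, and compares the full MSE expression with its reduced form that depends on the zero-padded TIR only.

```python
import numpy as np

def block_toeplitz(H_taps, T):
    """Stack T received vectors: block row i holds H_0..H_L starting at block column i."""
    n_taps, M, _ = H_taps.shape
    H = np.zeros((M * T, M * (T + n_taps - 1)))
    for i in range(T):
        for k in range(n_taps):
            H[i * M:(i + 1) * M, (i + k) * M:(i + k + 1) * M] = H_taps[k]
    return H

def correlations(H, var_x, var_n):
    """R_xx, R_xy and R_yy for white inputs and white noise (an assumption)."""
    R_xx = var_x * np.eye(H.shape[1])
    R_xy = R_xx @ H.conj().T
    R_yy = H @ R_xx @ H.conj().T + var_n * np.eye(H.shape[0])
    return R_xx, R_xy, R_yy

def pad_tir(B, delta, n_rows):
    """Embed B below M*delta zero rows to form the zero-padded TIR B_tilde."""
    M = B.shape[1]
    B_tilde = np.zeros((n_rows, M))
    B_tilde[delta * M:delta * M + B.shape[0]] = B
    return B_tilde

def teq_from_tir(B_tilde, R_xy, R_yy):
    """Solve W^H R_yy = B_tilde^H R_xy for W (R_yy is Hermitian)."""
    return np.linalg.solve(R_yy, R_xy.conj().T @ B_tilde)

def mse(B_tilde, W, R_xx, R_xy, R_yy):
    """Trace of the error correlation matrix for any pair (B_tilde, W)."""
    R_ee = (B_tilde.conj().T @ R_xx @ B_tilde - B_tilde.conj().T @ R_xy @ W
            - W.conj().T @ R_xy.conj().T @ B_tilde + W.conj().T @ R_yy @ W)
    return float(np.trace(R_ee).real)

rng = np.random.default_rng(1)
M, L, T, Nb, delta = 2, 4, 6, 1, 2            # small hypothetical dimensions
H = block_toeplitz(rng.standard_normal((L + 1, M, M)), T)
R_xx, R_xy, R_yy = correlations(H, var_x=1.0, var_n=0.01)
B_tilde = pad_tir(rng.standard_normal((M * (Nb + 1), M)), delta, H.shape[1])
W = teq_from_tir(B_tilde, R_xy, R_yy)
R_perp = R_xx - R_xy @ np.linalg.solve(R_yy, R_xy.conj().T)
mse_direct = mse(B_tilde, W, R_xx, R_xy, R_yy)
mse_reduced = float(np.trace(B_tilde.conj().T @ R_perp @ B_tilde).real)
print(mse_direct, mse_reduced)
```

The two printed values agree, and any perturbation of `W` away from the solution raises the MSE, which is why the design problem reduces to choosing the TIR and the delay.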
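It has been shown in [21] that non-triviality constraints on the TIR matrix provide better performance than non-triviality constraints on the TEQ matrix. Hence, in this paper we only consider non-triviality constraints on the TIR matrix. Moreover, in [22], two non-triviality constraints on the TIR matrix are discussed for the SISO TEQ and subsequently extended to the MIMO TEQ [15], referred to as the identity tap constraint (ITC) and the orthonormality constraint (ONC), with the ONC outperforming the ITC. For this reason, the ONC on the TIR is considered here as the reference constraint for the performance comparison. The ONC constrains the rows of the TIR matrix $\mathbf{B}^H$ to be orthonormal. Hence the optimization problem under the ONC becomes

$$\min_{\mathbf{B}}\; \mathrm{tr}\!\left(\mathbf{B}^H \mathbf{R}_{\text{total}} \mathbf{B}\right) \quad \text{subject to} \quad \mathbf{B}^H \mathbf{B} = \mathbf{I}_M \tag{14}$$

A. UNIT NORM CONSTRAINT ON DIRECT CHANNELS TIR (UNCDc)
By using the ONC, the optimization problem is defined such that the solution structure remains the same as in the SISO scenario [22]. The non-triviality constraint proposed in this section is instead a straightforward (and natural) extension of the single line scenario and provides better performance, while allowing the TIR matrix columns to be computed in parallel, independently of each other. Instead of applying the ONC to the complete TIR matrix, a unit norm constraint (UNC) can be applied only to the direct channels of the TIR matrix. For a complete TIR matrix (given in (7)), the $m$th column $\mathbf{b}_m$ defines the TIR output for line $m$. The part of this column that represents the input from line $m$ (i.e. the direct channel for line $m$) is given by

$$\mathbf{b}_{\text{direct},m} = [\mathbf{B}_0(m,m)\;\; \mathbf{B}_1(m,m)\;\; \cdots\;\; \mathbf{B}_{N_b}(m,m)]^T \tag{15}$$

and the remaining part can be represented by $\mathbf{b}_{\text{indirect},m}$. The UNC is applied to the vector $\mathbf{b}_{\text{direct},m}$. Hence the optimization problem becomes

$$\min_{\mathbf{B}}\; \mathrm{tr}\!\left(\mathbf{B}^H \mathbf{R}_{\text{total}} \mathbf{B}\right) \quad \text{subject to} \quad \mathbf{b}_{\text{direct},m}^H \mathbf{b}_{\text{direct},m} = 1, \quad m = 1, \ldots, M \tag{16}$$

Since the UNC is applied separately for each column of the TIR matrix, it allows for a parallel computation of the optimal solution for each column (i.e. line). To apply the UNC for column $m$, the TIR matrix is permuted such that the direct channel ($\mathbf{b}_{\text{direct},m}$) occupies the last $N_b + 1$ positions,

$$\check{\mathbf{B}} = \mathbf{A}_m \mathbf{B} \tag{17}$$

where $\mathbf{A}_m$ is the permutation matrix. Hence, the permuted $m$th column of the TIR matrix is structured as

$$\check{\mathbf{b}}_m = \begin{bmatrix} \mathbf{b}_{\text{indirect},m} \\ \mathbf{b}_{\text{direct},m} \end{bmatrix} \tag{18}$$

The relevant contribution in (12) is then given as

$$\text{MSE}_m = \mathbf{b}_m^H \mathbf{R}_{\text{total}}\, \mathbf{b}_m \tag{19}$$

Using (17),

$$\text{MSE}_m = \check{\mathbf{b}}_m^H\, \check{\mathbf{R}}_{\text{total},m}\, \check{\mathbf{b}}_m \tag{20}$$

where

$$\check{\mathbf{R}}_{\text{total},m} = \mathbf{A}_m \mathbf{R}_{\text{total}} \mathbf{A}_m^H \tag{21}$$

By using the Cholesky factorization of $\check{\mathbf{R}}_{\text{total},m}$,

$$\check{\mathbf{R}}_{\text{total},m} = \mathbf{R}_{\text{chol}}^H \mathbf{R}_{\text{chol}} \tag{22}$$

where $\mathbf{R}_{\text{chol}}$ is an upper triangular matrix

$$\mathbf{R}_{\text{chol}} = \begin{bmatrix} \mathbf{R}_{11,m} & \mathbf{R}_{12,m} \\ \mathbf{0} & \mathbf{R}_{22,m} \end{bmatrix} \tag{23}$$

and where $\mathbf{R}_{22,m}$ is an $(N_b+1) \times (N_b+1)$ matrix, and by substituting (22), (23) and (18) in (20), one obtains

$$\text{MSE}_m = \left\| \mathbf{R}_{11,m}\, \mathbf{b}_{\text{indirect},m} + \mathbf{R}_{12,m}\, \mathbf{b}_{\text{direct},m} \right\|^2 + \left\| \mathbf{R}_{22,m}\, \mathbf{b}_{\text{direct},m} \right\|^2 \tag{24}$$

Assuming $\mathbf{b}_{\text{direct},m}$ is already known, minimizing (24) is an unconstrained quadratic problem in $\mathbf{b}_{\text{indirect},m}$. Hence,

$$\mathbf{b}^{\text{opt}}_{\text{indirect},m} = -\mathbf{R}_{11,m}^{-1}\, \mathbf{R}_{12,m}\, \mathbf{b}_{\text{direct},m} \tag{25}$$

which sets the first term of (24) to zero. Minimizing the remaining term under the UNC yields $\mathbf{b}^{\text{opt}}_{\text{direct},m}$ as the unit norm eigenvector of $\mathbf{R}_{22,m}^H \mathbf{R}_{22,m}$ corresponding to its smallest eigenvalue. This solution holds for a particular synchronization delay $\Delta$. Hence, a search over a range of synchronization delays is required to find the optimal $\Delta$, which minimizes the total MSE, $\sum_{m=1}^{M} E\!\left[|e^{(l)}(m)|^2\right]$.
The optimal $\mathbf{b}^{\text{opt}}_{\text{indirect},m}$ can be subsequently calculated from $\mathbf{b}^{\text{opt}}_{\text{direct},m}$ using (25). Once the complete TIR matrix $\mathbf{B}$ is obtained, the optimal TEQ matrix ($\mathbf{W}$) is calculated using (11).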
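The per-line computation described in this subsection can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `uncdc_column` is hypothetical, a random symmetric positive definite matrix stands in for the error-weighting matrix of one synchronization delay, and the direct-channel taps of line m are assumed to sit at rows m, m + M, m + 2M, ... of the m-th TIR column. The smallest right singular vector of the lower-right Cholesky block is used, which is equivalent to the eigenvector of that block's Gram matrix with the smallest eigenvalue.

```python
import numpy as np

def uncdc_column(R_total, m, M, Nb):
    """TIR column for line m (0-based) under a unit-norm constraint on its direct taps."""
    n = M * (Nb + 1)
    direct = np.arange(m, n, M)                    # rows holding entry (m, m) of each tap
    indirect = np.setdiff1d(np.arange(n), direct)
    perm = np.concatenate([indirect, direct])      # permutation: direct taps moved last
    R_perm = R_total[np.ix_(perm, perm)]
    # numpy returns the lower factor; its conjugate transpose is upper triangular.
    R_chol = np.linalg.cholesky(R_perm).conj().T
    k = len(indirect)
    R11, R12, R22 = R_chol[:k, :k], R_chol[:k, k:], R_chol[k:, k:]
    # Smallest right singular vector of R22 minimises ||R22 b|| subject to ||b|| = 1.
    _, s, Vh = np.linalg.svd(R22)
    b_direct = Vh[-1].conj()
    # Unconstrained quadratic in the indirect part: cancel the first norm term.
    b_indirect = -np.linalg.solve(R11, R12 @ b_direct)
    b = np.zeros(n, dtype=R_chol.dtype)
    b[direct], b[indirect] = b_direct, b_indirect
    return b, float(s[-1] ** 2)

rng = np.random.default_rng(2)
M, Nb = 3, 2                                       # hypothetical binder size and TIR order
A = rng.standard_normal((M * (Nb + 1), M * (Nb + 1)))
R_total = A @ A.T + np.eye(M * (Nb + 1))           # stand-in, symmetric positive definite
columns = [uncdc_column(R_total, m, M, Nb) for m in range(M)]   # independent per line
print([round(mse_m, 4) for _, mse_m in columns])
```

Each call touches only one column, so the M calls could run in parallel; in a full design this loop would sit inside the search over synchronization delays.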

B. UNIT NORM CONSTRAINT ON DIRECT CHANNELS TIR AND ALL ZERO TIR FOR CROSSTALK CHANNELS (UNCDc-Zxc)
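The complexity of the TEQ design with the UNCDc suggested in the previous section can be reduced by having the TEQ serve a different purpose for the direct and for the crosstalk channels, namely to shorten the direct channels and to minimize the energy of the crosstalk channels. This can be achieved by setting the TIR for crosstalk channels to zero. As a result the computational complexity is reduced, as only the diagonal elements (corresponding to the direct channels) of the TIR matrix are to be evaluated. Thus, the UNCDc is modified into a UNC on the direct channels of the TIR matrix and an all zero TIR constraint for the crosstalk channels. The resulting structure for $\mathbf{B}_l$ in (7) is an $M \times M$ diagonal matrix

$$\mathbf{B}_l = \mathrm{diag}\!\left(\mathbf{B}_l(1,1),\; \mathbf{B}_l(2,2),\; \ldots,\; \mathbf{B}_l(M,M)\right)$$

The optimization problem to be solved is

$$\min_{\mathbf{B}}\; \mathrm{tr}\!\left(\mathbf{B}^H \mathbf{R}_{\text{total}} \mathbf{B}\right) \quad \text{subject to} \quad \mathbf{b}_{\text{direct},m}^H \mathbf{b}_{\text{direct},m} = 1, \quad \mathbf{b}_{\text{indirect},m} = \mathbf{0}, \quad m = 1, \ldots, M$$

With (12), the MSE separates into one term per line, $\mathbf{b}_{\text{direct},m}^H\, \mathbf{R}^{m}_{\text{direct}}\, \mathbf{b}_{\text{direct},m}$, where $\mathbf{R}^{m}_{\text{direct}}$ (32) is the $(N_b+1) \times (N_b+1)$ submatrix of $\mathbf{R}_{\text{total}}$ formed by the rows and columns that belong to the direct channel of line $m$. The optimal $\mathbf{b}_{\text{direct},m}$ is therefore the unit norm eigenvector of $\mathbf{R}^{m}_{\text{direct}}$ corresponding to its smallest eigenvalue. Once the optimal TIR matrix $\mathbf{B}$ is complete, the corresponding optimal TEQ matrix $\mathbf{W}$ is calculated using (11).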
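A compact sketch of this constraint is given below, under the same assumptions as before: the function name `uncdc_zxc_tir` is hypothetical, a random symmetric positive definite matrix stands in for the error-weighting matrix of one synchronization delay, and the direct taps of line m are assumed to sit at rows m, m + M, ... of column m. Each line requires only the eigendecomposition of a small submatrix, and all crosstalk entries of the TIR stay zero.

```python
import numpy as np

def uncdc_zxc_tir(R_total, M, Nb):
    """Diagonal-block TIR: unit-norm direct taps per line, zero crosstalk taps."""
    n = M * (Nb + 1)
    B = np.zeros((n, M), dtype=R_total.dtype)
    total_mse = 0.0
    for m in range(M):                              # lines are independent of each other
        direct = np.arange(m, n, M)
        R_direct = R_total[np.ix_(direct, direct)]  # (Nb+1) x (Nb+1) submatrix for line m
        eigval, eigvec = np.linalg.eigh(R_direct)   # eigenvalues in ascending order
        B[direct, m] = eigvec[:, 0]                 # eigenvector of the smallest eigenvalue
        total_mse += float(eigval[0])
    return B, total_mse

rng = np.random.default_rng(3)
M, Nb = 3, 2                                        # hypothetical binder size and TIR order
A = rng.standard_normal((M * (Nb + 1), M * (Nb + 1)))
R_total = A @ A.T + np.eye(M * (Nb + 1))            # stand-in, symmetric positive definite
B, total_mse = uncdc_zxc_tir(R_total, M, Nb)
print(round(total_mse, 4))
```

Compared with the Cholesky-based procedure of the previous subsection, the matrices handled per line shrink from M(N_b + 1) to N_b + 1 rows, which is where the saving comes from.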

III. DIAGONAL MIMO TEQ
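A further reduction of the computational complexity can be achieved by considering a diagonal MIMO TEQ. It not only reduces the computational complexity and memory requirement but also allows for a MIMO TEQ realization in a downstream scenario, where no coordination is possible between the receivers. The structure for $\mathbf{W}_l$ in (6) becomes an $M \times M$ diagonal matrix, so that line $m$ is filtered by its own $T$ coefficients, collected in $\mathbf{w}_m$. The part of (1) defining the signal received on line $m$ is

$$y_m(l) = \sum_{q=1}^{M} \sum_{k=0}^{L} h^{mq}_k\, x_q(l-k) + n_m(l) \tag{37}$$

Therefore, for line $m$, the TIR and corresponding TEQ coefficients can be found by minimizing the following cost function

$$E\!\left[\left| \tilde{\mathbf{b}}_m^H\, \mathbf{x}_m[l] - \mathbf{w}_m^H\, \mathbf{y}_m[l] \right|^2\right]$$

where $\mathbf{x}_m[l]$ and $\mathbf{y}_m[l]$ collect the transmitted and received samples of line $m$, and $\tilde{\mathbf{b}}_m$ is the direct channel TIR of line $m$, zero padded according to the synchronization delay. Under the UNCDc-Zxc, the optimal TIR of line $m$ follows from the eigendecomposition of the corresponding per-line matrix $\mathbf{R}^{m}_{\text{direct},\wedge}$ (41). Hence, the optimal TEQ for line $m$ is

$$\mathbf{w}_m^{\text{opt},H} = \tilde{\mathbf{b}}_m^{\text{opt},H}\, \mathbf{R}_{x_m y_m}\, \mathbf{R}_{y_m y_m}^{-1} \tag{42}$$

with $\mathbf{R}_{x_m y_m} = E[\mathbf{x}_m[l]\, \mathbf{y}_m[l]^H]$ and $\mathbf{R}_{y_m y_m} = E[\mathbf{y}_m[l]\, \mathbf{y}_m[l]^H]$. In an upstream scenario, the synchronization delay can be optimized for all lines together as in Section II-A. In a downstream scenario, the synchronization delay can be optimized similarly, but for each line individually.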
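A per-line design of this kind can be sketched as below. The sketch rests on several assumptions that are not stated in the text: real-valued CIRs, white inputs of equal power on all lines, white noise, crosstalk treated as an additional disturbance at receiver m, and a hypothetical data layout `h_taps[p][q]` for the CIR from transmitter q to receiver p. The function names are illustrative only. For a given delay, the unit-norm TIR is the eigenvector of the smallest eigenvalue of the per-line error matrix, and the TEQ follows from the orthogonality principle; the delay is then chosen per line by exhaustive search, as would apply in a downstream scenario.

```python
import numpy as np

def conv_matrix(h, T):
    """T x (T + L) Toeplitz matrix: row i holds the CIR h starting at column i."""
    L = len(h) - 1
    C = np.zeros((T, T + L))
    for i in range(T):
        C[i, i:i + L + 1] = h
    return C

def diagonal_teq_line(h_taps, m, T, Nb, delta, var_x, var_n):
    """TIR and TEQ for line m alone: direct channel shortened, crosstalk as disturbance."""
    M = len(h_taps)
    C = [conv_matrix(np.asarray(h_taps[m][q]), T) for q in range(M)]
    R_yy = var_n * np.eye(T) + var_x * sum(Cq @ Cq.T for Cq in C)
    R_xy = var_x * C[m].T                               # cross-correlation with own input
    R_perp = var_x * np.eye(C[m].shape[1]) - R_xy @ np.linalg.solve(R_yy, R_xy.T)
    window = slice(delta, delta + Nb + 1)               # taps kept by the target response
    eigval, eigvec = np.linalg.eigh(R_perp[window, window])
    b = eigvec[:, 0]                                    # unit-norm TIR of line m
    b_tilde = np.zeros(C[m].shape[1])
    b_tilde[window] = b
    w = np.linalg.solve(R_yy, R_xy.T @ b_tilde)         # T TEQ coefficients of line m
    return b, w, float(eigval[0])

rng = np.random.default_rng(4)
M, L, T, Nb = 2, 5, 4, 1                                # hypothetical dimensions
h_taps = rng.standard_normal((M, M, L + 1))
h_taps[0, 1] *= 0.1                                     # weaker crosstalk paths
h_taps[1, 0] *= 0.1
# Per-line exhaustive search over all valid synchronization delays.
best_delta = min(range(T + L - Nb),
                 key=lambda d: diagonal_teq_line(h_taps, 0, T, Nb, d, 1.0, 0.01)[2])
b, w, mse_min = diagonal_teq_line(h_taps, 0, T, Nb, best_delta, 1.0, 0.01)
print(best_delta, round(mse_min, 5))
```

Because each line is handled with its own received signal only, the same routine can run at every receiver independently, which is what makes the structure usable downstream.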

IV. COMPUTATIONAL COMPLEXITY
The complexity of computing the optimal TEQ matrix ($\mathbf{W}^{\text{opt}}$) can be divided into two parts: (i) the computational complexity of computing the optimal TIR matrix ($\mathbf{B}^{\text{opt}}$) and, subsequently, (ii) the computational complexity of computing $\mathbf{W}^{\text{opt}}$ from $\mathbf{B}^{\text{opt}}$.

A. ONC BASED MMSE MIMO TEQ DESIGN
1) OPTIMAL TIR MATRIX COMPUTATION (B opt )
The complexity of computing the optimal TIR matrix under the ONC is dominated by the eigendecomposition of the $\mathbf{R}_{\text{total}}$ matrix given in (13).

2) OPTIMAL TEQ MATRIX COMPUTATION (W opt )
The computation of $\mathbf{W}^{\text{opt}}$ from $\mathbf{B}^{\text{opt}}$ follows (11). The required operations can be subdivided as:
(i) Computing $\mathbf{R}_{xy}$: In DSL systems the CSI is assumed to be completely known [14]. Hence, the cross-correlation matrix $\mathbf{R}_{xy}$ can be computed using (3).
(ii) Computing $\mathbf{R}_{yy}$: The output correlation matrix $\mathbf{R}_{yy}$ is obtained from the known channel in (2). The computational complexity of computing $\mathbf{R}_{yy}$ is $O(M^2 T^2)$.
(iii) Computing $\mathbf{R}_{yy}^{-1}$: The matrix $\mathbf{R}_{yy}$ has a block Toeplitz structure, which can be exploited to compute its inverse with a computational complexity of $O(M^3 T^2)$, using the efficient algorithm suggested in [23].
(iv) Computing $\mathbf{W}^{\text{opt}}$: The optimal TEQ matrix $\mathbf{W}^{\text{opt}}$ is finally computed using (11), which involves the matrix multiplication $\tilde{\mathbf{B}}^H \mathbf{R}_{xy} \mathbf{R}_{yy}^{-1}$. Since the matrix $\tilde{\mathbf{B}}^H$ is sparse, the matrix multiplication can be done efficiently with a computational complexity of $O(M^3 T (N_b + 1) + M^3 T^2)$.
Therefore, the total complexity of computing the optimal TEQ matrix $\mathbf{W}^{\text{opt}}$ under the ONC is $O(M^3 T N_b + M^3 T^2)$.

B. UNCDc BASED MMSE MIMO TEQ DESIGN
1) OPTIMAL TIR MATRIX COMPUTATION (B opt )
The computationally expensive part in the computation of $\mathbf{B}^{\text{opt}}$ under the UNCDc is the Cholesky decomposition of the $\check{\mathbf{R}}_{\text{total},m}$ matrix in (22), which has to be performed independently for all lines ($M$ times). These $M$ decompositions therefore determine the complexity of computing $\mathbf{B}^{\text{opt}}$ under the UNCDc.

2) OPTIMAL TEQ MATRIX COMPUTATION (W opt )
The complexity of computing the optimal TEQ matrix ($\mathbf{W}^{\text{opt}}$) under the UNCDc remains the same as under the ONC, since it is also computed using (11). Hence, the computational complexity of computing $\mathbf{W}^{\text{opt}}$ under the UNCDc is $O(M^3 T N_b + M^3 T^2)$.

C. UNCDc-Zxc BASED MMSE MIMO TEQ DESIGN
1) OPTIMAL TIR MATRIX COMPUTATION (B opt )
The computationally expensive task in the computation of $\mathbf{B}^{\text{opt}}$ under the UNCDc-Zxc is the eigendecomposition of $\mathbf{R}^{m}_{\text{direct}}$ given in (32), which has to be performed independently for all lines ($M$ times). These $M$ eigendecompositions therefore determine the complexity of computing $\mathbf{B}^{\text{opt}}$ under the UNCDc-Zxc.

2) OPTIMAL TEQ MATRIX COMPUTATION (W opt )
The computation of the optimal TEQ matrix ($\mathbf{W}^{\text{opt}}$) under the UNCDc-Zxc also uses (11), as under the ONC and the UNCDc. However, under the UNCDc-Zxc, the $\mathbf{B}^{\text{opt}}$ matrix has a diagonal structure and has only $M(N_b + 1)$ non-zero coefficients instead of the $M^2(N_b + 1)$ non-zero coefficients under the ONC and the UNCDc. Hence, the computational complexity of step (iv), defined under IV-A2, is reduced.

D. UNCDc-Zxc BASED DIAGONAL MIMO TEQ DESIGN
1) OPTIMAL TIR MATRIX COMPUTATION (B opt )
The computationally expensive task in the computation of $\mathbf{B}^{\text{opt}}$ under the UNCDc-Zxc for the diagonal MIMO TEQ is the eigendecomposition of $\mathbf{R}^{m}_{\text{direct},\wedge}$ given in (41), which has to be performed independently for all lines ($M$ times). These $M$ eigendecompositions therefore determine the complexity of computing $\mathbf{B}^{\text{opt}}$ for the diagonal MIMO TEQ under the UNCDc-Zxc.

2) OPTIMAL TEQ MATRIX COMPUTATION (W opt )
The computation of $\mathbf{W}^{\text{opt}}$ from $\mathbf{B}^{\text{opt}}$ follows (42). The required operations follow the same order as defined in IV-A2, but with a diagonal structure of the $\mathbf{B}^{\text{opt}}$ matrix and the $\mathbf{W}^{\text{opt}}$ matrix. Therefore, the total complexity of computing the optimal diagonal TEQ matrix $\mathbf{W}^{\text{opt}}$ under the UNCDc-Zxc is $O(M(N_b T + T^2))$. Table 1 summarizes the computational complexity of the various MIMO TEQ design methods discussed.
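To give a feeling for the two TEQ-computation orders quoted in this section, the following sketch evaluates them with all constant factors dropped. The filter length and TIR order used (T = 16, N_b = 320) and the binder sizes are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
def full_teq_cost(M: int, T: int, Nb: int) -> int:
    """Order-of-magnitude count M^3 T N_b + M^3 T^2 for the full TEQ (constants dropped)."""
    return M**3 * T * Nb + M**3 * T**2

def diagonal_teq_cost(M: int, T: int, Nb: int) -> int:
    """Order-of-magnitude count M (N_b T + T^2) for the diagonal TEQ (constants dropped)."""
    return M * (Nb * T + T**2)

# Hypothetical filter length and TIR order; binder sizes chosen for illustration.
for M in (2, 8, 48):
    ratio = full_teq_cost(M, 16, 320) // diagonal_teq_cost(M, 16, 320)
    print(M, ratio)
```

With these two expressions the ratio equals M squared regardless of T and N_b, so the saving of the diagonal structure grows quadratically with the binder size.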

V. MEMORY REQUIREMENT
In comparison to a full MIMO TEQ, the diagonal MIMO TEQ structure significantly reduces the memory required to store the TEQ coefficients ($\mathbf{W}$). For a full MIMO TEQ, the TEQ has the structure shown in (5) and (6), requiring $M^2 T$ coefficients to be stored. The diagonal MIMO TEQ follows the structure given in (33) and (35). Thus, it needs only $MT$ TEQ coefficients to be stored. A similar reduction can be seen in the runtime complexity. A full MIMO TEQ structure performs $M^2 T$ multiplications to compute the filtered output ($\mathbf{W}^H \mathbf{y}[l]$), while the diagonal MIMO TEQ performs only $MT$ multiplications. Figure 2 shows the reduction in memory requirement and runtime complexity of a diagonal MIMO TEQ compared to a full MIMO TEQ, for different practical binder sizes.
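The difference in storage and per-sample work can be made concrete with a small filtering sketch. The array layouts are assumptions made for illustration (full TEQ taps stored as a T × M × M array, diagonal TEQ taps as a T × M array, received samples as one row per time instant), and the function names are hypothetical. The diagonal filter is checked against a full filter whose off-diagonal coefficients are zero.

```python
import numpy as np

def apply_full_teq(W_taps, y):
    """Full MIMO TEQ: z_l = sum_t W_t^H y_{l-t}; W_taps has shape (T, M, M)."""
    z = np.zeros_like(y, dtype=complex)
    for t in range(W_taps.shape[0]):
        z[t:] += y[:len(y) - t] @ W_taps[t].conj()     # M*M multiplications per tap
    return z

def apply_diag_teq(w_taps, y):
    """Diagonal MIMO TEQ: each line filtered by its own T coefficients; shape (T, M)."""
    z = np.zeros_like(y, dtype=complex)
    for t in range(w_taps.shape[0]):
        z[t:] += y[:len(y) - t] * w_taps[t].conj()     # M multiplications per tap
    return z

rng = np.random.default_rng(5)
M, T = 3, 4                                            # hypothetical binder size and TEQ length
y = rng.standard_normal((50, M))                       # received samples, one row per instant
w_taps = rng.standard_normal((T, M))
W_taps = np.stack([np.diag(w_t) for w_t in w_taps])    # same filter stored as a full TEQ
print(W_taps.size, w_taps.size)                        # stored coefficients: M*M*T versus M*T
```

Both routines give the same output for this diagonal choice, while the diagonal form stores and multiplies a factor M fewer coefficients.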

VI. RESULTS
The G.fast 106b profile [24] is considered here for the simulation of a 2 × 2 MIMO DSL system, i.e., a 2-line DSL system with 2048 carriers. A total transmit power of 8 dBm and a noise power of −140 dBm/Hz are considered. A practical approach is chosen for the transmit power distribution over carriers as follows. Initially, the power is allocated to carriers according to the power spectral density (PSD) mask specification [25]. Based on that, a TEQ filter is designed and the number of bits that can be transmitted over each carrier is calculated. Carriers for which the number of transmitted bits is less than 1 are rejected and left unused. The remaining power is then distributed over the used carriers, respecting the power mask, and the TEQ filter coefficients are updated.
The simulations are performed for both a theoretical channel model and measured channels. The theoretical channel (length 600 m) is based on the KHM model suggested in [26], while the measured channel data corresponds to cable binders of two Tier-1 operators (channel 1 of length 728 m and channel 2 of length 600 m). The data rates are computed with a bit-cap of 14 bits and without vectoring.
Fig. 3, Fig. 4 and Fig. 5 compare the performance of the state-of-the-art full MMSE TEQ based on the ONC and the proposed design based on the UNCDc (II-A), in terms of the total bit rate for a two line DSL system, for different channels. It can be noticed that the UNCDc provides better (or at least similar) data rates compared to the ONC. The difference in performance can be seen explicitly in Fig. 3 for a filter length of 4, where the ONC breaks down and shows a fall in data rates compared to the UNCDc.
Fig. 6, Fig. 7 and Fig. 8 show the performance of the state-of-the-art full MMSE TEQ based on the ONC, the proposed design based on the UNCDc-Zxc, and the low complexity diagonal MIMO TEQ design. From the results, it can be seen that both the UNCDc-Zxc and the diagonal MIMO TEQ mostly show either equal performance or an increase in the achieved bit rate compared to the ONC, while providing a reduction in the computational complexity and, for the diagonal MIMO TEQ, in the memory requirement.
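The per-carrier bit computation used in the simulation setup can be sketched as follows. The bit-cap of 14 bits and the rejection of carriers with less than 1 bit come from the description above; the gap-approximation formula, the SNR gap of 9.8 dB, the rounding down to whole bits, and the function name are assumptions made for this illustration and are not stated in the text.

```python
import numpy as np

def bit_loading(snr_db, gap_db=9.8, bit_cap=14):
    """Bits per carrier from post-equalization SNR (gap approximation, assumed)."""
    snr = 10.0 ** ((np.asarray(snr_db, dtype=float) - gap_db) / 10.0)
    bits = np.floor(np.log2(1.0 + snr))        # whole bits only (assumption)
    bits = np.minimum(bits, bit_cap)           # bit-cap of 14 bits per carrier
    bits[bits < 1] = 0                         # carriers below 1 bit are left unused
    return bits.astype(int)

bits = bit_loading([0.0, 13.0, 20.0, 80.0])    # hypothetical per-carrier SNRs in dB
used = bits > 0                                # carriers that keep their power
print(bits, int(used.sum()))
```

In the procedure described above, the power freed by the unused carriers would then be redistributed over the used ones within the PSD mask before the TEQ is recomputed.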

VII. CONCLUSION
In this paper, design methods for MMSE based MIMO TEQ have been presented using two novel non-triviality constraints. The UNCDc (II-A) provides a more natural extension of the single line case (with the possibility of parallel computation of the TIR matrix columns, corresponding to different lines) and shows improved (or at least similar) bit rate performance compared to the state-of-the-art full MMSE TEQ based on the ONC. The complexity of the MIMO TEQ design has been reduced by another proposed constraint, the UNCDc-Zxc (II-B). The computational complexity and memory requirement have been further reduced by the suggested novel diagonal MIMO TEQ, which also shows similar or (in some scenarios) better performance compared to the state-of-the-art full MMSE TEQ based on the ONC.

YANNICK LEFEVRE received the master's degree in engineering sciences from Vrije Universiteit Brussel (VUB), Brussels, Belgium, and Universiteit Gent, Ghent, Belgium, in 2010, and the Ph.D. degree in applied sciences and engineering from VUB, in 2014. He joined Nokia Bell Labs, Antwerp, Belgium, in 2015. As a Research Engineer, he works on next-generation copper and optical access technologies. His research interests include digital signal processing, forward error correction, signal shaping, and modulation. He was a recipient of an Aspirant Grant from the Research Foundation-Flanders (FWO).
PASCHALIS TSIAFLAKIS received the M.Sc. and Ph.D. degrees in electrical engineering from KU Leuven, in 2004 and 2009, respectively. He has further conducted research at Princeton University, UCLA, Tsinghua University, and UC Louvain. In 2013, he joined Nokia Bell Labs, where his main activities focus on research and innovation, contributing to standardization bodies, and driving innovation into next-generation communication products. He has performed research in the fields of optimization, signal processing, and machine learning, with applications to wireline and wireless communication systems. He received both the Ph.D. and Postdoctoral Researcher Fellowship of the Research Foundation Flanders (FWO), the Belgian Young ICT Personality Award, in 2010, the Nokia Innovation Award, in 2017, the Nokia Bell Top Inventor Award, in 2019, the Distinguished Member of Technical Staff Title, in 2019, and several IEEE Best Paper awards.