Low-Complex Digital Cancellation of Transmitter Harmonics in LTE-A/5G Transceivers

Nowadays, mobile communication radio frequency transceivers for LTE-A/5G usually operate in frequency-division duplex mode and support carrier aggregation. The resulting complexity requires compromises in the analog front-end. One consequence is a limited stop-band attenuation of the duplex filters that separate the transmit and the receive paths. The resulting transmit leakage signal combined with other non-idealities, like power amplifier (PA) harmonics, can create self-interference problems that desensitize the receivers. In this work, we propose an optimized digital self-interference cancellation scheme to mitigate the unacceptable performance degradation caused by PA induced harmonics. We derive an appropriate interference model and quantify the interference power levels for a specific PA. Based on these results, our low-complex cancellation approach solely targets the dominating components. Our algorithm relies on the auto-covariance matrix for the second and third power of LTE signals, which we derive for the first time in an analytic form. This statistical information can be combined with the transform-domain concept to noticeably improve the performance of a least-mean squares based cancellation algorithm. The capabilities of the approach are demonstrated by means of simulations and are validated on measurement data.


I. INTRODUCTION
A MAJOR design goal for radio frequency (RF) transceivers in mobile communications devices is low power consumption. One critical component is the transmit (Tx) power amplifier (PA), where a compromise between linearity and efficiency needs to be made [1], [2]. Doubtlessly, any nonlinear behavior of the PA eventually leads to increased in-band distortion, out-of-band (OoB) emissions and harmonic components. In in-band full-duplex (IBFD) and frequency-division duplex (FDD) transceivers, these unwanted signal components potentially cause selfinterference in one or multiple receive (Rx) chains. In FDD carrier aggregation (CA) operation, which is enabled by the Long-Term Evolution Advanced (LTE-A) and 5G standards [3], [4], PA harmonics are critical. A harmonic emission close to an Rx carrier might be down-converted to the baseband (BB) and overlay the wanted Rx signal [5]. The spectral separation provided by the front-end filters, especially the duplexers, is typically insufficient to avoid a deterioration of the Rx signal. This can limit the throughput and consequently impair the user experience. For scenarios close to reference sensitivity, the harmonic interference might be strong enough to hamper a successful data transmission via the affected component carrier. The growing number of bands and CA combinations used for mobile communications, generates numerous use cases susceptible to this self-interference mechanism. Consequently, appropriate cancellation methods are vital.
Doubtlessly, self-interference cancellation is a major challenge in IBFD and requires a combination of analog and digital mitigation schemes [6]- [9]. IBFD is of high interest for future mobile communication standards due to the enhanced spectral efficiency [10]- [15]. A wide range of methods has been published to mitigate self-interference caused by inband distortions or OoB emissions of a nonlinear PA. Most concepts are based on polynomial PA models with and without memory. The estimation of the nonlinearity coefficients and the leakage path is implemented either using classical estimation approaches, such as batch least-squares solutions and the recursive least squares (RLS) [16], [17], or adaptive learning methods, like least-mean squares (LMS) algorithms combined with basis function orthogonalization [18], spline adaptive filters [19] and neural networks [20]. For many of these concepts, the computational complexity during training is high. This aspect is often neglected due the assumption of a quasi-static scenario that only requires seldom re-adaptation.
In contrast to in-band distortions or OoB emissions, the explicit mitigation of Tx harmonics is rarely covered in literature. Some insights on this subject are given in literature on specific PA designs [21], where PA with an integrated harmonic rejection mechanism is demonstrated. Further exemplary designs utilizing this idea are given in [22], [23]. Reference [24] provides an in-depth analysis of the underlying concept. Of course, many digital predistortion techniques for PAs exist. While these algorithms reduce OoB emissions near the fundamental transmit frequency, the influence on harmonics is not covered [25]- [27]. From a self-interference cancellation perspective, many computationally expensive general learning concepts could be adapted to the Tx harmonics issue. However, to the best of our knowledge no low-complex solution with appropriate adaptation rate and steady-state performance has been published so far. In this work, we focus on LTE-A/5G FDD operation and propose to combine a static reference signal generation and LMS algorithm to estimate the leakage path. This Hammerstein structure [28] is enhanced by an approximate prewhitening of the reference sequence, following the transform-domain LMS (TD-LMS) concept with a precomputed power normalization. In general, this improves the adaptation rate and enables a consistent steady-state performance of the normalized LMS (N-LMS), when applied to inherently correlated Long-Term Evolution (LTE) BB data. The power normalization requires the analytic auto-covariance matrix of the reference sequence. Since this sequence can be represented as a linear transformation of a random vector, we derive the general auto-covariance of the second and third power of such a linear model.
The paper is organized as follows: In Section II, we provide a detailed model of the PA RF output signal under the presence of a polynomial nonlinearity with memory. This leads to a BB model of the transmitter-induced selfinterference in the affected Rx chain. In Section III, we derive the auto-covariance matrix of the reference sequence for the cancellation of the second and third harmonic, including all resampling stages and a frequency shift. Section IV gives details on the design of the resampling filters and proposes a variant of the TD-LMS suitable for non-stationary signals. In Section V, we discuss a state-of-the-art method which builds upon the parallel Hammerstein (PH) model. In Section VI, we perform simulations to compare the performance of our approach to the state-of-the-art method. We validate our approach on measurement data in Section VII. Finally, Section VIII provides concluding remarks. Fig. 1 shows the minimum configuration of FDD CA transceiver that might suffer from self-interference due to PA nonlinearity in certain scenarios. The difference between the Tx and Rx carrier frequencies of path 1, f Tx1 and f Rx1 , is determined by the duplexing spacing, which is implicitly part of the channel specification [29], [30]. First, it helps to ensure sufficient suppression of OoB distortions of the Tx at f Rx1 . Second, harmonics of f Tx1 cannot desensitize the Rx in the same band. In case of inter-band CA, though, the standard does not require such a relationship between the used bands. Consequently, it is possible that a harmonic of the Tx signal, which leaks over the duplexer, might distort the Rx of another band. In our example, Tx1 is the aggressor and Rx2 is the victim. The leakage signal might be down-converted by the local oscillator (LO) frequency f Rx2 or a harmonic of it. The Tx path of the second band is omitted in Fig. 1, since it is irrelevant for the modeling of the effect. Later on, we will give concrete examples for LTE-A and 5G.

A. INTERFERENCE MODEL
We start with analyzing the distortions caused by a nonideal PA with the dynamic transfer characteristic g PA (.).
The main effect modeled by g PA is the saturation of the PA output at high input amplitudes. Saturation is an issue especially in mobile devices, since it is desired to maximize the utilization of the PA output range for efficiency reasons. Considering Weierstrass' approximation theorem, a polynomial model with sufficient degree P is able to approximate any continuous function on a closed interval to any desired accuracy [31]. One method to obtain the coefficients for a given function is using Bernstein polynomials [32]. Though, the steady-state transfer characteristic is commonly not sufficient to fully describe the PA circuit behavior. In fact, electrical and electro-thermal memory effects may occur by design, and thus, have to be included in the model [33]. In summary, we employ the real memory polynomial with the degree P ≥ 1 and the P possibly different impulse responses α p (t). ' * ' denotes the time-continuous or timediscrete convolution, depending on the context. This model has been shown to be accurate in modeling the nonlinearity of different PAs [34]. α 0 (t) is included to harmonize the notation but typically is zero. For small inputs x[n] and a weak memory effect, we have α 1 (t) ≈ α 1 δ(t − τ ) with the linear gain α 1 , the Dirac delta δ(t) and a time delay τ . The polynomial matches the PA characterization described in Section II-B. Using the model (1), the output of the PA is given by with the real RF impulse responses α RF,p (t) and the upconverted Tx signal x BB (t) is the complex, time-continuous BB Tx signal, which results from the digital sequence x BB [n] by means of the digital-to-analog converter (DAC). Using the identities and Since we are interested in the signal components at a single harmonic, we group the occurring terms by their carrier frequency m f Tx1 by substituting p = 2l + |m| and k = l + (|m| − m)/2 for l = 0, . . . , (P − |m|)/2 yielding For shorter notation we used the frequency-shifted impulse responsesα Evaluating this model at the fundamental (m = ±1) results in the transfer characteristic of the PA In this work, we focus on the second (H2) and third harmonic (H3), which have the highest probability of hitting Rx band under the constraints of the LTE-A and 5G band definitions and usually also contain significant power [35]. The corresponding signals are x PA, H3 with the general BB representation The BB equivalent PA output signals for H2 and H3 are x PA, H2 x PA, H3 with α RF,l (t) = 2 α Hθ BB,l (t) e j2πθf Tx1 t .
In the analog front-end, due to the high power ratio between Tx and Rx signals, the Tx-Rx isolation of the duplexer of typically 50 dB [36] is insufficient to suppress the harmonic x PA, Hθ RF (t). The leakage signal y TxL, Hθ RF (t) is a spectrally shaped version of x PA, Hθ h TxL RF (t) represents the impulse response of the leakage path with the baseband equivalent impulse response h TxL BB (t) centered at the ζ -th harmonic of the carrier of Rx2. It is mainly determined by the duplexer stop-band. The case ζ = 1 is explicitly included in this formulation. The spectral impact of other front-end components in the leakage path, such as switches and diplexers, is usually minor compared to the duplexer and thus will be neglected in our considerations. In the example shown in Fig. 1, Rx2 amplifies and down-converts the leakage signal, if BW Tx1 and BW Rx2 are the bandwidths of the relevant Tx and Rx, respectively. At the input of the receiver, the wanted signal and the noise are denoted by resulting in the total Rx signal y Rx2, Tot This signal passes the low-noise amplifier (LNA), which is assumed to be linear for the purposes of this work [16], [19]. The next stage is the (non-ideal) down-conversion with the main LO signal and optionally a harmonic with gain g sp leading to the mixer output signal A Rx2 is the total gain of the Rx path. Consequently, the total BB Rx signal after the (ideal) anti-aliasing filter (AAF) is Here we assume a sufficiently narrow bandwidth of the shifted harmonic x PA, Hθ BB (t) e j2π f t and the additive noise η Rx2 BB (t) such that both are unaffected by the AAF.g sp equals 1 for ζ = 1 and g sp otherwise. Usually, direct-conversion Rx chains contain a direct current (DC) cancellation stage, which is modeled by the notch filter h dc [n] [37], [38]. In combination with the channel-select filter (CSF) h csf [n], we obtain the digital BB signal +η BB [n] where Except for the quantization noise η ADC BB [n], the analog-todigital converter (ADC) is assumed to be ideal. The sampling ratef s shall be sufficiently high to enable the calculation of the shifted harmonic without aliasing. By inserting the timediscrete BB equivalent PA output, we yield the interference signal . Without loss of generality, in the following, we set A Rx2 = 1 to shorten the notation.

B. POWER LEVEL CONSIDERATIONS IN PRACTICAL TX HARMONICS SCENARIOS
Throughout the rest of the paper, we focus on two specific Tx-Rx band combinations. The first example covers an H2 interference and incorporates the LTE-A bands 8   The analysis of the PA nonlinearity (7) showed that several signal components with different exponents overlay at a certain harmonic θ f Tx1 . In order to quantify the power levels of the individual summands, we require the impulse responses α RF,p (t). An exemplary model of degree 16 was obtained for the discrete wideband PA ZVA-183G-S+. The same amplifier was utilized in the final measurement setup described in Section VII. Given a known input signal with varying power at a Tx carrier frequency of 900 MHz, the PA output was captured at 900 MHz, 1.8 GHz and 2.7 GHz. Based on these measurements we conclude that a memoryless polynomial model characterizes the PA with sufficient accuracy for our application. Consequently, the α RF,p (t) simplify to weighted Dirac impulses α p δ(t), where the α p = [α] p are given by: The coefficient α 1 approximately corresponds to a linear gain of 39 dB, which complies with the specification of the PA [39]. The corresponding transfer characteristic according to (9) is depicted in Fig. 2. The PA is almost linear for input powers up to −14 dBm. The lower part of the graphic shows the deviation from the ideal characteristic. Clearly, we operate the PA below the 1 dB compression point for the given input power range.
In the analog Rx BB, the Tx leakage leads to a total interference power of P Int = E [|y Int BB (t)| 2 ], where we account for the bandlimiting effect of the CSF and the DC removal. In Table 1, P Int is listed for selected PA output powers P PA, out at the fundamental. For comparison, the Rx signal power in the LTE-10 reference sensitivity case can be as low as −97 dBm [3] and an exemplary Rx noise floor is −101 dBm at 75 • C [40]. Obviously, for moderate to high transmit power levels, the harmonic interference can have substantially higher power than the wanted Rx signal. For the values in Table 1, we neglect the frequency shaping of the leakage path and solely consider an attenuation of 50 dB [36]. A perfect cancellation of the component (x BB [n]) θ in the least-squares sense reduces the interference power to the value P Int,l>0 . The index l is used in accordance with (13)- (14) and indicates the removed summand. Instead of the absolute value, Table 1 gives the ratio R l>0 = P Int,l>0 /P Int since it directly represents a bound on the normalized estimation error of the AF. Moreover, we quantify the impact of canceling higherorder terms by including the ratio R l>1 = P Int,l>1 /P Int . P Int,l>1 is obtained by optimally canceling both, (x BB [n]) θ and |x BB [n]| 2 (x BB [n]) θ , in the least-squares sense. For l > 0, the bound on the cancellation performance for H3 is about −36 dB, a value which is not easily reached by standard N-LMS algorithms operating on LTE signals. This example indicates that the summands for l > 0 are hardly relevant for LMS based algorithms even for high input powers. Note that similar numbers are obtained for other allocation patterns and frequency offsets. Since in a self-interference cancellation application, adaptation time and consistent steady-state performance are crucial while maintaining a reasonable computational complexity, we focus on the cancellation of the first term (l = 0) only, which allows to employ a heavily optimized algorithm.

III. STATISTICAL ANALYSIS OF HARMONICS OF LTE SIGNALS
A prominent low-complex approach for real-time selfinterference cancellation is the N-LMS algorithm. Though, when applied to LTE sequences, it suffers from low adaptation rates and intolerable performance variations in the steady-state [41]. We aim to improve the LMS-based approach by employing a variant of the transform-domain (TD) concept [42]. This requires a sufficiently accurate knowledge of the reference signal statistics. In case of the Tx harmonics problem, the basis is an LTE uplink (UL) signal. In accordance with the power level considerations in the previous section, we focus on the second and third power of the UL signal.

A. SIGNAL MODEL OF HARMONICS
The generation of an UL symbolx can be modeled as a linear transformation of a vector of transmitted data The construction of the matrix A UL ∈ C N Tx sym ×L d is covered in [41]. The elements d k = [d] k of the data vector d contain the transmit data encoded by a modulation alphabet, the elements of the vectorx represent the BB time samples for transmission. In preparation for further considerations, we allow for additional transformations ofx and combine them to the single matrix:x One basic example for this extension is resampling. In order to compute the θ -th harmonic ofx, we have to increase the sampling rate to accommodate for the spectral widening caused by the exponentiation. The required upsampling by the integer factor U is represented by the modified convolution matrix H ↑U ∈ R UN Tx sym ×N Tx sym , built up using the impulse response h ↑U of an interpolation filter: The upsampling process corresponds to taking only the k U-th columns of a regular convolution matrix of size UN Tx sym × UN Tx sym . The choice of U depends on the allocated bandwidth inx and is discussed in Section IV-A. The upsampled symbol of length UN Tx sym is In this case, we have A mdl = H ↑U A UL , where the dimensions of the input and output vector of the model are identical. Yet, the model is not limited to this kind of transforms. For example, we might also extend the data vector to include multiple symbols and use the additional samples to cut off the settling of the interpolation filter. In the following, we derive the general auto-covariance matrices ofx H2 =x 2 mdl andx H3 =x 3 mdl , where in the exponent shall denote an element-wise operation.

B. HIGHER-ORDER MOMENTS OF MODULATED DATA SYMBOLS
The computation of the auto-covariance matrices ofx H2 andx H3 requires several moments of the data symbols d k . We assume an independent and identical distribution of the complex discrete random variables d k . In the following we omit the index k for simplicity and write d = d I + jd Q . The distribution of d is determined by the underlying modulation alphabet but has always a mean of zero due to the inherent symmetry. Examples for LTE are BPSK, 1 QPSK 2 or M-QAM, 3 where the first is real-valued and primarily used to transmit control information [43].
We start by deriving the second moment, also known as the variance which depends on the constellation. QPSK is included in the M-QAM case with M = 4. We also provide the fourth moment and the sixth moment As a first step, we expressed the moments of d in terms of the moments of its I-and Q-components. This reveals that we need to evaluate σ  using the Bernoulli polynomials B n (x) [31]. These show the following properties for a non-negative integer n and x ∈ R where B n is the n-th Bernoulli number. Note that in the derivation, we assumed n to be even, since for odd n the moments σ (n) d,I vanish due to the symmetry of the constellation mappings. For convenience, we provide evaluations of (40) for n = 2, 4, 6: Often the data symbols are normalized to variance 1 to allow for easier comparison of different modulation alphabets, leading to the momentš Besides the covariance, in general also the pseudocovariance is required for second-order characterization of a complex random variable [44], [45].
The third moment of d, which is also required in subsequent calculations, is This holds because E [d I/Q ] = 0 and E [d 3 I/Q ] = 0, independent of the constellation.

C. AUTO-COVARIANCE MATRIX OF LTE UL SYMBOL 1) SECOND HARMONIC
We start with the auto-covariance matrix of the second harmonicx H2 by inserting into the definition: The elements of the first term E [x H2x H H2 ] at the position (k, r) are with the matrix elements a k,l = [A mdl ] k,l . Utilizing the linearity of the expectation operator allows to exchange it with the summation once the product of the two sums is expanded. E [d] = 0 allows to simplify the expression, leading to Generalizing the index scheme of the individual elements leads to the full matrix where The off-diagonal entries of C 1 are obtained by assuming independence of the data symbols, which allows to split the expectation E [d 2 l (d 2 s ) * ] for l = s. For the second term in (51) we require the mean ofx H2 , which is given by Clearly, the mean is non-zero only in case of a BPSK alphabet. The second term in matrix form is where 1 K×L is a K × L matrix of ones. By combining the results (54) and (57), we retrieve the auto-covariance matrix of the second harmonic.
2) THIRD HARMONIC The definition of the auto-covariance matrix for the third harmonic is The evaluation of the expectations requires the expansion of the cubic of a sum, which is achieved by means of the multinomial theorem [46]. An advantageous general formulation for the third power of N summands is Note that some terms of the sums do not exist due to empty index ranges. Nevertheless, this notation is used here, as it follows the conventions in literature. Using this equality and the elements x k = [x] k , the elements of the first term in (59) are with The expansion of x 3 r is identical, except for an index change from k, l, m and n to r, s, t and u, respectively. The product k (x 3 r ) * contains various combinations of data symbols with different indices where a majority is zero for M-QAM, but remains for BPSK. To limit the complexity of the resulting expressions, we do not consider BPSK in the following. Since this constellation mapping is mostly used for control channels [43], [47], this decision does not limit the applicability of the proposed self-interference cancellation approach significantly. The remaining summands are The mean ofx H3 is

D. REFERENCE SIGNAL STATISTICS
In order to enable a linear estimation of the combined impulse responseα Hθ BB,0 [n] * h TxL BB [n], the reference signal of the AF has to contain the respective nonlinearities. This concept is referred to as Hammerstein model [28]. We choose the reference signal to be f Rx s is the baseband sampling rate of the Rx path, defined by the LTE standard. The subscripts ↑ U and ↓ D and the finer time grid n U in (66) indicate the resampling involved in this computation. This processing scheme is depicted in the digital BB at the right side of Fig. 1. After resampling, the harmonic of the Tx BB data is computed. Next, a possible frequency offset is compensated before the signal is limited to the channel bandwidth by the CSF. Furthermore, the DC has to be eliminated in order to match the DC cancellation in a standard Rx chain. The resulting signal can be passed to the AF as the reference signal.
In an implementation, it is beneficial to perform each step at the lowest possible sampling rate. We already covered the upsampling of a single symbol by a factor of U. Similarly, the downsampling by the integer factor D can also be modeled by a modified convolution matrix. It is obtained by taking only the kD-th rows of a regular convolution matrix of size DN Tx sym × DN Tx sym . The downsampling matrix H ↓D ∈ R N Tx sym ×DN Tx sym , which selects the first polyphase component, is given by where h ↓D [n] is the impulse response of the decimation filter. The ratio between U and D is chosen with respect to the Tx bandwidth, the harmonic order and the Rx bandwidth. Frequency offsets (especially if large) f have to be considered, too. For the TD-LMS, we need the steady-state statistics of the reference signal (66). Therefore, we have to ensure that all involved filters are settled for the calculation of the statistics. We also need to include the effects of the transition from symbol to symbol. This is achieved by extending the sample vector compared to (32). The required length is where Q up , Q dn , Q c and Q af are the lengths of the interpolation filter, the decimation filter, the combined CSF and DC filter and the AF, respectively. N Rx sym = U/D · N Tx sym is the length of a symbol at the Rx sampling rate and shall be an integer. It is reasonable to assume that the corresponding constraints on U and D are fulfilled because of the allowed channel bandwidths in the LTE-A standard. Extending the sample vector requires taking several entire symbols S = L x,ext /N Tx sym because the single-carrier frequency-division multiple access (SC-FDMA) modulation scheme does not enable a finer granularity. The corresponding linear transform is with H ↑U,ext ∈ R UL x,ext ×L x,ext . Based on this model, the autocovariance matrices Cxx ,H2 and Cxx ,H3 can be directly calculated from (58) and (65), respectively. Following (66), the next step after the exponentiation is the frequency shift by f , which is conveniently represented by the diagonal matrix where the settling of each filter stage is removed directly after applying the filter. I K is the K × K identity matrix. The final length of the sequence is L af = L c − Q c + 1.

IV. LMS-BASED SELF-INTERFERENCE CANCELLATION
Combining the insights gained in the previous sections, we now summarize the cancellation concept for second-and third-order Tx harmonics. Based on the second-order statistics derived in Section III-D, we propose a novel variant of the TD-LMS algorithm. It avoids an online power estimation by employing a precomputed, averaged power normalization for non-wide-sense stationary reference signals. The resulting algorithm is tailored to the Tx harmonics problem and shall be abbreviated as model-based for harmonics (MBH)-LMS in the following. Besides the mathematical formulation of the MBH-LMS, we cover various details relevant for implementation, such as the design of the resampling filters and the CSF, and the total computational complexity.

A. IMPLEMENTATION OF REFERENCE SIGNAL GENERATION
The reference sequence of the AF is computed according to (66). The first step is the upsampling of the Tx BB stream. The factor U is determined by the harmonic order θ , the frequency contents of x BB [n] and the frequency shift f . The frequency content results from the actual allocation. For LTE, the allocation granularity of UL signals is defined in terms of RBs, where each RB commonly corresponds to 12 subcarriers with a subcarrier spacing of 15 kHz. Since other spacings are possible, we use the general notation f sc . For a total of R allowable RBs and the index set ϒ of actually allocated RBs, f min and f max are given by R is assumed to be even. Consequently, the upsampling factor is constrained by Larger values of U might be required to ensure that Uf Tx s is an integer multiple of f Rx s in the Rx. With the choice of U, the factor D = Uf Tx s /f Rx s is fixed, too. The selection of U with respect to the actual allocation pattern instead of the full Tx bandwidth leads to a complexity reduction. However, in a practical implementation, often only a number of switchable resampling stages with fixed factors are available. Table 2 shows exemplary design specifications for interpolation and decimation filters with the prime factors 2 and 3 assuming the full Tx bandwidth and LTE-10 signals. f pass , f stop , f filt s and A stop are the passband edge frequency, the stopband edge frequency, the sampling rate of the filter itself and the minimum stopband attenuation, respectively. By scaling the sampling rate, and thus also the pass-and stopband frequencies, the provided resampling filters can be cascaded to achieve higher resampling factors. All stages are realized as polyphase finite impulse response (FIR) M-th-band filters [48], 4 where M is the interpolation or decimation factor. For this filter type, one polyphase component does not require any arithmetic operations, because it contains a single '1' and is zero otherwise. The other polyphase components are either symmetric in itself or there are laterally reversed pairs. In the latter case, the symmetry can be utilized in the interpolation stages by appropriately combining the pairs to even and odd impulse responses. In order to meet the specifications, the filter orders for ↑ 2, ↓ 2, ↑ 3 and ↓ 3 are 14, 14, 20 and 22, respectively.
The exponentiation (x BB [n]) θ is basically comprised of one or two complex multiplications, depending on θ . Some terms are redundant, thus, the number of real-valued operations can be reduced slightly. Depending on the available hardware, the frequency shift by f might be implemented using a multiplication with precomputed complex exponentials or the CORDIC 5 algorithm [49]. The specification of the CSF is provided in Table 2, too. It is implemented using infinite impulse response (IIR) design of order 14 to limit the complexity. Simulations showed that the inherent nonlinear phase of this design does not cause a noticeable reduction in cancellation performance. For the computation of the reference signal statistics, the impulse response h c [n] of the CSF is truncated appropriately. Due to the inevitable delay between the Tx data stream and the interference in the Rx path, the DC filter might be implemented by subtracting a constant on a symbol basis. An iterative mean formulation can be used to estimate the DC. This is an approximation of the method used in the derivation of the symbol auto-covariance matrix. All filter stages operate on complex-valued data, thus, their computational complexity doubles when assuming real-valued coefficients. The total number of real-valued additions and multiplications required to generate the reference sequence for the second-and third harmonic at the oversampling factors 2 and 3, respectively, is listed in Table 3. Clearly, the computational complexity is relatively low and feasible for a real-time implementation. 4. In the reference, this filter type is denoted as L-th-band filters. 5. Coordinate Rotation Digital Computer.

B. TRANSFORM-DOMAIN LMS ALGORITHM
with the assumption that x Hθ ref [0] is the first sample of a full symbol. J K is the reversal matrix of size K × K [50].
Similarly to the analysis in [41], the samples of the reference signal are highly correlated and, thus, the Eigenvalue spread of the auto-covariance matrix of x[n] is high for narrow allocations. The TD-LMS is a standard concept to improve the performance of the N-LMS algorithm in such cases [42] by transforming the reference sequence. Given a suitable, usually unitary, transformation T, the new input vector u[n] is

C uu [n] = E [u[n]u[n]
H ] is at least diagonally dominant. A common transformation is the discrete cosine transform (DCT), whose decorrelation properties are well-known [51]. Additionally, the DCT can be implemented efficiently for delay line vectors [52]. 6 It is suitable for the Tx harmonics problem, too. In general, the auto-covariance may be time-dependent, although the input is often assumed to be only slightly non-stationary [53]. After the transformation, the power of the Q af partially independent streams in u[n] is normalized as follows: In contrast to the standard TD-LMS, we circumvent these issues by using a precomputed power normalization. For the Tx harmonics problem, the reference signal x Hθ ref [n] is not wide-sense stationary, but rather cyclostationary with a period of one symbol length N Rx sym . Hence, we would require distinct values of p[n] at each time-step within the cycle. This is not feasible for a real-time implementation in a mobile 6. In the reference, this DCT variant is named sliding cosine transform (SCT). device, since the associated configuration and memory efforts for providing the allocation-dependent power values would be too high. Thus, we propose to average p[n] over one symbol including the symbol transition as follows: D Q af is the DCT matrix of size K × K andP Hθ = diag(p Hθ ).
We opt for the DCT type-I because it features the lowest complexity [52] without influencing the performance compared to the commonly used type-II. In any practical scenario, D Q af does not diagonalize the averaged autocovariance matrix of the reference signal.
The original input x[n] is composed as defined in (75), where its components are computed according to (66) for a selected harmonic order θ . We limit our analysis to constant stepsizes μ, but, of course, various published variable step-size schemes could be applied directly. The regularization ξ usually has a negligible impact on the adaptation rate but is useful to avoid a very small denominator in the update equation. Simulations of the proposed algorithm show that the originally required dynamic power normalization would not provide any performance gains compared to the final static variant.

V. STATE-OF-THE-ART HAMMERSTEIN APPROACHES
Besides the MBH-LMS, (parallel) Hammerstein models with classical AFs can be applied to the Tx harmonics problem, too. The reference signal generation described in Section IV-A can be combined with a standard N-LMS algorithm -or with RLS in order to improve the adaptation speed. However, RLS might be limited by the use of a single basis function (x BB [n]) θ since it cannot exceed the performance bounds R l>0 given in Table 1. The BB interference signal (29) indicates that additional basis functions are required to achieve higher cancellation performance. The general form of these functions is ψ Hθ While the MBH-LMS as well as all described state-ofthe-art algorithms are related to Hammerstein models, their computational complexity differs substantially. In Table 3 we compare the MBH-LMS to the N-LMS, the RLS and the PH-RLS in terms of real-valued additions, multiplications and divisions for an exemplary filter length of 16 [41]. Adding up the corresponding numbers for a chosen AF with the reference signal generation at a specific sampling rate (see Section IV-A) gives the total complexity of the Tx harmonic cancellation algorithm. The number of operations per sample for the N-LMS and the MBH-LMS is in a reasonable range for a real-time implementation. In contrast, the RLS and the PH-RLS require a substantially higher computational effort. In addition to the numbers listed in Table 3, the PH-RLS necessitates the basis functions x Hθ ref,l [n]. Since the PH-RLS only serves as a benchmark for the following simulations, we do not provide the exact number of operations for the basis function generation.

VI. SIMULATION RESULTS
The accuracy of an estimation algorithm and its computational complexity are often opposed properties. This rule also holds for the proposed MBH-LMS in relation to LMS-and RLS-based solutions, as shown in the following comprehensive performance simulations. The used setup is similar to the RF measurements described in Section VII, although all test signals are generated in the complex BB using the model (26). In accordance with the subsequent measurements, a discrete PA is modeled using the fitted coefficients (31). To avoid aliasing, the interference x Hθ BB is computed at an oversampling factor of 3. For the given partial LTE-10 allocation ϒ = {24, . . . , 33}, this bandwidth increase is sufficient to evaluate the polynomial of degree 16. The Rx signal is chosen to be fully allocated, with a power level of −90 dBm and signal-to-noise ratio (SNR) of 10 dB. A timing inaccuracy between the Tx reference signal and the interference is covered by including a fractional delay of 22 ns or one third of the sample time. The leakage path is represented by 6 different FIR coefficient sets based on measured stop-band impulse responses of real duplexers. All results are based on ensemble averaging over all leakage path models, with 50 simulation runs each.
We start our performance validation with the cancellation of the second harmonic. The signal-to-interference-plusnoise ratio (SINR) serves as the performance metric and is given by  Fig. 3 we depict the SINR for the MBH-LMS, the standard N-LMS, the RLS and the PH-RLS over a range of PA output powers. The RLS based algorithms clearly show the highest performance and, thus, are the reference for this problem when complexity is disregarded. The PH-RLS solely gives an advantage over the RLS for the highest transmit power but performs slightly worse for low and medium power levels. This is due to the high number of parameters to be estimated. This shows that restricting to a single basis function is sensible. The proposed MBH-LMS exceeds the N-LMS by up to 2.8 dB and keeps the SINR above 0 dB. An exception is the highest PA output power of 24 dBm, where all algorithms drop below the 0 dB limit. What is not shown in the plot: for this output power the MBH-LMS still improves the SINR by 35 dB compared to no cancellation. Furthermore, if no cancellation is applied at all, the 0 dB limit is already reached for low PA output powers of around 5 dBm, which highlights the relevance of this self-interference problem.
In addition to the steady-state performance, in Fig. 4, we illustrate the adaptation behavior of the four algorithms at a PA output power of 20 dBm. It is evaluated in terms of the normalized mean square error (NMSE): Again, the RLS is the reference and reaches steady-state after about 600 samples. Furthermore, with NMSE of around 38 dB, it comes close to the theoretical limit in the leastsquares sense of 44 dB provided in Table 1. The PH-RLS shows the same steady-state performance, albeit with a higher adaptation time. The MBH-LMS and the N-LMS reach 95 % of their final NMSE within the first symbol or about 1100 samples, which is often deemed to be sufficient for self-interference cancellation applications. At the symbol boundaries, all algorithms show error spikes, which are explained by a momentary bandwidth increase during the transition from one symbol to the next [41]. Unlike the H2 case, the ratios in Table 1 indicate that the power of (x BB [n]) 3 is lower compared to the other interference components. Thus, the expected performance . The RLS yields good performance but shows a limitation for high transmit powers due to our restriction to the dominating interference component. In this power region, the PH-RLS demonstrates performance gains, while it exhibits the lowest performance for low and medium power levels. The performance differences between the LMS based algorithms are smaller, too. Still, the MBH-LMS manages to outperform the N-LMS by up to 2.1 dB for the important high power levels, while its SINR is at most 0.4 dB below for lower powers. This is partly explained by the inherent bandwidth increase caused by the third power, limiting the effect of the DCT-based decorrelation employed by the MBH-LMS. Again, for the highest PA output power of 24 dBm all algorithms drop below 0 dB. Nonetheless, the MBH-LMS still improves the SINR by 28 dB compared to no cancellation.

VII. EVALUATION ON MEASUREMENT DATA
We provide experimental results in order to demonstrate the effectiveness of our proposed self-interference cancellation scheme for real-world scenarios. Fig. 6 schematically shows the measurement setup. One vector signal generator (VSG) provides the Tx signal up-converted to f Tx1 = 910 MHz, while the other VSG generates the Rx signal at f Rx2 = 1821.5 MHz. This represents the H2 interference example already used throughout the rest of the paper. The Tx signal is amplified by the wideband PA ZVA-183G-S+ and then passed to the Tx port of a band 8 duplexer. The Rx port of this duplexer is unused and, thus, terminated by 50 , while the antenna part is connected to a power combiner. The second port of the combiner is fed by the generated Rx signal. The output of the combiner drives  the antenna port of a band 3 duplexer. Here, the unused Tx port is terminated by 50 , while the Rx port is connected to vector signal analyzer (VSA) after passing through a DC block. The VSA is used for down-conversion and digitization, yielding a digital BB signal with an LTE-10 sampling rate of 15.36 MHz. A MATLAB script controls the whole measurement setup and evaluates the cancellation performance by processing the retrieved BB samples. The Tx signal is chosen to be an LTE-10 signal with QPSK modulation and 10 out of 50 RBs allocated at the positions ϒ = {24, . . . , 33}, which agrees with our simulation setup. However, the Rx signal is an LTE-10 signal with QPSK modulation and 25 out of 50 RBs allocated at the positions {26, . . . , 50}. Then, the Rx signal does not fully overlap with the interference in the power spectral density (PSD). This is different from the simulations but allows for a clear spectral visualization of the cancellation. The Rx signal has a power of −65 dBm, while the transmit signal at the PA input has −19 dBm, which corresponds to PA output power of about 20 dBm.
Before applying the cancellation, the Tx BB stream and the captured Rx BB signal have to be time aligned. This is necessary to compensate the unknown delay of the setup and the leakage path. Therefore, the cross correlation of these two signals is computed and the peak is used to align the signals. The AFs are configured similarly to the simulation setup, using 16 weights and a step-size suitable for the interference power. The first symbol of the first slot is assumed to be available for adaptation and excluded for calculating the PSD. Measurement results in terms of the PSD before cancellation and after applying the N-LMS and the MBH-LMS are shown in Fig. 7. Obviously, the H2 interference is the dominant part in the total Rx signal. Both algorithms provide substantial cancellation below the wanted Rx signal, though the MBH-LMS performs better as predicted by the simulations. Obviously, the wanted Rx occupying the right half of the PSD is recovered after cancellation.

VIII. CONCLUSION
We presented a novel self-interference cancellation scheme tailored to PA induced Tx harmonics. Based on a detailed modeling of transmitter nonlinearities, we derived the interference BB model and quantified the interference power levels. First, this demonstrated that Tx harmonics can desensitize the Rx severely due to the high power difference of the Tx and Rx signals. Second, the analysis allowed to extract the most dominating interference component which subsequently is targeted in the cancellation scheme. The proposed solution consists of a static reference signal generation, which models the nonlinearity, and the MBH-LMS, a low-complex LMS variant which estimates the leakage path. Deriving the autocovariance matrix for the reference signal of the AF allowed to improve the average steady-state performance by employing a novel variant of the TD-LMS concept. Furthermore, we covered implementation aspects such as the design of appropriate resampling stages and the computational complexity. This analysis confirmed that our approach is capable of real-time adaptation and cancellation. Simulations and validation via measurement data showed effective cancellation in real-world scenarios.