Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition

Machine learning (ML) and tensor-based methods have been of significant interest to the scientific community for the last few decades. In a previous work we presented a novel tensor-based system identification framework that eases the computational burden of tensor-only architectures while still achieving exceptionally good performance. However, the derived approach only allows the processing of real-valued problems and is therefore not directly applicable to a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work we therefore derive two new architectures that allow the processing of complex-valued signals, and show that these extensions surpass the trivial complex-valued extension of the original architecture in terms of performance, while only requiring a slight overhead in computational resources to accommodate complex-valued operations.


I. INTRODUCTION
DEEP learning (DL) and neural networks (NNs) [1] are among the most popular techniques of machine learning (ML), and are broadly used in signal processing, among other disciplines [2]-[5]. However, the umbrella of ML covers many other techniques, such as support vector machines [6], [7], kernel adaptive filters [8], [9], random forests [10], [11] and tensor-based estimators [12]. Although it has been shown that tensor-based methods can deliver on-par or even better performance than other methods [13], [14], and can be used in a variety of applications [12], [15]-[20], they are usually disregarded due to the high memory and computational footprint needed to approximate a given system.
In an attempt to reduce the complexity of tensor-based methods, we recently introduced [21] the combination of tensors with least mean squares (LMS) filters for system identification with minimal model knowledge, i.e., so-called Wiener and Hammerstein models [22] or combinations thereof. We analyzed several of these combinations and came to the conclusion that the proposed methods can not only outperform (or be on par with) architectures utilizing a single tensor or spline adaptive filters (SAFs), but also significantly reduce complexity compared to these methods. However, a downside of the proposed architectures is that they can only deal with real-valued input and output signals. Therefore, they are not suitable for a variety of signal processing and communications related problems.
In this work, we extend the theory of the tensor-LMS (TLMS) block, originally presented in [21], to allow complex-valued input and output signals. The presented theory can be trivially extended to all other architectures presented in [21] and is hence not repeated in this work. As will be shown, the resulting architectures yield very good performance for the simulated scenario while still keeping complexity at an absolute minimum.

II. PRELIMINARIES AND NOTATION
Before reviewing the findings of [21] and presenting our extensions, this section briefly repeats the overall problem statement and the mathematical notation used in [21], as both are also needed in the remainder of this work.

A. Preliminaries
As in the overall problem discussed in [21] (see Fig. 1), the aim is to approximate an unknown system with an adaptive filter by feeding it the same input signal and observing only the overall output. Naturally, the ideal output of the unknown system is subject to additive noise, so that the observed output is the sum of the ideal output and this noise term. Besides updating the approximation of the system with each observed sample (i.e., one optimization step per time step), the adaptive filter further assumes that the unknown system itself may not remain static over time [21].

B. Tensor Background
In this work, we adhere to the widely adopted definition of the term tensor as presented in [13], [14]. That is, a tensor may be represented as an N-dimensional array, indexed by i_1, i_2, i_3, ..., i_N [21]. We denote a tensor by X and, like in [21], we use the notations ⊚, ⊛, ⊙ to refer to the outer (tensor) product, the Hadamard product and the Khatri-Rao product, respectively. A rank-1 tensor X of order N (also called an N-way tensor) is the outer product of a collection of vectors â_k, which can also be written element-wise as [21]

X(i_1, i_2, ..., i_N) = â_1(i_1) â_2(i_2) · · · â_N(i_N),

with â_k(i_k) = A_k(i_k, 1). Further, any N-way tensor X with a rank higher than one can be decomposed into a sum of such rank-1 tensors [21]. Additionally, the Hadamard product over all factor matrices A_k with k ≠ k' is defined as in [21]. The discretization used to obtain an index for the tensor input is given by a rounding function that distinguishes between an even and an odd number of bins, with Δ denoting the discretization interval [21]. Lastly, the superscripts (·)^T, (·)^H, (·)^* denote the transpose, Hermitian transpose and conjugate, respectively.

Fig. 2 shows the original tensor-LMS architecture from [21], which is only able to handle real-valued data.
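As a concrete illustration of the rank-1 and sum-of-rank-1 constructions above, the following NumPy sketch builds an order-3 rank-1 tensor as an outer product and a higher-rank tensor as a sum of rank-1 terms. The dimensions, the rank R, and the factor-matrix names A1, A2, A3 are illustrative placeholders, not the exact notation of [21]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-1 order-3 tensor: the outer product of three vectors.
a1, a2, a3 = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
X1 = np.einsum('i,j,k->ijk', a1, a2, a3)

# Every entry is the product of the corresponding vector entries.
assert np.isclose(X1[2, 3, 1], a1[2] * a2[3] * a3[1])

# Rank-R tensor: a sum of R rank-1 tensors, with the factor vectors
# stored column-wise in matrices A1, A2, A3 (CP decomposition form).
R = 3
A1, A2, A3 = (rng.standard_normal((d, R)) for d in (4, 5, 6))
X = sum(np.einsum('i,j,k->ijk', A1[:, r], A2[:, r], A3[:, r]) for r in range(R))
assert X.shape == (4, 5, 6)
```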

III. TLMS REVIEW
Before presenting the extensions proposed in this paper, this section reviews the TLMS approach presented in [21] and depicted in Fig. 2 in order to introduce the most important notations. As the name TLMS implies, this adaptive filter consists of a tensor followed by an LMS filter, and is hence suitable for Hammerstein-type problems (i.e., a nonlinearity before a linear block). The overall output of this system is denoted by ŷ_n and allows the joint cost function to be expressed as the squared error between the desired TLMS output and ŷ_n, where z_n denotes the output of the tapped delay line (TDL) block [21].
In order to derive an update for the coefficients A_{k'} of the tensor, the gradient of the cost function is approximated by a term G_{TLMS,k',n} built from the TDL output z_n [21]. This approximation (i.e., for A_{k'}, the time dependency is omitted) is necessary to be able to take the derivative with respect to A_{k'} [21], as can also be seen in [23]. The tensor update is then given by a gradient step with step-size μ_Ten, which is evaluated for k' = 1, ..., N, and where 0 denotes a matrix of appropriate dimensions with all elements equal to zero [21].
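Since the linear part of the TLMS is a standard LMS filter driven by the TDL vector, its half of the update can be sketched as follows. This is a generic normalized LMS step under our own naming conventions (the regularizer eps and all variable names are illustrative, not the exact equations of [21]):

```python
import numpy as np

def nlms_step(w, z, d, mu=0.5, eps=1e-12):
    """One normalized LMS step: weights w, TDL input z, desired sample d."""
    e = d - w @ z                            # a-priori error
    w_new = w + mu * e * z / (z @ z + eps)   # normalized gradient step
    return w_new, e

# Toy usage: identify a short FIR system sample by sample (noiseless).
rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.1])               # "unknown" system
w = np.zeros(3)
x = rng.standard_normal(2000)
for n in range(3, len(x)):
    z = x[n:n-3:-1]                          # tapped delay line, newest first
    w, _ = nlms_step(w, z, h @ z)
assert np.allclose(w, h, atol=1e-3)
```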

IV. COMPLEX-VALUED TLMS
In order to deal with complex-valued Hammerstein models (i.e., a nonlinearity followed by a linear filter), we propose three architectures based on the TLMS from [21]. For all architectures, the (complex-valued) input is the same, whereby the real and imaginary parts of this signal are first stacked on top of each other (cf. Fig. 3). This stacked vector is then discretized according to (6) and serves as the input to a two-dimensional TDL (that is, two TDLs working on the rows of a matrix). The resulting output of the TDL then serves as the input for the tensor(s). After this block, the three architectures differ in their operations, which is described in detail in the following.
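The stacking and two-row TDL step described above can be sketched in NumPy as follows (the TDL length M and the sample values are placeholders, not taken from the paper's setup):

```python
import numpy as np

M = 4                                               # TDL length (illustrative)
x = np.array([1+2j, 3-1j, 0.5+0.5j, -1+1j, 2+0j])   # complex input samples

# Stack real and imaginary parts on top of each other ...
stacked = np.vstack([x.real, x.imag])               # shape (2, len(x))

# ... and run two TDLs over the rows: the newest M samples, newest first.
tdl = stacked[:, :-M-1:-1]
assert tdl.shape == (2, M)
assert tdl[0, 0] == x[-1].real and tdl[1, 0] == x[-1].imag
```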
Starting from the standard TLMS architecture, the first, obvious choice (denoted as TLMS-2R) is depicted in Fig. 3a and simply uses two realizations of the same architecture for the real and imaginary paths (denoted by the superscripts R and C), respectively. This straightforward concept results in essentially the same equations as in the simple TLMS case [21]. However, this approach lacks the ability to exploit connections between the real and imaginary parts of the system, as they appear in complex multiplications.
To alleviate this drawback, the second architecture (TTLMS) utilizes a complex-valued LMS (CLMS) for the linear part of the system and two tensors for the real and imaginary parts of the nonlinear part (cf. Fig. 3b). This approach enables the CLMS to leverage the interplay of the real and imaginary parts of the complex signal, while the update of the tensors still requires only small adaptations compared to Sec. III, as detailed in the following.
By re-defining the cost function in terms of the complex-valued error, the update equation for the normalized CLMS follows directly. The two tensors are updated via gradient steps evaluated for all k' ∈ [1, N]. While this representation may reduce the repetition of blocks compared to the first case, the tensors are still not able to make full use of the complex gradient.

The final architecture (CTLMS), shown in Fig. 3c, reduces to (mostly) the same architecture as shown in Fig. 2. The difference, however, is that the input signal is split into its real and imaginary parts, and the LMS as well as the tensor are now fully complex-valued in their outputs. This, of course, necessitates deriving new update equations for the tensor modeling the nonlinearity. This can be achieved by utilizing Wirtinger's calculus [24] and applying the complex chain rule to the cost function. The resulting update for the complex-valued tensor is evaluated for all k' ∈ [1, N], with S_{k',n} according to (13). The CLMS weight update stays the same as in (16). This change now fully supports the complex domain without having to repeat filters (i.e., two tensor or LMS blocks).
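The complex-valued gradient steps above follow the usual Wirtinger-calculus pattern, which for the CLMS part can be sketched as follows. This is a generic normalized complex LMS step using the output convention y = w^T z; the names and the regularizer eps are our own assumptions, not the exact equations of [21]:

```python
import numpy as np

def clms_step(w, z, d, mu=0.5, eps=1e-12):
    """One normalized complex LMS step with output convention y = w @ z.

    Wirtinger calculus on J = |e|^2 gives dJ/dw* = -e * conj(z), so the
    (normalized) descent step is w += mu * e * conj(z) / ||z||^2.
    """
    e = d - w @ z
    w_new = w + mu * e * np.conj(z) / (np.real(z @ np.conj(z)) + eps)
    return w_new, e

# Toy usage: identify a complex-valued FIR system (noiseless).
rng = np.random.default_rng(2)
h = np.array([0.5 - 0.2j, 0.1 + 0.4j, -0.3 + 0.0j])  # "unknown" system
w = np.zeros(3, dtype=complex)
x = rng.standard_normal(2000) + 1j * rng.standard_normal(2000)
for n in range(3, len(x)):
    z = x[n:n-3:-1]
    w, _ = clms_step(w, z, h @ z)
assert np.allclose(w, h, atol=1e-3)
```

Note the conjugate on z: unlike the real-valued case, dropping it would prevent the filter from converging, which is exactly the cross-coupling the TLMS-2R architecture cannot exploit.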
In terms of normalization, the first two architectures utilize the same update for μ_Ten as presented in [21]; the normalization of the CLMS is straightforward as well and works as presented in (16) for both TTLMS and CTLMS. In order to normalize the complex-valued tensor in the CTLMS architecture, the same principle as for a real-valued one is used: the error at the next time step is approximated via a first-order complex-valued Taylor expansion [25], whose correction term follows directly from (20). To maintain convergence of the algorithm [26], the norm of the error e_{n+1} has to be smaller than or equal to the norm of the right-hand side of equation (25). Solving this condition for μ_Ten, the normalization can be introduced by replacing μ_Ten in (20) by its normalized counterpart.

V. COMPLEXITY ANALYSIS

The computational complexity in terms of additions, multiplications and divisions for all architectures is given in Table I. The complexity is lowest for the first architecture, which simply repeats the tensor and LMS blocks for both paths, and highest for the fully complex-valued architecture. However, it is important to note that only the fully complex implementation is able to leverage the full information present in the real and imaginary parts of all signals, while the other two architectures cannot.

VI. SIMULATIONS
To evaluate the proposed models for their performance on a complex-valued system identification example, we chose the well-known case of transmitter (Tx) induced harmonics, which can occur in 4G/5G cellular transceivers in the case of downlink carrier aggregation coupled with a non-ideal Tx power amplifier (PA). For more details on the exact signal model, the reader is referred to [27]. Additionally, this model simulates the saturation behavior of the PA, which might occur if the Tx signal power is close to the limit of the PA's dynamic range [21]. The interference signal we want to estimate is therefore h_Dup^T z_n plus a noise term v_n, where h_Dup constitutes the complex-valued stop-band frequency response of a linear filter (the so-called duplexer), z_n are the complex-valued transmit samples after the PA, and v_n is modeled as colored noise, i.e., v_n = α v_{n−1} + √(1 − α²) w_n, with w_n denoting complex-valued white Gaussian noise.

The evaluation metric used is the mean squared error (MSE), defined as MSE_dB(n) = 10 log_10( (1/P) Σ_{p=1}^{P} |d(n) − ŷ_p(n)|² ), where d(n) is the desired signal, ŷ_p(n) is the estimate at time n of the p-th run, and P is the total number of runs.

For the simulations we chose a filter order of M = 16; the memory, i.e., dimensionality N, of all tensors is two (one dimension each for the real and imaginary parts of the input signal); the rank of all tensors has been chosen empirically and is set to R = 10; and the length of the (C)LMS filters has been chosen to coincide with M. The step-sizes for the tensors are μ_Ten = 0.009, μ_Ten = 0.009, μ_Ten = 0.075, and for the (C)LMS μ_LMS = 0.009, μ_LMS = 0.005, μ_LMS = 0.009, for the first, second and third architectures shown in Fig. 3, respectively, and all regularization parameters have been set to 10^{−12}. Lastly, the signal of interest resides 10 dB above the noise.

The comparison of all three proposed architectures is shown in Fig. 4, where the simulation was repeated and averaged over P = 20 different real-life duplexer fittings.
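The MSE metric defined above, averaged over the P independent runs, can be computed directly from its definition (array shapes and names here are our own assumptions):

```python
import numpy as np

def mse_db(desired, estimates):
    """MSE in dB per time step.

    desired:   shape (K,)   -- desired signal d(n)
    estimates: shape (P, K) -- estimate of the p-th run at each time n
    """
    err = desired[None, :] - estimates
    return 10 * np.log10(np.mean(np.abs(err) ** 2, axis=0))

# Example: two runs whose squared error is 0.01 everywhere -> -20 dB.
d = np.ones(5)
est = np.vstack([d + 0.1, d - 0.1])
assert np.allclose(mse_db(d, est), -20.0)
```

Using np.abs keeps the same definition valid for the complex-valued signals considered here.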
It can be seen that the first architecture, which just repeats the processing pipeline of the original real-valued algorithm twice, performs the worst. Using a complex-valued LMS filter with two tensors already drastically improves performance, and as expected, the fully complex-valued architecture yields the best overall performance.

VII. CONCLUSION
In this paper we extended current state-of-the-art architectures for system identification via a joint tensor-LMS based framework to complex-valued models. We proposed three different architectures that comply with complex-valued models. The first architecture simply repeats the estimation blocks (tensor and LMS) for the real and imaginary paths. While this is the most straightforward approach, it yields poor performance as the two paths cannot interact with each other. To mitigate this problem for the linear subsystem, the LMS block is replaced with a CLMS filter in our second architecture, which showed moderate improvements compared to the previous case. To fully leverage the complex-valued approach, we finally proposed an architecture that models all sub-systems in a complex manner, i.e., via a complex-valued tensor and a CLMS filter. This final solution significantly outperforms both other architectures in our considered application.