A Block-Based LMS Using the Walsh Transform for Digital Predistortion of Power Amplifiers

A novel non-linear adaptive filter for the linearization of Radio Frequency (RF) Power Amplifiers (PAs) is presented. In this study, we aim to reduce the Digital Predistortion (DPD) complexity and to enhance its convergence speed, thereby reducing computation time. The Walsh Transform is used as a computational basis for evaluating a predistorter (PD) model, and the mathematical properties of Walsh theory are exploited to adapt a memory polynomial (MP) in the sequency domain. A block-based Walsh LMS is introduced to seek the optimal PD coefficients. Simulation and linearization results for class-AB PAs are presented. The comparison with conventional DPD algorithms shows that the proposed method converges 10 times faster with 12% lower complexity, for similar accuracy. Finally, a complete DPD architecture based on the Walsh Transform is proposed.


NOTATION AND DEFINITION
Scalars are in italics and vectors or matrices are in bold; for example, the vector $\mathbf{x}$ gathers the scalar samples $x[1], x[2], \ldots$
For complex data, the transpose $(\cdot)^T$ must be replaced by the Hermitian (complex conjugate) transpose $(\cdot)^H$.
Uppercase variables correspond to the Fast Walsh Transform (FWT) of lowercase variables (e.g., $\mathbf{X}$ is the FWT of $\mathbf{x}$). The logical convolution, also known as dyadic convolution, is denoted by the symbol ⊛.

I. INTRODUCTION
Digital communication techniques (OFDM, carrier aggregation) considerably increase the spectral efficiency of communications to meet their exponential growth. However, these techniques generate signals with a high Peak-to-Average Power Ratio (PAPR). Such signals are very sensitive to the non-linearities and memory effects introduced by the Power Amplifier (PA) [1], [2]. These effects distort the amplitude and phase of the useful signal. This leads to asymmetric spectral regrowth in adjacent communication channels [3], [4], [5], [6], [7] and to degradation of the error vector magnitude (EVM) [8]. Linearization techniques minimize these distortions and allow the PA to operate near its saturation point, which improves the linearity versus power-added efficiency (PAE) trade-off [9], [10]. DPD techniques have been investigated over the past decades as powerful linearization solutions. These methods use a predistorter (PD) to invert the amplitude, phase, and memory-effect non-linearities of the PA. Depending on the position of the PD, DPD can operate in baseband (BB), as shown in Fig. 1, at intermediate frequencies (IF), or at radio frequencies (RF) [11].
The most widely adopted method is BB predistortion. It operates at the lowest frequency compared to IF and RF and can be easily implemented on a DSP or an FPGA. In the literature, functions derived from the Volterra series [12], [13] are conventionally used to model the dynamic behavior of the PA as well as its inverse characteristic. These include the memory polynomial (MP) [14], [15], the generalized memory polynomial (GMP) [16], Hammerstein and Wiener models [17], and neural networks [18]. However, the number of required coefficients grows as the complexity of the models increases to reach mandatory EVM and adjacent channel power ratio (ACPR) levels. As a result, computing the DPD becomes time-consuming. Research has focused on improving PD mathematical models [19], [20], [21] to reduce the number of terms, and hence the computation time, without losing accuracy.

Another way to reduce the processing time is to improve the adaptive filters that compute the PD terms, such as Least Mean Square (LMS) algorithms [15], [22] or Recursive Least Square (RLS) algorithms [23], [24], by lowering their complexity and enhancing their convergence speed. These methods process the data sample by sample (a scalar approach). Other comparable algorithms process blocks of data (a vector approach) [25] to enhance their performance, at the expense of hardware resources [26]. To address this concern, researchers have proposed the use of transforms such as the Fourier Transform [25], [27]. In this paper, we use a block LMS associated with the Walsh Transform [28] to compute the coefficients of linear and non-linear adaptive filters. The resulting method converges 10 times faster and has 12% lower complexity than conventional scalar LMS approaches. An MP PD computed with the proposed method also achieves linearization performance similar to these methods.

One potential application of the proposed method is the implementation of DPD in energy-efficient devices such as smartphones. The method enables the rapid development of a straightforward MP PD model, which may not deliver the best DPD performance. Nevertheless, this model is effective in meeting spectral emission regulations while keeping costs low, as its computation does not require an excessive amount of energy.
The paper is organized as follows. Section II introduces a block-based LMS using the Walsh Transform. Section III presents simulations and performance results of the proposed adaptive filter. In Section IV, experimental results are presented as a proof of concept (PoC). Section V presents a complete DPD architecture based on the Walsh Transform. Finally, conclusions and future work are stated in Section VI.

II. THE WALSH-BASED BLOCK LMS
A. The Walsh Transform
The Walsh Transform and the Fourier Transform, among other transformations, are possible bases for the adaptive filtering process. The Walsh functions, gathered in the matrix $\mathbf{W}$, are an ordered set of rectangular waveforms that only take the values +1 and −1. Figure 2 displays the first 8 Walsh sequences. The binary nature of the sequences makes them very suitable for digital implementation: the transform only requires additions and subtractions, with no multiplications by complex exponentials. Therefore, the discrete Walsh Transform (DWT) described by Eq. 1 can be computed much faster than the discrete Fourier Transform (DFT).
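The following Python/NumPy sketch of a radix-2 fast Walsh-Hadamard transform makes the cost argument concrete. It assumes the Hadamard (natural) ordering and an orthonormal 1/√N scaling. The names `fwht` and `hadamard` are ours, not the paper's. Each stage only adds and subtracts pairs of samples, so an N-point transform costs N log2 N additions/subtractions plus one final scaling.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform, Hadamard (natural) order.

    The butterflies only add and subtract pairs of samples (N*log2(N)
    operations); the single 1/sqrt(N) scaling makes the transform self-inverse.
    """
    a = np.array(x, dtype=complex)            # working copy, real or complex input
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            top = a[start:start + h].copy()
            bottom = a[start + h:start + 2 * h].copy()
            a[start:start + h] = top + bottom          # butterfly sum
            a[start + h:start + 2 * h] = top - bottom  # butterfly difference
        h *= 2
    return a / np.sqrt(n)


def hadamard(n):
    """Sylvester Hadamard matrix with +1/-1 entries (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


x = np.arange(8.0)
X = fwht(x)
print(np.allclose(X, hadamard(8) @ x / np.sqrt(8)))  # True: same as the matrix form
print(np.allclose(fwht(X), x))                        # True: the transform is its own inverse
```

The in-place butterfly avoids building the N × N matrix. The matrix form is kept only as a reference for checking.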
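The Walsh Transform decomposes a discrete-time signal into a discrete-sequency representation. The set of Walsh coefficients $X[i]$ represents the sequency spectrum [29] of $x[n]$, in the same sense that a set of Fourier coefficients represents a frequency spectrum. The sequency-domain convolution, also known as dyadic convolution [29], is described by Eq. 2.1:
$$(X \circledast Y)[i] = \sum_{j=0}^{N-1} X[j]\,Y[j \oplus i] \quad (2.1)$$
with $j \oplus i$ the dyadic sum of the integers $j$ and $i$. It is the bitwise exclusive OR of their binary expansions $j = \sum_k j_k 2^k$ and $i = \sum_k i_k 2^k$:
$$j \oplus i = \sum_{k} |j_k - i_k|\,2^k \quad (2.2)$$
The Walsh Transform also has properties analogous to those of the Fourier Transform. A dyadic convolution in the Walsh domain is equivalent, up to a normalization factor, to a point-wise multiplication in the time domain.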
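The dyadic convolution property can be checked numerically. This sketch assumes Hadamard-ordered, orthonormal Walsh matrices. With these, the Walsh transform of a point-wise product equals the dyadic convolution (Eq. 2.1) of the two transforms scaled by 1/√N. The helper names are illustrative.

```python
import numpy as np


def hadamard(n):
    """Sylvester Hadamard matrix (Hadamard/natural order, +1/-1 entries)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


def dyadic_convolution(X, Y):
    """(X ⊛ Y)[i] = sum_j X[j] * Y[j XOR i], with the dyadic sum j ⊕ i = j ^ i."""
    n = len(X)
    return np.array([sum(X[j] * Y[j ^ i] for j in range(n)) for i in range(n)])


n = 16
rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
W = hadamard(n) / np.sqrt(n)            # orthonormal: W = W.T = inv(W)
X, Y = W @ x, W @ y

lhs = W @ (x * y)                        # Walsh transform of the point-wise product
rhs = dyadic_convolution(X, Y) / np.sqrt(n)
print(np.allclose(lhs, rhs))             # True
```

The 1/√N factor comes from the orthonormal scaling. With an unnormalized ±1 matrix, the factor changes accordingly.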
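An N-point DWT can also be written as a matrix multiplication, which an N-point Fast Walsh Transform (FWT) evaluates efficiently. As an example, for N = 4:
$$\mathbf{X} = \mathbf{W}_4\mathbf{x}, \qquad \mathbf{W}_4 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1 \end{pmatrix}$$
The Walsh matrix is symmetric, hence $\mathbf{W} = \mathbf{W}^T$. Moreover, as the Walsh sequences constitute an orthonormal basis, $\mathbf{W}\mathbf{W}^T = \mathbf{I}$. Therefore, $\mathbf{W} = \mathbf{W}^T = \mathbf{W}^{-1}$. A Walsh power spectrum can also be computed. The Walsh power spectrum $\mathbf{P}$ of a signal x can be obtained by averaging the energy spectra of A windows:
$$P[i] = \frac{1}{A}\sum_{a=1}^{A} \big|X_a[i]\big|^2$$
where $X_a$ is the DWT of the a-th window. The underlying relation is known as the sequency Wiener-Khintchine theorem [29]. Comparisons of Fourier power spectra and Walsh power spectra are illustrated in the following example.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.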
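Below is a minimal sketch of the sequency-ordered Walsh matrix and of the windowed Walsh power spectrum. It assumes orthonormal scaling and non-overlapping windows; both are our choices. Sorting the rows of a Sylvester Hadamard matrix by their number of sign changes gives the sequency order. For N = 4, this yields the rows ++++, ++--, +--+, +-+-. The function names are hypothetical.

```python
import numpy as np


def walsh_matrix(n):
    """Orthonormal Walsh matrix in sequency order: row i has i sign changes."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    sign_changes = np.count_nonzero(np.diff(H, axis=1), axis=1)
    return H[np.argsort(sign_changes)] / np.sqrt(n)


def walsh_power_spectrum(x, n, a):
    """Average the Walsh energy spectra of `a` consecutive length-n windows of x."""
    windows = np.reshape(np.asarray(x)[: n * a], (a, n))
    X = windows @ walsh_matrix(n)        # row r is the DWT of window r (W is symmetric)
    return np.mean(np.abs(X) ** 2, axis=0)


print(2 * walsh_matrix(4))               # rows ++++, ++--, +--+, +-+-
W = walsh_matrix(16)
print(np.allclose(W, W.T), np.allclose(W @ W.T, np.eye(16)))  # True True
```

A constant signal puts all of its energy in sequency bin 0, which gives a quick sanity check for `walsh_power_spectrum`.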
The input signal x is composed of two tones at frequency bins $f_b[1] = 15$ and $f_b[2] = 19$. A non-linear function (representing a PA) is applied to the signal. The Fourier power spectra of x and y are shown in Fig. 3(a). The non-linearities of the PA create intermodulation (IM) products in y at additional frequency bins. Examples are the third-order products at bins $2f_b[1] - f_b[2] = 11$ and $2f_b[2] - f_b[1] = 23$. Similarly, as illustrated in Fig. 3(b), the Walsh power spectra also display intermodulation products at several sequency bins. From this result, one can extract the non-linearities of a system to feed a non-linear adaptive filtering method based on the Walsh Transform, as described in the following subsection.
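The Fourier-side IM positions can be checked with a toy model. The memoryless cubic non-linearity `x - 0.1*x*|x|**2` below is a hypothetical stand-in for the PA, not the model behind Fig. 3. Driven by complex baseband tones at bins 15 and 19, it produces new components only at the third-order IM bins 11 and 23.

```python
import numpy as np

n_fft = 64
n = np.arange(n_fft)
x = np.exp(2j * np.pi * 15 * n / n_fft) + np.exp(2j * np.pi * 19 * n / n_fft)
y = x - 0.1 * x * np.abs(x) ** 2          # hypothetical memoryless 3rd-order PA

X = np.fft.fft(x) / n_fft
Y = np.fft.fft(y) / n_fft
print(np.flatnonzero(np.abs(X) > 1e-9))   # [15 19]
print(np.flatnonzero(np.abs(Y) > 1e-9))   # [11 15 19 23]: IM3 at 2*15-19 and 2*19-15
```

Expanding x·|x|² by hand gives 3e₁₅ + 3e₁₉ + e₁₁ + e₂₃, where e_k denotes the tone at bin k. This is why the tones shrink to 0.7 and the IM products appear with magnitude 0.1.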

B. The Walsh Block LMS (WLMS)
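The WLMS algorithm combines the convergence properties of an overlap-save block-based LMS algorithm with the Walsh Transform to reduce its computational complexity. The block LMS algorithm, fully described in Appendix A-D, consists of the following set of equations:
$$\mathbf{y}_j = \boldsymbol{\chi}_j\mathbf{c}_j \quad (7.1)$$
$$\mathbf{e}_j = \mathbf{f}(\mathbf{d}_j - \mathbf{y}_j) \quad (7.2)$$
$$\mathbf{c}_{j+1} = \mathbf{c}_j + 2\mu\boldsymbol{\chi}_j^T\mathbf{e}_j \quad (7.3)$$
with $\boldsymbol{\chi}_j$ the circulant matrix of the input data set $\mathbf{x}_j$, $\mathbf{d}_j$ the desired output of the model, $\mathbf{e}_j$ the error vector, $\mathbf{c}_j$ the coefficients of the model, and $\mathbf{f}$ a diagonal window matrix. The following change of basis is applied to bring $\boldsymbol{\chi}_j$ into the Walsh domain:
$$\mathbf{X}_j = \mathbf{W}_{2N}\boldsymbol{\chi}_j\mathbf{W}_{2N}^{-1} = \mathbf{W}_{2N}\boldsymbol{\chi}_j\mathbf{W}_{2N} \quad (8)$$
where $\mathbf{W}_{2N}$ is the symmetric matrix that applies a 2N-point DWT. Since $\boldsymbol{\chi}_j$ is circulant, it has been shown in [30] that $\mathbf{X}_j$ is diagonal by parts and antisymmetric. The structure of $\mathbf{X}_j$ is shown in Fig. 4(a) for 2N = 8. Depending on the application, the Walsh sequences forming $\mathbf{W}_{2N}$ can be arranged in the so-called Hadamard order [31]. Figure 4(d) illustrates the structure of the corresponding Hadamard matrix $\mathbf{H}_{2N}$. Using the Hadamard matrix in Eq. 8, $\mathbf{X}_j$ becomes block-diagonal and antisymmetric [30].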
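This block-diagonal structure has diagonal blocks of sizes 1, 1, 2, 4, ..., N. From it, the number of non-zero coefficients $N_Z$ of $\mathbf{X}_j$ can be deduced. It is defined in Eq. 9:
$$N_Z = 1 + \sum_{k=0}^{\log_2 N} 4^{k} = \frac{4N^2 + 2}{3} \quad (9)$$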
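The block-diagonal structure can be verified numerically. This sketch assumes a Hadamard-ordered, orthonormal W and a random real circulant χ. The helper names are illustrative. It checks that every entry outside the blocks of sizes 1, 1, 2, 4, ..., N vanishes. It also checks that these blocks hold (4N² + 2)/3 entries. For Q = 7 and N = 8, the product 2·Q·N_Z gives the 1204 multiplications quoted in Section III-B.

```python
import numpy as np


def hadamard(n):
    """Sylvester Hadamard matrix (Hadamard order), +1/-1 entries."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


def circulant(v):
    """Circulant matrix with first column v: C[i, k] = v[(i - k) % len(v)]."""
    return np.column_stack([np.roll(v, k) for k in range(len(v))])


def block_mask(two_n):
    """True on the diagonal blocks of sizes 1, 1, 2, 4, ..., two_n / 2."""
    mask = np.zeros((two_n, two_n), dtype=bool)
    mask[0, 0] = True
    size = 1
    while size < two_n:
        mask[size:2 * size, size:2 * size] = True
        size *= 2
    return mask


def n_z(n):
    """Entries inside the blocks of a 2N x 2N matrix: 1 + sum_k 4**k = (4N^2 + 2) / 3."""
    return (4 * n * n + 2) // 3


rng = np.random.default_rng(1)
two_n = 16
W = hadamard(two_n) / np.sqrt(two_n)
X = W @ circulant(rng.standard_normal(two_n)) @ W   # Eq. 8 with the Hadamard order
mask = block_mask(two_n)
print(np.allclose(X[~mask], 0))                      # True: off-block entries vanish
print(mask.sum(), n_z(two_n // 2))                   # 86 86
print(2 * 7 * n_z(8))                                # 1204
```

The zero pattern follows from shift invariance. A circulant matrix maps period-2^b vectors to period-2^b vectors. In Hadamard order, these vectors are spanned by the first 2^b rows, so each dyadic group of rows forms its own block.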
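Let $\mathbf{D}_j$, $\mathbf{Y}_j$, $\mathbf{E}_j$ and $\mathbf{C}_j$ denote the DWTs of $\mathbf{d}_j$, $\mathbf{y}_j$, $\mathbf{e}_j$ and $\mathbf{c}_j$, respectively. For simplicity, the index 2N of $\mathbf{W}_{2N}$ is dropped in the following equations. The output of the filter in the WLMS is:
$$\mathbf{Y}_j = \mathbf{W}\boldsymbol{\chi}_j\mathbf{W}\,\mathbf{W}\mathbf{c}_j = \mathbf{X}_j\mathbf{C}_j \quad (10)$$
Since $\mathbf{X}_j$ is block-diagonal and antisymmetric, the number of multiplications $N_M$ needed to evaluate Eq. 10 is equal to $N_Z$, and the number of additions $N_A$ is given by Eq. 11. The error vector in the Walsh domain is:
$$\mathbf{E}_j = \mathbf{W}\mathbf{e}_j = \mathbf{W}\mathbf{f}(\mathbf{d}_j - \mathbf{y}_j) \quad (12.1)$$
By factoring by $\mathbf{W}^{-1}$:
$$\mathbf{E}_j = \mathbf{W}\mathbf{f}\mathbf{W}^{-1}\,\mathbf{W}(\mathbf{d}_j - \mathbf{y}_j) = \mathbf{F}(\mathbf{D}_j - \mathbf{Y}_j) \quad (12.2)$$
with $\mathbf{F} = \mathbf{W}\mathbf{f}\mathbf{W}^{-1} = \mathbf{W}\mathbf{f}\mathbf{W}$. According to the gradient method, the update equation of the filter coefficients is:
$$\mathbf{C}_{j+1} = \mathbf{C}_j - \mu_W\nabla\mathbf{C}_j \quad (13.1)$$
with $\nabla\mathbf{C}_j$ the estimated gradient of the squared error in the Walsh domain. The squared error is defined as:
$$\xi_j = \mathbf{E}_j^T\mathbf{E}_j \quad (13.2)$$
Since $\mathbf{f}$ is diagonal and $\mathbf{W}$ symmetric, $\mathbf{F}^T = \mathbf{F}$, therefore:
$$\xi_j = (\mathbf{D}_j - \mathbf{Y}_j)^T\mathbf{F}\mathbf{F}(\mathbf{D}_j - \mathbf{Y}_j) \quad (13.3)$$
Moreover, since $\mathbf{f}$ is diagonal, with its bottom-right N × N corner equal to $\mathbf{I}$ and zeros elsewhere, $\mathbf{f}\mathbf{f} = \mathbf{f}$. Therefore, using $\mathbf{W}\mathbf{W} = \mathbf{I}$:
$$\mathbf{F}\mathbf{F} = \mathbf{W}\mathbf{f}\mathbf{W}\mathbf{W}\mathbf{f}\mathbf{W} = \mathbf{W}\mathbf{f}\mathbf{W} = \mathbf{F} \quad (13.4)$$
Equation 13.2 can thus be restated as:
$$\xi_j = (\mathbf{D}_j - \mathbf{X}_j\mathbf{C}_j)^T\mathbf{F}(\mathbf{D}_j - \mathbf{X}_j\mathbf{C}_j) \quad (13.5)$$
We assume that $\mathbf{X}_j$ and $\mathbf{D}_j$ are independent of $\mathbf{C}_j$ and that the system is time-invariant. Taking the partial derivative of the squared error with respect to $\mathbf{C}_j$:
$$\nabla\mathbf{C}_j = \frac{\partial\xi_j}{\partial\mathbf{C}_j} = -2\mathbf{X}_j^T\mathbf{F}(\mathbf{D}_j - \mathbf{X}_j\mathbf{C}_j) = -2\mathbf{X}_j^T\mathbf{E}_j \quad (13.6)$$
Therefore, the update equation of the filter coefficients in the WLMS is:
$$\mathbf{C}_{j+1} = \mathbf{C}_j + 2\mu_W\mathbf{X}_j^T\mathbf{E}_j \quad (13.7)$$
$\mu_W$ plays the same role as µ in the conventional LMS algorithm. Figure 5 shows the WLMS block diagram.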
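The WLMS recursion for a linear FIR can be sketched in a few lines of NumPy. This is a readability-first sketch, not the authors' implementation. It makes three assumptions:

- It uses dense 2N × 2N products, so it does not realize the N_Z-based savings.
- It uses a Hadamard-ordered W.
- It applies the Hermitian transpose for complex data, as the notation section prescribes.

Because W is orthonormal, each iterate C_j equals W c_j. Here c_j is the time-domain block LMS iterate of Appendix A-D (Eq. 34) with µ = µ_W.

```python
import numpy as np


def hadamard(n):
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


def circulant(v):
    """Circulant matrix with first column v: C[i, k] = v[(i - k) % len(v)]."""
    return np.column_stack([np.roll(v, k) for k in range(len(v))])


def wlms_identify(x, d, n, mu_w, n_blocks):
    """Walsh block LMS for a linear FIR of length M <= n (Eqs. 8, 10, 12.2, 13.7).

    Dense products are used for clarity; they ignore the block-diagonal
    sparsity of X_j. Returns the time-domain coefficients c = W C (length 2n).
    """
    two_n = 2 * n
    W = hadamard(two_n) / np.sqrt(two_n)              # orthonormal: W = W.T = inv(W)
    f = np.diag(np.r_[np.zeros(n), np.ones(n)])       # keeps the last N (linear) outputs
    F = W @ f @ W                                      # Walsh-domain window
    C = np.zeros(two_n, dtype=complex)
    for j in range(n_blocks):
        seg = slice(j * n, j * n + two_n)             # 2N samples, N shared with block j-1
        X = W @ circulant(x[seg]) @ W                 # base change, Eq. 8
        Y = X @ C                                     # filter output, Eq. 10
        E = F @ (W @ d[seg] - Y)                      # error, Eq. 12.2
        C = C + 2 * mu_w * X.conj().T @ E             # update, Eq. 13.7 (Hermitian for complex)
    return W @ C


rng = np.random.default_rng(0)
n, m, blocks = 8, 5, 400
h = rng.standard_normal(m) + 1j * rng.standard_normal(m)    # hypothetical FIR to identify
x = (rng.standard_normal((blocks + 1) * n)
     + 1j * rng.standard_normal((blocks + 1) * n)) / np.sqrt(2)
d = np.convolve(x, h)[: x.size]                              # noiseless desired output
c = wlms_identify(x, d, n, mu_w=0.01, n_blocks=blocks)
print(np.round(np.abs(c - np.r_[h, np.zeros(2 * n - m)]).max(), 6))
```

With noiseless data and a small step size, the first M recovered taps approach h and the padded taps stay near zero. In a practical implementation, the FWT and the sparse X_j would replace the dense products.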
In the WLMS, the gradient of the squared error is estimated from a block of N error samples. The scalar time-domain approach uses a single instantaneous value instead. The block estimate is therefore a more accurate representation of the true gradient, which leads to faster convergence of the algorithm [25].
Equations 12.2 and 13.7 perform a linear system identification. However, in the case of DPD, the adaptive filter is non-linear. As explained in the introduction, functions derived from the Volterra series are conventionally used to model a non-linear system. Therefore, for a scalar LMS approach, Eqs. 26.1 to 26.3 in Appendix A-A can be restated for an MP model as:
$$y_k = \mathbf{c}_k^T\mathbf{x}_k = \sum_{q=0}^{Q-1}\sum_{m=0}^{M-1} c_{k,qm}\,x[k-m]\,\big|x[k-m]\big|^{q}$$
$$e_k = d_k - y_k$$
$$\mathbf{c}_{k+1} = \mathbf{c}_k + 2\mu e_k\mathbf{x}_k$$
where Q is the non-linearity order of the system, M the memory depth, and $\mathbf{x}_k$ the state vector of MP input samples, defined as:
$$\mathbf{x}_k = \big(x[k], \ldots, x[k-M+1],\ \ldots,\ x[k]\,|x[k]|^{Q-1}, \ldots, x[k-M+1]\,|x[k-M+1]|^{Q-1}\big)^T$$
To obtain the form of the block LMS needed in the WLMS, the input data set is segmented into Q vectors $\mathbf{x}_{j,q}$, one for each order q. Each vector holds 2N consecutive samples of $x[n]|x[n]|^q$, and consecutive vectors overlap on the last N points at every iteration j. M is also assumed to be less than or equal to N. If M is not a power of two, N is the nearest power of 2 above M.
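Therefore, the adaptive filter $\mathbf{c}_j$ is also segmented into Q sub-adaptive filters of length 2N, each with its last 2N − M values set to 0:
$$\mathbf{c}_{j,q}^T = \big(c_{j,q0}, c_{j,q1}, \ldots, c_{j,q(M-1)}, 0, \ldots, 0\big) \quad (16)$$
The final output vector is the sum of the circular convolutions between each $\mathbf{x}_{j,q}$ and $\mathbf{c}_{j,q}$, with $\boldsymbol{\chi}_{j,q}$ the circulant matrix of $\mathbf{x}_{j,q}$:
$$\mathbf{y}_j = \sum_{q=0}^{Q-1}\boldsymbol{\chi}_{j,q}\mathbf{c}_{j,q} \quad (17)$$
The last N points of $\mathbf{y}_j$ are the result of the linear convolution. The error vector is the same as in Eq. 33, and the update of the coefficients is defined by:
$$\mathbf{c}_{j+1,q} = \mathbf{c}_{j,q} + 2\mu\boldsymbol{\chi}_{j,q}^T\mathbf{e}_j \quad (18)$$
The WLMS equations for the identification of an MP model are derived by running Eqs. 16 to 18 through the same process as described above. The base change is:
$$\mathbf{X}_{j,q} = \mathbf{W}\boldsymbol{\chi}_{j,q}\mathbf{W} \quad (19)$$
The model adaptation is:
$$\mathbf{Y}_j = \sum_{q=0}^{Q-1}\mathbf{X}_{j,q}\mathbf{C}_{j,q} \quad (20)$$
The error equation is the same as Eq. 12.2, and the update of each sub-filter's coefficients is given by Eq. 21:
$$\mathbf{C}_{j+1,q} = \mathbf{C}_{j,q} + 2\mu_W\mathbf{X}_{j,q}^T\mathbf{E}_j \quad (21)$$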
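The MP extension (Eqs. 16 to 21) can be sketched the same way. The basis x[n]|x[n]|^q with q = 0, ..., Q−1 is our assumption about the MP form, and the functions are illustrative. The example checks two properties stated above on a single block:

- The last N points of the Walsh-domain output, brought back to the time domain, equal the linear MP output.
- An exact model yields a zero error, and hence no update.

```python
import numpy as np


def hadamard(n):
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


def circulant(v):
    return np.column_stack([np.roll(v, k) for k in range(len(v))])


def mp_basis(x, q_order):
    """Rows x_q[n] = x[n] * |x[n]|**q, q = 0..Q-1 (assumed MP basis functions)."""
    return np.stack([x * np.abs(x) ** q for q in range(q_order)])


def mp_direct(x, a):
    """Time-domain MP output y[n] = sum_q sum_m a[q, m] x_q[n - m], zero initial state."""
    basis = mp_basis(x, a.shape[0])
    return sum(np.convolve(basis[q], a[q])[: x.size] for q in range(a.shape[0]))


def wlms_mp_step(x_seg, d_seg, C, mu_w):
    """One WLMS iteration for an MP model (Eqs. 19, 20, 12.2 and 21).

    x_seg, d_seg: 2N input / desired samples; C: (Q, 2N) Walsh-domain sub-filters.
    Returns the updated sub-filters and the Walsh-domain output Y_j.
    """
    two_n = x_seg.size
    n = two_n // 2
    W = hadamard(two_n) / np.sqrt(two_n)
    F = W @ np.diag(np.r_[np.zeros(n), np.ones(n)]) @ W
    Xq = [W @ circulant(b) @ W for b in mp_basis(x_seg, C.shape[0])]  # Eq. 19
    Y = sum(Xq[q] @ C[q] for q in range(C.shape[0]))                  # Eq. 20
    E = F @ (W @ d_seg - Y)                                            # Eq. 12.2
    C_new = np.stack([C[q] + 2 * mu_w * Xq[q].conj().T @ E            # Eq. 21
                      for q in range(C.shape[0])])
    return C_new, Y


rng = np.random.default_rng(2)
Q, M, N = 3, 4, 8
a = rng.standard_normal((Q, M)) + 1j * rng.standard_normal((Q, M))  # hypothetical MP taps
x = 0.5 * (rng.standard_normal(2 * N) + 1j * rng.standard_normal(2 * N))
d = mp_direct(x, a)
W = hadamard(2 * N) / np.sqrt(2 * N)
c_time = np.zeros((Q, 2 * N), dtype=complex)
c_time[:, :M] = a                        # Eq. 16: M taps, then 2N - M zeros
C = c_time @ W                           # row q is W c_q (W is symmetric)
C_next, Y = wlms_mp_step(x, d, C, mu_w=0.01)
print(np.allclose((W @ Y)[N:], d[N:]))   # True: last N points are the linear MP output
print(np.allclose(C_next, C))            # True: zero error, so no update
```

Starting instead from C = 0, a single step equals W times the time-domain update of Eq. 18. This is the same orthonormality argument as in the linear case.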

III. SIMULATION RESULTS AND COMPARISON WITH OTHER TECHNIQUES
This section presents a comparison between the WLMS algorithm and conventional algorithms: LMS, Normalized LMS (NLMS), and Recursive Least Square (RLS).For each algorithm, the study assesses various factors such as convergence speed, accuracy, sensitivity to noise, and complexity.Additional details on the conventional methods can be found in Appendix A.

A. Convergence Speed, Accuracy, and Noise Simulations
First, the accuracy and convergence speed are evaluated through the Normalized Mean Square Error (NMSE), given by Eq. 22.
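A common NMSE definition compares the residual energy to the reference energy, in dB. It is assumed here to match Eq. 22, and the function name is ours.

```python
import numpy as np


def nmse_db(d, y):
    """NMSE in dB between a reference d and an estimate y (common definition, assumed)."""
    d = np.asarray(d)
    y = np.asarray(y)
    return 10 * np.log10(np.sum(np.abs(d - y) ** 2) / np.sum(np.abs(d) ** 2))


print(round(nmse_db([1, 1, 1, 1], [1.1, 0.9, 1.1, 0.9]), 6))   # -20.0
```

A 10% error on every sample gives an energy ratio of 0.01, hence −20 dB. The −40 dB target used below corresponds to a 1% amplitude error.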
The system to be identified is an FIR filter of length N = 32. The input is a complex random data set drawn from the standard normal distribution and quantized on 10 bits. For each algorithm, the convergence constant was chosen to achieve the fastest convergence rate without causing the algorithm to diverge. The NMSE evolution of each algorithm was evaluated over J = 1000 iterations. This experiment was repeated 100 times to evaluate the average behavior of each algorithm. Figure 6 displays the simulation results.
The results show that the RLS and WLMS algorithms converge much faster than the LMS and NLMS algorithms. To reach an NMSE of -40dB, RLS and WLMS required only 150 and 160 iterations, respectively, while LMS and NLMS took 400 and 600 iterations, respectively. Furthermore, the minimum NMSE floor, which depends on noise and quantization [32], [33], is lower for the WLMS than for the LMS, NLMS, and RLS. The reason is that the WLMS estimates the error gradient as an average over the data rather than from an instantaneous value, which effectively reduces the influence of noise. Consequently, the NMSE fluctuations are significantly smaller in the WLMS than in the other methods.
While the first study focused on the identification of a linear system, a second study was conducted to benchmark the algorithms for the estimation of MP models used to linearize a PA. The simulation used an NXP Airfast LDMOS Doherty PA. It operates at 3.6-3.8GHz, has a 29dB gain and an output compression point of 47dBm, and exhibits inherent non-linearities and memory effects due to its Doherty topology. The PA model is available from MathWorks [34]. The study used two different inputs. The first input was a 16-QAM signal shaped by a Root-Raised-Cosine (RRC) filter, with a 50MHz bandwidth, 5.6dBm input power, and 7.5dB PAPR. The second input was a 64-QAM OFDM signal with a 50MHz bandwidth, 2.7dBm input power, and 10.7dB PAPR. For both inputs, the parameters of the MP PD model were Q = 5 and M = 8. The results are presented in Fig. 7(a) and Fig. 7(b). For the second input, they show the power spectra of the PA output and the evolution of the ACPR reduction for each algorithm over 20 iterations. For each algorithm, the convergence parameters were chosen to achieve the fastest convergence rate without causing the algorithm to diverge. The PD structure is identical for the four algorithms. The PD coefficients should therefore eventually converge to the same values and yield the same ACPR reduction. However, as illustrated in Fig. 7(b), for a finite number of iterations with the chosen parameters, the WLMS achieves the second-fastest convergence and the second-best accuracy. Also, some asymmetric spectral regrowth, resulting from the memory effect of the PA [5], [6], [7], remains after the application of DPD. This happens because the MP PD compensates for the memory effect by generating an inverse memory effect. With the limited number of iterations in our study, the obtained model does not fully compensate for the memory effect. Table I compares the performance of the MP DPD with each algorithm for both modulation schemes. The evaluated metrics are the ACPR over two different adjacent bandwidths (50MHz and 100MHz), the NMSE, and the EVM.
The results in Table I show that the RLS DPD generally provides the best performance for both modulation schemes, achieving the lowest ACPR, NMSE, and EVM values. However, the WLMS outperforms the LMS and NLMS methods in the given scenarios. In a first additional study, the behavior of the WLMS DPD was examined for both modulation schemes by adjusting the parameters Q and M. Detailed simulation results are provided in Appendix B-A. For a wide range of parameter values, the WLMS algorithm exhibited consistent and stable behavior for both modulation schemes.
A second additional study investigated the impact of noise (SNR) and quantization on the WLMS algorithm. The simulation was performed by varying the SNR from 25dB to 50dB and the number of quantization bits from 5 to 16. The results are presented in Appendix B-B. The WLMS algorithm remained stable and did not diverge in the presence of noise. It was able to effectively linearize the PA at an SNR of 35dB with data quantized on 7 bits.

B. Complexity
This section provides a detailed analysis of the computational complexity of the algorithms.To simplify the complexity calculation, subtractions are considered additions.
To filter L points of data with a filter of size QM, the conventional LMS algorithm requires L iterations. In each iteration, the adaptation of the model and the update of the coefficients both need QM additions and QM multiplications. Therefore, the LMS requires 2QM additions and 2QM multiplications per iteration.
To filter L points of data with a filter of size QM, the NLMS algorithm requires L iterations. In each iteration, the adaptation of the model needs QM additions and QM multiplications, and the update of the coefficients needs 2QM additions and 2QM multiplications. Therefore, the NLMS requires 3QM additions and 3QM multiplications per iteration.
To filter L points of data with a filter of size QM, the RLS algorithm requires L iterations. In each iteration, the adaptation of the model needs QM additions and QM multiplications. The update of $\hat{\mathbf{R}}^{-1}$ requires $2(QM)^2$ additions and $3(QM)^2$ multiplications. The update of the coefficients needs $(QM)^2$ additions and $QM + (QM)^2$ multiplications. Therefore, the RLS requires $QM + 3(QM)^2$ additions and $2QM + 4(QM)^2$ multiplications per iteration.
The WLMS uses a block approach. To filter L points of data with a filter of size QM, it needs L/N iterations, with N the nearest power of 2 above M. In each iteration, 2Q + 1 DWTs are processed, each composed of $2N\log_2(2N)$ additions. The adaptation of the model requires $Q(2N + N_A)$ additions and $QN_M$ multiplications (cf. Eq. 9 and Eq. 11). The update of the coefficients requires $QN_A$ additions and $QN_M$ multiplications. Therefore, the WLMS requires $(2Q + 1)\,2N\log_2(2N) + 2Q(N + N_A)$ additions and $2QN_M$ multiplications per iteration.
Table II compares the computational resources for LMS, NLMS, RLS and WLMS for different filter sizes M over L = M data points.
The WLMS algorithm has significantly lower computational complexity than the RLS in terms of both additions and multiplications. For example, when Q = 7 and M = 8, the WLMS algorithm requires only 2192 additions and 1204 multiplications, while the RLS requires 75712 additions and 101696 multiplications. The NLMS algorithm is a direct competitor to the WLMS algorithm in terms of computational complexity: both algorithms have a complexity that is linear in the number of filter taps. The WLMS requires more additions than the NLMS, since the Walsh Transform is composed only of additions and subtractions. However, the block-diagonal structure of $\mathbf{X}$ reduces the number of multiplications, which becomes 12% lower than in the NLMS when M is a power of two. The transform can only be applied to vectors whose size is a power of 2. When M is not a power of 2, zero padding is needed to meet this requirement, which makes the computation more expensive. This behavior is shown in Fig. 8: Fig. 8(a) and Fig. 8(b) display the ratio of multiplications between WLMS and NLMS ($Mu_{WLMS}/Mu_{NLMS}$) for different parameters. The same behavior is found in the ratio of the number of additions of both algorithms.
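The multiplication counts above can be tabulated with a small helper. It makes two assumptions. First, N_M = N_Z = (4N² + 2)/3, the number of entries in the diagonal blocks of sizes 1, 1, 2, ..., N. Second, the WLMS processes ceil(L/N) blocks. The function names are illustrative. For Q = 7 and M = 8, the helper reproduces the 1204 WLMS multiplications quoted above. It can also be used to see how zero padding raises the WLMS count when M is not a power of two.

```python
import math


def next_pow2(m):
    """Smallest power of two >= m (the block size N used by the WLMS)."""
    return 1 << (m - 1).bit_length()


def multiplications(q, m, length=None):
    """Multiplications needed to filter `length` points (default L = M), Section III-B.

    LMS: 2QM and NLMS: 3QM per sample; WLMS: 2*Q*N_M per block of N samples,
    with N_M = N_Z = (4N^2 + 2)/3 (diagonal blocks of sizes 1, 1, 2, ..., N).
    """
    length = m if length is None else length
    n = next_pow2(m)
    n_m = (4 * n * n + 2) // 3
    return {
        "LMS": length * 2 * q * m,
        "NLMS": length * 3 * q * m,
        "WLMS": math.ceil(length / n) * 2 * q * n_m,
    }


print(multiplications(7, 8))   # {'LMS': 896, 'NLMS': 1344, 'WLMS': 1204}
```

Additions are left out because the expression for N_A (Eq. 11) is not reproduced here.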

C. Comparative Performance
An overview of the comparative performance of the algorithms is presented in Fig. 9. The RLS algorithm outperforms the other conventional algorithms in terms of convergence speed and accuracy. However, it can be sensitive to noise and consumes more energy because of its high computational complexity [33]. The RLS is therefore preferable in a DPD application where performance is the priority and noise and energy consumption are not major issues. The WLMS, on the other hand, outperforms the LMS and NLMS algorithms in accuracy and convergence speed, with lower computational complexity than the NLMS. It is therefore the preferred choice in DPD applications where convergence speed, accuracy, low noise sensitivity, and energy consumption all matter. For instance, the WLMS algorithm may be a suitable option when energy consumption is a critical criterion, as in low-power 5G devices, and meeting the minimum ACPR and EVM values is sufficient to comply with ETSI regulations [35].

IV. EXPERIMENTS AND MEASUREMENT RESULTS
To validate the performance of the proposed algorithm, the WLMS DPD was applied to a 2.5GHz SiGe PA.

A. Experimental Setup
Figure 10 provides detailed information about the measurement setup. The signal processing is performed in MATLAB. The input signal of the PA is generated by the ROHDE & SCHWARZ (R&S) SMW200A Vector Signal Generator [36], which is controlled over USB 3.0 (VISA). The R&S FSW Signal/Spectrum Analyzer [37] samples the output of the amplifier and is controlled over TCP/IP. The two devices are synchronized to minimize the frequency offset between their local oscillators. The PA under linearization is a deep class-AB PA with a common-emitter structure, built around an Infineon BFP780 SiGe transistor. It operates at 2.5GHz, has a linear gain of 11dB, and its output 1dB-compression point is 14dBm. Figure 11 shows the experimental setup.

B. Measurement Results
The signal used is a 16-QAM OFDM signal with a bandwidth of 50MHz, an average power level of -3dBm, and a PAPR of 7dB. An MP PD with Q = 4 and M = 8 was identified with the LMS, NLMS, RLS and WLMS, using a dataset of 5000 samples and 20 iterations. For each algorithm, the convergence parameters were chosen to achieve the fastest convergence rate without causing the algorithm to diverge: $\mu = 0.01$ (LMS), $\mu_n = 0.5$ and $\alpha = 3$ (NLMS), $\delta = 1$ and $\lambda = 0.99$ (RLS), and $\mu_W = 0.05$ (WLMS). The results are presented in Fig. 12, which displays the measured PA output spectra with and without DPD for the four algorithms. Table III compares the ACPR obtained with each algorithm for 50MHz and 100MHz adjacent channel bandwidths. Figure 13(a) and Fig. 13(b) depict the linearized AM/AM and AM/PM characteristics, respectively, of the measured PA using the WLMS.
Figure 12 shows that the WLMS can achieve a strong ACPR reduction on a class-AB PA with a 50MHz signal bandwidth. Furthermore, the AM/AM and AM/PM characteristics in Fig. 13(a) and Fig. 13(b) show that the memory effect is also significantly reduced. It is not completely eliminated, however, as the remaining asymmetric regrowth in the PA output spectrum confirms. Table III indicates that the WLMS DPD offers the second-best linearization performance, which is consistent with the simulation results. The only algorithm that outperforms the WLMS is the RLS DPD, but at the cost of a much higher computational complexity.

V. WALSH-BASED DPD ARCHITECTURE
Figure 14 illustrates a comprehensive DPD transmitter architecture that uses both the Walsh Transform and the WLMS.
In the DPD transmitter architecture, the upper part is the direct path. To generate the Tx RF signal, the Inverse Walsh Transform (IWT), also known as Walsh series decomposition, is carried out as described in Eq. 23.
Recent research indicates that using Walsh sequences to generate RF signals is a viable option [38], [39]. In this approach, digital clocks generate the Walsh sequences W(i, t), which are then weighted by the Walsh coefficients $\mathbf{X}_j$ using DACs. The resulting weighted sequences, $X_j(i, t)W(i, t)$, are summed to form the RF signal. The Walsh sequence set W(i, t) can be obtained from a single clock running at the system's highest frequency, $f_W$, using frequency dividers and XOR gates. This method is also wideband. It generates signals ranging from the lowest frequency in the Walsh sequences ($f_R = f_W/N$, with N the number of sequences in the set) up to $f_W$. Additionally, the Walsh coefficients change only every $T_R = 1/f_R$.
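The counter-plus-XOR generation of Walsh sequences can be illustrated in software. This sketch assumes Hadamard ordering and an ideal binary counter standing in for the frequency dividers. It is not a model of the circuits in [38], [39], and the function names are hypothetical. The weighted sum of the sequences (the IWT) then recovers the original samples over one period.

```python
import numpy as np


def walsh_from_counter(n_seq):
    """Hadamard-ordered Walsh sequences from a binary counter and XOR gates.

    Bit k of a counter clocked at f_W toggles every 2**k ticks, i.e. it is the
    master clock divided by 2**(k+1). Sequence i is the XOR of the bits set in i.
    Returns +1/-1 levels, one row per sequence, one column per clock tick.
    """
    ticks = np.arange(n_seq)
    n_bits = int(np.log2(n_seq))
    seqs = np.empty((n_seq, n_seq), dtype=int)
    for i in range(n_seq):
        level = np.zeros(n_seq, dtype=int)
        for k in range(n_bits):
            if (i >> k) & 1:
                level ^= (ticks >> k) & 1        # XOR with divided clock k
        seqs[i] = 1 - 2 * level                  # logic 0/1 -> +1/-1
    return seqs


def iwt(coeffs, seqs):
    """Weighted sum of the sequences over one period T_R: sum_i X(i) W(i, t)."""
    return coeffs @ seqs


seqs = walsh_from_counter(8)
x = np.array([0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 1.0])
coeffs = seqs @ x / 8                            # forward transform with +1/-1 sequences
print(np.allclose(iwt(coeffs, seqs), x))         # True
```

With unnormalized ±1 sequences, the 1/N factor is placed on the forward transform so that the plain weighted sum reconstructs the signal.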
In the feedback path, a coupler captures a copy of the PA's output signal. This copy then undergoes an analog Walsh Transform based on a radix architecture to compute its Walsh coefficients [40]. These coefficients are sampled and fed into the Walsh DPD algorithm. Knowing the input and output Walsh coefficients of the PA, the WLMS computes the Walsh coefficients of the predistorted signal. The predistorted RF signal is then generated by the direct path with the IWT.
The system has been simulated in MATLAB, with data quantized on 8 bits. The signal used is a 5G NR FR1 signal [41]: a 64-QAM OFDM with a 100MHz bandwidth, a PAPR of 10.4dB, and an average power $P_{in}$ of 2.6dBm. The PA to be linearized is the NXP Airfast LDMOS Doherty PA from the previous simulations. The PD model has Q = 5 and M = 8. It was calibrated after 100 iterations of the WLMS ($\mu_W = 0.08$), each using a set of 10000 samples.

VI. CONCLUSION
This work proposes a Walsh-based LMS algorithm for evaluating a PD model using the input/output Walsh coefficients of the system. Both simulation and measurement results are presented. They show that the novel algorithm can achieve an 11dB ACPR reduction, converging up to 10 times faster than conventional LMS algorithms with 12% lower computational complexity. As a result, the energy cost of the digital resources is reduced. The energy efficiency of the PA also increases, since it can operate in its non-linear region. A Walsh-based DPD system was also presented, in which the Walsh Transform enables the direct generation of the RF signal and the direct processing of a copy of the PA's output signal. Simulation results of the WLMS DPD on an RF PA were also shown, paving the way for a novel RF DPD system based on Walsh theory. Finally, future work will integrate the proposed architecture in 28nm FDSOI CMOS technology from STMicroelectronics.

APPENDIX A LMS AND RLS ALGORITHMS
This section presents the algorithms used for comparison with the proposed method, namely the LMS, NLMS, and RLS. The block LMS is also described, since the proposed method is based on this approach.

A. The LMS Algorithm
The LMS algorithm is a time-domain adaptive filter. Its goal is to find the weights of the transversal filter c[n] that minimize the error between the output of the filter y and a reference signal d [33]. It can be used to filter a noisy data set, or to find a model of a system or of its inverse, as for DPD. The optimal (Wiener [17]) solution of the filter/model is given by:
$$\mathbf{c}_o = \mathbf{R}^{-1}\mathbf{p}$$
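with $\mathbf{R} = E[\mathbf{x}_k\mathbf{x}_k^T]$ and $\mathbf{p} = E[d_k\mathbf{x}_k]$, assuming that x and d are jointly Wide-Sense Stationary (WSS). The model has to be evaluated with a finite number of samples. Instantaneous estimates are therefore used: $\hat{\mathbf{R}}_k = \mathbf{x}_k\mathbf{x}_k^T$ for the matrix $\mathbf{R}$ and $\hat{\mathbf{p}}_k = d_k\mathbf{x}_k$ for the vector $\mathbf{p}$. These estimates feed a steepest-descent algorithm that searches for the Wiener solution as follows:
$$\mathbf{c}_{k+1} = \mathbf{c}_k - \mu\hat{\mathbf{g}}_k$$
The resulting gradient estimate is given by:
$$\hat{\mathbf{g}}_k = -2\hat{\mathbf{p}}_k + 2\hat{\mathbf{R}}_k\mathbf{c}_k = -2e_k\mathbf{x}_k$$
The resulting gradient-based algorithm is known as the LMS algorithm, with the update equations:
$$y_k = \mathbf{c}_k^T\mathbf{x}_k \quad (26.1)$$
$$e_k = d_k - y_k \quad (26.2)$$
$$\mathbf{c}_{k+1} = \mathbf{c}_k + 2\mu e_k\mathbf{x}_k \quad (26.3)$$
where $\mathbf{x}_k = (x[k], x[k-1], \ldots, x[k-M+1])^T$ is the state vector of input samples stored in the adaptive filter. $\mathbf{c}_k = (c_k[0], c_k[1], \ldots, c_k[M-1])^T$ holds the coefficients of the filter after the k-th iteration. µ is the convergence factor, which has to be chosen in an appropriate range to guarantee the convergence of the algorithm. As the value of µ increases, the algorithm converges more quickly, but this may also result in instability and divergence.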
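Below is a minimal complex LMS system-identification sketch following Eqs. 26.1 to 26.3. As the notation section prescribes, the regressor is conjugated for complex data. The test system h, the step size and the data length are hypothetical.

```python
import numpy as np


def lms_identify(x, d, n_taps, mu):
    """Scalar LMS (Eqs. 26.1-26.3) with y_k = c^T x_k; x_k is conjugated for complex data."""
    c = np.zeros(n_taps, dtype=complex)
    state = np.zeros(n_taps, dtype=complex)      # x_k = (x[k], x[k-1], ..., x[k-M+1])
    for k in range(x.size):
        state = np.r_[x[k], state[:-1]]          # shift the newest sample in
        y = state @ c                            # Eq. 26.1
        e = d[k] - y                             # Eq. 26.2
        c = c + 2 * mu * e * state.conj()        # Eq. 26.3 (x_k^* for complex data)
    return c


rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.25, 0.1j, -0.05])     # hypothetical system to identify
x = (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)
d = np.convolve(x, h)[: x.size]
c = lms_identify(x, d, n_taps=5, mu=0.02)
print(np.allclose(c, h, atol=1e-6))              # True
```

With unit-power input and 5 taps, µ = 0.02 sits well inside the stable range. Larger values speed up convergence until the iteration becomes unstable, as the text describes.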

B. The NLMS Algorithm
The NLMS algorithm [42] is similar to the LMS algorithm, the only difference lying in the coefficient update equation. In the NLMS algorithm, the step size is normalized by $\mathbf{x}_k^T\mathbf{x}_k$ to reduce the influence of the instantaneous input power and achieve faster convergence. A factor α is also introduced to avoid an excessively large step size when $\mathbf{x}_k^T\mathbf{x}_k$ becomes too small, which would lead to divergence of the algorithm. The update equation of the NLMS is given by:
$$\mathbf{c}_{k+1} = \mathbf{c}_k + \frac{\mu_n}{\alpha + \mathbf{x}_k^T\mathbf{x}_k}\,e_k\mathbf{x}_k$$
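The NLMS update can be sketched as an LMS step normalized by α + ‖x_k‖². This normalized form is the standard one and is assumed here. α is chosen small because the toy input has unit power, unlike the α = 3 used in the measurements. The system h is hypothetical.

```python
import numpy as np


def nlms_identify(x, d, n_taps, mu_n, alpha):
    """NLMS with y_k = c^T x_k: step mu_n / (alpha + ||x_k||^2), conjugated regressor."""
    c = np.zeros(n_taps, dtype=complex)
    state = np.zeros(n_taps, dtype=complex)
    for k in range(x.size):
        state = np.r_[x[k], state[:-1]]
        e = d[k] - state @ c
        norm = alpha + np.vdot(state, state).real    # alpha bounds the step for tiny inputs
        c = c + (mu_n / norm) * e * state.conj()
    return c


rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.25, 0.1j, -0.05])         # hypothetical system to identify
x = (rng.standard_normal(3000) + 1j * rng.standard_normal(3000)) / np.sqrt(2)
d = np.convolve(x, h)[: x.size]
c = nlms_identify(x, d, n_taps=5, mu_n=0.5, alpha=1e-3)
print(np.allclose(c, h, atol=1e-6))                  # True
```

For 0 < µ_n < 2, each step shrinks the coefficient error along the current regressor, whatever the input power. This is the practical benefit of the normalization.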

C. The RLS Algorithm
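The Wiener solution can be evaluated if $\mathbf{R}^{-1}$ and $\mathbf{p}$ are known. However, only estimates of these parameters can be computed ($\hat{\mathbf{R}}_k$ and $\hat{\mathbf{p}}_k$) [33]. The RLS uses a recursive estimate of $\hat{\mathbf{R}}_k^{-1}$ to improve the convergence of the algorithm, at the cost of an increase in computational complexity. The update equations of the RLS are given by:
$$\mathbf{g}_k = \frac{\hat{\mathbf{R}}^{-1}_{k-1}\mathbf{x}_k}{\lambda + \mathbf{x}_k^T\hat{\mathbf{R}}^{-1}_{k-1}\mathbf{x}_k}, \qquad e_k = d_k - \mathbf{c}_{k-1}^T\mathbf{x}_k$$
$$\mathbf{c}_k = \mathbf{c}_{k-1} + e_k\mathbf{g}_k, \qquad \hat{\mathbf{R}}^{-1}_k = \frac{1}{\lambda}\big(\hat{\mathbf{R}}^{-1}_{k-1} - \mathbf{g}_k\mathbf{x}_k^T\hat{\mathbf{R}}^{-1}_{k-1}\big)$$
with $\hat{\mathbf{R}}^{-1}_0 = \delta\mathbf{I}$. The parameter λ is an exponential weighting factor that should be chosen in the range ]0, 1]. It is also called the forgetting factor, since information from the distant past has an increasingly negligible effect on the coefficient update. δ can be chosen as the inverse of the input signal power estimate.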
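Below is a compact sketch of the exponentially weighted RLS, written for y_k = c^T x_k with complex data. The gain and inverse-correlation recursion is the textbook form. The initialization R̂₀⁻¹ = δI is our reading of the role of δ. Both are assumptions rather than the paper's exact equations, and the test system is hypothetical.

```python
import numpy as np


def rls_identify(x, d, n_taps, lam=0.99, delta=1.0):
    """Exponentially weighted RLS with y_k = c^T x_k; P tracks the inverse of R_k."""
    c = np.zeros(n_taps, dtype=complex)
    P = delta * np.eye(n_taps, dtype=complex)         # assumed initialization R_0^{-1} = delta*I
    state = np.zeros(n_taps, dtype=complex)
    for k in range(x.size):
        state = np.r_[x[k], state[:-1]]
        u = state.conj()                               # regressor such that y_k = u^H c
        e = d[k] - state @ c                           # a priori error
        Pu = P @ u
        g = Pu / (lam + np.vdot(u, Pu).real)           # gain vector
        c = c + g * e                                  # coefficient update
        P = (P - np.outer(g, u.conj()) @ P) / lam      # inverse-correlation update
    return c


rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.25, 0.1j, -0.05])          # hypothetical system to identify
x = (rng.standard_normal(2000) + 1j * rng.standard_normal(2000)) / np.sqrt(2)
d = np.convolve(x, h)[: x.size]
c = rls_identify(x, d, n_taps=5)
print(np.allclose(c, h, atol=1e-6))                   # True
```

The (QM)² cost counted in Section III-B appears here as the matrix-vector product and the rank-one update of P at every sample.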

D. The Block LMS Algorithm
As described above, the conventional LMS, NLMS, and RLS filters operate on scalars ($e_k$, $y_k$, $d_k$). In a block-based LMS, however, the output of the filter and the error are vectors. This algorithm uses the overlap-save method [43].
The order M of the adaptive filter is assumed to be less than or equal to N. In this method, the adaptive filter is of size 2N: it has M coefficients followed by 2N − M values set to 0:
$$\mathbf{c}_j^T = (c_{j,0}, c_{j,1}, \ldots, c_{j,M-1}, 0, \ldots, 0)$$
In a block LMS, the input data set x is segmented into 2N-point vectors that overlap on the last N points at every iteration j:
$$\mathbf{x}_j^T = \big(x[jN], x[jN+1], \ldots, x[jN+2N-1]\big)$$
In a time-domain overlap-save algorithm, the output vector $\mathbf{y}_j$ is obtained by circular convolution of the input vector and the filter coefficients. It is defined in Eq. 31:
$$y_j[i] = \sum_{k=0}^{2N-1} c_{j,k}\,x_j\big[(i-k) \bmod 2N\big], \quad i = 0, \ldots, 2N-1 \quad (31)$$
Eq. 31 can be written in matrix form as:
$$\mathbf{y}_j = \boldsymbol{\chi}_j\mathbf{c}_j \quad (32)$$
with $\boldsymbol{\chi}_j$ the circulant matrix of the input data set $\mathbf{x}_j$. The error vector $\mathbf{e}_j$ is defined by:
$$\mathbf{e}_j = \mathbf{f}(\mathbf{d}_j - \mathbf{y}_j) \quad (33)$$
with $\mathbf{f}$ a diagonal window matrix whose first N diagonal elements are equal to 0 and whose last N elements are equal to 1. The update of the filter coefficients $\mathbf{c}_j$ is given by:
$$\mathbf{c}_{j+1} = \mathbf{c}_j + 2\mu\boldsymbol{\chi}_j^T\mathbf{e}_j \quad (34)$$
Because the last N points of $\mathbf{c}_j$ are equal to 0, the last N points of $\mathbf{y}_j$ are the result of the linear convolution, hence the use of the window matrix $\mathbf{f}$. This algorithm offers faster convergence than the conventional scalar LMS algorithm, at the expense of hardware resources [26].
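Below is a time-domain overlap-save block LMS sketch following Eqs. 31 to 34. As in Eq. 34, no constraint is re-applied to the zero-padded coefficients after each update. The Hermitian transpose is used for complex data. The helper names and test values are illustrative.

```python
import numpy as np


def circulant(v):
    """Circulant matrix with first column v: C[i, k] = v[(i - k) % len(v)]."""
    return np.column_stack([np.roll(v, k) for k in range(len(v))])


def block_lms_identify(x, d, n, mu, n_blocks):
    """Time-domain overlap-save block LMS (Eqs. 31-34); returns the 2N taps c_j."""
    two_n = 2 * n
    f = np.r_[np.zeros(n), np.ones(n)]         # diagonal of the window matrix f
    c = np.zeros(two_n, dtype=complex)
    for j in range(n_blocks):
        seg = slice(j * n, j * n + two_n)      # x_j: 2N samples, N shared with x_{j-1}
        chi = circulant(x[seg])
        y = chi @ c                             # Eq. 32: circular convolution
        e = f * (d[seg] - y)                    # Eq. 33: keep only the last N errors
        c = c + 2 * mu * chi.conj().T @ e       # Eq. 34, Hermitian transpose for complex data
    return c


rng = np.random.default_rng(0)
n, m, blocks = 8, 5, 400
h = rng.standard_normal(m) + 1j * rng.standard_normal(m)     # hypothetical FIR
x = (rng.standard_normal((blocks + 1) * n)
     + 1j * rng.standard_normal((blocks + 1) * n)) / np.sqrt(2)
d = np.convolve(x, h)[: x.size]
c = block_lms_identify(x, d, n, mu=0.01, n_blocks=blocks)
```

Each update sums N error contributions, which is the averaging effect credited to block methods. The price is the 2N × 2N circulant product per block.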

APPENDIX B ADDITIONAL SIMULATIONS
This section presents additional simulations complementing the WLMS study presented in Section III.The simulations explore various models and consider different levels of SNR and quantization bits.

A. ACPR Simulations for Different MP Models
Fig. 16 shows the ACPR evolution for different Q and M combinations of the predistorter during 20 iterations of the WLMS algorithm. The simulations demonstrate that the algorithm remains stable in the given scenarios, regardless of the number of PD coefficients.

B. SNR and Quantization Simulations
Fig. 17 shows the ACPR evolution during 20 iterations of the WLMS algorithm for different numbers of quantization bits and different SNRs (Gaussian noise). The simulations revealed that, to achieve an ACPR reduction, the output signal should be quantized on at least 6 bits and the SNR must be at least 30dB. The WLMS also remained stable during the linearization phase.

Fig. 3. (a) Normalized Fourier power spectra and (b) normalized Walsh power spectra of a two-tone input (black) and output (red/blue) of a non-linear system.

Fig. 7. (a) Power spectrum of PA output with and without DPD after 20 iterations of each algorithm.(b) ACPR evolution over 20 iterations of each algorithm.

Fig. 8. (a) Ratio of multiplications between WLMS and NLMS for multiple Q and M orders.(b) Ratio of multiplications between WLMS and NLMS for multiple M orders with Q fixed.

Fig. 9. Overview of the comparative performances of the LMS, NLMS, RLS and WLMS algorithms.

Fig. 12. Measured power spectrum of PA output with and without DPD after 20 iterations of each algorithm.

Figure 15(a) displays the PA output spectra with and without DPD, and Fig. 15(b) shows the constellations of the input and output of the PA with and without DPD. Table IV compares the ACPR (lower/upper) and EVM levels with and without WLMS DPD. The proposed architecture achieves an 11.2dB improvement in ACPR and a 50% reduction in EVM for a high-PAPR, large-bandwidth signal. These results indicate that the architecture has the potential for direct RF signal generation with highly competitive linearization capabilities, made possible by the use of the Walsh Transform.

Fig. 14. DPD RF Transmitter based on the WLMS and the Walsh Transform.

Fig. 15. (a) Power spectrum of PA output with and without DPD after 100 iterations of WLMS algorithm. (b) Output constellation with and without WLMS DPD.

TABLE IV SIMULATED ACPR AND EVM LEVELS OF PA OUTPUT SPECTRA

Fig. 16. (a) ACPR evolution over 20 iterations of WLMS for multiple models with a 16-QAM RRC input. (b) ACPR evolution over 20 iterations of WLMS for multiple models with a 64-QAM OFDM input.

TABLE I ACPR, NMSE AND EVM COMPARISONS AFTER 20 ITERATIONS OF LMS, NLMS, RLS, AND WLMS DPD FOR 2 DIFFERENT INPUTS

TABLE II COMPARISON OF COMPUTATIONAL REQUIREMENTS FOR LMS, NLMS, RLS AND WLMS WITH DIFFERENT FILTER SIZES