Recursive Neural Network With Phase-Normalization for Modeling and Linearization of RF Power Amplifiers

This letter presents a novel phase-normalized recurrent neural network (PN-RNN) to linearize radio frequency (RF) power amplifiers (PAs) in high-bandwidth communication systems with significant memory effects. The proposed approach builds on proper phase alignment of the internal hidden variables in the recursive processing system. The provided RF measurement-based modeling and digital predistortion (DPD) results at 1.8 and 3.5 GHz demonstrate a significantly improved modeling capacity and predistortion ability when applying phase normalization, confirming the validity of the proposed approach.


I. INTRODUCTION
DIGITAL predistortion (DPD) is an established technique to correct for nonlinear impairments of radio frequency (RF) power amplifiers (PAs) in base-station transmitters. While PAs are bound to efficiency versus linearity tradeoffs, DPD allows operating the PA at a higher efficiency [1] by recovering the required linearity through signal processing. In the upcoming 6G era, signal quality will be of crucial importance for enabling high-throughput communications, with very large instantaneous transmission bandwidths [2]. This challenges existing DPD methods, as PA memory effects occurring with such high signal bandwidths are difficult to compensate for.
Neural networks (NNs) have been identified as a capable and scalable approach for modeling and predistortion of RF PA systems with sophisticated memory effects. Feed-forward NNs such as the real-valued time-delay NN (RVTDNN) have been considered in [3]. While the feed-forward NN itself has no notion of time, a finite history of past samples is provided as input, which enables the modeling of memory effects. For real-valued NN processing, the baseband samples are decomposed into I and Q parts. In [4], phase normalization is applied to the time-delay inputs (PN-TDNN), which significantly enhances the modeling capability of the RVTDNN. However, higher sampling rates and bandwidths increase the sample history that is relevant for properly handling the PA memory effects. A finite sample history may constrain the PN-TDNN's modeling ability, whereas inputting more past samples naturally increases the PN-TDNN's complexity.
Despite their popularity in time-series processing, recurrent NNs (RNNs) have been less explored in the DPD context. Unlike feed-forward NNs, RNNs can encode and remember past information with hidden states. Thus, the RNN can identify the relevant history during training, which optimizes the tradeoff between NN size and accuracy. Long short-term memory (LSTM) RNNs were proposed for DPD in [5] and [6], where only the complex envelope is shaped by the RNN. In [7], a combined approach of LSTM with polynomial kernels is presented. An RNN with improved envelope and phase modeling ability was proposed in [8] and refined in [9].
Building upon the prior literature, we propose the phase-normalized RNN (PN-RNN) by incorporating efficient phase normalization into the recursive processing. We apply phase renormalization to the RNN's hidden states, which we interpret as complex-valued signals with a phase coherent to the NN input. We demonstrate the effectiveness of the proposed PN-RNN with PA modeling and PA linearization experiments and provide a comparison with existing RNN approaches in the DPD literature. Section II introduces the proposed concepts, Section III presents the modeling and linearization results, and Section IV concludes this letter.

II. PROPOSED RNN MODEL
For modeling the PA distortions at the baseband, the NN needs to map the complex-valued input samples x(k) to the corresponding output samples y(k). A top-level block diagram of the proposed PN-RNN is shown in Fig. 1. At the core of the NN is a recurrent layer, composed of N RNN cells, which encode the inputs X(k) into N complex-valued hidden states H(k). The hidden states are looped back so that the states of the previous time step H(k − 1) serve as additional inputs to the recurrent layer. The model's output is then formed by a linear combination of hidden states in the output layer. Although any information on past samples can generally be stored in the hidden states of the recurrent layer, we also input a few (M) previous samples alongside the current sample, which we summarize in

$$X(k) = [x(k),\, x(k-1),\, \ldots,\, x(k-M)]^{T}.$$

These fixed memory inputs are optional: the RNN will also function without them. However, as our experiments will show, a slight performance benefit and a more stable convergence are achieved with the additional inputs.

A. Phase Normalization
Similar to the RVTDNN, the NN structures internal to the RNN are limited to real-valued operation. Thus, for the actual NN processing, the complex-valued input and output signals are typically separated into respective I and Q parts. However, as detailed in [4] and [10], the real-valued NN structures cannot efficiently handle the role of the baseband signal phase. That is, PA distortions occurring at RF are independent of the specific baseband phase but depend on the baseband envelope and the phase derivative that modulates the RF carrier frequency. The lack of support for complex-valued operation in real-valued NNs prevents the NN from efficiently expressing passband distortions at the baseband. Instead, their high generality allows them to model effects that are nonphysical at RF. Therefore, [4] introduces phase normalization to remove the specific phase offset from the NN inputs and outputs, and lets the NN map distortions based on phase differences relative to a normalization reference. This effectively reduces the NN's generality and helps to comply with the physical origin of the distortions, yielding a significantly improved modeling ability.
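As a brief worked illustration of this phase property (a sketch using a generic third-order memory term chosen purely for illustration, not a model from this letter), a physically plausible baseband term rotates together with the input phase, whereas an unconstrained real-valued I/Q mapping need not:

```latex
% Generic baseband memory term (illustrative assumption):
% a common phase rotation of the input rotates the output by the same angle.
y(k) = x(k)\,|x(k-1)|^{2}
\quad\Longrightarrow\quad
x \to e^{j\varphi}x:\;\;
y(k) \to e^{j\varphi}x(k)\,\big|e^{j\varphi}x(k-1)\big|^{2} = e^{j\varphi}\,y(k).

% A mapping that a real-valued I/Q network can also represent,
% but which does not rotate with the input phase (nonphysical at RF):
\tilde{y}(k) = \big(\operatorname{Re}\{x(k)\}\big)^{2}
\quad\text{with}\quad
\big(\operatorname{Re}\{e^{j\varphi}x(k)\}\big)^{2} \neq e^{j\varphi}\,\tilde{y}(k)
\;\;\text{in general.}
```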
When extending the concept of phase normalization to RNNs, the hidden states also need to be considered. Since the hidden states carry information across multiple NN inferences, phase normalization cannot be applied to just the NN inputs and outputs. To this end, we propose to include the hidden states in the normalization. As a first step, we define the hidden states to be complex-valued signals that are coherent with the phase of the currently processed sample. Naturally, when applying phase normalization on the NN input X(k), the hidden states will have a phase relative to the normalization reference. As the specific phase changes from one processed sample to another, a different normalization factor r(k) is applied to each set of inputs and outputs. To then retain a valid phase of the hidden states, we renormalize the hidden states by complex rotation with the phase difference between two consecutive samples. Formalizing the concept, the normalization factor of the current sample is extracted with

$$r(k) = \frac{x^{*}(k)}{|x(k)|}$$

with x* being the complex conjugate of x. Then, the input vector X(k) is phase normalized to X̂(k) through

$$\hat{X}(k) = r(k)\,X(k).$$

The recurrent layer then produces a vector of phase-normalized hidden states Ĥ(k), and the model output is formed with

$$y(k) = r^{*}(k)\,W_{o}^{T}\,\hat{H}(k)$$

where multiplication with r*(k) denormalizes the phase and W_o ∈ C^N are the complex-valued weights of the output layer.
The hidden states of the previous iteration are renormalized and fed back into the recurrent layer as inputs Ĥ′(k) with

$$\hat{H}'(k) = r(k)\,r^{*}(k-1)\,\hat{H}(k-1)$$

where r(k)r*(k − 1) equals the phase difference of the two consecutive samples.
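The normalization, renormalization, and denormalization steps described above can be sketched as a plain NumPy loop. This is a minimal illustration, not the authors' implementation: the function names, the `cell` callable interface, the zero-padding at the sequence start, the fallback for zero-magnitude samples, and the randomly initialized toy cell are all assumptions made for the example.

```python
import numpy as np

def make_toy_cell(n_hidden, n_in, seed=0):
    """Hypothetical stand-in for a trained recurrent layer (random weights)."""
    rng = np.random.default_rng(seed)
    W = 0.3 * rng.standard_normal((2 * n_hidden, 2 * (n_in + n_hidden)))

    def cell(x_hat, h_prev_hat):
        # Real-valued processing of the separated I and Q parts.
        v = np.concatenate([x_hat.real, x_hat.imag,
                            h_prev_hat.real, h_prev_hat.imag])
        z = np.tanh(W @ v)
        return z[:n_hidden] + 1j * z[n_hidden:]

    return cell

def pn_rnn_forward(x, cell, w_out, n_hidden, m_mem=0):
    """Run a phase-normalized recursion over complex baseband samples x."""
    x = np.asarray(x, dtype=complex)
    y = np.zeros_like(x)
    h_hat = np.zeros(n_hidden, dtype=complex)  # normalized states of step k-1
    r_prev = 1.0 + 0.0j                        # normalization factor of step k-1
    for k in range(len(x)):
        mag = np.abs(x[k])
        # r(k) = x*(k)/|x(k)|; fall back to 1 for a zero sample (assumption).
        r = np.conj(x[k]) / mag if mag > 0 else 1.0 + 0.0j
        # X(k) = [x(k), ..., x(k-M)], zero-padded at the sequence start.
        X = np.array([x[k - m] if k - m >= 0 else 0.0
                      for m in range(m_mem + 1)], dtype=complex)
        x_hat = r * X                              # phase-normalized inputs
        h_prev_hat = r * np.conj(r_prev) * h_hat   # renormalized previous states
        h_hat = cell(x_hat, h_prev_hat)
        y[k] = np.conj(r) * (w_out @ h_hat)        # denormalized output
        r_prev = r
    return y
```

With this structure, rotating the whole input sequence by a constant phase rotates the output by the same phase, regardless of the weights inside the cell, which is the property the normalization is meant to enforce.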

B. RNN Cell
For the RNN cells, we take inspiration from [8] and [9]. In line with the PA physics, Just Another Network (JANET) [11] was identified as a suitable, lightweight RNN cell for DPD in [8]. Different from the LSTM cell used in [5], a JANET cell consists of only a single sigmoid-based (σ) gating mechanism (forget gate), combined with the hyperbolic tangent input activation (tanh). The JANET concept is refined in [9] with a more tailored RNN cell (DVR-JANET), with additional dedicated filters for phase and envelope, based on the DVR concept [12]. In addition, separate hidden states are introduced for the I and Q parts, which are jointly gated with a common forget gate.
The phase normalization of our proposed RNN eliminates the need for dedicated envelope and phase processing. Instead, we return to a simpler realization of JANET shown in Fig. 2. We maintain the idea of separate hidden states, h_I and h_Q, for the I and Q components, which, outside the RNN cell, are considered as a complex-valued state with Ĥ(k) = h_I + j h_Q. For the JANET forget gate, we use the squared magnitude of the hidden states, to avoid including the square root inside the trained NN. It is noted that all internal operations and signals of the recurrent cell are real-valued, which is highlighted by bold symbol notation. Additionally, we omit the sample dependency k for notational simplicity. The RNN cell is then given by the update equations of the structure in Fig. 2, where ⊗ denotes an elementwise product. The real-valued vector x contains the separated real and imaginary parts of the phase-normalized inputs X̂(k).
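To make the cell description more concrete, the following is a hedged sketch of a JANET-style cell with separate I and Q states and a joint forget gate driven by the squared state magnitude. It uses the standard JANET update h = f ⊗ h′ + (1 − f) ⊗ tanh(·) from [11]; the exact wiring, weight shapes, initialization, and the class name `JanetLikeCell` are assumptions for illustration and are not the equations of this letter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class JanetLikeCell:
    """Hypothetical JANET-style cell; weights are random, not trained."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        n_x = 2 * n_in  # separated real and imaginary input parts
        s = 0.1
        # Joint forget gate: sees the inputs and the squared state magnitudes.
        self.W_f = s * rng.standard_normal((n_hidden, n_x + n_hidden))
        self.b_f = np.ones(n_hidden)
        # Dedicated input activations for the I and Q states.
        self.W_i = s * rng.standard_normal((n_hidden, n_x + 2 * n_hidden))
        self.b_i = np.zeros(n_hidden)
        self.W_q = s * rng.standard_normal((n_hidden, n_x + 2 * n_hidden))
        self.b_q = np.zeros(n_hidden)

    def __call__(self, x_hat, h_prev_hat):
        # All internal signals are real-valued.
        x = np.concatenate([x_hat.real, x_hat.imag])
        h_i, h_q = h_prev_hat.real, h_prev_hat.imag
        mag2 = h_i ** 2 + h_q ** 2  # squared magnitude, no square root needed
        f = sigmoid(self.W_f @ np.concatenate([x, mag2]) + self.b_f)
        v = np.concatenate([x, h_i, h_q])
        g_i = np.tanh(self.W_i @ v + self.b_i)
        g_q = np.tanh(self.W_q @ v + self.b_q)
        # The same forget gate f is applied to both the I and the Q state.
        new_i = f * h_i + (1.0 - f) * g_i
        new_q = f * h_q + (1.0 - f) * g_q
        return new_i + 1j * new_q
```

Because each state update is a convex combination of the previous state and a tanh output, the I and Q states remain bounded in [−1, 1] when started from zero.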

III. MODELING AND RF MEASUREMENT RESULTS
This section provides a performance evaluation of the proposed PN-RNN. For a comparative analysis, we also included the VDLSTM (with four direct memory inputs), the DVR-JANET (three DVR kernels), and the PN-TDNN (two hidden layers) in our evaluation. For all models, we investigate the PA behavioral modeling capacity and the DPD linearization ability. The modeling accuracy is assessed in terms of normalized mean squared error (NMSE). For the DPD linearization, we additionally report the worst adjacent channel leakage ratio (ACLR) and error vector magnitude (EVM). The RF measurements were conducted with the NI PXIe-5840 vector signal transceiver (VST), which we used for analog signal generation and upconversion to RF in the transmit path, as well as for the downconversion and digitization of the PA output signal in the DPD feedback path. The PA DUT is mounted on an evaluation board, which is connected to the VST in line with a power attenuator. The setup used in the linearization experiment is shown in Fig. 3.
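For reference, the NMSE metric mentioned above is commonly computed as the ratio of error energy to reference energy, expressed in dB. The helper below is a minimal sketch of that common definition; the function name `nmse_db` is illustrative, and the letter does not spell out its exact formula.

```python
import numpy as np

def nmse_db(y_ref, y_est):
    """NMSE in dB between a measured reference and a model output."""
    y_ref = np.asarray(y_ref, dtype=complex)
    y_est = np.asarray(y_est, dtype=complex)
    err_energy = np.sum(np.abs(y_ref - y_est) ** 2)
    ref_energy = np.sum(np.abs(y_ref) ** 2)
    return 10.0 * np.log10(err_energy / ref_energy)
```

For example, a model output that deviates from the reference by 10% in amplitude on every sample yields an NMSE of −20 dB.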
The models were implemented and trained using the TensorFlow framework [13]. We optimize the RNN models with a sequence-to-sequence learning approach. Therein, the RNN structures are unfolded during the training phase such that T = 40 time steps are processed at the same time. This allows us to estimate coefficient gradients with backpropagation across multiple RNN iterations. The gradients were accumulated for batches of 20T samples, after which the model parameters were updated using the Adam optimization algorithm. The models were trained for 2200 repetitions (epochs) on the training dataset, with a decaying learning rate, such that a proper convergence of the parameters was achieved.
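The data arrangement implied by this training scheme can be sketched as follows: the sample stream is cut into sequences of T = 40 steps, and 20 such sequences form one batch of 20T samples. This is a framework-independent illustration only; the use of non-overlapping windows, the shuffling, and the function names are assumptions, since the letter does not specify these details.

```python
import numpy as np

def make_sequences(x, y, seq_len=40):
    """Cut aligned input/output streams into non-overlapping sequences."""
    n_seq = len(x) // seq_len
    x_seq = np.asarray(x)[: n_seq * seq_len].reshape(n_seq, seq_len)
    y_seq = np.asarray(y)[: n_seq * seq_len].reshape(n_seq, seq_len)
    return x_seq, y_seq

def iterate_batches(x_seq, y_seq, batch_size=20, rng=None):
    """Yield shuffled batches of batch_size sequences (one update per batch)."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(x_seq))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield x_seq[idx], y_seq[idx]
```

Under these assumptions, a training set of 180k sample pairs gives 4500 sequences and 225 parameter updates per epoch.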

A. Behavioral Modeling Example
We first assess the forward modeling capability of the models with measured data from a Doherty GaN PA (RTH18008S-30) running a 5G-compliant 100-MHz OFDM waveform at 1.8 GHz with an output power of +38.1 dBm. We used 180k input-output sample pairs for training and dedicated 120k samples for evaluation. The modeling results are shown in Fig. 4, where the complexity of the RNN models is varied by altering the number of hidden states. The data points represent the averaged performance of five training cycles, to account for variations due to the random NN coefficient initialization. For the PN-TDNN, we adjusted the sizes of the hidden layers as well as the delay depth of the time-delay inputs. We provide PN-RNN results for M = 0 and M = 3 direct memory inputs. Both PN-RNN realizations achieve a substantially reduced modeling error, that is, improved accuracy, compared to the reference methods. Additionally, a slight advantage is found when including the direct memory inputs.

B. DPD Linearization Example
Next, we evaluated the linearization capability of the models in a DPD linearization experiment with another GaN Doherty PA (QPA3503). A 5G-compliant, continuously aggregated, 200-MHz wide multicarrier waveform with an overall limited PAPR of 8 dB at a center frequency of 3.5 GHz and with a linearized PA output power of +34.2 dBm was applied in the experiment. To generate the training data, we applied the iterative learning control (ILC) scheme from [14] to derive an ideal DPD signal. The RNN parameters were then optimized offline while refreshing the ILC training data after every 400 training epochs. For the DPD evaluation, the output layers of the trained NN models were again fine-tuned and the achieved linearity was assessed with a dedicated test sequence with 210k samples. Fig. 5 shows the AM/AM and AM/PM behavior with and without DPD, and Fig. 6 provides the power spectral density (PSD) of the linearized PA output using the different methods. The corresponding model configuration and detailed results are reported in Table I. The reported PN-TDNN in this experiment was configured with eight time-delay inputs and two hidden layers with 24 and 15 neurons, respectively.
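The idea of deriving an ideal predistorted signal by ILC can be illustrated with a simple gain-based update, in which the PA input is corrected iteratively by the scaled output error. The sketch below is an assumption-laden toy: `toy_pa` is a hypothetical memoryless amplifier model, and the step size `mu`, the iteration count, and the update rule are illustrative choices rather than the configuration of [14] or of the measurements in this letter, where the PA is a real device with memory.

```python
import numpy as np

def toy_pa(u):
    """Hypothetical memoryless PA with gain compression and AM/PM distortion."""
    a = np.abs(u) ** 2
    return 2.0 * u * (1.0 - 0.15 * a) * np.exp(1j * 0.2 * a)

def ilc_predistort(d, pa, gain, n_iter=40, mu=0.5):
    """Iteratively refine the PA input u so that pa(u) approaches d."""
    u = d / gain  # start from the linearly scaled desired output
    for _ in range(n_iter):
        e = d - pa(u)            # output error of the current iteration
        u = u + mu * e / gain    # gain-based ILC update
    return u
```

The resulting pairs of original input and converged signal `u` could then serve as training data for a DPD model, which is the role the ILC step plays in the experiment described above.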
The linearization experiment confirms the trend observed in the behavioral modeling example. The PN-RNN achieves the highest linearity rating in terms of PA output NMSE and ACLR. Also, the EVM approaches the PAPR-limiter-induced minimum.

IV. CONCLUSION
This letter presented a PN-RNN model for modeling and linearization of RF PAs with strong memory effects. Extending the concept of phase normalization to RNNs, not only the RNN inputs and output but also the complex-valued hidden states are considered for phase normalization. This enables the RNN to incorporate phase normalization and retain a valid memory of the I/Q phase trajectory. Additionally, direct memory inputs were studied and shown to further improve the RNN's modeling capability. Our results show a superior behavioral modeling accuracy of the PN-RNN, with about 3 dB lower NMSE compared to existing RNN models, which translated to an improved DPD ability with lower ACLR in our PA linearization experiments.

Fig. 2. Block diagram of the JANET-inspired RNN cell using dedicated input activations (green) for the I and Q parts and a joint forget gate (red).

Fig. 4. NMSEs of the compared models for modeling the 1.8-GHz GaN Doherty PA.

TABLE I. DPD EXPERIMENTAL RESULTS WITH THE DOHERTY GAN PA AT 3.5 GHz