Introduction
With various emerging applications, we are seeing a new landscape of wireless connectivity. To meet increasing demands for high speed and high capacity, small-cell base stations and large-scale multiple-input–multiple-output (MIMO) antenna arrays will be widely deployed in 5G wireless networks [1]. Compared with earlier-generation macro base stations, 5G transmitters contain significantly more radio frequency (RF) chains, which demands energy-efficient, low-cost, and highly integrated solutions. This poses significant challenges for the design of RF front-end circuits and systems. In the new application scenarios, individual RF power amplifiers (PAs) are expected to produce lower power, but they still need to maintain high linearity to meet error vector magnitude and spectral emission requirements [2].
In modern wireless transmitters, digital predistortion (DPD) is widely employed to mitigate the nonlinear distortions caused by the PAs [3]. The use of DPD allows PAs to be operated at higher power levels for higher efficiency without losing linearity. When the PAs are designed to produce lower power, the power budget for DPD must shrink correspondingly to keep the power consumption of the overall system low. Hence, it becomes important to optimize the model complexity and power consumption of DPD blocks to meet the system efficiency requirements [4].
Traditionally, DPD models are constructed by pruning the Volterra series. Some examples include memory polynomial (MP) [5], generalized memory polynomial (GMP) [6], and dynamic deviation reduction (DDR) models [7]. To improve the modeling performance while keeping low complexity, different techniques have been proposed to design a composite model by aggregating multiple small models. In the literature, this has been achieved by combining the different models in parallel [8] or in cascade [9], [10]. To reduce hardware complexity, model structures different from polynomials have also been proposed. One solution is to replace polynomials with low-complexity basis functions, e.g., decomposed vector rotation [11] and spline [12]. Lookup tables (LUTs) are also widely used because of their low implementation cost [13], [14]. Similar to using LUTs, some existing works build the model in a piecewise manner by using different coefficient sets for different input samples. The selection process can be realized using a vector switch [15], a magnitude-selective affine function [16], or a decision tree [17].
Besides manually designed models, the structure of DPD models can also be optimized using machine learning techniques. Recently, various model pruning algorithms have been developed to select effective model terms and remove redundant basis functions from a large Volterra model [18]. The model selection process can also be implemented in the DPD model adaptation unit to update the model structure dynamically [19]. Thus, with the help of reconfigurable digital circuit design techniques [20], power consumption of DPD can be optimized. The complexity of DPD models also depends on the design at digital hardware level. A common design method is to realize the polynomial functions using LUTs [21], [22], which minimizes computation by storing precomputed results in memory. For polynomial-based implementation, power consumption can also be reduced by optimizing the design of data path [23] or taking advantage of parallel processing [24].
In this article, a novel real-time model switching technique is presented to reduce the computational complexity of DPD models. In the proposed approach, while multiple cross terms and basis functions are implemented in the system, only the most useful ones are selected and activated for each input data sample in real-time operation. After choosing the suitable cross terms, the model further selects the coefficients from multiple coefficient sets. Thus, the model can dynamically switch between different model terms and coefficients. The switching process is realized via a decision tree model, which is jointly optimized with the model coefficients using a novel iterative alternate minimization algorithm. With the proposed approach, the computational complexity of calculating the predistorted output is significantly reduced, and thus, the power consumption of DPD can be much lower than that of conventional methods.
The rest of this article is organized as follows. Section II gives a brief background on the effect of cross terms in DPD models. Section III describes the proposed model switching technique and the design of switch controller. The training method for the proposed model is presented in Section IV. The experimental results and complexity analysis are reported in Section V, followed by a conclusion in Section VI.
Background
DPD takes effect by implementing an inverse nonlinear model of the PA in digital baseband. For models that are linear in parameters, DPD can be expressed in a matrix format as \begin{equation*} \mathbf {u}=\mathbf {X}\mathbf {c}\tag{1}\end{equation*}
\begin{equation*} |\tilde {x}_{n-m}|^{p}\tilde {x}_{n-m}\tag{2}\end{equation*}
\begin{align*} \mathbf {X}= \left [{\begin{array}{cccccc} \tilde {x}_{N} & |\tilde {x}_{N}|\tilde {x}_{N} & |\tilde {x}_{N}|^{2}\tilde {x}_{N} & \cdots & \tilde {x}_{N-1} & \cdots \\ \tilde {x}_{N-1} & |\tilde {x}_{N-1}|\tilde {x}_{N-1} & |\tilde {x}_{N-1}|^{2}\tilde {x}_{N-1} & \cdots & \tilde {x}_{N-2} & \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots & \ddots \end{array} }\right].\tag{3}\end{align*}
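As an illustration of (1)–(3), the following minimal sketch builds an MP regression matrix from the basis functions in (2) and evaluates the predistorter output as a matrix-vector product. The function name `mp_basis_matrix`, the parameters `P` and `M`, the zero-padding of samples before the start of the record, and the natural time ordering of the rows (the reverse of the row order written in (3)) are assumptions of this sketch rather than details taken from the text.

```python
import numpy as np

def mp_basis_matrix(x, P, M):
    """Build an MP regression matrix with columns |x[n-m]|^p * x[n-m].

    Columns are grouped by memory depth m = 0..M, and within each group
    by order p = 0..P. Samples before the start of the record are taken
    as zero (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=complex)
    cols = []
    for m in range(M + 1):
        # x delayed by m samples, zero-padded at the start
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:len(x) - m]])
        for p in range(P + 1):
            cols.append(np.abs(xm) ** p * xm)
    return np.column_stack(cols)

# Small example: a pass-through coefficient vector reproduces the input.
x = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j, 0.1j])
X = mp_basis_matrix(x, P=2, M=1)
c = np.zeros(X.shape[1], dtype=complex)
c[0] = 1.0          # only the linear, zero-delay term is active
u = X @ c           # u = Xc as in (1)
```

Because the model is linear in its parameters, any change to the set of basis functions only changes the columns of the matrix, while the output computation stays a single product.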
To effectively linearize PAs in wideband wireless transmitters, more complex models are usually needed, due to the complicated nonlinearity caused by high-efficiency PA architectures and wideband modulation signals. In conventional pruned Volterra models, various types of cross terms are adopted, which mix different delayed samples in polynomial terms. In most cases, different memory samples are mixed by either multiplying the amplitude \begin{equation*} |\tilde {x}_{n-m-l}|^{p}\tilde {x}_{n-m}\tag{4}\end{equation*}
\begin{align*}&|\tilde {x}_{n}|^{p}\tilde {x}^{2}_{n}\tilde {x}^{*}_{n-m} \tag{5}\\&|\tilde {x}_{n}|^{p}|\tilde {x}_{n-m}|^{2}\tilde {x}_{n}.\tag{6}\end{align*}
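The cross terms in (4)–(6) can be generated per sample as sketched below. The helper names (`delay`, `gmp_lagging`, `ddr1`, `ddr2`) are hypothetical, and zero-padding at the start of the record is an assumption of this sketch.

```python
import numpy as np

def delay(x, d):
    """Return x delayed by d samples, zero-padded at the start."""
    x = np.asarray(x, dtype=complex)
    if d == 0:
        return x.copy()
    return np.concatenate([np.zeros(d, dtype=complex), x[:-d]])

def gmp_lagging(x, p, m, l):
    """Eq. (4): |x[n-m-l]|^p * x[n-m]."""
    return np.abs(delay(x, m + l)) ** p * delay(x, m)

def ddr1(x, p, m):
    """Eq. (5): |x[n]|^p * x[n]^2 * conj(x[n-m])."""
    x = np.asarray(x, dtype=complex)
    return np.abs(x) ** p * x ** 2 * np.conj(delay(x, m))

def ddr2(x, p, m):
    """Eq. (6): |x[n]|^p * |x[n-m]|^2 * x[n]."""
    x = np.asarray(x, dtype=complex)
    return np.abs(x) ** p * np.abs(delay(x, m)) ** 2 * x

x = np.array([1j, 2.0, 1 + 1j])
g = gmp_lagging(x, p=1, m=0, l=1)   # envelope taken one sample earlier
d1 = ddr1(x, p=0, m=1)
d2 = ddr2(x, p=1, m=1)
```

Each function returns one column of the regression matrix, so a richer model simply stacks more such columns, which is where the extra multipliers in hardware come from.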
While the incorporation of rich basis functions leads to improved accuracy, it results in high hardware complexity and power consumption. On the one hand, considering the high sampling rate required to process wideband signals, the power consumption of DPD blocks may continue to increase [4], while on the other hand, in future cellular systems, the use of MIMO techniques and small-cell base stations reduces the output power of individual PAs. To maintain the overall system efficiency, the power budget of DPDs must shrink accordingly. Thus, it is desirable to develop new methods to retain the expressive power of these cross terms while minimizing the cost in complexity.
Real-Time Model Switching
In this work, we propose a novel model switching framework to reduce the running complexity of DPD models. In the proposed scheme, different model components can be dynamically switched ON and OFF. Because only part of the hardware resources is activated at any time, a dramatic reduction in power consumption can be achieved. In this section, we introduce two switching mechanisms, namely, cross-term switching and coefficient switching, as well as the design of the switch controller, in detail.
A. Cross-Term Switching
In the literature, different types of cross terms have been proposed to improve the linearization performance of DPD models. To ensure the modeling accuracy under varying operating conditions, careful selection of basis functions is necessary and the final model may need to include all helpful cross terms.
In the conventional setup, the DPD function is usually fixed before implementation, and determining which terms to include involves tedious trial-and-error procedures. In some works [18], the selection of model basis functions can be realized using model pruning algorithms, but they usually have high computational complexity and can only operate in an off-line environment.
Besides complexity concerns, from a behavioral modeling perspective, using a fixed set of model basis functions may not be the optimal design choice. Depending on the data distribution and the PA characteristics, different data samples may exhibit distinct nonlinear effects, and thus, they should be modeled by using different types of nonlinear terms. It is hence reasonable to adopt different model basis functions based on the past and current input samples. Therefore, to reduce the running complexity, we propose to switch between the different memory cross terms in real time using a switch controller, as shown in Fig. 1. Different from the prior art, as the controller is realized using a low-complexity classification model, it can operate at the sampling frequency, and each data sample can be predistorted using different parts of the implemented model.
Let us consider the case of GMP model as an example. Its cross-term part has the form
Besides the switchable cross terms, we can also designate some terms as “fixed terms.” As shown in Fig. 1, they are used by all input data and always turned on. For example, since the MP terms are not considered as cross terms in general, they can be set as fixed terms.
B. Coefficient Switching
In prior work [15]–[17], it has been demonstrated that, by switching the model coefficients based on signal characteristics, lower order models can achieve comparable or even better performance than higher order models.
Inspired by the previous progress, to further enhance the modeling capability, we use multiple sets of coefficients for each cross-term type, and let the model select the most suitable coefficient set for every data sample. Similar to the selection of cross-term type, which coefficient set to use is also determined by the controller. Thus, using multiple coefficient sets is equivalent to adding new branches to the cross-term switching architecture, except that the new branches have the same basis functions as existing ones. Hence, coefficient switching can also be understood as a generalization of cross-term switching technique. With extra degrees of freedom, it can further extend the modeling accuracy and flexibility of switched models.
A complete description of the model switching technique employing both types of switching mechanisms is shown in Fig. 3. The cross-term switching part controls which type of cross term to activate. When specific cross terms are selected, each nonlinear block can further choose the proper coefficients from a pool of coefficient sets.
Fig. 3. Model architecture with both cross-term and coefficient switching.
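A minimal sketch of how a switched model of this kind could be evaluated is given below. It assumes that every operation mode owns one coefficient vector covering both the fixed terms and the selected cross-term branch, and that the mode index of each sample is already available from the controller. The names `switched_dpd` and `mode_branch` and all numerical values are hypothetical.

```python
import numpy as np

def switched_dpd(fixed, branches, coefs, mode_branch, mode):
    """Evaluate a switched model sample by sample.

    fixed       : (N, F) basis of the always-on fixed terms
    branches    : list of (N, B) bases, one per cross-term type
    coefs       : list of (F + B,) coefficient vectors, one per mode
    mode_branch : mode_branch[k] is the cross-term branch used by mode k
    mode        : (N,) mode index chosen by the controller per sample
    """
    u = np.zeros(fixed.shape[0], dtype=complex)
    for k, c_k in enumerate(coefs):
        sel = mode == k
        if not sel.any():
            continue
        # Only the branch selected by mode k is evaluated for these samples.
        phi = np.hstack([fixed[sel], branches[mode_branch[k]][sel]])
        u[sel] = phi @ c_k
    return u

x = np.array([0.2, 0.9j, -0.5, 0.7 + 0.1j])
xd = np.concatenate([[0], x[:-1]])            # x delayed by one sample
fixed = x[:, None]                            # a single fixed (linear) term
branches = [(np.abs(xd) * x)[:, None],        # cross-term type 0
            (np.abs(xd) ** 2 * x)[:, None]]   # cross-term type 1
mode_branch = [0, 0, 1]   # modes 0 and 1 share branch 0 (coefficient switching)
coefs = [np.array([1, 0.1]), np.array([1, -0.2]), np.array([0.9, 0.05j])]
mode = np.array([0, 2, 1, 0])
u = switched_dpd(fixed, branches, coefs, mode_branch, mode)
```

In this arrangement, modes 0 and 1 illustrate coefficient switching (same basis, different coefficients), while mode 2 illustrates cross-term switching (a different basis).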
C. Realization of Switch Controller
Considering both cross-term and coefficient switching, the controller needs to choose the proper operation state from a number of possible configurations. In this work, we refer to each potential configuration as an operation mode. Assuming that there are
To build the switch controller, a multiclass classification model is needed. Given the input data, such a classification model will produce an integer output ranging from 1 to
In this work, decision tree is used to build the controller, and we use
To validate the choice of classification model, we compare three common models, including decision tree, support vector machine (SVM), and
D. Practical Implementation Considerations
In this new architecture, the hardware complexity and power consumption can be minimized with careful arrangement in digital circuit design. In many cases, the hardware implementation of a DPD model is composed of two stages, i.e., basis function generation and multiplication with model coefficients. To reduce power consumption, as shown in Fig. 5, the basis function generation blocks are adaptively turned on/off, e.g., by using clock gating [29]. Special attention may be required here to ensure that the selected basis functions are built just in time when they are required. The same control signal also decides the routing for the output signals of the basis functions. To realize coefficient switching, the switch controller generates another control signal, which is responsible for selecting the correct coefficient set. The selected coefficients and basis functions are then multiplied together. Thus, the coefficient multiplier can be shared by all cross-term branches. Moreover, while all the required basis function generation blocks are still implemented in the DPD hardware, only one cross-term type is activated for each sample, so the average power consumption is expected to be much lower than that of a full-size conventional model.
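The saving from activating only one cross-term type per sample can be illustrated with simple arithmetic on operation counts. In the sketch below, the per-sample cost is the fixed part plus the cost of the selected branch, averaged over the selection probabilities. All numbers are made-up placeholders, not measurements, and the function names are hypothetical. The controller cost can be folded into `fixed_cost`.

```python
def average_cost(fixed_cost, branch_costs, probs):
    """Expected per-sample cost when one branch is active at a time."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("selection probabilities must sum to 1")
    return fixed_cost + sum(p * c for p, c in zip(probs, branch_costs))

def full_cost(fixed_cost, branch_costs):
    """Per-sample cost when every branch is always active."""
    return fixed_cost + sum(branch_costs)

# Placeholder operation counts and selection probabilities.
fixed_cost = 40
branch_costs = [60, 50, 50]
probs = [0.5, 0.3, 0.2]

avg = average_cost(fixed_cost, branch_costs, probs)   # 40 + 30 + 15 + 10
full = full_cost(fixed_cost, branch_costs)            # 40 + 160
```

This only models operation counts. Actual power also depends on static power, routing, and how effectively the idle blocks are gated.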
Because of the flexibility of the model formulation, the arrangement of model terms can be conveniently adjusted for specific application scenarios. In real-world applications, it is advisable for system engineers to design different model terms that best suit individual use cases. For example, the leading envelope cross terms in the GMP model [6] may also be used. Different splitting feature formulations are also possible, as the only requirement for them is that they need to be real-valued. Moreover, the allocation of cross-term branches can be freely adjusted, and overlap between different branches is also allowed. For instance, one basis function can be assigned to two branches simultaneously to achieve higher performance. Also, if the number of switchable branches is limited, different GMP-type cross terms may be merged into a single branch.
Once the model structure is fixed, there may exist further room to optimize hardware utilization. For example, if we can switch among different GMP-type branches, hardware resources can be shared between the branches. As shown in Fig. 2, all cross-term branches can share the same multiplier when performing the multiplication between
Training of Switch Controller
To achieve optimal performance, the decision tree model of the switch controller needs to be jointly optimized with the model coefficients for all operation modes. To solve this optimization problem, an iterative alternate minimization framework is adopted.
For initialization, we need to choose a mode for every input data sample
With the initialized mode index, the optimization procedure iterates between two main steps.
1) Optimize the model coefficients and obtain the modeling residue for all operation modes.
2) Optimize the decision tree model of the switch controller.
In the first step, we estimate model coefficients with least squares (LS) for all modes. In the \begin{equation*} \mathbf {c_{k}^{t}}=\left ({\mathbf {\Phi _{k}^{t}} ^{\mathrm {H}}\mathbf {\Phi _{k}^{t}}}\right)^{-1}\mathbf {\Phi _{k}^{t}} ^{\mathrm {H}}\mathbf {y_{k}^{t}} \tag{7}\end{equation*}
The modeling residue can be calculated by \begin{equation*} \mathbf {e_{k}^{t}}=\mathbf {y}-\mathbf {\Phi _{k}}\mathbf {c_{k}^{t}} \tag{8}\end{equation*}
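The two computations in (7) and (8) can be sketched as follows. The sketch reads (7) as an LS fit over only the samples currently assigned to mode k, and (8) as the residue of that mode evaluated on all samples, which is an interpretation of the notation. The function name `fit_mode` and the boolean mask `assigned` are hypothetical. `numpy.linalg.lstsq` is used in place of the explicit normal equations. It returns the same LS solution when the regression matrix has full column rank and is numerically better behaved.

```python
import numpy as np

def fit_mode(Phi_k, y, assigned):
    """LS coefficients for one mode and its residue on all samples.

    Phi_k    : (N, Q) regression matrix of mode k for all samples
    y        : (N,) target signal
    assigned : (N,) boolean mask of samples currently assigned to mode k
    """
    # Eq. (7): least squares over the assigned samples only.
    c_k, *_ = np.linalg.lstsq(Phi_k[assigned], y[assigned], rcond=None)
    # Eq. (8): residue of this mode evaluated on every sample.
    e_k = y - Phi_k @ c_k
    return c_k, e_k

# Synthetic check: data generated by a known coefficient vector.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((50, 3)) + 1j * rng.standard_normal((50, 3))
c_true = np.array([1.0, -0.3 + 0.2j, 0.05j])
y = Phi @ c_true
assigned = np.arange(50) < 20
c_hat, e = fit_mode(Phi, y, assigned)
```

Evaluating the residue on all samples, not only the assigned ones, is what allows the modes to be compared sample by sample in the next step.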
Subsequently, in the second step, we train the switch controller. Different from the previous step, the decision tree controller is a classification model, so it is trained to produce the target labels, i.e., the index of desired operation mode. To find the target label for every input sample, we compare the residue generated by all modes and select the mode that results in the lowest modeling error. The mode index
The input data for training the tree are gathered in the splitting feature matrix
After preparing the training data
Since the training is a nonlinear optimization process, the system may converge to a local minimum, in which case manual tuning may be required. Nevertheless, in our experience, the algorithm converges within ten iterations in most cases. The complete description is shown in Algorithm 1. Finally, by applying (7) again, the model coefficients can optionally be fine-tuned after the main algorithm. The training of the switch controller only needs to be conducted at system startup. During real-time operation, the decision tree controller can be kept unchanged unless the behavior of the PA varies significantly. With the controller fixed, the model is linear in the coefficients, and the adaptation complexity is thus reduced to the same level as that of conventional DPD models using LS.
Algorithm 1 Training of Switch Controller
Decision tree model
Initialize mode index
repeat
  for
    for
      Generate
      Estimate model coefficients with (7)
      Calculate residue signal with (8)
    end for
  end for
  for
    Find mode with lowest modeling error and record the index as
  end for
  Gather all
  Decision tree model =
  Update mode index
until the error converges or the maximum number of iterations is reached
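The alternating procedure can be sketched in a few lines under strong simplifications. In the sketch below, a depth-1 tree (a single threshold on one real-valued feature) stands in for the full decision tree, only two operation modes are used, the splitting feature is the input magnitude, and the data are synthetic. The initialization, the stopping rule (unchanged mode assignment), the fallback for nearly empty modes, and all function names are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def fit_stump(feature, labels):
    """Fit a depth-1 tree: one threshold on a real feature, modes 0/1."""
    best = (np.inf, 0.0, 0)   # (misclassified count, threshold, mode below)
    for thr in np.unique(feature):
        above = feature > thr
        for low_mode in (0, 1):
            pred = np.where(above, 1 - low_mode, low_mode)
            err = np.count_nonzero(pred != labels)
            if err < best[0]:
                best = (err, thr, low_mode)
    return best[1], best[2]

def predict_stump(feature, thr, low_mode):
    return np.where(feature > thr, 1 - low_mode, low_mode)

def ls_fit(Phi, y, sel):
    """LS estimate as in (7), using only the rows selected by sel."""
    if np.count_nonzero(sel) < Phi.shape[1]:
        sel = np.ones(len(y), dtype=bool)   # fallback: too few samples
    c, *_ = np.linalg.lstsq(Phi[sel], y[sel], rcond=None)
    return c

def train_switched(Phis, y, feature, mode, max_iter=10):
    """Alternate between per-mode LS fits and controller training."""
    stump = None
    for _ in range(max_iter):
        coefs = [ls_fit(Phi, y, mode == k) for k, Phi in enumerate(Phis)]
        # Squared residue of every mode on every sample, as in (8).
        res = np.stack([np.abs(y - Phi @ c) ** 2
                        for Phi, c in zip(Phis, coefs)])
        labels = np.argmin(res, axis=0)      # best mode for each sample
        stump = fit_stump(feature, labels)   # controller learns the labels
        new_mode = predict_stump(feature, *stump)
        if np.array_equal(new_mode, mode):
            break
        mode = new_mode
    # Optional fine-tuning: refit coefficients for the final assignment.
    coefs = [ls_fit(Phi, y, mode == k) for k, Phi in enumerate(Phis)]
    return stump, coefs, mode

# Synthetic piecewise-linear data with a gain change at |x| = 0.6.
rng = np.random.default_rng(0)
N = 400
x = rng.uniform(0.05, 1.0, N) * np.exp(2j * np.pi * rng.uniform(size=N))
true_mode = (np.abs(x) > 0.6).astype(int)
y = np.where(true_mode == 0, 1.0, 0.5 + 0.2j) * x
Phis = [x[:, None], x[:, None]]          # same basis, two coefficient sets
init = (np.abs(x) > 0.5).astype(int)     # deliberately misplaced start
stump, coefs, mode = train_switched(Phis, y, np.abs(x), init)
```

On this noiseless toy problem, the loop moves the threshold from the misplaced starting point to the true breakpoint. With measured PA data, the residues of different modes are much closer, so convergence to a local minimum is possible, as noted in the text.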
Experimental Results
A. Experimental Setup
To validate the model performance, a test platform was set up, as shown in Fig. 6, which includes a PC, a signal generator, a driver amplifier, the PA, an attenuator, and a spectrum analyzer. Two in-house designed broadband gallium nitride (GaN) Doherty PAs were used as the devices under test. The first PA operated at 2.2 GHz with 36-dBm output power and 48% drain efficiency [31], whereas the second PA operated at 3.5 GHz with 34-dBm output power and 42% drain efficiency [32]. The excitation input signals were five-carrier 100-MHz orthogonal frequency-division multiplexing (OFDM) signals with 8-dB peak-to-average power ratio (PAPR). The sampling rate was 400 MHz. Recorded I/Q input and output samples were time-aligned and normalized before training the model. The model extraction was performed using a closed-loop estimator in MATLAB. During the test, 80 000 samples were used for model extraction and another set of 80 000 samples was used for performance evaluation. In the DPD tests, we employ both normalized mean square error (NMSE) and adjacent channel power ratio (ACPR) as performance metrics. In the complexity analysis, we follow the same methodology as in [33], using the number of floating-point operations (FLOPs) per sample. The estimated power consumption of the models on a field-programmable gate array (FPGA) is also reported.
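For reference, the NMSE metric can be computed as sketched below, using its usual definition as the ratio of error energy to reference energy expressed in decibels. The function name `nmse_db` and the sample values are illustrative only.

```python
import numpy as np

def nmse_db(y, y_hat):
    """NMSE in dB: 10*log10( sum|y - y_hat|^2 / sum|y|^2 )."""
    y = np.asarray(y, dtype=complex)
    y_hat = np.asarray(y_hat, dtype=complex)
    err = np.sum(np.abs(y - y_hat) ** 2)
    ref = np.sum(np.abs(y) ** 2)
    return 10.0 * np.log10(err / ref)

# Error energy 0.01 against reference energy 2 gives 10*log10(0.005).
value = nmse_db([1.0, 1j], [0.9, 1j])
```

A more negative value indicates a closer match between the two signals.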
In the experimental comparison, different model configurations were considered, which are detailed in Table III. The available cross terms include the GMP-type lagging cross terms in (4) and two DDR-type cross terms in (5) and (6). The proposed real-time model switching technique was applied to two models, i.e., “GMP (all terms)” and “GMP-DDR (all terms).” The resulting models are referred to as “switched GMP model” and “switched GMP-DDR model,” respectively. In both situations, only one type of cross term was selected for each data sample from all available terms. MP terms were designated as fixed terms. Note that our configurations were mainly used as a prototype to illustrate the effectiveness of the proposed framework. In practice, the cross terms to be used are not limited to the GMP and DDR types. Many other types of cross terms can be used.
In the test, the polynomial order
B. Experimental Results on PA 1
The proposed DPD method was first compared with conventional DPD methods on the first PA. Spectral results using GMP models are shown in Fig. 7. In this comparison, conventional models with three different settings were tested. The first model used the MP terms only, while the other two models employed additional GMP-type cross terms. The results show that the proposed switched GMP model achieved a level of linearization performance similar to that of the GMP model with all terms and performed better than the conventional model with similar complexity. This demonstrates that the model switching technique can effectively reduce the model complexity with little compromise in accuracy.
We then present the results obtained with GMP-DDR models. In this comparison, we also tested the conventional models with different settings. The spectral results are compared in Fig. 8. It reveals that the switched GMP-DDR model achieved better performance than the conventional models with similar complexity. The detailed results of this case are given in Table IV.
The AM–AM and AM–PM curves with and without the switched GMP-DDR model are shown in Fig. 9. The spectral and time-domain AM–AM results show that the PA exhibits a significant memory effect, but the proposed method can still offer good linearization performance.
Fig. 9. AM–AM and AM–PM characteristics with and without the switched GMP-DDR model on PA 1.
Finally, we analyze the selection of different cross-term branches. In Fig. 10, the selection of cross-term types for test signals is displayed. It shows that the first GMP branch was selected most often. It suggests that the GMP term branch is most useful, which agrees with the comparison made in Table IV. Moreover, as all branches are used, it validates our earlier argument that different data samples are best suited to different DPD model structures and justifies the improvement of the proposed switching method.
C. Experimental Results on PA 2
The DPD test and experimental comparison were then conducted on the second PA. Fig. 11 shows the spectral results. By adopting the model switching technique, the modified model achieves linearization performance comparable to that of the more complex full-size GMP model and noticeably better than that of the conventional model with similar complexity.
When the GMP-DDR model was used, the proposed method also achieved excellent performance. The spectral results are shown in Fig. 12. The AM–AM and AM–PM curves in this case are shown in Fig. 13. A detailed performance and complexity comparison is given in Table V. The switched GMP-DDR model again achieved better performance than the conventional models with similar complexity. In fact, the proposed method achieves NMSE and ACPR performance similar to that of the full-size GMP-DDR model with less than 60% of the complexity.
Fig. 13. AM–AM and AM–PM characteristics with and without the switched GMP-DDR model on PA 2.
The selection of cross-term branches in the GMP-DDR model in this test is given in Fig. 14. We notice that the distribution is very different from that in Fig. 10. In this case, most data samples used DDR-2 terms. It again matches the comparison in Table V, which shows that the DDR-2 model has better accuracy than DDR-1 and GMP models. Thus, the results clearly demonstrate that the proposed method can accurately identify the important cross terms. As the selection is performed automatically during the training process, the proposed method can bring more flexibility into the DPD model.
D. Complexity and Power Consumption Analysis
To draw a clearer comparison of power consumption between the proposed method and conventional DPD models, we estimate the power consumption of the DPD block using Xilinx Power Estimator [34]. The resource utilization is estimated by counting the required resources for the IP cores of the computational components, e.g., adders, multipliers, and complex multipliers. The FPGA device employed was the Virtex-7 XC7VX485T. Each I/Q sample of both input and output was kept as 32-bit data, where the real and imaginary parts each had 16 bits. A 50% toggle rate and a 400-MHz sampling rate are assumed for power estimation. For the proposed switched model, the toggle rate of different cross-term branches is set in accordance with the probability given in Figs. 10 and 14. It is worth noting that the complexity comparison and power estimation here are indicative and aim to show relative numbers and trends only. The absolute numbers and power consumption depend on the specific model used and the actual circuit implementation in real applications.
The comparison for both PAs is given in Tables VI and VII. Compared with the full-size GMP-DDR models, the proposed switched model used only around 60% of the DSP units and reduced the power consumption by 40.4% and 38.3% for the two cases. For reference, the results for using just one type of cross term without switching are also listed. These results show that the real-time model switching technique can greatly reduce the required hardware utilization and power consumption while maintaining comparable linearization performance.
Conclusion
In this article, we present a novel real-time model switching approach to reduce the computational complexity of DPD model implementation. By combining the cross-term switching and coefficient switching techniques, the model can dynamically select the most suitable cross terms and coefficient set for every input data sample. The iterative alternate minimization framework is also shown to be an effective method to optimize the proposed DPD model. Therefore, by applying the proposed method, the power consumption of DPD models can be greatly reduced with little cost in performance.