Abstract:
Advanced power amplifier (PA) architectures are critical for 5G communication, which requires PAs to deliver sufficient output power, high linearity, and high energy efficiency. The time-variant operational environment further demands self-reconfigurability in the PA design. Recent developments in bandit problems and reinforcement learning (RL) have led to data-driven control algorithms. This paper presents several RL-based algorithms for Doherty PA control that achieve robust adaptive operation under environmental changes. The control algorithms incorporate multiple RL frameworks, including the multi-armed bandit (MAB), continuum-armed bandit (CAB), contextual bandit (CB), and actor-critic with experience replay (AC). The algorithms based on the latter three frameworks leverage prior information about the Doherty PA's characteristics to improve learning efficiency. In our simulation test, where the optimal policy must be adjusted due to transmitter output load impedance mismatch, the MAB-based control learns the optimal policy within 25,000 samples, and the CAB- and CB-based controls learn it within 5,000 samples. The fastest learning rate is achieved by the AC-based control, which learns the optimal policy within 1,500 samples.
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers ( Volume: 67, Issue: 12, December 2020)
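To illustrate the kind of data-driven control the abstract describes, the sketch below implements a generic epsilon-greedy multi-armed bandit loop. This is not the paper's algorithm: the arm set, the reward function, and all numbers here are hypothetical stand-ins (each arm representing a discrete PA configuration, the reward a noisy scalar such as measured efficiency), intended only to show the MAB learning structure under those assumptions.

```python
import random

def epsilon_greedy_bandit(reward_fn, n_arms, n_steps, epsilon=0.1, seed=0):
    """Generic epsilon-greedy MAB loop.

    Each arm stands in for a hypothetical discrete PA configuration;
    reward_fn(arm, rng) returns a noisy scalar reward (e.g., a
    measured efficiency figure). Returns the running mean reward
    and pull count for every arm.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return values, counts

# Hypothetical noisy reward: arm 2 is the best configuration on average.
def toy_reward(arm, rng):
    means = [0.3, 0.5, 0.8, 0.4]
    return means[arm] + rng.gauss(0, 0.05)

values, counts = epsilon_greedy_bandit(toy_reward, n_arms=4, n_steps=25000)
best_arm = max(range(4), key=lambda a: values[a])
```

The 25,000-step horizon mirrors the sample budget the abstract reports for the MAB-based control; the CAB, CB, and AC variants improve on this by exploiting prior knowledge of the PA's characteristics rather than treating the arms as unrelated.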