Abstract:
We study a novel multi-armed bandit (MAB) setting that requires the agent to probe all the arms periodically in a non-stationary environment. In particular, we develop TS-GE, which balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. Once a system-level change is detected, the changed arm is identified by an optional subroutine called group exploration (GE), which scales as \log_{2}(K) for a K-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. The change-detection latency is upper bounded by \sqrt{T}, and within any period of \sqrt{T}, every arm is probed at least once. We highlight the conditions under which the regret guarantee of TS-GE outperforms that of state-of-the-art algorithms, in particular ADSWITCH and M-UCB. Furthermore, unlike existing bandit algorithms, TS-GE can be deployed for applications such as timely status updates, critical control, and wireless energy transfer, which are essential features of next-generation wireless communication networks.
Published in: 2023 21st International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)
Date of Conference: 24-27 August 2023
Date Added to IEEE Xplore: 22 December 2023