Abstract:
We study best-arm identification in a distributed variant of the multi-armed bandit setting with a central learner and multiple agents. Each agent is associated with one arm of the bandit and generates stochastic rewards from a distribution that is a priori unknown to the learner. Each agent can communicate its observed rewards to the learner over a bit-constrained channel. We propose a novel quantization scheme, ICQ, that can be applied to existing confidence-bound-based learning algorithms such as Successive Elimination and requires only an exponentially sparse frequency of communication between the learner and the agents. We analyze ICQ applied to Successive Elimination and show that the resulting algorithm, ICQ-SE, has order-optimal sample complexity and uses considerably fewer bits than existing quantization schemes to identify the best arm. Numerical experiments corroborate these findings.
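For readers unfamiliar with the base algorithm named in the abstract, the following is a minimal sketch of textbook Successive Elimination with full-precision rewards. It is not the ICQ or ICQ-SE scheme, whose quantization and communication schedule the abstract does not specify. The function and parameter names (`successive_elimination`, `pull`, `delta`, `max_rounds`), the assumption that rewards lie in [0, 1], and the Hoeffding-style confidence radius are illustrative choices, not taken from the paper.

```python
import math
import random


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=100_000):
    """Return the index of the arm believed to be best.

    Assumption: pull(i) returns a stochastic reward in [0, 1] for arm i.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        # Sample every surviving arm once per round, so each has t samples.
        for i in active:
            sums[i] += pull(i)
        # Hoeffding-style radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best_mean = max(sums[i] / t for i in active)
        # Keep only arms whose upper bound reaches the leader's lower bound.
        active = [i for i in active if sums[i] / t + radius >= best_mean - radius]
        if len(active) == 1:
            return active[0]
    # Budget exhausted: fall back to the empirically best surviving arm.
    return max(active, key=lambda i: sums[i] / max_rounds)


if __name__ == "__main__":
    # Hypothetical Bernoulli arms; the learner does not see these means.
    means = [0.2, 0.5, 0.8]

    def bernoulli_pull(i):
        return 1.0 if random.random() < means[i] else 0.0

    print(successive_elimination(bernoulli_pull, len(means)))
```

In this sketch every reward reaches the learner at full precision in every round. In the distributed setting of the abstract, the `pull` call is where each agent's reward would instead cross the bit-constrained channel.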
Published in: 2023 21st International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)
Date of Conference: 24-27 August 2023
Date Added to IEEE Xplore: 22 December 2023