Online Learning of Rested and Restless Bandits

Authors: C. Tekin (Dept. of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, USA); Mingyan Liu

In this paper, we study the online learning problem involving rested and restless bandits, in both a centralized and a decentralized setting. In the centralized setting, the system consists of a single player/user and a set of K finite-state, discrete-time Markov chains (arms) with unknown state spaces (rewards) and statistics. The player's objective is to decide at each step which M of the K arms to play over a sequence of trials so as to maximize its long-term reward. In the decentralized setting, multiple uncoordinated players each make their own decision on which arm to play at each step; if two or more players select the same arm simultaneously, a collision results and none of the players selecting that arm gets a reward. Again, the objective of each player is to maximize its long-term reward. We first show that logarithmic-regret algorithms exist for both the centralized rested and the centralized restless bandit problems. For the decentralized setting, we propose an algorithm with logarithmic regret with respect to the optimal centralized arm allocation. Numerical results and an extensive discussion are also provided to highlight the insights obtained from this study.
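To make the setting concrete, here is a minimal sketch of the centralized rested case and the decentralized collision rule. It is not the paper's algorithm: it swaps in a generic UCB1-style index (sample mean plus a confidence bonus) for choosing M of K arms, and every name in it (`RestedMarkovArm`, `ucb_top_m`, `regret`, `collision_rewards`) is hypothetical. Each arm is assumed to be a two-state Markov chain whose state advances only when it is played (rested), and regret is measured against always playing the M arms with the highest stationary mean rewards. Restless arms, whose states evolve whether or not they are played, need more care and are not attempted here.

```python
import math
import random
from collections import Counter


class RestedMarkovArm:
    """Hypothetical two-state arm; rested: its chain advances only when played."""

    def __init__(self, p01, p10, rewards, rng):
        self.p01 = p01          # P(state 0 -> state 1)
        self.p10 = p10          # P(state 1 -> state 0)
        self.rewards = rewards  # (reward in state 0, reward in state 1)
        self.rng = rng
        self.state = 0

    def stationary_mean(self):
        """Expected reward under the chain's stationary distribution."""
        pi1 = self.p01 / (self.p01 + self.p10)
        return (1 - pi1) * self.rewards[0] + pi1 * self.rewards[1]

    def play(self):
        reward = self.rewards[self.state]
        # Rested arm: the state transition happens only on a play.
        if self.state == 0 and self.rng.random() < self.p01:
            self.state = 1
        elif self.state == 1 and self.rng.random() < self.p10:
            self.state = 0
        return reward


def ucb_top_m(arms, m, horizon):
    """At each step, play the m arms with the largest UCB1-style index."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        # Unplayed arms get an infinite index so each is sampled at least once.
        idx = [
            math.inf if counts[i] == 0
            else sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            for i in range(k)
        ]
        chosen = sorted(range(k), key=idx.__getitem__, reverse=True)[:m]
        for i in chosen:
            r = arms[i].play()
            counts[i] += 1
            sums[i] += r
            total += r
    return total, counts


def regret(arms, m, horizon, total_reward):
    """Gap to always playing the m arms with the highest stationary means."""
    best = sorted((a.stationary_mean() for a in arms), reverse=True)[:m]
    return horizon * sum(best) - total_reward


def collision_rewards(choices, arm_rewards):
    """Decentralized step: choices[j] is player j's arm; colliding players get 0."""
    load = Counter(choices)
    return [arm_rewards[a] if load[a] == 1 else 0.0 for a in choices]


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative arms with stationary means of about 0.9, 0.8, 0.2 and 0.1.
    arms = [RestedMarkovArm(p, 1 - p, (0.0, 1.0), rng) for p in (0.9, 0.8, 0.2, 0.1)]
    total, counts = ucb_top_m(arms, m=2, horizon=5000)
    print("plays per arm:", counts)
    print("regret:", regret(arms, 2, 5000, total))
    print(collision_rewards([0, 1, 0], [5.0, 2.0]))
```

In this toy run the two best arms should receive most of the plays, with suboptimal arms sampled only occasionally as their confidence bonus grows. The last line shows the collision model from the abstract: players 0 and 2 both pick arm 0, so both receive nothing, while player 1 alone on arm 1 collects its reward.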

Published in:

IEEE Transactions on Information Theory (Volume 58, Issue 8)