Multi-armed bandits with switching penalties

2 Author(s)
Asawa, M.; Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA; Teneketzis, D.

The multi-armed bandit problem with switching penalties (switching costs and switching delays) is investigated. It is shown that under an optimal policy, decisions about the processor allocation need to be made only at stopping times that achieve an appropriate index: the well-known “Gittins index” or a “switching index” that is defined for switching costs and switching delays. An algorithm for the computation of the “switching index” is presented. Furthermore, sufficient conditions for optimality of allocation strategies, based on limited look-ahead techniques, are established. These conditions, together with the above-mentioned feature of optimal scheduling policies, simplify the search for an optimal allocation policy. For a special class of multi-armed bandits (scheduling of parallel queues with switching penalties and no arrivals), it is shown that the aforementioned property of optimal policies is sufficient to determine an optimal allocation strategy. In general, the determination of optimal allocation policies remains a difficult and challenging task.
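To make the idea of an index "achieved at a stopping time" concrete, the following is a minimal sketch for a single arm whose rewards form a known deterministic sequence. It is not the algorithm presented in the paper, which the abstract does not describe. The function names, the discount factor `beta`, and in particular the assumed form of the switching index (discounted reward net of a one-time switching cost, divided by discounted time including the switching delay) are illustrative assumptions.

```python
from typing import Sequence, Tuple


def switching_index(rewards: Sequence[float], beta: float,
                    cost: float = 0.0, delay: int = 0) -> Tuple[float, int]:
    """Return (index, tau) for one arm with a deterministic reward sequence.

    Assumed form (hypothetical, for illustration only):
        max over tau >= 1 of
        (-cost + sum_{t<tau} beta^(delay+t) * r_t) / sum_{t<delay+tau} beta^t
    where tau is the number of consecutive plays after switching to the arm.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")

    # Discounted time consumed by the delay, during which no reward is earned.
    den = sum(beta ** t for t in range(delay))
    num = -cost                 # the switching cost is paid once, up front
    disc = beta ** delay        # discount applied to the first reward
    best_rate, best_tau = float("-inf"), 0

    for tau, r in enumerate(rewards, start=1):
        num += disc * r
        den += disc
        disc *= beta
        rate = num / den        # discounted reward per unit of discounted time
        if rate > best_rate:
            best_rate, best_tau = rate, tau
    return best_rate, best_tau


def gittins_index(rewards: Sequence[float], beta: float) -> Tuple[float, int]:
    """Gittins index of a deterministic arm: the penalty-free special case."""
    return switching_index(rewards, beta, cost=0.0, delay=0)
```

Under these assumptions, the returned `tau` is the number of plays at which the index is achieved, that is, the earliest point at which a decision to leave the arm would be reconsidered. With non-negative rewards, a positive cost or delay can only lower the index relative to the penalty-free Gittins index, which reflects the reluctance to switch arms.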

Published in:

Automatic Control, IEEE Transactions on (Volume: 41, Issue: 3)