Machine learning and nonparametric bandit theory

Authors: Tze-Leung Lai (Dept. of Stat., Stanford Univ., CA, USA); S. Yakowitz

In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that the regret, i.e., $R_n = \sum_j (\mu^* - \mu_j)\, E\,T_n(j)$, grows as slowly as possible with increasing $n$. Here $\mu_j$ is the expected value of the bandit arm (i.e., random variable) indexed by $j$, $T_n(j)$ is the number of times arm $j$ has been selected in the first $n$ decision stages, and $\mu^* = \sup_j \mu_j$. The present paper contributes to the theory by considering the situation in which observations are dependent. To begin with, the dependence is presumed to be only on past observations of the same arm; later, we allow it to be with respect to the entire past and allow the set of arms to be infinite. This brings queues and, more generally, controlled Markov processes into our purview. Thus our “black-box” methodology is suitable for the case when the only observables are cost values and, in particular, the probability structure and loss function are unknown to the designer. The conclusion of the analysis is that under lenient conditions, using algorithms prescribed herein, risk growth is commensurate with that in the simplest i.i.d. cases. Our methods represent an alternative to stochastic-approximation/perturbation-analysis ideas for tuning queues.
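To make the regret definition above concrete, the following Python sketch evaluates the sum over arms of the gap between the best mean and each arm's mean, weighted by how often that arm was pulled. It is an illustration only and not the algorithm of the paper: it assumes a finite set of independent Bernoulli arms (the simplest i.i.d. case) and uses the standard UCB1 index rule as a stand-in policy. The function names `pseudo_regret` and `ucb1_pulls` are hypothetical, and a single simulated run gives realized pull counts, so the result is a one-sample estimate of the expected quantity in the definition.

```python
import math
import random


def pseudo_regret(means, pulls):
    """Return sum_j (mu* - mu_j) * T_n(j) for a finite set of arms.

    `means` holds the arm means mu_j and `pulls` the pull counts T_n(j).
    With realized counts from one run this estimates the expected regret.
    """
    best = max(means)  # mu* for a finite arm set
    return sum((best - mu) * t for mu, t in zip(means, pulls))


def ucb1_pulls(means, n, seed=0):
    """Run UCB1 on Bernoulli arms for n stages and return the pull counts.

    Assumption: i.i.d. Bernoulli rewards; this is a textbook stand-in,
    not the dependent-observation setting treated in the paper.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, n + 1):
        if t <= k:
            j = t - 1  # play each arm once to initialise its estimate
        else:
            # Pick the arm with the largest upper confidence index.
            j = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[j] else 0.0
        pulls[j] += 1
        totals[j] += reward
    return pulls


if __name__ == "__main__":
    arm_means = [0.9, 0.1]  # hypothetical arm means
    counts = ucb1_pulls(arm_means, n=200)
    print("pull counts:", counts)
    print("estimated regret:", pseudo_regret(arm_means, counts))
```

For example, with means 0.5 and 0.25 and pull counts 6 and 4, the sum is (0.5 - 0.25) * 4 = 1.0, since the best arm contributes nothing.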

Published in: IEEE Transactions on Automatic Control (Volume: 40, Issue: 7)