Abstract:
248 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-24, NO. 2, MARCH 1978

... automaton that is close to optimal and eliminates the need for artificial randomization was also provided. This automaton is close to optimal in the sense that it requires at most 2 extra bits of memory, independent of m, to match the performance of the optimal randomized m-state automaton for all p_A and p_B.

Both problems studied here, however, involve only 2 coins. How to extend the results of this paper to situations involving more than 2 coins is an open question. Some ad hoc expedient automata are available in the literature [6], [7]. Before an optimal solution to the many-armed bandit problem is possible, the problem of multiple hypothesis testing with finite memory needs to be solved. For some recent results concerning this problem, see [13].

Further, finite-time finite-memory solutions to these problems are of interest. Vasilev [14] and Witten [15] studied the finite-time behavior of some solutions to the TABP. No optimal solution, however, is available. Some recent progress has been reported by Cover et al. [16].

ACKNOWLEDGMENT

The authors thank the referees for comments which helped to improve the paper.

APPENDIX

Denote by p(a; p_A, p_B) the asymptotic proportion of heads achieved, given the coins A and B and the automaton a. Even though p(a; p_A, p_B) is maximized over all m-state automata if and only if r(a; p_A, p_B) is maximized, maximizing {inf p(a; p_A, p_B)} is not necessarily equivalent to maximizing {inf r(a; p_A, p_B)}, where the infimum is over {(p_A, p_B)}. In fact, for the TABPO, where p_B is known precisely, an automaton that maximizes {inf p(a; p_A, p_B)} tosses coin B exclusively. Furthermore, this automaton is not even expedient, and thus in some sense this solution is unsatisfactory.

REFERENCES

[1] H. Robbins, "Some aspects of the sequential design of experiments," Bull. Am. Math. Soc., vol. 58, pp. 527-535, 1952.
[2] H. Robbins, "A sequential decision problem with a finite memor...
Published in: IEEE Transactions on Information Theory ( Volume: 24, Issue: 2, March 1978)