Skip to Main Content
We propose novel multi-armed bandit (explore/exploit) schemes to maximize total clicks on a content module published regularly on Yahoo! Intuitively, one can "explore'' each candidate item by displaying it to a small fraction of user visits to estimate the item's click-through rate (CTR), and then "exploit'' high CTR items in order to maximize clicks. While bandit methods that seek to find the optimal trade-off between explore and exploit have been studied for decades, existing solutions are not satisfactory for Web content publishing applications where dynamic set of items with short lifetimes, delayed feedback and non-stationary reward (CTR) distributions are typical. In this paper, we develop a Bayesian solution and extend several existing schemes to our setting. Through extensive evaluation with nine bandit schemes, we show that our Bayesian solution is uniformly better in several scenarios. We also study the empirical characteristics of our schemes and provide useful insights on the strengths and weaknesses of each. Finally, we validate our results with a "side-by-side'' comparison of schemes through live experiments conducted on a random sample of real user visits to Yahoo!