Abstract:
We model the ad selection task as a multi-armed bandit problem. Standard assumptions in the multi-armed bandit (MAB) setting are that samples drawn from each arm are independent and identically distributed, rewards (or conversion rates in our scenario) are stationary, and reward feedback is immediate. Although the payoff function of an arm is allowed to evolve over time, the evolution is assumed to be slow. Display ads, on the other hand, are regularly created while others are removed from circulation. This can occur when budgets run out, campaign goals change, a holiday season ends, or because of many other latent factors beyond the control of the ad selection system. Another big challenge is that the set of available ads is often extremely large, yet standard multi-armed bandit strategies converge with linear time complexity and cannot accommodate such dynamic changes. Due to the above challenges and the restrictions of the original MAB, we propose a novel dynamic contextual MAB that tightly integrates components for dynamic conversion rate prediction, contextual learning, and arm overlap modeling in a principled framework. In addition, we propose an accompanying meta-analysis framework that allows us to draw conclusions from experiments in a more statistically robust manner. We demonstrate on a world-leading demand side platform (DSP) that our framework can effectively discriminate premium arms and significantly outperforms standard variations of MAB adapted to these settings.
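To make the MAB framing of ad selection concrete, the sketch below shows a conventional Thompson sampling baseline with Beta posteriors over conversion rates, a discount factor for non-stationary rewards, and arm addition/removal as ads enter and leave circulation. This is only an illustrative baseline under assumed class and parameter names (DiscountedThompsonSampler, gamma); it is not the paper's dynamic contextual framework.

```python
import random

class DiscountedThompsonSampler:
    """Illustrative Thompson sampling over ads ("arms") with Beta posteriors
    on conversion rates. The discount factor gamma down-weights old feedback,
    a common heuristic for non-stationary rewards. This is a hypothetical
    baseline sketch, not the paper's proposed method."""

    def __init__(self, ad_ids, gamma=0.99):
        self.gamma = gamma                       # forgetting factor for non-stationarity
        self.alpha = {ad: 1.0 for ad in ad_ids}  # pseudo-counts of conversions
        self.beta = {ad: 1.0 for ad in ad_ids}   # pseudo-counts of non-conversions

    def add_arm(self, ad_id):
        # Ads are created and retired over time; a new arm starts from the prior.
        self.alpha.setdefault(ad_id, 1.0)
        self.beta.setdefault(ad_id, 1.0)

    def remove_arm(self, ad_id):
        self.alpha.pop(ad_id, None)
        self.beta.pop(ad_id, None)

    def select(self):
        # Sample a plausible conversion rate per ad and display the best one.
        return max(self.alpha,
                   key=lambda ad: random.betavariate(self.alpha[ad], self.beta[ad]))

    def update(self, ad_id, converted):
        # Decay all counts toward the prior, then credit the shown ad's feedback.
        for ad in self.alpha:
            self.alpha[ad] = 1.0 + self.gamma * (self.alpha[ad] - 1.0)
            self.beta[ad] = 1.0 + self.gamma * (self.beta[ad] - 1.0)
        if converted:
            self.alpha[ad_id] += 1.0
        else:
            self.beta[ad_id] += 1.0


# Toy usage with three ads and hidden (synthetic) conversion rates.
if __name__ == "__main__":
    true_cvr = {"ad_a": 0.02, "ad_b": 0.05, "ad_c": 0.01}
    bandit = DiscountedThompsonSampler(true_cvr.keys(), gamma=0.995)
    for _ in range(10_000):
        ad = bandit.select()
        bandit.update(ad, random.random() < true_cvr[ad])
```

Such a baseline converges in time roughly linear in the number of arms, which motivates the abstract's point that standard MAB strategies struggle when the ad pool is very large and constantly changing.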
Date of Conference: 12-15 December 2016
Date Added to IEEE Xplore: 02 February 2017
Electronic ISSN: 2374-8486