Skip to Main Content
The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge here is to construct individualized treatment rules (ITR), which recommend the best treatment for each of the different categories of individuals. In general, ITRs can be constructed using data from clinical trials, however these are generally very costly to run. In order to reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that utilize a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals. We contrast this policy with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.