Policy improvement in discrete event dynamic systems is usually based on simulation, which is time-consuming and provides only noisy performance estimates. For a given system state, it is of great practical interest to understand how to allocate the computing budget among candidate actions so that the best action is correctly selected with high probability. Despite abundant studies on simulation-based policy optimization, few address this important allocation problem, which is the subject of this paper. We develop a method of optimal computing budget allocation for policy improvement (OCBAPI), which is shown to asymptotically maximize a lower bound on the probability of correctly selecting the best action. OCBAPI can also be used when multiple base policies are available. The allocation procedure is compared with equal allocation and proportional-to-variance allocation on an academic toy example and an engine maintenance policy optimization problem. The numerical results show that OCBAPI performs well even when only a finite computing budget is available. We hope this work brings insight into computing budget allocation for simulation-based policy improvement in more general situations.
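To make the allocation problem concrete, the following is a minimal Python sketch of the two baseline rules named above (equal allocation and proportional-to-variance), together with the classical OCBA ratio rule from the ranking-and-selection literature. It is not the OCBAPI procedure itself, whose exact form is not given in this abstract. All function names are hypothetical, and the sketch assumes that a smaller estimated mean (a cost) is better, that every standard deviation is positive, and that no two estimated means are tied with the best one.

```python
import math


def _round_shares(weights, budget):
    """Turn non-negative weights into integer replication counts that
    sum exactly to `budget` (largest-remainder rounding)."""
    total = sum(weights)
    raw = [budget * w / total for w in weights]
    alloc = [int(r) for r in raw]  # floor of each fractional share
    leftover = budget - sum(alloc)
    # Hand the remaining replications to the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def equal_allocation(budget, num_actions):
    """Baseline: every candidate action gets the same share."""
    return _round_shares([1.0] * num_actions, budget)


def proportional_to_variance(budget, variances):
    """Baseline: share proportional to each action's sample variance."""
    return _round_shares(list(variances), budget)


def classic_ocba(budget, means, stds):
    """Classical OCBA ratios (assumption: NOT the paper's OCBAPI rule).

    With b the observed best action and delta_i = mean_i - mean_b:
      N_i proportional to (std_i / delta_i)^2          for i != b
      N_b = std_b * sqrt(sum_{i != b} (N_i / std_i)^2)
    """
    k = len(means)
    best = min(range(k), key=lambda i: means[i])  # assumes lower is better
    weights = [0.0] * k
    for i in range(k):
        if i != best:
            delta = means[i] - means[best]
            weights[i] = (stds[i] / delta) ** 2
    weights[best] = stds[best] * math.sqrt(
        sum((weights[i] / stds[i]) ** 2 for i in range(k) if i != best)
    )
    return _round_shares(weights, budget)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    means = [1.0, 2.0, 3.0]
    stds = [1.0, 1.0, 1.0]
    print(equal_allocation(100, 3))
    print(proportional_to_variance(100, [s * s for s in stds]))
    print(classic_ocba(100, means, stds))
```

In this sketch the classical rule concentrates replications on the observed best action and its closest competitor, whereas the two baselines ignore the gaps between the estimated means altogether.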