Model-Based Reinforcement Learning (MBRL) can profit greatly from world models that estimate the consequences of selecting particular actions: an animat can construct such a model from its experiences and use it to compute rewarding behavior. We study the problem of collecting useful experiences through exploration in stochastic environments. To this end, we use MBRL to maximize exploration rewards (in addition to environmental rewards) for visits to states that promise information gain. We also combine MBRL with the Interval Estimation algorithm (Kaelbling, 1993). Experimental results demonstrate the advantages of our approaches.
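To make the Interval Estimation idea concrete, the following is a minimal sketch of optimistic action selection: each action is scored by the upper end of a confidence interval on its estimated success rate, and the highest-scoring action is chosen. It is an illustration under stated assumptions, not necessarily the variant used in the experiments: rewards are assumed to be binary, the interval is a simple normal approximation, and the names `ie_upper_bound`, `select_action`, `stats`, and the parameter `z` are hypothetical.

```python
import math


def ie_upper_bound(successes, trials, z=1.96):
    """Upper end of a normal-approximation confidence interval for a success rate.

    Assumption: binary rewards and a Wald-style interval; z=1.96 corresponds
    to a two-sided 95% interval.
    """
    if trials == 0:
        # An untried action gets the most optimistic possible value.
        return 1.0
    p = successes / trials
    # Clip at 1.0 because a success rate cannot exceed one.
    return min(1.0, p + z * math.sqrt(p * (1.0 - p) / trials))


def select_action(stats, z=1.96):
    """Return the action with the highest upper bound.

    stats maps each action to a (successes, trials) pair.
    """
    return max(stats, key=lambda a: ie_upper_bound(*stats[a], z=z))


# Hypothetical counts: "left" is well explored, "right" has few trials.
stats = {"left": (80, 100), "right": (3, 5)}
chosen = select_action(stats)
```

With these counts the sketch chooses `right`: its empirical success rate is lower (0.6 versus 0.8), but its interval is much wider because it has been tried only five times, so its upper bound is higher. This is the sense in which the rule directs exploration toward poorly sampled actions.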