Abstract:
We consider the problem of learning the optimal robust value function and the optimal robust policy in a discounted-reward Robust Markov Decision Process (RMDP). The goal of the RMDP framework is to find a policy that is robust to parameter uncertainties arising from the mismatch between the simulator model and real-world settings. While the optimal robust value function and policy can be computed using robust dynamic programming, this requires exact knowledge of the nominal simulator model and the uncertainty set around it. This paper proposes a model-based robust reinforcement learning algorithm that learns an ϵ-optimal robust value function and policy in a finite state and action space setting when the nominal simulator model is not exactly known. We assume access to a standard generative sampling model, which can generate next-state samples for all state-action pairs of the nominal simulator model. We give a precise characterization of the sample complexity of obtaining an ϵ-optimal robust value function and policy using our algorithm. Finally, we demonstrate the performance of our algorithm on some benchmark problems.
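The sketch below is a minimal, illustrative rendering of the model-based approach the abstract describes: estimate a nominal transition model from generative-model samples, then run robust value iteration with a worst-case Bellman backup. It is not the authors' exact algorithm; the function names, the L1-ball uncertainty set of radius rho around the empirical model, and the approximate inner minimization are assumptions made here for illustration.

```python
import numpy as np

def estimate_nominal_model(generative_sampler, S, A, n_samples):
    """Estimate the nominal transition kernel P_hat[s, a, s'] from
    n_samples generative-model draws per state-action pair."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                s_next = generative_sampler(s, a)  # draws s' ~ P(. | s, a)
                P_hat[s, a, s_next] += 1.0
            P_hat[s, a] /= n_samples
    return P_hat

def robust_backup(p_hat, v, rho):
    """Approximate worst-case expected value of v over an L1 ball of
    radius rho around the empirical distribution p_hat (assumed
    uncertainty set): shift up to rho/2 probability mass from the
    highest-value next states onto the lowest-value one."""
    worst = np.argmin(v)
    q = p_hat.copy()
    budget = rho / 2.0
    for s_next in np.argsort(v)[::-1]:  # take mass from high-value states first
        take = min(q[s_next], budget)
        q[s_next] -= take
        q[worst] += take
        budget -= take
        if budget <= 0:
            break
    return q @ v

def robust_value_iteration(P_hat, R, gamma, rho, n_iters=500, tol=1e-6):
    """Run robust value iteration on the empirical model; returns an
    approximately optimal robust value function and a greedy policy."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = np.array([[R[s, a] + gamma * robust_backup(P_hat[s, a], V, rho)
                       for a in range(A)] for s in range(S)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)
    return V, policy
```

Under this reading, the sample complexity question the paper studies is how large n_samples must be (per state-action pair) so that the value function and policy computed on the empirical model P_hat are ϵ-optimal for the true robust problem.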
Published in: 2021 60th IEEE Conference on Decision and Control (CDC)
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 01 February 2022