Abstract:
The popular DQN algorithm suffers from substantial overestimation of state-action values in reinforcement learning problems, such as games in the Atari 2600 domain and the path-planning domain. To reduce these overestimations of action values during learning, we present a novel combination of double Q-learning and the dueling DQN algorithm, called the Variant of Double Dueling DQN (V-D D3QN). We focus on the idea behind the V-D D3QN algorithm: using two dueling DQN networks to reduce the overestimation of action values during training. Specifically, at each time step one dueling DQN network is randomly selected to have its parameters updated, while the remaining dueling DQN network is used to determine the update targets. We then conduct experiments in a customized virtual grid-map environment. Our experiments demonstrate that the proposed algorithm not only reduces overestimations more effectively than the Double DQN (DDQN) algorithm, but also achieves much better performance in the route-planning domain, generalizing well to new and rapidly changing environments.
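The following is a minimal sketch, based only on the update rule described in the abstract, of how two dueling Q-networks can be trained symmetrically: at each step one network is picked at random to be updated, and the other evaluates the bootstrap target. The network sizes, hidden dimension, loss, optimizer setup, and batch format are illustrative assumptions and are not taken from the paper.

```python
# Sketch of the V-D D3QN update idea from the abstract (not the authors' code):
# two dueling Q-networks; one is randomly chosen to be updated each step, and
# the other supplies the value of the bootstrap target (double Q-learning style).
import random
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, then separate value and advantage streams."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, x):
        h = self.trunk(x)
        v = self.value(h)
        a = self.advantage(h)
        # Combine the streams; subtracting the mean advantage keeps Q identifiable.
        return v + a - a.mean(dim=1, keepdim=True)


def vdd3qn_update(nets, optims, batch, gamma=0.99):
    """One update: randomly pick the network to train; use the other for the target."""
    i = random.randint(0, 1)      # index of the network being updated
    j = 1 - i                     # index of the network evaluating the target
    s, a, r, s2, done = batch     # tensors: states, actions, rewards, next states, done flags

    with torch.no_grad():
        # Double Q-learning target: action selected by the updated net,
        # value of that action taken from the other net.
        a_star = nets[i](s2).argmax(dim=1, keepdim=True)
        q_next = nets[j](s2).gather(1, a_star).squeeze(1)
        target = r + gamma * (1.0 - done) * q_next

    q = nets[i](s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, target)
    optims[i].zero_grad()
    loss.backward()
    optims[i].step()
    return loss.item()
```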
Published in: 2018 37th Chinese Control Conference (CCC)
Date of Conference: 25-27 July 2018
Date Added to IEEE Xplore: 07 October 2018
Electronic ISSN: 1934-1768