
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors


Abstract:

In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimations, greatly reducing policy performance. This article presents a distributional soft actor–critic (DSAC) algorithm, an off-policy RL method for continuous control settings, which improves policy performance by mitigating Q-value overestimations. We first show in theory that learning a distribution function of state–action returns can effectively mitigate Q-value overestimations because it adaptively adjusts the update step size of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor–critic variant of DSPI, called DSAC, which directly learns a continuous return distribution while keeping the variance of the state–action returns within a reasonable range to address exploding and vanishing gradient problems. We evaluate DSAC on a suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
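To make the abstract's core idea concrete, below is a minimal, illustrative sketch (not the authors' code) of a distributional critic that models the state–action return as a Gaussian: the network outputs a mean (the Q-value) and a standard deviation, and the standard deviation is clamped so the return variance stays within a bounded range, as the abstract describes for avoiding exploding and vanishing gradients. All layer sizes, clamping bounds, and names (e.g., DistributionalCritic, LOG_STD_MIN) are assumptions chosen for illustration.

import torch
import torch.nn as nn

# Assumed bounds on the log standard deviation of the return distribution.
LOG_STD_MIN, LOG_STD_MAX = -4.0, 2.0


class DistributionalCritic(nn.Module):
    """Maps (state, action) to a Gaussian over the state-action return Z(s, a)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)     # Q(s, a) = E[Z(s, a)]
        self.log_std_head = nn.Linear(hidden, 1)  # spread of the return

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(LOG_STD_MIN, LOG_STD_MAX)
        return mean, log_std.exp()


def critic_loss(critic, state, action, target_return):
    """Negative log-likelihood of a target return under the predicted Gaussian.

    A larger predicted variance down-weights the gradient on the mean, which is
    one way the update step size of the Q-value can adapt, in the spirit of the
    mechanism the abstract attributes to return-distribution learning.
    """
    mean, std = critic(state, action)
    dist = torch.distributions.Normal(mean, std)
    return -dist.log_prob(target_return).mean()


if __name__ == "__main__":
    # Toy usage: random transitions stand in for a replay-buffer batch and a
    # bootstrapped (soft) return target.
    critic = DistributionalCritic(state_dim=3, action_dim=1)
    s, a = torch.randn(8, 3), torch.randn(8, 1)
    target = torch.randn(8, 1)
    loss = critic_loss(critic, s, a, target)
    loss.backward()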
Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 33, Issue: 11, November 2022)
Page(s): 6584 - 6598
Date of Publication: 08 June 2021

PubMed ID: 34101599
