Bias-Corrected Q-Learning With Multistate Extension


Abstract:

Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions cause large variance in the value estimates. We identify the cause as the estimation bias introduced by the maximum operator in the Q-learning algorithm, and present evidence of this max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of Roulette and an electricity storage control simulation. The algorithm is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
Published in: IEEE Transactions on Automatic Control (Volume: 64, Issue: 10, October 2019)
Page(s): 4011 - 4023
Date of Publication: 22 April 2019
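
The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its correction strategy; it only shows, under assumed conditions, that taking the maximum over noisy value estimates overestimates the true maximum. The function name max_operator_bias, the Gaussian noise model, and all parameter values are assumptions chosen for illustration.

```python
import random


def max_operator_bias(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Estimate E[max_a Qhat(a)] - max_a Q(a) when every true Q(a) is 0.

    Assumption: each estimate Qhat(a) is the true value plus independent
    zero-mean Gaussian noise. This is an illustrative noise model only.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Every action has true value 0.0; only the noise differs.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(estimates)
    # The true maximum is 0.0, so the average sampled maximum is the bias.
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(f"actions={n:2d}  estimated bias={max_operator_bias(n):.3f}")
```

With a single action the estimated bias stays near zero, and it grows as more noisy estimates compete inside the maximum, which is the finite-time effect the abstract attributes to the maximum operator.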


Donghun Lee
Department of Computer Science, Princeton University, Princeton, NJ, USA
Donghun Lee received the B.A. degree in biochemistry from Columbia University, New York, NY, USA, in 2007, and an M.S. degree in computational biology from Carnegie Mellon University, Pittsburgh, PA, USA, in 2009. He is working toward the Ph.D. degree in computer science in the Department of Computer Science at Princeton University, Princeton, NJ, USA, under the advisement of Professor Warren B. Powell.
He worked with Samsung Electronics from 2012 to 2016. His research interests include designing efficient learning algorithms for hierarchical online decision-making problems.
Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA
Warren B. Powell (M’06) is a Professor in the Department of Operations Research and Financial Engineering at Princeton University, Princeton, NJ, USA, where he has been teaching since 1981. He founded and directs CASTLE Labs (www.castlelab.princeton.edu), specializing in fundamental contributions to computational stochastic optimization with a wide range of applications. He has authored/coauthored over 200 publications and two books. His research interests include computational stochastic optimization, with applications in energy, transportation, health, and finance.