Abstract:
This article aims to build a probabilistic framework for Howard's policy iteration algorithm using the language of forward–backward stochastic differential equations (FBSDEs). As opposed to conventional formulations based on partial differential equations, our FBSDE-based formulation can be easily implemented by optimizing criteria over sample data and is, therefore, less sensitive to the state dimension. In particular, both on-policy and off-policy evaluation methods are discussed by constructing different FBSDEs. The backward-measurability-loss criterion is then proposed for solving these equations. By choosing specific weight functions in the proposed criterion, we can recover the popular deep BSDE method or the martingale approach for BSDEs. The convergence results are established under both ideal and practical conditions, depending on whether the optimization criteria are decreased to zero. In the ideal case, we prove that the policy sequences produced by the proposed FBSDE-based algorithms and the standard policy iteration have the same performance and, thus, have the same convergence rate. In the practical case, the proposed algorithm is still proved to converge robustly under mild assumptions on optimization errors.
Published in: IEEE Transactions on Automatic Control (Volume: 69, Issue: 8, August 2024)
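
For readers unfamiliar with the baseline that the abstract calls "the standard policy iteration", the following is a minimal sketch of Howard's policy iteration in its simplest setting: a finite-state, discrete-time, discounted decision problem. It is only an illustrative analogue and not the FBSDE-based algorithm of the article, which concerns continuous-time problems and evaluates policies from sample data. All names (`evaluate_policy`, `improve_policy`, `policy_iteration`) and the two-state example are assumptions introduced here for illustration.

```python
def evaluate_policy(policy, P, R, gamma, tol=1e-10):
    """Policy evaluation: iterate the Bellman expectation equation
    for a fixed policy until the largest update falls below tol."""
    n = len(P)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            a = policy[s]
            v = R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve_policy(V, P, R, gamma):
    """Policy improvement: act greedily with respect to V."""
    n = len(P)
    policy = []
    for s in range(n):
        q = [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=lambda a: q[a]))
    return policy


def policy_iteration(P, R, gamma, policy=None):
    """Alternate evaluation and improvement until the policy is stable."""
    if policy is None:
        policy = [0] * len(P)
    while True:
        V = evaluate_policy(policy, P, R, gamma)
        new_policy = improve_policy(V, P, R, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical two-state, two-action example.
# P[s][a][t] is the probability of moving from state s to state t under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
# R[s][a] is the immediate reward; only staying in state 1 pays off.
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

best_policy, best_value = policy_iteration(P, R, gamma=0.9)
print(best_policy, best_value)
```

In this sketch the evaluation step requires the full transition model `P`. The formulation described in the abstract instead carries out evaluation by optimizing a criterion over sampled trajectories, which is the reason the authors give for its reduced sensitivity to the state dimension.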