Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient | IEEE Journals & Magazine | IEEE Xplore