Learning Stochastic Optimal Policies via Gradient Descent | IEEE Journals & Magazine | IEEE Xplore