Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process | IEEE Conference Publication | IEEE Xplore