Policy gradient reinforcement learning method for discrete-time linear quadratic regulation problem using estimated state value function | IEEE Conference Publication | IEEE Xplore