An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction | IEEE Journals & Magazine | IEEE Xplore