Maximizing the average reward in episodic reinforcement learning tasks | IEEE Conference Publication | IEEE Xplore