Convergence of the Q-ae learning under deterministic MDPs and its efficiency under the stochastic environment | IEEE Conference Publication | IEEE Xplore