Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems | IEEE Journals & Magazine | IEEE Xplore