An asymptotically optimal learning controller for finite Markov chains with unknown transition probabilities | IEEE Journals & Magazine | IEEE Xplore