Mirror descent algorithm for a multi-armed bandit governed by a stationary finite state Markov chain | IEEE Conference Publication