A new class of P-model absorbing learning automata is introduced. The proposed automata use a stochastic estimator in order to achieve rapid and accurate convergence when operating in stationary random environments. Under the proposed stochastic estimator scheme, the estimates of the reward probabilities of the actions are not strictly determined by the environmental responses: the dependence between the stochastic estimates and the deterministic ones is relaxed for actions that have been selected only a few times. Such actions therefore have the opportunity to be estimated as "optimal," to increase their choice probabilities, and consequently to be selected. In this way the estimates become more reliable, and the automaton converges rapidly and accurately to the optimal action. The asymptotic behavior of the proposed scheme is analyzed, and it is proved to be ε-optimal in every stationary random environment. Furthermore, extensive simulation results are presented indicating that the proposed stochastic estimator scheme converges faster than the deterministic-estimator-based DPRI and DGPA schemes when operating in stationary P-model random environments.
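The core idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact form of the stochastic estimator, so the following is an illustrative interpretation, not the authors' algorithm: the deterministic estimate of each action's reward probability is its empirical reward rate, and the stochastic estimate adds zero-mean noise whose spread shrinks with the number of times the action has been selected. Rarely selected actions thus keep wide noise and can still be ranked "optimal" and re-selected, exactly the relaxed dependence the abstract motivates. The class name, the `gamma` noise-width parameter, and the noise model are all assumptions made for this sketch.

```python
import numpy as np


class StochasticEstimatorAutomaton:
    """Illustrative sketch of a stochastic-estimator learning automaton.

    Deterministic estimates are empirical reward rates; stochastic
    estimates add zero-mean Gaussian noise whose standard deviation
    shrinks as an action accumulates selections, so seldom-tried
    actions retain a chance of being estimated as "optimal".
    This is a hypothetical construction, not the paper's exact scheme.
    """

    def __init__(self, n_actions, gamma=0.3, rng=None):
        self.rng = rng or np.random.default_rng()
        self.gamma = gamma                 # assumed noise-width parameter
        self.rewards = np.zeros(n_actions) # cumulative rewards per action
        self.counts = np.zeros(n_actions)  # selections per action

    def select(self):
        # Try each action at least once before trusting any estimate.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size:
            return int(untried[0])
        det = self.rewards / self.counts                      # deterministic estimates
        noise = self.rng.normal(0.0, self.gamma / self.counts)
        return int(np.argmax(det + noise))                    # stochastic estimates

    def update(self, action, beta):
        # P-model environment: beta is 1 (reward) or 0 (penalty).
        self.counts[action] += 1
        self.rewards[action] += beta


def run(reward_probs, steps=5000, seed=0):
    """Run the automaton in a stationary P-model environment and
    return the action with the highest final deterministic estimate."""
    rng = np.random.default_rng(seed)
    la = StochasticEstimatorAutomaton(len(reward_probs), rng=rng)
    for _ in range(steps):
        a = la.select()
        la.update(a, int(rng.random() < reward_probs[a]))
    return int(np.argmax(la.rewards / la.counts))
```

For example, `run([0.2, 0.5, 0.8])` identifies the action with reward probability 0.8: early noise keeps all actions in play, while the shrinking noise width lets the empirical estimates dominate as evidence accumulates.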