A learning automaton is a machine that interacts with a random environment and that simultaneously learns the optimal action that the environment offers to it. Learning automata with variable structure are considered. Such automata are completely defined by a set of probability updating rules. Contrary to all the variable-structure stochastic automata (VSSA) discussed in the literature, which update the probabilities in such a way that an action probability can take any real value in the interval [0,1], the probability space is discretized so as to permit the action probability to assume one of a finite number of distinct values in [0,1]. The discretized automaton is termed linear or nonlinear depending on whether the subintervals of [0,1] are of equal length. It is proven that 1) discretized two-action linear reward-inaction automata are absorbing and ε-optimal in all environments; 2) discretized two-action linear inaction-penalty automata are ergodic and expedient in all environments; 3) discretized two-action linear inaction-penalty learning automata with artificially created absorbing barriers are ε-optimal in all random environments; and 4) there exist nonlinear discretized reward-inaction automata that are ε-optimal in all random environments. The maximum advantage gained by rendering any finite-state discretized automaton nonlinear has also been derived.
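As an illustration of the discretization idea, the following is a minimal sketch of a two-action discretized linear reward-inaction automaton. It is not taken from the paper. The function name `simulate_dlri`, the parameters `penalty_probs`, `resolution`, and `max_steps`, the midpoint starting state, and the step size of one grid point per reward are all assumptions made for this example. The probability of choosing action 0 is restricted to the equally spaced values k/resolution, a reward moves it one grid point toward the rewarded action, and a penalty leaves it unchanged.

```python
import random


def simulate_dlri(penalty_probs, resolution=100, max_steps=100_000, rng=None):
    """Sketch of a two-action discretized linear reward-inaction automaton.

    penalty_probs: (c0, c1), assumed penalty probabilities of the environment.
    resolution:    number of equal subintervals of [0, 1] (hypothetical name).
    Returns the final probability of choosing action 0.
    """
    rng = rng or random.Random()
    k = resolution // 2  # state k means P(action 0) = k / resolution
    for _ in range(max_steps):
        if k == 0 or k == resolution:
            break  # end states are absorbing: one action is chosen forever
        action = 0 if rng.random() < k / resolution else 1
        rewarded = rng.random() >= penalty_probs[action]
        if rewarded:
            # Reward: move one grid point toward the rewarded action.
            k += 1 if action == 0 else -1
        # Penalty: inaction, the state is left unchanged.
    return k / resolution


if __name__ == "__main__":
    # Action 0 is penalized less often, so it is the better action here.
    print(simulate_dlri((0.2, 0.6), resolution=100, rng=random.Random(1)))
```

Under these assumptions, a larger `resolution` gives a finer grid, so the automaton takes more steps to reach an absorbing state but is less likely to be absorbed at the inferior action.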