Two new models of nonstationary random environments whose response characteristics depend on the actions performed on them are introduced in this paper. The models are of interest because no single action is optimal, so all the actions have to be chosen in succession. A preliminary analysis of a linear learning automaton acting in such environments is presented, and certain mathematical questions of convergence which arise are brought to light. The analysis reveals that the automaton tends to equalize the penalty probabilities. Simulation studies of the abstract models, as well as of the routing of calls in telephone networks, appear to reinforce the analytical results and the relevance of such models.
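The equalizing behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the models or the scheme of the paper: the abstract specifies neither, so the two-action environment below (penalty probabilities that rise linearly with the probability of choosing that action), its coefficients, the step size `a`, and the choice of a linear reward-inaction update are all hypothetical. The names `penalty_probs` and `simulate` are likewise invented for the example.

```python
import random


def penalty_probs(p1):
    """Hypothetical environment: each penalty probability grows linearly
    with the probability of selecting that action (assumed coefficients)."""
    p2 = 1.0 - p1
    c1 = 0.2 + 0.5 * p1
    c2 = 0.4 + 0.3 * p2
    return c1, c2


def simulate(steps=50_000, a=0.002, seed=0):
    """Run a two-action linear reward-inaction automaton (an assumed scheme)
    and return the action-1 probability averaged over the second half."""
    rng = random.Random(seed)
    p1 = 0.5  # start with both actions equally likely
    tail = []
    for t in range(steps):
        c1, c2 = penalty_probs(p1)
        action = 0 if rng.random() < p1 else 1
        penalized = rng.random() < (c1 if action == 0 else c2)
        if not penalized:
            # Reward: move probability toward the chosen action.
            if action == 0:
                p1 += a * (1.0 - p1)
            else:
                p1 -= a * p1
        # Penalty: reward-inaction leaves the probabilities unchanged.
        if t >= steps // 2:
            tail.append(p1)
    return sum(tail) / len(tail)


if __name__ == "__main__":
    p1 = simulate()
    c1, c2 = penalty_probs(p1)
    print(f"mean p1 = {p1:.3f}, c1 = {c1:.3f}, c2 = {c2:.3f}")
```

For these assumed coefficients the two penalty probabilities coincide at p1 = 0.625, and the averaged action probability should settle near that value rather than converging to either action, which matches the observation that no single action is optimal.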
Systems, Man and Cybernetics, IEEE Transactions on (Volume: 10, Issue: 5)
Date of Publication: May 1980