Skip to Main Content
The learning behavior is investigated. The game has three variable-structure stochastic automata and a random environment. In the game the players do not possess prior information concerning the payoff matrix, and at the end of every play all the players update their own strategies on the basis of the response from the random environment. Under such situations, if a payoff matrix satisfies some conditions, it can be shown that the learning behavior of the automata converges to the optimal strategies.
Systems, Man and Cybernetics, IEEE Transactions on (Volume:SMC-14 , Issue: 6 )
Date of Publication: Nov.-Dec. 1984