This paper investigates learning algorithm design for potential-game-theoretic cooperative control, where the agents' collective action is generally required to converge to the most efficient equilibria, whereas standard game theory aims only at computing a Nash equilibrium. In particular, when the utility functions are already aligned with a global objective function, the equilibria maximizing the potential function should be selected. To meet this requirement, this paper develops a learning algorithm called Payoff-based Inhomogeneous Partially Irrational Play (PIPIP). The main feature of PIPIP is that it allows agents to make irrational decisions with a specified probability, i.e., an agent may choose an action with a low utility from among the past actions stored in its memory. We then prove convergence in probability of the collective action to the maximizers of the potential function. Finally, the effectiveness of the proposed algorithm is demonstrated through simulation of a sensor coverage problem.
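The "partially irrational" decision rule described above can be illustrated with a minimal sketch. This is not the paper's PIPIP algorithm: the function name, the memory representation as (action, payoff) pairs, and the fixed irrationality probability `epsilon` are all assumptions made here for illustration; in particular, PIPIP's exploration probability is inhomogeneous (time-varying), which this sketch does not model.

```python
import random


def partially_irrational_step(memory, epsilon, rng=random):
    """Sketch of one action choice by a payoff-based agent.

    memory  -- list of (action, payoff) pairs the agent has stored
               (hypothetical representation, assumed here).
    epsilon -- probability of an irrational choice; held fixed in this
               sketch, unlike the inhomogeneous schedule in the paper.
    """
    # Rational default: replay the best action remembered so far.
    best_action, best_payoff = max(memory, key=lambda ap: ap[1])
    if rng.random() < epsilon:
        # Irrational move: pick uniformly among remembered actions
        # whose payoff is strictly below the best one.
        worse = [a for a, p in memory if p < best_payoff]
        if worse:
            return rng.choice(worse)
    return best_action
```

Letting `epsilon` decay over time would make the induced Markov chain inhomogeneous, which is the intuition behind the "Inhomogeneous" part of the algorithm's name: early irrational moves let agents escape inefficient equilibria, while the vanishing irrationality lets the collective action settle on the potential maximizers.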