The Rescue Simulation System is an example of a multi-agent system that poses many challenges. One of these challenges is the trade-off between exploration and exploitation in the path-planning phase. In this paper we present an exploration method based on a variable-structure S-model learning automaton, which uses the entropy of the action probability vector as the criterion for rewarding or penalizing its selected action. This method also leads agents to establish a reasonable balance between exploration and exploitation. The results show that the proposed method performs well in the rescue simulation system in terms of both exploration and final score.
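
To make the ingredients named above concrete, the following is a minimal Python sketch of a variable-structure S-model learning automaton together with the entropy of its action probability vector. It is an illustration under stated assumptions, not the paper's implementation: the class `SModelAutomaton`, the reward-inaction style update rule, the learning rate, and in particular the function `entropy_scaled_reward` are hypothetical, since the abstract does not say how entropy is mapped to a reward or penalty.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(probs):
    """Entropy scaled to [0, 1]; 1 means uniform (maximal uncertainty)."""
    if len(probs) < 2:
        return 0.0
    return entropy(probs) / math.log(len(probs))


class SModelAutomaton:
    """Variable-structure automaton with a continuous response in [0, 1].

    Assumption: a reward-inaction style S-model update is used here; the
    abstract does not specify the exact reinforcement scheme.
    """

    def __init__(self, n_actions, learning_rate=0.1, rng=None):
        self.probs = [1.0 / n_actions] * n_actions  # start fully exploratory
        self.learning_rate = learning_rate
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        """Sample an action index according to the current probabilities."""
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        """Shift probability mass toward `action` in proportion to `reward`.

        reward = 1.0 is fully favorable, reward = 0.0 leaves probs unchanged.
        """
        step = self.learning_rate * reward
        self.probs = [
            p + step * (1.0 - p) if i == action else p - step * p
            for i, p in enumerate(self.probs)
        ]


def entropy_scaled_reward(raw_reward, probs):
    """Hypothetical mapping (an assumption, not taken from the paper):
    damp the reward as the automaton becomes more certain, so that it
    does not commit to a single action too early."""
    clipped = min(1.0, max(0.0, raw_reward))
    return clipped * normalized_entropy(probs)
```

A typical loop would call `choose()`, observe some task-specific signal for the chosen path, pass it through `entropy_scaled_reward`, and then call `update()`. The normalized entropy can also be logged on its own as a rough indicator of how much the agent is still exploring.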