This paper is concerned with the dynamic pricing problems of a duopoly case in electronic retail markets. Combined with the concept of performance potential, the simulated annealing Q-learning (SA-Q) and the win-or-learn-fast policy hill climbing algorithm (WoLF-PHC) are used to solve the learning problems of multi-agent systems with either average- or discounted-reward criteria, under the case that only partial information about the opponent is known. The simulation results show that the WoLF-PHC algorithm performs well in adapting environment's change and in deriving better learning values than the SA-Q algorithm.
Published in:
Automation and Logistics, 2009. ICAL '09. IEEE International Conference on
Date of Conference: 5-7 Aug. 2009