Skip to Main Content
Balancing between exploration and exploitation with adaptation of the Q-learning (QL) parameters to the condition of dynamic uncertain environment has always been a significant subject of interest in the context of reinforcement learning. The peculiarities of the electricity market have provided such complex dynamic economic environment, and consequently have increased the requirement for advancement of the learning methods. In this economic system, the agent's market power plays a vital role in bidding decision-making problem. In order to improve the QL method, as main idea, adaptation of its parameters to the market power is proposed for making a good balance between exploration and exploitation. To implement this adaptation process, due to the fuzzy nature of human's decision-making process, a fuzzy system is designed to map each agent's market power into the QL parameters. Therefore, a fuzzy QL method is developed to model the power supplier's strategic bidding behavior in a computational electricity market. In the simulation framework, the QL algorithm selects the power supplier's bidding strategy according to the past experiences and the values of the parameters, which show the human's risk characteristic. The application of the proposed methodology for the power supplier in a multiarea power system shows the performance improvement in comparison to the QL with fixed parameters.