This paper provides the first convergence proof for fuzzy reinforcement learning (FRL), together with experimental results supporting our analysis. We extend the work of Konda and Tsitsiklis, who presented a convergent actor-critic (AC) algorithm for a general parameterized actor. We prove that a fuzzy rulebase actor satisfies the conditions that guarantee convergence of its parameters to a local optimum. Our fuzzy rulebase uses Takagi-Sugeno-Kang rules, Gaussian membership functions, and product inference. As an application domain, we chose the difficult task of power control in wireless transmitters, characterized by delayed rewards and a high degree of stochasticity. To the best of our knowledge, no reinforcement learning algorithm has previously been applied to this task. Our simulation results show that the resulting actor-critic fuzzy reinforcement learning (ACFRL) algorithm consistently converges in this domain to a locally optimal policy.
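As an illustration of the rulebase architecture named above, the following is a minimal sketch (not the paper's implementation; all names, rule counts, and parameter values are illustrative assumptions) of a zero-order Takagi-Sugeno-Kang inference step with Gaussian membership functions and product inference:

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of a scalar input x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def tsk_inference(x, rules):
    """
    x     : list of input values, one per dimension
    rules : list of (centers, sigmas, consequent) tuples, one per rule;
            centers/sigmas are per-dimension lists, consequent is the
            zero-order TSK output parameter (in an actor-critic setting,
            these consequents would be among the actor's tunable weights).
    Returns the normalized weighted average of rule consequents.
    """
    num, den = 0.0, 0.0
    for centers, sigmas, consequent in rules:
        # Product inference: a rule's firing strength is the product
        # of its per-dimension Gaussian membership degrees.
        firing = 1.0
        for xi, c, s in zip(x, centers, sigmas):
            firing *= gaussian_mf(xi, c, s)
        num += firing * consequent
        den += firing
    return num / den

# Two symmetric rules over a 1-D input: an input midway between the
# rule centers yields an output of 0 by symmetry.
rules = [([0.0], [0.5], -1.0), ([1.0], [0.5], 1.0)]
print(tsk_inference([0.5], rules))  # ≈ 0.0
```

Because the output is smooth in the Gaussian centers, widths, and consequents, the rulebase is differentiable in its parameters, which is the kind of property a gradient-based actor-critic convergence argument relies on.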