Reinforcement learning, where decision-making agents learn optimal policies through interaction with their environment, is an attractive paradigm for direct, adaptive controller design. However, results for systems with continuous variables are rare. Here, we generalize previous work on deterministic linear systems to stochastic ones, since uncertainty is almost always present and must be accounted for to ensure good closed-loop performance. We present convergence results and an example suggesting automatic controller order reduction. We also highlight key differences between the algorithms for deterministic and stochastic systems.