
On the design of LQR kernels for efficient controller learning


Abstract:

Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 22 January 2018
Conference Location: Melbourne, VIC, Australia

I. Introduction

A core problem of learning control is to determine optimal feedback controllers for (partially) unknown nonlinear systems from experimental data. Reinforcement learning (RL) [1], [2] is a promising framework for this, yet it often requires many experiments on the physical system to find even a suitable controller, which limits the applicability of such techniques. Considerable research effort has therefore been invested in the data efficiency of RL, aiming to learn controllers from as few experiments as possible. Recently, Bayesian optimization (BO) has been proposed for RL as a promising approach in this direction. BO employs a probabilistic description of the latent objective function, typically a Gaussian process (GP), which allows the next control experiment to be selected in a principled manner, e.g., to maximize information gain [3] or to perform safe exploration [4].
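To make the BO-with-GP mechanism concrete, the following is a minimal illustrative sketch (not the paper's implementation, and using a generic squared-exponential kernel rather than the proposed LQR kernels): a scalar feedback gain f for a first-order system x⁺ = a·x + b·u with u = −f·x is tuned by repeatedly fitting a GP to observed quadratic costs and querying the gain that minimizes a lower confidence bound. All system parameters and kernel hyperparameters below are arbitrary choices for illustration.

```python
import numpy as np

def cost(f, a=0.9, b=1.0, q=1.0, r=0.1, x0=1.0, T=50):
    """Quadratic closed-loop cost of gain f on x+ = a*x + b*u, u = -f*x."""
    x, J = x0, 0.0
    for _ in range(T):
        u = -f * x
        J += q * x**2 + r * u**2
        x = a * x + b * u
    return J

def se_kernel(A, B, ell=0.3, sf=5.0):
    """Squared-exponential covariance between gain arrays A and B."""
    d = A[:, None] - B[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell)**2)

def gp_posterior(Xs, X, y, noise=1e-4):
    """GP posterior mean and std at test gains Xs, given data (X, y)."""
    K = se_kernel(X, X) + noise * np.eye(len(X))
    Ks = se_kernel(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = se_kernel(Xs, Xs).diagonal() - np.einsum(
        'ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 0.0))

grid = np.linspace(0.0, 1.8, 200)      # candidate gains (stability region)
X = np.array([0.1, 1.7])               # two initial experiments
y = np.array([cost(f) for f in X])

for _ in range(15):                    # BO loop: lower-confidence-bound rule
    mu, sd = gp_posterior(grid, X, y)
    f_next = grid[np.argmin(mu - 2.0 * sd)]   # explore/exploit trade-off
    X = np.append(X, f_next)
    y = np.append(y, cost(f_next))

f_best = X[np.argmin(y)]
print(f_best)
```

After a handful of "experiments" (here, closed-loop simulations), the best sampled gain lands near the LQR-optimal value for this system, illustrating why the kernel's fit to the true cost surface governs how few trials are needed.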

