Abstract:
Introduces a reinforcement learning framework based on dynamic programming for a class of control problems where no explicit terminal state exists. This situation arises especially in technical process control: the control task is not terminated once a predefined target value is reached; instead, the controller has to keep controlling the system to prevent its output from drifting away from the target value again. We propose a set of assumptions and prove the convergence of the value iteration method under them. From this we derive a new algorithm, which we call the fixed horizon algorithm. The performance of the proposed algorithm is compared to an approach that assumes the existence of an explicit terminal state. An application to a cart/double-pole system finally demonstrates the approach on a difficult practical control task.
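To give a rough sense of the idea behind a fixed horizon, the sketch below runs value iteration for a fixed number of Bellman backups K over a small finite cost model, in place of a terminal-state stopping criterion. This is only an illustration under assumed conventions (tabular states/actions, cost minimization, zero cost-to-go at the horizon); the transition model, costs, and function names are invented and the paper's actual algorithm and assumptions differ in detail.

    import numpy as np

    def fixed_horizon_value_iteration(P, c, K):
        """K-step value iteration on a finite model (illustrative sketch).

        P : transitions, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
        c : immediate costs, shape (A, S)
        K : fixed horizon; replaces a terminal-state stopping test,
            since no explicit terminal state exists.
        Returns the K-step cost-to-go V and a greedy policy.
        """
        V = np.zeros(P.shape[1])          # zero cost-to-go at the horizon
        for _ in range(K):
            Q = c + P @ V                 # Q[a, s] = c(s, a) + E[V(s')]
            V = Q.min(axis=0)             # cost-minimizing backup
        return V, Q.argmin(axis=0)

    # Toy example: 2 actions, 3 states (all numbers invented)
    P = np.array([[[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
                  [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]])
    c = np.array([[1.0, 2.0, 0.0],
                  [0.5, 1.0, 1.0]])
    V, policy = fixed_horizon_value_iteration(P, c, K=50)
    print(V, policy)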
Date of Conference: 04-09 May 1998
Date Added to IEEE Xplore: 06 August 2002
Print ISBN: 0-7803-4859-1
Print ISSN: 1098-7576