The work presented in this paper provides a practical, customized learning algorithm for reinforcement learning tasks that evolve episodically over acyclic state spaces. The presented results are motivated by the optimal disassembly planning (ODP) problem described in, and they complement and enhance earlier developments on this problem that were presented in. In particular, the proposed algorithm is shown to be a substantial improvement over the original algorithm developed in, in terms of both the required computational effort and the attained performance, where the latter is measured by the accumulated reward. The new algorithm also yields a robust performance gain over typical Q-learning implementations in the considered problem context.
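To make the Q-learning baseline mentioned above concrete, the following is a minimal sketch of standard tabular Q-learning on a toy episodic task over an acyclic state space. The specific states, transitions, rewards, and hyperparameter values are illustrative assumptions, not taken from the paper, and this is the generic baseline, not the customized algorithm the paper proposes.

```python
import random

# Hypothetical toy acyclic MDP (illustrative only, not from the paper):
# states 0..3, episodes start at state 0 and terminate at state 3.
# TRANSITIONS[state][action] = (next_state, reward); the transition graph
# is a DAG, so every episode terminates in a bounded number of steps.
TRANSITIONS = {
    0: {0: (1, 1.0), 1: (2, 0.0)},
    1: {0: (3, 2.0)},
    2: {0: (3, 5.0)},
}
TERMINAL = 3


def q_learning(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Generic tabular Q-learning baseline with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            acts = list(Q[s])
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[s][x])
            s_next, r = TRANSITIONS[s][a]
            # Standard Q-learning update; terminal states have value zero.
            target = r if s_next == TERMINAL else r + gamma * max(Q[s_next].values())
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

In this toy instance the greedy policy learned at state 0 should eventually prefer the path through state 2 (total reward 5) over the path through state 1 (total reward 3), which is the kind of accumulated-reward comparison the abstract refers to.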