Temporal difference learning is an important class of incremental learning procedures that learn to predict the outcomes of sequential processes through experience. Although these algorithms have been used in a variety of well-known intelligent systems, such as Samuel's checkers player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper provides a brief summary of the theoretical basis for these algorithms and documents observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
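As a rough illustration of what "incremental" prediction from experience means here, the following is a minimal sketch of the tabular TD(0) update applied to a small random walk. It is not taken from the paper: the environment, the function names `td0_update` and `random_walk_td0`, and the parameter values are all assumptions chosen for illustration.

```python
import random


def td0_update(value, reward, next_value, alpha=0.1, gamma=1.0):
    """One TD(0) step: move `value` toward the bootstrapped target.

    The target is reward + gamma * next_value; alpha is the step size.
    """
    td_error = reward + gamma * next_value - value
    return value + alpha * td_error


def random_walk_td0(n_states=5, episodes=1000, alpha=0.05, seed=0):
    """Estimate state values for a symmetric random walk (hypothetical setup).

    The walk starts in the middle state and steps left or right with equal
    probability. Leaving on the right pays reward 1, leaving on the left
    pays 0, so each state's value is the probability of exiting right.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # arbitrary initial predictions
    for _ in range(episodes):
        state = n_states // 2
        while True:
            nxt = state + rng.choice((-1, 1))
            if nxt < 0:
                # Terminated on the left: reward 0, terminal value 0.
                values[state] = td0_update(values[state], 0.0, 0.0, alpha)
                break
            if nxt >= n_states:
                # Terminated on the right: reward 1, terminal value 0.
                values[state] = td0_update(values[state], 1.0, 0.0, alpha)
                break
            # Non-terminal step: update from the successor's current estimate,
            # without waiting for the final outcome of the episode.
            values[state] = td0_update(values[state], 0.0, values[nxt], alpha)
            state = nxt
    return values


if __name__ == "__main__":
    print(random_walk_td0())
```

Each prediction is adjusted after every step using the next state's current estimate rather than the eventual outcome, which is the property that distinguishes temporal difference methods from procedures that wait for the end of the sequence. With a constant step size the estimates keep fluctuating around their targets, so a run like this can only suggest, not establish, convergence behavior.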
Published in:
Aerospace and Electronics Conference, 1996. NAECON 1996., Proceedings of the IEEE 1996 National
(Volume: 2)
Date of Conference: 20-23 May 1996