Skip to Main Content
We compared computational models and human performance on learning to solve a high-level, planning-intensive problem. Humans and models were subjected to three learning regimes: reinforcement, imitation, and instruction. We modeled learning by reinforcement (rewards) using SARSA, a softmax selection criterion and a neural network function approximator; learning by imitation using supervised learning in a neural network; and learning by instructions using a knowledge-based neural network. We had previously found that human participants who were told if their answers were correct or not (a reinforcement group) were less accurate than participants who watched demonstrations of successful solutions of the task (an imitation group) and participants who read instructions explaining how to solve the task. Furthermore, we had found that humans who learn by imitation and instructions performed more complex solution steps than those trained by reinforcement. Our models reproduced this pattern of results.