One oft-envisioned function of search is planning actions (e.g., by exploring routes through a cognitive map). Yet, among the most prominent and quantitatively successful neuroscientific theories of the brain's systems for action choice is the temporal-difference account of the phasic dopamine response. Surprisingly, this theory envisions that action sequences are learned without any search at all, but instead wholly through a process of reinforcement and chaining. This chapter considers recent proposals that a related family of algorithms, called model-based reinforcement learning, may provide a similarly quantitative account of action choice by cognitive search. It reviews behavioral phenomena demonstrating the insufficiency of temporal-difference-like mechanisms alone, then details the many questions that arise in considering how model-based action valuation might be implemented in the brain, and in what respects it differs from other ideas about search for planning.
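To make the contrast concrete, the following sketch (not from the chapter; a minimal illustration under assumed parameters) sets both strategies loose on the same toy task, a deterministic chain of states with reward only at the end. The model-free learner updates values purely from a temporal-difference prediction error, the quantity the phasic dopamine response is thought to report, with no lookahead; the model-based evaluator instead sweeps through a known transition model, a rudimentary form of search.

```python
def td_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Model-free TD(0): values of a linear chain are learned solely
    from the prediction error delta (reinforcement and chaining),
    with no search through a model."""
    V = [0.0] * (n_states + 1)  # V[n_states] is the terminal state
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0   # reward only at the end
            delta = r + gamma * V[s_next] - V[s]     # dopamine-like TD error
            V[s] += alpha * delta                    # local, incremental update
    return V[:n_states]

def model_based_values(n_states=5, gamma=0.9):
    """Model-based evaluation: compute values by sweeping backward
    through the known transition/reward model rather than by
    trial-and-error experience."""
    V = [0.0] * (n_states + 1)
    for s in reversed(range(n_states)):
        r = 1.0 if s + 1 == n_states else 0.0
        V[s] = r + gamma * V[s + 1]
    return V[:n_states]
```

Both routes converge on the same discounted values (gamma raised to the distance from reward), but by very different means: the model-based computation gets them exactly in one backward sweep, while the TD learner only approaches them through repeated experience, which is one source of the behavioral dissociations the chapter reviews.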