Reinforcement Landmark Learning

4 Author(s)

Collett, Cartwright and Smith (1986) trained gerbils to find a hidden food reward at a fixed location relative to an arrangement of cylindrical landmarks. Once the animals have learnt where the food is, they are tested in probe trials in which the food is absent, and their search patterns are recorded. Testing with modified arrangements of the landmarks provides information about the computations underlying the animals' behaviour. Experiments involving two and three landmarks are simulated using simple, reactive animats embedded within a bounded 2-D environment. Internal processing maps the sensory array, through a convolution network, to a topography-preserving motor array that stochastically determines the direction of movement. Temporal difference reinforcement learning modifies the convolution network in response to a reinforcement signal received only at the goal location. These experiments are simulated with landmark distance coded as either a 1-D intensity array or a 2-D vector array, plus a simple compass sense. Vector-coding animats significantly outperform those using intensity coding, and do so with fewer hidden units. More importantly, in the three-landmark task, the search behaviour of vector-coding animats closely matches that of gerbils when tested with modified landmark arrangements. This paper provides further evidence that complex spatial navigation behaviour need not be predicated on complex, navigation-specific computations.
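
The processing pipeline described above (sensory array, convolution, stochastic motor array, temporal difference signal) can be illustrated with a minimal sketch. The abstract does not give the network architecture, kernel, action-selection rule, or learning parameters, so everything here is an assumption made for illustration: the ring-shaped arrays, the softmax selection rule, the discount factor `gamma`, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random


def circular_convolve(sensory, kernel):
    """Convolve a ring-shaped sensory array with a small kernel.

    The output has the same length as the input, so position i in the
    motor array corresponds to direction i in the sensory array
    (an assumed reading of "topography preserving").
    """
    n, k = len(sensory), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * sensory[(i - j + half) % n] for j in range(k))
        for i in range(n)
    ]


def softmax(values, temperature=1.0):
    """Turn motor activations into a probability distribution (assumed rule)."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_direction(motor, rng, temperature=1.0):
    """Sample a movement direction index in proportion to motor activation."""
    probs = softmax(motor, temperature)
    return rng.choices(range(len(motor)), weights=probs, k=1)[0]


def td_error(reward, value_now, value_next, gamma=0.9):
    """One-step temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


if __name__ == "__main__":
    rng = random.Random(0)
    sensory = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one landmark, direction 2
    kernel = [0.25, 0.5, 0.25]                           # hypothetical smoothing kernel
    motor = circular_convolve(sensory, kernel)
    print("motor array:", motor)
    print("sampled direction:", choose_direction(motor, rng, temperature=0.1))
    # Reward arrives only at the goal; elsewhere the error depends on value estimates.
    print("TD error at goal:", td_error(1.0, 0.5, 0.0))
```

In a full learner, the temporal difference error would drive updates to the convolution weights; that update rule is omitted here because the abstract does not specify it.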