Reinforcement Landmark Learning

Collett, Cartwright and Smith (1986) trained gerbils to find a hidden food reward at a fixed location relative to an arrangement of cylindrical landmarks. Having learnt the location of the food, the animals are tested in probe trials in which the food is absent and their search pattern is recorded. Testing with modified arrangements of the landmarks provides information about the computations underlying the animals' behaviour. Experiments involving two and three landmarks are simulated using simple, reactive animats embedded within a bounded, 2-D environment. Internal processing maps the sensory array, through a convolution network, to a topography-preserving motor array that stochastically determines the direction of movement. Temporal-difference reinforcement learning modifies the convolution network in response to a reinforcement signal received only at the goal location. These experiments are simulated with landmark distance coded as either a 1-D intensity array or a 2-D vector array, plus a simple compass sense. Vector-coding animats significantly outperform those using intensity coding, and do so with fewer hidden units. More importantly, in the three-landmark task, the vector-coding animats' search behaviour closely matches that of gerbils tested with modified landmark arrangements. This paper provides further evidence that complex spatial navigation behaviour need not be predicated on complex, navigation-specific computations.
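
The abstract gives no code, but the architecture it describes can be sketched compactly: a 1-D sensory ring (intensity coding plus a compass sense) is convolved to a topography-preserving motor ring, a direction is drawn stochastically from that ring, and a temporal-difference error, driven by reward received only at the goal, adjusts the weights. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation: the ring resolution, kernel width, learning rates, softmax action selection, and the separate linear value critic are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DIRS = 16                  # sensory/motor ring resolution (assumption)
K = 5                        # convolution kernel width (assumption)
ALPHA, GAMMA = 0.05, 0.9     # learning rate and discount (assumptions)
kernel = rng.normal(0.0, 0.1, K)   # learnable convolution weights
v_w = np.zeros(N_DIRS)             # linear value critic (assumption)

# Circular index matrix: wrapping the convolution around the ring is what
# makes the sensory-to-motor map topography-preserving.
IDX = (np.arange(N_DIRS)[:, None] + np.arange(K) - K // 2) % N_DIRS

def sense(pos, landmarks):
    """1-D intensity coding: landmark proximity binned by compass bearing."""
    s = np.zeros(N_DIRS)
    for lm in landmarks:
        d = np.asarray(lm) - pos
        b = int((np.arctan2(d[1], d[0]) / (2 * np.pi)) % 1.0 * N_DIRS)
        s[b] += 1.0 / (1.0 + np.linalg.norm(d))
    return s

def step(pos, landmarks, goal):
    """One move: convolve sensors -> motor ring -> stochastic direction,
    then a TD(0) update driven by reward received only at the goal."""
    global kernel, v_w
    s = sense(pos, landmarks)
    m = s[IDX] @ kernel                     # motor array (circular convolution)
    p = np.exp(m - m.max()); p /= p.sum()   # softmax over movement directions
    a = rng.choice(N_DIRS, p=p)
    theta = 2 * np.pi * a / N_DIRS
    nxt = np.clip(pos + 0.2 * np.array([np.cos(theta), np.sin(theta)]), 0.0, 1.0)
    r = float(np.linalg.norm(nxt - goal) < 0.05)          # reward only at goal
    delta = r + GAMMA * (1.0 - r) * (v_w @ sense(nxt, landmarks)) - v_w @ s
    v_w += ALPHA * delta * s                               # critic update
    kernel += ALPHA * delta * ((np.eye(N_DIRS)[a] - p) @ s[IDX])  # reinforce choice
    return nxt, r

# Example trial loop with a hypothetical two-landmark arrangement:
landmarks = [(0.3, 0.5), (0.7, 0.5)]
goal = np.array([0.5, 0.6])
pos = rng.random(2)
for _ in range(1000):
    pos, r = step(pos, landmarks, goal)
    if r:                                   # goal found: start a new trial
        pos = rng.random(2)
```

Probe trials of the kind described in the abstract would then run `step` with the reward test disabled and the landmark list modified, recording where the animat concentrates its search.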