Learning and development are essential processes by which an animat adapts to environmental changes in order to accomplish a given task. This paper proposes a single mechanism for learning and self-improvement that produces learning curves similar to the “U-shape” phenomenon observed in several psychological experiments on human learning, such as language acquisition. The basic idea is that (1) the animat monitors its success rate in goal achievement to perceive environmental changes, instead of relying on signals from a teacher, and (2) to reuse acquired knowledge and accelerate reinforcement learning, the animat does not memorize the action values but transfers only the learned policy. The resulting policy (a state transition map in which transitions indicate the best actions) may not be optimal in any one environment, but it may better handle differences between environments. We apply this model to a mobile robot navigation problem in which the task is to reach the target while avoiding obstacles by means of uninterpreted sonar and visual information. Our experimental results demonstrate the validity of the model.
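
The two ideas above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the sliding window, the threshold, the names `SuccessMonitor`, `extract_policy`, and `transfer`, and the choice to seed new action values with a fixed `bonus` are all assumptions made for illustration only.

```python
from collections import deque


class SuccessMonitor:
    """Tracks goal achievement over a sliding window of recent episodes.

    The window size and threshold are hypothetical parameters.
    """

    def __init__(self, window=20, threshold=0.5):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, success):
        self.outcomes.append(bool(success))

    def rate(self):
        # With no episodes recorded yet, assume nothing is wrong.
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def change_detected(self):
        # Judge only once the window is full, so a few early failures
        # are not mistaken for an environmental change.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rate() < self.threshold


def extract_policy(q_values):
    """Keep only the best action per state and discard the action values."""
    return {s: max(acts, key=acts.get) for s, acts in q_values.items()}


def transfer(policy, actions, bonus=1.0):
    """Build fresh action values biased toward the transferred policy.

    Seeding with a fixed bonus is an assumption; other ways of reusing
    the policy are possible.
    """
    return {
        s: {a: (bonus if a == best else 0.0) for a in actions}
        for s, best in policy.items()
    }
```

In this sketch, an agent would call `record` after each episode and, once `change_detected()` returns true, replace its action values with `transfer(extract_policy(q_values), actions)` before resuming learning. Only the greedy action per state survives the transfer, so values tuned to the old environment do not carry over.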