Close category search window
 

Dynamics of Learning in Multilayer Perceptrons Near Singularities

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Cousseau, F. ; Brain Sci. Inst., Amari Unit for Math. Neurosci., RIKEN, Wako ; Ozeki, T. ; Amari, S.-I.

The dynamical behavior of learning is known to be very slow for the multilayer perceptron, being often trapped in the "plateau." It has been recently understood that this is due to the singularity in the parameter space of perceptrons, in which trajectories of learning are drawn. The space is Riemannian from the point of view of information geometry and contains singular regions where the Riemannian metric or the Fisher information matrix degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) both for the standard gradient (SGD) and natural gradient (NGD) methods. It is clearly shown, in the case of the SGD method, that the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamical behavior is extremely slow. The analysis of the NGD method is much more difficult, because the inverse of the Fisher information matrix diverges. We conquer the difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently, and the state converges directly to the true parameters very quickly while it staggers in the case of the SGD method. The analytical results are compared with computer simulations, showing good agreement. The effects of singularities on learning are thus qualitatively clarified for both standard and NGD methods.

Published in:
Neural Networks, IEEE Transactions on  (Volume:19 ,  Issue: 8 )

Date of Publication: Aug. 2008

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2013 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.