Pruning and growing hierarchical mixtures of experts

Authors:

S. R. Waterhouse (Cambridge Univ., UK); A. J. Robinson

The 'hierarchical mixture of experts' (HME) is a tree-structured statistical model that is an alternative to multilayer perceptrons. Its training algorithm consists of a number of forward and backward passes through the tree. These are computationally expensive, especially when the trees are large. To reduce the computation, we may either allow the network to find its own structure in a constructive manner (tree growing) or consider only the most likely paths through the tree (path pruning). Pruning keeps the number of parameters constant but considers only the most likely paths through the tree at any time; this leads to significant speedups in training and evaluation. In the growing algorithm, we start with a small tree and apply a splitting criterion based on maximum likelihood to each terminal node. After splitting the best node according to this criterion, we retrain the tree for a set number of iterations, or until there is no further increase in likelihood, at which point the tree is grown again. This results in a flexible architecture which is both faster to train and more efficient in terms of its parameters. To aid the convergence of these algorithms, it is beneficial to introduce regularization into the HME, which stops the evolution of large weights which would otherwise cause branches of the tree to be pinched off. This also aids generalization, as we demonstrate on a toy regression problem. Results for the growing and pruning algorithms show significant speedups over conventional algorithms in discriminating between two interlocking spirals and classifying 8-bit parity patterns.
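The path-pruning idea above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the class names `Expert` and `Gate`, the linear experts, and the `prune_below` threshold are all assumptions made for the example. A gating node computes a softmax over its children and, during evaluation, simply skips any branch whose gate probability falls below the threshold, renormalizing the rest.

```python
import numpy as np

rng = np.random.default_rng(0)

class Expert:
    """Leaf node: a linear model producing a scalar prediction."""
    def __init__(self, dim):
        self.w = rng.normal(size=dim)

    def predict(self, x, prune_below=0.0):
        return float(self.w @ x)

class Gate:
    """Internal node: softmax gate mixing its children's predictions."""
    def __init__(self, dim, children):
        self.v = rng.normal(size=(len(children), dim))
        self.children = children

    def predict(self, x, prune_below=0.0):
        z = self.v @ x
        g = np.exp(z - z.max())          # numerically stable softmax
        g /= g.sum()
        # Path pruning: zero out branches whose gate probability is
        # below the threshold, then renormalize the survivors.
        # (Keep prune_below < 1/len(children) to guarantee at least
        # one branch survives, since the softmax max is >= 1/n.)
        g = np.where(g >= prune_below, g, 0.0)
        g /= g.sum()
        return sum(gi * c.predict(x, prune_below)
                   for gi, c in zip(g, self.children) if gi > 0.0)

# A depth-2 HME over 3-dimensional inputs.
dim = 3
tree = Gate(dim, [Gate(dim, [Expert(dim), Expert(dim)]),
                  Gate(dim, [Expert(dim), Expert(dim)])])
x = rng.normal(size=dim)
full = tree.predict(x)                       # all paths evaluated
pruned = tree.predict(x, prune_below=0.3)    # unlikely branches skipped
```

With a threshold of zero the pruned pass reproduces the full mixture exactly; raising the threshold trades a small approximation error for fewer subtree evaluations, which is the source of the speedups the abstract reports.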

Published in:

Fourth International Conference on Artificial Neural Networks, 1995

Date of Conference:

26-28 Jun 1995