The Levenberg-Marquardt algorithm is one of the most common choices for training medium-sized artificial neural networks. Since it was designed to solve nonlinear least-squares problems, its application to the training of neural networks has so far typically amounted to treating even classification tasks as simple regression problems. For classification, however, the natural choice of error function is the cross-entropy function, which corresponds to the maximum likelihood estimate of the network weights when the sigmoid or softmax activation function is used in the output layer, and which is a convex function of the weights in that layer. This is an important property, as it leads to more robust convergence of any descent-based training method. By constructing and implementing a modified version of the Levenberg-Marquardt algorithm suitable for minimizing the cross-entropy function, we aim to close a gap in the existing literature on neural networks. Additionally, since the cross-entropy error measure yields a single error value per training pattern, our approach has lower memory requirements for multi-valued classification problems than a direct application of the algorithm.
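A minimal sketch of the idea may help: for a sigmoid output trained with cross-entropy, a damped Gauss-Newton step (the Levenberg-Marquardt pattern) can be built from the cross-entropy gradient and a curvature approximation with one weighting term per training pattern. The example below is a hypothetical illustration for binary logistic regression only; the function names, damping schedule, and simplifications are our own assumptions and do not reproduce the paper's actual modification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    # Mean binary cross-entropy; eps guards against log(0).
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def lm_cross_entropy(X, y, n_iter=50, mu=1e-2):
    """Damped Gauss-Newton (Levenberg-Marquardt-style) minimization of
    binary cross-entropy for a single sigmoid output (illustrative sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    loss = cross_entropy(w, X, y)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) / n              # gradient of cross-entropy
        s = p * (1 - p)                    # one curvature weight per pattern
        H = (X * s[:, None]).T @ X / n     # Gauss-Newton Hessian approximation
        step = np.linalg.solve(H + mu * np.eye(d), g)
        new_loss = cross_entropy(w - step, X, y)
        if new_loss < loss:                # accept the step, relax damping
            w, loss, mu = w - step, new_loss, mu * 0.5
        else:                              # reject the step, tighten damping
            mu *= 2.0
    return w, loss
```

Note that the curvature information enters through a single scalar weight per pattern, which is what keeps the memory footprint small compared with stacking one residual per output unit.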
Date of Conference: 31 July - 5 August 2011