In the paper by Ergezinger and Thomsen (ibid., vol. 6 (1991)), a new method for training multilayer perceptrons, called optimization layer by layer (OLL), was introduced. The present paper analyzes the performance of OLL. We show from theoretical considerations that the amount of work required for OLL learning scales as the third power of the network size, compared with the square of the network size for commonly used conjugate gradient (CG) training algorithms. A practical example confirms this theoretical estimate. Thus, although OLL performs very well for small neural networks (fewer than about 500 weights per layer), it is slower than CG for large ones. We further show that OLL does not always improve on the accuracy obtainable with CG; the final accuracy appears to depend strongly on the initial network weights.
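The scaling argument can be illustrated with a toy cost model. The sketch below assumes the work for OLL is `c_oll * n**3` and for CG is `c_cg * n**2`, where `n` is the network size; the functions `work_oll`, `work_cg`, `crossover_size` and the constants are hypothetical placeholders, not measured values from the paper. Whatever the constants, the ratio of OLL work to CG work grows linearly in `n`, so beyond the crossover size `n* = c_cg / c_oll` OLL becomes the slower method.

```python
def work_oll(n: float, c_oll: float = 1.0) -> float:
    """Hypothetical OLL cost model: work grows as the cube of network size n."""
    return c_oll * n**3


def work_cg(n: float, c_cg: float = 1.0) -> float:
    """Hypothetical CG cost model: work grows as the square of network size n."""
    return c_cg * n**2


def crossover_size(c_oll: float, c_cg: float) -> float:
    """Size n* where c_oll * n**3 == c_cg * n**2, i.e. n* = c_cg / c_oll."""
    return c_cg / c_oll


if __name__ == "__main__":
    # Placeholder constants for illustration only (not taken from the paper).
    c_oll, c_cg = 1.0, 250.0
    print("crossover size:", crossover_size(c_oll, c_cg))
    for n in (100, 250, 500, 1000):
        # Ratio > 1 means the cubic (OLL-like) model costs more than the quadratic one.
        ratio = work_oll(n, c_oll) / work_cg(n, c_cg)
        print(n, ratio)
```

Because the ratio equals `n / n*`, doubling the network size doubles OLL's relative cost under this model, which is the qualitative behaviour the abstract describes.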