We found in previous work that the error surfaces of recurrent networks have spurious valleys that can cause significant difficulties in training these networks. Our earlier work focused on single-layer networks. In this paper, we extend the previous results to general layered digital dynamic networks. We describe two types of spurious valleys that appear in the error surfaces of these networks. These valleys are not affected by the desired network output (or by the problem that the network is trying to solve). They depend only on the input sequence and the architecture of the network. The insights gained from this analysis suggest procedures for improving the training of recurrent neural networks.
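As a rough illustration of how a feature of the error surface can be fixed by the input sequence alone, consider a toy case that is an assumption made here and is not taken from the paper: a first-order linear recurrent network a(t) = w*a(t-1) + p(t) with a(0) = 0. Its output at the final time step is a polynomial in the feedback weight w whose coefficients are the input values, so the weights at which that output vanishes are determined by the input and not by the target. The function name `final_output`, the input sequence, and the two targets below are hypothetical choices for this sketch.

```python
import numpy as np

def final_output(w, p):
    """Final output of the toy network a(t) = w*a(t-1) + p(t), a(0) = 0.

    This is a hypothetical single-weight example, not the layered
    networks analyzed in the paper.
    """
    a = 0.0
    for p_t in p:
        a = w * a + p_t
    return a

# Hypothetical input sequence. Unrolling the recurrence gives
# a(3) = p1*w**2 + p2*w + p3, a polynomial in w with coefficients p.
p = [1.0, -2.5, 1.0]

# Roots of that polynomial depend only on the input sequence.
roots = np.roots(p)
real_roots = sorted(r.real for r in roots if abs(r.imag) < 1e-9)

# Compare the squared error on the final output for two different targets.
# The weights where the output vanishes are the same for both targets.
for target in (0.3, -0.7):
    for w in real_roots:
        err = (target - final_output(w, p)) ** 2
        print(f"target={target:+.1f}  w={w:.2f}  final-step squared error={err:.4f}")
```

In this sketch the root locations are identical for both targets because only `p` enters the polynomial. For longer sequences and weights with magnitude greater than one, the output grows rapidly away from such roots, which is one plausible way narrow input-dependent valleys could arise; the paper's own analysis of layered networks is more general than this example.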