Simple recurrent error backpropagation networks have been widely used to learn temporal sequence data, including regular and context-free languages. However, the relatively large and opaque weight matrices produced during learning have inspired substantial research on how to extract symbolic, human-readable interpretations from trained networks. Unlike feedforward networks, where research has focused mainly on rule extraction, most past work with recurrent networks has viewed them as dynamical systems that can be approximated symbolically by finite-state machines (FSMs). With this approach, the network's hidden layer activation space is typically divided into a finite number of regions. Past research has mainly focused on better techniques for dividing up this activation space. In contrast, very little work has tried to influence the network training process to produce a better representation in hidden layer activation space, and the work that has been done has had only limited success. Here we propose a powerful general technique to bias the error backpropagation training process so that it learns an activation space representation from which it is easier to extract FSMs. Using four publicly available data sets that are based on regular and context-free languages, we show via computational experiments that the modified learning method helps to extract FSMs with substantially fewer states and less variance than unmodified backpropagation learning, without decreasing the neural networks' accuracy. We conclude that modifying error backpropagation so that it more effectively separates learned pattern encodings in the hidden layer is an effective way to improve contemporary FSM extraction methods.
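To make the idea of dividing hidden layer activation space into a finite number of regions concrete, the following is a minimal sketch of one generic partitioning scheme: uniform quantization of each hidden unit, followed by reading off transitions between regions. It is not the method proposed here. The function names `quantize` and `extract_fsm`, the `bins` parameter, the toy trace, and the assumption that activations lie in [0, 1] (as with sigmoid units) are all illustrative choices.

```python
from collections import defaultdict


def quantize(hidden, bins=2):
    """Map a hidden activation vector to a discrete region id.

    Assumption: each activation lies in [0, 1], e.g. from sigmoid units.
    Each unit is split into `bins` equal-width intervals.
    """
    return tuple(min(int(h * bins), bins - 1) for h in hidden)


def extract_fsm(traces, bins=2):
    """Build a transition table from recorded hidden-state traces.

    traces: list of (initial_hidden, [(symbol, hidden_after_symbol), ...]).
    Returns (states, transitions), where transitions maps
    (state, symbol) to the set of observed successor states.
    """
    states = set()
    transitions = defaultdict(set)
    for initial_hidden, steps in traces:
        current = quantize(initial_hidden, bins)
        states.add(current)
        for symbol, hidden in steps:
            successor = quantize(hidden, bins)
            transitions[(current, symbol)].add(successor)
            states.add(successor)
            current = successor
    return states, dict(transitions)


# Hypothetical trace from a network with one hidden unit reading "1", "1", "0".
toy_traces = [([0.1], [("1", [0.9]), ("1", [0.2]), ("0", [0.15])])]
states, transitions = extract_fsm(toy_traces, bins=2)
```

In a sketch like this, the number of extracted states depends on how cleanly the hidden activations cluster: encodings that are well separated fall into few regions, while scattered encodings occupy many regions and can yield several successors for the same (state, symbol) pair.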