The accurate prediction of enzyme catalytic sites remains an open problem in bioinformatics. Recently, several structure-based methods have become popular; however, few robust sequence-only methods have been developed. In this report, we demonstrate that three different feed-forward neural networks, trained on a variety of sequence-based properties, can reliably predict enzyme catalytic sites. To the best of our knowledge, this is only the second report using neural networks to predict catalytic sites, and the first to rely solely on sequence-derived information. The models are trained with the scaled conjugate gradient algorithm. The simplest model takes only sequence conservation, diversity of position and residue identity as input. Surprisingly, model accuracy is largely unaffected when sequence-based predictions of structural properties (i.e. solvent accessibility and secondary structure) are added to the input. A similar lack of improvement is observed when evolutionary information in the form of phylogenetic motifs is included. These results are noteworthy because they indicate that routine neural network architectures can accurately predict catalytic sites using only residue identity and conservation inputs. However, applying these methods on a per-protein basis still produces a substantial number of false positives, which significantly reduces the models' utility to experimentalists.
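To make the simplest input representation concrete, the following is a minimal sketch of how residue identity, conservation and positional diversity might be encoded for one residue and passed through a single-hidden-layer feed-forward network. It is an illustration under stated assumptions, not the models described here: the Shannon-entropy definition of conservation, the definition of diversity as the fraction of residue types observed in an alignment column, the hidden-layer size, and all function names are hypothetical. Training by scaled conjugate gradient is not shown; the weights below are random and untrained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_features(column):
    """Return (conservation, diversity) for one alignment column.

    Assumption: conservation is 1 minus the normalised Shannon entropy of the
    column; diversity is the fraction of the 20 residue types observed in it.
    """
    residues = [r for r in column if r in AMINO_ACIDS]  # ignore gaps etc.
    if not residues:
        raise ValueError("alignment column contains no standard residues")
    freqs = [residues.count(a) / len(residues) for a in AMINO_ACIDS]
    entropy = -sum(f * math.log2(f) for f in freqs if f > 0)
    conservation = 1.0 - entropy / math.log2(len(AMINO_ACIDS))
    diversity = sum(1 for f in freqs if f > 0) / len(AMINO_ACIDS)
    return conservation, diversity


def encode_residue(residue, column):
    """One-hot residue identity (20 values) plus conservation and diversity."""
    one_hot = [1.0 if a == residue else 0.0 for a in AMINO_ACIDS]
    conservation, diversity = column_features(column)
    return one_hot + [conservation, diversity]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: tanh hidden layer, sigmoid output in (0, 1)."""
    hidden = [
        math.tanh(sum(xi * wi for xi, wi in zip(x, row)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logit = sum(h * w for h, w in zip(hidden, w_out)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Usage with untrained, randomly initialised weights (hypothetical sizes).
rng = random.Random(0)
n_inputs, n_hidden = 22, 5
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

x = encode_residue("H", "HHHHHHHN")  # a histidine in a highly conserved column
score = forward(x, w_hidden, b_hidden, w_out, 0.0)
print(f"catalytic-site score (untrained): {score:.3f}")
```

In a sketch like this, a residue would be called catalytic when its score exceeds a chosen threshold; the per-protein false positives noted above correspond to non-catalytic residues that clear that threshold.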