Predicting protease cleavage sites in proteins is critical to effective drug design. An important issue in constructing an accurate and efficient predictor is how to present nonnumerical amino acids to a model. Because this issue has received little attention, yet bears directly on model efficiency and accuracy, we present a novel neural learning algorithm aimed at improving prediction accuracy and reducing training time. The algorithm builds on conventional radial basis function neural networks (RBFNNs) and is referred to as a bio-basis function neural network (BBFNN). The basic principle is to replace the radial basis function used in RBFNNs with a novel bio-basis function. Each bio-basis defines one dimension of a numerical feature space, onto which the nonnumerical sequence space is mapped for analysis. The bio-basis function is designed using a biologically verified amino acid mutation matrix, so that the biological content of protein sequences is exploited as fully as possible for accurate modeling. Mutual information (MI) is used to select the most informative bio-bases, and an ensemble method is used to strengthen decision making, further improving prediction accuracy. The algorithm has been verified in two case studies: the prediction of human immunodeficiency virus (HIV) protease cleavage sites and of trypsin cleavage sites in proteins.
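The mapping from sequence space to a numerical feature space can be illustrated with a minimal Python sketch. The abstract does not state the functional form of the bio-basis function, so the exponential form used here, exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), is an assumption, as are the names `bio_basis`, `to_features`, and `gamma`. The score table is a toy four-letter example with made-up values, not a real mutation matrix; a biologically verified matrix would take its place in practice.

```python
import math

# Toy similarity scores for a 4-letter alphabet. These values are hypothetical
# and only illustrate the mechanics; they are NOT taken from a real mutation matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "G"): 1, ("A", "L"): -1, ("A", "K"): -1,
    ("G", "G"): 5, ("G", "L"): -3, ("G", "K"): -2,
    ("L", "L"): 4, ("L", "K"): -2,
    ("K", "K"): 5,
}


def pair_score(a, b):
    """Look up the symmetric similarity score for two residues."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def similarity(x, y):
    """Sum position-wise scores of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(x, y))


def bio_basis(x, basis, gamma=1.0):
    """Assumed bio-basis form: equals 1.0 when x is identical to the basis."""
    self_score = similarity(basis, basis)
    return math.exp(gamma * (similarity(x, basis) - self_score) / self_score)


def to_features(x, bases, gamma=1.0):
    """Map one sequence to a numeric vector, one dimension per bio-basis."""
    return [bio_basis(x, b, gamma) for b in bases]


if __name__ == "__main__":
    bases = ["AGLK", "KLGA"]  # hypothetical bio-bases
    print(to_features("GALK", bases))
```

Each selected bio-basis contributes one feature, so choosing a subset of bases (for example by mutual information, as the abstract describes) directly reduces the dimensionality of the input passed to the network.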