In this article, we describe a new database that relates structural information from proteins in the Protein Data Bank (PDB) to closely related human protein sequences. Because the match criteria are extremely stringent, the structures of proteins from other species can be used to infer characteristics of the corresponding human proteins. As a demonstration of the approach, we applied this database to the problem of identifying likely trypsin miscleavage sites, a significant problem in proteomics. However, the approach is very general and can be used to answer many kinds of structural questions, including questions related to posttranslational modifications. We found that both the surface area and the secondary structure of cleavage sites have highly statistically significant effects on trypsin cleavage. The results of this analysis do not, however, suggest that the surface area or secondary structure of particular peptides can be used to predict miscleavage sites, at least at a global level. This analysis of cleavage sites demonstrates the general power of homology-based techniques, in which the characteristics of a single protein whose structure has been solved can be used to infer properties of other proteins. We expect that our database of related proteins, structures, and sequences, together with our ability to query experimentally determined sets of peptides against it, will allow us to answer many other questions related to global protein expression and modification.
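To make the miscleavage analysis concrete, the following is a minimal, hypothetical sketch and not the authors' pipeline. It assumes the common simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P) and assumes each human sequence has already been mapped to a per-residue secondary-structure string (`H` helix, `E` strand, `C` coil) taken from a homologous PDB structure. The function names `tryptic_sites` and `miscleavage_by_structure`, and the `cleaved` set of sites observed as cut in experimental peptides, are illustrative inventions. The sketch only tallies missed versus total sites per structural class; it does not predict miscleavage.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tryptic_sites(seq: str) -> List[int]:
    """Return 0-based indices i where trypsin would cut between seq[i] and seq[i + 1].

    Assumes the simplified rule: cleave after K or R, but not before P.
    """
    return [
        i for i in range(len(seq) - 1)
        if seq[i] in "KR" and seq[i + 1] != "P"
    ]


def miscleavage_by_structure(
    seq: str, ss: str, cleaved: Set[int]
) -> Dict[str, Tuple[int, int]]:
    """Tally (missed, total) candidate cleavage sites per secondary-structure class.

    seq     -- protein sequence (one-letter codes)
    ss      -- hypothetical per-residue secondary structure mapped from a homologous
               PDB entry, same length as seq (e.g. 'H', 'E', 'C')
    cleaved -- indices of candidate sites actually observed as cleaved
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings must have equal length")
    totals: Counter = Counter()
    missed: Counter = Counter()
    for i in tryptic_sites(seq):
        cls = ss[i]  # structural class of the residue N-terminal to the cut
        totals[cls] += 1
        if i not in cleaved:
            missed[cls] += 1
    return {cls: (missed[cls], totals[cls]) for cls in totals}


if __name__ == "__main__":
    seq = "MKRPAKLRE"
    ss = "CHHHHEECC"
    print(tryptic_sites(seq))                        # [1, 5, 7]; R-P at index 2 is skipped
    print(miscleavage_by_structure(seq, ss, {1, 7}))  # {'H': (0, 1), 'E': (1, 1), 'C': (0, 1)}
```

Per-class counts like these could then feed a contingency-table test of whether structural class is associated with miscleavage. A per-residue surface-area value could be binned into classes and tallied the same way.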