The classification of a protein from its primary sequence using functional and structural-specific PSSMs in quantitative measurement

4 Author(s)
Kyung Dae Ko ; Dept. of Biol., Penn State Univ., University Park, PA, USA ; Yoo Jin Hong ; van Rossum, D.B. ; Patterson, R.L.

In principle, the amino acid sequence of a protein contains structural, functional, and evolutionary characteristics. Investigation of these characteristics using computational methods provides a powerful resource. However, these methods have limitations in their ability to annotate the characteristics of proteins accurately. In an attempt to overcome this drawback, we have developed a unified computational pipeline, called the Gestalt domain detection algorithm basic local alignment tool (GDDA-BLAST), for measuring the structural, functional, and evolutionary characteristics of a protein. GDDA-BLAST performs better than other methods, such as SAM and PSI-BLAST, in homology detection. Using GDDA-BLAST, we implemented a classification library to find quantitative thresholds capable of inferring protein function. Using this library, we first identified RNA-binding proteins (RBPs) containing structurally unique motifs with 2695 expanded position-specific scoring matrix (PSSM) profiles in a testing dataset of 37 positive and 118 negative sequences. We achieved 100% specificity, 96.8% accuracy, and 86.5% sensitivity. For the specific nucleotide binding folds (dsRNA vs. dsDNA, dsRNA vs. dsDNA, and ssRNA vs. ssDNA), our results exceeded those obtained using support vector machine (SVM) learning algorithms. Using this method, we also identified 29 and 168 novel RBPs in the yeast and human proteomes, respectively. We extended our experiment to additional protein functions: Ankyrin-repeat (ANK), integral lipid-binding (ILB), and calmodulin (CaM)-binding. For ANK, 449 ANK PSSMs were used to measure 126 negative and 32 positive sequences. For ILB and CaM-binding, we used 24,378 PSSMs to measure 24 negative and 32 positive sequences, and 820 PSSMs to measure 17 negative and 65 positive sequences, respectively. By ROC curve analysis, we achieved ~100%, ~93%, and ~72% sensitivity at a false positive rate of ~10% for ANK, ILB, and CaM-binding classification, respectively. These results again confirmed that proteins can be classified using function-specific PSSM sets. We believe that the performance can be improved with more carefully curated PSSM sets. All of these results suggest that this method can be used to create PSSM databases for the quantitative measurement and classification of any protein function.
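
The RBP figures quoted in the abstract (100% specificity, 96.8% accuracy, 86.5% sensitivity on 37 positive and 118 negative sequences) can be checked with the standard confusion-matrix definitions. The sketch below is only an illustration, not the authors' code. The abstract does not give the underlying counts, so the values used here (32 true positives, 5 false negatives, 118 true negatives, 0 false positives) are an assumption, chosen because they reproduce the reported percentages. The names `ConfusionCounts` and `rbp` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes on a labelled test set."""
    tp: int  # positives correctly called positive
    fn: int  # positives missed
    tn: int  # negatives correctly called negative
    fp: int  # negatives wrongly called positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives recovered: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sequences classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# ASSUMPTION: these counts are inferred from the reported percentages and
# the stated 37 positive / 118 negative sequences; they are not in the abstract.
rbp = ConfusionCounts(tp=32, fn=5, tn=118, fp=0)

if __name__ == "__main__":
    print(f"sensitivity = {100 * sensitivity(rbp):.1f}%")
    print(f"specificity = {100 * specificity(rbp):.1f}%")
    print(f"accuracy    = {100 * accuracy(rbp):.1f}%")
```

Under these assumed counts the three reported values are mutually consistent: 32/37 for sensitivity, 118/118 for specificity, and 150/155 for accuracy.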

Published in:

Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on

Date of Conference:

1-4 Nov. 2009