Given two proteins in a living system, can we predict whether they interact with each other from their sequence information alone? This is an interesting problem because knowledge of protein-protein interactions may be applied to protein subunit aggregation, to computer-aided drug design, and to fundamental problems of cellular signalling and expression. With the explosion of newly found protein sequences in the post-genomic era, its importance has become self-evident, and the challenge of addressing it even more urgent. Based on the pseudo amino acid composition approach (Chou, K.C.: PROTEINS: Structure, Function, and Genetics, 43: 246-255, 2001) and the nearest neighbor rule, a predictor called the "NN-PseAA" classifier was developed to deal with this problem. As a showcase, prediction was performed on 8,797 fruit fly protein pairs. To avoid redundancy and homology bias, none of the protein pairs investigated has more than 25% sequence identity with any other. The overall success rate obtained by jackknife cross-validation on such a stringent dataset was 73.74%, indicating that the new approach is promising and may stimulate the development of this important area and of other related areas.
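The evaluation scheme named above, a nearest neighbor rule scored by jackknife (leave-one-out) cross-validation, can be illustrated with a minimal sketch. It is not the NN-PseAA implementation: plain 20-component amino acid composition stands in for the pseudo amino acid composition, whose sequence-order correlation terms are omitted here; encoding a pair by concatenating the two composition vectors and using Euclidean distance are both assumptions, since the abstract does not specify them; and all function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the 20 residue frequencies of a sequence.

    Stand-in for pseudo amino acid composition: the extra
    sequence-order terms of that descriptor are NOT computed here.
    """
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Encode a protein pair as one vector (assumed: concatenation)."""
    return aa_composition(seq_a) + aa_composition(seq_b)


def nearest_neighbor_label(query, training):
    """Return the label of the training vector closest to the query."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(query, features)  # Euclidean distance (assumed)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from all the others."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


# Toy, made-up sequences; 1 = interacting pair, 0 = non-interacting pair.
toy_samples = [
    (pair_features("AAAA", "CCCC"), 1),
    (pair_features("AAAC", "CCCA"), 1),
    (pair_features("WWWW", "YYYY"), 0),
    (pair_features("WWWY", "YYYW"), 0),
]
print(jackknife_accuracy(toy_samples))
```

In the jackknife test every sample serves once as the query while all remaining samples form the training set, so the reported rate does not depend on an arbitrary choice of train/test split.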
Published in:
BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on
(Volume: 1)
Date of Conference: 27-30 May 2008