The past decades have witnessed extensive efforts to study the relationships among proteins. In particular, sequence-based prediction of protein-protein interactions (PPIs) is fundamentally important for speeding up the mapping of organisms' interactomes. Composition vectors are commonly constructed to encode proteins as real-valued vectors, which are then fed to a machine learning framework. However, composition vector values may be highly correlated with the distribution of amino acids; that is, amino acids that are frequently observed in nature tend to have large composition vector values. A formulation that estimates this noise may therefore be needed when building the representation. Here, we introduce two kinds of denoising composition vectors, which are effective in the construction of phylogenetic trees, to eliminate the noise. When these two denoising composition vectors are validated on Escherichia coli (E. coli) and Saccharomyces cerevisiae (S. cerevisiae) datasets with random and artificial negative sets, respectively, the predictive performance is not improved and is even worse than that of non-denoised prediction. These results suggest that a denoising formulation that is effective for phylogenetic tree construction does not improve PPI prediction; that is, what counts as noise depends on the application.
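To make the idea of a denoised composition vector concrete, the following is a minimal sketch only. The abstract does not give the two denoising formulations, so this example assumes a simple stand-in: dipeptide frequencies are compared against the frequency expected if residues occurred independently, and the relative deviation is kept. The function names (`kmer_frequencies`, `denoised_dipeptide_vector`, `encode_pair`) and the choice of dipeptides are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_frequencies(seq, k):
    """Relative frequency of each k-mer over the standard alphabet.

    k-mers containing non-standard letters are ignored.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def denoised_dipeptide_vector(seq):
    """Assumed denoising: (p - p0) / p0 with an independence background.

    p  is the observed dipeptide frequency.
    p0 is the product of the two single-residue frequencies, i.e. the
       frequency expected from amino acid abundance alone.
    """
    single = kmer_frequencies(seq, 1)
    pair = kmer_frequencies(seq, 2)
    vector = []
    for kmer, p in pair.items():
        p0 = single[kmer[0]] * single[kmer[1]]
        vector.append((p - p0) / p0 if p0 > 0 else 0.0)
    return vector


def encode_pair(seq_a, seq_b):
    """Concatenate the two protein vectors into one feature vector."""
    return denoised_dipeptide_vector(seq_a) + denoised_dipeptide_vector(seq_b)
```

A call such as `encode_pair(seq_a, seq_b)` would yield an 800-dimensional feature vector (400 dipeptides per protein) that could be passed to any classifier. Dropping the background term and using the raw frequencies from `kmer_frequencies(seq, 2)` gives the non-denoised counterpart for comparison.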