Skip to Main Content
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database(HPRD). A randomization test for intermittency in the form of spikes and holes in frequency distributions of cluster-specific word frequencies was developed using scaled factorial moments. The test was based on a permutation form of a log-linear regression model to determine differences in slopes for ln(F2) vs. ln(M) in the intermittent and null distributions. Significant intermittency (p < 0:0005) in PPI was detected for prostate and testis tissue after a Bonferroni adjustment for multiple tests. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and possibly suggests identification of novel large signal transduction pathways or networks.