Fold recognition based on sequence-derived features is a complex classification problem that is usually tackled by applying suitable machine learning techniques to those features. Here we address the task of fold recognition on the basis of a protein similarity network (PSN). We construct a protein sequence similarity network (PSeSN) from a set of 125 sequence-derived features computed for an available set of 311 proteins. The PSeSN is optimized with a Genetic Algorithm (GA) that selects the subset of features whose PSeSN is as similar as possible to the corresponding protein structure similarity network (PStSN). A random-walk-based algorithm is then used to recognize the fold of a query protein sequence by calculating its affinities to the sequence vertices of both the initial and the optimized PSeSN. Total accuracy (TA) measurements obtained with 10-fold cross-validation show that the optimized PSeSN, which uses 48 of the 125 sequence-derived features, yields better results (mean TA on the test sets: 0.35) than the initial PSeSN (mean TA on the test sets: 0.316).
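The pipeline described above can be sketched roughly as follows. This is a minimal illustration under assumptions of our own, not the authors' implementation: edge weights are taken as a Gaussian kernel of the Euclidean distance over the selected features, the GA fitness is assumed to be the Pearson correlation between the off-diagonal weights of a sequence network and a structure network, affinities are computed with a random walk with restart, and a query is assigned the fold of its highest-affinity training vertex. All function names (`similarity_network`, `network_agreement`, `random_walk_with_restart`, `predict_fold`, `total_accuracy`) and parameters (`sigma`, `restart`) are hypothetical, and the GA loop itself is omitted; it would search over `selected` feature subsets to maximize `network_agreement`.

```python
import math


def similarity_network(features, selected, sigma=1.0):
    """Dense weighted adjacency matrix over the chosen feature indices.

    Assumption: weight = Gaussian kernel of the Euclidean distance between
    feature vectors restricted to `selected`; no self-loops.
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((features[i][k] - features[j][k]) ** 2 for k in selected)
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def network_agreement(W_a, W_b):
    """Pearson correlation of off-diagonal weights (hypothetical GA fitness
    measuring how closely a PSeSN matches a PStSN)."""
    n = len(W_a)
    xs = [W_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [W_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # correlation undefined for constant weights
    return cov / math.sqrt(vx * vy)


def random_walk_with_restart(W, seed, restart=0.3, iters=200, tol=1e-12):
    """Visiting probabilities of a walk that jumps back to `seed` with
    probability `restart` at every step, computed by power iteration."""
    n = len(W)
    col = [sum(W[i][j] for i in range(n)) for j in range(n)]  # out-weight of j
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(iters):
        # W[i][j] / col[j] is the transition probability from j to i
        new = [
            (1.0 - restart) * sum(W[i][j] / col[j] * p[j] for j in range(n) if col[j] > 0)
            + restart * e[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, p))
        p = new
        if delta < tol:
            break
    return p


def predict_fold(train_features, train_folds, query, selected, sigma=1.0, restart=0.3):
    """Add the query as an extra vertex and return the fold of the
    training vertex with the highest random-walk affinity to it."""
    nodes = list(train_features) + [query]
    W = similarity_network(nodes, selected, sigma)
    p = random_walk_with_restart(W, len(nodes) - 1, restart)
    best = max(range(len(train_folds)), key=lambda i: p[i])
    return train_folds[best]


def total_accuracy(true_folds, predicted_folds):
    """TA: fraction of queries whose predicted fold is correct."""
    hits = sum(t == p for t, p in zip(true_folds, predicted_folds))
    return hits / len(true_folds)
```

For example, with two well-separated toy clusters labelled `"a"` and `"b"`, a query lying next to the first cluster is assigned fold `"a"`. Summing affinities per fold would be an equally plausible decision rule; picking the single top-affinity vertex was chosen here only because it does not favour folds with many members.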
Published in:
BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on
Date of Conference: 8-10 Oct. 2008