Principal component analysis (PCA), a linear transformation method, is used for feature extraction in the protein secondary structure prediction problem. The dimensionality reduction is applied to PSI-BLAST profiles built on NCBI's Nonredundant Protein database. Different numbers of extracted components are used as input to three artificial neural networks with 30, 35, or 40 nodes in the hidden layer. These classifiers are trained with the RPROP algorithm. To estimate the accuracy of the predictor, sevenfold cross-validation is applied to CB396, a database previously used to evaluate the performance of several predictors. To improve the predictor further, the outputs of the classifiers are combined through five simple rules: product, average, voting, minimum, and maximum. This novel application of PCA yields relevant results. Even with a drastic reduction from 260 to 80 components, the accuracy obtained is at least 1% higher than the best published result for another predictor, CONSENSUS, which is a combination of four other predictors. With a reduction from 260 to 180 components the performance is even better, reaching a Q3 accuracy of 74.5%. The results point to PCA as a promising method for feature extraction in the secondary structure prediction problem.
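The two generic steps named above, PCA reduction of a 260-dimensional input and combination of classifier outputs by the five rules, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_reduce` and `combine`, the array layout, and the tie-breaking behaviour are assumptions made for illustration, and the neural networks and RPROP training are omitted entirely.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal components.

    Assumes X has at least n_components rows and columns.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def combine(probs, rule):
    """Combine classifier outputs and return one class index per sample.

    probs has shape (n_classifiers, n_samples, n_classes).
    Ties are broken in favour of the lowest class index (an assumption).
    """
    if rule == "product":
        scores = probs.prod(axis=0)
    elif rule == "average":
        scores = probs.mean(axis=0)
    elif rule == "minimum":
        scores = probs.min(axis=0)
    elif rule == "maximum":
        scores = probs.max(axis=0)
    elif rule == "voting":
        votes = probs.argmax(axis=2)  # each classifier's predicted class
        n_classes = probs.shape[2]
        # Count the votes each class receives for every sample.
        scores = np.stack(
            [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
        )
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)


# Hypothetical outputs of three classifiers for one sample and three classes.
example_probs = np.array([
    [[0.6, 0.3, 0.1]],
    [[0.5, 0.4, 0.1]],
    [[0.1, 0.2, 0.7]],
])
```

For `example_probs`, the product, average, and voting rules select class 0, the minimum rule selects class 1, and the maximum rule selects class 2, which shows that the five rules can disagree on the same inputs. Calling `pca_reduce(X, 80)` on an array `X` of shape `(n_samples, 260)` with at least 260 rows returns an array of shape `(n_samples, 80)`.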