Determining the 3D structure of a protein by X-ray crystallography is costly and time-consuming. A major reason is that the protein must first be purified and crystallized, and the failure rate of protein crystallization is high. A computational method that predicts protein crystallizability from primary structure information before the experimental process starts is therefore desirable, as it can substantially lower the average cost of protein structure determination. In this paper, we investigate the feature sets used in previous research. A support vector machine (SVM) is chosen as the predictor, and different weightings are set for the penalty parameters of the two classes to deal with the imbalanced data problem. The results show that a combined set of features produces better results, especially in specificity.
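The class-dependent penalty mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two penalty weights were chosen, so the inverse-frequency rule in `class_penalties` is an assumption, and all function names, the `base_c` parameter, and the +1/-1 label encoding are hypothetical. The sketch evaluates the standard soft-margin objective for a linear SVM with a separate penalty per class, and computes sensitivity and specificity from predictions.

```python
from collections import Counter


def class_penalties(labels, base_c=1.0):
    """Return {label: C_label}, scaling base_c inversely to class frequency.

    Assumption: inverse-frequency weighting is one common heuristic;
    the paper may have tuned the two penalties differently.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: base_c * n / (k * count) for label, count in counts.items()}


def weighted_svm_objective(w, b, X, y, penalties):
    """Soft-margin objective: 0.5 * ||w||^2 + sum_i C_{y_i} * hinge_i.

    Labels in y must be +1 or -1; penalties maps each label to its C.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # Errors on a class with a larger C cost more.
        loss += penalties[yi] * max(0.0, 1.0 - margin)
    return reg + loss


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for +1/-1 labels.

    Assumes y_true contains at least one example of each class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels only; these are not data from the paper.
    y = [1, -1, -1, -1]
    print(class_penalties(y))
    print(sensitivity_specificity([1, 1, -1, -1], [1, -1, -1, -1]))
```

With a larger penalty on the rarer class, misclassifying one of its examples contributes more to the objective, which discourages the classifier from simply favoring the majority class.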
Published in:
Innovations in Information Technology, 2008. IIT 2008. International Conference on
Date of Conference: 16-18 Dec. 2008