By Topic

Localization site prediction for membrane proteins by integrating rule and SVM classification

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Zhou, S. ; Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada ; Ke Wang

We study the localization prediction of membrane proteins for two families of medically important disease-causing bacteria, called gram-negative and gram-positive bacteria. Each such bacterium has its cell surrounded by several layers of membranes. Identifying where proteins are located in a bacterial cell is of primary research interest for antibiotic and vaccine drug design. This problem has three requirements: First, with any subsequence of amino acid residues being potentially a dimension, it has an extremely high dimensionality, few being irrelevant. Second, the prediction of a target localization site must have a high precision in order to be useful to biologists, i.e., at least 90 percent or even 95 percent, while recall is as high as possible. Achieving such a precision is made harder by the fact that target sequences are often much fewer than background sequences. Third, the rationale of prediction should be understandable to biologists for taking actions. Meeting all these requirements presents a significant challenge in that a high dimensionality requires a complex model that is often hard to understand. The support vector machine (SVM) model has an outstanding performance in a high-dimensional space, therefore, it addresses the first two requirements. However, the SVM model involves many features in a single kernel function, therefore, it does not address the third requirement. We address all three requirements by integrating the SVM model with a rule-based model, where the understandable if-then rules capture "major structures" and the elaborated SVM model captures "subtle structures". Importantly, the integrated model preserves the precision/ recall performance of SVM and, at the same time, exposes major structures in a form understandable to the human user. We focus on searching for high quality rules and partitioning the prediction between rules and SVM so as to achieve these properties. We evaluate our method on several membrane localization problems. The purpose of this paper is not improving the precision/recall of SVM, but is manifesting the rationale of a SVM classifier through partitioning the classification between if-then rules and the SVM classifier and preserving the precision/recall of SVM.

Published in:

Knowledge and Data Engineering, IEEE Transactions on  (Volume:17 ,  Issue: 12 )