This paper addresses the protein classification problem and explores how its accuracy can be improved by using information from time-course gene expression data. The methods are tested on data from Plasmodium falciparum, the most deadly species of the parasite responsible for malaria infections. Even though vaccination against malaria has been under intense study for many years, more than half of Plasmodium proteins still remain uncharacterized and are therefore excluded from clinical trials. The task is further complicated by the rapid life cycle of the parasite, which makes precise targeting of the appropriate proteins for vaccination a technical challenge. We propose to integrate protein-protein interactions (PPIs), sequence similarity, metabolic pathways, and gene expression to produce a suitable set of predicted protein functions for P. falciparum. Further, we treat gene expression data with respect to the changes that occur during the five phases of the intraerythrocytic developmental cycle (IDC) of P. falciparum, as determined by our segmentation algorithm, and show that this analysis yields significantly improved protein function prediction, for example, when compared with analysis based on Pearson correlation coefficients computed from the data. The algorithm is able to assign "meaningful" functions to 628 out of 1439 previously unannotated proteins, which are first-choice candidates for experimental vaccine research.
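To make the comparison in the abstract concrete, the following is a minimal Python sketch of the Pearson correlation baseline on two time-course expression profiles, next to a per-phase variant. It is an illustration under stated assumptions, not the paper's method: the abstract does not describe the segmentation algorithm, so the phase boundaries here are supplied by hand, and the function names (`pearson`, `phasewise_pearson`), the two phases, and the expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; this sketch returns 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def phasewise_pearson(x, y, boundaries):
    """Pearson correlation computed separately inside each phase.

    `boundaries` lists the start index of every phase followed by the end
    index of the last one. ASSUMPTION: boundaries are given by hand; the
    paper derives its phases with a segmentation algorithm not shown here.
    """
    return [pearson(x[s:e], y[s:e]) for s, e in zip(boundaries, boundaries[1:])]


# Made-up expression profiles for two hypothetical genes over six time points.
gene_a = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
gene_b = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0]
boundaries = [0, 3, 6]  # two illustrative phases, not the five IDC phases

whole = pearson(gene_a, gene_b)
per_phase = phasewise_pearson(gene_a, gene_b, boundaries)
print(whole, per_phase)
```

On this toy input the whole-profile coefficient is weak (0.25), while the per-phase coefficients are 1.0 and -1.0. This shows how a single correlation over the full time course can hide phase-specific relationships between two profiles.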
Date of Conference: 3-5 Nov. 2008