We have previously built a subcellular location image finder (SLIP) system, which extracts information about protein subcellular location patterns from both the text and the images in journal articles. One important task in SLIP is identifying fluorescence microscope images. To improve performance on this binary classification task, a set of seven edge features extracted from the images and a set of "bag-of-words" features extracted from the text have been introduced in addition to the 64 intensity-histogram features used previously. An overall accuracy of 88.6% has been achieved with an SVM classifier. A co-training algorithm has also been applied to exploit the unlabeled data; it substantially increases accuracy when the labeled training set is very small but contributes little when the training set is large.
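The co-training idea above can be illustrated with a minimal sketch. This is not the paper's implementation: the two "views" here are synthetic one-dimensional features standing in for the image-derived and text-derived feature sets, and a simple nearest-centroid classifier replaces the SVM so the example stays self-contained. In each round, the classifier trained on one view pseudo-labels its most confident unlabeled examples and adds them to the shared labeled pool, which is the core co-training loop.

```python
import random

random.seed(0)

def make_point(label, view):
    # Hypothetical 1-D feature per view: class 0 clusters near 0.0,
    # class 1 near 1.0 (stand-in for image/text feature vectors).
    return label + random.gauss(0, 0.15)

# Tiny labeled seed set (two examples per class), plus unlabeled and test pools.
labeled = [((make_point(y, 0), make_point(y, 1)), y) for y in (0, 0, 1, 1)]
pool = [((make_point(y, 0), make_point(y, 1)), y) for y in [0, 1] * 60]
unlabeled = [x for x, _ in pool[:60]]
test = pool[60:]

class Centroid:
    """Simple per-view classifier (a stand-in for the paper's SVM)."""
    def fit(self, xs, ys):
        self.c0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, ys.count(0))
        self.c1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, ys.count(1))
    def predict(self, x):
        return 0 if abs(x - self.c0) < abs(x - self.c1) else 1
    def confidence(self, x):
        # Margin between distances to the two centroids.
        return abs(abs(x - self.c0) - abs(x - self.c1))

def cotrain(labeled, unlabeled, rounds=5, k=2):
    L, U = list(labeled), list(unlabeled)
    clfs = []
    for _ in range(rounds):
        # Retrain one classifier per view on the current labeled pool.
        clfs = []
        for v in (0, 1):
            c = Centroid()
            c.fit([x[v] for x, _ in L], [y for _, y in L])
            clfs.append(c)
        if not U:
            break
        # Each view pseudo-labels its k most confident unlabeled examples.
        added = []
        for v in (0, 1):
            U.sort(key=lambda x: -clfs[v].confidence(x[v]))
            for x in U[:k]:
                added.append((x, clfs[v].predict(x[v])))
        for x, y in added:
            if x in U:
                U.remove(x)
                L.append((x, y))
    return clfs

clfs = cotrain(labeled, unlabeled)

def predict(x):
    # Combine the two views; back off to the more confident one on disagreement.
    p0, p1 = clfs[0].predict(x[0]), clfs[1].predict(x[1])
    if p0 == p1:
        return p0
    return p0 if clfs[0].confidence(x[0]) > clfs[1].confidence(x[1]) else p1

acc = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {acc:.2f}")
```

On this well-separated synthetic data the combined classifier reaches near-perfect accuracy from only four labeled examples, mirroring the observation that co-training helps most when the labeled set is very small.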