Abstract:
Figures in scientific papers present knowledge intuitively and concisely. With more attention being paid to full-text mining in bioinformatics, we initiated an effort to study figures in full-text articles. FigSearch is a prototype system for indexing and classifying figure legends, using both text mining and supervised machine learning. We defined schematic representations of protein interactions and signaling events as the figure type of interest. A maximum entropy classifier categorizes each figure as relevant or non-relevant according to this definition by assigning it an estimated likelihood. One advantage of the maximum entropy approach is that it provides a probability for each decision rather than a binary assignment. In our pilot study, FigSearch showed satisfactory performance in a preliminary validation by domain experts. Such a system could be useful on a publisher's website, in constructing bio-picture galleries, or as an aid to other complex text-mining projects.
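To illustrate how a maximum entropy classifier yields a probability rather than a hard label, here is a minimal sketch in Python. It is not the FigSearch implementation: the bag-of-words features, the four toy legends, the function names, and the training settings are all assumptions made for illustration. In the two-class case a maximum entropy model is equivalent to logistic regression, which is what the sketch trains.

```python
import math
from collections import Counter


def featurize(legend):
    """Bag-of-words counts for a figure legend (assumed feature set)."""
    return Counter(legend.lower().split())


def predict_proba(weights, bias, feats):
    """P(relevant | legend) under a binary maximum entropy (logistic) model."""
    score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights, bias = {}, 0.0
    data = [(featurize(text), label) for text, label in examples]
    for _ in range(epochs):
        for feats, label in data:
            err = label - predict_proba(weights, bias, feats)
            bias += lr * err
            for w, c in feats.items():
                old = weights.get(w, 0.0)
                weights[w] = old + lr * (err * c - l2 * old)
    return weights, bias


# Hypothetical training legends: 1 = schematic of interactions/signaling, 0 = other.
examples = [
    ("schematic model of the signaling pathway", 1),
    ("proposed model of protein interaction network", 1),
    ("western blot analysis of protein expression", 0),
    ("growth curves of mutant strains", 0),
]
weights, bias = train(examples)

# The classifier returns a likelihood, which can be thresholded or used for ranking.
p_schematic = predict_proba(weights, bias, featurize("schematic model of signaling events"))
p_blot = predict_proba(weights, bias, featurize("western blot of mutant strains"))
print(f"schematic legend: {p_schematic:.2f}, blot legend: {p_blot:.2f}")
```

Because the output is a probability, figures can be ranked by estimated relevance or filtered at whatever threshold suits the application, instead of being forced into a yes/no decision.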
Published in: Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004.
Date of Conference: 19-19 August 2004
Date Added to IEEE Xplore: 08 October 2004
Print ISBN: 0-7695-2194-0