A PubMed search often returns a long list of query-related papers, more than a researcher can review in a short time. As a first step toward addressing this problem by summarizing the retrieved papers, we developed a system that classifies sentences of abstracts from the MEDLINE database into five rhetorical statuses: background, purpose, method, result, or conclusion. We used Support Vector Machine (SVM) classifiers, training a separate classifier for each rhetorical status on structured abstracts. A structured abstract is one whose sentences carry labels indicating their rhetorical statuses; an unstructured abstract has no such labels. The classifiers were tested on both structured and unstructured abstracts. The former were obtained at random from the MEDLINE database, and the latter were labeled manually. We compared our method with a previously reported one and also evaluated a combination of the two. Our method outperformed the previously reported one, and the combined method performed better still. Classified abstracts can be used for multi-document summarization, which gives researchers an efficient and effective way to learn a research topic.
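As an illustration of how structured abstracts might supply training data for per-status classifiers, the following is a minimal sketch, not the authors' implementation. It assumes that section headings appear as capitalized words followed by a colon (for example `METHODS:`), and the heading-to-status mapping, the naive sentence splitter, and the function names `label_sentences` and `binary_targets` are all hypothetical choices made for this example.

```python
import re

# Assumed mapping from section headings to the five rhetorical statuses.
# The heading variants listed here are illustrative, not taken from the paper.
HEADING_TO_STATUS = {
    "BACKGROUND": "background",
    "INTRODUCTION": "background",
    "OBJECTIVE": "purpose",
    "PURPOSE": "purpose",
    "AIM": "purpose",
    "METHODS": "method",
    "METHOD": "method",
    "RESULTS": "result",
    "RESULT": "result",
    "CONCLUSIONS": "conclusion",
    "CONCLUSION": "conclusion",
}

# A heading is assumed to be an all-caps word followed by a colon.
HEADING_RE = re.compile(r"\b([A-Z]{3,}):\s*")


def label_sentences(structured_abstract):
    """Return (sentence, status) pairs extracted from a structured abstract."""
    # Splitting on a pattern with one capture group yields
    # [preamble, heading1, text1, heading2, text2, ...].
    parts = HEADING_RE.split(structured_abstract)
    pairs = []
    for heading, text in zip(parts[1::2], parts[2::2]):
        status = HEADING_TO_STATUS.get(heading)
        if status is None:
            continue  # skip headings outside the assumed mapping
        # Naive sentence split: end punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                pairs.append((sentence, status))
    return pairs


def binary_targets(pairs, status):
    """Build +1/-1 targets for the classifier dedicated to one status."""
    return [1 if s == status else -1 for _, s in pairs]
```

Calling `label_sentences` on a structured abstract gives labeled sentences, and `binary_targets(pairs, "method")` gives the targets that a classifier dedicated to the method status could be trained on. Feature extraction and the SVM training itself are left out of this sketch.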
Date of Conference: 05-08 April 2005