Automatic content based title extraction for Chinese documents using support vector machine

Abstract:

In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The method exploits the information contained in the title itself and the similarity between the title and the body of the paper. The experiments are carried out on plain text, so no formatting information such as font is used. A list of words used only in Chinese titles and a list of words never used in Chinese titles are also collected to facilitate title extraction. We use a support vector machine classifier to make automatic title extraction robust and more adaptable. The method achieves good performance on a test set of 2438 research papers covering almost all academic disciplines.
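
The kinds of features the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two word lists, and the use of character-bigram cosine similarity are all assumptions made for illustration, since the abstract gives neither the actual feature set nor the list contents. Each candidate line is mapped to a small vector that could then be passed to any support vector machine implementation.

```python
import math
from collections import Counter

# Hypothetical word lists. The paper's actual lists are not given in the abstract.
TITLE_ONLY_WORDS = ["初探", "浅析"]
NEVER_TITLE_WORDS = ["我们", "如图"]


def char_bigrams(text):
    """Return a Counter of character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def line_features(line, body):
    """Build an illustrative feature vector for one candidate line.

    Features (all assumed, not taken from the paper):
      0: length of the line in characters
      1: bigram cosine similarity between the line and the body
      2: 1.0 if the line contains a title-only word
      3: 1.0 if the line contains a never-in-title word
    """
    return [
        float(len(line.strip())),
        cosine(char_bigrams(line), char_bigrams(body)),
        float(any(w in line for w in TITLE_ONLY_WORDS)),
        float(any(w in line for w in NEVER_TITLE_WORDS)),
    ]
```

Character bigrams are used here only because plain Chinese text has no word delimiters, so the sketch avoids depending on a word segmenter; the paper may well measure similarity differently.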

Published in: 2005 International Conference on Natural Language Processing and Knowledge Engineering

Date of Conference: 30 October 2005 - 01 November 2005

Date Added to IEEE Xplore: 27 February 2006

Print ISBN: 0-7803-9361-9

DOI: 10.1109/NLPKE.2005.1598799

Conference Location: Wuhan, China