Skip to Main Content
Bibliographical information of scientific papers is of great value since the Science Citation Index is introduced to measure research impact. Most scientific documents available on the web are unstructured or semi-structured, and the automatic reference metadata extraction process becomes an important task. This paper describes a framework for automatic reference metadata extraction from scientific papers. Our system can extract title, author, journal, volume, year, and page from scientific papers in PDF. We utilize a document metadata knowledge base to guide the reference metadata extraction process. The experiment results show that our system achieves a high accuracy.