A system has been developed to extract bibliographic data (grant numbers and databank accession numbers) from online biomedical journal articles for the National Library of Medicine's MEDLINE® database. Rule-based algorithms and a string-matching algorithm are proposed to extract these data from HTML-formatted articles. Experiments conducted on 411 medical articles from 73 journal issues show an accuracy exceeding 96%.
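The abstract does not spell out the extraction rules. The following Python sketch shows roughly what rule-based extraction of this kind can look like. It is not the authors' method. Several parts of it are assumptions made for this example:

- the patterns, namely an NIH-style grant format such as "R01 GM012345" and a GenBank-style accession format of one letter plus five digits or two letters plus six digits;
- the trigger keywords;
- the 60-character context window;
- the function names `html_to_text`, `extract_grants`, and `extract_accessions`.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace so rules can match across tag boundaries."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


# Hypothetical rule: NIH-style grant number = activity code (letter + 2 digits),
# two-letter institute code, six-digit serial, with optional space/hyphen separators.
GRANT_RE = re.compile(r"\b([A-Z]\d{2})[\s-]?([A-Z]{2})[\s-]?(\d{6})\b")

# Hypothetical rule: GenBank-style accession = 1 letter + 5 digits or 2 letters + 6 digits.
ACCESSION_RE = re.compile(r"\b([A-Z]\d{5}|[A-Z]{2}\d{6})\b")

# Hypothetical context rule: accept an accession-like token only shortly after one
# of these keywords, because grant serials (e.g. "GM012345") have the same shape.
KEYWORD_RE = re.compile(r"\b(?:GenBank|EMBL|DDBJ|accessions?)\b", re.IGNORECASE)


def extract_grants(text: str) -> list[str]:
    """Return grant numbers normalized to 'R01 GM012345' form, first occurrence order."""
    found = []
    for m in GRANT_RE.finditer(text):
        grant = f"{m.group(1)} {m.group(2)}{m.group(3)}"
        if grant not in found:
            found.append(grant)
    return found


def extract_accessions(text: str, window: int = 60) -> list[str]:
    """Return accession-like tokens starting within `window` chars after a keyword."""
    anchors = [m.end() for m in KEYWORD_RE.finditer(text)]
    found = []
    for m in ACCESSION_RE.finditer(text):
        near_keyword = any(0 <= m.start() - a <= window for a in anchors)
        if near_keyword and m.group(1) not in found:
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    page = (
        "<p>Supported by NIH grant <b>R01-GM012345</b>.</p>"
        "<p>Deposited in GenBank under accession number <i>AF123456</i>.</p>"
    )
    text = html_to_text(page)
    print(extract_grants(text))      # ['R01 GM012345']
    print(extract_accessions(text))  # ['AF123456']
```

The keyword window is a simple way to disambiguate the two rule sets. Without it, the grant serial "GM012345" would also be reported as an accession number, because it matches the two-letter, six-digit accession shape.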
Published in: 19th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2006)