Manually extracting information from biology databases can be an overwhelming task. GenBank, the US National Institutes of Health database containing all publicly available DNA sequences, has more than 14 billion bases in 13 million genetic-sequence records. Medline, a literature database available through PubMed, has over 11 million journal citations. In a May 2001 search for "cytokine" (cytokines are regulatory proteins in the immune system), PubMed returned 296,556 articles. Given the quantity and complexity of biomedical literature, demand for computational tools that extract specific information is increasing. The author reviews biomedical information extraction methods and presents research done by KAIST's natural language processing group on a system that shows encouraging performance using combinatory categorial grammar as a natural language grammar formalism.