The National Cancer Institute (NCI) Thesaurus (NCIt) is a biomedical ontology that has been under development for over a decade. Nearly every month from 2003 through 2011, the NCI has published an updated version of the NCIt to the Web as an OWL ontology (as well as in other formats). We collected all 88 available OWL versions of the NCIt and conducted a cross-sectional study on this corpus to investigate and characterize the evolution of the NCIt. In particular, we gathered and analysed various axiom and entity statistics, and carried out a reasoner performance test over the corpus. Additionally, we extracted two complete sets of pairwise, consecutive diffs: the first set was generated by a purely syntactic difference analysis (based on OWL's notion of “structural equivalence”); for the second set, we also checked whether the additions or removals changed the set of entailments between versions. We discovered a high level of “merely syntactic” removals and additions, that is, changes that alter the asserted axioms without altering what the ontology entails. We developed a categorization of such changes based on a heuristic inference of the impact of the change. As a result, we obtain not only a rich, purely analytic characterization of the change history of the NCIt, but also a realistic test corpus for incremental classification.
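The two kinds of diff described above can be illustrated with a minimal sketch. It is not the authors' tooling: it assumes a toy ontology that contains only SubClassOf axioms, represented as (sub, super) pairs, and it substitutes a transitive closure for a real OWL reasoner. The names `entailed_closure` and `diff_versions`, the category labels, and the class names in the example are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

# Assumption: an axiom is a (sub, super) pair meaning "sub SubClassOf super".
Axiom = Tuple[str, str]


def entailed_closure(axioms: Iterable[Axiom]) -> Set[Axiom]:
    """Toy stand-in for a reasoner: transitive closure of SubClassOf axioms."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def diff_versions(old: Iterable[Axiom], new: Iterable[Axiom]) -> Dict[str, Set[Axiom]]:
    """Syntactic diff by set difference, then a split by entailment impact."""
    old_set, new_set = set(old), set(new)
    added = new_set - old_set      # asserted in new, not in old
    removed = old_set - new_set    # asserted in old, not in new
    old_closure = entailed_closure(old_set)
    new_closure = entailed_closure(new_set)
    return {
        # An addition is merely syntactic if the old version already entailed it.
        "semantic_additions": {ax for ax in added if ax not in old_closure},
        "syntactic_additions": {ax for ax in added if ax in old_closure},
        # A removal is merely syntactic if the new version still entails it.
        "semantic_removals": {ax for ax in removed if ax not in new_closure},
        "syntactic_removals": {ax for ax in removed if ax in new_closure},
    }


if __name__ == "__main__":
    v1 = {("Melanoma", "SkinCancer"), ("SkinCancer", "Cancer"), ("Melanoma", "Cancer")}
    v2 = {("Melanoma", "SkinCancer"), ("SkinCancer", "Cancer"), ("Cancer", "Disease")}
    for category, axioms in diff_versions(v1, v2).items():
        print(category, sorted(axioms))
```

In this example the removal of the redundant axiom ("Melanoma", "Cancer") is merely syntactic, because the second version still entails it, whereas the addition of ("Cancer", "Disease") changes the set of entailments. A real analysis over OWL files would need structural equivalence on full OWL axioms and an actual reasoner for the entailment checks.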
Date of Conference: 27-30 June 2011