This paper introduces CODA (Computer-aided Ontology Development Architecture), an architecture and associated framework that supports the transformation of unstructured and semi-structured content into RDF (Resource Description Framework) datasets. CODA is intended to support the entire process, from data extraction and transformation to identity resolution, with the final objective of feeding semantic repositories with knowledge extracted from unstructured content. CODA is motivated by the considerable effort and the design issues involved in developing knowledge acquisition systems on top of content analytics frameworks such as UIMA™ (Unstructured Information Management Architecture) and GATE (General Architecture for Text Engineering). To address this, CODA extends UIMA with facilities and a dedicated language for projecting and transforming UIMA-annotated content into RDF. The proposed platform is aimed at a wide range of beneficiaries, from developers of semantic applications to end users, who can easily plug CODA components into compliant desktop tools. We describe and discuss the features of CODA throughout the article, and we conclude by reporting on the adoption of the CODA framework in a usage scenario related to knowledge acquisition in the agricultural domain.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.