Provenance, a record of the derivation history of scientific results, is critical for scientific workflows because it supports reproducibility, result interpretation, and problem diagnosis. Both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures information about past workflow execution and data derivation, provide important context for the comprehensive analysis of scientific results. In this paper, we explore and design: i) a provenance model that represents both prospective and retrospective provenance as an extension to the Open Provenance Model (OPM), which models only retrospective provenance; ii) a provenance collection framework that collects both prospective and retrospective provenance according to our model; and iii) a relational provenance store to store, reason about, and query the prospective and retrospective provenance captured by the proposed collection framework. An experimental study evaluates the performance of our provenance store using provenance queries from the Third Provenance Challenge. While most existing systems use an internal proprietary provenance model and provide an import/export facility to convert between that model and OPM, our provenance collection framework and provenance store support OPM natively.
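To make the distinction between the two kinds of provenance concrete, the following is a minimal sketch of how a relational store might hold both. It is not the schema used in this work: the table names (`task`, `task_link`, `process`, `artifact`, `used`, `was_generated_by`), the helper functions, and the sample identifiers are all assumptions chosen for illustration. Only the `used` and `wasGeneratedBy` edge types come from OPM. Prospective provenance is held as tasks and their links; retrospective provenance is held as processes and artifacts, with each process pointing back to the task it executed. A lineage query is then a recursive walk over the retrospective edges.

```python
import sqlite3


def build_store():
    """Create an in-memory store with a hypothetical two-layer schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Prospective provenance: the abstract workflow specification.
        CREATE TABLE task (task_id TEXT PRIMARY KEY);
        CREATE TABLE task_link (src_task TEXT, dst_task TEXT);

        -- Retrospective provenance: OPM-style nodes and edges.
        -- process.task_id ties an execution back to its specification.
        CREATE TABLE process (process_id TEXT PRIMARY KEY,
                              task_id TEXT REFERENCES task(task_id));
        CREATE TABLE artifact (artifact_id TEXT PRIMARY KEY);
        CREATE TABLE used (process_id TEXT, artifact_id TEXT);
        CREATE TABLE was_generated_by (artifact_id TEXT, process_id TEXT);
    """)
    return conn


def load_example(conn):
    """Insert a made-up two-step run: raw -> aligned -> atlas."""
    conn.executemany("INSERT INTO task VALUES (?)",
                     [("align",), ("average",)])
    conn.execute("INSERT INTO task_link VALUES ('align', 'average')")
    conn.executemany("INSERT INTO process VALUES (?, ?)",
                     [("p1", "align"), ("p2", "average")])
    conn.executemany("INSERT INTO artifact VALUES (?)",
                     [("raw",), ("aligned",), ("atlas",)])
    conn.executemany("INSERT INTO used VALUES (?, ?)",
                     [("p1", "raw"), ("p2", "aligned")])
    conn.executemany("INSERT INTO was_generated_by VALUES (?, ?)",
                     [("aligned", "p1"), ("atlas", "p2")])


def upstream_artifacts(conn, artifact_id):
    """Return every artifact the given artifact was transitively derived from."""
    rows = conn.execute("""
        WITH RECURSIVE lineage(artifact_id) AS (
            -- Direct inputs of the process that generated the artifact.
            SELECT u.artifact_id
            FROM was_generated_by g
            JOIN used u ON u.process_id = g.process_id
            WHERE g.artifact_id = ?
            UNION
            -- Inputs of the processes that generated those inputs.
            SELECT u.artifact_id
            FROM lineage l
            JOIN was_generated_by g ON g.artifact_id = l.artifact_id
            JOIN used u ON u.process_id = g.process_id
        )
        SELECT artifact_id FROM lineage ORDER BY artifact_id
    """, (artifact_id,)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store()
    load_example(store)
    print(upstream_artifacts(store, "atlas"))
```

In this sketch, joining `process` to `task` would answer questions that need both layers, such as which workflow step produced a given artifact. Keeping the OPM edges as plain tables is one possible reading of native OPM support, since no conversion step sits between the stored form and the model.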