Large experiments on distributed infrastructures are becoming increasingly complex to manage; in particular, it is difficult to trace all the computations that gave rise to a piece of data or to an event such as an error. This paper describes the design and implementation of an architecture to support experiment provenance, and its deployment on a particular e-infrastructure for biosciences. The proposed solution consists of: (a) a data provenance repository that captures scientific experiments and their execution paths, (b) a software tool (crawler) that gathers, classifies, links, and stores the information collected from various sources, and (c) a set of user interfaces through which the end user can access the provenance data, interpret the results, and trace the sources of failure. The approach is based on PLIER, an API compliant with the Open Provenance Model (OPM) that is flexible enough to support future extensions and facilitates interoperability among heterogeneous application systems.
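To make the idea of tracing a failure back to its origins concrete, the following is a minimal sketch of an OPM-style provenance graph. It is not the PLIER API: the class `ProvenanceGraph`, its methods, and all node names (`input.dat`, `align`, `alice`, and so on) are hypothetical and chosen only for illustration. The sketch assumes the standard OPM node kinds (artifact, process, agent) and causal dependencies, each pointing from an effect to its cause.

```python
from collections import deque

# The five causal dependencies defined by the Open Provenance Model.
# Each edge points from an effect to its cause.
OPM_EDGES = {
    "used",             # process  -> artifact
    "wasGeneratedBy",   # artifact -> process
    "wasControlledBy",  # process  -> agent
    "wasTriggeredBy",   # process  -> process
    "wasDerivedFrom",   # artifact -> artifact
}


class ProvenanceGraph:
    """Hypothetical in-memory stand-in for a provenance repository."""

    def __init__(self):
        self.kinds = {}   # node id -> "artifact" | "process" | "agent"
        self.causes = {}  # effect id -> list of (edge type, cause id)

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def add_edge(self, effect, edge_type, cause):
        if edge_type not in OPM_EDGES:
            raise ValueError(f"not an OPM dependency: {edge_type}")
        self.causes.setdefault(effect, []).append((edge_type, cause))

    def trace(self, start):
        """Return every node that `start` causally depends on, breadth-first."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for _edge_type, cause in self.causes.get(node, []):
                if cause not in seen:
                    seen.add(cause)
                    order.append(cause)
                    queue.append(cause)
        return order


# Hypothetical two-step experiment whose second step produced an error log.
g = ProvenanceGraph()
for node_id, kind in [
    ("input.dat", "artifact"), ("align", "process"),
    ("aligned.out", "artifact"), ("stats", "process"),
    ("error.log", "artifact"), ("alice", "agent"),
]:
    g.add_node(node_id, kind)

g.add_edge("align", "used", "input.dat")
g.add_edge("align", "wasControlledBy", "alice")
g.add_edge("aligned.out", "wasGeneratedBy", "align")
g.add_edge("stats", "used", "aligned.out")
g.add_edge("error.log", "wasGeneratedBy", "stats")

lineage = g.trace("error.log")
print(lineage)
```

Starting from the error artifact, the traversal walks the dependencies backwards through the processes, intermediate data, input, and responsible agent. A crawler of the kind described above would populate such a graph from logs and other sources rather than from hand-written calls.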