Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance

5 Author(s)
John Abraham ; Dept. of Comput. Sci., Univ. of Texas Pan-American, Edinburg, TX, USA ; Pearl Brazier ; Artem Chebotko ; Jaime Navarro

In scientific workflow environments, the reproducibility of scientific discoveries, the interpretation of results, and the diagnosis of problems depend primarily on provenance, which records the history of an in-silico experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this paper, we investigate how the Bigtable-like capabilities of HBase can be leveraged for distributed storage and querying of provenance data represented in RDF. In particular, we architect the ProvBase system, which incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. Using the Third Provenance Challenge queries, we conduct an experimental study to show the feasibility of our approach.
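
To make the idea of holding RDF triples in a Bigtable-like store more concrete, the following is a minimal in-memory sketch. It is not the ProvBase schema, which the abstract does not describe. It assumes a hypothetical layout with two tables, one keyed by subject and one keyed by object, and shows how a single triple pattern (the building block of a SPARQL query) could be answered by a row lookup. The class name `TripleTable`, the `ex:` resource names, and the sample triples are illustrative assumptions.

```python
from collections import defaultdict


class TripleTable:
    """In-memory stand-in for two Bigtable-like tables (hypothetical layout)."""

    def __init__(self):
        # Row key = subject, column = predicate, cell = set of objects.
        self.by_subject = defaultdict(lambda: defaultdict(set))
        # Row key = object, column = predicate, cell = set of subjects.
        self.by_object = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        """Store one triple in both tables so either end can serve as a row key."""
        self.by_subject[s][p].add(o)
        self.by_object[o][p].add(s)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching a pattern; None acts as a variable."""
        if s is not None:
            # Bound subject: a single row lookup in the subject-keyed table.
            row = self.by_subject.get(s, {})
            found = [(s, pred, obj) for pred, objs in row.items() for obj in objs]
        elif o is not None:
            # Bound object: a single row lookup in the object-keyed table.
            row = self.by_object.get(o, {})
            found = [(subj, pred, o) for pred, subjs in row.items() for subj in subjs]
        else:
            # Neither end is bound: fall back to a full scan.
            found = [
                (subj, pred, obj)
                for subj, cols in self.by_subject.items()
                for pred, objs in cols.items()
                for obj in objs
            ]
        return sorted(
            t for t in found
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )


# Illustrative provenance triples using OPM-style edge names (assumed data).
store = TripleTable()
store.add("ex:proc1", "opm:used", "ex:art1")
store.add("ex:art2", "opm:wasGeneratedBy", "ex:proc1")

used_by_proc1 = store.match(s="ex:proc1", p="opm:used")
generated_by_proc1 = store.match(p="opm:wasGeneratedBy", o="ex:proc1")
```

Storing each triple twice trades space for lookup speed: a pattern with a bound subject or a bound object becomes a row read rather than a scan, which is the kind of access a row-keyed store handles well. A real deployment would replace the dictionaries with HBase tables and would need to join the results of several patterns to evaluate a full SPARQL query.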

Published in:

Services Computing (SCC), 2010 IEEE International Conference on

Date of Conference:

5-10 July 2010