Provenance plays a fundamental role in e-science: it is used to track data processing execution, evaluate data quality, reproduce analysis results, and, in particular, share and reuse workflows. Taking full advantage of provenance to help scientists discover, match, and select scientific workflows remains a challenging problem. Although some studies have addressed modeling, storing, and querying scientific workflows, little has been done to build practical systems that support workflow matching and discovery. In this paper, we design and implement a Provenance-Based Workflow Matching and Discovery System (PWMDS) for task-based pipelines in CoPExplorer, a proteomics data analysis platform, to address this challenge. Using the proposed novel provenance model and workflow matching and discovery algorithms, PWMDS provides scientists with a ranked list of suitable service candidates for their specified workflows. Initial experiments demonstrate its effectiveness.