MapReduce has recently attracted considerable attention as a parallel programming model for scalable, data-intensive business and scientific analysis. To bring the benefits of this programming model to scientific workflow environments, we propose a MapReduce-enabled scientific workflow composition framework consisting of: i) a dataflow-based scientific workflow model that separates the declaration of a workflow's interface from the definition of its functional body; ii) a set of dataflow constructs, including Map, Reduce, Loop, and Conditional, together with their composition semantics, which enable MapReduce-style scientific workflows; and iii) an XML-based scientific workflow specification language, called WSL, in which both Map and Reduce are fully composable with the other dataflow constructs in both flat and hierarchical fashion. Besides lifting MapReduce to the workflow level, our composition framework is unique in that workflows are the only operands for composition. In this way, our approach elegantly resolves the two-world problem of existing composition frameworks, in which composition must deal with both the world of tasks and the world of workflows. We have implemented the proposed framework and conducted a case study to validate our techniques.
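To make the idea that workflows are the only operands of composition more concrete, here is a minimal, sequential sketch in Python. It is an illustration under stated assumptions, not the authors' implementation and not WSL syntax (WSL is XML-based). The names `Workflow`, `Map`, `Reduce`, `Sequence`, `Conditional`, and `Loop` are hypothetical. A workflow's interface (name and number of inputs) is declared separately from its functional body, and every construct takes workflows and returns a new workflow, so constructs nest hierarchically. No parallel execution is modeled.

```python
from dataclasses import dataclass
from functools import reduce as fold
from typing import Any, Callable


# Hypothetical workflow model: the interface (name, n_inputs) is declared
# separately from the functional body. This is NOT the paper's WSL.
@dataclass(frozen=True)
class Workflow:
    name: str
    n_inputs: int
    body: Callable[..., Any]

    def run(self, *args: Any) -> Any:
        # Check calls against the declared interface before running the body.
        if len(args) != self.n_inputs:
            raise TypeError(f"{self.name} expects {self.n_inputs} inputs, got {len(args)}")
        return self.body(*args)


# Each construct below consumes workflows and produces a workflow.
def Map(wf: Workflow) -> Workflow:
    """Apply a one-input workflow to every element of a list (sequentially here)."""
    if wf.n_inputs != 1:
        raise ValueError("Map needs a one-input workflow")
    return Workflow(f"Map({wf.name})", 1, lambda xs: [wf.run(x) for x in xs])


def Reduce(wf: Workflow) -> Workflow:
    """Fold a non-empty list with a two-input workflow."""
    if wf.n_inputs != 2:
        raise ValueError("Reduce needs a two-input workflow")
    return Workflow(f"Reduce({wf.name})", 1, lambda xs: fold(wf.run, xs))


def Sequence(first: Workflow, second: Workflow) -> Workflow:
    """Feed the output of `first` into the one-input workflow `second`."""
    if second.n_inputs != 1:
        raise ValueError("Sequence needs a one-input second workflow")
    return Workflow(f"{first.name};{second.name}", first.n_inputs,
                    lambda *args: second.run(first.run(*args)))


def Conditional(test: Workflow, then_wf: Workflow, else_wf: Workflow) -> Workflow:
    """Run `then_wf` or `else_wf` depending on a one-input test workflow."""
    return Workflow(f"If({test.name})", 1,
                    lambda x: then_wf.run(x) if test.run(x) else else_wf.run(x))


def Loop(test: Workflow, body_wf: Workflow) -> Workflow:
    """Apply `body_wf` repeatedly while the one-input test workflow returns True."""
    def run_loop(x: Any) -> Any:
        while test.run(x):
            x = body_wf.run(x)
        return x
    return Workflow(f"Loop({test.name})", 1, run_loop)


# Usage: primitive steps are themselves workflows, so no separate "task" world exists.
square = Workflow("square", 1, lambda x: x * x)
add = Workflow("add", 2, lambda a, b: a + b)
sum_of_squares = Sequence(Map(square), Reduce(add))   # flat MapReduce-style composition
nested = Map(sum_of_squares)                          # hierarchical: a composite used as an operand

is_even = Workflow("is_even", 1, lambda x: x % 2 == 0)
halve = Workflow("halve", 1, lambda x: x // 2)
triple_plus_one = Workflow("3x+1", 1, lambda x: 3 * x + 1)
not_one = Workflow("not_one", 1, lambda x: x != 1)
collatz_step = Conditional(is_even, halve, triple_plus_one)
collatz = Loop(not_one, collatz_step)

print(sum_of_squares.run([1, 2, 3]))   # → 14
print(nested.run([[1, 2], [3]]))       # → [5, 9]
print(collatz.run(6))                  # → 1
```

Because `Map(sum_of_squares)` treats a composite workflow exactly like a primitive one, the sketch reflects the "single world" of operands that the abstract describes. A real framework would additionally distribute Map and Reduce across workers and describe ports and data types in its specification language.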
Date of Conference: 6-10 July 2009