Skip to Main Content
Most scientists conduct analyses and run models in several different software and hardware environments, mentally coordinating the export and import of data from one environment to another. The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs). SWFs are a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results. Kepler attempts to streamline the workflow creation and execution process so that scientists can design, execute, monitor, re-run, and communicate analytical procedures repeatedly with minimal effort. Kepler is unique in that it seamlessly combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. SWFs are superficially similar to business process workflows but have several challenges not present in the business workflow scenario. For example, they often operate on large, complex and heterogeneous data, can be computationally intensive and produce complex derived data products that may be archived for use in reparameterized runs or other workflows. Moreover, unlike business workflows, SWFs are often dataflow-oriented as witnessed by a number of recent academic systems (e.g., DiscoveryNet, Taverna and Triana) and commercial systems (Scitegic/Pipeline-Pilot, Inforsense). In a sense, SWFs are often closer to signal-processing and data streaming applications than they are to control-oriented business workflow applications.