Systems biology is characterized by a large community of scientists who use a wide variety of fragmented and competing data sets and computational tools of all scales to support their research. To provide a more coherent computational environment for systems biology, we are working as part of the Department of Energy Systems Biology Knowledgebase (Kbase) project to define a federated cloud-based system architecture. The Kbase will eventually host massive amounts of biological data, provide high-performance and scalable computational resources, and support a large user community with tools and services for using those resources. In this paper, we describe the results of our investigations into the design of a workflow infrastructure suitable for use in the Kbase. The approach uses standards-based workflow descriptions and open-source integration technologies, and incorporates a data-aware workflow execution layer that exploits data locality in the federated architecture. We describe a use case and the initial prototype implementation we have built, which demonstrates the feasibility of our approach.