Systems Biology research, even more than many other scientific domains, is becoming increasingly data-intensive. Not only have advances in experimental and computational technologies led to an exponential increase in the volume and complexity of scientific data, but the resulting databases increasingly provide the basis for new scientific discoveries. To engage effectively with these community resources, integrated analysis, synthesis and simulation software is needed, supported by scientific workflows. To provide a more collaborative, community-driven research environment for this heterogeneous setting, the Department of Energy (DOE) has decided to develop a federated, cloud-based cyberinfrastructure, the Systems Biology Knowledgebase (Kbase). In this context, the Pacific Northwest National Laboratory (PNNL) has been defining and testing the basic federated cloud-based system architecture and has developed a prototype implementation. Community-wide accessibility of biological data and the capability to integrate and analyze these data within their changing research context were seen as key technical functionalities that the Kbase needs to enable. In this paper we describe the results of our investigations into the design of this cloud-based federated infrastructure for: 1) semantics-driven data discovery, access and integration; 2) data annotation, publication and sharing; 3) workflow-enabled data analysis; and 4) project-based collaborative working. We describe our approach, exemplary use cases, and our prototype implementation, which demonstrates the feasibility of this approach.