Modern scientific computations are usually data intensive, involving large-scale, heterogeneous, and structured scientific datasets. Modeling, organizing, and processing scientific data have become key challenges for scientific workflow management systems (SWFMSs). In contrast to business data, which is usually relational and stored in databases, scientific data is often hierarchically organized and collection-oriented. Although several data models have been proposed for SWFMSs, none of them provides a formal data model with a set of well-defined operators. In this paper, we take a first step towards formalizing a collection-oriented data model, called the collectional data model, for hierarchical, collection-oriented scientific data, together with a set of well-defined operators to manipulate and query such data. We then apply the collectional data model to VIEW, a dataflow-based scientific workflow composition framework, and extend its workflow constructs to support collections. We implement our techniques and validate them through a case study from a biological simulation project.
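The abstract does not define the collectional data model's operators, so the following is only a hypothetical Python sketch of what "hierarchically organized, collection-oriented" data can look like, with a few structure-aware operators. The class `Collection` and the methods `depth`, `flatten`, `map_leaves`, and `select` are illustrative names we chose; they are not the paper's formal operators or part of VIEW's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Union

# Hypothetical atomic values stored at the leaves of a collection.
Item = Union[int, float, str]


@dataclass
class Collection:
    """Hypothetical labeled, ordered collection whose members are atoms or nested collections."""
    label: str
    members: List[Union[Item, Collection]] = field(default_factory=list)

    def depth(self) -> int:
        # An empty or flat collection has depth 1; each nesting level adds 1.
        child_depths = [m.depth() for m in self.members if isinstance(m, Collection)]
        return 1 + max(child_depths, default=0)

    def flatten(self) -> List[Item]:
        # Collect all atomic leaves in document order, discarding the hierarchy.
        out: List[Item] = []
        for m in self.members:
            if isinstance(m, Collection):
                out.extend(m.flatten())
            else:
                out.append(m)
        return out

    def map_leaves(self, f: Callable[[Item], Item]) -> Collection:
        # Apply f to every atomic leaf while preserving the nesting structure and labels.
        return Collection(
            self.label,
            [m.map_leaves(f) if isinstance(m, Collection) else f(m) for m in self.members],
        )

    def select(self, pred: Callable[[Item], bool]) -> Collection:
        # Keep only leaves satisfying pred; nested collections are kept even if they become empty.
        kept: List[Union[Item, Collection]] = []
        for m in self.members:
            if isinstance(m, Collection):
                kept.append(m.select(pred))
            elif pred(m):
                kept.append(m)
        return Collection(self.label, kept)


if __name__ == "__main__":
    # Hypothetical simulation output: a collection of runs, each a collection of measurements.
    runs = Collection("runs", [Collection("run1", [1.0, 2.0]), Collection("run2", [3.0])])
    print(runs.depth())                                   # 2
    print(runs.flatten())                                 # [1.0, 2.0, 3.0]
    print(runs.map_leaves(lambda x: x * 2).flatten())     # [2.0, 4.0, 6.0]
    print(runs.select(lambda x: x > 1.5).flatten())       # [2.0, 3.0]
```

The design choice illustrated here is that operators such as `map_leaves` and `select` return a new `Collection` with the same shape, so their results can be fed to further collection-aware steps, whereas `flatten` deliberately drops the hierarchy when a downstream step expects a flat list.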