Skip to Main Content
Biomolecular simulations produce more output data than can be managed effectively by traditional computing systems. Researchers need distributed systems that allow the pooling of resources, the sharing of simulation data, and the reliable publication of both tentative and final results. To address this need, we have designed GEMS, a system that enables biomolecular researchers to store, search, and share large scale simulation data. The primary design problem is striking a balance between generosity and gluttony. On one hand, storage providers wish to be generous and share resources with their collaborators. On the other hand, an unchecked data producer can be gluttonous and easily replicate data unnecessarily until it fills all available space. To balance generosity and gluttony, GEMS allows both storage providers and data producers to state and enforce policies on the consumption of storage and the replication of data. By taking advantage of known properties of simulation data, the system is able to distinguish between high value final results that must be preserved and low value intermediate results that can be deleted and regenerated if necessary. We have built a prototype of GEMS on a cluster of workstations and demonstrate its ability to store new data, to replicate within policy limits, and to recover from failures.