The study of macromolecular protein structures at atomic resolution gives rise to many data- and compute-intensive challenges, from simulation to image processing to model building. We have developed a general platform for the secure deployment of structural biology computational tasks and workflows into a federated grid. The platform maximizes robustness, ease of use, and performance while minimizing data movement. It leverages several existing grid technologies for security and web-based data access, and adds protocols for data staging at the virtual organization (VO), user, task, workflow, and individual job levels. We present the strategies used to deploy and maintain tens of GB of data and applications across a significant portion of the US Open Science Grid, along with the workflow management mechanisms that optimize task execution for both performance and correctness. From automated analysis of hundreds of thousands of jobs over extended periods, we report significant observations about real operating conditions in a grid environment. We focus on one novel application that harnesses the capacity of national cyberinfrastructure to dramatically accelerate protein structure determination. This workflow comprises 1e5 tasks that require 20,000 to 50,000 hours of computation in total, consume tens of GB of input data, and produce a commensurate volume of output. We demonstrate the effectiveness of our platform by completing this workflow in half a day on the Open Science Grid.
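To give a sense of the scale these figures imply, the following back-of-envelope sketch estimates the average number of concurrently running jobs and the mean task length. It is not taken from the platform itself. It assumes that the quoted hours are single-core compute hours, that "half a day" means 12 wall-clock hours, and that parallel efficiency is perfect (no queueing, staging, or retry overhead); the function and constant names are illustrative only.

```python
# Back-of-envelope estimate from the figures quoted in the abstract.
# Assumptions (not stated in the source): hours are single-core compute
# hours, "half a day" is 12 wall-clock hours, and overheads are ignored.

WALL_HOURS = 12.0   # assumed wall-clock duration of "half a day"
N_TASKS = 100_000   # 1e5 tasks, as quoted


def required_concurrency(total_hours: float, wall_hours: float) -> float:
    """Average number of jobs that must run at once to finish in wall_hours."""
    return total_hours / wall_hours


def mean_task_minutes(total_hours: float, n_tasks: int) -> float:
    """Average compute time per task, in minutes."""
    return total_hours * 60.0 / n_tasks


for total in (20_000, 50_000):  # lower and upper bounds quoted in the abstract
    slots = required_concurrency(total, WALL_HOURS)
    minutes = mean_task_minutes(total, N_TASKS)
    print(f"{total} h total: ~{slots:.0f} concurrent jobs, ~{minutes:.0f} min per task")
```

Under these assumptions the workflow needs roughly 1,700 to 4,200 jobs running concurrently on average, with tasks averaging 12 to 30 minutes each. Real grid conditions would push the required peak concurrency higher.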