The size of datasets generated and analyzed by scientific applications is increasing rapidly, and the computing resource configurations required by data-intensive science are changing just as quickly. Scientists therefore need dynamic computing environments, which cloud computing can support. We propose a lightweight cloud-based job management system for executing data analysis in distributed systems. In contrast to traditional job managers, which are built on a scheduler operating in push mode, the proposed system uses a lightweight job coordinator designed to process data requests in pull mode. Agents running in distributed virtual machine instances request the URLs of the data to be analyzed from the coordinator and retrieve the data from the repositories. Using a cloud service to provision computing resources offers a scalable, effective, and potentially massive computing environment. Hence the designed job management system is suitable for data-intensive science that requires high-throughput computing resources and reasonable I/O performance for big scientific data. To identify requirements and test the system's functions, we applied it to an astronomy application.
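The pull-mode interaction between agents and the coordinator might be sketched as follows. This is a minimal in-process illustration, not the system's implementation: the `Coordinator` class, the `request_url` method, the `run_agent` loop, and the example repository URLs are all hypothetical names introduced here, and threads stand in for agents that would run on separate virtual machine instances.

```python
import queue
import threading


class Coordinator:
    """Hypothetical pull-mode coordinator: holds pending data URLs
    and hands out one per agent request (no scheduler pushes work)."""

    def __init__(self, data_urls):
        self._pending = queue.Queue()
        for url in data_urls:
            self._pending.put(url)

    def request_url(self):
        """Return the next unassigned URL, or None when no work is left."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


def run_agent(name, coordinator, analyze, results, lock):
    """Agent loop: pull a URL, analyze it, record the result,
    and repeat until the coordinator has nothing left to hand out."""
    while True:
        url = coordinator.request_url()
        if url is None:
            break
        outcome = analyze(url)
        with lock:
            results[url] = (name, outcome)


def run_demo(num_agents=3, num_files=10):
    # Assumed placeholder URLs; a real deployment would point at data repositories.
    urls = [f"http://repo.example/data/{i}.fits" for i in range(num_files)]
    coordinator = Coordinator(urls)
    results = {}
    lock = threading.Lock()

    def analyze(url):
        # Stand-in for downloading and analyzing the data behind the URL.
        return len(url)

    agents = [
        threading.Thread(
            target=run_agent,
            args=(f"agent-{i}", coordinator, analyze, results, lock),
        )
        for i in range(num_agents)
    ]
    for agent in agents:
        agent.start()
    for agent in agents:
        agent.join()
    return results
```

Because each agent asks for work only when it is idle, adding or removing agents in this sketch requires no change to the coordinator, which is the property that makes the pull model a natural fit for elastically provisioned cloud instances.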
Published in:
Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on
Date of Conference: 5-8 Dec. 2011