High-performance computing on a large-scale computational grid is complicated by the heterogeneous computational capabilities of its nodes, node unavailability, and unreliable network connectivity. Replicating computation on multiple nodes can significantly improve performance by reducing task completion time in a grid's dynamic environment. We develop an analytical model to determine the number of task replicas needed to meet performance goals in different computational grid configurations. Furthermore, taking advantage of the statistical nature of grid-based Monte Carlo applications, which potentially form a large category of grid-computing applications, we extend the computational replication technique to an N-out-of-M scheduling strategy for these applications. We also establish a corresponding model for the N-out-of-M scheduling mechanism. Simulations are used to validate the computational replication models. Our preliminary results show that the models are effective in predicting the number of replicas required to achieve short task completion time with a given high probability.
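As an illustration only, the Python sketch below uses a deliberately simplified model that is not necessarily the one developed in this work. It assumes that every replica (or Monte Carlo subtask) independently finishes before the deadline with the same probability p. Under that assumption, r replicas meet the deadline with probability 1 - (1 - p)^r, and an N-out-of-M schedule succeeds when at least N of the M independent subtasks finish, which is a binomial tail. The function names `replicas_needed`, `prob_at_least`, and `tasks_needed` are hypothetical.

```python
import math


def replicas_needed(p: float, target: float) -> int:
    """Smallest r such that 1 - (1 - p)**r >= target.

    Assumption (hypothetical model): replicas finish by the deadline
    independently, each with the same probability p.
    """
    if not (0 < p <= 1 and 0 < target < 1):
        raise ValueError("require 0 < p <= 1 and 0 < target < 1")
    r = 1
    # Add replicas until at least one is likely enough to finish in time.
    while 1 - (1 - p) ** r < target:
        r += 1
    return r


def prob_at_least(n: int, m: int, p: float) -> float:
    """Probability that at least n of m independent subtasks finish in time."""
    return sum(math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(n, m + 1))


def tasks_needed(n: int, p: float, target: float) -> int:
    """Smallest m >= n such that an N-out-of-M schedule meets the target.

    Assumption (hypothetical model): same independent, identical p as above.
    """
    if n < 1 or not (0 < p <= 1 and 0 < target < 1):
        raise ValueError("require n >= 1, 0 < p <= 1 and 0 < target < 1")
    m = n
    # Schedule extra subtasks until n of them are likely enough to finish.
    while prob_at_least(n, m, p) < target:
        m += 1
    return m
```

For example, with p = 0.5 and a 99% target, `replicas_needed(0.5, 0.99)` returns 7, since 1 - 0.5^7 ≈ 0.992 while 1 - 0.5^6 ≈ 0.984. Plain replication is the special case N = 1, so `tasks_needed(1, p, target)` agrees with `replicas_needed(p, target)`. Real grid nodes are heterogeneous and failures may be correlated, so a faithful model would need per-node probabilities rather than a single p.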