This work extends our earlier work on the concept of computation-at-risk (CaR). CaR refers to the risk that certain computations may not be completed in a timely manner. We examine several CaR distributions on a number of large clusters. The main contribution of this work is to show that CaR-reducing strategies exist and that, by employing them, a facility can significantly reduce the risk of inefficient resource utilization. We show that grids are one means of employing such a strategy. For example, a CaR-reducing strategy applied to a common queue can have a dramatic effect on job wait times across a grid of clusters. In particular, we define a CaR Sharpe rule, a decision rule for determining the best machine in a grid on which to place a new job.