Replication is a key technique for improving fault tolerance but can introduce considerable performance overhead under some circumstances. To explore the tradeoff between performance and failure resilience, we develop a calculus that takes into consideration the I/O characteristics of applications and failure behavior of distributed storage nodes. With the developed evaluation model, we then prescribe a file system replication strategy that maximizes the utilization of computational resources for long-running and compute-intensive grid applications.
Date of Conference: 19-22 May 2008