To improve cost-effectiveness, current large-scale storage systems are typically built from thousands of individual components. As systems scale up, the probability that multiple components fail increases, so in large-scale storage systems failures are the norm rather than the exception. Building file systems that provide both high throughput and highly available service under such circumstances is a major challenge. We have designed and implemented HA-DCFS3, a highly available cluster file system prototype. It uses a scalable replication algorithm called the asynchronous primary copy protocol (APCP). Unlike the traditional primary copy protocol, which must synchronize updates to all replicas, APCP takes an asynchronous approach in which a write operation may be synchronized to only a subset of the replicas. This flexible approach greatly improves write performance. Furthermore, HA-DCFS3 introduces a fine-grained failure detection mechanism called "data path detection", which is integrated into the fault-tolerant framework based on data replication. Hence, HA-DCFS3 can provide continuous service even when component failures occur. Finally, HA-DCFS3 adopts a two-level data recovery strategy that handles transient failures with reintegration and persistent failures with re-replication, reducing the cost of data repair. Our performance results show that HA-DCFS3 can deliver high and scalable aggregate performance and provide highly available service at very low cost.
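The abstract does not give the details of APCP or of the recovery strategy, so the following is only a toy sketch of the general idea under stated assumptions: a primary keeps an ordered update log, each write is applied synchronously to a subset of live replicas, the remaining replicas catch up later, a transiently failed replica is reintegrated by replaying only the updates it missed, and a persistently failed replica is replaced by a fresh copy. All names here (`Replica`, `Primary`, `sync_count`, `propagate`, `reintegrate`, `re_replicate`) are hypothetical and are not taken from HA-DCFS3.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Replica:
    """Hypothetical in-memory replica; not the HA-DCFS3 data structure."""
    name: str
    version: int = 0                      # number of log entries applied
    data: dict = field(default_factory=dict)
    alive: bool = True


class Primary:
    """Toy primary that synchronizes each write to only `sync_count` replicas."""

    def __init__(self, replicas, sync_count):
        self.replicas = replicas
        self.sync_count = sync_count      # assumed size of the synchronous subset
        self.log = []                     # ordered (key, value) updates

    def _catch_up(self, replica):
        # Replay, in order, every log entry the replica has not applied yet.
        for key, value in self.log[replica.version:]:
            replica.data[key] = value
            replica.version += 1

    def write(self, key, value):
        self.log.append((key, value))
        live = [r for r in self.replicas if r.alive]
        # Synchronous part: only a subset of live replicas is updated now.
        for replica in live[: self.sync_count]:
            self._catch_up(replica)
        return len(self.log)

    def propagate(self):
        # Asynchronous part: bring the remaining live replicas up to date.
        for replica in self.replicas:
            if replica.alive:
                self._catch_up(replica)

    def reintegrate(self, replica):
        # Transient failure: replay only the missed updates.
        replica.alive = True
        missed = len(self.log) - replica.version
        self._catch_up(replica)
        return missed

    def re_replicate(self, failed, new_name):
        # Persistent failure: build a fresh replica from the full log.
        fresh = Replica(new_name)
        self._catch_up(fresh)
        self.replicas[self.replicas.index(failed)] = fresh
        return fresh


replicas = [Replica("r0"), Replica("r1"), Replica("r2")]
primary = Primary(replicas, sync_count=2)
primary.write("a", 1)                     # applied to r0 and r1 only
lagging = replicas[2].version             # r2 has not seen the write yet
primary.propagate()                       # r2 catches up in the background
```

In this sketch the cost difference between the two recovery levels is visible in how much of the log is replayed: `reintegrate` touches only the missed suffix, while `re_replicate` copies everything.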