Experimental evaluation is an important way to assess distributed systems, and fault injection is the dominant technique in this area for the evaluation of a system's dependability. For distributed systems, network failure is an important fault model. Physical network failures often have far-reaching effects, giving rise to multiple correlated failures as seen by higher-level protocols. This paper presents an experimental evaluation, using the Loki fault injector, which provides insight into the impact that correlated network partitions have on the Coda distributed file system. In this evaluation, Loki created a network partition between two Coda file servers, during which updates were made at each server to the same replicated data volume. Upon repair of the partition, a client requested directory resolution to converge the diverging replicas. At various stages of the resolution, Loki invoked a second correlated network partition, thus allowing us to evaluate its impact on the system's correctness, performance, and availability.