This paper considers the reliability of software Distributed Shared Memory (DSM) systems in which the unit of sharing is a persistent read-write object. We present an extended coherence protocol for the causal consistency model that integrates replication management with independent checkpointing. It uses a novel coordinated burst checkpoint operation to replicate consistent checkpoints of shared objects in the local memory of distinct system nodes. No special reliable hardware devices are required. The protocol offers high availability of shared objects with limited overhead and ensures fast recovery in case of multiple node failures. In case of network partitioning, all processes in a majority partition of the system can continuously access all the objects.
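The idea of keeping checkpoint copies in the memory of distinct nodes, together with the majority-partition rule, can be illustrated with a minimal Python sketch. This is not the authors' protocol. The `Node` model, the helpers `burst_checkpoint`, `recover` and `in_majority_partition`, and the replica count `k` are all hypothetical. Causal ordering of writes and checkpoint consistency are omitted entirely. The sketch only shows why copies on distinct nodes let an object survive multiple node failures, and what a strict-majority test looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical system node; `memory` holds checkpoint copies keyed by object id."""
    node_id: int
    alive: bool = True
    memory: dict = field(default_factory=dict)


def burst_checkpoint(objects, owner, nodes, k):
    """Copy every object's state into the local memory of k distinct live nodes.

    Assumption: replicas never go to the owner itself, and the first k live
    nodes are chosen. Both the placement rule and k are illustrative only.
    """
    targets = [n for n in nodes if n.node_id != owner.node_id and n.alive][:k]
    if len(targets) < k:
        raise RuntimeError("not enough live nodes to hold k distinct replicas")
    for node in targets:
        for obj_id, state in objects.items():
            node.memory[obj_id] = state  # states assumed to be immutable values
    return [n.node_id for n in targets]


def recover(obj_id, nodes):
    """Return the checkpointed state from any live node holding a copy, else None."""
    for node in nodes:
        if node.alive and obj_id in node.memory:
            return node.memory[obj_id]
    return None


def in_majority_partition(partition, total_nodes):
    """A partition qualifies only if it contains a strict majority of all nodes."""
    return len(partition) > total_nodes // 2
```

With five nodes and `k = 2`, a checkpoint taken by node 0 lands on nodes 1 and 2. The objects remain recoverable even after nodes 0 and 1 both fail. A partition `{0, 1, 2}` passes the majority test, while `{3, 4}` does not. Raising `k` tolerates more simultaneous failures at the cost of more memory and checkpoint traffic.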