Skip to Main Content
High availability systems typically rely on redundant components and functionality to achieve fault detection, isolation and fail over. In the future, increases in error rates will make high availability important even in the commodity and volume market. Systems will be built out of chip multiprocessors (CMPs) with multiple identical components that can be configured to provide redundancy for high availability. However, the 100% overhead of making all components redundant is going to be unacceptable for the commodity market, especially when all applications might not require high availability. In particular, duplicating the entire memory like the current high availability systems (e.g. NonStop and Stratus) do is particularly problematic given the fact that system costs are going to be dominated by the cost of memory. In this paper, we propose a novel technique called a duplication cache to reduce the overhead of memory duplication in CMP-based high availability systems. A duplication cache is a reserved area of main memory that holds copies of pages belonging to the current write working set (set of actively modified pages) of running processes. All other pages are marked as read-only and are kept only as a single, shared copy. The size of the duplication cache can be configured dynamically at runtime and allows system designers to trade off the cost of memory duplication with minor performance overhead. We extensively analyze the effectiveness of our duplication cache technique and show that for a range of benchmarks memory duplication can be reduced by 60-90% with performance degradation ranging from 1-12%. On average, a duplication cache can reduce memory duplication by 60% for a performance overhead of 4% and by 90% for a performance overhead of 5%.