This paper presents a new concurrent checkpoint mechanism that allows the checkpointed process to keep running while a checkpoint is being set. While memory pages are being dumped (the most time-consuming step in setting a checkpoint), the checkpointed process continues to run until one of its memory accesses is captured by tracing TLB misses. At that point, the checkpointer in the kernel copies the target page of the access to a designated memory buffer, so that a consistent state of the checkpointed process can be constructed, and then resumes the memory access. Experimental results show that, compared with non-concurrent checkpoint techniques, this mechanism reduces the downtime of the checkpointed process by 47.4% to 89.8%, allowing checkpoint setting and process execution to proceed concurrently. In addition, compared with a traditional concurrent checkpoint system, this mechanism saves more than 2.2% of the checkpoint time and decreases the downtime of the checkpointed process by more than 10%.
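The copy-on-access idea described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the paper's kernel implementation: the TLB-miss trap is replaced by an explicit `on_access` call, the designated memory buffer is a plain dictionary, and all names (`ConcurrentCheckpoint`, `dump_step`, `on_access`, `write`, `run_demo`) are hypothetical.

```python
class ConcurrentCheckpoint:
    """Toy model: pages are dumped in the background, and any page the
    process touches first is copied to the buffer before the access."""

    def __init__(self, memory):
        self.memory = memory      # live pages of the running process
        self.snapshot = {}        # stand-in for the designated buffer
        self.next_page = 0        # next page the background dump visits

    def _save(self, idx):
        # Copy a page once; later saves of the same page are no-ops,
        # so the snapshot keeps the content from checkpoint time.
        if idx not in self.snapshot:
            self.snapshot[idx] = bytes(self.memory[idx])

    def dump_step(self):
        """Dump one not-yet-saved page. Return False when none remain."""
        while self.next_page < len(self.memory) and self.next_page in self.snapshot:
            self.next_page += 1
        if self.next_page >= len(self.memory):
            return False
        self._save(self.next_page)
        self.next_page += 1
        return True

    def on_access(self, idx):
        # Assumption: stands in for the kernel capturing the access
        # (the paper does this by tracing TLB misses).
        self._save(idx)

    def write(self, idx, offset, value):
        self.on_access(idx)                # save the old page first
        self.memory[idx][offset] = value   # then resume the access


def run_demo():
    memory = [bytearray([i] * 4) for i in range(4)]
    original = [bytes(page) for page in memory]
    ckpt = ConcurrentCheckpoint(memory)

    ckpt.dump_step()          # background dump saves page 0
    ckpt.write(2, 0, 99)      # page 2 not dumped yet: saved, then modified
    ckpt.write(0, 0, 77)      # page 0 already dumped: modified directly
    while ckpt.dump_step():   # background dump finishes the rest
        pass

    return original, ckpt.snapshot, memory
```

In this model the process is never stopped for the whole dump. It pays only the cost of copying a single page when it touches one that has not been saved yet, while the snapshot still reflects the memory contents at the moment the checkpoint began.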