The master/worker pattern is widely used to construct cross-domain, large-scale computing infrastructure. Applications supported by this kind of infrastructure usually feature long-running, speculative execution. A fault recovery mechanism is important to them, especially in a wide area network environment, which consists of error-prone components. Inter-node cooperation is needed to make the recovery process more efficient. The traditional log-based rollback recovery mechanism, which features independent recovery, cannot fulfill this global cooperation requirement: exchanging a large amount of logs wastes bandwidth and slows the transfer of application data. In this paper, we propose a two-phase log-based recovery mechanism that offers benefits such as space saving and global optimization, and that can complement the current log-based rollback recovery approach in some specific situations. We demonstrate the use of this mechanism in the Drug Discovery Grid environment, which is supported by China National Grid. Experimental results show the efficiency of this mechanism.