In the past decade, the use of distributed algorithms for simulation has increased considerably, with the aim of gaining speedup over traditional sequential simulation. There has also been much interest in using networks of inexpensive, powerful workstations with high-speed interconnections instead of expensive parallel computers. In this paper, we briefly present the kernel of a distributed system (PV2M) implemented on top of PVM routines, in which synchronization is based on the concept of Virtual Time. Special emphasis is given to the fault-tolerance mechanisms it provides. PV2M implements a checkpoint-restart mechanism for processes located on non-master hosts, making the system 1-resilient with respect to failures of these hosts.
Published in:
Parallel and Distributed Processing, 1995. Proceedings. Euromicro Workshop on
Date of Conference: 25-27 Jan 1995