Process-replication technique for fault-tolerance and performance improvement in distributed computing systems

2 Author(s)
Chiu, J.-F. ; Dept. of Electron. Eng. & Technol., Nat. Taiwan Inst. of Technol., Taipei, Taiwan ; Ge-Ming Chiu

The paper presents a process-replication protocol that aims to provide both fault tolerance and performance improvement for applications such as long-running and real-time tasks. An identical delivery order of messages is enforced on all replicas of a troupe, using multicasts for inter- and intra-troupe communication. The detailed design of the protocol is given in the paper. The protocol is self-contained in the sense that crashes in a troupe are handled internally without affecting the operation of other troupes. The crash-handling procedure is simple, and the associated overhead during failure-free operation is small. The protocol takes advantage of the redundancy of processes to expedite the completion of a distributed task: it speeds up the determination of message sequences and the transmission of outgoing data messages at the expense of small control messages. Simulation is carried out to show the performance improvement.
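The abstract does not say how the identical delivery order is obtained, so the following is only a minimal sketch of the general idea, not the authors' protocol. It assumes a generic scheme in which every multicast message carries a sequence number and each replica holds back out-of-order arrivals until the gap is filled. The names `Replica`, `receive`, and `multicast` are hypothetical and chosen for illustration.

```python
import random


class Replica:
    """One member of a troupe; delivers messages strictly by sequence number."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0      # next sequence number this replica may deliver
        self.held = {}         # out-of-order messages waiting for their turn
        self.delivered = []    # payloads in the order they were delivered

    def receive(self, seq, payload):
        self.held[seq] = payload
        # Deliver every held message that is now contiguous with the
        # messages already delivered.
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1


def multicast(replicas, messages, rng):
    """Number the messages (assumed sequencing step), then hand each replica
    the same messages in a different arrival order to mimic reordering."""
    numbered = list(enumerate(messages))
    for replica in replicas:
        arrivals = numbered[:]
        rng.shuffle(arrivals)
        for seq, payload in arrivals:
            replica.receive(seq, payload)


troupe = [Replica(f"r{i}") for i in range(3)]
multicast(troupe, ["a", "b", "c", "d"], random.Random(0))
orders = [replica.delivered for replica in troupe]
```

Although each replica sees a different arrival order, every entry of `orders` ends up identical, which is the property the abstract requires of a troupe. How the paper assigns the sequence, and how it uses process redundancy to speed that step up, is not stated in the abstract and is not modeled here.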

Published in:

High Performance Distributed Computing, 1994., Proceedings of the Third IEEE International Symposium on

Date of Conference:

2-5 Aug 1994