A unified approach to fault-tolerance in communication protocols based on recovery procedures

2 Author(s)
Agarwal, A. (Dept. of Electr. & Comput. Eng., Concordia Univ., Montreal, Que., Canada); Atwood, J.W.

This paper discusses fault tolerance in computer communication protocols, modeled by communicating finite state machines, and provides an efficient algorithmic procedure for recovery in such systems. Even when the communication network is reliable and preserves the order of messages, a transient error that is not detected immediately can contaminate the system and cause the protocol to fail. To achieve fault tolerance, the protocol must detect the error, recover from it, eventually reach a legal (or consistent) state, and resume normal execution. A protocol that can recover and continue its execution from a legal state is also called a self-stabilizing protocol. Our recovery procedure does not require an intrusive checkpointing procedure. The stable storage requirement for each process is less than that of other proposed recovery procedures. The recovery procedure yields a legal protocol state, namely the global state that precedes any illegal state and precedes the point at which the effects of the error make other states illegal. Only a minimal number of processes affected by error propagation are required to roll back. The procedure can be used to recover from any number of transient errors in the system. It has also been modeled in PROMELA, a language for describing validation models, which shows the syntactic correctness of our recovery protocol design. Finally, our procedure is compared with existing approaches to handling errors, and an illustrative example is provided.
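
As a rough illustration of the detect-and-roll-back idea in the abstract, the following minimal sketch tracks the last legal global state of two communicating finite state machines and, when an illegal state is observed, rolls back only the processes whose local state differs from it. This is not the authors' procedure: the sender/receiver protocol, the state names, the `LEGAL` set, and the `RecoveryMonitor` class are all hypothetical, and the sketch assumes the error is detected as soon as it occurs, which the abstract notes is not guaranteed in general.

```python
# Illustrative sketch only; not the recovery procedure from the paper.
# Hypothetical protocol: a sender (idle -> wait_ack -> idle) and a
# receiver (wait_msg -> got_msg -> wait_msg). A global state is the
# tuple (sender_state, receiver_state).

# Assumed set of legal (consistent) global states for this toy protocol.
# ("idle", "got_msg") is illegal: the receiver holds a message that the
# sender never sent.
LEGAL = {
    ("idle", "wait_msg"),
    ("wait_ack", "wait_msg"),
    ("wait_ack", "got_msg"),
}


class RecoveryMonitor:
    """Remembers the last legal global state and rolls back to it."""

    def __init__(self, initial):
        if initial not in LEGAL:
            raise ValueError("initial global state must be legal")
        self.last_legal = initial

    def observe(self, state):
        """Return (state_to_continue_from, indices_of_rolled_back_processes)."""
        if state in LEGAL:
            # Normal execution: record the new legal state, nothing to undo.
            self.last_legal = state
            return state, []
        # Illegal state detected: roll back only the processes whose local
        # state differs from the last legal global state.
        rolled_back = [
            i
            for i, (current, legal) in enumerate(zip(state, self.last_legal))
            if current != legal
        ]
        return self.last_legal, rolled_back


if __name__ == "__main__":
    monitor = RecoveryMonitor(("idle", "wait_msg"))
    monitor.observe(("wait_ack", "wait_msg"))  # sender sends a message
    monitor.observe(("wait_ack", "got_msg"))   # receiver receives it
    # A transient error flips the sender back to "idle".
    print(monitor.observe(("idle", "got_msg")))
```

In this toy run only process 0 (the sender) is rolled back, since the receiver's local state already matches the last legal global state.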

Published in: IEEE/ACM Transactions on Networking (Volume: 4, Issue: 5)