Scheduled System Maintenance on May 29th, 2015:
IEEE Xplore will be upgraded between 11:00 AM and 10:00 PM EDT. During this time there may be intermittent impact on performance. We apologize for any inconvenience.
By Topic

Support for software interrupts in log-based rollback-recovery

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Slye, J.H. ; AT&T Bell Labs., Murray Hill, NJ, USA ; Elnozahy, E.N.

The piecewise deterministic execution model is a fundamental assumption in many log-based rollback-recovery protocols. Process execution in this model consists of intervals, each starting with the receipt of a message at an application-defined execution point. Execution within each interval is deterministic and messages are the only source of nondeterminism that affects the computation. This simple model excludes the nondeterminism that results when asynchronous signals or interrupts occur at arbitrary execution points. As a result, a wide range of applications cannot use log-based rollback-recovery in practice. We present a solution that removes this restriction and allows applications to replay interrupts at the same execution points during recovery. The solution relies on using a software counter to compute the number of instructions between the asynchronous signals during normal operation. Should a failure occur, the instruction counts are used to force the replay of these signals at the same execution points. The execution of the application thus can be replayed to recreate the prefailure state while accommodating nondeterminism due to asynchronous signals. We then use the deterministic replay of interrupts to solve another problem, namely tracking nondeterminism due to interleaved shared memory access in multithreaded applications on a single processor. We use the instruction counter solution to implement a user-level thread package in which thread scheduling decisions can be replayed if a failure occurs. By repeating the scheduling decisions during an execution replay, threads access the shared memory in the same order and the execution to be reconstructed. This technique allows multithreaded applications to use log-based rollback-recovery with low overhead, which was not previously possible. We carried out two prototype implementations that have shown the overhead is no more than a 6 percent slowdown in application execution on the DEC Alpha, and from 6 percent to 18 percent on the Inter Pentium. Thus, restrictions of the piecewise deterministic execution model can be lifted at a reasonable cost

Published in:

Computers, IEEE Transactions on  (Volume:47 ,  Issue: 10 )