A Transparent Control-Flow Based Approach to Record-Replay Non-deterministic Bugs

Authors: Nan Wang (Institute of Computing Technology, Beijing, China); Jizhong Han; Jinyun Fang

Record-replay is an effective way to reproduce non-deterministic bugs and has gained attention in the research community. However, current approaches fall short in handling non-deterministic bugs on multi-processor platforms and in distributed systems, for several reasons. First, multi-threaded programs on multi-processor platforms, which are common in today's distributed systems, are difficult to record and replay because of data races. Second, increasing system scale makes production environments more sensitive to perturbation from recording; even hacking control scripts has become unacceptable because of the growing complexity that comes from the variety of programs and the large number of computing cores. Third, when record-replay is deployed in distributed systems, the large scale also multiplies the recorded traces, which overwhelms developers and dramatically slows down the whole system. To address these issues, we propose the following mechanisms for efficient record-replay in multi-processor distributed systems: control-flow based record-replay, low-perturbation loading, and proportion sampling. We have implemented these mechanisms in ReBranch, a practical record-replay system for debugging multi-threaded programs on multi-processor platforms and in distributed systems. ReBranch has already proven effective on real bugs. We also present our debugging experience with ReBranch through a case study of a bug in memcached, an important component of many commercial systems.
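The abstract names control-flow based record-replay without detailing how ReBranch implements it. As a rough illustration only, the Python sketch below assumes the core idea is to log the outcome of each conditional branch during a production run and then force those same outcomes during replay, so the execution path is reproduced even when racy data differ. The `BranchLog` class, the `worker` function, and the use of seeded random numbers to stand in for data races are all hypothetical and are not taken from ReBranch.

```python
import random
from typing import Callable, List, Optional


class BranchLog:
    """Hypothetical branch log (not ReBranch's actual format).

    In record mode it appends each conditional's outcome; in replay mode it
    returns the recorded outcome instead of the live one, forcing the same path.
    """

    def __init__(self, trace: Optional[List[bool]] = None) -> None:
        self.replaying = trace is not None
        self.trace: List[bool] = list(trace) if trace is not None else []
        self.pos = 0  # index of the next outcome to consume during replay

    def branch(self, live_outcome: bool) -> bool:
        if not self.replaying:
            self.trace.append(live_outcome)  # record: remember taken/not-taken
            return live_outcome
        if self.pos >= len(self.trace):
            raise RuntimeError("replay diverged: branch trace exhausted")
        outcome = self.trace[self.pos]  # replay: ignore the live value
        self.pos += 1
        return outcome


def worker(log: BranchLog, read_shared: Callable[[], int]) -> List[str]:
    """Toy code whose control flow depends on a non-deterministic read."""
    path = []
    for i in range(4):
        # read_shared() stands in for a racy read of shared memory.
        if log.branch(read_shared() > 50):
            path.append(f"iter {i}: slow path")
        else:
            path.append(f"iter {i}: fast path")
    return path


# Record run: a seeded generator simulates one racy execution.
rec_rng = random.Random(1)
recorder = BranchLog()
recorded_path = worker(recorder, lambda: rec_rng.randint(0, 100))

# Replay run: different "racy" values, but the recorded branch outcomes win.
rep_rng = random.Random(2)
replayer = BranchLog(recorder.trace)
replayed_path = worker(replayer, lambda: rep_rng.randint(0, 100))

print(recorder.trace)
print(replayed_path == recorded_path)  # True: the control flow is reproduced
```

In this sketch, logging one bit per branch keeps the trace small compared with logging every shared-memory value, but it reproduces only the path taken, not the data values along it; a real system would likely also need per-thread logs and handling of system calls, which this toy omits.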

Published in: 2012 IEEE 7th International Conference on Networking, Architecture and Storage (NAS)

Date of Conference: 28-30 June 2012