Resilience to Various Failures for Read-mostly In-memory Data Structures

Authors:

Larry Kaplan (Cray Inc., Seattle, WA, USA); Preston Briggs; Miles Ohlrich; Will Leslie

As massively parallel processing (MPP) machines and their associated applications become larger, more work on resiliency is needed if those applications are to have a chance of running for significant lengths of time in the face of the expected component failure rates. This paper describes an approach for protecting large read-mostly in-memory data structures from various forms of failures by applying the concept of software erasure-correcting codes. A prototype library for this scheme was implemented on the Cray XMT and applied to a sample application. It is also portable to other global shared memory architectures that meet certain requirements, including the Cray XE.
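
The abstract does not say which erasure-correcting code the prototype library uses. As a rough illustration of the general idea only, the sketch below assumes the simplest possible code: one XOR parity block computed over equal-length data blocks, which can rebuild any single lost block. All function names here are hypothetical and are not taken from the paper or from any Cray library.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Compute a single parity block over all data blocks."""
    return xor_blocks(data_blocks)


def update_parity(parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Patch the parity after one block is overwritten.

    Every write pays this extra cost, so the scheme is cheapest
    when writes are rare (a read-mostly structure).
    """
    return xor_blocks([parity, old_block, new_block])


def recover(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one erased block (marked None) from survivors and parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one erased block")
    if not missing:
        return list(blocks)
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


if __name__ == "__main__":
    data = [b"node", b"edge", b"hash"]
    parity = make_parity(data)
    damaged = [data[0], None, data[2]]  # simulate losing the second block
    print(recover(damaged, parity) == data)
```

Reads never touch the parity block in this sketch; only writes and recovery do. A code that tolerates more than one simultaneous failure would need additional, independent parity blocks, which this example does not attempt.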

Published in:

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Date of Conference:

21-25 May 2012