
Investigating resilient high performance reconfigurable computing with minimally-invasive system monitoring

Authors: Bin Huang (Dept. of Electrical & Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC, USA); A. G. Schmidt; A. A. Mendon; R. Sass

As researchers push toward Exascale computing, one of the emerging challenges is system resilience. Unlike fault tolerance, which corrects errors, resilience, as recent reports suggest, will require systems to continue making progress on an application despite faults. A first step toward a resilient system is robust, scalable system monitoring. This work presents a novel, minimally-invasive system monitor that operates over a separate network. We analytically characterize its performance for an arbitrary set of nodes and demonstrate a working implementation of the design. We argue that this hardware approach is inherently superior to the ad hoc software techniques currently employed in practice.

Published in: 2010 Fourth International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA)

Date of Conference: 14 Nov. 2010