
Detecting anomalies in high-performance parallel programs

5 Author(s)
Florez, G.; The Center for Comput. Security Res., Mississippi State Univ., MS, USA; Zhen Liu; Bridges, S.; Vaughn, R.

The Message Passing Interface (MPI) is an effective programming technique for implementing parallel programs for distributed computation. As these applications run, several types of irregularities can occur, including those that result from intrusions, user misbehavior, corrupted data, deadlocks, or failure of cluster components. We compare artificial intelligence (AI) techniques that can be used to implement a lightweight monitoring and detection system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms applied to the flow of library function calls and OS system calls made by parallel programs written with MPI. We demonstrate that MPI programs can be monitored in real time with high accuracy, in some cases with a 0% false positive rate, and that the added computational load on each node is small. Finally, we demonstrate that simple deterministic methods perform poorly when the program flow grows in size and variety, and that more complex methods are required.
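
The abstract does not name the specific algorithms compared. As an illustration only, the sketch below shows one simple deterministic approach to monitoring a call flow: record every fixed-length window of calls seen in normal runs, then report the fraction of windows in a new trace that were never seen. The function names (`windows`, `train`, `mismatch_rate`), the window length, and the sample traces are all assumptions made for this example, not the authors' method or data.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: Sequence[str], n: int) -> List[Window]:
    """Return every contiguous window of n calls in the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def train(traces: Iterable[Sequence[str]], n: int) -> Set[Window]:
    """Collect the set of call windows observed in normal (training) runs."""
    normal: Set[Window] = set()
    for trace in traces:
        normal.update(windows(trace, n))
    return normal


def mismatch_rate(trace: Sequence[str], normal: Set[Window], n: int) -> float:
    """Fraction of windows in the trace that never appeared in training."""
    seen = windows(trace, n)
    if not seen:
        return 0.0  # trace shorter than the window: nothing to judge
    unseen = sum(1 for w in seen if w not in normal)
    return unseen / len(seen)


# Hypothetical traces of MPI library calls and OS system calls.
normal_run = ["MPI_Init", "MPI_Send", "MPI_Recv",
              "MPI_Send", "MPI_Recv", "MPI_Finalize"]
suspect_run = ["MPI_Init", "MPI_Send", "execve", "MPI_Recv", "MPI_Finalize"]

model = train([normal_run], n=3)
normal_score = mismatch_rate(normal_run, model, n=3)
suspect_score = mismatch_rate(suspect_run, model, n=3)
print(normal_score, suspect_score)
```

A lookup table of this kind has to store every distinct window it has seen, so it grows with the size and variety of the program flow, which is consistent with the abstract's remark that simple deterministic methods scale poorly.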

Published in:

Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on (Volume: 2)

Date of Conference:

5-7 April 2004