Analysis of system overhead on parallel computers

4 Author(s)
R. Gioiosa (Div. of Comput. & Comput. Sci., Los Alamos Nat. Lab., NM, USA); F. Petrini; K. Davis; F. Lebaillif-Delamare

Ever-increasing demand for computing capability is driving the construction of ever-larger computer clusters, typically built from commodity compute nodes and ranging in size up to thousands of processors, with each node hosting an instance of the operating system (OS). Recent studies [E. Hendriks (2002), F. Petrini et al. (2003)] have shown that even minimal intrusion by the OS on user applications, e.g., a slowdown of user processes of less than 1.0% on each OS instance, can result in a dramatic performance degradation (50% or more) when the user applications are executed on thousands of processors. The contribution of this paper is the explication and demonstration, by way of a case study, of a methodology for analyzing and evaluating the impact of system activity (all software and hardware other than user applications) on application performance. Our methodology has three major components: 1) a set of simple benchmarks to quickly measure and identify the impact of intrusive system events; 2) a kernel-level profiling tool, OProfile, to characterize all relevant events and their sources; and 3) a kernel module that provides timing information for in-depth modeling of the frequency and duration of each relevant event and determines which sources have the greatest impact on performance (and are therefore the most important to eliminate). The paper presents a collection of experimental results obtained on a state-of-the-art dual AMD Opteron cluster running GNU/Linux 2.6.5. While our work has been performed on this specific OS, we argue that our contribution readily generalizes to other open source and commercial operating systems.
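
The scaling effect described in the abstract, where a per-node slowdown below 1% grows into a large application-level slowdown, can be illustrated with a toy model. The sketch below is not taken from the paper. It assumes a bulk-synchronous application in which every compute step ends in a barrier, and it assumes that each node is independently interrupted at most once per step with a fixed probability and a fixed interruption length. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def per_node_overhead(p_hit, detour, quantum):
    """Expected fractional slowdown of a single isolated node.

    p_hit   -- assumed probability that a system event interrupts one step
    detour  -- assumed duration of that interruption
    quantum -- duration of one undisturbed compute step (same unit as detour)
    """
    return p_hit * detour / quantum


def synchronized_overhead(p_hit, detour, quantum, nodes):
    """Expected fractional slowdown when all nodes meet at a barrier.

    A step is delayed whenever at least one node is interrupted, which
    under the independence assumption happens with probability
    1 - (1 - p_hit) ** nodes.
    """
    p_any = 1.0 - (1.0 - p_hit) ** nodes
    return p_any * detour / quantum


if __name__ == "__main__":
    # Illustrative values only: a 1% chance per step of an interruption
    # that lasts as long as the step itself.
    p_hit, detour, quantum = 0.01, 1.0, 1.0
    print("single node:", per_node_overhead(p_hit, detour, quantum))
    for nodes in (1, 10, 100, 1000):
        print(nodes, "nodes:", synchronized_overhead(p_hit, detour, quantum, nodes))
```

Under these assumptions the single-node overhead stays at 1%, while the synchronized overhead approaches the full interruption length per step as the node count grows, because almost every step then has at least one interrupted node. Real system events vary in frequency and duration, which is why the methodology measures them rather than assuming them.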

Published in:

Signal Processing and Information Technology, 2004. Proceedings of the Fourth IEEE International Symposium on

Date of Conference:

18-21 Dec. 2004