
Optimizing Distributed Architectures to Improve Performance on Checkpointing Applications


5 Author(s)
Nunez, A. (Comput. Sci. Dept., Univ. Carlos III de Madrid, Leganés, Spain); Fernandez, J.; Carretero, J.; Prada, L.; and one further author

Meeting the global throughput targets of each application in High Performance Computing (HPC) systems is difficult because many architectural parameters have a considerable impact on overall system performance, such as the number of storage servers, the characteristics of the communication links, and the number of CPU cores per node. In this paper we present a thorough comparative study of the performance of scaling up HPC cluster architectures using a checkpointing application model. The study focuses on multi-core HPC clusters, and the scaling process targets the three main resources: computing power, communications, and storage. The main goal of this work is to evaluate and analyze how scalability evolves, and which bottlenecks appear, in different multi-core HPC architectures under different architectural configurations. To achieve this goal, a set of simulation experiments was carried out with SIMCAN, a simulation framework specifically designed for modeling and simulating HPC architectures. The results show that computing power is not a limiting factor thanks to multi-core processors, whereas problems arise in storage and in the communication channels, with the storage network being the main bottleneck.
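To illustrate what "main bottleneck" means for a checkpointing workload, the following is a minimal back-of-the-envelope sketch, not the SIMCAN model used in the paper. It assumes that every node writes a checkpoint of the same size at the same time and that the write rate is capped by the slowest of three hypothetical stages: the aggregate node links, the network core, and the aggregate storage servers. The function name `checkpoint_time_s` and all parameter values are illustrative assumptions, not data from the study.

```python
def checkpoint_time_s(nodes, bytes_per_node, nic_bw, net_core_bw, servers, server_bw):
    """Estimate the time (s) for all nodes to write one checkpoint.

    Hypothetical pipeline model: the aggregate write rate is the minimum
    of the per-stage capacities (bandwidths in bytes/s). This is an
    illustrative sketch, not the paper's simulation model.
    """
    total_bytes = nodes * bytes_per_node
    stages = {
        "node links": nodes * nic_bw,          # sum of compute-node NIC bandwidth
        "network core": net_core_bw,           # shared switch/backbone capacity
        "storage servers": servers * server_bw,  # sum of storage-server bandwidth
    }
    bottleneck = min(stages, key=stages.get)   # slowest stage limits throughput
    return total_bytes / stages[bottleneck], bottleneck


# Hypothetical cluster: 64 nodes writing 2 GB each, 4 storage servers.
t, where = checkpoint_time_s(64, 2e9, 1.25e8, 2.5e9, 4, 2e8)
print(f"{t:.1f} s, limited by {where}")  # 160.0 s, limited by storage servers
```

In this toy model, adding storage servers shortens the checkpoint only until another stage (here the network core) becomes the limit, which is the kind of shifting bottleneck the study examines with full simulation.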

Published in: 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC)

Date of Conference: 2-4 Sept. 2011