A Year in the Life of a Parallel File System


Abstract:

I/O performance is a critical aspect of data-intensive scientific computing. We seek to advance the state of the practice in understanding and diagnosing I/O performance issues through investigation of a comprehensive I/O performance data set that captures a full year of production storage activity at two leadership-scale computing facilities. We demonstrate techniques to identify regions of interest, perform focused investigations of both long-term trends and transient anomalies, and uncover the contributing factors that lead to performance fluctuation. We find that a year in the life of a parallel file system comprises distinct regions of long-term performance variation in addition to short-term performance transients. We demonstrate how systematic identification of these performance regions, combined with comprehensive analysis, allows us to isolate the factors contributing to different performance maladies at different time scales. From this, we present specific lessons learned and important considerations for HPC storage practitioners.
Date of Conference: 11-16 November 2018
Date Added to IEEE Xplore: 14 March 2019
Conference Location: Dallas, TX, USA
