Petascale computing requires I/O subsystems that can keep pace with the dramatic computing power of such systems. TOP500.org ranks the world's top computers by their LINPACK compute performance, but the current state of the art and the future requirements of the storage area networks that support petascale computers have not been adequately investigated. Dependably scaling an I/O subsystem to support petascale computing is not as simple as adding more storage servers. In this paper, we present a stochastic activity network model that uses failure rates computed from real system logs to predict the reliability and availability of the storage architecture of the Abe cluster at the National Center for Supercomputing Applications (NCSA). We then use the model to evaluate the challenges that arise as the number of storage servers is scaled up to support petascale computing. The results offer new insights into the dependability challenges that will be encountered when building next-generation petabyte-scale storage. Furthermore, we describe a new design approach that enables system designers to integrate trace-based estimates of parameter values from real system data into their stochastic models, so that they can make informed design choices.
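As a rough illustration of how trace-derived parameters can feed a dependability model, the following sketch uses a far simpler closed-form model than a stochastic activity network. It assumes exponentially distributed failure and repair times, independent servers, and a series structure in which every server must be up. It ignores redundancy and failover, which a stochastic activity network can represent. The trace values and all function names (`estimate_rates`, `server_availability`, `system_reliability`, `system_availability`) are hypothetical and are not taken from the Abe logs.

```python
import math
from statistics import mean

# Hypothetical trace data (NOT from the Abe logs): hours between successive
# failures of one storage server, and hours each repair took.
TIME_BETWEEN_FAILURES_H = [1200.0, 950.0, 1430.0, 1020.0, 1400.0]
REPAIR_TIMES_H = [4.0, 6.0, 2.0, 5.0, 3.0]


def estimate_rates(tbf_hours, repair_hours):
    """Return (failure_rate, repair_rate) per hour, assuming exponential times."""
    mttf = mean(tbf_hours)  # mean time to failure
    mttr = mean(repair_hours)  # mean time to repair
    return 1.0 / mttf, 1.0 / mttr


def server_availability(lam, mu):
    """Steady-state availability of one repairable server: mu / (lam + mu)."""
    return mu / (lam + mu)


def system_reliability(lam, n_servers, mission_hours):
    """Probability that none of n independent servers fails during the mission."""
    # With independent exponential lifetimes, the failure rates simply add.
    return math.exp(-n_servers * lam * mission_hours)


def system_availability(a_single, n_servers):
    """Availability of a series system of n independently repaired servers."""
    return a_single ** n_servers


if __name__ == "__main__":
    lam, mu = estimate_rates(TIME_BETWEEN_FAILURES_H, REPAIR_TIMES_H)
    a1 = server_availability(lam, mu)
    # Show how dependability degrades as the number of storage servers grows.
    for n in (16, 64, 256, 1024):
        r = system_reliability(lam, n, mission_hours=24.0)
        a = system_availability(a1, n)
        print(f"{n:5d} servers: R(24h)={r:.4f}  A={a:.4f}")
```

Even this crude model shows why adding storage servers does not scale dependability for free: the aggregate failure rate grows linearly with the server count, so system reliability and availability fall as the system grows unless redundancy is modeled and provisioned explicitly.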