Modeling and tolerating heterogeneous failures in large parallel systems | IEEE Conference Publication