A large-scale study of failures in high-performance computing systems | IEEE Conference Publication | IEEE Xplore