A Large-Scale Study of Failures in High-Performance Computing Systems | IEEE Journals & Magazine | IEEE Xplore