Fault tolerant memory design for HW/SW co-reliability in massively parallel computing systems | IEEE Conference Publication | IEEE Xplore