Low Overhead Tag Error Mitigation for GPU Architectures | IEEE Conference Publication | IEEE Xplore




Abstract:

Cache structures on modern GPUs and CPUs occupy a large area and are accessed frequently, which increases their vulnerability to transient errors. These structures are often protected by ECC or parity checking, at some cost in area and energy. However, given the energy-efficiency and scalability challenges of high-performance computing, it is crucial to avoid unnecessary overhead while maintaining the desired reliability. This paper evaluates the reliability of unprotected tag SRAM structures in modern GPUs and studies a low-overhead tag error mitigation mechanism. The proposed mechanism uses Galois-based hash functions for set-index calculation to mitigate certain pathological address strides that cause false hits. Extensive analysis on a modern GPU indicates that, compared with a baseline cache indexing scheme, the hash-based mechanism yields a 10x reduction in false-hit probability (and a 2% improvement in hit rate) for write-through data caches.
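The abstract contrasts a baseline cache indexing scheme with Galois-based hash functions for set-index calculation. The following is a minimal sketch, not the paper's implementation. It assumes the baseline takes the low-order bits of the block address (modulo indexing) and that the hash reads the block address as a polynomial over GF(2) and reduces it modulo an irreducible polynomial. The line size, set count, polynomial, and all function names (`modulo_index`, `gf2_hash_index`, `sets_touched`) are hypothetical. The sketch only shows how a power-of-two stride that collapses onto a single set under modulo indexing is spread across every set by the hash. It does not model tag bit flips or false hits. (A false hit is a lookup that matches a corrupted tag and so returns data belonging to a different address.)

```python
"""Baseline modulo set indexing vs. a GF(2) polynomial-hash set index.

Illustrative assumptions (not taken from the paper): 128-byte lines,
64 sets, and the irreducible polynomial x^6 + x + 1 as the hash modulus.
"""

LINE_BYTES = 128  # hypothetical cache-line size in bytes
NUM_SETS = 64     # hypothetical number of sets (2**6)
# Bit i holds the coefficient of x^i, so this encodes x^6 + x + 1,
# which is irreducible over GF(2).
IRREDUCIBLE_POLY = (1 << 6) | (1 << 1) | 1


def modulo_index(addr: int) -> int:
    """Baseline: the set index is the low-order bits of the block address."""
    return (addr // LINE_BYTES) % NUM_SETS


def gf2_mod(value: int, poly: int) -> int:
    """Remainder of value(x) / poly(x) with coefficients in GF(2)."""
    degree = poly.bit_length() - 1
    while value.bit_length() - 1 >= degree:
        # Cancel the leading term; subtraction in GF(2) is XOR.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def gf2_hash_index(addr: int) -> int:
    """Hashed: the block address, read as a GF(2) polynomial, reduced mod poly."""
    return gf2_mod(addr // LINE_BYTES, IRREDUCIBLE_POLY)


def sets_touched(index_fn, stride: int, count: int) -> int:
    """Number of distinct sets hit by `count` accesses spaced `stride` bytes apart."""
    return len({index_fn(i * stride) for i in range(count)})


if __name__ == "__main__":
    stride = LINE_BYTES * NUM_SETS  # 8 KiB: a pathological power-of-two stride
    print("modulo:", sets_touched(modulo_index, stride, 64))    # modulo: 1
    print("hashed:", sets_touched(gf2_hash_index, stride, 64))  # hashed: 64
```

With this stride, each block address is a multiple of x^6. The polynomial has a nonzero constant term, so multiplication by x^6 is invertible modulo it, and the 64 strided blocks map to 64 distinct sets. Reduction modulo a fixed polynomial is linear over GF(2), so in hardware each index bit is just an XOR of a fixed subset of address bits.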
Date of Conference: 25-28 June 2018
Date Added to IEEE Xplore: 23 July 2018
Electronic ISSN: 2158-3927
Conference Location: Luxembourg, Luxembourg

