Abstract:
The demand for powerful GPUs continues to grow, driven by modern applications that require ever-increasing computational power and memory bandwidth. Multi-Chip Module (MCM) GPUs offer scalability by integrating GPU chiplets on an interposer substrate; however, they are hindered by their GPU-centric design, i.e., off-chip GPU bandwidth is statically (at design time) allocated to local versus remote memory accesses. This paper presents the memory-centric MCM-GPU architecture. By connecting the HBM stacks, rather than the GPUs, on the interposer, and by connecting the GPUs to bridges on the interposer network, the full off-chip GPU bandwidth can be dynamically allocated to local and remote memory accesses. Preliminary results demonstrate the potential of the memory-centric architecture, offering an average 1.36× (and up to 1.90×) performance improvement over a GPU-centric architecture.
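The key contrast in the abstract can be illustrated with a toy bandwidth model. This sketch is not from the paper: it simply shows why a static local/remote bandwidth split underutilizes the off-chip links whenever the workload's access mix deviates from the design-time ratio, while dynamic allocation lets any mix use the full bandwidth. All parameter values (total bandwidth, access fractions, 50/50 split) are assumptions for illustration.

```python
def static_throughput(total_bw, local_frac, local_share=0.5):
    """Achievable throughput with a design-time bandwidth split (GPU-centric).

    Throughput T is limited by whichever link saturates first:
    T * local_frac <= local_bw  and  T * (1 - local_frac) <= remote_bw.
    """
    local_bw = total_bw * local_share
    remote_bw = total_bw * (1.0 - local_share)
    remote_frac = 1.0 - local_frac
    limits = []
    if local_frac > 0:
        limits.append(local_bw / local_frac)
    if remote_frac > 0:
        limits.append(remote_bw / remote_frac)
    return min(limits) if limits else total_bw


def dynamic_throughput(total_bw, local_frac):
    """Memory-centric design: any local/remote mix can use the full bandwidth."""
    return total_bw


if __name__ == "__main__":
    bw = 2000.0  # GB/s per GPU -- an assumed figure, not from the paper
    for f in (0.9, 0.5, 0.1):
        s = static_throughput(bw, f)
        d = dynamic_throughput(bw, f)
        print(f"local fraction {f:.1f}: static {s:7.1f} GB/s, "
              f"dynamic {d:7.1f} GB/s, gain {d / s:.2f}x")
```

Under this model, a workload whose local-access fraction matches the design-time split (here 0.5) sees no gain, but skewed mixes (0.9 or 0.1 local) leave one static link idle while the other saturates, which dynamic allocation avoids.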
Published in: IEEE Computer Architecture Letters ( Volume: 24, Issue: 1, Jan.-June 2025)
Author Affiliations:
Ghent University, Gent, Belgium
Delft University of Technology, Delft, Netherlands
Norwegian University of Science and Technology, Trondheim, Norway
Ghent University, Gent, Belgium