Abstract:
Recently, the size of deep learning models has increased significantly, making the excessive memory traffic between the AI processor and DRAM a major bottleneck of the system. The processing-in-DRAM (DRAM-PIM) concept has emerged as a promising solution: it integrates computing logic within memory, thereby eliminating much of the access to external memory. Although many simulators have been proposed to model and analyze the benefits of DRAM-PIM, they are often too slow to run an entire application. FPGA-based emulators have been introduced to overcome this limitation; however, none of the prior works includes the full software stack from the model down to the DRAM-PIM hardware. This paper presents a full-stack processing-in-DRAM emulation framework named PRIMO, the first emulation framework that can model and analyze DRAM-PIM for end-to-end ML inference. PRIMO enables software developers to develop and test their customized software stacks on various ML workloads without requiring a real DRAM-PIM chip. Moreover, it allows designers to explore the design space and monitor memory access patterns, facilitating software-hardware co-design for efficient DRAM-PIM architectures. To achieve these goals, we develop a real-time FPGA emulator that emulates the DRAM-PIM architecture and generates experimental results, such as predicted cycle counts and computed outputs, at speeds far beyond those of CPU-based simulation. In addition, we propose a software stack comprising a PIM compiler that enables the execution of various ML workloads, including end-to-end inference, and a PIM driver that runs the workloads with high bandwidth utilization by leveraging virtual-memory scatter-gather DMA. Finally, we demonstrate that PRIMO emulates DRAM-PIM 106.64-6093.56× faster than the CPU-based simulation framework on ML workloads ranging from small microbenchmarks to end-to-end inference of ResNets.
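To make the driver-level technique concrete: a scatter-gather DMA engine transfers a buffer whose virtual pages are not physically contiguous by walking a chain of (address, length) descriptors, one per fragment, in a single transaction. The C sketch below illustrates how such a descriptor chain could be built over a page-spanning buffer. It is a minimal, hypothetical illustration of the general mechanism, not PRIMO's actual driver ABI; the descriptor layout, `sg_desc_t`, `build_sg_chain`, and the identity-mapped `fake_virt_to_phys` stand-in are all assumptions made for the sketch.

```c
/* Hypothetical sketch: building a scatter-gather DMA descriptor chain over
 * non-contiguous virtual-memory pages, the general technique a PIM driver
 * can use to stream a whole tensor to DRAM-PIM in one DMA transaction.
 * Field names and layout are illustrative, not PRIMO's real interface. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* One scatter-gather list entry: a (physical address, length) fragment plus
 * a link to the next descriptor, as in typical SG-DMA engines. */
typedef struct sg_desc {
    uint64_t phys_addr;   /* physical address of this fragment */
    uint32_t length;      /* bytes to transfer from this fragment */
    struct sg_desc *next; /* next descriptor, or NULL to end the chain */
} sg_desc_t;

/* Stand-in for a page-table walk; a real driver would translate a pinned
 * user virtual address to its physical page. Identity map for the sketch. */
static uint64_t fake_virt_to_phys(const void *va) {
    return (uint64_t)(uintptr_t)va;
}

/* Split a virtually contiguous buffer into per-page fragments and chain
 * them so the DMA engine can gather all pages in a single transaction. */
static sg_desc_t *build_sg_chain(const uint8_t *buf, size_t len) {
    sg_desc_t *head = NULL, **tail = &head;
    size_t off = 0;
    while (off < len) {
        /* Each fragment ends at the next page boundary or at end of buffer. */
        size_t page_off = ((uintptr_t)(buf + off)) % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - page_off;
        if (chunk > len - off)
            chunk = len - off;

        sg_desc_t *d = malloc(sizeof *d);
        d->phys_addr = fake_virt_to_phys(buf + off);
        d->length = (uint32_t)chunk;
        d->next = NULL;
        *tail = d;
        tail = &d->next;
        off += chunk;
    }
    return head;
}

int main(void) {
    size_t len = 3 * PAGE_SIZE + 100; /* buffer spanning four pages */
    uint8_t *tensor = malloc(len);
    sg_desc_t *chain = build_sg_chain(tensor, len);

    /* A real driver would write the chain's base address to the DMA engine;
     * here we just print each fragment. */
    for (sg_desc_t *d = chain; d != NULL;) {
        printf("desc: phys=0x%llx len=%u\n",
               (unsigned long long)d->phys_addr, d->length);
        sg_desc_t *next = d->next;
        free(d);
        d = next;
    }
    free(tensor);
    return 0;
}
```

The design point this illustrates is why scatter-gather helps bandwidth utilization: without it, the driver must either copy data into a physically contiguous bounce buffer or issue one DMA per page, both of which waste bandwidth and add latency on large tensors.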
Date of Conference: 28 October 2023 - 02 November 2023
Date Added to IEEE Xplore: 30 November 2023