Abstract:
Recently, the size of deep learning models has increased significantly, making the excessive memory traffic between the AI processor and DRAM a major bottleneck of the system. The processing-in-DRAM (DRAM-PIM) concept has emerged as a promising solution: it integrates computing logic within memory, thereby eliminating much of the access to external memory. Although many simulators have been proposed to model and analyze the benefits of DRAM-PIM, they are often too slow to run an entire application. FPGA-based emulators have been introduced to overcome this limitation; however, none of the prior works includes the full software stack from the model down to the DRAM-PIM hardware. This paper presents a full-stack processing-in-DRAM emulation framework named PRIMO, the first emulation framework that can model and analyze DRAM-PIM for end-to-end ML inference. PRIMO enables software developers to develop and test their customized software stacks on various ML workloads without requiring a real DRAM-PIM chip. Moreover, it allows designers to explore the design space and monitor memory access patterns, facilitating software-hardware co-design for efficient DRAM-PIM architectures. To achieve these goals, we develop a real-time FPGA emulator that emulates the DRAM-PIM architecture and generates experimental results, such as predicted cycle counts and computed outputs, at speeds far beyond those of CPU-based simulation. In addition, we propose a software stack comprising a PIM compiler that enables the execution of various ML workloads, including end-to-end inference, and a PIM driver that runs the workloads with high bandwidth utilization by leveraging virtual-memory scatter-gather DMA. Finally, we demonstrate that PRIMO emulates DRAM-PIM 106.64-6093.56× faster than the CPU-based simulation framework on ML workloads ranging from small microbenchmarks to end-to-end inference of ResNets.
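To make the driver-level technique concrete: a scatter-gather DMA engine transfers a buffer whose virtual pages are not physically contiguous by walking a chain of (address, length) descriptors, one per fragment, in a single transaction. The C sketch below illustrates how such a descriptor chain could be built over a page-spanning buffer. It is a minimal, hypothetical illustration of the general mechanism, not PRIMO's actual driver ABI; the descriptor layout, `sg_desc_t`, `build_sg_chain`, and the identity-mapped `fake_virt_to_phys` stand-in are all assumptions made for the sketch.

```c
/* Hypothetical sketch: building a scatter-gather DMA descriptor chain over
 * non-contiguous virtual-memory pages, the general technique a PIM driver
 * can use to stream a whole tensor to DRAM-PIM in one DMA transaction.
 * Field names and layout are illustrative, not PRIMO's real interface. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* One scatter-gather list entry: a (physical address, length) fragment plus
 * a link to the next descriptor, as in typical SG-DMA engines. */
typedef struct sg_desc {
    uint64_t phys_addr;   /* physical address of this fragment */
    uint32_t length;      /* bytes to transfer from this fragment */
    struct sg_desc *next; /* next descriptor, or NULL to end the chain */
} sg_desc_t;

/* Stand-in for a page-table walk; a real driver would translate a pinned
 * user virtual address to its physical page. Identity map for the sketch. */
static uint64_t fake_virt_to_phys(const void *va) {
    return (uint64_t)(uintptr_t)va;
}

/* Split a virtually contiguous buffer into per-page fragments and chain
 * them so the DMA engine can gather all pages in a single transaction. */
static sg_desc_t *build_sg_chain(const uint8_t *buf, size_t len) {
    sg_desc_t *head = NULL, **tail = &head;
    size_t off = 0;
    while (off < len) {
        /* Each fragment ends at the next page boundary or at end of buffer. */
        size_t page_off = ((uintptr_t)(buf + off)) % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - page_off;
        if (chunk > len - off)
            chunk = len - off;

        sg_desc_t *d = malloc(sizeof *d);
        d->phys_addr = fake_virt_to_phys(buf + off);
        d->length = (uint32_t)chunk;
        d->next = NULL;
        *tail = d;
        tail = &d->next;
        off += chunk;
    }
    return head;
}

int main(void) {
    size_t len = 3 * PAGE_SIZE + 100; /* buffer spanning four pages */
    uint8_t *tensor = malloc(len);
    sg_desc_t *chain = build_sg_chain(tensor, len);

    /* A real driver would write the chain's base address to the DMA engine;
     * here we just print each fragment. */
    for (sg_desc_t *d = chain; d != NULL;) {
        printf("desc: phys=0x%llx len=%u\n",
               (unsigned long long)d->phys_addr, d->length);
        sg_desc_t *next = d->next;
        free(d);
        d = next;
    }
    free(tensor);
    return 0;
}
```

The design point this illustrates is why scatter-gather helps bandwidth utilization: without it, the driver must either copy data into a physically contiguous bounce buffer or issue one DMA per page, both of which waste bandwidth and add latency on large tensors.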
Date of Conference: 28 October 2023 - 02 November 2023
Date Added to IEEE Xplore: 30 November 2023