
Ventus: A High-performance Open-source GPGPU Based on RISC-V and Its Vector Extension


Abstract:

General-purpose Graphics Processing Units (GPGPUs) have become the most popular platform for accelerating modern applications such as Large Language Models and Generative AI, while the lack of advanced open-source hardware microarchitectures restricts high-performance GPGPU research. In this work, we propose Ventus, a high-performance open-source GPGPU based on RISC-V with the Vector Extension (RVV). Customized instructions and a holistic software toolchain are implemented to achieve high performance. Ventus is successfully deployed on an FPGA platform consisting of 4 Xilinx VU19P devices, scaling up to 16 Streaming Multiprocessors (SMs) with 256 warps. Results imply that Ventus possesses critical features of commercial GPGPUs and has achieved an average reduction of 83.9% in instruction count and 87.4% in CPI over the state-of-the-art open-source implementation. Ventus can be found on GitHub (https://github.com/THU-DSP-LAB/ventus-gpgpu).
Date of Conference: 18-20 November 2024
Date Added to IEEE Xplore: 02 January 2025

Conference Location: Milan, Italy


I. Introduction

In recent years, General-purpose Graphics Processing Units (GPGPUs) have become increasingly popular in artificial intelligence fields such as Large Language Models and Generative AI. To maximize GPGPU parallelism and ease programming, specialized programming models and libraries have been developed, such as CUDA [1] and OpenCL [2]. In these programming models, programmers describe the behavior of a single work item (OpenCL terminology) or thread (CUDA terminology) and let the software stack generate instructions that execute on the hardware in parallel.
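As a concrete illustration of this single-thread programming style, the standard CUDA vector-addition kernel below describes the behavior of only one thread; the kernel name and launch parameters here are illustrative textbook conventions, not taken from this paper.

```cuda
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
// The programmer writes only this per-thread body; the hardware
// runs many instances of it in parallel across the launched grid.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n)                                      // guard for a partial last block
        c[i] = a[i] + b[i];
}

// Host-side launch (sketch): 256 threads per block, enough blocks to cover n.
// vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

The compiler and driver stack expand this scalar-looking function into SIMT instructions scheduled across warps, which is exactly the hardware behavior a GPGPU microarchitecture such as Ventus must implement.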
