Abstract:
The Data Processing Unit (DPU), i.e., a programmable SmartNIC with System-on-Chip (SoC) cores, has emerged as a valuable supplementary resource to the host CPU. The DPU architecture has been attracting significant attention within High-Performance Computing (HPC) and data center clusters due to its advanced capabilities and accelerators, which include a hardware-based data compression engine. This positions the DPU as a prospective tool for accelerating and offloading compression workloads from the host, which can potentially speed up data-intensive applications. The convergence of Big Data, HPC, and Machine Learning (ML) systems has rendered large data volumes a major performance bottleneck in message communication and data storage. While compression can boost performance, recent studies reveal that compression techniques (both lossy and lossless) are compute-intensive and time-consuming, particularly at larger data sizes. Consequently, this paper characterizes the performance of three compression algorithms, one lossy (SZ3) and two lossless (DEFLATE and zlib), with seven real-world data sets on NVIDIA's popular BlueField DPUs to explore potential opportunities for offloading these workloads from the host. We find that, compared to the DPU's SoC cores, the DPU's hardware compression engine can achieve up to a 26.8x performance speedup. Furthermore, we discuss the challenges and opportunities associated with employing NVIDIA's BlueField DPUs to accelerate lossy and lossless compression/decompression workloads. Our research discloses five important takeaways that shed light on future research directions for lossy and lossless compression on DPUs.
Date of Conference: 23-25 August 2023
Date Added to IEEE Xplore: 20 October 2023
ISBN Information: