nDirect2: A High-Performance Library for Direct Convolutions on Multi-Core CPUs

Abstract:

Convolution kernels are common in high-performance computing (HPC) and deep learning (DL) workloads and are often responsible for performance bottlenecks. Prior work has demonstrated that the direct convolution approach can outperform the conventional convolution implementation. Although well studied, existing approaches for direct convolution are either incompatible with mainstream DL data layouts or lead to suboptimal performance. We design nDirect2, a novel direct convolution approach that targets multi-core CPUs commonly found in smartphones and HPC systems. nDirect2 is compatible with the data layout formats used by mainstream DL frameworks and offers new optimizations for the computational kernel, data packing, advanced operator fusion, and parallelization. We evaluate nDirect2 on representative convolution kernels across four distinct ARM-based CPUs and an x86-based CPU. Experimental results show that nDirect2 outperforms four state-of-the-art convolution approaches across most evaluation cases and hardware architectures.
Published in: IEEE Transactions on Computers (Early Access)
Page(s): 1-14
Date of Publication: 19 February 2025
