nDirect2: A High-Performance Library for Direct Convolutions on Multicore CPUs


Abstract:

Convolution kernels are widely used in high-performance computing (HPC) and deep learning (DL) workloads and are often responsible for performance bottlenecks. Prior work has demonstrated that the direct convolution approach can outperform conventional convolution implementations. Although well studied, existing approaches for direct convolution are either incompatible with the mainstream DL data layouts or deliver suboptimal performance. We present nDirect2, a novel direct convolution approach targeting the multicore CPUs commonly found in smartphones and HPC systems. nDirect2 is compatible with the data layout formats used by mainstream DL frameworks and introduces new optimizations for the computational kernel, data packing, advanced operator fusion, and parallelization. We evaluate nDirect2 on representative convolution kernels across four distinct ARM-based CPUs and an x86-based CPU. Experimental results show that nDirect2 outperforms four state-of-the-art convolution approaches in most evaluation cases and hardware architectures.
Published in: IEEE Transactions on Computers ( Volume: 74, Issue: 6, June 2025)
Page(s): 1829 - 1843
Date of Publication: 19 February 2025


I. Introduction

Despite the recent success of Transformer-based neural networks, convolutional neural networks (CNNs) remain widely used for image classification [1], object detection [2], and natural language processing [3]. At the heart of CNNs is the convolution operation (Conv) [4], which often accounts for more than 90% of the execution time in classical CNN models like VGG [5]. As such, considerable effort has been devoted to optimizing convolution implementations to accelerate CNNs [6], [7].
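To make the operation being optimized concrete, the following is a minimal sketch of a direct 2-D convolution in the NHWC layout, written as a plain loop nest rather than via im2col + GEMM. It is an illustrative baseline only, not nDirect2's kernel; the function name, valid-padding choice, and stride parameter are assumptions for the example.

```python
import numpy as np

def direct_conv2d_nhwc(x, w, stride=1):
    """Naive direct convolution over an NHWC input (illustrative sketch).

    x: input of shape  (N, H, W, C_in)
    w: filter of shape (KH, KW, C_in, C_out)
    Returns output of shape (N, H_out, W_out, C_out) with valid padding,
    computed by sliding the filter directly over the input (no im2col buffer).
    """
    n, h, wd, c_in = x.shape
    kh, kw, _, c_out = w.shape
    h_out = (h - kh) // stride + 1
    w_out = (wd - kw) // stride + 1
    y = np.zeros((n, h_out, w_out, c_out), dtype=x.dtype)
    for b in range(n):
        for i in range(h_out):
            for j in range(w_out):
                # Extract the receptive field and contract it against the
                # filter over the (KH, KW, C_in) axes in one step.
                patch = x[b, i*stride:i*stride+kh, j*stride:j*stride+kw, :]
                y[b, i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return y
```

A high-performance direct convolution replaces this scalar loop nest with a blocked, vectorized kernel; the sketch only fixes the semantics and the NHWC layout that such kernels must preserve.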

