Abstract:
Compute Unified Device Architecture (CUDA) was developed as a GPU parallel programming platform and API, primarily designed for use with C/C++. Over the years, fundamental linear algebra functionality on CUDA has reached a mature state, and much of it is now accessible through CUDA's GitHub repository. As other high-level programming languages began incorporating CUDA-compatible methods into their libraries, the Julia Programming Language introduced CUDA support in 2021, aiming to offer an abstraction level similar to that of C implementations. However, research has shown that Julia's linear algebra computations, despite leveraging CUDA for parallelization and computational reduction, have yet to match the execution speed achieved by C implementations. This study uses matrix multiplication as a representative linear algebra computation, given its well-optimized CUDA kernel. The study's outputs include an Nsight report file and an SQLite database, which are analysed with NVIDIA Nsight Systems to assess each kernel's runtime and memory usage for performance evaluation. Findings indicate that Julia's CUDA kernel invocation carries a high runtime overhead, growing at a rate of O(n²), which becomes a bottleneck when performing high-throughput computations on square binary matrices. This paper suggests that resolving this issue may involve developing a custom CUDA kernel in Julia that employs a more efficient reduction technique to reduce overhead and enhance performance.
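The custom Julia CUDA kernel the abstract proposes is not shown on this page; as a point of reference, the sketch below illustrates what writing and launching such a kernel looks like with the CUDA.jl package, using a naive one-thread-per-output-element matrix multiply. The kernel name, matrix size, and launch configuration are illustrative assumptions, not the authors' code.

```julia
using CUDA

# Naive custom matrix-multiply kernel: each thread computes one element of C.
# CUDA.jl thread/block indices are 1-based, hence the (blockIdx - 1) offset.
function matmul_kernel!(C, A, B)
    row = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    col = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    if row <= size(C, 1) && col <= size(C, 2)
        acc = zero(eltype(C))
        for k in 1:size(A, 2)
            acc += A[row, k] * B[k, col]
        end
        C[row, col] = acc
    end
    return nothing
end

n = 1024                          # illustrative matrix size
A = CUDA.rand(Float32, n, n)
B = CUDA.rand(Float32, n, n)
C = CUDA.zeros(Float32, n, n)

threads = (16, 16)                # 256 threads per block
blocks  = (cld(n, 16), cld(n, 16))
@cuda threads=threads blocks=blocks matmul_kernel!(C, A, B)
```

A kernel launched this way bypasses the library-level dispatch path whose per-invocation overhead the paper measures, which is the motivation for the custom-kernel approach; the abstract's suggested improvement, a more efficient reduction over the inner loop, would replace the serial accumulation shown here.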
Published in: 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Date of Conference: 16-19 December 2024
Date Added to IEEE Xplore: 03 January 2025