
Evaluating Machine Learning Workloads on Memory-Centric Computing Systems



Abstract:

Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (CPU, GPU) waste large amounts of energy and execution cycles due to the data movement between memory units and processing units. Memory-centric computing systems, i.e., systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) evaluate and characterize them in terms of accuracy, performance, and scaling, and (3) compare them to their counterpart state-of-the-art implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that PIM greatly accelerates memory-bound ML workloads when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is 27× faster than the CPU implementation on an 8-core Intel Xeon, and 1.34× faster than the GPU implementation on an NVIDIA A100. Our PIM implementation of K-Means clustering is 2.8× and 3.2× faster than CPU and GPU implementations, respectively. We provide several key observations, takeaways, and recommendations for users of ML workloads, programmers of PIM architectures, and hardware designers and architects of future memory-centric computing systems. We open-source all our code and datasets at https://github.com/CMU-SAFARI/pim-ml.
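
For context, a general-purpose PIM system of this kind is typically driven by a host program that partitions data across the PIM cores, launches a kernel on them, and gathers partial results. The following is a minimal, hedged sketch of that offload flow assuming the UPMEM SDK's C host API (dpu_alloc, dpu_load, dpu_copy_to, dpu_launch, dpu_copy_from); the kernel binary "./gd_dpu" and the DPU symbols "train_data" and "partial_grad" are hypothetical placeholders, not taken from the paper's pim-ml code.

```c
/* Hedged sketch (not the paper's code): host-side offload flow on a
 * UPMEM-style general-purpose PIM system. Requires the UPMEM SDK and
 * a DPU kernel binary; the binary name and symbols are placeholders. */
#include <dpu.h>
#include <stdint.h>
#include <stdlib.h>

#define DPU_BINARY "./gd_dpu"   /* hypothetical per-DPU training kernel */

int main(void) {
    struct dpu_set_t set, dpu;
    uint32_t nr_dpus, idx;

    /* Allocate all available DPUs (PIM cores) and load the kernel. */
    DPU_ASSERT(dpu_alloc(DPU_ALLOCATE_ALL, NULL, &set));
    DPU_ASSERT(dpu_load(set, DPU_BINARY, NULL));
    DPU_ASSERT(dpu_get_nr_dpus(set, &nr_dpus));

    /* Broadcast a placeholder training shard to DPU memory. A real
     * implementation partitions the dataset and pushes a different
     * shard to each DPU so training data stays near its PIM core. */
    const size_t shard_elems = 1024;
    int32_t *shard = calloc(shard_elems, sizeof(int32_t));
    DPU_ASSERT(dpu_copy_to(set, "train_data", 0, shard,
                           shard_elems * sizeof(int32_t)));

    /* Run one training iteration on all DPUs in parallel. */
    DPU_ASSERT(dpu_launch(set, DPU_SYNCHRONOUS));

    /* Gather per-DPU partial results (e.g., gradients) for a final
     * reduction on the host. */
    int32_t *partial = calloc(nr_dpus, sizeof(int32_t));
    DPU_FOREACH(set, dpu, idx) {
        DPU_ASSERT(dpu_copy_from(dpu, "partial_grad", 0,
                                 &partial[idx], sizeof(int32_t)));
    }

    DPU_ASSERT(dpu_free(set));
    free(shard);
    free(partial);
    return 0;
}
```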
Date of Conference: 23-25 April 2023
Date Added to IEEE Xplore: 23 June 2023
Conference Location: Raleigh, NC, USA

1 Introduction

Machine learning (ML) algorithms [1]–[6] have become ubiquitous in many fields of science and technology due to their ability to learn from and improve with experience, with minimal human intervention. These algorithms train by updating their model parameters in an iterative manner to improve the overall prediction accuracy. However, training ML algorithms is a computationally intensive process that requires large amounts of training data [7]–[9]. Accessing training data in current processor-centric systems (e.g., CPU, GPU) requires costly data movement between memory and processors, which results in high energy consumption and accounts for a large fraction of the total execution cycles. This data movement can become the bottleneck of the training process if there is not enough computation and locality to amortize its cost [10]–[15]. A sketch of this iterative-update pattern is shown below.
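
To make the iterative-update pattern concrete, the following is a minimal, self-contained sketch (illustrative, not from the paper) of batch gradient descent for linear regression. Each epoch re-reads the full training set to compute the gradients, which is why training becomes memory-bound on processor-centric systems when the dataset does not fit in on-chip caches.

```c
/* Illustrative sketch: batch gradient descent for linear regression.
 * Every epoch streams all n samples from memory to update (w, b),
 * so with little data reuse most cycles go to data movement. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 1000000;      /* number of training samples */
    const int epochs = 100;     /* training iterations         */
    const float lr = 0.5f;      /* learning rate               */

    /* Synthetic dataset following y = 3x + 1. */
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) {
        x[i] = (float)i / n;
        y[i] = 3.0f * x[i] + 1.0f;
    }

    float w = 0.0f, b = 0.0f;
    for (int e = 0; e < epochs; e++) {
        /* Each epoch scans the entire training set. */
        float gw = 0.0f, gb = 0.0f;
        for (int i = 0; i < n; i++) {
            float err = (w * x[i] + b) - y[i];
            gw += err * x[i];
            gb += err;
        }
        /* Iterative parameter update from the averaged gradients. */
        w -= lr * gw / n;
        b -= lr * gb / n;
    }
    printf("learned parameters: w = %f, b = %f\n", w, b);

    free(x);
    free(y);
    return 0;
}
```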

