Redwood: Flexible and Portable Heterogeneous Tree Traversal Workloads

Notes: Acknowledgement: "This research was supported in part by the DARPA SDH Program under agreement No. FA8650-18-2-7862 and the U.S. Government. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government."

Abstract:

Shared memory heterogeneous systems are now mainstream, with nearly every mobile phone and tablet containing integrated processing units. However, developing applications for such devices is difficult as workloads must be decomposed across different processing units, and the decomposition must be flexible to account for the growing diversity of devices, each with different relative processing unit throughput. Furthermore, many devices require distinct programming front ends, requiring significant effort to write cross-platform applications. In this work, we identify a pragmatic class of applications, which we call traverse-compute applications, that are ideal for shared memory heterogeneous systems. These applications have a flexible heterogeneous decomposition where CPUs excel at traversing a tree structure, while accelerators excel at node computations. Leveraging this insight, we present Redwood: a framework for writing heterogeneous traverse-compute workloads. Redwood provides a simple processing unit abstraction and a tree traversal library that enables heterogeneous optimizations. Using Redwood, we implement Grove, a benchmark suite containing nine pragmatic tree traversal applications, e.g., k-nearest neighbors. We instantiate Redwood for three different heterogeneous programming platforms: CUDA, SYCL, and High-Level Synthesis; we use Grove to evaluate five shared memory heterogeneous systems. Our evaluation highlights the importance of flexible heterogeneous decomposition as the optimal parameters differ widely across platforms and applications. However, once optimally configured, heterogeneous implementations can provide up to 13.53× speedups (geomean of 3.01×) over homogeneous implementations, showcasing the potential of heterogeneous computing for these workloads.
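
The traverse-compute split described in the abstract can be illustrated with a small, single-threaded sketch. This is not Redwood's API: every name here (`Node`, `build`, `traverse`, `compute_batch`, `nearest_within`, `batch_size`) is hypothetical, the tree is a simple one-dimensional median-split tree, and the "accelerator" is an ordinary Python function standing in for a batched kernel. The traversal stage prunes subtrees and collects leaves; the compute stage reduces a batch of leaves to the smallest distance.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    lo: float                              # smallest point under this node
    hi: float                              # largest point under this node
    points: Optional[List[float]] = None   # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[float], leaf_size: int = 4) -> Node:
    """Build a median-split tree over a non-empty list of 1-D points."""
    pts = sorted(points)
    if len(pts) <= leaf_size:
        return Node(pts[0], pts[-1], points=pts)
    mid = len(pts) // 2
    return Node(pts[0], pts[-1],
                left=build(pts[:mid], leaf_size),
                right=build(pts[mid:], leaf_size))


def traverse(root: Node, q: float, r: float) -> Iterator[Node]:
    """Traversal stage (the CPU role): yield leaves that may hold a point within r of q."""
    stack = [root]
    while stack:
        node = stack.pop()
        if q < node.lo - r or q > node.hi + r:
            continue                       # prune: the whole subtree is too far away
        if node.points is not None:
            yield node
        else:
            stack.append(node.right)
            stack.append(node.left)


def compute_batch(batch: List[Node], q: float) -> float:
    """Compute stage (stand-in for an accelerator kernel): min distance over a batch of leaves."""
    return min(abs(p - q) for leaf in batch for p in leaf.points)


def nearest_within(root: Node, q: float, r: float, batch_size: int = 2) -> Optional[float]:
    """Distance to the nearest point within r of q, or None if there is none."""
    best: Optional[float] = None
    batch: List[Node] = []
    for leaf in traverse(root, q, r):
        batch.append(leaf)
        if len(batch) == batch_size:       # hand a full batch to the compute stage
            d = compute_batch(batch, q)
            best = d if best is None else min(best, d)
            batch = []
    if batch:                              # flush the final partial batch
        d = compute_batch(batch, q)
        best = d if best is None else min(best, d)
    return best if best is not None and best <= r else None
```

In this sketch `batch_size` plays the role of a tunable decomposition parameter: it changes how much work is handed to the compute stage at a time without changing the result, which is the kind of knob the abstract says must be set differently per platform and application.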

Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Date of Conference: 23-25 April 2023

Date Added to IEEE Xplore: 23 June 2023

DOI: 10.1109/ISPASS57527.2023.00028

Conference Location: Raleigh, NC, USA

I. Introduction

As Moore’s Law and Dennard scaling come to an end, the demand for ever-increasing performance and energy efficiency has driven the development of Shared-Memory Heterogeneous Systems (SMHSs), particularly in mobile Systems-on-Chip (SoCs); for example, accelerators occupy over 80% of the die area of an Apple A12 SoC [45]. SMHSs incorporate diverse specialized processing units (PUs), including traditional CPUs and Programmable Accelerating PUs (PAPUs), such as integrated GPUs and embedded FPGAs, all interconnected through a shared-memory hierarchy on the same chip. In contrast to conventional accelerator-oriented heterogeneous systems (e.g., [23], [41]), the SMHS architecture enables efficient communication and data sharing between PUs; discrete heterogeneous systems, by comparison, typically transfer data over PCIe, as studied in [12], [19], [33].
