Abstract:
A common approach for improving performance uses FPGAs to accelerate critical code regions, which often involves two processes: hardware/software partitioning, which identifies regions to offload to the FPGA; and optimizing those regions (e.g., through HLS directives). As both processes are separate and usually applied in sequence, the interplay between them is unnatural, and it is unclear how the choices made in one step can benefit the choices made in the other. This paper presents our work in progress on combining partitioning and optimization into a single holistic process. First, our source-to-source compiler builds a task-based representation from the input application. Then, a greedy algorithm builds clusters of tasks and assigns each cluster to either hardware (FPGA) or software (CPU). The algorithm iteratively refines the clusters and offloading decisions by (a) minimizing the communication costs between clusters by assigning tasks that work with shared data to the same cluster, and (b) reducing the global execution time by applying code optimizations to the tasks in each cluster. We show the impact of our holistic approach on a motivating edge detection example and compare the results against applying partitioning and code optimizations as independent steps. The results show that holistic partitioning can lead to a speedup of up to 28.7× when compared to a simple offloading of the application to an FPGA.
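The clustering and assignment loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm or compiler: the `Task` type, the cost model (sequential compute time plus a transfer cost for every edge that crosses the CPU/FPGA boundary), the clustering threshold, and all numbers are assumptions made up for illustration. Code optimizations are not modeled separately; `fpga_time` is simply assumed to be the estimate after any optimizations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float   # hypothetical estimated time on the CPU
    fpga_time: float  # hypothetical estimated time on the FPGA


def build_clusters(tasks, edges, threshold):
    """Group tasks linked by edges whose transfer cost is >= threshold."""
    parent = {t.name: t.name for t in tasks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), cost in edges.items():
        if cost >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in tasks:
        clusters.setdefault(find(t.name), set()).add(t.name)
    return list(clusters.values())


def total_time(tasks, edges, on_fpga):
    """Assumed cost model: compute time plus cost of boundary-crossing edges."""
    compute = sum(t.fpga_time if t.name in on_fpga else t.cpu_time for t in tasks)
    transfer = sum(cost for (a, b), cost in edges.items()
                   if (a in on_fpga) != (b in on_fpga))
    return compute + transfer


def greedy_partition(tasks, edges, threshold):
    """Flip whole clusters between CPU and FPGA while the total time improves."""
    clusters = build_clusters(tasks, edges, threshold)
    on_fpga = set()  # start with everything in software
    best = total_time(tasks, edges, on_fpga)
    improved = True
    while improved:
        improved = False
        for cluster in clusters:
            candidate = on_fpga ^ cluster  # move the whole cluster to the other side
            cost = total_time(tasks, edges, candidate)
            if cost < best:
                on_fpga, best = candidate, cost
                improved = True
    return on_fpga, best


# Hypothetical edge detection pipeline; all timings and costs are invented.
tasks = [
    Task("load", 1.0, 5.0),
    Task("gray", 4.0, 1.0),
    Task("blur", 10.0, 2.0),
    Task("sobel", 12.0, 2.0),
    Task("store", 1.0, 5.0),
]
edges = {
    ("load", "gray"): 3.0,
    ("gray", "blur"): 6.0,
    ("blur", "sobel"): 6.0,
    ("sobel", "store"): 2.0,
}

all_cpu = total_time(tasks, edges, set())
on_fpga, best = greedy_partition(tasks, edges, threshold=5.0)
print(sorted(on_fpga), best, all_cpu)
```

In this toy setup, the three tasks that exchange the most data end up in one cluster and are offloaded together, so the expensive edges between them never cross the CPU/FPGA boundary.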
Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
Date of Conference: 21-25 October 2023
Date Added to IEEE Xplore: 27 December 2023