TLP Balancer: Predictive Thread Allocation for Multi-Tenant Inference in Embedded GPUs


Abstract:

This paper introduces a novel software technique to optimize thread allocation for merged and fused kernels in multi-tenant inference systems on embedded Graphics Processing Units (GPUs). Embedded systems equipped with GPUs face challenges in managing diverse deep learning workloads while adhering to Quality-of-Service (QoS) standards, primarily due to limited hardware resources and the varied nature of deep learning models. Prior work has relied on static thread allocation strategies, often leading to suboptimal hardware utilization. To address these challenges, we propose a new software technique called TLP Balancer. TLP Balancer automatically identifies the best-performing number of threads based on performance modeling. This approach significantly enhances hardware utilization and ensures QoS compliance, outperforming traditional fixed-thread allocation methods. Our evaluation shows that TLP Balancer improves throughput by 40% compared to the state-of-the-art automated kernel merge and fusion techniques.
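The core idea — choosing the best-performing thread count from a performance model subject to QoS constraints, rather than using a fixed allocation — can be illustrated with a minimal sketch. The model functions below (`modeled_throughput`, `modeled_latency`) are hypothetical toy stand-ins, not the paper's actual performance model:

```python
# Illustrative sketch only: select the thread count that maximizes modeled
# throughput while still meeting a QoS latency budget. The two model
# functions are toy assumptions, not TLP Balancer's real model.

def modeled_throughput(threads, peak=2048):
    # Toy model: throughput grows with thread-level parallelism,
    # then saturates as hardware resources are exhausted.
    return min(threads, peak) / (1.0 + threads / (4.0 * peak))

def modeled_latency(threads, work=1e6):
    # Toy model: more threads per merged/fused kernel shortens
    # per-request latency.
    return work / max(threads, 1)

def best_thread_count(candidates, qos_latency_budget):
    # Keep only allocations that satisfy the QoS latency constraint,
    # then pick the one with the highest modeled throughput.
    feasible = [t for t in candidates if modeled_latency(t) <= qos_latency_budget]
    if not feasible:
        return None
    return max(feasible, key=modeled_throughput)

candidates = [128, 256, 512, 1024, 2048]
print(best_thread_count(candidates, qos_latency_budget=2000))
```

A fixed-thread baseline would skip the search and always return the same candidate; the gain reported in the paper comes from performing this selection automatically per workload.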
Published in: IEEE Embedded Systems Letters ( Early Access )
Page(s): 1 - 1
Date of Publication: 14 November 2024
