Abstract:
Despite considerable advancements in specialized hardware, the majority of IoT edge devices still rely on CPUs. The growing number of IoT users amplifies the challenges of performing multiple Deep Neural Network (DNN) inferences on these resource-limited, CPU-only edge devices. Existing strategies, including model compression, hardware acceleration, and model partitioning, often trade off inference accuracy, are unsuitable due to hardware specificity, or lead to inefficient resource utilization. In response to these challenges, this paper introduces L-PIC (Latency Minimized Parallel Inference on CPU), a framework expressly designed to optimize resource allocation, decrease inference latency, and maintain result accuracy on CPU-only edge devices. A series of comprehensive experiments verifies the superior efficiency and effectiveness of the L-PIC framework in comparison with the state-of-the-art method. Remarkably, L-PIC reduces multi-DNN inference latency by an average of approximately 30% over the state-of-the-art method across all tested scenarios.
Date of Conference: 20-23 May 2024
Date Added to IEEE Xplore: 12 August 2024