
Latency- and Privacy-Aware Convolutional Neural Network Distributed Inference for Reliable Artificial Intelligence Systems


Impact Statement:
Advances in AI are driving the popularization of intelligent services, such as autonomous driving and smart healthcare, which bring great convenience to people's lives. However, ultralow service response time and strong protection of customers' privacy are two main challenges in reliable AI systems. The CNN distributed inference method proposed in this article addresses both issues. It provides a novel queue mechanism that minimizes the response time of the most urgent services while reducing the waiting time of lower-priority ones. Additionally, the proposed method efficiently protects customers' privacy while keeping the accuracy loss below 10%, outperforming traditional differential privacy, whose accuracy loss can reach 20%. The proposed method can greatly improve the quality of intelligent services (QoSs) while protecting customers' privacy in a reliable AI system.
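The queue mechanism is described only at a high level here. As a rough illustration of the idea of serving urgent requests first while bounding the wait of lower-priority ones, the Python sketch below uses a simple aging rule; the class, its parameters, and the aging heuristic are our own assumptions for illustration, not the scheduling policy from the article.

```python
import time

# Illustrative sketch (not the article's algorithm) of a latency-aware queue:
# urgent requests (lower priority value) are served first, but an aging term
# lowers the effective priority of waiting requests so low-priority services
# are not starved indefinitely.
class AgingQueue:
    def __init__(self, aging_rate=0.5):
        self.aging_rate = aging_rate  # hypothetical: priority credit per second waited
        self._pending = []            # entries of (base_priority, enqueue_time, request)

    def submit(self, request, priority):
        self._pending.append((priority, time.monotonic(), request))

    def _effective_priority(self, entry):
        base, enqueued, _ = entry
        waited = time.monotonic() - enqueued
        return base - self.aging_rate * waited  # lower value = served sooner

    def next_request(self):
        entry = min(self._pending, key=self._effective_priority)
        self._pending.remove(entry)
        return entry[2]

q = AgingQueue()
q.submit("autonomous-driving frame", priority=0)  # most urgent
q.submit("photo-tagging batch", priority=2)       # delay-tolerant
print(q.next_request())  # -> "autonomous-driving frame"
```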

Abstract:

Reliable artificial intelligence (AI) systems not only pose the challenge of providing high-quality intelligent services to customers but also require customers' privacy to be protected as much as possible while the services are delivered. Given the ultrahigh computing load that deep-learning-based intelligent services impose on edge devices and the ultralong distance between edge and cloud, the low-latency requirement of intelligent services is hard to meet with edge computing or cloud computing alone. Edge–cloud collaborative inference of deep neural networks (DNNs) is considered a feasible solution to this problem. However, prior work has neither reduced the inference latency to the greatest extent nor considered privacy protection in distributed systems. To address this, we first establish a novel queue mechanism. Then, convolution layer split decisions are made based on deep reinforcement learning (DRL) to realize the parallel inference of convolutional neural networks...
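The abstract refers to splitting convolution layers so that partial inferences can run in parallel across devices. The PyTorch snippet below is a minimal sketch of what such a split involves, assuming a 3×3, stride-1 convolution partitioned along the output height; the tensor shapes and the halo arithmetic are illustrative assumptions, not the paper's DRL-driven split policy.

```python
import torch
import torch.nn.functional as F

# Sketch: split a 3x3, stride-1 convolution along the output height so two
# devices could compute disjoint halves in parallel. Each half needs an extra
# row of input (a "halo") because of the kernel's receptive field.
torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)              # input feature map
w = torch.randn(8, 3, 3, 3)                # conv weights (out=8, in=3, 3x3)

full = F.conv2d(x, w, padding=1)           # reference: whole-layer inference

xp = F.pad(x, (1, 1, 1, 1))                # pad once; slices then use padding=0
h_mid = full.shape[2] // 2                 # split point in output rows

# Output rows [0, h_mid) depend on padded-input rows [0, h_mid + 2)
top = F.conv2d(xp[:, :, : h_mid + 2, :], w)
# Output rows [h_mid, H) depend on padded-input rows [h_mid, H + 2)
bottom = F.conv2d(xp[:, :, h_mid:, :], w)

merged = torch.cat([top, bottom], dim=2)   # stitch the partial outputs
print(torch.allclose(full, merged, atol=1e-5))  # -> True
```

The one-row halo exists because each output row depends on three input rows; any split policy, learned or otherwise, has to account for this overlap when assigning partitions to devices.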
Published in: IEEE Transactions on Artificial Intelligence (Volume: 6, Issue: 2, February 2025)
Page(s): 365 - 377
Date of Publication: 19 February 2024
Electronic ISSN: 2691-4581

I. Introduction

Nowadays, intelligent services based on deep learning (DL) are becoming increasingly popular with the development of artificial intelligence (AI) and communication technology. As one of the main classes of deep neural networks (DNNs), convolutional neural networks (CNNs) are widely used in vision applications such as object detection, object classification, and virtual reality [1], [2], [3]. Considering the limited computing resources, storage space, and power of end devices and edge servers [4], it is difficult to deploy a whole CNN model, with its large parameter count and computational cost, and run inference on it efficiently on a single edge device [5], [6]. One traditional solution to this problem is to offload the CNN inference task to a cloud server. With its powerful computing resources, the cloud server is able to infer a large CNN model with low latency and improve the quality of intelligent services (QoSs).
