PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing | IEEE Conference Publication | IEEE Xplore

Abstract:

Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly volatile workloads or to accurately detect and diagnose faults for optimal remediation. There is thus a need for a robust and proactive fault-tolerance mechanism to meet service level objectives. In this work, we propose PreGAN, a composite AI model that uses a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault tolerance in containerized edge deployments. PreGAN uses co-simulations in tandem with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry Pi-based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault detection, diagnosis, and classification, thus achieving high quality of service. Compared to the best method among the considered baselines, PreGAN achieves 5.1% more accurate fault detection, higher diagnosis scores, and 23.8% lower overheads.
Date of Conference: 02-05 May 2022
Date Added to IEEE Xplore: 20 June 2022
Conference Location: London, United Kingdom

I. Introduction

Edge computing processes data close to the source where it is produced in order to optimize service performance. This paradigm is closely integrated with the sensors and actuators in the Internet of Things (IoT) framework [1]. With edge computing becoming ubiquitous, it is essential to ensure that the edge nodes themselves do not become a point of failure for the running applications, and that robust countermeasures are in place to handle network or node overloads and failures. The low-latency task execution demanded by modern applications and the resource constraints of edge devices further exacerbate the problem [2]. The increasing volume of data requiring immediate processing, combined with these resource constraints, pushes compute resources to their limits, giving rise to a high chance of resource contention and node downtime [3], [4]. This leads to resource unavailability and Service Level Objective (SLO) violations, which can cause significant financial losses [5]. Thus, it is critical to develop a fault-tolerance mechanism for edge computing to maintain low latency and high reliability.
