I. Introduction
Edge computing processes data close to the source where it is produced in order to improve service performance. This paradigm is closely integrated with the sensors and actuators of the Internet of Things (IoT) framework [1]. As edge computing becomes ubiquitous, it is essential to ensure that the edge nodes themselves do not become a point of failure for the applications they run, and that robust countermeasures are in place to handle network or node overloads and failures. The low-latency task execution demanded by modern applications, combined with the limited resources of edge devices, further exacerbates the problem [2]. Growing volumes of data that require immediate processing push these limited compute resources to their limits, increasing the likelihood of resource contention and node downtime [3], [4]. The resulting resource unavailability causes Service Level Objective (SLO) violations, which can lead to significant financial losses [5]. Thus, it is critical to develop a fault-tolerance mechanism for edge computing that maintains low latency and high reliability.