Fault tolerance, reliability, and resilience are of paramount importance in Cloud Computing: they ensure continuous operation and correct results even when up to a given maximum number of components are faulty. Most existing research and implementations focus on architecture-specific solutions for introducing fault tolerance. As a result, users must tailor their applications to environment-specific fault-tolerance features. This need makes Cloud environments non-transparent and inflexible, and demands too much effort from developers and users. This paper introduces an innovative perspective on creating and managing fault tolerance that shields users from the implementation details of the reliability techniques by means of a dedicated service layer. This allows users to specify and apply the desired level of fault tolerance without requiring any knowledge of its implementation.
Published in:
Systems Conference (SysCon), 2012 IEEE International
Date of Conference: 19-22 March 2012