We develop an object-oriented framework that supports fault tolerance in distributed systems through nested atomic actions. Inherent properties of the object-oriented programming paradigm strengthen the error detection and recovery capabilities of the fault tolerance schemes. In our approach, error detection is performed locally within each object, whereas recovery from errors is carried out either locally within the object or globally across the active objects. For global object restoration, we propose a queue-based backward recovery scheme that greatly reduces performance and storage overhead compared with existing schemes. We illustrate the approach with prototype implementations.
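The abstract does not describe how the recovery queue is organized. The following is only a hypothetical sketch of the general idea under stated assumptions, not the authors' scheme. It assumes that each nested atomic action keeps a queue of saved object states, that an object's state is saved only on its first update within an action (one plausible way to limit storage overhead), that a committing child hands its entries to its parent, and that an abort restores the recorded states newest first. Each object also runs its own acceptance test and undoes a failed operation locally before reporting the error. The names `AtomicAction`, `Account`, `record`, `commit`, `abort`, and `withdraw` are all illustrative.

```python
import copy
from collections import deque


class AtomicAction:
    """Hypothetical nested atomic action that owns a recovery queue."""

    def __init__(self, parent=None):
        self.parent = parent
        self.queue = deque()    # (object, saved_state) in order of first update
        self._recorded = set()  # ids of objects already saved in this action

    def record(self, obj):
        # Assumption: save a state only on the first update within this
        # action, so repeated updates add no further checkpoints.
        if id(obj) not in self._recorded:
            self._recorded.add(id(obj))
            self.queue.append((obj, copy.deepcopy(obj.state)))

    def commit(self):
        # A nested commit passes its entries to the parent, so the parent can
        # still undo them. If the parent already holds an older copy of an
        # object, that copy is kept and the child's entry is dropped.
        if self.parent is not None:
            for obj, saved in self.queue:
                if id(obj) not in self.parent._recorded:
                    self.parent._recorded.add(id(obj))
                    self.parent.queue.append((obj, saved))
        self.queue.clear()
        self._recorded.clear()

    def abort(self):
        # Global backward recovery: restore every recorded object, newest first.
        while self.queue:
            obj, saved = self.queue.pop()
            obj.state = saved
        self._recorded.clear()


class Account:
    """Hypothetical recoverable object with local error detection."""

    def __init__(self, balance):
        self.state = {"balance": balance}

    def _acceptance_test(self):
        # Local error detection: a check on the object's own state.
        return self.state["balance"] >= 0

    def deposit(self, action, amount):
        action.record(self)
        self.state["balance"] += amount

    def withdraw(self, action, amount):
        action.record(self)
        before = copy.deepcopy(self.state)  # local checkpoint for this operation
        self.state["balance"] -= amount
        if not self._acceptance_test():
            self.state = before  # local recovery inside the object
            raise ValueError("acceptance test failed; operation undone locally")


a, b = Account(100), Account(50)
top = AtomicAction()
inner = AtomicAction(parent=top)
a.withdraw(inner, 30)
b.deposit(inner, 30)
inner.commit()             # inner effects now belong to the enclosing action
try:
    b.withdraw(top, 500)   # fails b's acceptance test
except ValueError:
    top.abort()            # global recovery across a and b
print(a.state["balance"], b.state["balance"])  # prints 100 50
```

In this sketch, local recovery only rolls back the single failed operation inside `b`. The enclosing action then decides whether the error warrants global restoration of every object it touched, including those updated by an already committed nested action.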
Published in:
Software Reliability Engineering, 1993. Proceedings., Fourth International Symposium on
Date of Conference: 3-6 Nov 1993