Skip to Main Content
Fault tolerance is an important issue in Grid computing, where many and heterogenous machines are used. In this paper we present a flexible failure handling framework which extends a service-oriented architecture for Distributed Data Mining previously proposed, addressing the requirements for fault tolerance in the Grid. The framework allows users to achieve failure recovery whenever a crash can occur on a Grid node involved in the computation. The implemented framework has been evaluated on a real Grid setting to assess its effectiveness and performance.