Distributed systems will increasingly be built on top of wireless networks, such as sensor networks or networks of hand-held devices with advanced sensing and computational abilities. Supporting cooperative processes executed by such unreliable and dynamic system components poses a number of new technical challenges. In terms of recovery, limited resource capabilities have to be considered during re-scheduling of failed process activities. In terms of concurrency, a non-blocking protocol is required to allow a high degree of parallelism. In this paper, we introduce a flexible and resource-oriented failure handling mechanism for cooperative processes in hierarchical and distributed systems. The objective is to ensure both transactional semantics and the selection of suitable nodes with respect to available resource capabilities. Based on a nested execution model, we develop a multi-stage algorithm that uses constraint solving techniques in a parallel fashion, thus achieving a more efficient recovery. We evaluate our proposed techniques in a prototype implementation and demonstrate significant performance gains from parallel re-scheduling.