The interconnection network links the processing units of modern high-performance computing systems and carries the communication between them. In this context, network faults have an extremely high impact, since most routing algorithms were not designed to tolerate them. As a result, a single fault may stall messages in the network, preventing applications from completing, or may lead to deadlocked configurations. In this paper we introduce a scalable deadlock avoidance technique specifically designed for large interconnection networks suffering from a large number of dynamic faults. Our method is based on adding one-slot deadlock avoidance buffers and does not require any virtual channels. Additionally, fully adaptive routing algorithms may be designed on the basis of our proposal.
Date of Conference: 20-24 Sept. 2010