Skip to Main Content
The device-level size and complexity of reconfigurable architectures makes fault tolerance an important concern in system design. In this paper, we introduce a fully automated fault recovery system for networked systems, which contain FPGAs (field programmable gate arrays). If a fault is detected hat cannot be addressed locally, fault information is transferred to a reconfiguration server. Following design recompilation to avoid the fault, a new FPGA configuration is returned to the remote system and computation is reinitiated. To illustrate the benefit of this approach, we have implemented a complete fault recovery system, which requires no manual intervention. An important part of the system is a timing-driven incremental router for Xilinx Virtex devices. This router is directly interfaced to Xilinx JBits and uses no CAD tools from the standard Xilinx Alliance tool flow. Our completed system has been applied to three benchmark designs and exhibits complete fault recovery in up to 12x less time than the standard incremental Xilinx PAR flow.