Skip to Main Content
The increasingly parallel landscape of embedded computing platforms is bringing the reliability concern for the on-chip interconnection network (NoC) to the forefront. While very few works in the open literature bring their error recovery mechanisms down to microarchitectural and physical implementation, this paper documents the effort of optimizing a baseline NoC switch architecture for different fault-tolerant strategies against single-event upsets. As key contributions achieved, we not only come up with a new efficient fault-tolerant flow control protocol, but also we contrast correction vs. retransmission oriented switch microarchitectures, each implementing both data and control path protection, with physical implementation awareness. The accuracy of the analysis methodology enables us to report counterintuitive power-reliability trade-offs between the design points, serving as guidelines for implementing fault-tolerant communication in a power-constrained environment.