Skip to Main Content
Reliability becomes a key issue in computer system design as microprocessors are increasingly susceptible to transient faults. Many previously proposed schemes exploit simultaneous multithreaded (SMT) architectures to achieve transient-fault tolerance by running a program concurrently on two threads, a main thread and a redundant checker thread. Such schemes however often incur high performance overheads due to resource contention and redundancy checking. In this paper, we propose dual-thread execution (DTE) for SMT processors to efficiently achieve transient-fault tolerance. DTE is derived from the recently proposed fault-tolerant dual-core execution (FTDCE) paradigm, in which two processor cores on a single chip perform redundant execution to improve both reliability and performance. In this paper, we apply the same principles as in FTDCE to SMT architectures and explore fetch policies to address the critical resource-sharing issue in SMT architectures. Our experimental results show that DTE achieves an average of 56.1% speedup over the previously proposed simultaneously and redundantly threaded processor with recovery (SRTR). More impressively, even compared to single-thread execution, DTE achieves full-coverage transient-fault tolerance along with an average of 15.5% performance improvement.
Date of Conference: 1-4 Oct. 2007