This paper presents an approach to conducting experimental studies that characterize and compare the error behavior of different computing systems. The approach is applied to three commercial systems (Linux 2.6 on Pentium 4, Solaris 10 on UltraSPARC IIIi, and AIX 5.3 on POWER 5) under hardware transient faults. The data is obtained by conducting extensive fault injection into kernel code, kernel stack, and system registers with the NFTAPE framework while running the Apache Web server as a workload. The comparison shows that the Linux system has the highest average crash latency, the Solaris system has the highest hang rate, and the AIX system has the lowest error sensitivity and the fewest crashes in the more severe categories.