Skip to Main Content
An important means of validating the design of commercial-grade shared memory multiprocessors is to run a large number of pseudo-random test programs on them. However, when intentional data races are placed in a test program, there may be many correct results according to the memory consistency model supported by the system. For popular memory models like SC and TSO, the problem of verifying correctness of an execution is known to be NP-complete. As a result, analysis techniques implemented in the past have been incomplete: violations of the memory model are flagged if provable, otherwise the result is inconclusive and it is assumed optimistically that the machine's results are correct. In this paper, we describe for the first time a practical, new algorithm which can solve this problem with certainty, thus ensuring that incorrect behavior of a large, complex multiprocessor cannot escape. We present results of our analysis algorithm on test programs run on a newly designed multiprocessor system built by Sun Microsystems. We show that our algorithm performs very well, typically analyzing a program with 512 K memory operations distributed across 60 processors within a few minutes. Our algorithm runs in less than 2.6 times the time taken by an incomplete baseline algorithm which may miss errors. Our approach greatly increases the confidence in the correctness of the results generated by the multiprocessor, and allows us to potentially uncover more bugs in the design than was previously possible.