Skip to Main Content
Despite the success of the Message Passing Interface (MPI), many MPI libraries have suffered from software bugs. These bugs severely impact the productivity of a large number of users, causing program failures or other errors. As a result, MPI application developers often have to spend days or weeks in vain debugging their own code. To address this daunting problem, this paper presents a new method called FlowChecker, which detects communication related bugs in MPI libraries. First, FlowChecker extracts program intentions of message passing (MP-intentions), which specify messages to be delivered from the sources to the destinations. Then FlowChecker tracks the message flows that actually occur in the underlying MPI libraries. Finally, FlowChecker checks whether the messages are correctly delivered from the sources to the destinations by comparing the message flows against the MP-intentions. If a mismatch is found, FlowChecker reports a bug and provides diagnostic information to help MPI library developers to understand and fix it. We have built a FlowChecker prototype on Linux and evaluated it with five real-world and two injected bug cases in three widely used MPI libraries, including Open MPI, MPICH2, and MVAPICH2. Our experimental results show that FlowChecker effectively detects all seven evaluated bug cases. Additionally, it provides useful diagnostic information for narrowing down or even pinpointing root causes of the bugs. Moreover, our experiments with High Performance Linpack and NAS Parallel Benchmarks show that FlowChecker induces low runtime overhead (0.9-5.6 percent on Open MPI, 0.9-8.1 percent on MPICH2, and 1.6-9.7 percent on MVAPICH2).