Skip to Main Content
Efficient collective output of intermediate results to secondary storage becomes more and more important for scientific simulations as the gap between processing power/interconnection bandwidth and the I/O system bandwidth enlarges. Dedicated servers can offload I/O from compute processors and shorten the execution time, but it is not always possible or easy for an application to use them. We propose the use of active buffering with threads (ABT) for overlapping I/O with computation efficiently and flexibly without dedicated I/O servers. We show that the implementation of ABT in ROMIO, a popular implementation of MPI-IO, greatly reduces the application-visible cost of ROMIO's collective write calls, and improves an application's overall performance by hiding I/O cost and saving implicit synchronization overhead from collective write operations. Further, ABT is high-level, platform-independent, and transparent to users, giving users the benefit of overlapping I/O with other processing tasks even when the file system or parallel I/O library does not support asynchronous I/O.