Skip to Main Content
We now have 20 years of data under our belt about the performance of supercomputers against at least a single floating-point benchmark from dense linear algebra. Until about 2004, a single model of parallel programming, bulk synchronous using the MPI model, was sufficient to permit translation into reasonable parallel programs for more complex applications. Starting in 2004, however, a confluence of events changed forever the architectural landscape that underpinned MPI. The first half of this article goes into the underlying reasons for these changes, and what they mean for system architectures. The second half then addresses the view going forward in terms of our standard scaling models and their profound implications for future programming and algorithm design.