Performance trade-offs between fast data access by local data replication and cache capacity maximization by global data sharing have been extensively studied for many-core Chip Multiprocessors (CMPs). Costly simulations over a wide spectrum of the design space are generally required to gain insight for a sound design. To lower the cost, we develop an abstract model for understanding the performance impact of data replication on CMP caches. To overcome the lack of real-time interactions among multiple cores in the model, we further develop an efficient single-pass stack simulation to study the performance of CMP cache organizations with various degrees of data replication. The global stack logically incorporates a shared stack and per-core private stacks; shared/private reuse (stack) distances can be collected in a single-pass simulation. With the reuse distances, one can calculate the performance of CMP cache organizations with various degrees of data replication. We verify both the model and the stack simulation against execution-driven simulations with commercial multithreaded workloads. The results show that the abstract model provides accurate information about performance trade-offs of data replication. The stack simulation accurately predicts the performance of various cache organizations with 2-9 percent error margins using only about 8 percent of the simulation time.
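
As an illustration of the single-pass idea, the following is a minimal sketch, not the authors' implementation. It assumes a trace of hypothetical `(core, block)` pairs, plain LRU stacks, and no coherence invalidations or write handling. The function names (`stack_distance`, `collect_distances`, `misses`) are invented for this example. One pass over the trace records each reference's reuse distance in both a shared stack and the issuing core's private stack; miss counts for any capacity can then be read off the histograms without re-simulating.

```python
from collections import Counter


def stack_distance(stack, block):
    """Return the LRU depth of `block` (1 = most recently used), or None
    on first reference, then move `block` to the top of the stack."""
    try:
        depth = stack.index(block) + 1
        stack.pop(depth - 1)
    except ValueError:
        depth = None  # cold (first-touch) reference
    stack.insert(0, block)
    return depth


def collect_distances(trace, num_cores):
    """Single pass over `trace`, an iterable of (core, block) pairs.
    Returns histograms of shared and private reuse distances.
    Assumption: no invalidations; every reference is treated as a read."""
    shared_stack = []
    private_stacks = [[] for _ in range(num_cores)]
    shared_hist, private_hist = Counter(), Counter()
    for core, block in trace:
        shared_hist[stack_distance(shared_stack, block)] += 1
        private_hist[stack_distance(private_stacks[core], block)] += 1
    return shared_hist, private_hist


def misses(hist, capacity):
    """Misses for a fully associative LRU cache holding `capacity` blocks:
    cold references plus references with distance greater than capacity."""
    return sum(n for d, n in hist.items() if d is None or d > capacity)


if __name__ == "__main__":
    trace = [(0, "A"), (1, "B"), (0, "A"), (1, "A"), (0, "C"), (1, "B")]
    shared_hist, private_hist = collect_distances(trace, num_cores=2)
    for capacity in (1, 2, 3):
        print(capacity, misses(shared_hist, capacity), misses(private_hist, capacity))
```

The list-based stack costs time linear in the stack depth per reference, which keeps the sketch short; a real tool would likely use a tree or hashed structure. Modeling a specific degree of replication would additionally require combining the shared and private distances per reference, which this sketch leaves out.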