Trends in VLSI and microarchitecture design have ushered in the multi-core era, where the number of cores on a chip is expected to grow with every processor generation. Soon, each chip will have a large number of tightly integrated processing cores with communication latencies substantially lower than those present in conventional clusters. Clusters made of such microprocessors experience non-uniform latencies between cores: cores on the same chip can communicate faster than cores on different chips, and cores on the same machine can communicate faster than cores on different machines. In this paper, we characterize the performance of parallel discrete event simulation (PDES) models on a cluster of dual quad-core machines using a parameterizable, modified version of Phold, a standard benchmark for parallel simulation. We study various combinations of regional and remote communication patterns to quantify the impact of communication on the overall performance of the simulation. We find that the amount of communication has a determining impact on performance and that it is essential to optimize this communication at each level to take maximum advantage of the multi-core platform. We show that partitioning significantly improves performance. We also explore the impact of load imbalance on application performance and provide critical insight into how to partition for these different environments. We believe that this study represents a significant first step in characterizing the performance space for PDES on this emerging platform.
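To make the idea of a parameterizable communication pattern concrete, the following is a minimal sketch of how a Phold-style benchmark might choose the destination of each event. It is an illustration under stated assumptions, not the modified Phold used in the study: the function name `choose_destination`, the parameters `p_regional` and `p_remote`, the one-logical-process-per-core mapping, and the reading of "regional" as "another core on the same machine" and "remote" as "a core on a different machine" are all hypothetical. Only the eight cores per machine (dual quad-core) comes from the text above.

```python
import random

# Dual quad-core machines: 2 chips x 4 cores (taken from the abstract).
CORES_PER_MACHINE = 8


def machine_of(core):
    """Return the index of the machine that hosts a given core."""
    return core // CORES_PER_MACHINE


def choose_destination(src, num_machines, p_regional, p_remote, rng):
    """Pick the core that receives the next event sent from core `src`.

    Assumed (hypothetical) semantics:
      - with probability p_remote the event goes to a core on another machine,
      - with probability p_regional it goes to another core on the same machine,
      - otherwise it stays on the sending core.
    """
    if p_regional < 0 or p_remote < 0 or p_regional + p_remote > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")

    total_cores = num_machines * CORES_PER_MACHINE
    first_local = machine_of(src) * CORES_PER_MACHINE
    r = rng.random()  # uniform in [0, 1)

    if r < p_remote and num_machines > 1:
        # Remote: any core hosted on a different machine.
        candidates = [c for c in range(total_cores)
                      if machine_of(c) != machine_of(src)]
        return rng.choice(candidates)

    if r < p_remote + p_regional:
        # Regional: any other core on the sender's machine.
        candidates = [c for c in range(first_local, first_local + CORES_PER_MACHINE)
                      if c != src]
        return rng.choice(candidates)

    # Local: the event is rescheduled on the sending core.
    return src
```

Sweeping `p_regional` and `p_remote` over a grid of values would then produce the kind of regional and remote communication mixes the study describes, with each setting shifting traffic between the faster and slower levels of the latency hierarchy.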