This paper makes the following contributions. It proposes a new methodology for quantifying remote memory access contention on hardware DSM multiprocessors. The most valuable aspect of this methodology is that it assesses the impact of contention on real parallel programs running on real hardware. The methodology takes as input the number of accesses from each DSM node to each page in memory: a trace of the program's memory accesses, obtained at runtime from hardware counters, is used to compute an accurate estimate of the fraction of execution time lost to contention. The paper also presents a new algorithm that detects potential hot spots in pages and resolves contention on them using dynamic page migration. The algorithm balances remote memory accesses across the nodes of the system while trying to improve memory access locality. Experiments with five parallel codes with irregular memory access patterns on a 128-processor Origin2000 show that the algorithm reduces execution time by an average of 27.7%.
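To make the balancing idea concrete, here is a minimal Python sketch built on assumptions that are ours, not the paper's. The input is a per-page, per-node access-count matrix of the kind the methodology uses. A node's "remote load" is taken to be the number of accesses it serves to its own pages from other nodes. A node counts as a hot spot when its load exceeds `threshold` times the mean. Migration greedily moves one page per step from the hottest node to the node that accesses that page most. The names `remote_load` and `migrate_hot_pages` and the `threshold` and `max_moves` parameters are hypothetical. The sketch is not the paper's algorithm and ignores both migration cost and the contention model itself.

```python
from typing import List, Tuple


def remote_load(counts: List[List[int]], home: List[int], n_nodes: int) -> List[int]:
    """Remote accesses each node serves: accesses to pages it homes, issued by other nodes.

    counts[page][node] = number of accesses from `node` to `page` (assumed input format).
    home[page] = node whose memory currently holds `page`.
    """
    load = [0] * n_nodes
    for page, per_node in enumerate(counts):
        h = home[page]
        load[h] += sum(c for node, c in enumerate(per_node) if node != h)
    return load


def migrate_hot_pages(
    counts: List[List[int]],
    home: List[int],
    threshold: float = 1.5,
    max_moves: int = 100,
) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Hypothetical greedy balancer: repeatedly move one page off the most-loaded node.

    Returns the new page placement and a list of (page, from_node, to_node) moves.
    The input placement is not modified.
    """
    n_nodes = len(counts[0])
    home = list(home)
    moves: List[Tuple[int, int, int]] = []
    for _ in range(max_moves):
        load = remote_load(counts, home, n_nodes)
        avg = sum(load) / n_nodes
        hot = max(range(n_nodes), key=lambda n: load[n])
        if avg == 0 or load[hot] <= threshold * avg:
            break  # no node is a hot spot under this threshold
        best = None  # (resulting peak load, page, destination)
        for page, h in enumerate(home):
            if h != hot:
                continue
            # Locality-first choice: send the page to the node that accesses it most.
            dest = max(range(n_nodes), key=lambda n: counts[page][n])
            if dest == hot:
                continue
            trial = home.copy()
            trial[page] = dest
            peak = max(remote_load(counts, trial, n_nodes))
            if best is None or peak < best[0]:
                best = (peak, page, dest)
        if best is None or best[0] >= load[hot]:
            break  # no single migration lowers the peak remote load
        _, page, dest = best
        home[page] = dest
        moves.append((page, hot, dest))
    return home, moves
```

For example, with `counts = [[0, 10, 0], [0, 0, 8], [2, 0, 0]]` and all three pages initially placed as `[0, 0, 1]`, node 0 serves 18 remote accesses and is the hot spot; the sketch moves each page to its heaviest accessor, ending with placement `[1, 2, 0]` and no remote load. Choosing the destination by access count is what ties balancing to locality here: a page is only ever moved toward the node that uses it most.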