With the introduction of multi-core processors, thread affinity has quickly emerged as one of the most important factors in reducing program execution time. This article presents a complete experimental study of the performance of various thread pinning strategies. We investigate four application-independent thread pinning strategies and five application-sensitive ones based on cache sharing. We carried out an extensive performance evaluation on three different multi-core machines representing three typical usage scenarios: a workstation, a server, and a high-performance machine. Overall, we show that fixing thread affinities (whatever the tested strategy) is a better choice for improving program performance on HPC ccNUMA machines than OS-based thread placement. This means that the current Linux OS scheduling strategy is not necessarily the best choice in terms of performance on ccNUMA machines, even if it is a good choice in terms of core utilisation and load balancing. On smaller Core2 and Nehalem machines, we show that the speedup gained from thread pinning over OS-based scheduling is not satisfactory, but performance stability is much better.
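
For readers unfamiliar with the mechanism, the following is a minimal sketch of what fixing a thread's affinity means on Linux, as opposed to leaving placement to the OS scheduler. It is not one of the strategies evaluated in the study: the round-robin placement and the helper names `round_robin_placement`, `pinned_worker` and `run_pinned` are assumptions made for illustration only. The sketch relies on `os.sched_setaffinity`, which is available on Linux; on other platforms, or if the call is refused, it records `None` instead of pinning.

```python
import os
import threading


def round_robin_placement(num_threads, cpus):
    """Hypothetical placement: map thread index -> CPU id by cycling over the CPUs."""
    ordered = sorted(cpus)
    return {t: ordered[t % len(ordered)] for t in range(num_threads)}


def pinned_worker(cpu, results, index):
    """Pin the calling thread to a single CPU and record the resulting mask."""
    if not hasattr(os, "sched_setaffinity"):
        results[index] = None  # affinity API not available on this platform
        return
    try:
        # On Linux, pid 0 designates the calling thread, so each worker pins itself.
        os.sched_setaffinity(0, {cpu})
        results[index] = os.sched_getaffinity(0)
    except OSError:
        results[index] = None  # pinning refused by the environment


def run_pinned(num_threads):
    """Start num_threads workers, each pinned according to the placement."""
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)  # CPUs this process may run on
    else:
        allowed = {0}  # placeholder when the affinity API is missing
    placement = round_robin_placement(num_threads, allowed)
    results = {}
    threads = [
        threading.Thread(target=pinned_worker, args=(placement[i], results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return placement, results


if __name__ == "__main__":
    placement, results = run_pinned(4)
    for i in sorted(placement):
        print(f"thread {i}: requested CPU {placement[i]}, mask {results[i]}")
```

A cache-sharing-aware strategy would replace `round_robin_placement` with a mapping informed by the machine topology; that part is left out here because it depends on details the abstract does not give.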