In this work, we extend and evaluate a simple performance model to account for NUMA and bandwidth effects in single- and multi-threaded calculations within the Gaussian 03 computational chemistry code on a contemporary multi-core NUMA platform. Using the thread and memory placement APIs in Solaris, we present results for a set of calculations, from which we analyze on-chip interconnect and intra-core bandwidth contention and show the importance of load balancing between threads. The extended model predicts single-threaded performance to within 1% error and most multi-threaded experiments to within 15% error. Our results and modeling show that accounting for bandwidth constraints within user-space code is beneficial.