I. Introduction
Non-Uniform Memory Access (NUMA) architectures now dominate shared-memory parallel computing platforms. In these systems, a processing core can access nearby memory faster than remote memory. To achieve maximum aggregate memory bandwidth, all processes must simultaneously access data held in the memory closest to them. Consequently, obtaining good performance from multithreaded codes in NUMA environments requires applications to minimise accesses to remote data structures.