The choice of a suitable memory allocation strategy greatly affects the performance of data-intensive applications on large shared-memory systems (SMPs). Standard memory allocators often perform poorly because they do not properly reflect the different memory access latencies in deep NUMA architectures with their on-chip, off-chip, and off-blade communication. We analyze memory allocation strategies for data-intensive MapReduce applications on a large SMP with 512 cores and 2 TB of main memory. We compare the efficiency of the MapReduce frameworks MR-Search and Phoenix++ and provide performance results for two benchmark applications, k-means and shortest-path search. Even on smaller SMPs with 128 cores, a 6-fold speedup can be achieved by replacing the standard glibc allocator with a better-adapted memory allocation strategy, and the gains become more pronounced on larger SMPs. We identify two types of overhead: (1) the cost of executing the allocation requests and (2) poor memory locality caused by an inefficient mapping to the underlying memory topology. We give detailed results on NUMA traffic and show how these costs increase on large SMPs with many cores and a deep NUMA hierarchy.
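The abstract names neither the replacement allocator nor the substitution mechanism, so the following is only a minimal illustrative sketch of the two levers it alludes to: swapping the whole allocator at load time (commonly done via LD_PRELOAD, without recompiling) and placing data explicitly on the NUMA node local to the executing core using libnuma. The specific allocator (jemalloc) and buffer size below are assumptions for illustration, not details from the paper.

/* Sketch: allocate a worker buffer on the NUMA node of the CPU the
 * thread currently runs on, so accesses stay node-local instead of
 * crossing off-chip or off-blade links.
 * Build with:  gcc -D_GNU_SOURCE numa_local.c -lnuma
 * Whole-allocator substitution can instead be done at load time, e.g.:
 *   LD_PRELOAD=/usr/lib/libjemalloc.so ./app    (allocator is assumed)
 */
#include <numa.h>     /* numa_alloc_onnode, numa_node_of_cpu, ... */
#include <sched.h>    /* sched_getcpu (GNU extension)             */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return EXIT_FAILURE;
    }
    int cpu  = sched_getcpu();          /* CPU this thread runs on */
    int node = numa_node_of_cpu(cpu);   /* its local NUMA node     */

    size_t size = 64UL << 20;           /* 64 MiB scratch buffer (assumed) */
    /* Request node-local placement to avoid remote-memory traffic. */
    void *buf = numa_alloc_onnode(size, node);
    if (buf == NULL)
        return EXIT_FAILURE;

    printf("cpu %d -> node %d: %zu bytes allocated node-local\n",
           cpu, node, size);
    numa_free(buf, size);
    return EXIT_SUCCESS;
}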