In this paper we study performance characteristics and parallelization strategies for two recently shipped, powerful multi-core processors, the IBM Power6 and the Sun T2 Plus, in high-end scientific computing. The central aspect is data locality. First, we investigate the impact of good and bad data locality by modifying data access patterns. Next, we study the impact of multithreading with respect to data locality, based on the data-parallel programming approach. The level of parallelism is increased by assigning multiple threads to one core in order to hide processor stalls caused by bad data locality. We measure the impact of data locality and multithreading in terms of execution time and bandwidth for synthetic micro-benchmarks, a matrix multiplication kernel, and an application from bioinformatics. The results indicate that substantial performance improvements can be obtained with minor effort by utilizing multithreading.
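To illustrate what good and bad data locality mean for a dense array, the following toy model counts how often consecutive accesses to a row-major n x n array move to a different cache line. It is only a sketch and not one of the benchmarks measured in this paper: the function name `line_switches`, the 8-byte element size, and the 64-byte line size are assumptions chosen for illustration, and the model ignores cache capacity, associativity, and hardware prefetching.

```python
def line_switches(n, order, elem_bytes=8, line_bytes=64):
    """Count how often consecutive accesses to an n x n row-major array
    fall on a different cache line than the previous access.

    Hypothetical helper; elem_bytes and line_bytes are illustrative
    assumptions, not parameters of any specific processor.
    """
    if order == "row":
        # Inner loop walks along a row: unit-stride accesses.
        indices = ((i, j) for i in range(n) for j in range(n))
    elif order == "column":
        # Inner loop walks down a column: stride of n elements.
        indices = ((i, j) for j in range(n) for i in range(n))
    else:
        raise ValueError("order must be 'row' or 'column'")

    switches = 0
    previous_line = None
    for i, j in indices:
        offset = (i * n + j) * elem_bytes  # byte offset in row-major layout
        line = offset // line_bytes        # cache line holding this element
        if line != previous_line:
            switches += 1
            previous_line = line
    return switches
```

Under these assumptions, `line_switches(64, "row")` returns 512 (one switch per eight elements), while `line_switches(64, "column")` returns 4096 (every access touches a new line). The second pattern is the kind of bad locality whose stalls additional threads per core are meant to hide.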