Global communication costs in future single-chip multiprocessors will increase linearly with distance. In this paper, we revisit the issues of locality and load balance in order to take advantage of these new costs. We present a technique which simultaneously migrates data and threads based on vectors specifying locality and resource usage. This technique improves performance on applications with distinguishable locality and imbalanced resource usage. 64% of the ideal reduction in execution time was achieved on an application with these traits, while no improvement was obtained on a balanced application with little locality.