Parallel sorting methods for distributed memory systems often use partitioning algorithms to prepare the redistribution of data items. This article proposes a partitioning algorithm that calculates a redistribution specified by the number of data items to be located on each process after sorting. The algorithm can also be used for data items with weights, which can, for example, express an expected computational load, and can then produce a redistribution in which the accumulated weight of data items is specified individually for each process. Another important feature is that data sets with duplicate keys can be handled. Parallel sorting with these properties is often required by parallel scientific application codes, such as particle simulations, in which the dynamics of the simulated system can destroy the locality and load balance needed for efficient computation. The proposed method is applied to random sample data and to a particle simulation code that requires sorting. Performance results have been obtained on an IBM Blue Gene/P platform with up to 32768 cores and show that the proposed parallel sorting method performs well in comparison to existing algorithms.
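The idea of a weight-driven redistribution can be illustrated with a minimal sequential sketch. This is not the article's algorithm, only an assumed toy version: given an already sorted sequence of item weights and per-process target weights, it picks the index boundaries whose prefix weights come closest to the running targets, so that part i receives roughly targets[i] of accumulated weight.

```python
import bisect
from itertools import accumulate

def weighted_partition(weights, targets):
    """Illustrative sketch (assumed interface, not the paper's method):
    return index boundaries [b_0, ..., b_p] so that the contiguous part
    weights[b_i:b_{i+1}] has accumulated weight close to targets[i]."""
    # prefix[i] = total weight of the first i items
    prefix = [0] + list(accumulate(weights))
    bounds = [0]
    goal = 0.0
    for t in targets[:-1]:
        goal += t
        lo, hi = bounds[-1], len(weights)
        # first cut position whose prefix weight reaches the running goal
        j = bisect.bisect_left(prefix, goal, lo, hi + 1)
        # choose between j-1 and j, whichever prefix weight is closer to goal
        if j > lo and abs(prefix[j - 1] - goal) <= abs(prefix[min(j, hi)] - goal):
            j -= 1
        bounds.append(min(j, hi))
    bounds.append(len(weights))
    return bounds

# Example: four items of weights 3,1,1,3 split for two processes,
# each of which should receive an accumulated weight of 4.
print(weighted_partition([3, 1, 1, 3], [4, 4]))
```

Note that boundaries are chosen by item index rather than by key value, so a run of items with duplicate keys can be cut in the middle; index-based cuts are one simple way duplicate keys can be tolerated in such a partitioning.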