Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed-memory multicomputers. Since it is performed at run time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. We present efficient algorithms for array redistribution. The most significant improvement of our algorithms is that a processor does not need to construct the send/receive data sets for a redistribution. Based on packing/unpacking information derived from the BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) redistribution (or vice versa), a processor can pack (unpack) array elements into (from) messages directly. To evaluate the performance of our methods, we implemented them, along with Thakur's (1994) methods, on an IBM SP2 parallel machine. The results show that our algorithms run approximately 5% to 27% faster than Thakur's methods.
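Why direct packing is possible for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) can be seen from a small index calculation. The following is a minimal illustrative sketch, not the authors' published algorithm. It assumes a one-dimensional array of n elements indexed from 0, P processors, and the usual rule that under BLOCK-CYCLIC(b) element i lives on processor floor(i/b) mod P. If an element lies in the s-th r-sized sub-block of local block l on source processor p, its global r-block index is (lP + p)k + s, so its destination is (pk + s) mod P, which does not depend on l. Each processor can therefore decide where every local element goes from its offset alone, without building a send set. The function names `global_index`, `pack_plan`, and `brute_force_plan` are hypothetical. The brute-force version is included only to check the closed form.

```python
from collections import defaultdict


def global_index(local, p, P, b):
    """Map a local index on processor p to its global index under BLOCK-CYCLIC(b)."""
    block, offset = divmod(local, b)
    return (block * P + p) * b + offset


def pack_plan(n, p, P, k, r):
    """Packing plan for processor p in a BLOCK-CYCLIC(k*r) -> BLOCK-CYCLIC(r) redistribution.

    Returns {destination processor: [local indices to pack, in order]}.
    Hypothetical helper: the destination of each element is computed from its
    offset inside the local k*r block alone, so no global send set is scanned.
    """
    b = k * r
    plan = defaultdict(list)
    local = 0
    while global_index(local, p, P, b) < n:
        s = (local % b) // r              # which r-sized sub-block of the k*r block
        plan[(p * k + s) % P].append(local)
        local += 1
    return dict(plan)


def brute_force_plan(n, p, P, k, r):
    """Reference only: derive the same plan by scanning every global index."""
    b = k * r
    plan = defaultdict(list)
    local = 0
    for i in range(n):
        if (i // b) % P == p:             # element i is on p before redistribution
            plan[(i // r) % P].append(local)  # owner of element i afterwards
            local += 1
    return dict(plan)


if __name__ == "__main__":
    # n = 20 elements, P = 2 processors, k = 3, r = 2, view from processor 0
    print(pack_plan(20, 0, 2, 3, 2))
    # {0: [0, 1, 4, 5, 6, 7, 10, 11], 1: [2, 3, 8, 9]}
```

Because the destination pattern repeats with period kr in local index space, a real implementation could precompute it once per processor and reuse it for every local block. The reverse direction, BLOCK-CYCLIC(r) to BLOCK-CYCLIC(kr), would use the same relation to unpack incoming messages.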
Published in:
Proceedings of the Twenty-First Annual International Computer Software and Applications Conference (COMPSAC '97), 1997
Date of Conference: 11-15 Aug 1997