This paper describes the design, implementation, and performance of a new parallel QR factorization algorithm based on the Compact WY representation of Householder reflections. In contrast to existing parallel algorithms, the multiprocessor partitioning strategy is not governed by an underlying static data distribution scheme. Rather, a dynamic distribution strategy is employed to exploit the ability of message-passing architectures to overlap computation with communication. Experiments conducted on a 128-processor SGI Origin 2000 and a 64-processor HP SPP-2000 show that the new algorithm has a lower execution time than the tuned parallel routines installed on those machines, including a version of PDGEQRF, ScaLAPACK's distributed QR factorization routine.
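For readers unfamiliar with the Compact WY representation named in the abstract, the following is a minimal serial sketch, not the paper's parallel algorithm. It accumulates the product of Householder reflections H_1 H_2 ... H_n in the form Q = I - V T V^T, with V unit lower trapezoidal and T upper triangular, using the standard column-by-column recurrence for T. The function names (`householder`, `compact_wy_qr`, `apply_q`) are hypothetical and chosen for illustration; nothing here reflects the dynamic distribution strategy or the message-passing implementation described above.

```python
import math


def householder(x):
    """Return (v, tau) with v[0] == 1 so that (I - tau v v^T) x = beta * e_1."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        # Nothing to annihilate: use the identity reflection.
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    beta = -math.copysign(norm_x, x[0])
    v0 = x[0] - beta
    v = [1.0] + [xi / v0 for xi in x[1:]]
    tau = (beta - x[0]) / beta
    return v, tau


def compact_wy_qr(a):
    """Factor an m x n matrix (m >= n) given as a list of rows.

    Returns V (m x n), T (n x n), and R (n x n) with A = (I - V T V^T)[:, :n] R.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    v_mat = [[0.0] * n for _ in range(m)]
    t = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v, tau = householder([r[i][j] for i in range(j, m)])
        # Apply H_j = I - tau v v^T to columns j..n-1 of the working matrix.
        for c in range(j, n):
            w = tau * sum(v[i - j] * r[i][c] for i in range(j, m))
            for i in range(j, m):
                r[i][c] -= w * v[i - j]
        for i in range(j, m):
            v_mat[i][j] = v[i - j]
        # Extend T: T[:j, j] = -tau * T[:j, :j] (V[:, :j]^T v_j), T[j][j] = tau.
        z = [sum(v_mat[i][k] * v_mat[i][j] for i in range(j, m)) for k in range(j)]
        for p in range(j):
            t[p][j] = -tau * sum(t[p][k] * z[k] for k in range(p, j))
        t[j][j] = tau
    return v_mat, t, [row[:n] for row in r[:n]]


def apply_q(v_mat, t, b):
    """Compute (I - V T V^T) b for an m x k matrix b given as a list of rows."""
    m, n, k = len(v_mat), len(t), len(b[0])
    # w = V^T b, then tw = T w (T is upper triangular), then b - V tw.
    w = [[sum(v_mat[i][p] * b[i][c] for i in range(m)) for c in range(k)]
         for p in range(n)]
    tw = [[sum(t[p][q] * w[q][c] for q in range(p, n)) for c in range(k)]
          for p in range(n)]
    return [[b[i][c] - sum(v_mat[i][p] * tw[p][c] for p in range(n))
             for c in range(k)] for i in range(m)]
```

The appeal of this form is that applying Q reduces to a few matrix-matrix products, which is what makes it attractive for blocked and parallel codes. Multiplying R, padded with zero rows to height m, by `apply_q` should reproduce the original matrix up to rounding error.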
Published in:
Performance, Computing, and Communications Conference, 2002. 21st IEEE International
Date of Conference: 2002