The critical bottlenecks in the implementation of the conjugate gradient algorithm on distributed memory computers are the communication requirements of the sparse matrix-vector multiply and of the vector recurrences. This paper describes the data distribution and communication patterns of five general implementations. Their realizations demonstrate that the cost of communication can be overcome to a much larger extent than is often assumed. The results also apply to more general settings for matrix-vector products, both sparse and dense.
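To make the two bottlenecks concrete, the following is a minimal serial sketch of the textbook conjugate gradient iteration, not any of the five implementations the paper describes. The comments mark where a distributed memory version would typically need communication, assuming a row-wise distribution of the matrix and vectors. The function names, the CSR storage layout, and the 1D Laplacian test matrix are illustrative assumptions that do not come from the paper.

```python
def laplacian_1d_csr(n):
    """Build the n x n tridiagonal matrix tridiag(-1, 2, -1) in CSR form.

    Hypothetical test matrix, chosen only because it is sparse and
    symmetric positive definite.
    """
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def csr_matvec(indptr, indices, data, x):
    """Return y = A x for A stored in CSR form.

    Communication point 1 (assumed row-wise distribution): each process
    owns a block of rows, but x[indices[k]] may live on another process,
    so those entries of x must be received before the local rows are done.
    """
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    """Inner product.

    Communication point 2: with distributed vectors, each process computes
    a partial sum and a global reduction combines them on every call.
    """
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, equal to b for x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)
        # The element-wise updates below are local under the assumed layout.
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: a 4 x 4 system whose exact solution is the all-ones vector.
indptr, indices, data = laplacian_1d_csr(4)
x = conjugate_gradient(indptr, indices, data, [1.0, 0.0, 0.0, 1.0])
```

In this sketch each iteration performs one matrix-vector product and two inner products, which is why the communication pattern of those two operations tends to dominate the parallel cost.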
Published in: Supercomputing '93. Proceedings
Date of Conference: 15-19 Nov. 1993