A three-parameter fast Givens QR algorithm for superscalar processors
Carrig, J.J., Jr.; Meyer, G.G.L.
Parallel Processing, 1996., Proceedings of the 1996 International Conference on
Volume 2, 12-16 Aug 1996, Page(s): 11-18 vol.2
Digital Object Identifier 10.1109/ICPP.1996.537375
Summary: We present a three-parameter fast Givens QR algorithm that
exploits parallelism to improve performance on superscalar processors.
We provide a selection of parameter values for which the new algorithm
reduces to the standard algorithm, but show that non-standard values
minimize the number of cache misses, memory references and pipeline
stalls. Using a tractable model of a superscalar machine architecture,
we derive rules for estimating the optimal combination of parameter
values. Applying these rules, we observe a speedup over the standard
algorithm of 2.4 on the Intel Pentium Pro system, 2.0 on a single thin
POWER2 processor of the IBM SP2, 1.6 on a single wide POWER2 processor
of the IBM SP2, and 4.2 on a single R8000 processor of the SGI POWER
Challenge XL.
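
For readers unfamiliar with the baseline, the following is a minimal sketch of a textbook square-root-free ("fast") Givens reduction, not the three-parameter algorithm of the paper, whose parameters are not described in this summary. The function name `fast_givens_r`, the row-scaling representation (true row i equals sqrt(d[i]) times b[i]), and the column-by-column elimination order are assumptions made for illustration. Each rotation updates a pair of elements with two multiplications instead of the four needed by an ordinary Givens rotation, and the choice between the two rotation types keeps every scale factor at or above 1/2. No periodic rescaling of the weights is done, so the sketch is only suitable for small matrices.

```python
import math


def fast_givens_r(a):
    """Return the upper-triangular factor R of A = QR (rows determined up to sign).

    Illustrative textbook fast Givens reduction; not the paper's algorithm.
    """
    m, n = len(a), len(a[0])
    b = [[float(v) for v in row] for row in a]  # scaled rows
    d = [1.0] * m  # true row i is sqrt(d[i]) * b[i]

    for j in range(min(m, n)):
        i = j  # pivot row for column j
        for k in range(m - 1, j, -1):
            x, y = b[i][j], b[k][j]
            if y == 0.0:
                continue  # entry is already zero
            p = d[i] * x * x
            q = d[k] * y * y
            if q <= p:
                # Type 1: cos^2 >= 1/2, rows keep their positions.
                alpha = (d[k] * y) / (d[i] * x)
                beta = -y / x
                scale = p / (p + q)  # cos^2
                for c in range(j, n):
                    u, v = b[i][c], b[k][c]
                    b[i][c] = u + alpha * v
                    b[k][c] = v + beta * u
                d[i] *= scale
                d[k] *= scale
            else:
                # Type 2: sin^2 > 1/2, rows effectively trade roles.
                alpha = (d[i] * x) / (d[k] * y)
                beta = x / y
                scale = q / (p + q)  # sin^2
                for c in range(j, n):
                    u, v = b[i][c], b[k][c]
                    b[i][c] = v + alpha * u
                    b[k][c] = -u + beta * v
                d[i], d[k] = d[k] * scale, d[i] * scale
            b[k][j] = 0.0  # remove rounding residue in the eliminated entry

    # Undo the scaling to recover R from the weights and the scaled rows.
    return [[math.sqrt(d[r]) * b[r][c] for c in range(n)]
            for r in range(min(m, n))]
```

A quick sanity check is that the product of the transpose of R with R matches the product of the transpose of A with A, since an orthogonal transformation preserves that product.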