A three-parameter fast Givens QR algorithm for superscalar processors
Carrig, J.J., Jr.
Meyer, G.G.L.
Dept. of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD
This paper appears in: Proceedings of the 1996 International Conference on Parallel Processing
Publication Date: 12-16 Aug 1996
Volume: 2
Pages: 11-18
Meeting Date: 08/12/1996 - 08/16/1996
Location: Ithaca, NY, USA
ISBN: 0-8186-7623-X
References Cited: 17
INSPEC Accession Number: 5376112
Digital Object Identifier: 10.1109/ICPP.1996.537375
Current Version Published: 2002-08-06
Abstract
We present a three-parameter fast Givens QR algorithm that
exploits parallelism to improve performance on superscalar processors.
We identify parameter values for which the new algorithm reduces to
the standard algorithm, but show that non-standard values minimize
the number of cache misses, memory references, and pipeline stalls.
Using a tractable model of a superscalar machine architecture, we
derive rules for estimating the optimal combination of parameter
values. Applying these rules, we observe speedups over the standard
algorithm of 2.4 on an Intel Pentium Pro system, 2.0 on a single thin
POWER2 processor of the IBM SP2, 1.6 on a single wide POWER2 processor
of the IBM SP2, and 4.2 on a single R8000 processor of the SGI POWER
Challenge XL.
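For reference, here is a minimal sketch of a standard fast (square-root-free) Givens QR factorization, assuming that this is what the abstract calls "the standard algorithm." It follows the textbook formulation (e.g., Golub and Van Loan), not the authors' three-parameter variant, whose parameters the abstract does not describe. The function names `fast_givens` and `fast_givens_qr` are hypothetical. The matrix is kept in scaled form B with a diagonal weight vector d, so that R = D^(-1/2) B. This removes the square root that each ordinary Givens rotation requires.

```python
import numpy as np


def fast_givens(x1, x2, d1, d2):
    """Build a 2x2 fast Givens transform M with M^T @ [x1, x2] = [*, 0].

    Returns (alpha, beta, kind, d1_new, d2_new), where
      kind 1: M = [[beta, 1], [1, alpha]]
      kind 2: M = [[1, alpha], [beta, 1]]
    and d1_new, d2_new are the updated diagonal scale factors.
    """
    if x2 == 0.0:
        # Nothing to eliminate: kind 2 with alpha = beta = 0 is the identity.
        return 0.0, 0.0, 2, d1, d2
    alpha = -x1 / x2
    beta = -alpha * d2 / d1
    gamma = -alpha * beta
    if gamma <= 1.0:
        # Kind 1 swaps the roles of the two scale factors.
        return alpha, beta, 1, (1.0 + gamma) * d2, (1.0 + gamma) * d1
    # Kind 2 uses the reciprocals so that the growth factor 1 + gamma stays <= 2.
    alpha, beta, gamma = 1.0 / alpha, 1.0 / beta, 1.0 / gamma
    return alpha, beta, 2, (1.0 + gamma) * d1, (1.0 + gamma) * d2


def fast_givens_qr(a):
    """Triangularize A with fast Givens transforms on adjacent rows.

    Returns (b, d) such that R = diag(d)**-0.5 @ b is upper triangular
    and R^T R = A^T A.
    """
    b = np.array(a, dtype=float)
    m, n = b.shape
    d = np.ones(m)
    for j in range(n):
        # Eliminate column j from the bottom up, pairing rows i-1 and i.
        for i in range(m - 1, j, -1):
            alpha, beta, kind, d[i - 1], d[i] = fast_givens(
                b[i - 1, j], b[i, j], d[i - 1], d[i])
            if kind == 1:
                mt = np.array([[beta, 1.0], [1.0, alpha]])   # M1 is symmetric
            else:
                mt = np.array([[1.0, beta], [alpha, 1.0]])   # transpose of M2
            b[i - 1:i + 1, j:] = mt @ b[i - 1:i + 1, j:]
            b[i, j] = 0.0  # exact zero instead of a rounding residue
    return b, d
```

As a usage example, `b, d = fast_givens_qr(a)` followed by `r = b / np.sqrt(d)[:, None]` recovers an upper-triangular R with R^T R = A^T A. Each step multiplies an entry of d by at most 2. A production version would therefore rescale periodically to avoid overflow. That bookkeeping is omitted here, along with any of the cache or pipeline tuning the abstract refers to.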