Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems

2 Author(s)
D'Alberto, P. ; Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA ; Nicolau, A.

Strassen's algorithm has practical performance benefits for architectures with simple memory hierarchies, because it trades computationally expensive matrix multiplications (MM) for cheaper matrix additions (MA). However, it presents no advantages for high-performance architectures with deep memory hierarchies, because MAs exploit limited data reuse. We present an easy-to-use adaptive algorithm combining Strassen's recursion and a highly tuned version of ATLAS MM. In fact, we introduce a last step in the ATLAS installation process that determines whether Strassen's algorithm may achieve any speedup. We present a recursive algorithm achieving up to 30% speedup versus ATLAS alone. We show experimental results for 14 different systems.
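
The general idea of combining Strassen's recursion with a tuned MM kernel can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `strassen_adaptive` and the `cutoff` parameter are hypothetical, NumPy's `@` operator stands in for ATLAS's DGEMM, and the cutoff is passed by hand here, whereas the abstract describes determining at ATLAS installation time whether Strassen's recursion pays off. The sketch also simply falls back to the kernel for odd sizes rather than handling them inside the recursion.

```python
import numpy as np

def strassen_adaptive(A, B, cutoff=64):
    """Multiply square matrices A and B with Strassen's recursion.

    Falls back to the tuned kernel (here NumPy's `@`, an assumed stand-in
    for ATLAS DGEMM) when the size is at or below `cutoff` or is odd.
    """
    n = A.shape[0]
    if n <= cutoff or n % 2:
        return A @ B

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive multiplications replace the eight of the classic
    # block algorithm, at the cost of extra matrix additions.
    M1 = strassen_adaptive(A11 + A22, B11 + B22, cutoff)
    M2 = strassen_adaptive(A21 + A22, B11, cutoff)
    M3 = strassen_adaptive(A11, B12 - B22, cutoff)
    M4 = strassen_adaptive(A22, B21 - B11, cutoff)
    M5 = strassen_adaptive(A11 + A12, B22, cutoff)
    M6 = strassen_adaptive(A21 - A11, B11 + B12, cutoff)
    M7 = strassen_adaptive(A12 - A22, B21 + B22, cutoff)

    # Assemble the four quadrants of the product.
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Calling `strassen_adaptive(A, B, cutoff=n)` for `n`-by-`n` inputs reduces to the plain kernel call, so the cutoff acts as the switch between the two strategies; a suitable value depends on the machine and would have to be measured.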

Published in:

High-Performance Computing in Asia-Pacific Region, 2005. Proceedings. Eighth International Conference on

Date of Conference:

1-1 July 2005