Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond

2 Author(s)
Heinecke, A.; Inst. für Inf., Tech. Univ. München, Garching, Germany; Trinitis, C.

In this paper, we present the recent port of TifaMMy, our parallel LU decomposition code built on cache-oblivious algorithms, to two new architectures, SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge, together with the latest results. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these new architectures. Results for both the standard C++ version, built with vectorization compiler switches, and TifaMMy's highly optimized vector intrinsics version are discussed and compared with Intel's architecture-specific, optimized Math Kernel Library (MKL).

Published in:

High Performance Computing and Simulation (HPCS), 2011 International Conference on

Date of Conference:

4-8 July 2011