Exploiting memory hierarchies in scientific computing

2 Author(s)
Bader, M.; Weidendorfer, J. (Dept. of Informatics, Technische Universität München, Munich, Germany)

The gap between processor and main memory performance has been widening for quite some time, and can safely be expected to keep doing so in the coming years. In the era of single-core processors, this gap was mainly observable as increased latency, for example when measured in the number of (possibly stalled) CPU clock cycles. Nowadays, with multicore chips, multiple cores share the same connection to off-chip main memory, which effectively reduces the available bandwidth as well. Caches help in both cases: being located on-chip, they provide both much lower latency and much higher bandwidth. By holding copies of recently used memory blocks, caches exploit the fact that programs, on average, tend to access the same memory cell repeatedly (temporal locality) or nearby memory cells (spatial locality). However, this natural locality is not enough for scientific computing in HPC; further improving the access locality of existing algorithms is highly desirable.

In this talk, we present strategies to improve the locality of memory accesses for linear algebra problems occurring in different kinds of applications: (1) an algorithmic approach based on Peano space-filling curves that leads to inherently cache-efficient (cache-oblivious) matrix algorithms, such as matrix multiplication or LU decomposition for dense and sparse matrices, on single-core CPUs as well as on shared-memory multicore platforms; (2) cache optimization strategies for matrix-vector multiplications with very large, sparse matrices, as they occur in the iterative MLEM algorithm used for image reconstruction in nuclear medicine. Here, different cache-aware optimization strategies are combined to better exploit large caches, small caches, and single cache lines.

Published in:

2009 International Conference on High Performance Computing & Simulation (HPCS '09)

Date of Conference:

21-24 June 2009