Close category search window
 

Fast indexing for blocked array layouts to improve multi-level cache locality

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Athanasaki, E. ; National Tech. Univ. of Athens, Greece ; Koziris, N.

One of the key challenges computer architects and compiler writers are facing, is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used, to reduce average latency for memory accesses. Tiling is a widely used loop iteration reordering technique for improving locality of references. In this paper, we further reduce cache misses, restructuring the memory layout of multi-dimensional arrays, that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to easily translate multi-dimensional indexing of arrays into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are now easily calculated based on the algebra of dilated integers, similarly to morton-order indexing. Actual experimental results on three different hardware platforms, using 5 benchmarks, illustrate that execution time is greatly improved when combining tiled code with tiled array layouts and binary mask-based index translation functions. Both TLB and L1 cache misses are concurrently minimized, for the same tile size, thus, applying the proposed layouts, locality of references is greatly improved. Finally, simulations using the Simplescalar tool, verify that our enhanced performance is due to the considerable reduction of cache misses in all levels of memory hierarchy.

Published in:
Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004. Eighth Workshop on

Date of Conference: 15 Feb. 2004

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2013 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.