Skip to Main Content
One of the key challenges computer architects and compiler writers are facing, is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used, to reduce average latency for memory accesses. Tiling is a widely used loop iteration reordering technique for improving locality of references. In this paper, we further reduce cache misses, restructuring the memory layout of multi-dimensional arrays, that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to easily translate multi-dimensional indexing of arrays into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are now easily calculated based on the algebra of dilated integers, similarly to morton-order indexing. Actual experimental results on three different hardware platforms, using 5 benchmarks, illustrate that execution time is greatly improved when combining tiled code with tiled array layouts and binary mask-based index translation functions. Both TLB and L1 cache misses are concurrently minimized, for the same tile size, thus, applying the proposed layouts, locality of references is greatly improved. Finally, simulations using the Simplescalar tool, verify that our enhanced performance is due to the considerable reduction of cache misses in all levels of memory hierarchy.
Date of Conference: 15 Feb. 2004