1 Introduction
Array operations are used in a large number of important scientific codes, such as molecular dynamics [10], finite-element methods [16], climate modeling [33], etc. To implement these array operations efficiently, many methods have been proposed in the literature. For example, for two-dimensional arrays, by applying the loop repermutation [4], [28] to reorder the memory accesses for array elements of certain operations, we can obtain better performance. However, the majority of these methods are focused on the two-dimensional arrays. When extended to higher dimensional arrays, these methods usually do not perform well. The reason is that one usually uses the traditional matrix representation (TMR) that is also known as canonical data layouts [8] to represent higher dimensional arrays. In the TMR scheme, a three-dimensional array of size can be viewed as five two-dimensional arrays. This scheme has two drawbacks for higher dimensional array operations. First, the costs of index computations of array elements for array operations increase as the dimension increases. Second, the cache miss rate for array operations increases as the dimension increases due to more cache lines accessed. Hence, multidimensional arrays represented by the TMR scheme become less manageable and difficult for programmers to design efficient algorithms.