After processors hit the power wall, the dramatic shift in computer architecture from single-core to multicore/manycore designs has posed new challenges for high-performance computing, especially for data-intensive applications. Sparse matrix-vector multiplication (SpMV) is one of the most important computations in this area and has therefore received considerable attention in recent decades. Unlike regular dense matrix computations, SpMV combines a compact storage data structure with irregular data access patterns, which makes it considerably harder to optimize. In this work, we look at the SpMV optimization problem in the context of emerging multicores from a different, architecture-conscious perspective and propose an optimization strategy with three key components: mapping, scheduling, and data layout reorganization. Specifically, the mapping component derives a suitable iteration-to-core mapping; the scheduling component determines the execution order of the loop iterations assigned to each core of the target multicore architecture; and the data layout reorganization component prepares multiple memory layouts of the source (input) vector, each customized for a different row pattern. A distinguishing characteristic of our approach is that it is cache-hierarchy aware: all three components take the cache hierarchy of the target multicore architecture into account, so the derived solution is, in a sense, customized to that architecture. We evaluate the proposed strategy using 10 sparse matrices on two different multicore systems. Our experimental evaluation shows that the proposed optimization strategy delivers significant performance improvements (up to 26.5%) over the unoptimized case.
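To make the "irregular access with compact storage" point and the idea of an iteration-to-core mapping concrete, here is a minimal sketch. It assumes the compressed sparse row (CSR) format, which the abstract does not name, and it uses a generic mapping that splits rows into contiguous blocks with roughly equal nonzero counts. This is a common baseline technique, not the cache-hierarchy-aware mapping, scheduling, or data layout reorganization described above. The names `csr_spmv`, `map_rows_to_cores`, and `spmv` are hypothetical, and the per-core blocks are executed sequentially here rather than on real cores.

```python
from typing import List, Tuple


def csr_spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
             x: List[float], rows: range) -> List[Tuple[int, float]]:
    """Compute y[i] = sum_j A[i, j] * x[j] for each row i in `rows` (CSR assumed)."""
    out = []
    for i in rows:
        acc = 0.0
        # The nonzeros of row i occupy vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access x[col_idx[k]] is the irregular pattern that hurts caches.
            acc += vals[k] * x[col_idx[k]]
        out.append((i, acc))
    return out


def map_rows_to_cores(row_ptr: List[int], num_cores: int) -> List[range]:
    """Hypothetical baseline mapping: contiguous row blocks with ~equal nonzeros."""
    n = len(row_ptr) - 1
    nnz = row_ptr[-1]
    blocks, start = [], 0
    for c in range(num_cores):
        # Cumulative nonzero count that block c should end at (or just below).
        target = nnz * (c + 1) / num_cores
        end = start
        while end < n and row_ptr[end + 1] <= target:
            end += 1
        if c == num_cores - 1:
            end = n  # the last core takes any remaining rows
        blocks.append(range(start, end))
        start = end
    return blocks


def spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
         x: List[float], num_cores: int) -> List[float]:
    """Run each core's row block in turn (a stand-in for parallel execution)."""
    y = [0.0] * (len(row_ptr) - 1)
    for rows in map_rows_to_cores(row_ptr, num_cores):
        for i, v in csr_spmv(row_ptr, col_idx, vals, x, rows):
            y[i] = v
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]] stored in CSR form.
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    vals = [4.0, 1.0, 2.0, 3.0, 5.0]
    x = [1.0, 2.0, 3.0]
    print(map_rows_to_cores(row_ptr, 2))  # [range(0, 1), range(1, 3)]
    print(spmv(row_ptr, col_idx, vals, x, num_cores=2))  # [7.0, 4.0, 18.0]
```

Balancing by nonzeros rather than by row count keeps per-core work roughly even when row lengths vary. A cache-hierarchy-aware strategy such as the one proposed would additionally consider which cores share caches and how the accesses to `x` overlap between rows, which this sketch deliberately ignores.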