Abstract:
Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed on shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4% on average. These savings in cache misses translate, in turn, to an average execution-time improvement of 11.9%.
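The reuse-distance notion that the abstract builds on can be sketched in a few lines: the reuse distance of an access is the number of distinct cache lines touched between two consecutive accesses to the same line. The following Python sketch is purely illustrative; the `line_size` of 64 bytes and the exact counting convention are assumptions, not the paper's instrumentation, and a per-thread trace would give intra-thread distances while an interleaved multi-thread trace would give the inter-thread distances the paper measures.

```python
def reuse_distances(trace, line_size=64):
    """Cache-line reuse distances for a byte-address trace.

    For each access to a previously seen cache line, counts the number
    of *distinct* other lines touched since that line's last access.
    Illustrative sketch only; line_size=64 is an assumption.
    """
    lines = [addr // line_size for addr in trace]  # map addresses to line IDs
    last_seen = {}   # line ID -> index of its most recent access
    distances = []
    for i, line in enumerate(lines):
        if line in last_seen:
            # distinct lines accessed strictly between the two accesses
            between = set(lines[last_seen[line] + 1 : i])
            between.discard(line)
            distances.append(len(between))
        last_seen[line] = i
    return distances
```

For example, the trace `[0, 64, 128, 0]` touches two distinct other lines between the two accesses to line 0, so its single reuse distance is 2; a large distance like this is what makes a reuse likely to miss in a bounded shared cache.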
Date of Conference: 07-10 November 2011
Date Added to IEEE Xplore: 15 December 2011
Pennsylvania State University, University Park, PA, USA
Argonne National Laboratory, Argonne, IL, USA