With the ability to place large numbers of transistors on a single silicon chip, manufacturers have begun developing chip multiprocessors (CMPs) containing multiple processor cores, varying amounts of level 1 and level 2 caching, and on-chip directory structures for level 3 caches and memory. The level 3 cache may be used as a victim cache for both modified and clean lines evicted from on-chip level 2 caches. Efficient area and performance management of this cache hierarchy is paramount given the projected increase in access latency to off-chip memory. This paper proposes simple architectural extensions and adaptive policies for managing the L2 and L3 cache hierarchy in a CMP system. In particular, we evaluate two mechanisms that improve cache effectiveness. First, we propose the use of a small history table to provide hints to the L2 caches as to which lines are resident in the L3 cache. We employ this table to eliminate some unnecessary clean write backs to the L3 cache, reducing pressure on the L3 cache and utilization of the on-chip bus. Second, we examine the performance benefits of allowing write backs from L2 caches to be placed in neighboring, on-chip L2 caches rather than forcing them to be absorbed by the L3 cache. This not only reduces the capacity pressure on the L3 cache but also makes subsequent accesses faster, since L2-to-L2 cache transfers typically have lower latencies than accesses to a large L3 cache array. We evaluate the performance improvement of these two designs, and their combined effect, on four commercial workloads and observe a reduction in the overall execution time of up to 13%.
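The first mechanism, filtering clean write backs with a history table, can be illustrated with a minimal sketch. This is not the paper's implementation: the direct-mapped organization, the table size, and the names `WriteBackHistoryTable` and `evict_from_l2` are all assumptions made for illustration. The sketch only shows the general idea that a clean line whose address hits in the table is presumed to be in the L3 cache already, so its write back can be dropped.

```python
class WriteBackHistoryTable:
    """Hypothetical direct-mapped table of line addresses recently sent to L3.

    Assumption: one entry per index, holding the full line address.
    A real design would likely store a partial tag instead.
    """

    def __init__(self, num_entries=8):
        self.num_entries = num_entries
        self.entries = [None] * num_entries

    def _index(self, line_addr):
        # Low-order bits of the line address select the entry.
        return line_addr % self.num_entries

    def probably_in_l3(self, line_addr):
        # A hit is only a hint: L3 may have evicted the line since.
        return self.entries[self._index(line_addr)] == line_addr

    def record(self, line_addr):
        # Overwrites whatever address previously mapped to this entry.
        self.entries[self._index(line_addr)] = line_addr


def evict_from_l2(table, line_addr, dirty):
    """Return True if the evicted line is written back to L3, False if dropped."""
    if not dirty and table.probably_in_l3(line_addr):
        # Clean line that the table hints is already in L3: skip the write back.
        return False
    # Modified lines are always written back; so are clean lines with no hint.
    table.record(line_addr)
    return True
```

Because only clean lines are ever dropped, a stale hint in this sketch costs a later fetch from memory but does not lose data, since memory still holds a valid copy of a clean line.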