A multi-banked shared-L1 cache architecture for tightly coupled processor clusters

3 Author(s)
Kakoee, M.R. (DEIS, Univ. of Bologna, Bologna, Italy); Petrovic, V.; Benini, L.

Abstract:

A shared-L1 cache architecture is proposed for tightly coupled processor clusters. Sharing an L1 tightly coupled data memory (TCDM) among a significant number of processors (up to 16) is challenging in terms of speed. Sharing an L1 cache is even more challenging: although it eases programming, its operation is more complex. The performance feasibility of a shared TCDM was demonstrated in STMicroelectronics' Platform 2012, but the performance cost of supporting a shared L1 cache remained to be proven. In this paper we show that replacing the TCDM with a multi-banked shared-L1 cache imposes only a limited speed overhead, at the cost of additional area and power. We explore the shared-L1 cache architecture across different numbers of processing elements (PEs) and cache banks. Experimental results show that our multi-banked shared-L1 cache can operate at almost the same frequency as the corresponding TCDM architecture when the cache controller uses a 4-word cache line. Results also show that the area overhead with respect to the TCDM is less than 18% for a cluster containing 16 Leon3 processors and 32 cache banks, and that the overhead on MIPS/Watt and MIPS/mm² ranges from 5% to 30%, depending on the size of the processors in the cluster, for a 16×32 configuration (16 cores and 32 cache/memory banks).
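
To make the bank-mapping idea concrete, below is a minimal C sketch of how a byte address could be distributed across the cache banks of such a cluster. It assumes line-interleaved mapping (consecutive 4-word lines assigned to consecutive banks) with the 4-word line and the 32 banks of the 16×32 configuration mentioned in the abstract; the paper's actual interconnect and mapping logic are not described here, so these constants and function names are illustrative assumptions only.

/* Sketch: mapping a byte address to a cache bank and word offset,
 * assuming line-interleaved mapping across NUM_BANKS banks.
 * Constants follow the abstract's best configuration (4-word line,
 * 32 banks); the real design may map addresses differently. */
#include <stdint.h>
#include <stdio.h>

#define WORD_BYTES  4u
#define LINE_WORDS  4u                        /* 4-word cache line */
#define LINE_BYTES  (WORD_BYTES * LINE_WORDS) /* 16 bytes per line */
#define NUM_BANKS   32u                       /* 16x32 configuration */

/* Bank index: low-order bits of the line address select the bank,
 * so consecutive lines land on consecutive banks. */
static uint32_t bank_of(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_BANKS;
}

/* Offset of the accessed word within its cache line. */
static uint32_t word_in_line(uint32_t addr)
{
    return (addr % LINE_BYTES) / WORD_BYTES;
}

int main(void)
{
    /* Walk the first four lines word by word: requests from up to
     * 16 PEs touching different lines hit different banks, which is
     * what lets the shared cache sustain the cluster's bandwidth. */
    for (uint32_t addr = 0; addr < 4 * LINE_BYTES; addr += WORD_BYTES)
        printf("addr 0x%03x -> bank %2u, word %u\n",
               addr, bank_of(addr), word_in_line(addr));
    return 0;
}

Line-granularity interleaving is one plausible choice because it keeps each cache line wholly within one bank while still spreading independent accesses; word-granularity interleaving, as typically used for a TCDM, is another option with different conflict behavior.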

Published in:

2012 International Symposium on System on Chip (SoC)

Date of Conference:

10-12 Oct. 2012