As more processing cores are integrated onto a single chip and feature sizes continue to shrink, increasing on-chip access latency complicates the design of the on-chip last-level cache for chip multiprocessors (CMPs). At the same time, the overhead of maintaining an on-chip directory cannot be ignored as the number of processing cores increases. A scalable organization for the on-chip last-level cache is therefore urgently needed. In this work, we propose a fast hierarchical cache directory for tiled CMPs, which divides the CMP tiles hierarchically into multiple regions and combines this organization with data replication. A multi-level directory records the sharing information within each region and helps the regional home node complete operations efficiently. A fast directory is used to lower L2 slice access latency. Most requests to the last-level cache can be handled within the local level-1 region. Evaluation indicates that this architecture is highly scalable. Simulation results show that, for a 16-core CMP, the hierarchical cache directory reduces average last-level cache access latency by 46.35% and average on-chip network traffic by 19.25%, while improving system performance by 20.82%.
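
The hierarchical division of tiles into regions can be illustrated with a minimal sketch. It is not the paper's implementation: the 4x4 mesh layout, the 2x2 level-1 regions, the 64-byte line size, the address-interleaved choice of regional home node, and all function names are assumptions made only for illustration.

```python
# Illustrative sketch only; every constant and mapping below is an assumption,
# not a detail taken from the paper.
MESH_DIM = 4      # assumed: 16 tiles arranged as a 4x4 mesh
LINE_BYTES = 64   # assumed cache line size


def coords(tile):
    """Return the (x, y) mesh position of a tile ID in row-major order."""
    return tile % MESH_DIM, tile // MESH_DIM


def region_origin(tile, level):
    """Top-left (x, y) of the level-`level` region containing `tile`.

    Assumed geometry: a level-k region is a square of 2**k by 2**k tiles,
    so level 1 is 2x2 and level 2 covers the whole 4x4 chip.
    """
    side = 2 ** level
    assert side <= MESH_DIM, "region larger than the mesh"
    x, y = coords(tile)
    return (x // side) * side, (y // side) * side


def regional_home(tile, addr, level):
    """Pick the home tile for `addr` inside the requester's region.

    Assumed policy: cache lines are interleaved across the tiles of the
    region by line number.
    """
    side = 2 ** level
    ox, oy = region_origin(tile, level)
    slot = (addr // LINE_BYTES) % (side * side)
    return (oy + slot // side) * MESH_DIM + (ox + slot % side)


def lookup_path(tile, addr, levels=2):
    """Home nodes consulted in order, from the local level-1 region outward."""
    return [regional_home(tile, addr, lvl) for lvl in range(1, levels + 1)]


def hops(src, dst):
    """Manhattan distance between two tiles on the mesh."""
    (sx, sy), (dx, dy) = coords(src), coords(dst)
    return abs(sx - dx) + abs(sy - dy)
```

Under these assumptions, `lookup_path(10, 0x1C0)` lists the level-1 home first and the chip-wide home second, and the level-1 home is never more than two hops from the requesting tile. That bounded distance is the intuition behind handling most requests within the local level-1 region.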