Skip to Main Content
Switched networks have been adopted in on-chip communication for their scalability and efficient resource sharing. However, using a general network for a specific domain may result in unnecessary high cost and low performance when the interconnects are not optimized for the domain. Designing an optimal network for the specific domain is challenging because in-depth knowledge of interconnects and the application domain is required. Recently proposed Nonuniform Cache Architectures (NUCAs) use wormhole-routed 2D mesh networks in L2 caches. We observe that in NUCAs, network resources are underutilized with the considerable area cost (41 percent of cache) and the network delay is significantly large (63 percent of cache access time). Motivated by our observations, we investigate both router architecture and network topology for communication behaviors in large-scale cache systems. We present Fast-LRU replacement, where cache replacement overlaps with data request delivery. Next, we propose a deadlock-free XYX routing algorithm in a mesh network and present a new halo network topology to reduce the required links. Finally, we introduce a single-cycle multicast router that needs small modification of the unicast router design. Simulation results show that our design improves the average IPC by 38 percent over the mesh design with Multicast Promotion replacement and uses 12 percent of the interconnection area of the mesh network.