It is widely accepted that the growing gap between transistor performance and conventional on-chip interconnect performance presents a major barrier to future high-performance systems. Previous research has focused on wire-centric designs that use parallelism, locality, and on-chip wiring bandwidth to compensate for long wire latency. An alternative approach is to exploit newly emerging on-chip transmission line technology to reduce communication latency. Compared to conventional RC wires, transmission lines can reduce the delay of global wires by up to a factor of 30 while eliminating the need for repeaters. However, this latency reduction comes at the cost of a comparable reduction in bandwidth. In this paper, we investigate using transmission lines to access large on-chip level-2 caches. We propose a family of transmission line cache (TLC) designs that represent different points in the latency/bandwidth spectrum. Compared to the recently proposed dynamic non-uniform cache architecture (DNUCA) design, the base TLC design reduces the required cache area by 18% and reduces the interconnection network's dynamic power consumption by an average of 61%. The optimized TLC designs attain similar performance using fewer transmission lines, at the cost of some additional complexity. Full-system simulation results show that TLC provides more consistent performance than DNUCA across a wide variety of workloads. TLC designs are logically simpler than DNUCA designs but require greater circuit and manufacturing complexity.