Skip to Main Content
Network-on-chips (NoCs) have emerged as a scalable solution to the wire delay constraints, thereby providing a high-performance communication fabric for future multicores. Research has shown that power, area, and performance of the NoC architecture are tightly integrated with the design and optimization of the link, router (buffer and crossbar), and topology. Recent work has shown that adaptive channel buffers (on-link storage) can considerably reduce power consumption and area overhead by reducing or replacing the power-hungry router buffers. However, channel buffer design can lead to head-of-line (HoL) blocking, which eventually reduces the throughput of the network. In this paper, we design channel buffers and router crossbars to improve the performance (latency, throughput) while reducing the power consumption. In addition, we implement the proposed channel buffers and crossbar organizations in a concentrated torus (CTorus) topology which is a dual network without the additional area overhead. We compare other dual networks with leading topologies such as mesh2X, concentrated mesh2X (CMesh2X), and flattened butterfly2X (FBfly2X), each implemented with channel buffers. Our proposed designs analyze the power-performance-area tradeoff in designing channel buffers for NoC architectures while alleviating HoL blocking through buffer organizations and crossbar optimizations. Results using Synopsys design compiler showed that the buffer and crossbar organizations for an 8 × 8 mesh architecture can reduce power consumption by 25%-40%, improve throughput and reduce latency by 525%, while occupying 4%-13% more area when compared to the baseline architecture for both synthetic as well as real benchmark traces such as Princeton Application Repository for Shared-Memory Computers (PARSEC) and Standard Performance Evaluation Corporation (SPEC). CPU2006. When the energy-efficient buffer and crossbar organization was inserted into our CTorus topology, we further reduced - nergy dissipation by 32% and area by 53%, on average, over mesh2X, CMesh2X, and FBfly2X.