Skip to Main Content
Router microarchitecture plays a central role in the performance of networks-on-chip (NoCs). Buffers are needed in routers to house incoming flits that cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, making such a design prohibitive under aggressive clocking needs and limited power budgets of most NoC applications. In this paper, a new router design based on a distributed shared-buffer (DSB) architecture is proposed that aims to practically emulate an OBR. The proposed architecture introduces innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. Practical DSB configurations are also presented with reduced power overheads while exhibiting negligible performance degradation. Compared to a state-of-the-art pipelined IBR, the proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency on average by 61% when running SPLASH-2 benchmarks with high contention. On average, the saturation throughput of DSB routers is within 7% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.