Skip to Main Content
We describe the design and layout of a radix-128 crossbar in 90 nm CMOS. The data path is 32 bits wide and runs at 750 MHz using a three-stage pipeline, while fitting in a silicon area as small as 6.6 mm2 by filling it at the 90% level. The control path occupies 7 mm2 next to the data path by filling it at 35% level, and reconfigures the data path once every three clock cycles. Next, we arrange 128 1 mm2 “user tiles” around the crossbar, forming a 150 mm2 die, and we connect all tiles to the crossbar via global links running on top of the tiles. Including the overhead of repeaters and flip flops on global links, the area cost of the crossbar is 11% of the die. Thus, we prove that crossbar networks-on-chips (NoCs) are small enough for radices exceeding by far the few tens of ports, that were believed to be the practical limit up to now, and reaching above 100 ports. We also attempt a first-order comparison between our crossbar and a model of a popular mesh NoC, and we find that our crossbar NoC increases performance when traffic is global and stressed, at the cost of worse performance when traffic is local and benign. Finally, we present an experimental cost analysis showing that crossbar area practically grows as O(N2W), as all wiring of the crossbar fits over its standard cells, while crossbar delay grows as O(N√W) , as wire length increases with the perimeter of the crossbar.