This paper describes the design of a coherency hub ASIC for a 4-socket, highly threaded multiprocessor built on Sun's Victoria Falls processor. Victoria Falls is an 8-core CMT processor in the Niagara family, with 8 threads per core and a shared L2 cache. The coherency hub, named Zambezi, enables cost-effective scaling to 4 sockets with a total thread count of 256 and near-linear performance scaling on transaction processing workloads. A key requirement was to extend a 2-socket "glueless" system to 4 sockets with no change to the processor. Zambezi broadcasts snoop requests to all nodes (i.e., sockets), serializes requests to the same address, and consolidates snoop responses. The hub communicates with each node via point-to-point serial links, using a proprietary data link layer implemented over an FBDIMM PHY. Conflicting constraints (800 MHz operation and a 6-stage pipeline budget) presented the primary challenge to architecture, design, and layout. In this paper, we summarize the ASIC micro-architecture and coherency scheme, highlight how we addressed the engineering challenges we faced, and report the performance scalability results we achieved on key commercial server benchmarks.
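The three hub functions named above (broadcasting snoops, serializing requests to the same address, and consolidating responses) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the Zambezi design: the `Node` and `Hub` classes, the `'M'`/`'S'`/`'I'` line states, the FIFO ordering, and the consolidation rule are all hypothetical and chosen for illustration. Only the socket, core, and thread counts come from the text.

```python
from collections import defaultdict, deque

# Figures from the text: 4 sockets, 8 cores per socket, 8 threads per core.
SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 4, 8, 8
TOTAL_THREADS = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE


class Node:
    """Hypothetical socket model: maps an address to a line state ('M' or 'S')."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.lines = {}

    def snoop(self, address):
        # Report this node's state for the line; 'I' (invalid) if not cached.
        return self.lines.get(address, "I")


class Hub:
    """Toy hub: broadcast, per-address serialization, response consolidation."""

    def __init__(self, nodes):
        self.nodes = nodes
        # Assumption: one FIFO of waiting requesters per address.
        self.pending = defaultdict(deque)

    def submit(self, requester_id, address):
        # Requests to the same address queue up in arrival order.
        self.pending[address].append(requester_id)

    def drain(self, address):
        # Service the queued requests for one address strictly one at a time.
        results = []
        queue = self.pending[address]
        while queue:
            requester_id = queue.popleft()
            # Broadcast the snoop to every node (including the requester,
            # which is a simplifying assumption of this sketch).
            responses = {n.node_id: n.snoop(address) for n in self.nodes}
            results.append((requester_id, self.consolidate(responses)))
        return results

    @staticmethod
    def consolidate(responses):
        # Collapse the per-node responses into one answer for the requester.
        owners = [nid for nid, state in responses.items() if state == "M"]
        if owners:
            return ("owned", owners[0])
        if any(state == "S" for state in responses.values()):
            return ("shared", None)
        return ("miss", None)
```

For example, if node 2 holds a line in the modified state and nodes 0 and 1 both request it, `drain` answers them in submission order, each with a single consolidated response naming node 2 as the owner. A real hub would also update line states between requests, which this sketch leaves out.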