Many future shared-memory multiprocessor servers will both target commercial workloads and use highly integrated "glueless" designs. Implementing low-latency cache coherence in these systems is difficult, because traditional approaches either add indirection for common cache-to-cache misses (directory protocols) or require a totally ordered interconnect (traditional snooping protocols). Unfortunately, totally ordered interconnects are difficult to implement in glueless designs. An ideal coherence protocol would avoid both indirection and interconnect ordering; however, such an approach introduces numerous protocol races that are difficult to resolve. We propose a new coherence framework that enables such protocols by separating performance from correctness. A performance protocol can optimize for the common case (i.e., the absence of races) and rely on the underlying correctness substrate to resolve races, provide safety, and prevent starvation. We call the combination Token Coherence, since it explicitly exchanges and counts tokens to control coherence permissions. We develop TokenB, a specific Token Coherence performance protocol that allows a glueless multiprocessor to both exploit a low-latency unordered interconnect (like directory protocols) and avoid indirection (like snooping protocols). Simulations using commercial workloads show that our new protocol can significantly outperform traditional snooping and directory protocols.
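To make the idea of counting tokens to control coherence permissions concrete, the following is a minimal sketch, not the paper's implementation. It assumes the usual token-counting rule: each memory block has a fixed number of tokens, a holder may read the block with at least one token, and may write it only when it holds all of them. The names `TOTAL_TOKENS`, `Holder`, `transfer`, and `tokens_conserved` are hypothetical, and the sketch ignores data validity, message races, and starvation prevention, which the abstract assigns to the correctness substrate.

```python
from dataclasses import dataclass

# Assumed for illustration: a fixed token count per block (e.g. one per node).
TOTAL_TOKENS = 4


@dataclass
class Holder:
    """A cache or memory controller holding tokens for a single block."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # Assumed rule: at least one token grants read permission.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # Assumed rule: holding every token grants write permission.
        return self.tokens == TOTAL_TOKENS


def transfer(src: Holder, dst: Holder, count: int) -> None:
    """Move tokens between holders; tokens are never created or destroyed."""
    if count < 0 or count > src.tokens:
        raise ValueError("cannot send more tokens than are held")
    src.tokens -= count
    dst.tokens += count


def tokens_conserved(holders) -> bool:
    """Safety check: the tokens for the block always sum to TOTAL_TOKENS."""
    return sum(h.tokens for h in holders) == TOTAL_TOKENS
```

In this sketch, if memory starts with all four tokens and sends one to a cache `p0`, then `p0` can read but not write. If a second cache `p1` later collects all four tokens, it can write, and no other holder can read, since none has a token left. Because permissions depend only on token counts, the rule holds regardless of the order in which messages arrive, which is why an unordered interconnect is acceptable.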