Abstract:
We present a system architecture that uses high-efficiency processors as opposed to high-performance processors, NAND flash as byte-addressable main memory, and high-spee...Show MoreMetadata
Abstract:
We present a system architecture that uses high-efficiency processors as opposed to high-performance processors, NAND flash as byte-addressable main memory, and high-speed DRAM as a cache front-end for the flash. The main memory system is interconnected and presents a unified global address space to the client microprocessors. A single cabinet contains 2,550 nodes, networked in a highly redundant modified Moore graph that yields a bisection bandwidth of 9.1 TB/s and a worst-case latency of four hops from any node to any other. At a per-cabinet level, the system supports a minimum of 2.6 petabytes of main memory, dissipates 90 kW, and achieves 2.2 PetaFLOPS. The system architecture provides several features desirable in today's large-scale systems, including a global shared physical address space (and optional support for a global shared virtual space as well), the ability to partition the physical space unequally among clients as in a unified cache architecture (e.g., so as to support multiple VMs in a datacenter), pairwise system-wide sequential consistency on user-specified address sets, built-in checkpointing via journaled non-volatile main memory, memory cost-per-bit approaching that of NAND flash, and memory performance approaching that of pure DRAM.
Published in: IEEE Computer Architecture Letters ( Volume: 15, Issue: 2, 01 July-Dec. 2016)