Slow cache memory systems and low memory bandwidth present a major bottleneck in performance of modern microprocessors. 3-D integration of processor and memory subsystems provides a means to realize a wide data bus that could provide a high bandwidth and low latency on-chip cache. This paper presents a three-tier, 3-D 192-kB cache for a 3-D processor-memory stack. The chip is designed and fabricated in a 0.18 m fully depleted SOI CMOS process. An ultra wide data bus for connecting the 3-D cache with the microprocessor is implemented using dense vertical vias between the stacked wafers. The fabricated cache operates at 500 MHz and achieves up to 96 GB/s aggregate bandwidth at the output.