As technology scales, interconnects have become a major performance bottleneck and a major source of power consumption in sub-micron integrated circuit (IC) chips. One promising way to mitigate these interconnect challenges is the 3D IC, in which multiple device layers are stacked on the same chip. In this paper, we explore the architectural design of cache memories using 3D circuits. We present a delay and energy model, the 3D cache delay-energy estimation tool (3D-Cacti), for exploring different 3D design options for partitioning a cache. The tool allows a cache to be partitioned across device layers at various levels of granularity. It has been validated by comparing its results with those obtained from circuit simulation of custom 3D layouts. To demonstrate the utility of the tool, we also explore the effects of various cache partitioning parameters and 3D technology parameters on delay and energy.