1 Introduction
CMOS scaling trends produce faster transistors but relatively longer wire delays, making it difficult to build low-latency caches [9], because accessing the RAM structures requires long wires. This trend has led to pipelined cache access and small Level 1 caches. Another important parameter affecting cache design is the energy consumed in the cache [4], [13], [22]. To reduce cache energy consumption, designers have decoupled the tag comparisons from the data access [15]. Figure 1 shows a decoupled, pipelined cache read access [11]. A cache access starts with decoding the set index. In the next cycle, the byte offset is decoded in parallel with the address tag comparisons, and the bit lines in the data array are precharged. The tag comparisons control whether data is read from a cache block at all; if it is, the data is then driven to the units that requested it.

Figure 1. Pipelined Data Cache Read Access
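The decoupled read sequence described above can be sketched in code. The following is a minimal, hypothetical model (the cache geometry, names, and data layout are illustrative assumptions, not taken from the paper): the tag comparison in one stage selects at most one way, so only that way's data array is read in the next stage, which is the energy-saving point of decoupling.

```python
# Illustrative sketch of a decoupled, set-associative cache read.
# NUM_SETS, NUM_WAYS, and BLOCK_BYTES are assumed example parameters.
NUM_SETS = 64
NUM_WAYS = 4
BLOCK_BYTES = 32

def split_address(addr):
    """Decode byte offset, set index, and address tag from an address."""
    offset = addr % BLOCK_BYTES
    index = (addr // BLOCK_BYTES) % NUM_SETS
    tag = addr // (BLOCK_BYTES * NUM_SETS)
    return tag, index, offset

def decoupled_read(cache, addr):
    """Model the three conceptual pipeline stages of the read access.

    cache[index] is a list of (valid, tag, data) tuples, one per way.
    """
    tag, index, offset = split_address(addr)         # stage 1: decode set index
    ways = cache[index]
    hit_way = None
    for w, (valid, way_tag, _) in enumerate(ways):   # stage 2: tag comparisons
        if valid and way_tag == tag:                 # (bit lines precharged here)
            hit_way = w
            break
    if hit_way is None:                              # miss: no data array read
        return None
    _, _, data = ways[hit_way]                       # stage 3: read only the
    return data[offset]                              # matching way, drive data out
```

In a real cache these stages overlap across consecutive accesses; the sketch only shows how the tag-match result gates the data-array read for a single access.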