Skip to Main Content
Contemporary configurable architectures have dedicated internal functional units such as multipliers, high-capacity storage RAM, and even CAM blocks. These RAM blocks allow the implementations to cache data to be reused in the near future, thereby avoiding the latency of external memory accesses. We present a data allocation algorithm that utilizes the RAM blocks in the presence of a limited number of hardware registers. This algorithm, based on a compiler data reuse analysis, determines which data should be cached in the internal RAM blocks and when. The preliminary results, for a set of image/signal processing kernels targeting a Xilinx Virtex™ FPGA device, reveal that despite the increase latency of accessing data in RAM blocks, designs that use them require smaller configurable resources than designs that exclusively use registers, while attaining comparable and in some cases even better performance.