Many embedded systems use a simple pipelined RISC processor for computation and an on-chip SRAM for data storage. We present an enhancement called Intelligent SRAM (ISRAM) that consists of a small computation unit with an accumulator placed near the on-chip SRAM. The computation unit can operate on two words from the same SRAM row, or on one word from the SRAM and the value in the accumulator. The ISRAM enhancement requires only a few additional instructions to support the computation unit. We present a computation partitioning algorithm that, given the data flow graph of a program, assigns each computation either to the processor or to the new computation unit. Performance improves because the SRAM accesses, instructions, and pipeline stalls are all fewer than when the same operations are performed in the processor. Experimental results on various benchmarks show up to 1.46X speedup with our enhancement.
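The two operand modes described above can be illustrated with a toy functional model. This is only a sketch under stated assumptions, not the hardware or instruction set from the paper: the class `ISRAMModel`, the methods `add_row_words` and `add_word_to_acc`, the choice of addition as the operation, and the rule that each operation costs one row access are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ISRAMModel:
    """Toy model of an SRAM with a nearby accumulator unit (hypothetical)."""
    rows: list           # each row is a list of integer words
    acc: int = 0         # accumulator of the computation unit
    row_accesses: int = 0  # assumed cost: one row access per operation

    def add_row_words(self, r, i, j):
        # Mode 1: both operands come from the same SRAM row.
        self.row_accesses += 1
        self.acc = self.rows[r][i] + self.rows[r][j]

    def add_word_to_acc(self, r, i):
        # Mode 2: one operand from the SRAM, the other from the accumulator.
        self.row_accesses += 1
        self.acc += self.rows[r][i]


def sum_row_isram(mem, r):
    """Sum one row (at least two words) using only the near-memory unit."""
    mem.add_row_words(r, 0, 1)
    for i in range(2, len(mem.rows[r])):
        mem.add_word_to_acc(r, i)
    return mem.acc


def sum_row_baseline(rows, r):
    """Load/store baseline: one load per word, additions in the processor."""
    loads = 0
    total = 0
    for word in rows[r]:
        loads += 1
        total += word
    return total, loads


mem = ISRAMModel(rows=[[1, 2, 3, 4]])
isram_total = sum_row_isram(mem, 0)
base_total, base_loads = sum_row_baseline(mem.rows, 0)
```

In this model, summing a four-word row takes three near-memory operations, while the baseline issues four loads plus the additions in the processor. The counts follow only from the assumptions above and say nothing about the speedups reported in the paper.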