The gap between the speed of logic and DRAM access is widening. Traditional processors hide some of this latency mismatch using techniques such as multi-level caches, instruction prefetching, and memory interleaving/pipelining. Even with larger caches, cache misses still occur at a higher rate than memory can supply data. Moreover, the memory bandwidth visible at the system bus forms a bottleneck. There are therefore compelling reasons for integrating DRAM and logic, including: (i) the bandwidth available within the chip is many orders of magnitude higher than that at the memory bus, with significantly lower access time and lower power dissipation; and (ii) as typical workloads shift towards data-intensive/multimedia applications, the wide bandwidth can be effectively utilized. To support data-intensive applications effectively, we designed a Parallel Processor in Memory (PPIM) processor. PPIM is based on a distributed data-parallel architecture with limited support for control parallelism. This paper presents ppim-sim, a cycle-accurate simulator that models the PPIM processor in software and is capable of running PPIM program binaries. Experiments conducted to evaluate the simulator using a number of data-intensive application models for varying PPIM configurations are presented. The experiments showed that ppim-sim not only simulates large models in tractable amounts of time but is also memory-efficient. In addition, the parameterized design of ppim-sim, coupled with robust and effective interfaces, makes it a research tool for studying different processing element and controller architectures implemented in memory.
Published in:
Simulation Symposium, 2001. Proceedings. 34th Annual
Date of Conference: 2001