Skip to Main Content
Current main memory system design is severely limited by the decades-old synchronous DRAM architecture, which requires the memory controller to track the internal status of memory devices (chips) and schedule the timing of all device operations. This rigidity has become an obstacle of integrating emerging memory technologies such as PCM into existing memory systems, because their timing requirements are vastly different. Furthermore, with the trend of embedding memory controllers into processors, it is crucial to have interoperability among general-purpose processors and diverse memory modules. To address this issue, we propose a new memory architecture framework called universal memory architecture (UniMA). It enables the interoperability by decoupling the scheduling of device operations from memory controller, using a bridge chip at each memory module to perform local scheduling. The new architecture may also help improve memory scalability, power efficiency, and bandwidth as previously proposed decoupled memory organizations. A major focus of this study is to evaluate the performance impact of local scheduling of device operations. We present a prototype implementation of UniMA on top of DDRx memory bus, and then evaluate its efficiency with different workloads. The simulation results show that UniMA actually improves memory system efficiency for memory-intensive workloads due to increased parallelism among memory modules. The overall performance improvement over the conventional DDRx memory architecture is 3.1% on average. The performance of other workloads is reduced slightly, by 1.0% on average, due to a small increase of memory idle latency. In short, the prototype and evaluation demonstrate that it is possible to integrate diverse memory technologies into a single memory architecture with virtually no loss of overall performance.