Skip to Main Content
Heterogeneous Chip Multiprocessors (HMPs), such as the Cell Broadband Engine, offer a new design optimization opportunity by allowing designers to provide accelerators for application specific domains. Data sharing between CPUs and accelerators, and memory access mechanisms and protocols are crucial decisions in the design of an HMP. In this article, we analyze the choices between hardware and software managed coherence between CPU and accelerators for DMA-based data sharing, and find that hardware-coherent DMA shows a performance benefit of up to 3x, even for simple workloads.We explore memory address translation architecture choices for DMA-based data sharing. In multiprogramming environments, address translation is commonly used to separate processes. For efficiency, direct access to system memory requires address translation capabilities in the accelerator. We find that hardware managed address translation shows a performance benefit of up to 5x, even for simple workloads, by avoiding the costs of accelerator/CPU communication and supervisor management of the translation context and the introduction of a serial bottleneck on the CPU.