We consider the challenges in writing efficient code for ePUMA, a novel domain-specific heterogeneous multicore architecture with SIMD DSP slave cores, multi-banked on-chip vector register files for parallel access, and configurable permutation hardware that decouples memory access from computation. A suitable data layout in memory and in vector registers, combined with ePUMA's powerful addressing modes, is key to exploiting the SIMD units efficiently and achieving the throughput required for prospective applications in 4G mobile telecommunication and multimedia.
Published in: 2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS)
Date of Conference: June 30 - July 2, 2011