Skip to Main Content
Computing unto 100GOPS without cooling is essential for high-end embedded systems and much required by markets. A novel master-slave multi-SIMD architecture and its kernel (template) based parallel programming flow is thus introduced as a parallel signal processing platform, ePUMA, embedded Parallel DSP processor with Unique Memory Access. It is an on chip multi-DSP-processor (CMP) targeting to predictable signal processing for communications and multimedia. The essential technologies are to separate the processing of control stream from parallel computing, and to separate parallel data access from parallel arithmetic computing kernels. By separations, the computation and data access can be orthogonal both in hardware and in programs. Orthogonal operations can therefore be executed in parallel and the run time cost of data access can be minimized. Benchmark shows that the computing performance therefore reaches about 80% of the hardware limit. Less than 40% of the hardware limit can be reached by normal processors. The unique SIMD memory subsystem architecture offers programmable conflict free parallel data accesses. Programming flow and tools are also developed to support coding on the unique hardware architecture. A prototype on FPGA shows especially high performance over silicon cost.