Skip to Main Content
Today's high performance embedded computing applications are posing significant challenges for processing throughout. Traditionally, such applications have been realized on application specific integrated circuits (ASICs) and/or digital signal processors (DSP). However, ASICs' advantage in performance and power often could not justify the fast increasing fabrication cost, while current DSP offers a limited processing throughput that is usually lower than 100GFLOPS. On the other hand, current multi-core processors, especially graphics processing units (GPUs), deliver very high computing throughput, and at the same time maintain high flexibility and programmability. It is thus appealing to study the potential of GPUs for high performance embedded computing. In this work, we perform a comprehensive performance evaluation on GPUs with the high performance embedded computing (HPEC) benchmark suite, which consist a broad range of signal processing benchmarks with an emphasis on radar processing applications. We develop efficient GPU implementations that could outperform previous results for all the benchmarks. In addition, a systematic instruction level analysis for the GPU implementations is conducted with a GPU micro-architecture simulator. The results provide key insights on optimizing GPU hardware and software. Meanwhile, we also compared the performance and power efficiency between GPU and DSP with the HPEC benchmarks. The comparison reveals that the major hurdle for GPU's applications in embedded computing is its relatively low power efficiency.