Abstract:
Graphics Processing Units (GPUs) have emerged as the predominant hardware platforms for massively parallel computing. However, their inherent von Neumann architecture still suffers from performance inefficiency stemming from sequential instruction execution and frequent data transfer overheads within the memory system. These intrinsic architectural flaws impose heavy overheads in latency, area, and energy efficiency, rendering GPUs suboptimal for edge computing applications. To tackle these challenges, this paper introduces a novel circular Reconfigurable Parallel Processor (RPP) to enable massively parallel applications in edge computing with high efficiency. RPP features a novel circular array of reconfigurable compute engines, enabling efficient streaming dataflow processing. In contrast to traditional Coarse-Grained Reconfigurable Architectures (CGRAs), the circular network topology of RPP is formed by linear switch networks with an innovative gasket memory, which reduces complicated network routing overheads while allowing versatile datapath mapping and optimized data reuse. A dedicated hierarchical memory system is proposed to support different memory access patterns and address mapping strategies, enabling flexible data access with high memory efficiency. Several hardware optimizations are further introduced to improve hardware utilization and performance, such as concurrent kernel execution, register split & refill, and heterogeneous scalar & vector computing. To fully utilize the hardware capability of RPP, we develop an end-to-end software stack consisting of a compiler, a runtime environment, and different RPP libraries. This software stack is designed to be compatible with the GPGPU computing paradigm, enhancing its potential for broader adoption. Fabricated in a 14 nm process, RPP occupies an area of 119 mm² and operates at a maximum power of 15 W with a 1 GHz clock frequency.
From the runtime measurement of various workloads, RPP achieves up to 27.5 × higher ene...
Date of Conference: 29 June 2024 - 03 July 2024
Date Added to IEEE Xplore: 01 August 2024