This paper presents an architecture that is well matched to DSP system workloads, enables high throughput and high energy efficiency, and is well suited to advancing VLSI fabrication technologies. The processing systems consist of large numbers of simple, uniform, programmable processing elements that communicate asynchronously through a configurable 2D mesh network connecting adjacent processors at full clock rate. Early estimates predict an area of 0.15 mm² per processor in 0.13 μm CMOS. Results from mapping a 16-tap FIR filter onto 85 design configurations show a factor-of-9 variation in throughput per processor and validate the efficiency of the proposed processor granularity.
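To make the idea of mapping a 16-tap FIR filter onto a variable number of simple processors more concrete, here is a minimal functional sketch. It is not the paper's mapping or its processor model: the names `ProcessingElement`, `split_taps`, and `run_chain` are hypothetical, the processors are arranged in a simple linear chain rather than a 2D mesh, and timing, asynchronous communication, and throughput are not modeled. It only illustrates that the same filter can be divided among 1, 2, 4, 8, or 16 elements, each holding a slice of the taps and passing samples and partial sums to its neighbor.

```python
from typing import List, Sequence, Tuple


def fir_reference(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter on a single processor: y[n] = sum_k h[k] * x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


class ProcessingElement:
    """Hypothetical simple processor holding a contiguous slice of the taps."""

    def __init__(self, taps: Sequence[float]) -> None:
        self.taps = list(taps)
        self.delay = [0.0] * len(self.taps)  # local delay line, newest first

    def step(self, sample_in: float, partial_in: float) -> Tuple[float, float]:
        # Shift the incoming sample in; the oldest sample leaves for the neighbor.
        self.delay.insert(0, sample_in)
        sample_out = self.delay.pop()
        # Add this element's share of the dot product to the running partial sum.
        partial_out = partial_in + sum(t * d for t, d in zip(self.taps, self.delay))
        return sample_out, partial_out


def split_taps(h: Sequence[float], num_pes: int) -> List[List[float]]:
    """Divide the taps into num_pes contiguous slices of near-equal size."""
    if not 1 <= num_pes <= len(h):
        raise ValueError("num_pes must be between 1 and the number of taps")
    base, extra = divmod(len(h), num_pes)
    chunks, start = [], 0
    for i in range(num_pes):
        size = base + (1 if i < extra else 0)
        chunks.append(list(h[start:start + size]))
        start += size
    return chunks


def run_chain(x: Sequence[float], h: Sequence[float], num_pes: int) -> List[float]:
    """Filter x by passing samples and partial sums along a chain of elements."""
    pes = [ProcessingElement(chunk) for chunk in split_taps(h, num_pes)]
    y = []
    for sample in x:
        partial = 0.0
        for pe in pes:
            sample, partial = pe.step(sample, partial)
        y.append(partial)
    return y
```

Calling `run_chain(x, h, num_pes)` with a 16-element `h` and different values of `num_pes` produces the same output as `fir_reference(x, h)`; what changes between configurations is only how much work each element does per sample, which is the kind of trade-off the 85 design configurations explore.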
Published in: Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on (Volume: 2)
Date of Conference: 9-12 Nov. 2003