New 3G wireless algorithms demand more performance than current embedded processors can provide. ASICs deliver the necessary performance but are costly to design and sacrifice generality. This paper introduces a clustered VLIW coprocessor approach that organizes execution and storage resources differently from a traditional general-purpose processor or DSP. The coprocessor's execution units are grouped into clusters and embedded in a rich set of communication resources. A wide-word, horizontal microcode program exercises fine-grained control over these resources. The advantages of this approach are quantified on a suite of six algorithms drawn from both traditional DSP applications and the new 3G cellular telephony domain. The result is surprising: the execution clusters retain much of the generality of a conventional processor while improving performance by one to two orders of magnitude and reducing energy-delay by three to four orders of magnitude compared with a conventional embedded processor such as the Intel XScale.
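The energy-delay figure combines two effects. As a worked illustration, assume "energy-delay" denotes the usual energy-delay product (energy per task multiplied by execution time). Under that assumption, the improvement factor splits into an energy ratio times the speedup. The subscript labels below are hypothetical, with "XScale" for the baseline processor and "cop" for the coprocessor:

$$
\frac{ED_{\text{XScale}}}{ED_{\text{cop}}}
= \frac{E_{\text{XScale}}\,T_{\text{XScale}}}{E_{\text{cop}}\,T_{\text{cop}}}
= \frac{E_{\text{XScale}}}{E_{\text{cop}}}\cdot S,
\qquad S = \frac{T_{\text{XScale}}}{T_{\text{cop}}}
$$

The speedup S accounts for only one to two orders of magnitude, so the rest of the reported energy-delay gain must come from lower energy per task. For example, a hypothetical 100x speedup paired with a 10^4 energy-delay reduction would imply a 100x reduction in energy per task.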