Performance optimization with scalable reconfigurable computing systems

Authors: S. Sangireddy (Dept. of Electr. Eng., Texas Univ., Richardson, TX, USA); P. Rajamani; S. Gaddam

The total time to execute an application, the energy consumed, and the flexibility to handle a large set of applications are among the most important parameters used to measure the quality of a computing system. Architectures with flexible reconfigurable arrays enable innovation beyond the limits of traditional silicon. Incorporating on-chip reconfigurable computing elements generally improves execution time. However, the energy consumed to deliver the required level of performance is also an important consideration, particularly for prolonging battery life in portable and mobile devices. In this paper, we propose and design a novel scalable array architecture and explore the performance and energy trade-offs for various applications by scaling system parameters such as hardware resources, operational granularity, and supply voltage. The scalable coprocessor that maps the discrete cosine transform (DCT) is implemented with 8 taps, occupying an area of 0.0024 mm² in 0.18 μm technology. The coprocessor for a 16-tap convolution function occupies 0.0099 mm², while a 256-tap convolution function is designed at an area cost of 0.1585 mm². When the MPEG decode application runs on the proposed architecture, with the DCT computed in the scalable coprocessor, the total execution time falls to about 24%, and the energy consumed to about 28%, of the corresponding values for the base architecture without a coprocessor. Further, as the coprocessor's supply voltage is scaled down from 1.8 V to 1.0 V in 0.18 μm technology, the relative total execution time changes only slightly (from 23.65% to 24.78%), while the relative energy consumed drops considerably (from 28.12% to 23.8%).
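The energy benefit of lowering the coprocessor's supply voltage can be sanity-checked with the first-order CMOS relation that dynamic switching energy per operation scales with C·V². The sketch below applies only that textbook relation; it is an assumption for illustration, not the energy model used in the paper, and the helper name `dynamic_energy_ratio` is hypothetical.

```python
def dynamic_energy_ratio(v_new: float, v_ref: float) -> float:
    """First-order CMOS model (assumed): dynamic energy per operation ~ C * V**2.

    With the switched capacitance C held fixed, the ratio of energies at two
    supply voltages is (v_new / v_ref) ** 2.
    """
    if v_new <= 0 or v_ref <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_new / v_ref) ** 2


# Supply range quoted in the abstract: 1.8 V scaled down to 1.0 V.
ratio = dynamic_energy_ratio(1.0, 1.8)
print(f"coprocessor dynamic energy per op at 1.0 V: {ratio:.3f} of the 1.8 V value")
```

Under this simplified model the coprocessor's own dynamic energy per operation falls to roughly 31% of its 1.8 V value. The reported system-level figure moves far less (28.12% to 23.8%), which is consistent with the coprocessor being only one contributor to total energy; leakage and the rest of the system are outside this sketch.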
For the FIR application, energy consumption is reduced by up to 36% when hardware resources are scaled and by up to a further 12% when the supply voltage is scaled, while execution time is reduced by up to 50% when hardware resources are scaled and increases by up to 15% when the voltage is scaled. The study also reveals distinct performance patterns for applications such as CJPEG, MPEG decode/encode, FIR, and IIR, depending on their characteristics.
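For readers unfamiliar with the "tap" terminology used above: an N-tap FIR filter (a discrete convolution with N coefficients) computes y[n] = Σ h[k]·x[n−k] for k = 0…N−1, i.e. N multiply-accumulates per output sample. The direct-form software sketch below only illustrates that definition; it is not the coprocessor's hardware mapping, and `fir_filter` is a hypothetical name. Notably, the quoted areas scale almost exactly with tap count: 256/16 = 16, and 0.1585 mm² / 0.0099 mm² ≈ 16.0.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form N-tap FIR: y[n] = sum_{k=0}^{N-1} taps[k] * x[n - k].

    Samples before the start of the input (n - k < 0) are treated as zero,
    so the output has the same length as the input.
    """
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # One multiply-accumulate per tap: N MACs per output sample.
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


# A 2-tap moving average over a short ramp.
print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

Feeding a unit impulse returns the tap coefficients themselves, which is a quick way to check any FIR implementation.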

Published in: 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID'06)

Date of Conference: 3-7 Jan. 2006