Advanced architecture optimisation and performance analysis of a reconfigurable grid ALU processor

Author(s): S. Uhrig (Robotics Research Institute, Technical University of Dortmund (TU Dortmund), Dortmund, Germany); R. Jahr; T. Ungerer

In the billion-transistor era, only a few architectural approaches propose new ways to improve the execution of conventional sequential instruction streams. Many legacy applications could profit from processors that speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid arithmetic logic unit (ALU) Processor (GAP) accelerates conventional sequential instruction streams without the need for recompilation. The GAP comprises a processor front-end similar to that of a superscalar processor, extended by a configuration unit, and a two-dimensional array of functional units that forms the execution unit. The configuration unit dynamically maps instruction sequences onto the array so that they form the dataflow graph of the sequence. This study evaluates the performance of the GAP architecture with different array dimensions, as well as its performance with a simplified interconnection network. With a complete crossbar interconnect between two array rows, GAP outperforms an out-of-order superscalar processor by up to a factor of 2. Reducing the interconnection network to the minimum costs at most 10% in performance, and that worst case occurs only for one particular configuration and a single benchmark. In general, the slowdown is less than 2% for the minimum interconnect (two buses) and about 0.02% if three interconnection buses are used.
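
The idea of laying a sequential instruction stream out as a dataflow graph in a grid can be illustrated with a toy sketch. This is not the GAP's configuration algorithm, which the abstract does not specify. The `Instr` type, the `place_in_rows` function, and the register names are hypothetical, and the sketch assumes that each instruction occupies one row below the latest producer of its operands and that the array is unbounded in width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest = op(srcs)."""
    dest: str
    srcs: tuple
    op: str = "add"


def place_in_rows(instrs):
    """Toy placement: put each instruction in the earliest row that lies
    below every instruction producing one of its source operands.

    Assumptions (not taken from the paper): unbounded row width, one
    result per instruction, and only true data dependences matter.
    """
    row_of_reg = {}  # register -> row of the instruction that last wrote it
    rows = []
    for ins in instrs:
        row = 0
        for src in ins.srcs:
            if src in row_of_reg:
                # Must sit at least one row below the producer of src.
                row = max(row, row_of_reg[src] + 1)
        while len(rows) <= row:
            rows.append([])
        rows[row].append(ins)
        row_of_reg[ins.dest] = row
    return rows


# Four sequential instructions; the third is independent of the first two.
program = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),
    Instr("r6", ("r2", "r2")),
    Instr("r7", ("r4", "r6")),
]

for depth, row in enumerate(place_in_rows(program)):
    print(depth, [ins.dest for ins in row])
```

In this example the four instructions occupy three rows, because the independent third instruction shares the first row with the first one. A real array has a fixed width and a restricted interconnect between rows, which this sketch does not model.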

Published in: IET Computers & Digital Techniques (Volume 6, Issue 5)