By Topic

Exploiting Bit-Level Delay Calculations to Soften Read-After-Write Dependences in Behavioral Synthesis

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Rafael Ruiz-Sautua ; Complutense Univ. of Madrid, Madrid ; MarÍa C. Molina ; JosÉ M. Mendias

Conventional high-level synthesis (HLS) algorithms are very conservative when dealing with read-after-write (RAW) dependences, the execution of one operation is allowed once all its predecessors have been calculated. However, in the execution of arithmetic operations, some bits are required later than others, and some bits are produced earlier than others. This paper proposes a presynthesis optimization algorithm that relaxes RAW dependences, taking advantage of this feature for a more efficient HLS of data flow graphs formed by additions, multiplications, and logic operations. The presented preprocessor analyzes the critical path at bit granularity and splits the arithmetic operations into subword fragments. These fragments become the input to any regular HLS tool to speed up circuit execution times through scheduling in different cycles of the fragments obtained from the same original operation. This way, the execution of one operation may begin before the calculus of its predecessors has been completed. This becomes feasible when the execution of the predecessor has begun in the selected cycle or in a previous one, and even if it will finish in a posterior cycle. The experimental results that were carried out show that implementations obtained from the optimized specification are, on the average, 70% faster, with only slight variations in the data path area.

Published in:

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  (Volume:26 ,  Issue: 9 )