This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to promote utilization by dedicating more resources to branches that execute more frequently. A new hardware compilation and flow control scheme is presented that enables the computation rate of different branches to be matched to the observed branch probabilities. We propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given a set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. The branch optimization design space is characterized in an experimental study, targeting Xilinx Virtex FPGAs, of two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area. For designs of equal area, branch-optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate, with relative error between 0.12 and 1.1 × 10⁻⁴.
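The premise of matching computation rates to branch probabilities can be illustrated with a simple bottleneck calculation. The sketch below is not the queuing network model proposed in the paper. It assumes, purely for illustration, that each basic block receives a fraction of the input tokens equal to its branch probability, that a block saturates when its arrival rate reaches its computation rate, and that area grows linearly with computation rate. The function names, probabilities, and budget are hypothetical.

```python
def sustainable_throughput(probs, rates):
    """Largest input rate at which no basic block is overloaded.

    Block i receives a fraction probs[i] of the input tokens and can
    process rates[i] tokens per cycle, so it saturates once
    input_rate * probs[i] reaches rates[i].
    """
    return min(r / p for p, r in zip(probs, rates) if p > 0)


def proportional_rates(probs, budget):
    """Split a total rate budget across blocks in proportion to probs."""
    total = sum(probs)
    return [budget * p / total for p in probs]


# Hypothetical observed branch probabilities and total rate budget
# (assumed here to stand in for a fixed area budget).
branch_probs = [0.7, 0.2, 0.1]
budget = 3.0

# Baseline: every branch gets the same computation rate.
uniform = [budget / len(branch_probs)] * len(branch_probs)
# Branch-matched: rates follow the observed probabilities.
matched = proportional_rates(branch_probs, budget)

uniform_tp = sustainable_throughput(branch_probs, uniform)
matched_tp = sustainable_throughput(branch_probs, matched)

print(f"uniform rates: {uniform_tp:.2f} tokens/cycle")
print(f"matched rates: {matched_tp:.2f} tokens/cycle")
```

Under these assumptions the uniform split is limited by the most frequently taken branch, while the proportional split loads every block equally, so the same budget sustains a higher input rate. The speedup obtained here depends entirely on the made-up probabilities and should not be read as a reproduction of the paper's measured results.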