This paper introduces a methodology to optimize coarse-grained floating-point units (FPUs) in a hybrid field-programmable gate array (FPGA), where each FPU consists of a number of interconnected floating-point adders/subtracters (FAs), multipliers (FMs), and wordblocks (WBs). The wordblocks include registers and lookup tables (LUTs), which can implement fixed-point operations efficiently. We employ common subgraph extraction to determine the best mix of blocks within an FPU and study the tradeoffs among area, speed, and utilization over a set of floating-point benchmark circuits. We then explore the system-level impact of FPU density and flexibility in terms of area, speed, and routing resources. Finally, we derive an optimized coarse-grained FPU by considering both architectural and system-level issues. The proposed methodology can be used to evaluate a variety of FPU architecture optimizations. The results for the selected FPU architecture optimization show that, although high-density FPUs are slower, they offer improved area, area-delay product, and throughput.