A 28.5mW 2.8GFLOPS floating-point multifunction unit for handheld 3D graphics processors | IEEE Conference Publication | IEEE Xplore

Abstract:

A low-power, high-performance 4-way 32-bit floating-point multifunction unit is developed for handheld 3D graphics processors. It uses logarithmic arithmetic to unify matrix, vector, and elementary functions into a single arithmetic unit. Optimal designs of the logarithmic and antilogarithmic converters are presented. An adaptive number conversion scheme is proposed that reduces the total area by 15%. With this scheme, matrix-vector multiplication (MAT), cross-product, lerp, and the two-variable logarithm (log_x y) are newly unified with the other operations. The unit achieves 2-cycle throughput for MAT and single-cycle throughput for all other operations. It uses 451K transistors and achieves 2.8GFLOPS at 200MHz while consuming 28.5mW.
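As a quick consistency check on the headline figures, and assuming the 2.8GFLOPS rate is sustained on every cycle at 200MHz while drawing the quoted 28.5mW, they imply the following per-cycle throughput, energy per operation, and power efficiency. These ratios are derived here from the abstract's numbers and are not reported by the authors.

```latex
\frac{2.8\times10^{9}\ \text{FLOP/s}}{200\times10^{6}\ \text{cycle/s}} = 14\ \text{FLOP/cycle},\qquad
\frac{28.5\times10^{-3}\ \text{W}}{2.8\times10^{9}\ \text{FLOP/s}} \approx 10.2\ \text{pJ/FLOP},\qquad
\frac{2.8\ \text{GFLOPS}}{28.5\ \text{mW}} \approx 98\ \text{GFLOPS/W}
```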
Date of Conference: 12-14 November 2007
Date Added to IEEE Xplore: 07 January 2008
Conference Location: Jeju, Korea (South)

I. Introduction

Modern handheld graphics processing units (GPUs) require a variety of operations to produce realistic graphics effects [1]. In [2], a multifunction unit was proposed for this purpose. However, it was a fixed-point unit and did not support matrix-vector multiplication, which is required for the geometry transformations used frequently in 3D graphics. In this paper, a 4-way 32-bit floating-point (FLP) unified matrix, vector, and elementary function unit is proposed. It operates on FLP data to meet the standard API specification, which requires more than 24-bit FLP precision [1]. It unifies matrix-vector multiplication (MAT), vector multiplication, division, square root, multiply-add, lerp, dot-product, cross-product (CRS), and elementary functions, including trigonometric functions (TRGs), power (POW), and the two-variable logarithm (LOG), in a single 4-way arithmetic unit. MAT, CRS, lerp, and LOG are newly added to the operation set of [2] with little overhead. Although the unit operates on FLP data, it uses logarithmic arithmetic internally. This enables power- and area-efficient unification and single-cycle throughput with a maximum latency of 5 cycles for all operations except MAT, for which 2-cycle throughput and 6-cycle latency are achieved.
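The following minimal Python sketch, which is not the paper's design, illustrates why logarithmic arithmetic lets one datapath serve such a varied operation set. Once a magnitude is converted to log2 form, multiplication becomes addition, division becomes subtraction, square root becomes halving, and POW becomes a multiplication of the exponent. MAT, CRS, and lerp then reduce to such products combined by ordinary linear-domain additions. All helper names (`to_log`, `from_log`, `lmul`, and so on) are hypothetical. Exact `math.log2` and `2.0 ** x` stand in for the hardware logarithmic and antilogarithmic converters, signs and zeros are handled separately as an assumption, and the adaptive number conversion scheme is not modeled.

```python
import math

# Hypothetical converters: exact math.log2 / 2.0 ** x stand in for the
# paper's hardware logarithmic and antilogarithmic converters.
def to_log(x: float) -> tuple[float, float]:
    """Split a nonzero number into (sign, log2|x|)."""
    if x == 0.0:
        raise ValueError("log2(0) is undefined; zero needs a separate flag")
    return (-1.0 if x < 0 else 1.0), math.log2(abs(x))

def from_log(sign: float, lx: float) -> float:
    """Antilog conversion: rebuild sign * 2**lx."""
    return sign * 2.0 ** lx

def lmul(a: float, b: float) -> float:
    # Multiplication becomes addition of logarithms; zero is special-cased.
    if a == 0.0 or b == 0.0:
        return 0.0
    (sa, la), (sb, lb) = to_log(a), to_log(b)
    return from_log(sa * sb, la + lb)

def ldiv(a: float, b: float) -> float:
    # Division becomes subtraction of logarithms (b must be nonzero).
    if a == 0.0:
        return 0.0
    (sa, la), (sb, lb) = to_log(a), to_log(b)
    return from_log(sa * sb, la - lb)

def lsqrt(a: float) -> float:
    # Square root halves the logarithm (a right shift in fixed-point hardware).
    sa, la = to_log(a)
    if sa < 0:
        raise ValueError("negative input")
    return from_log(1.0, la / 2)

def lpow(x: float, y: float) -> float:
    # POW: x**y = 2**(y * log2 x), for x > 0.
    sx, lx = to_log(x)
    if sx < 0:
        raise ValueError("negative base")
    return from_log(1.0, y * lx)

def log_xy(x: float, y: float) -> float:
    # Two-variable LOG: log_x y = log2 y / log2 x, for positive y and x != 1.
    return to_log(y)[1] / to_log(x)[1]

def dot(u, v):
    # Products in the log domain, accumulation in the linear domain.
    return sum(lmul(a, b) for a, b in zip(u, v))

def mat_vec(m, v):
    # MAT: a 4x4 matrix times a 4-vector is four 4-element dot products.
    return [dot(row, v) for row in m]

def cross(a, b):
    # CRS: six products and three subtractions.
    ax, ay, az = a
    bx, by, bz = b
    return [lmul(ay, bz) - lmul(az, by),
            lmul(az, bx) - lmul(ax, bz),
            lmul(ax, by) - lmul(ay, bx)]

def lerp(a: float, b: float, t: float) -> float:
    # lerp: a + t * (b - a), one product and two additions.
    return a + lmul(t, b - a)
```

For example, `mat_vec([[1, 0, 0, 5], [0, 1, 0, 6], [0, 0, 1, 7], [0, 0, 0, 1]], [1, 2, 3, 1])` applies a translation to a homogeneous point and returns values within rounding error of `[6, 8, 10, 1]`. The design point the sketch tries to capture is that only the log and antilog conversions and an adder are shared across every operation; how the real unit schedules these on its four lanes is not described here.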

