A 28.5mW 2.8GFLOPS floating-point multifunction unit for handheld 3D graphics processors

Abstract:

A low-power, high-performance 4-way 32-bit floating-point multifunction unit is developed for handheld 3D graphics processors. It uses logarithmic arithmetic to unify matrix, vector, and elementary functions into a single arithmetic unit. The optimal designs of logarithmic and antilogarithmic converters are presented. An adaptive number conversion scheme is proposed that reduces total area by 15%. With this scheme, matrix-vector multiplication (MAT), cross-product, lerp, and logarithm (log_x y with two variables) are newly unified with the other operations. The unit achieves 2-cycle throughput for MAT and single-cycle throughput for all other operations. It takes 451K transistors and achieves 2.8 GFLOPS at 200 MHz with 28.5 mW power consumption.
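The headline figures in the abstract imply a few derived quantities that are easy to check by hand. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative, and the abstract does not state how the floating-point operations are counted per cycle, so only the ratios are computed here.

```python
# Figures quoted in the abstract.
throughput_flops = 2.8e9   # 2.8 GFLOPS
clock_hz = 200e6           # 200 MHz
power_w = 28.5e-3          # 28.5 mW

# Derived quantities (not stated in the abstract; simple ratios only).
flops_per_cycle = throughput_flops / clock_hz          # operations per clock
gflops_per_watt = throughput_flops / power_w / 1e9     # energy efficiency
picojoules_per_flop = power_w / throughput_flops * 1e12

print(f"FLOPs per cycle:  {flops_per_cycle:.1f}")
print(f"GFLOPS per watt:  {gflops_per_watt:.1f}")
print(f"pJ per FLOP:      {picojoules_per_flop:.2f}")
```

The ratios come out to 14 floating-point operations per cycle, roughly 98 GFLOPS/W, and roughly 10 pJ per operation.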

Published in: 2007 IEEE Asian Solid-State Circuits Conference

Date of Conference: 12-14 November 2007

Date Added to IEEE Xplore: 07 January 2008

DOI: 10.1109/ASSCC.2007.4425709

Conference Location: Jeju, Korea (South)

I. Introduction

Modern handheld graphics processing units (GPUs) require a variety of operations to achieve realistic graphics effects [1]. A multifunction unit was proposed for this purpose in [2]. However, it was a fixed-point unit and did not support matrix-vector multiplication, which is required for the frequently used geometry transformation in 3D graphics. In this paper, a 4-way 32-bit floating-point (FLP) unified matrix, vector, and elementary function unit is proposed. It operates on FLP data to meet the specification of the standard API, which requires more than 24-bit FLP precision [1]. It unifies matrix-vector multiplication (MAT), vector multiplication, division, square root, multiply-add, lerp, dot-product, cross-product (CRS), and elementary functions including trigonometric functions (TRGs), power (POW), and logarithm (LOG) with two variables in a single 4-way arithmetic unit. The MAT, CRS, lerp, and LOG operations are newly added to the previous operation set [2] with little overhead. Although the unit operates on FLP data, it uses logarithmic arithmetic internally. In this way, it achieves power- and area-efficient unification and single-cycle throughput with a maximum 5-cycle latency for all operations except MAT, for which 2-cycle throughput and 6-cycle latency are achieved.
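To illustrate why logarithmic arithmetic helps unify these operations, the sketch below shows the standard identities that turn multiplication, division, square root, power, and two-variable logarithm into simpler operations on base-2 logarithms. It is not the paper's datapath: the function names are hypothetical, the conversions use exact library calls rather than the hardware converters the paper describes, and only positive operands are handled.

```python
import math

def to_log(x: float) -> float:
    """Logarithmic conversion: linear value -> log2 domain (positive x only)."""
    if x <= 0.0:
        raise ValueError("this sketch handles positive operands only")
    return math.log2(x)

def from_log(lx: float) -> float:
    """Antilogarithmic conversion: log2 domain -> linear value."""
    return 2.0 ** lx

def lns_mul(a: float, b: float) -> float:
    # Multiplication becomes addition of logarithms.
    return from_log(to_log(a) + to_log(b))

def lns_div(a: float, b: float) -> float:
    # Division becomes subtraction of logarithms.
    return from_log(to_log(a) - to_log(b))

def lns_sqrt(a: float) -> float:
    # Square root becomes halving the logarithm (a shift in hardware).
    return from_log(0.5 * to_log(a))

def lns_pow(x: float, y: float) -> float:
    # x**y becomes a multiplication of y by log2(x).
    return from_log(y * to_log(x))

def lns_log(x: float, y: float) -> float:
    # log_x(y) = log2(y) / log2(x); undefined for x == 1.
    return to_log(y) / to_log(x)

print(lns_mul(3.0, 4.0), lns_div(10.0, 4.0), lns_sqrt(16.0))
print(lns_pow(2.0, 10.0), lns_log(2.0, 8.0))
```

Because each operation reduces to an add, subtract, shift, or multiply between the same pair of converters, one shared set of converters can in principle serve all of them, which is the kind of unification the introduction describes.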
