Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers


Abstract:

Fine-tuned Transformer-based neural networks have demonstrated remarkable success in natural language processing (NLP) at the cost of a substantial computational burden. Post-training quantization (PTQ) is a promising technique for reducing this cost without expensive re-training, but prior works either demand complex calibration or suffer noticeable accuracy degradation. This paper proposes a practical method for sub-8-bit floating-point (FP) PTQ. The proposed method optimizes the exponent bias to minimize quantization error, measured as signal-to-quantization-noise ratio (SQNR), progressively in the manner of stochastic gradient descent. Our evaluation shows that the proposed method achieves accuracy close to that of the full-precision model for 6- to 8-bit FP PTQ of fine-tuned BERT on GLUE and SQuAD tasks, with negligible run-time overhead.
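
The abstract does not spell out the quantization format or the exact progressive search, so the sketch below is only an illustration under assumptions: a simple IEEE-style low-bit FP quantizer with a configurable exponent bias, an SQNR measure, and a brute-force per-tensor bias search standing in for the paper's SGD-like procedure. The function names (fp_quantize, sqnr_db, search_exponent_bias) and the format parameters (4 exponent bits, 3 mantissa bits) are hypothetical, not taken from the paper.

    import numpy as np

    def fp_quantize(x, exp_bits=4, man_bits=3, bias=7):
        # Simulate round-to-nearest quantization of x to a low-bit floating-point
        # format with the given exponent bias. Values below the smallest normal
        # are flushed to zero; the all-ones exponent code is reserved, IEEE-style.
        x = np.asarray(x, dtype=np.float64)
        sign = np.sign(x)
        mag = np.abs(x)

        e_min = 1 - bias                      # smallest normal exponent
        e_max = (2 ** exp_bits - 2) - bias    # largest usable exponent
        max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max

        # Per-element exponent of the input, clamped to the format's normal range.
        e = np.floor(np.log2(np.maximum(mag, np.finfo(np.float64).tiny)))
        e = np.clip(e, e_min, e_max)

        # Round the mantissa to man_bits fractional bits at that exponent,
        # then saturate overflow and flush underflow.
        step = 2.0 ** (e - man_bits)
        q = np.round(mag / step) * step
        q = np.where(mag < 2.0 ** e_min, 0.0, np.minimum(q, max_val))
        return sign * q

    def sqnr_db(x, x_q):
        # Signal-to-quantization-noise ratio in decibels.
        x = np.asarray(x, dtype=np.float64)
        noise = np.sum((x - x_q) ** 2) + 1e-30
        return 10.0 * np.log10(np.sum(x ** 2) / noise)

    def search_exponent_bias(x, exp_bits=4, man_bits=3):
        # Brute-force stand-in for the paper's progressive, SGD-like search:
        # pick the per-tensor exponent bias that maximizes SQNR.
        best_bias, best_sqnr = None, -np.inf
        for bias in range(0, 2 ** exp_bits + 16):
            s = sqnr_db(x, fp_quantize(x, exp_bits, man_bits, bias))
            if s > best_sqnr:
                best_bias, best_sqnr = bias, s
        return best_bias, best_sqnr

    # Example: choose a bias for one weight tensor of a hypothetical model.
    weights = np.random.randn(768, 768) * 0.05
    bias, sqnr = search_exponent_bias(weights, exp_bits=4, man_bits=3)
    print(f"best exponent bias = {bias}, SQNR = {sqnr:.1f} dB")

In such a scheme, the search would be run once per weight (and possibly activation) tensor of the fine-tuned model before deployment, and only the chosen per-tensor bias would be stored, which is consistent with the abstract's claim of negligible run-time overhead.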
Date of Conference: 13-15 June 2022
Date Added to IEEE Xplore: 05 September 2022
Conference Location: Incheon, Korea, Republic of