
Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs



Abstract:

Despite foreseeing tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has hardly been showcased on general-purpose processors such as CPUs and GPUs. In fact, because their word-based architecture cannot leverage bit-level parallelism, GPUs have been criticized for extremely low utilization (1 percent) when executing BNNs. Consequently, the latest tensorcores in NVIDIA Turing GPUs have started to experimentally support bit computation. In this article, we look into this brand new bit computation capability and characterize its unique features. We show that the stride of memory access can significantly affect performance delivery, and that a data-format co-design is highly desired for the tensorcores to achieve superior performance over existing software solutions that do not use tensorcores. We realize the tensorcore-accelerated BNN design, particularly the major functions for fully-connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design can process ImageNet at a rate of 5.6K images per second, 77 percent faster than the state-of-the-art. Our BNN approach is released at https://github.com/pnnl/TCBNN.
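
For readers unfamiliar with the bit computation mode of the Turing tensorcores, the sketch below shows how a single 8x8x128 bit-matrix-multiply-accumulate is expressed through CUDA's experimental WMMA API (compute capability 7.5 or newer). It is only a minimal illustration of the hardware capability, not the tuned BTC-BNN kernels of this article; the kernel name and the dense 128-bit leading dimension are assumptions made for the example.

    // Minimal sketch of Turing's single-bit tensor core path via the CUDA WMMA
    // experimental API (requires sm_75+). One warp computes an 8x8 int tile
    // D = popc(A xor B) + C over a 128-bit reduction dimension. A is row-major
    // and B is column-major, both bit-packed into 32-bit words.
    #include <mma.h>
    using namespace nvcuda;
    namespace bexp = nvcuda::wmma::experimental;

    __global__ void bmma_8x8x128(const unsigned* A, const unsigned* B, int* C)
    {
        wmma::fragment<wmma::matrix_a, 8, 8, 128, bexp::precision::b1, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 8, 8, 128, bexp::precision::b1, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

        wmma::fill_fragment(c_frag, 0);
        wmma::load_matrix_sync(a_frag, A, 128);   // leading dimension given in bits
        wmma::load_matrix_sync(b_frag, B, 128);
        // XOR the bit rows/columns, then accumulate with population count.
        wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag,
                        bexp::bmmaBitOpXOR, bexp::bmmaAccumulateOpPOPC);
        wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
    }

Because the hardware accumulates popc(A xor B), a +1/-1 dot product over n bits is recovered afterwards as n - 2*D.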
Published in: IEEE Transactions on Parallel and Distributed Systems ( Volume: 32, Issue: 7, 01 July 2021)
Page(s): 1878 - 1891
Date of Publication: 22 December 2020


Funding Agency:

High-Performance Computing Group, Pacific Northwest National Laboratory (PNNL), Richland, WA, USA
U.S. Army Research Laboratory (ARL), DoD Supercomputing Resource Center, Aberdeen Proving Ground, MD, USA

1 Introduction

The binarized neural network (BNN) [1], [2], [3] is an alternative type of deep neural network (DNN). Compared with general DNNs, such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), the major difference is that a BNN uses a single bit to represent each entry of the input and weight matrices. BNNs evolved from DNNs through the binarized-weight network (BWN) [4]. It was first observed that if the weight matrix is binarized to +1 and -1, the floating-point (FP) multiplications reduce to additions (i.e., multiplication by +1) and subtractions (i.e., multiplication by -1). It was later observed that if the input matrix is binarized as well, even the FP additions and subtractions in the BWN reduce to logical operations (i.e., xnor for the bit dot-product and popc for bit accumulation) [1], [2], [3].
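
As a concrete illustration of this reduction, the CUDA sketch below computes the dot product of two {+1, -1} vectors that have been bit-packed into 32-bit words (bit 1 encoding +1, bit 0 encoding -1). The kernel name and packing convention are assumptions made for the example, not the implementation described later in this article.

    // Dot product of two bit-packed {+1,-1} vectors without any FP arithmetic:
    // matching bits contribute +1 and mismatching bits contribute -1, so each
    // 32-bit word contributes 32 - 2 * popc(a xor w) (equivalently, xnor + popc).
    #include <cstdint>

    __global__ void bin_dot(const uint32_t* a, const uint32_t* w, int nWords, int* out)
    {
        int acc = 0;
        for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < nWords;
             i += gridDim.x * blockDim.x) {
            acc += 32 - 2 * __popc(a[i] ^ w[i]);
        }
        atomicAdd(out, acc);   // combine per-thread partial sums
    }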


