Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs | IEEE Conference Publication