Architectural optimizations for a floating point multiply-accumulate unit in a graphics pipeline | IEEE Conference Publication | IEEE Xplore