Performance drawbacks for matrix multiplication using set associative cache in GPU devices | IEEE Conference Publication | IEEE Xplore