Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions | IEEE Conference Publication | IEEE Xplore