An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation | IEEE Conference Publication