Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead | IEEE Conference Publication | IEEE Xplore