Combining optimization for cache and instruction-level parallelism | IEEE Conference Publication | IEEE Xplore