Compiler algorithms for optimizing locality and parallelism on shared and distributed memory machines | IEEE Conference Publication