We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures such as multi-core CPUs, GPUs, or FPGAs. Scientists can thus focus on their domain of expertise, describing algorithms generically without needing knowledge of specific hardware architectures, programming languages, APIs, or tool flows. We present a prototype implementation that transforms generic descriptions of algorithms with intensive array-type data access into highly optimized code for GPU and multi-GPU cluster systems. We evaluate the approach on an example from computational nanophotonics and show that our current tool flow generates efficient code that achieves speedups of up to 15.3x on a single GPU and 35.9x on a multi-GPU setup compared to a reference CPU implementation.