Skip to Main Content
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed- memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.