Traditional enterprise software is built around dedicated high-performance infrastructure, and it cannot be moved directly onto an infrastructure cloud without a significant loss of performance. Although MapReduce is a promising approach, it lacks building blocks that enable high-performance optimization, especially on shared infrastructure. Building on our previous work, we introduce another building block, the block level operator (BLO), and show how it can be applied to a real enterprise application: finding the medians in a large data set. We propose two efficient approaches to computing medians, one using MapReduce and the other using the BLO. We compare these two approaches with each other and with the traditional enterprise software stack, and show that the BLO-based approach yields an order-of-magnitude improvement.
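To make the MapReduce side of the comparison concrete, here is a minimal, hypothetical Python sketch that simulates the map, shuffle, and reduce phases in a single process. It assumes that "finding the medians" means computing one exact median per key over (key, value) records. The record format and all function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_medians`) are illustrative assumptions. This is not the authors' MapReduce algorithm, and it does not model the BLO.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record format: (grouping key, numeric value).
Record = Tuple[str, float]


def map_phase(records: Iterable[Record]) -> Iterable[Tuple[str, float]]:
    # Mapper: emit each value under its grouping key (an identity map here).
    for key, value in records:
        yield key, value


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    # Shuffle: collect all values that share a key, as the framework would.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def median(values: List[float]) -> float:
    # Exact median: the middle element, or the mean of the two middle elements.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    # Reducer: compute one exact median per key.
    return {key: median(vals) for key, vals in groups.items()}


def mapreduce_medians(records: Iterable[Record]) -> Dict[str, float]:
    # Chain the three phases in a single process.
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    data = [("a", 3.0), ("b", 10.0), ("a", 1.0), ("a", 2.0), ("b", 4.0)]
    print(mapreduce_medians(data))  # {'a': 2.0, 'b': 7.0}
```

In this naive version, each reducer must hold and sort every value for its key. That cost grows with the size of the largest group, which shows why computing medians over a large data set is a useful benchmark for optimization.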
Published in:
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Date of Conference: Aug. 31 2009-Sept. 4 2009