We present a Hierarchical MapReduce framework that aggregates computing resources from multiple clusters and runs MapReduce jobs across them. Applications implemented in this framework follow the Map-Reduce-Global Reduce model, in which a computation is expressed as three functions: Map, Reduce, and Global Reduce. We introduce two scheduling algorithms: Compute Capacity Aware Scheduling for compute-intensive jobs and Data Location Aware Scheduling for data-intensive jobs. Experimental evaluations using AutoDock, a molecular binding prediction tool, and grep demonstrate promising results for our framework.
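To make the three-function model concrete, the Python sketch below runs a grep-style match count as a Map-Reduce-Global Reduce job over two clusters. Each cluster runs an ordinary local Map and Reduce on its share of the input, and a Global Reduce step then merges the per-cluster partial results. The input split is a stand-in for Compute Capacity Aware Scheduling under an assumed rule: each cluster's share is proportional to a single capacity number, such as its available cores. That rule is an illustration only, not the paper's algorithm. All function names, cluster names, and capacity values here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def split_by_capacity(items: List, capacities: Dict[str, int]) -> Dict[str, List]:
    """Hypothetical capacity-aware split (assumption, not the paper's scheduler):
    each cluster gets a contiguous slice of the input proportional to its
    capacity. Capacities must be positive; the last cluster takes any remainder."""
    total = sum(capacities.values())
    names = list(capacities)
    shares: Dict[str, List] = {}
    start = 0
    for i, name in enumerate(names):
        end = len(items) if i == len(names) - 1 else start + len(items) * capacities[name] // total
        shares[name] = items[start:end]
        start = end
    return shares


def local_mapreduce(records: List, map_fn: Callable, reduce_fn: Callable) -> Dict:
    """Ordinary MapReduce within one cluster: map, group by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def global_reduce_job(items: List, capacities: Dict[str, int],
                      map_fn: Callable, reduce_fn: Callable,
                      global_reduce_fn: Callable) -> Dict:
    """Run local MapReduce on every cluster, then merge per-key partial
    results from all clusters with the Global Reduce function."""
    partial = defaultdict(list)
    for cluster, share in split_by_capacity(items, capacities).items():
        for key, value in local_mapreduce(share, map_fn, reduce_fn).items():
            partial[key].append(value)
    return {key: global_reduce_fn(key, values) for key, values in partial.items()}


# Hypothetical grep workload: count lines containing a pattern.
lines = ["error: disk", "ok", "error: net", "warn", "error: cpu"]
caps = {"clusterA": 8, "clusterB": 4}  # illustrative capacities, e.g. cores
pattern = "error"

result = global_reduce_job(
    lines, caps,
    map_fn=lambda line: [(pattern, 1)] if pattern in line else [],
    reduce_fn=lambda key, values: sum(values),         # per-cluster count
    global_reduce_fn=lambda key, values: sum(values),  # cross-cluster total
)
print(result)  # {'error': 3}
```

With these capacities, clusterA receives the first three lines and clusterB the last two. The clusters report partial counts of 2 and 1, and Global Reduce sums them to 3. For grep the Global Reduce function happens to match Reduce, but the model keeps them separate so that the cross-cluster merge can differ from the in-cluster aggregation.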