This paper presents a new design and implementation of the MapReduce runtime system for heterogeneous multicore processors with explicitly managed local memories. We advance the state of the art in runtime support for MapReduce using five instruments: (1) A new multi-threaded, event-driven controller for task instantiation, task scheduling, synchronization, and bulk-synchronous execution of MapReduce stages. The controller improves utilization of control-efficient cores, minimizes control overhead in the runtime system, and overlaps task instantiation with task scheduling on compute-efficient cores. (2) An implicit partitioning scheme that eliminates redundant memory copies. (3) An adaptive memory management scheme that combines efficient memory preallocation, for applications whose output volume is statically known, with dynamic allocation using runahead tasks, for applications whose output volume is not statically known. (4) An optimized quick-sort/merge-sort scheme that reduces the critical path length of merge-sort. (5) An optimized execution scheme that avoids redundant data transfers to and from local stores in applications that emit keys with the same value. Taken together, these techniques accelerate representative MapReduce workloads by 1.81x (geometric mean) compared to a reference design that represents the state of the art.
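To make instrument (4) concrete, the general shape of a combined quick-sort/merge-sort stage can be sketched as follows. This is only an illustration under stated assumptions, not the scheme developed in the paper: the abstract does not say how the critical path is shortened, so the sketch shows one common approach, in which each worker sorts its own chunk independently and the sorted runs are then merged pairwise in a tree, so that the number of dependent merge rounds grows with the logarithm of the number of runs. The function names `sort_chunks` and `tree_merge` are hypothetical, Python's built-in `sorted` stands in for a per-core quick-sort, and the code runs sequentially where a real runtime would dispatch each chunk and each merge to a compute-efficient core.

```python
from heapq import merge
from typing import List, Tuple


def sort_chunks(data: List[int], num_workers: int) -> List[List[int]]:
    """Split data into at most num_workers chunks and sort each independently.

    Assumption: sorted() stands in for a quick-sort running on one core.
    """
    chunk = max(1, -(-len(data) // num_workers))  # ceiling division
    return [sorted(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def tree_merge(runs: List[List[int]]) -> Tuple[List[int], int]:
    """Merge sorted runs pairwise; return the merged list and the round count.

    All merges within one round are independent of each other, so the
    number of rounds approximates the length of the dependent merge chain.
    """
    rounds = 0
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_runs.append(list(merge(runs[i], runs[i + 1])))
            else:
                next_runs.append(runs[i])  # odd run out, carried forward
        runs = next_runs
        rounds += 1
    return (runs[0] if runs else []), rounds


if __name__ == "__main__":
    runs = sort_chunks([9, 3, 7, 1, 8, 2, 6, 4], num_workers=4)
    merged, rounds = tree_merge(runs)
    print(merged, rounds)
```

With four sorted runs, the tree needs two merge rounds, whereas folding the runs into a single accumulator one after another would need three dependent merges; the gap widens as the number of runs grows.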