By Topic

Dache: A data aware caching for big-data applications using the MapReduce framework

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
3 Author(s)
Zhao, Yaxiong ; Google Inc., Mountain View, CA 94043, USA; Temple University, Philadelphia, PA 19122, USA ; Wu, Jie ; Liu, Cong

The buzz-word big-data refers to the large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the defacto software systems for big-data applications. An observation of the MapReduce framework is that the framework generates a large amount of intermediate data. Such abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize them. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.

Published in:

Tsinghua Science and Technology  (Volume:19 ,  Issue: 1 )