Efficient Routing for Cost Effective Scale-Out Data Architectures | IEEE Conference Publication | IEEE Xplore


Abstract:

In large scale-out data architectures, data are distributed and replicated across several machines. Queries and tasks submitted to such architectures are sent to a router, which determines the machines containing the requested data. Ideally, to reduce the overall cost of analytics, the router should return the smallest set of machines required to satisfy the query. Mathematically, this can be modeled as the set cover problem, which is NP-hard. Given the large number of queries arriving in real time, it is often impractical to compute a set cover for each incoming query to perform routing. In this paper, we propose a novel technique to speed up the routing of a large number of real-time queries while minimizing the number of machines that each query touches (the query span). We demonstrate that by analyzing the correlation between known queries and performing query clustering, we can reduce the set cover computation time, thereby significantly speeding up the routing of unknown queries. Experiments show that our incremental set cover-based routing is 2.5 times faster and returns on average 50% fewer machines per query when compared to repeated greedy set cover and baseline routing techniques.
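
To make the routing problem concrete, the following is a minimal sketch of the standard greedy set cover heuristic applied to a single query, which is the kind of per-query computation the abstract describes as a point of comparison. It is not the incremental, clustering-based technique proposed in the paper. The function name `greedy_route`, the `placement` mapping, and the machine and item names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_route(query: Iterable[Hashable],
                 placement: Dict[str, Set[Hashable]]) -> List[str]:
    """Pick machines that together hold every item in `query`.

    Standard greedy set cover: repeatedly choose the machine that holds
    the most still-uncovered items. `placement` (hypothetical) maps a
    machine name to the set of data items replicated on it.
    """
    uncovered = set(query)
    chosen: List[str] = []
    while uncovered:
        # Iterate machines in sorted order so ties break deterministically
        # (max returns the first machine with the largest gain).
        best = max(sorted(placement),
                   key=lambda m: len(placement[m] & uncovered))
        gained = placement[best] & uncovered
        if not gained:
            # No machine holds any remaining item: the query cannot be served.
            missing = sorted(map(str, uncovered))
            raise ValueError(f"no machine holds: {missing}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical placement of four data items across four machines.
placement = {
    "m1": {"a", "b"},
    "m2": {"b", "c", "d"},
    "m3": {"a", "d"},
    "m4": {"c"},
}

span = greedy_route({"a", "b", "c", "d"}, placement)
print(span, "query span =", len(span))  # ['m2', 'm1'] query span = 2
```

Here the query span is 2 machines, whereas a naive router that contacts every machine holding any requested item would touch all 4. Running this loop from scratch for every incoming query is the repeated cost that the abstract describes as impractical at high query rates.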
Date of Conference: 19-21 September 2016
Date Added to IEEE Xplore: 08 December 2016
ISBN Information:
Electronic ISSN: 2375-0227
Conference Location: London, UK
