Elastic Scaling for Data Stream Processing


Abstract:

This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 25, Issue: 6, June 2014)
Page(s): 1447 - 1463
Date of Publication: 05 December 2013
