Learning to Reliably Deliver Streaming Data with Apache Kafka | IEEE Conference Publication | IEEE Xplore

Abstract:

The rise of streaming data processing is driven by the mass deployment of sensors, the increasing popularity of mobile devices, and the rapid growth of online financial trading. Apache Kafka is often used as a real-time messaging system for stream processors. However, running Kafka efficiently as a reliable data source is challenging, especially for real-time processing over unstable network connections. We find that changing configuration parameters can significantly affect Kafka's message delivery guarantees. The key to solving this problem is therefore to predict the reliability of Kafka under various configurations and network conditions. We define two reliability metrics to be predicted: the probability of message loss and the probability of message duplication. Our prediction model uses artificial neural networks (ANN), with selected key configuration parameters and network metrics as the features. To collect sufficient training data for the model, we build a Kafka testbed based on Docker containers. With the neural network model, we can predict Kafka's reliability for different application scenarios under various network environments. Combining these predictions with other metrics that a streaming application user may care about, we propose a weighted key performance indicator (KPI) of Kafka for selecting proper configuration parameters. In the experiments, we propose a rough dynamic configuration scheme, which significantly improves reliability while guaranteeing message timeliness.
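
As an illustration of how a weighted KPI might be used to choose between configurations, here is a minimal sketch in Python. It is not the paper's formula, which the abstract does not give. The linear form of the KPI, the weights, the `timeliness` metric, the names `Candidate`, `weighted_kpi`, and `select_configuration`, and all numeric values are assumptions made up for this example. In practice the two probabilities would come from the trained prediction model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical Kafka configuration with its predicted metrics."""
    name: str
    p_loss: float       # predicted probability of message loss, in [0, 1]
    p_duplicate: float  # predicted probability of message duplication, in [0, 1]
    timeliness: float   # assumed fraction of messages delivered in time, in [0, 1]


def weighted_kpi(c, w_loss=0.5, w_dup=0.2, w_time=0.3):
    """Assumed linear KPI: a weighted sum of 'goodness' terms, higher is better."""
    return (w_loss * (1.0 - c.p_loss)
            + w_dup * (1.0 - c.p_duplicate)
            + w_time * c.timeliness)


def select_configuration(candidates, **weights):
    """Return the candidate with the highest KPI under the given weights."""
    return max(candidates, key=lambda c: weighted_kpi(c, **weights))


# Made-up predictions for two illustrative configurations.
candidates = [
    Candidate("at-most-once", p_loss=0.20, p_duplicate=0.00, timeliness=0.95),
    Candidate("at-least-once", p_loss=0.02, p_duplicate=0.10, timeliness=0.80),
]

best = select_configuration(candidates)
print(best.name)
```

Changing the weights changes the choice: a user who cares mostly about timeliness could call `select_configuration(candidates, w_loss=0.1, w_dup=0.1, w_time=0.8)` and would get the other candidate under these made-up numbers.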
Date of Conference: 29 June 2020 - 02 July 2020
Date Added to IEEE Xplore: 31 July 2020
ISBN Information:
Print on Demand(PoD) ISSN: 1530-0889
Conference Location: Valencia, Spain
