Abstract:
Apache Kafka is a highly scalable distributed messaging system that provides high throughput with low latency. Many cloud vendors offer Kafka as a service for users who need a messaging system. Given a particular hardware environment, the first concern of users is how to configure Kafka properly. In this paper, we analyze the structure and workflow of Kafka and propose a queueing-based packet flow model to predict the performance metrics of Kafka cloud services. The input configuration parameters of the model include the number of brokers in the Kafka cluster, the number of partitions in a topic, and the batch size of messages. With this model, users can determine the impact of these configuration parameters on performance metrics, including producer throughput, relative payload and overhead, and the change in disk storage usage over time. We use queueing theory to evaluate the end-to-end latency of packets. In the experimental validation, we observe a strong correlation between packet size and packet send interval, and the service time of packets fits a phase-type distribution. The correlation and fitting results are used to set the essential constants in the model. Experiments are performed under various configurations to observe their effects on the performance metrics. The results show that our model achieves high accuracy in predicting throughput and latency.
Date of Conference: 10-12 August 2019
Date Added to IEEE Xplore: 03 October 2019