Performance Evaluation of Apache Kafka – A Modern Platform for Real Time Data Streaming

2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM). Pub Date: 2022-02-23. DOI: 10.1109/iciptm54933.2022.9754154

Shubham Vyas, R. Tyagi, Charu Jain, Shashank Sahu

Pages: 465-470

Citations: 5

Abstract

Modern businesses increasingly demand timely availability of data. Many real-time data streaming tools and technologies can meet these expectations. Apache Kafka is one such open-source, distributed, scalable technology that enables real-time data streaming with good throughput and latency. In traditional batch processing, data is processed in groups or batches, whereas in streaming services data records are handled individually and processing is continuous and real-time. Once data is available at the source, Kafka can detect it and stream it in real time to the target application. The literature survey showed that too few experiments have been conducted so far across a variety of volumes and with different numbers of partitions and polling intervals. The purpose of this study is to describe an Apache Kafka implementation and evaluate its performance. The study analyses key performance indicators for the streaming platform and draws useful insights from them. These insights will help in designing optimized applications on Apache Kafka. Based on the gaps identified in the literature survey, multiple experiments were conducted with the producer and consumer APIs (Application Programming Interfaces). Kafka was configured with Apache ZooKeeper to obtain the results, which are captured in tabular form for different values of polling interval, volume, and number of partitions. Data from all test runs were analysed further to derive the conclusions presented in the results section. This study provides valuable insights into CPU (Central Processing Unit) and memory utilization for Apache Kafka streaming under changing volumes, and also describes the impact on streaming performance when key configurations are changed.
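As a rough illustration of the experiment design described above, the following minimal Python sketch enumerates one test run for every combination of volume, partition count, and polling interval, and prints the resulting grid as a table. It is only a sketch under stated assumptions: the abstract does not give the values actually used, so the lists below are hypothetical placeholders, and the function names `build_test_matrix` and `format_table` are invented for this example. No Kafka client is involved; the code only lays out the configurations that a producer and consumer test harness would iterate over.

```python
from itertools import product

# Hypothetical values: the abstract does not state the settings actually used.
VOLUMES = [1_000, 10_000, 100_000]      # records per run (assumed)
PARTITIONS = [1, 3, 6]                  # topic partition counts (assumed)
POLL_INTERVALS_MS = [100, 500, 1_000]   # consumer polling intervals (assumed)


def build_test_matrix(volumes, partitions, poll_intervals_ms):
    """Return one dict per test run, covering every combination of settings."""
    return [
        {"volume": v, "partitions": p, "poll_interval_ms": i}
        for v, p, i in product(volumes, partitions, poll_intervals_ms)
    ]


def format_table(runs):
    """Render the runs as a plain-text table with one row per configuration."""
    header = f"{'run':>3}  {'volume':>8}  {'partitions':>10}  {'poll_ms':>7}"
    rows = [
        f"{n:>3}  {r['volume']:>8}  {r['partitions']:>10}  {r['poll_interval_ms']:>7}"
        for n, r in enumerate(runs, start=1)
    ]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    print(format_table(build_test_matrix(VOLUMES, PARTITIONS, POLL_INTERVALS_MS)))
```

In a real harness, each row would be extended with the measured throughput, latency, CPU, and memory figures for that run; those measurements are deliberately left out here because the abstract does not report them.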
