TRAK: A Testing Tool for Studying the Reliability of Data Delivery in Apache Kafka

2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), published 2019-10-01. DOI: 10.1109/ISSREW.2019.00101

Han Wu, Zhihao Shang, K. Wolter


Citations: 11

Abstract

In modern applications, the demand for real-time processing of high-volume data streams is growing. Common application scenarios include market feed processing and electronic trading, maintenance of IoT devices, and fraud detection. In some scenarios reliability is the utmost concern, while in others speed and simplicity are the top priority. Apache Kafka is a high-throughput distributed messaging system whose reliable stream delivery makes it an ideal data source for stream-processing systems. Its many configurable parameters make Kafka highly flexible in how data is delivered, allowing a wide range of reliability trade-offs. In this paper we introduce TRAK, a tool for Testing the Reliability of Apache Kafka, to study the different data delivery semantics in Kafka and compare their reliability under poor network quality. We build a Kafka testbed using Docker containers and use a network emulation tool to control network delay and packet loss. Two metrics, message loss rate and duplicate rate, are used in our experiments to evaluate the reliability of data delivery in Kafka. The experimental results show that under high network delay, message size matters. At-least-once semantics is more reliable than at-most-once in a network with high packet loss, but can lead to duplicated messages.
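The abstract names the two metrics and contrasts the two delivery semantics without giving formulas. The sketch below makes both concrete, but only under stated assumptions:

- The metric definitions in `delivery_metrics` are our reading, not the paper's.
- `delivery_metrics` and `simulate` are hypothetical helpers.
- The link is a toy model in which each message and each acknowledgement is dropped independently. It is not TRAK's Docker-based testbed.

In a real Kafka producer, these semantics are commonly approximated with two settings:

- At-most-once: `acks=0`, where the producer does not wait for acknowledgements.
- At-least-once: `acks=all` with `retries` enabled.

The paper's exact configuration is not stated here.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple


def delivery_metrics(sent_ids: Iterable[int],
                     received_ids: Iterable[int]) -> Tuple[float, float]:
    """Return (loss_rate, duplicate_rate) for one test run.

    Assumed definitions (hypothetical, not taken from the paper):
      loss_rate      = distinct sent messages never received / messages sent
      duplicate_rate = extra copies received beyond the first / messages sent
    """
    sent = set(sent_ids)
    if not sent:
        raise ValueError("no messages were sent")
    # Count how often each sent message ID arrived; ignore unknown IDs.
    counts = Counter(mid for mid in received_ids if mid in sent)
    lost = len(sent.difference(counts))
    extra_copies = sum(c - 1 for c in counts.values())
    return lost / len(sent), extra_copies / len(sent)


def simulate(n: int, loss_prob: float, at_least_once: bool,
             max_retries: int = 3, seed: int = 0) -> List[int]:
    """Toy producer-to-broker link (hypothetical model, not Kafka itself).

    Every message and every acknowledgement is dropped independently
    with probability loss_prob. At-most-once sends each message once and
    never retries; at-least-once resends until an ack arrives or the
    retry budget is exhausted. Returns the IDs the broker stored.
    """
    rng = random.Random(seed)
    stored: List[int] = []
    attempts = 1 + (max_retries if at_least_once else 0)
    for mid in range(n):
        for _ in range(attempts):
            if rng.random() < loss_prob:
                continue  # message lost in transit: retry if allowed
            stored.append(mid)  # broker stored the message
            if rng.random() >= loss_prob:
                break  # ack arrived: producer moves on
            # Ack lost: the producer cannot tell the message was stored,
            # so a retry from here stores a duplicate on the broker.
    return stored


if __name__ == "__main__":
    print(delivery_metrics(range(1, 5), [1, 2, 2, 4]))  # (0.25, 0.25)
    for mode in (False, True):
        run = simulate(10_000, 0.2, at_least_once=mode)
        print("at-least-once" if mode else "at-most-once",
              delivery_metrics(range(10_000), run))
```

Under this model, at-most-once can never produce duplicates. At-least-once trades a much lower loss rate for duplicates, which arise whenever a stored message's acknowledgement is lost and the producer retries. This matches the qualitative trade-off reported in the abstract, though the numbers the toy produces are not the paper's results.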



