Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2022-03-01. DOI: 10.1109/pdp55904.2022.00011

A. Garcia, Dalvan Griebler, C. Schepke, L. G. Fernandes


Citations: 4

Abstract

In stream processing, data arrives continuously and often unpredictably, with large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical metrics in this domain. Consequently, a significant body of research addresses self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, benchmarks and tools that help researchers investigate the implications of micro-batching and data stream frequency are lacking. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We use it to create custom benchmarks and compare the latency and throughput of two parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.
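To make the two knobs named in the abstract concrete, here is a minimal, library-agnostic sketch of micro-batching and of a fixed arrival frequency. It is not the paper's benchmarking framework and does not use Intel TBB or FastFlow; the names `micro_batches` and `max_fill_delay` are hypothetical, and the fixed-rate source is an assumption made only for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive stream items into lists of at most batch_size items.

    Hypothetical helper: downstream stages would then process one list
    per call instead of one item per call.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch at end of stream.
        yield batch


def max_fill_delay(batch_size: int, items_per_second: float) -> float:
    """Seconds the first item of a batch waits for the batch to fill.

    Assumes (for illustration only) a source that emits items at a
    constant rate; real streams fluctuate.
    """
    if batch_size < 1 or items_per_second <= 0:
        raise ValueError("batch_size must be >= 1 and the rate positive")
    return (batch_size - 1) / items_per_second


if __name__ == "__main__":
    print(list(micro_batches(range(5), 2)))   # [[0, 1], [2, 3], [4]]
    print(max_fill_delay(4, 2.0))             # 1.5
```

Under this fixed-rate assumption, a larger batch or a lower arrival frequency increases the time an item waits before its batch is dispatched, which is one reason batch size and data frequency are worth evaluating together.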



