STREAMLINE: Dynamic and Resource-Efficient Auto-Tuning of Stream Processing Data Pipeline Ensembles

IF 7.6 · Zone 3, Computer Science · JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Stefan Pedratscher, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer, Marlon Etheredge, Abolfazl Younesi, Juan Jose Durillo Barrionuevo, Peter Thoman
Journal: Internet of Things, vol. 34, Article 101731
Published: 2025-09-13
DOI: 10.1016/j.iot.2025.101731
URL: https://www.sciencedirect.com/science/article/pii/S2542660525002458

Abstract

With the growing volume of data generated by IoT devices and user-driven services, stream processing has become essential for handling continuous, real-time data. However, fluctuating workloads and the dynamic nature of data streams make it difficult to maintain consistent performance over time, requiring adaptive resource allocation and frequent configuration tuning. Running multiple data stream processing pipelines on shared resources further exacerbates the problem by increasing contention, leading to higher end-to-end latency and reduced performance stability. Most existing approaches focus on tuning individual configuration parameters in isolation and overlook interactions between concurrently running data pipelines. To address these limitations, we present STREAMLINE, a dynamic multi-layer auto-tuning framework designed for stream processing environments. STREAMLINE uses transformers to predict future workloads and an evolutionary algorithm to automatically tune configuration parameters. It also includes a resource-efficient scheduler that assigns operators to resources across a compute cluster. Our dynamic update mechanism minimizes downtime and preserves state during configuration parameter and scheduling changes. We evaluate STREAMLINE on the Grid’5000 testbed using real-time IoT and streaming benchmarks. Results show that STREAMLINE outperforms state-of-the-art methods, improving throughput, end-to-end latency, and CPU utilization by up to 4×, 10×, and 9×, respectively, while reducing costs by up to 10×.
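The abstract names an evolutionary algorithm for tuning configuration parameters but gives no details of it. The following is a minimal sketch of the general technique only, not the authors' method: an elitist mutation-based search over a small configuration space, driven by a predicted input rate. The parameter names (`parallelism`, `buffer_kb`), their ranges, the per-instance capacity of 1000 events/s, and the cost function `toy_cost` are all hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical search space (assumption): parameter name -> (min, max), integers.
SPACE = {"parallelism": (1, 32), "buffer_kb": (16, 1024)}


def toy_cost(cfg, predicted_rate):
    """Stand-in objective (an assumption, not the paper's model).

    Penalises capacity shortfall against the predicted rate, plus a
    charge for the resources the configuration consumes.
    """
    capacity = cfg["parallelism"] * 1000  # assumed events/s per parallel instance
    shortfall = max(0, predicted_rate - capacity)
    return shortfall + 50 * cfg["parallelism"] + 0.1 * cfg["buffer_kb"]


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def mutate(cfg, rng):
    """Perturb one randomly chosen parameter, clamped to its bounds."""
    child = dict(cfg)
    key = rng.choice(sorted(SPACE))
    lo, hi = SPACE[key]
    step = max(1, (hi - lo) // 10)
    child[key] = min(hi, max(lo, child[key] + rng.randint(-step, step)))
    return child


def evolve(predicted_rate, generations=40, population=20, seed=0):
    """Elitist evolutionary search: keep the better half, refill by mutation."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda c: toy_cost(c, predicted_rate))
        survivors = pop[: population // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population - len(survivors))
        ]
        pop = survivors + children
    return min(pop, key=lambda c: toy_cost(c, predicted_rate))


best = evolve(predicted_rate=8000)
print(best, toy_cost(best, 8000))
```

In a setting like the one the abstract describes, `predicted_rate` would come from the workload predictor and the cost would be measured or modelled from the running pipelines; here both are placeholders. Because the survivors are carried over unchanged, the best cost never gets worse from one generation to the next.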
Source journal: Internet of Things
CiteScore: 3.60
Self-citation rate: 5.10%
Articles published: 115
Review time: 37 days
Journal description: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal will place a high priority on timely publication, and provide a home for high quality. Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.