Yinuo Fan , Dawei Sun , Minghui Wu , Shang Gao , Rajkumar Buyya
DOI: 10.1016/j.future.2025.108119
Journal: Future Generation Computer Systems-The International Journal of Escience, Volume 175, Article 108119
Publication date: 2025-09-06
Journal impact factor: 6.2 (JCR Q1, Computer Science, Theory & Methods)
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004133
A fine-grained task scheduling strategy for resource auto-scaling over fluctuating data streams
Resource scaling is crucial for stream computing systems in fluctuating data stream scenarios. Computational resource utilization fluctuates significantly with changes in data stream rates, often leading to pronounced resource surplus and scarcity within these systems. Existing research has primarily focused on addressing resource insufficiency at runtime; however, effective solutions for handling variable data streams remain limited. Furthermore, overlooking task communication dependencies during task placement in resource adjustment may increase communication cost and consequently impair system performance. To address these challenges, we propose Ra-Stream, a fine-grained task scheduling strategy for resource auto-scaling over fluctuating data streams. Ra-Stream not only dynamically adjusts resources to accommodate varying data streams, but also employs fine-grained scheduling to further optimize system performance. This paper explains Ra-Stream through the following aspects: (1) Formalization: We formalize the application subgraph partitioning problem, the resource scaling problem, and the task scheduling problem by constructing and analyzing a stream application model, a communication model, and a resource model. (2) Resource scaling and heuristic partitioning: We propose a resource scaling algorithm that scales computational resources to adapt to fluctuating data streams. A heuristic subgraph partitioning algorithm is also introduced to minimize communication cost evenly. (3) Fine-grained task scheduling: We present a fine-grained task scheduling algorithm that minimizes computational resource utilization while reducing communication cost through thread-level task deployment. (4) Comprehensive evaluation: We evaluate multiple metrics, including latency, throughput, and resource utilization, in a real-world distributed stream computing environment. Experimental results demonstrate that, compared to state-of-the-art approaches, Ra-Stream reduces system latency by 36.37% to 47.45%, increases maximum system throughput by 26.2% to 60.55%, and saves 40% to 46.25% in resource utilization.
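The abstract does not give the details of Ra-Stream's algorithms, so the following is only a generic illustration of the two ideas it names: sizing resources to the input rate, and placing tasks so that heavily communicating tasks share a node. It is not the authors' method. Every name here (`nodes_needed`, `greedy_partition`, `cross_node_traffic`, the capacity and slot parameters, the example task names and traffic figures) is a hypothetical chosen for this sketch.

```python
import math
from typing import Dict, List, Tuple

Traffic = Dict[Tuple[str, str], float]  # (upstream, downstream) -> tuples per second


def nodes_needed(input_rate: float, per_node_capacity: float) -> int:
    """Smallest node count whose combined capacity covers the input rate (at least 1)."""
    return max(1, math.ceil(input_rate / per_node_capacity))


def greedy_partition(tasks: List[str], traffic: Traffic,
                     num_nodes: int, slots_per_node: int) -> Dict[str, int]:
    """Greedy placement (an assumption, not Ra-Stream): put each task on the
    node that already hosts the tasks it exchanges the most traffic with."""
    if len(tasks) > num_nodes * slots_per_node:
        raise ValueError("not enough slots for all tasks")

    def weight(a: str, b: str) -> float:
        # Traffic between two tasks, ignoring direction.
        return traffic.get((a, b), 0.0) + traffic.get((b, a), 0.0)

    # Place the most communication-heavy tasks first.
    order = sorted(tasks, key=lambda t: -sum(weight(t, u) for u in tasks if u != t))
    placement: Dict[str, int] = {}
    load = [0] * num_nodes
    for task in order:
        best_node, best_gain = -1, -1.0
        for node in range(num_nodes):
            if load[node] >= slots_per_node:
                continue  # node is full
            # Traffic that would become node-local if the task lands here.
            gain = sum(weight(task, u) for u, n in placement.items() if n == node)
            if gain > best_gain:
                best_node, best_gain = node, gain
        placement[task] = best_node
        load[best_node] += 1
    return placement


def cross_node_traffic(placement: Dict[str, int], traffic: Traffic) -> float:
    """Total traffic on edges whose endpoints sit on different nodes."""
    return sum(w for (a, b), w in traffic.items() if placement[a] != placement[b])


# Hypothetical four-task pipeline with made-up traffic figures.
tasks = ["src", "parse", "count", "sink"]
traffic = {("src", "parse"): 100.0, ("parse", "count"): 10.0, ("count", "sink"): 80.0}
n = nodes_needed(input_rate=2000.0, per_node_capacity=1000.0)
placement = greedy_partition(tasks, traffic, num_nodes=n, slots_per_node=2)
```

In this toy pipeline the two heaviest edges end up node-local and only the light `parse` to `count` edge crosses nodes. A real scheduler would also need to handle scale-in, CPU and memory constraints, and thread-level deployment, none of which this sketch models.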
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.