Ls-Stream: Lightening Stragglers in Join Operators for Skewed Data Stream Processing
Authors: Minghui Wu; Dawei Sun; Shang Gao; Keqin Li; Rajkumar Buyya
IEEE Transactions on Computers, vol. 74, no. 8, pp. 2841-2855. Published 2025-06-03. DOI: 10.1109/TC.2025.3575917
https://ieeexplore.ieee.org/document/11022765/
Load imbalance can give rise to stragglers, i.e., join instances that lag significantly behind others in processing data streams. State-of-the-art solutions mitigate stragglers by balancing the load between join instances through hot-key management and random partitioning. However, these solutions rely on either complicated routing strategies or resource-inefficient processing structures, making them susceptible to frequent changes in load between instances. We therefore present Ls-Stream, a data stream scheduler that aims to support dynamic workload assignment for join instances to lighten stragglers. This paper outlines our solution from the following aspects: (1) We develop models for partitioning, communication, matrix, and resource, formalizing problems such as imbalanced load between join instances and state migration costs. (2) Ls-Stream employs a two-level routing strategy for workload allocation that combines hash-based and key-based data partitioning to specify the destination join instances for data tuples. (3) Ls-Stream also constructs a fine-grained model for minimizing the state migration cost, which allows trade-offs between data transfer overhead and migration benefits. (4) Experimental results show that Ls-Stream reduces maximum system latency by 49.3% and increases maximum throughput by more than 2x compared to existing state-of-the-art approaches.
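The abstract does not spell out how the two routing levels interact, so the following is only a minimal sketch of one plausible reading, not the Ls-Stream algorithm itself. It assumes an explicit key table is consulted first and that hash partitioning is the fallback. All names here (TwoLevelRouter, pin, instance_loads, rebalance_once) are hypothetical, and the greedy rebalancing step is an illustrative stand-in for the paper's scheduling and migration-cost models.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Optional


def stable_hash(key: str) -> int:
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return zlib.crc32(key.encode("utf-8"))


class TwoLevelRouter:
    """Hypothetical router: key-based table first, hash-based fallback second."""

    def __init__(self, num_instances: int) -> None:
        if num_instances < 1:
            raise ValueError("need at least one join instance")
        self.num_instances = num_instances
        self.key_table: Dict[str, int] = {}  # explicit key -> instance overrides

    def pin(self, key: str, instance: int) -> None:
        """Send every tuple of `key` to `instance` (whole-key move, so an
        equi-join still sees all tuples of that key on one instance)."""
        if not 0 <= instance < self.num_instances:
            raise ValueError("instance index out of range")
        self.key_table[key] = instance

    def route(self, key: str) -> int:
        """Return the destination join instance for a tuple with this key."""
        if key in self.key_table:
            return self.key_table[key]
        return stable_hash(key) % self.num_instances


def instance_loads(router: TwoLevelRouter, keys: Iterable[str]) -> List[int]:
    """Count how many tuples each join instance would receive."""
    loads = [0] * router.num_instances
    for key in keys:
        loads[router.route(key)] += 1
    return loads


def rebalance_once(router: TwoLevelRouter, keys: Iterable[str]) -> Optional[str]:
    """Greedy illustration: move the heaviest key that still leaves the
    target lighter than the current straggler. Ignores state-migration cost."""
    keys = list(keys)
    loads = instance_loads(router, keys)
    src = loads.index(max(loads))  # most loaded instance (the straggler)
    dst = loads.index(min(loads))  # least loaded instance
    counts = Counter(k for k in keys if router.route(k) == src)
    movable = [(c, k) for k, c in counts.items() if loads[dst] + c < loads[src]]
    if not movable:
        return None
    _, key = max(movable)
    router.pin(key, dst)
    return key


# Deterministic demo: all three keys start pinned to instance 0.
stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
router = TwoLevelRouter(num_instances=2)
for k in ("a", "b", "c"):
    router.pin(k, 0)
before = instance_loads(router, stream)  # [10, 0]
moved = rebalance_once(router, stream)   # "a"
after = instance_loads(router, stream)   # [5, 5]
```

In this toy run the straggler's load drops from 10 to 5 tuples after one key is reassigned. A real scheduler would also have to weigh the cost of transferring the moved key's join state against the expected benefit, which is the trade-off the abstract attributes to the paper's migration model and which this sketch deliberately leaves out.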
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.