Faucet: a user-level, modular technique for flow control in dataflow engines
Authors: Andrea Lattuada, Frank McSherry, Zaheer Chothia
Venue: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
Published: 2016-06-26
DOI: 10.1145/2926534.2926544 (https://doi.org/10.1145/2926534.2926544)
This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies. Compared to existing backpressure techniques, Faucet has the following differentiating characteristics: (i) the implementation relies only on existing progress information exposed by the system and does not require changes to the underlying dataflow system, (ii) it can be applied selectively to certain parts of the dataflow graph, and (iii) it is designed to support a wide variety of use cases, topologies, and workloads. We demonstrate Faucet on an example computation for efficiently determining a cyclic join of relations, whose variability in rates of produced and consumed tuples challenges the flow control techniques employed by systems like Storm, Heron, and Spark. Our implementation, prototyped in Timely Dataflow, introduces flow control at critical locations in the computation, keeping the computation stable and resource-bound while introducing at most 20% runtime overhead over an unconstrained implementation. Our experience is that the information Timely Dataflow provides to user logic is sufficient for a variety of flow control and scheduling tasks, and merits further investigation.
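
To make the idea of flow control driven purely by progress information more concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation and does not use the Timely Dataflow API; the class `Faucet`, its methods, and the `max_in_flight` parameter are hypothetical names chosen for illustration. It assumes that batches are tagged with increasing logical times and that the system reports a frontier, meaning every time below it has completed downstream.

```python
from collections import deque


class Faucet:
    """Hypothetical user-level flow-control gate (illustrative names only)."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # bound on released but unfinished batches
        self.pending = deque()              # batches held back at the gate
        self.in_flight = set()              # logical times released downstream
        self.next_time = 0                  # next logical time to assign

    def offer(self, batch):
        """Queue a batch and return the (time, batch) pairs released right now."""
        self.pending.append(batch)
        return self._release()

    def on_progress(self, frontier):
        """Assumed progress signal: every time < frontier is complete downstream."""
        self.in_flight = {t for t in self.in_flight if t >= frontier}
        return self._release()

    def _release(self):
        released = []
        # Release held batches only while the in-flight bound allows it.
        while self.pending and len(self.in_flight) < self.max_in_flight:
            t = self.next_time
            self.next_time += 1
            self.in_flight.add(t)
            released.append((t, self.pending.popleft()))
        return released


faucet = Faucet(max_in_flight=2)
# The first two batches pass through; the third is held at the gate.
first = [faucet.offer(batch) for batch in (["a"], ["b"], ["c"])]
# Once time 0 is reported complete, the held batch is released.
later = faucet.on_progress(frontier=1)
```

The gate only reads progress reports and never asks downstream operators to cooperate, which is the property the abstract attributes to a user-level approach. Placing such a gate at a few chosen points, rather than on every edge, corresponds to applying flow control selectively.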