{"title":"Faucet: a user-level, modular technique for flow control in dataflow engines","authors":"Andrea Lattuada, Frank McSherry, Zaheer Chothia","doi":"10.1145/2926534.2926544","DOIUrl":null,"url":null,"abstract":"This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies. When compared to existing backpressure techniques Faucet has the following differentiating characteristics: (i) the implementation only relies on existing progress information exposed by the system and does not require changes to the underlying dataflow system, (ii) it can be applied selectively to certain parts of the dataflow graph, and (iii) it is designed to support a wide variety of use cases, topologies and workloads. We demonstrate Faucet on an example computation for efficiently determining a cyclic join of relations, whose variability in rates of produced and consumed tuples challenges the flow control techniques employed by systems like Storm, Heron, and Spark. Our implementation, prototyped in Timely Dataflow, introduces flow control at critical locations in the computation, keeping the computation stable and resource-bound while introducing at most 20% runtime overhead over an unconstrained implementation. Our experience is that the information Timely Dataflow provides to user logic is sufficient for a variety of flow control and scheduling tasks, and merits further investigation.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2926534.2926544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12
Abstract
This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies. When compared to existing backpressure techniques Faucet has the following differentiating characteristics: (i) the implementation only relies on existing progress information exposed by the system and does not require changes to the underlying dataflow system, (ii) it can be applied selectively to certain parts of the dataflow graph, and (iii) it is designed to support a wide variety of use cases, topologies and workloads. We demonstrate Faucet on an example computation for efficiently determining a cyclic join of relations, whose variability in rates of produced and consumed tuples challenges the flow control techniques employed by systems like Storm, Heron, and Spark. Our implementation, prototyped in Timely Dataflow, introduces flow control at critical locations in the computation, keeping the computation stable and resource-bound while introducing at most 20% runtime overhead over an unconstrained implementation. Our experience is that the information Timely Dataflow provides to user logic is sufficient for a variety of flow control and scheduling tasks, and merits further investigation.