FlowDB: Integrating Stream Processing and Consistent State Management

Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, Pub Date: 2017-06-08, DOI: 10.1145/3093742.3093929

Lorenzo Affetti, Alessandro Margara, G. Cugola


Citations: 21

Abstract

Recent advances in stream processing technologies have led to their adoption in many large companies, where they are becoming a core element of the data processing stack. In these settings, stream processors are often combined with various kinds of data management frameworks to build software architectures that span data storage, processing, retrieval, and mining. However, relying on separate and heterogeneous subsystems makes these architectures overly complex, which hinders the design, development, maintenance, and evolution of the overall system. We address this issue by proposing a new model that integrates data management within a distributed stream processor. The model enables individual stream processing operators to persist data and make it visible to, and queryable from, external components. It offers flexible mechanisms to control the consistency of data, including transactional updates as well as ordering and integrity constraints. The paper contributes to research on stream processing in several ways: we introduce a new model that has the potential to simplify complex data-intensive applications by integrating data management capabilities within a stream processing system; we define data consistency guarantees and show how they are enforced within this new model; we implement the model in the FlowDB prototype and study its overhead with respect to a pure stream processing system using real-world case studies and synthetic workloads. Finally, we further demonstrate the benefits of the proposed model by showing that FlowDB can outperform a state-of-the-art in-memory distributed database in data management tasks.
