FlowDB: Integrating Stream Processing and Consistent State Management
Authors: Lorenzo Affetti, Alessandro Margara, G. Cugola
Venue: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (published 2017-06-08)
DOI: https://doi.org/10.1145/3093742.3093929
Recent advances in stream processing technologies have led to their adoption in many large companies, where they are becoming a core element of the data processing stack. In these settings, stream processors are often combined with various kinds of data management frameworks to build software architectures that integrate data storage, processing, retrieval, and mining. However, relying on separate and heterogeneous subsystems makes these architectures overly complex, which hinders the design, development, maintenance, and evolution of the overall system. We address this issue by proposing a new model that integrates data management within a distributed stream processor. The model enables individual stream processing operators to persist data and make it visible to, and queryable from, external components. It offers flexible mechanisms to control the consistency of data, including transactional updates as well as ordering and integrity constraints. The paper contributes to the research on stream processing in several ways: we introduce a new model that has the potential to simplify complex data-intensive applications by integrating data management capabilities within a stream processing system; we define data consistency guarantees and show how they are enforced within this new model; we implement the model in the FlowDB prototype and study its overhead with respect to a pure stream processing system, using real-world case studies and synthetic workloads. Finally, we further demonstrate the benefits of the proposed model by showing that FlowDB can outperform a state-of-the-art, in-memory distributed database in data management tasks.
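To make the idea of an operator that persists queryable state under an integrity constraint more concrete, the following is a minimal single-process sketch in Python. It is not the FlowDB API: the names StatefulOperator, IntegrityViolation, process, and query are hypothetical and chosen only for illustration, and the sketch ignores distribution, durability, and ordering constraints entirely. It shows only one idea from the abstract: a batch of updates is applied as a transaction, and it becomes visible to external queries only if the integrity constraint holds on the resulting state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple


class IntegrityViolation(Exception):
    """Raised when a batch of updates would break the integrity constraint."""


@dataclass
class StatefulOperator:
    """Hypothetical operator holding keyed integer state (illustration only)."""

    # Predicate that must hold on the whole state after every transaction.
    constraint: Callable[[Dict[str, int]], bool]
    state: Dict[str, int] = field(default_factory=dict)

    def process(self, updates: Iterable[Tuple[str, int]]) -> None:
        """Apply a batch of (key, delta) updates atomically.

        The updates are staged on a copy of the state; the copy replaces
        the visible state only if the constraint holds, so a rejected
        batch leaves no partial effects.
        """
        staged = dict(self.state)
        for key, delta in updates:
            staged[key] = staged.get(key, 0) + delta
        if not self.constraint(staged):
            raise IntegrityViolation("constraint violated; batch discarded")
        self.state = staged

    def query(self, key: str) -> Optional[int]:
        """Read-only access for external components; None if key is absent."""
        return self.state.get(key)


def non_negative(state: Dict[str, int]) -> bool:
    """Example integrity constraint: no balance may drop below zero."""
    return all(value >= 0 for value in state.values())


if __name__ == "__main__":
    accounts = StatefulOperator(constraint=non_negative)
    accounts.process([("alice", 100)])
    # A transfer is two updates that must succeed or fail together.
    accounts.process([("alice", -30), ("bob", 30)])
    print(accounts.query("alice"), accounts.query("bob"))
    try:
        accounts.process([("bob", -50), ("alice", 50)])
    except IntegrityViolation as err:
        print("rejected:", err)
    print(accounts.query("alice"), accounts.query("bob"))
```

Staging the updates on a copy keeps the sketch short; a real system would need a concurrency-safe and far cheaper mechanism, which is outside the scope of this illustration.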