Stream ancestor function: A mechanism for fine-grained provenance in stream processing systems

Watsawee Sansrimahachai, M. Weal, L. Moreau
Published in: 2012 Sixth International Conference on Research Challenges in Information Science (RCIS)
Publication date: 2012-05-16
DOI: 10.1109/RCIS.2012.6240427 (https://doi.org/10.1109/RCIS.2012.6240427)
Citations: 11

Abstract

Applications that require continuous processing of high-volume data streams have grown in prevalence and importance. These systems process streaming data in real time and respond immediately to support precise, on-time decisions. In such systems, it is difficult to know exactly how a particular result was generated or, more specifically, to trace precisely the stream events that caused it. Such information is nevertheless extremely important for validating stream processing results. It is therefore crucial that stream processing systems have a mechanism for capturing and querying provenance information (the information pertaining to the process that produced result data) at the level of individual stream events, which we refer to as fine-grained provenance. In this paper, we propose a novel fine-grained provenance solution called the Stream Ancestor Function, a reverse mapping function used to express precise dependencies between input and output stream elements. We demonstrate how to use stream ancestor functions by means of a stream provenance query and replay execution algorithm. Finally, we evaluate the stream ancestor function in terms of storage consumption for provenance collection and system throughput, demonstrating significant reductions in storage size and reasonable processing overheads.
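
The idea of a reverse mapping from output elements to input elements can be illustrated with a minimal sketch. It assumes a count-based sliding-window average as the stream operation; the operator, the function names (`sliding_average`, `ancestors_of`, `replay`), and the index-based addressing are illustrative assumptions, not the definitions used in the paper.

```python
from typing import List, Sequence


def sliding_average(inputs: Sequence[float], window: int) -> List[float]:
    """Forward operator (assumed): output k is the mean of inputs k .. k+window-1."""
    return [
        sum(inputs[k:k + window]) / window
        for k in range(len(inputs) - window + 1)
    ]


def ancestors_of(output_index: int, window: int) -> List[int]:
    """Hypothetical ancestor function for the operator above.

    Maps the position of an output element back to the positions of the
    input elements it was computed from.
    """
    return list(range(output_index, output_index + window))


def replay(inputs: Sequence[float], output_index: int, window: int) -> float:
    """Recompute one output from its ancestors only, to validate a result."""
    selected = [inputs[i] for i in ancestors_of(output_index, window)]
    return sum(selected) / window


readings = [10.0, 20.0, 30.0, 40.0, 50.0]
outputs = sliding_average(readings, window=3)

# Provenance query: which input positions produced output 1?
print(ancestors_of(1, window=3))
# Replay: recompute output 1 from those inputs and compare with the stored result.
print(replay(readings, 1, window=3) == outputs[1])
```

In this sketch no per-event dependency records are stored: the ancestors are computed on demand from the output position and the window size. That is one way a reverse mapping could keep provenance storage small, but the abstract does not say whether the paper's functions work this way.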