GeneaLog: Fine-Grained Data Streaming Provenance at the Edge

Dimitris Palyvos-Giannas, Vincenzo Gulisano, M. Papatriantafilou
{"title":"GeneaLog: Fine-Grained Data Streaming Provenance at the Edge","authors":"Dimitris Palyvos-Giannas, Vincenzo Gulisano, M. Papatriantafilou","doi":"10.1145/3274808.3274826","DOIUrl":null,"url":null,"abstract":"Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, something beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized. The memory and processing costs of fine-grained data provenance, possibly afforded by high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming it efficiently captures fine-grained provenance data with minimal overhead.","PeriodicalId":167957,"journal":{"name":"Proceedings of the 19th International Middleware Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th International Middleware Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3274808.3274826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, something beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized. The memory and processing costs of fine-grained data provenance, possibly afforded by high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming it efficiently captures fine-grained provenance data with minimal overhead.
GeneaLog:边缘的细粒度数据流来源
数据流中的细粒度数据来源允许将每个结果元组链接回对其做出贡献的源数据,这对许多应用程序都是有益的(例如,查找触发安全或安全相关警报的条件)。此外,当数据传输或存储必须最小化时,如在边缘计算和网络物理系统中,它可以帮助确定要优先考虑的源数据。细粒度数据来源的内存和处理成本(可能由高端服务器提供)对于部署在边缘计算和网络物理系统中的资源受限设备来说可能是令人望而却步的。在这一挑战的激励下,我们提出了GeneaLog,这是一种用于数据流应用程序的新颖的细粒度数据来源技术。利用数据的逻辑依赖性,GeneaLog利用了软件堆栈的跨层属性,并产生了最小的、恒定大小的每个元组开销。此外,它允许仅使用标准数据流操作符的模块化和高效算法实现。这对于分布式流应用程序特别有用,因为来源处理可以在与数据处理正交的独立节点上执行。我们使用车载和智能电网应用程序评估了GeneaLog的实现,确认它能以最小的开销有效地捕获细粒度的来源数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信