GeneaLog: Fine-Grained Data Streaming Provenance at the Edge

Proceedings of the 19th International Middleware Conference
Publication date: 2018-11-26
DOI: 10.1145/3274808.3274826

Dimitris Palyvos-Giannas, Vincenzo Gulisano, M. Papatriantafilou

Citations: 19

Abstract

Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, something beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized. The memory and processing costs of fine-grained data provenance, possibly afforded by high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming it efficiently captures fine-grained provenance data with minimal overhead.
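The abstract describes linking each result tuple back to the source tuples that contributed to it while paying only a constant-size overhead per tuple. The Python sketch below illustrates one way such a scheme *could* work. It is not GeneaLog's published data layout or API. Every name in it is hypothetical: the `StreamTuple` class with its three reference slots `u1`, `u2` and `nxt`, and the operators `map_op`, `join_op`, `aggregate_op` and `provenance`. A window aggregate keeps its overhead constant by chaining its input tuples through `nxt` and storing only the first and last contributor. `provenance` then walks these references back from a result to its SOURCE tuples.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable, Optional


class Kind(Enum):
    SOURCE = auto()     # ingested directly from a data source
    MAP = auto()        # derived from exactly one input tuple
    JOIN = auto()       # derived from exactly two input tuples
    AGGREGATE = auto()  # derived from a contiguous run of window tuples


@dataclass(eq=False)  # compare tuples by identity, not by field values
class StreamTuple:
    payload: Any
    kind: Kind = Kind.SOURCE
    # Hypothetical fixed-size provenance slots: every tuple carries these
    # three references, no matter how many inputs contributed to it.
    u1: Optional[StreamTuple] = None
    u2: Optional[StreamTuple] = None
    nxt: Optional[StreamTuple] = None  # successor inside an aggregation window


def map_op(t: StreamTuple, fn: Callable[[Any], Any]) -> StreamTuple:
    return StreamTuple(fn(t.payload), Kind.MAP, u1=t)


def join_op(a: StreamTuple, b: StreamTuple,
            fn: Callable[[Any, Any], Any]) -> StreamTuple:
    return StreamTuple(fn(a.payload, b.payload), Kind.JOIN, u1=a, u2=b)


def aggregate_op(window: list[StreamTuple],
                 fn: Callable[[list[Any]], Any]) -> StreamTuple:
    # Chain the window's tuples so the result stores only the first (u1)
    # and last (u2) contributor instead of one reference per input.
    for prev, cur in zip(window, window[1:]):
        prev.nxt = cur
    return StreamTuple(fn([t.payload for t in window]), Kind.AGGREGATE,
                       u1=window[0], u2=window[-1])


def provenance(result: StreamTuple) -> list[StreamTuple]:
    """Walk the references back from `result` and return its SOURCE tuples."""
    sources: list[StreamTuple] = []
    seen: set[int] = set()
    stack = [result]
    while stack:
        t = stack.pop()
        if id(t) in seen:
            continue
        seen.add(id(t))
        if t.kind is Kind.SOURCE:
            sources.append(t)
        elif t.kind is Kind.MAP:
            stack.append(t.u1)
        elif t.kind is Kind.JOIN:
            stack.extend([t.u1, t.u2])
        else:  # AGGREGATE: follow nxt from u1 up to and including u2
            cur = t.u1
            stack.append(cur)
            while cur is not t.u2:
                cur = cur.nxt
                stack.append(cur)
    return sources


# Usage: three sensor readings are doubled, reduced to their maximum over one
# window, and joined with a zone tuple to form an alert.
readings = [StreamTuple(v) for v in (3, 9, 4)]
zone = StreamTuple("zone-A")
doubled = [map_op(r, lambda x: 2 * x) for r in readings]
peak = aggregate_op(doubled, max)
alert = join_op(peak, zone, lambda p, z: (z, p))
print(sorted(str(s.payload) for s in provenance(alert)))
# prints ['3', '4', '9', 'zone-A']
```

The sketch stores only references, never copies of the contributing data. Once no result tuple can reach a source tuple any more, a garbage-collected runtime can reclaim it. This is offered as one plausible reading of the abstract's remark about "cross-layer properties of the software stack," and it is an assumption rather than a description of the paper's design.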
