Big Data Provenance: Challenges, State of the Art and Opportunities.

Jianwu Wang, Daniel Crawl, Shweta Purawat, Mai Nguyen, Ilkay Altintas
{"title":"Big Data Provenance: Challenges, State of the Art and Opportunities.","authors":"Jianwu Wang,&nbsp;Daniel Crawl,&nbsp;Shweta Purawat,&nbsp;Mai Nguyen,&nbsp;Ilkay Altintas","doi":"10.1109/BigData.2015.7364047","DOIUrl":null,"url":null,"abstract":"<p><p>Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2015 ","pages":"2509-2516"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BigData.2015.7364047","citationCount":"78","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigData.2015.7364047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2015/12/28 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 78

Abstract

Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.

Abstract Image

Abstract Image

大数据来源:挑战、技术现状和机遇。
跟踪来源的能力是支持数据沿袭和再现性的科学工作流的关键特征。大数据的数量、种类和速度带来的挑战,也对大数据的来源和质量(定义为准确性)提出了相关挑战。分布式大数据溯源信息的规模和种类的不断增加,为溯源的记录、查询、共享和利用等整个生命周期带来了新的技术挑战和机遇。本文讨论了大数据来源的挑战和机遇,这些挑战和机遇与数据集本身的真实性以及分析这些数据集的分析过程的来源有关。它还解释了我们目前在跟踪和利用大数据来源方面所做的努力,使用工作流作为编程模型来分析大数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信