Dark Data: Are we solving the right problems?

Michael J. Cafarella, I. Ilyas, Marcel Kornacker, Tim Kraska, C. Ré
2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 1444-1445. Published 2016-05-16.
DOI: 10.1109/ICDE.2016.7498366 (https://doi.org/10.1109/ICDE.2016.7498366)
Citations: 11

Abstract

With the increasing urge of enterprises to ingest as much data as they can into what are commonly referred to as “data lakes”, the new environment presents serious challenges to traditional ETL models and to building analytic layers on top of a well-understood global schema. With the recent development of multiple technologies to support this “load-first” paradigm, even traditional enterprises now have fairly large HDFS-based data lakes. They have even had them long enough that their first-generation IT projects have delivered on some, but not all, of the promise of integrating their enterprise's data assets. In short, we have moved from no data to dark data. Dark data is data that enterprises might have in their possession without the ability to access it, or with limited awareness of what it represents. In particular, business-critical information might still remain out of reach. This panel is about dark data and whether we have been focusing on the right data management challenges in dealing with it.