Architecting an Enterprise Data Lake, A Covid19 Case Study

Bushra, M. Memon, Salahuddin Saddar
Journal Article. e Informatica Softw. Eng. J., vol. 23 1, pp. 174-181. Published 2021-01-01. DOI: 10.17706/jsw.16.4.174-181 (https://doi.org/10.17706/jsw.16.4.174-181)

Abstract

Data volumes are growing at an enormous rate every day. Traditionally, data has resided in silos across an organization, which makes it difficult to form a complete picture for data-driven business decision making. A data lake addresses the growth in data by providing “schema on read”, better integration, and cheaper storage. It also solves the data silo problem by providing a central platform for a variety of data housing needs. However, implementing a data lake is challenging because the implementation must also address needs such as metadata management, data discovery, data governance, data lifecycle management, security, and centralized access control mechanisms. This paper aims to provide a comprehensive data lake architecture that addresses these challenges. We have also conducted and documented experiments with publicly available COVID-19 datasets to validate the design and the applicability of the proposed architecture for business analytics.
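The “schema on read” idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the architecture proposed in the paper; it only assumes that raw records are stored as JSON lines without validation and that a schema is applied when the data is read. The field names, the `CASES_SCHEMA` mapping, the `read_with_schema` helper, and all sample values are hypothetical and made up for illustration.

```python
import io
import json

# Raw zone: records are kept exactly as they arrive; no schema is enforced on write.
# Field names and values are invented placeholders, not real COVID-19 data.
RAW_LINES = io.StringIO(
    '{"country": "A", "date": "2020-03-01", "confirmed": "12"}\n'
    '{"country": "B", "date": "2020-03-01", "confirmed": 7, "source": "feed-2"}\n'
    '{"country": "C", "date": "2020-03-01"}\n'
)

# Hypothetical schema, applied only at read time: column -> (converter, default).
CASES_SCHEMA = {
    "country": (str, ""),
    "date": (str, ""),
    "confirmed": (int, 0),
}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        row = {}
        for column, (convert, default) in schema.items():
            value = record.get(column)
            # Missing fields fall back to the default; present ones are coerced.
            row[column] = default if value is None else convert(value)
        yield row


rows = list(read_with_schema(RAW_LINES, CASES_SCHEMA))
total_confirmed = sum(row["confirmed"] for row in rows)
print(rows)
print(total_confirmed)
```

In this sketch, differently shaped records (a string count, an extra `source` field, a missing count) coexist in the raw store, and each consumer decides which columns and types it needs when it reads.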