{"title":"Architecting an Enterprise Data Lake, A Covid19 Case Study","authors":"Bushra, M. Memon, Salahuddin Saddar","doi":"10.17706/jsw.16.4.174-181","DOIUrl":null,"url":null,"abstract":"Data volumes are growing at an enormous rate every day. Traditionally, data has resided in silos across an organization, so it is difficult to form a complete picture for data-driven business decision making. A data lake addresses the rapid growth of data by providing “schema on read”, better integration, and cheaper storage. It also solves the data silo problem by providing a central platform for a variety of data housing needs. However, implementing a data lake is challenging because the implementation must address additional needs such as metadata management, data discovery, data governance, data lifecycle management, security, and centralized access control mechanisms. This paper provides a comprehensive data lake architecture that addresses these challenges. We have also conducted and documented experiments with publicly available COVID-19 datasets to validate the design and applicability of the proposed architecture for business analytics purposes.","PeriodicalId":11452,"journal":{"name":"e Informatica Softw. Eng. J.","volume":"23 1","pages":"174-181"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"e Informatica Softw. Eng. J.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17706/jsw.16.4.174-181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Data volumes are growing at an enormous rate every day. Traditionally, data has resided in silos across an organization, so it is difficult to form a complete picture for data-driven business decision making. A data lake addresses the rapid growth of data by providing “schema on read”, better integration, and cheaper storage. It also solves the data silo problem by providing a central platform for a variety of data housing needs. However, implementing a data lake is challenging because the implementation must address additional needs such as metadata management, data discovery, data governance, data lifecycle management, security, and centralized access control mechanisms. This paper provides a comprehensive data lake architecture that addresses these challenges. We have also conducted and documented experiments with publicly available COVID-19 datasets to validate the design and applicability of the proposed architecture for business analytics purposes.
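
As an illustration of the “schema on read” idea mentioned in the abstract, the following is a minimal Python sketch, not the architecture proposed in the paper. It assumes raw records are kept as untyped JSON lines and that a schema is applied only when the data is read. The names `RAW_RECORDS`, `CASE_SCHEMA`, and `read_with_schema`, as well as the sample values, are hypothetical and invented for this example.

```python
import json
from datetime import date

# Raw records are stored exactly as they arrive; nothing is validated on write.
# The values below are invented placeholders, not real COVID-19 figures.
RAW_RECORDS = [
    '{"country": "A", "date": "2020-04-01", "confirmed": "100"}',
    '{"country": "B", "date": "2020-04-01", "confirmed": 250, "source": "feed-x"}',
]

# Hypothetical read-time schema: field name -> converter applied on read.
CASE_SCHEMA = {
    "country": str,
    "date": date.fromisoformat,
    "confirmed": int,
}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project each record onto the schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Only fields named in the schema are kept; each is converted to its type.
        rows.append({name: convert(record[name]) for name, convert in schema.items()})
    return rows


rows = read_with_schema(RAW_RECORDS, CASE_SCHEMA)
```

In this sketch the two raw records disagree on the type of `confirmed` and on which fields are present, yet both can be stored unchanged. A different consumer could read the same raw lines with a different schema, which is the flexibility that schema on read is meant to provide.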