{"title":"构建企业数据湖,2019冠状病毒病案例研究","authors":"Bushra, M. Memon, Salahuddin Saddar","doi":"10.17706/jsw.16.4.174-181","DOIUrl":null,"url":null,"abstract":"Data is increasing at an enormous rate every day. Traditionally data has resided in silosacross any organization,so it’s difficult to have a complete picture for data driven business decision making. Data lake addresses the problem of rate of increase of data by providing “schema on read”, better integration and cheaper storage. It also solves the data silos problemby providing a central platform for a variety of data housing needs. However, implementing a data lake becomes challenging as the implementation needs to address the additional needs like metadata management, data discovery, data governance, data lifecycle management, security and centralized access controls mechanisms. This paper intends to provide a comprehensive architecture of data lake to address these challenges. We have also conducted and documented our experiments with publicly available datasets about COVID19 to validate the design and applicability of the proposed architecture for business analytics purposes.","PeriodicalId":11452,"journal":{"name":"e Informatica Softw. Eng. J.","volume":"23 1","pages":"174-181"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Architecting an Enterprise Data Lake, A Covid19 Case Study\",\"authors\":\"Bushra, M. Memon, Salahuddin Saddar\",\"doi\":\"10.17706/jsw.16.4.174-181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data is increasing at an enormous rate every day. Traditionally data has resided in silosacross any organization,so it’s difficult to have a complete picture for data driven business decision making. Data lake addresses the problem of rate of increase of data by providing “schema on read”, better integration and cheaper storage. It also solves the data silos problemby providing a central platform for a variety of data housing needs. However, implementing a data lake becomes challenging as the implementation needs to address the additional needs like metadata management, data discovery, data governance, data lifecycle management, security and centralized access controls mechanisms. This paper intends to provide a comprehensive architecture of data lake to address these challenges. We have also conducted and documented our experiments with publicly available datasets about COVID19 to validate the design and applicability of the proposed architecture for business analytics purposes.\",\"PeriodicalId\":11452,\"journal\":{\"name\":\"e Informatica Softw. Eng. J.\",\"volume\":\"23 1\",\"pages\":\"174-181\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"e Informatica Softw. Eng. J.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17706/jsw.16.4.174-181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"e Informatica Softw. Eng. J.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17706/jsw.16.4.174-181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Architecting an Enterprise Data Lake, A Covid19 Case Study
Data is increasing at an enormous rate every day. Traditionally data has resided in silosacross any organization,so it’s difficult to have a complete picture for data driven business decision making. Data lake addresses the problem of rate of increase of data by providing “schema on read”, better integration and cheaper storage. It also solves the data silos problemby providing a central platform for a variety of data housing needs. However, implementing a data lake becomes challenging as the implementation needs to address the additional needs like metadata management, data discovery, data governance, data lifecycle management, security and centralized access controls mechanisms. This paper intends to provide a comprehensive architecture of data lake to address these challenges. We have also conducted and documented our experiments with publicly available datasets about COVID19 to validate the design and applicability of the proposed architecture for business analytics purposes.