Data Lake Strategy for Data Science Workflows
Grupo colaborativo
2022 11th International Conference On Software Process Improvement (CIMPS), 2022-10-19
DOI: 10.1109/CIMPS57786.2022.10035686
https://doi.org/10.1109/CIMPS57786.2022.10035686
This paper details the research and technological strategy used to implement the Data Lake and Sandboxes of the Data Science Laboratory at the National Institute of Statistics and Geography (INEGI), Mexico. The project seeks to integrate digital information, in various formats, from the internal and external repositories and data sources maintained by the entities that generate statistical and geographic information, and to combine it in a unified storage environment (temporary or permanent) that supports advanced processing with analytics and data science techniques.