{"title":"变化存在下大数据分析过程的高效再计算:计算框架、参考架构和应用","authors":"P. Missier, J. Cala","doi":"10.1109/BigDataCongress.2019.00017","DOIUrl":null,"url":null,"abstract":"Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically \"redo\" computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp is focused exclusively on changes to reference datasets as well as to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies of very diverse domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Efficient Re-Computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications\",\"authors\":\"P. Missier, J. Cala\",\"doi\":\"10.1109/BigDataCongress.2019.00017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically \\\"redo\\\" computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp is focused exclusively on changes to reference datasets as well as to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies of very diverse domains, namely genomics and geosciences. 
We discuss lessons learnt and outline the next steps towards the ReComp vision.\",\"PeriodicalId\":335850,\"journal\":{\"name\":\"2019 IEEE International Congress on Big Data (BigDataCongress)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Congress on Big Data (BigDataCongress)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BigDataCongress.2019.00017\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Congress on Big Data (BigDataCongress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2019.00017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically "redo" computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, in whole or in part, the outcomes of complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp focuses exclusively on changes to reference datasets and to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies from very different domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.
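The hypothesis above amounts to a change-driven control loop over a population of prior outcomes. The following is a minimal Python sketch of such a loop, written purely to make the four steps concrete; every name in it (DataChange, Outcome, estimate_impact, select_fragments, recomp) and the example data are hypothetical assumptions, not the ReComp project's actual API.

```python
# Hypothetical sketch of the four-step ReComp loop; all names and data
# are illustrative assumptions, not the project's actual API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataChange:
    """A recorded change to a reference dataset or an original input."""
    dataset_id: str
    magnitude: float  # step (i): quantified size of the change, in [0, 1]


@dataclass
class Outcome:
    """A prior analytics outcome together with its data dependencies."""
    outcome_id: str
    depends_on: List[str]  # ids of datasets this outcome was derived from


def estimate_impact(change: DataChange, outcome: Outcome) -> float:
    """Step (ii): estimate the impact of a change on one prior outcome.
    Naive placeholder: the change's magnitude if the outcome depends on
    the changed dataset, zero otherwise."""
    return change.magnitude if change.dataset_id in outcome.depends_on else 0.0


def select_fragments(outcome: Outcome, change: DataChange) -> List[str]:
    """Step (iii): pick the minimal process fragments to re-run.
    A real implementation would walk the outcome's provenance graph;
    this stub just names everything downstream of the changed dataset."""
    return [f"fragments-downstream-of-{change.dataset_id}"]


def recomp(changes: List[DataChange],
           outcomes: List[Outcome],
           refresh: Callable[[Outcome, List[str]], None],
           threshold: float = 0.5) -> None:
    """Step (iv): selectively refresh only the outcomes whose estimated
    impact exceeds a threshold, re-running only the selected fragments."""
    for change in changes:
        for outcome in outcomes:
            if estimate_impact(change, outcome) >= threshold:
                refresh(outcome, select_fragments(outcome, change))


if __name__ == "__main__":
    # Illustrative data only: a hypothetical genomics scenario in which a
    # reference dataset changes and one prior result depends on it.
    changes = [DataChange(dataset_id="reference-db-v2", magnitude=0.8)]
    outcomes = [Outcome(outcome_id="patient-42-report",
                        depends_on=["reference-db-v2"])]
    recomp(changes, outcomes,
           refresh=lambda o, frags: print(f"refreshing {o.outcome_id} via {frags}"))
```

The point the sketch makes concrete is that impact estimation (step ii) acts as a filter before any expensive re-execution: only outcomes whose estimated impact crosses the threshold trigger a refresh, and even then only for the selected fragments rather than the whole process.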