Adaptive Execution of Continuous and Data-intensive Workflows with Machine Learning
Sérgio Esteves, H. Galhardas, L. Veiga
Proceedings of the 19th International Middleware Conference, published 2018-11-26. DOI: 10.1145/3274808.3274827 (https://doi.org/10.1145/3274808.3274827)
To extract value from ever-growing volumes of data and to drive decision making, organizations frequently compose data processing workflows. The typical workflow model enforces strict temporal synchronization across processing steps without accounting for the actual effect of intermediate computations on the final workflow output. In many scenarios, however, this is undesirable. We identify a class of continuous data processing applications in which the workflow output changes slowly and insignificantly within a short time window, so current approaches squander compute resources. To overcome this inefficiency, we introduce a novel workflow model for continuous and data-intensive processing that relaxes triggering semantics according to the assessed impact of input data on the workflow output. We rely on machine learning to estimate this impact, to learn the correlation between input and output variation, and to guarantee correctness within a given tolerated error constant. The model is implemented in SmartFlux, a middleware framework that can be integrated with existing workflow managers. Experimental results indicate substantial savings in resource usage while keeping deviations in the workflow output below a small error constant with a high confidence level.
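The relaxed triggering idea can be illustrated with a small sketch. Everything in it is an assumption for illustration, not SmartFlux's interface or learning method: the class `AdaptiveTrigger`, its parameters `max_error` and `warmup`, and the use of a simple linear least-squares fit are hypothetical stand-ins for whatever model the framework actually uses. The policy accumulates input variation since a step last ran, predicts the resulting output variation from past (input, output) pairs, adds the largest observed residual as a conservative margin, and re-executes the step only when that estimate exceeds the tolerated error.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveTrigger:
    """Hypothetical trigger policy for one workflow step (not SmartFlux's API).

    Learns a linear relation between input variation and output variation
    (a simple stand-in for the paper's machine-learning model) and skips
    re-execution while the predicted output deviation stays within
    the tolerated error constant `max_error`.
    """
    max_error: float                         # tolerated output deviation
    warmup: int = 5                          # runs always executed to gather data
    xs: list = field(default_factory=list)   # input variation seen before each run
    ys: list = field(default_factory=list)   # output variation produced by each run
    pending: float = 0.0                     # input variation accumulated since last run

    def _fit(self):
        # Ordinary least squares for y = a*x + b, using only the stdlib.
        n = len(self.xs)
        mx = sum(self.xs) / n
        my = sum(self.ys) / n
        sxx = sum((x - mx) ** 2 for x in self.xs)
        if sxx == 0:
            return 0.0, my
        sxy = sum((x - mx) * (y - my) for x, y in zip(self.xs, self.ys))
        a = sxy / sxx
        return a, my - a * mx

    def should_execute(self, input_delta: float) -> bool:
        """Accumulate new input variation and decide whether to run the step."""
        self.pending += abs(input_delta)
        # Always execute until there is enough history to fit the model.
        if len(self.xs) < max(self.warmup, 2):
            return True
        a, b = self._fit()
        predicted = abs(a * self.pending + b)
        # Conservative margin: the largest residual of the fit on past runs.
        margin = max(abs(y - (a * x + b)) for x, y in zip(self.xs, self.ys))
        return predicted + margin > self.max_error

    def record_execution(self, output_delta: float) -> None:
        """After a run, store the (input, output) variation pair and reset."""
        self.xs.append(self.pending)
        self.ys.append(abs(output_delta))
        self.pending = 0.0
```

For example, with `max_error=1.0` and a step whose output variation has so far been exactly twice its input variation, three consecutive input changes of 0.2 are skipped twice and trigger a run on the third, once the predicted deviation of the accumulated input passes the tolerance. Accumulating variation between runs keeps many small changes from being ignored indefinitely; the residual margin is only a crude stand-in for the confidence guarantees the abstract describes.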