Adaptive Execution of Continuous and Data-intensive Workflows with Machine Learning

Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274827

Sérgio Esteves, H. Galhardas, L. Veiga


Citations: 4

Abstract


Adaptive Execution of Continuous and Data-intensive Workflows with Machine Learning

To extract value from ever-growing volumes of data and to drive decision making, organizations frequently resort to composing data processing workflows. The typical workflow model enforces strict temporal synchronization across processing steps without accounting for the actual effect of intermediate computations on the final workflow output. This is undesirable in many scenarios. We identify a class of continuous data processing applications where the workflow output changes slowly, and without great significance, within a short time window, so current approaches squander compute resources. To overcome this inefficiency, we introduce a novel workflow model for continuous and data-intensive processing that relaxes triggering semantics according to the impact that input data is assessed to have on the workflow output. To estimate this impact, learn the correlation between input and output variation, and guarantee correctness within a given tolerated error constant, we rely on Machine Learning. The functionality of this model is implemented in SmartFlux, a middleware framework that can be integrated with existing workflow managers. Experimental results indicate substantial savings in resource usage, while the workflow output does not deviate beyond a small error constant with a high confidence level.
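The idea of relaxed triggering can be illustrated with a minimal Python sketch. It is not SmartFlux's API or the authors' algorithm: the `ImpactGate` class, its fields, and the fixed linear `slope` (a stand-in for a learned input-to-output impact model) are all hypothetical names and assumptions introduced here. The gate accumulates input change and lets a downstream step run only when the predicted output deviation would exceed the tolerated error.

```python
from dataclasses import dataclass


@dataclass
class ImpactGate:
    """Hypothetical gate deciding whether a downstream step should run."""
    max_error: float             # tolerated output error (assumed units)
    slope: float = 1.0           # placeholder for a learned impact model
    pending_change: float = 0.0  # input change accumulated since the last run

    def predicted_deviation(self) -> float:
        # A real system would query a trained model here; this is linear.
        return self.slope * self.pending_change

    def should_trigger(self, input_change: float) -> bool:
        # Accumulate the magnitude of the new input change.
        self.pending_change += abs(input_change)
        if self.predicted_deviation() > self.max_error:
            # The step runs, so the output is considered fresh again.
            self.pending_change = 0.0
            return True
        return False


gate = ImpactGate(max_error=0.5, slope=2.0)
decisions = [gate.should_trigger(c) for c in [0.1, 0.1, 0.1, 0.05, 0.3]]
print(decisions)  # [False, False, True, False, True]
```

In this toy run the step executes on two of five input waves; the other three are skipped because their accumulated predicted impact stays under the error bound. The abstract's confidence guarantee would depend on the quality of the learned model, which this sketch does not address.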
