大规模分布式系统中的工作流监控应用

Dragos Sbirlea, Alina Simion, Florin Pop, V. Cristea
{"title":"大规模分布式系统中的工作流监控应用","authors":"Dragos Sbirlea, Alina Simion, Florin Pop, V. Cristea","doi":"10.1109/INCOS.2009.73","DOIUrl":null,"url":null,"abstract":"This paper presents the design, implementation and testing of the monitoring solution created for integration with a workflow execution platform. The monitoring solution constantly checks the system evolution in order to facilitate performance tuning and improvement. Monitoring is accomplished at application level, by monitoring each job from each workflow and at system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that allow an improvement to the scheduling component of the system, with which it will interact. The improvement on the performance of distributed application is obtained using the realtime information to compute estimates of runtime which are used to improve scheduling. Another contribution is an automated error detection systems, which can improve the robustness of grid by enabling fault recovery mechanisms to be used. These aspects can benefit from the particularization of the monitoring system for a workflow-based application: the scheduling performance can be improved through better runtime estimation and the error detection can automatically detect several types of errors. The proposed monitoring solution could be used in the SEEGRID project as a part of the satellite image processing engine that is being built.","PeriodicalId":145328,"journal":{"name":"2009 International Conference on Intelligent Networking and Collaborative Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Monitoring Workflow Applications in Large Scale Distributed Systems\",\"authors\":\"Dragos Sbirlea, Alina Simion, Florin Pop, V. Cristea\",\"doi\":\"10.1109/INCOS.2009.73\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents the design, implementation and testing of the monitoring solution created for integration with a workflow execution platform. The monitoring solution constantly checks the system evolution in order to facilitate performance tuning and improvement. Monitoring is accomplished at application level, by monitoring each job from each workflow and at system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that allow an improvement to the scheduling component of the system, with which it will interact. The improvement on the performance of distributed application is obtained using the realtime information to compute estimates of runtime which are used to improve scheduling. Another contribution is an automated error detection systems, which can improve the robustness of grid by enabling fault recovery mechanisms to be used. These aspects can benefit from the particularization of the monitoring system for a workflow-based application: the scheduling performance can be improved through better runtime estimation and the error detection can automatically detect several types of errors. The proposed monitoring solution could be used in the SEEGRID project as a part of the satellite image processing engine that is being built.\",\"PeriodicalId\":145328,\"journal\":{\"name\":\"2009 International Conference on Intelligent Networking and Collaborative Systems\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 International Conference on Intelligent Networking and Collaborative Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INCOS.2009.73\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Intelligent Networking and Collaborative Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INCOS.2009.73","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本文介绍了为与工作流执行平台集成而创建的监控解决方案的设计、实现和测试。监视解决方案不断地检查系统的发展,以促进性能调优和改进。监控在应用程序级别通过监控来自每个工作流的每个作业来完成,在系统级别通过聚合来自每个处理节点的状态信息来完成。该解决方案还计算聚合的统计数据,这些统计数据允许改进系统的调度组件,它将与之交互。利用实时信息来计算运行时估计,从而提高分布式应用程序的性能,从而改进调度。另一个贡献是自动错误检测系统,它可以通过启用故障恢复机制来提高网格的鲁棒性。这些方面可以从基于工作流的应用程序的监视系统的特殊化中受益:可以通过更好的运行时估计来改进调度性能,错误检测可以自动检测几种类型的错误。拟议的监测解决方案可用于SEEGRID项目,作为正在建造的卫星图像处理引擎的一部分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Monitoring Workflow Applications in Large Scale Distributed Systems
This paper presents the design, implementation and testing of the monitoring solution created for integration with a workflow execution platform. The monitoring solution constantly checks the system evolution in order to facilitate performance tuning and improvement. Monitoring is accomplished at application level, by monitoring each job from each workflow and at system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that allow an improvement to the scheduling component of the system, with which it will interact. The improvement on the performance of distributed application is obtained using the realtime information to compute estimates of runtime which are used to improve scheduling. Another contribution is an automated error detection systems, which can improve the robustness of grid by enabling fault recovery mechanisms to be used. These aspects can benefit from the particularization of the monitoring system for a workflow-based application: the scheduling performance can be improved through better runtime estimation and the error detection can automatically detect several types of errors. The proposed monitoring solution could be used in the SEEGRID project as a part of the satellite image processing engine that is being built.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信