A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications

V. Galtier, C. Makassikis, S. Vialle
2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing
DOI: 10.1109/PDP.2011.82 (https://doi.org/10.1109/PDP.2011.82)
Published: 2011-02-09
Citations: 3

Abstract

We propose a framework built around a JavaSpace to ease the development of bag-of-tasks applications. The framework can optionally and automatically tolerate transient crash failures occurring on any of the distributed elements. It relies on checkpointing and on mechanisms of the underlying middleware to do so. To further improve checkpointing efficiency, in both size and frequency, the programmer can introduce intermediate user-defined checkpoint data and code within the task processing program. Used without fault tolerance, the framework accelerates application development, introduces no runtime overhead, and yields the expected speedup. With fault tolerance enabled, the framework allows applications to complete correctly despite failures, with limited runtime and data storage overheads. Experiments run with up to 128 workers study the impact of some user-related and implementation-related parameters on overall performance, and reveal good performance for classical JavaSpace-based master-worker application profiles.
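The abstract does not show the framework's API, so the following is only a minimal sketch of the general pattern it describes: a master fills a shared bag of tasks, workers take tasks from it, and each task carries user-defined checkpoint data so that partially completed work can be resumed by any worker. Everything here is an assumption made for illustration: the class and method names (`BagOfTasksSketch`, `Task`, `step`, `run`) are hypothetical, and a `java.util.concurrent` queue stands in for the space instead of the JavaSpaces API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class BagOfTasksSketch {

    /** Hypothetical task entry: sum the integers in [next, end). */
    static final class Task {
        final long next;     // first integer not yet added (checkpointed progress)
        final long end;      // exclusive upper bound of the range
        final long partial;  // checkpointed partial sum
        Task(long next, long end, long partial) {
            this.next = next;
            this.end = end;
            this.partial = partial;
        }
    }

    /** Work done between two checkpoints (arbitrary value for the sketch). */
    static final long CHUNK = 1000;

    /** Processes at most CHUNK integers and returns the task's new state. */
    static Task step(Task t) {
        long stop = Math.min(t.end, t.next + CHUNK);
        long sum = t.partial;
        for (long i = t.next; i < stop; i++) {
            sum += i;
        }
        return new Task(stop, t.end, sum);
    }

    /** Master: fills the bag, starts the workers, and adds up the results. */
    static long run(int workers, int tasks, long rangePerTask) {
        // In-memory stand-ins for the shared space (assumption, not JavaSpaces).
        BlockingQueue<Task> bag = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        for (int i = 0; i < tasks; i++) {
            bag.add(new Task(i * rangePerTask, (i + 1) * rangePerTask, 0));
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Task t = step(bag.take());
                        if (t.next < t.end) {
                            bag.put(t);              // write the checkpointed task back
                        } else {
                            results.put(t.partial);  // task finished: publish its result
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // master asked us to stop
                }
            });
        }
        try {
            long total = 0;
            for (int i = 0; i < tasks; i++) {
                total += results.take();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        } finally {
            pool.shutdownNow();  // interrupts workers blocked on bag.take()
        }
    }

    public static void main(String[] args) {
        // Eight tasks of 5000 integers each, together covering 0..39999.
        System.out.println(run(4, 8, 5000));
    }
}
```

In this sketch a worker that stops between two chunks loses at most `CHUNK` additions, because the task written back to the bag already holds the progress made so far. A smaller chunk means more frequent but cheaper-to-lose checkpoints, which is the size and frequency trade-off the abstract mentions; how the actual framework stores and restores checkpoints is not specified in the text above.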