Application checkpointing in grid environment with improved checkpoint reliability through replication

R. K. Bawa, R. Singh
Published in: 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12), 2012-07-26
DOI: 10.1109/ICCCNT.2012.6395974 (https://doi.org/10.1109/ICCCNT.2012.6395974)
Citations: 2

Abstract

Grid technologies are emerging as the next generation of distributed computing, allowing the aggregation of heterogeneous, geographically distributed resources. The heterogeneous nature of the grid makes it more vulnerable to faults, which lead either to failure of the job or to a delay in completing its execution. Checkpointing is one of many fault tolerance techniques used to make the Grid more efficient and reliable. In this paper we have developed an application-checkpointing-based fault tolerance technique for Alchemi-based Grid environments. In this technique, application threads generate their checkpoints and store them in the checkpoint table at the manager node. If a thread fails, the checkpoint of the corresponding thread is used to resume execution from the point of failure. This technique introduces a slight overhead in fault-free situations but is very effective in case of a node failure. Increasing the checkpoint frequency improves a job's ability to resume, but it also increases the overhead of generating and storing checkpoints, which increases the processing time of the job.
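The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Alchemi API: `CheckpointTable`, `run_thread`, the `replicas` parameter, and the integer "state" are all hypothetical names chosen for illustration, and replication is modeled simply as keeping several in-memory copies of each checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CheckpointTable:
    """Hypothetical stand-in for the checkpoint table at the manager node."""
    replicas: int = 2
    writes: int = 0  # number of checkpoint saves, a rough proxy for overhead
    # One dict per replica, each mapping thread id -> last completed step.
    stores: List[Dict[int, int]] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.stores = [{} for _ in range(self.replicas)]

    def save(self, thread_id: int, step: int) -> None:
        # Write the checkpoint to every replica (assumed replication scheme).
        for store in self.stores:
            store[thread_id] = step
        self.writes += 1

    def load(self, thread_id: int) -> Optional[int]:
        # Return the first surviving copy, or None if no checkpoint exists.
        for store in self.stores:
            if thread_id in store:
                return store[thread_id]
        return None


def run_thread(thread_id: int, total_steps: int, interval: int,
               table: CheckpointTable,
               fail_at: Optional[int] = None) -> Tuple[bool, int]:
    """Run a thread from its last checkpoint.

    Returns (completed, steps_executed_in_this_attempt). A failure is
    simulated just before step `fail_at` is executed.
    """
    start = table.load(thread_id) or 0
    executed = 0
    for step in range(start + 1, total_steps + 1):
        if step == fail_at:
            return False, executed  # simulated node failure
        executed += 1  # placeholder for one unit of real work
        if step % interval == 0:
            table.save(thread_id, step)
    return True, executed


if __name__ == "__main__":
    table = CheckpointTable(replicas=2)
    print(run_thread(1, total_steps=10, interval=3, table=table, fail_at=8))
    table.stores[0].clear()  # simulate losing one checkpoint replica
    print(run_thread(1, total_steps=10, interval=3, table=table))
    print("checkpoint writes:", table.writes)
```

In this sketch, a smaller `interval` means fewer steps are repeated after a failure but more calls to `save`, which mirrors the trade-off between resuming capability and checkpointing overhead noted in the abstract.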