A user-transparent recoverable file system for distributed computing environment

H. Kim, H. Yeom
Published in: CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.
Publication date: 2005-07-24
DOI: https://doi.org/10.1109/CLADE.2005.1520898
Citations: 9

Abstract

In a distributed computing environment, particularly a grid, fault tolerance is one of the core functionalities the system should provide. MPICH-GF is such a resilient system, designed to withstand external or internal failures, especially for message-passing applications in the grid environment. However, it does not protect against the loss of a valuable resource: files. In the normal case, users open files and write data into them asynchronously, and checkpointing is initiated without regard to the state of the process context. Therefore, the checkpointing system should automatically recognize the running process and protect its open files transparently. We have implemented a recoverable file system, named ReFS, which is incorporated into our fault-tolerant system MPICH-GF. ReFS is a versioning-like file system. It provides middleware libraries with a system call interface for protecting specific files at a given time. This prevents applications from processing their jobs with corrupted data and producing incorrect results after a failure. We have focused not only on the reliability of the system but also on reducing the unavoidable overhead. This paper describes the design and implementation of ReFS and justifies the validity of its behavior. We have developed ReFS on Linux, based on Ext2.
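
The idea of protecting open files at checkpoint time and restoring them after a failure can be illustrated with a small sketch. This is a user-space toy in Python, not the authors' implementation: ReFS is described as a file system built on Ext2 and reached through a system call interface, whereas the class below simply copies whole files. The names FileProtector, protect, recover, and demo are hypothetical and do not come from the paper.

```python
import os
import shutil
import tempfile


class FileProtector:
    """Toy stand-in for a versioning-like protection layer (hypothetical)."""

    def __init__(self, version_dir):
        self.version_dir = version_dir
        self.saved = {}      # original path -> path of the protected copy
        self._next_id = 0    # counter used to name protected copies
        os.makedirs(version_dir, exist_ok=True)

    def protect(self, path):
        """Record the file's contents as they are at checkpoint time."""
        backup = os.path.join(self.version_dir, f"{self._next_id}.ver")
        self._next_id += 1
        shutil.copy2(path, backup)
        self.saved[path] = backup

    def recover(self):
        """Roll every protected file back to its checkpointed contents."""
        for path, backup in self.saved.items():
            shutil.copy2(backup, path)


def demo():
    workdir = tempfile.mkdtemp()
    data = os.path.join(workdir, "output.dat")
    with open(data, "w") as f:
        f.write("step 1\n")

    protector = FileProtector(os.path.join(workdir, "versions"))
    protector.protect(data)        # a checkpoint is taken here

    with open(data, "a") as f:     # writes issued after the checkpoint
        f.write("step 2 (partial)\n")

    protector.recover()            # simulated failure: restart from checkpoint
    with open(data) as f:
        restored = f.read()
    shutil.rmtree(workdir)
    return restored


if __name__ == "__main__":
    print(repr(demo()))
```

In this sketch the writes made after the checkpoint are discarded on recovery, so a restarted process sees file contents consistent with its checkpointed state. Copying whole files is used here only for brevity; it says nothing about how ReFS itself stores versions or limits overhead.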