Exploiting data-flow for fault-tolerance in a wide-area parallel system

A. Nguyen-Tuong, A. Grimshaw, M. Hyett
{"title":"Exploiting data-flow for fault-tolerance in a wide-area parallel system","authors":"A. Nguyen-Tuong, A. Grimshaw, M. Hyett","doi":"10.1109/RELDIS.1996.559687","DOIUrl":null,"url":null,"abstract":"Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.","PeriodicalId":439132,"journal":{"name":"Proceedings 15th Symposium on Reliable Distributed Systems","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 15th Symposium on Reliable Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RELDIS.1996.559687","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

Abstract

Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.
利用数据流实现广域并行系统的容错
广域并行处理系统将很快提供给研究人员解决一系列问题。在这些系统中,主机故障和其他故障肯定会经常发生。不幸的是,大多数并行处理系统在设计时都没有考虑到容错。Mentat是基于数据流模型扩展的高性能面向对象并行处理系统。数据流的功能特性支持并行性和容错性。在本文中,我们利用Mentat的数据流基础来提供易于使用和透明的容错。我们在小规模网络和广域异构环境中展示了结果,该环境由三个站点组成:国家超级计算应用中心、弗吉尼亚大学和NASA兰利研究中心。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信