Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems

Middleware for Grid Computing. Published 2005-11-28. DOI: 10.1145/1101499.1101500

R. Camargo, Renato Cerqueira, Fabio Kon


Citations: 20

Abstract



Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints may be taken to guarantee application progress, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high-speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.

In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.
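The three storage strategies compared above can be illustrated with a small, self-contained Python sketch. It is not InteGrade's code and does not reproduce the authors' implementation: the function names (`replicate`, `parity_encode`, `parity_recover`, `ida_encode`, `ida_decode`), the zero-padding scheme, and the use of the prime field GF(257) for the dispersal step are all assumptions made for illustration. Many practical erasure-coding implementations work over GF(2^8) instead, so that fragments stay byte-sized; GF(257) is used here only because plain modular arithmetic keeps the example short.

```python
from __future__ import annotations

# Illustrative sketch only; not InteGrade's implementation. Function names,
# fragment layout and the GF(257) field are assumptions made for this example.
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def replicate(data: bytes, copies: int) -> list[bytes]:
    """Replication: every node stores the full checkpoint."""
    return [data] * copies


def split(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal-length fragments, zero-padding the tail."""
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(m)]


def xor_all(frags: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length fragments."""
    out = bytearray(len(frags[0]))
    for frag in frags:
        for i, byte in enumerate(frag):
            out[i] ^= byte
    return bytes(out)


def parity_encode(data: bytes, m: int) -> list[bytes]:
    """m data fragments plus one XOR parity fragment (stored last)."""
    frags = split(data, m)
    return frags + [xor_all(frags)]


def parity_recover(stored: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one lost fragment (None); return the m data fragments."""
    missing = [i for i, f in enumerate(stored) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment tolerates only one loss")
    stored = list(stored)
    if missing:
        # XOR of all m+1 fragments is zero, so the lost one is the XOR of the rest.
        stored[missing[0]] = xor_all([f for f in stored if f is not None])
    return stored[:-1]


def ida_encode(data: bytes, n: int, m: int) -> list[list[int]]:
    """Rabin-style dispersal into n fragments, any m of which suffice.

    Fragment i uses the Vandermonde row [x^0, ..., x^(m-1)] with x = i + 1, so
    any m rows are linearly independent (requires n <= P - 1).
    """
    padded = data.ljust(-(-len(data) // m) * m, b"\0")
    cols = [padded[j:j + m] for j in range(0, len(padded), m)]
    return [[sum(pow(i + 1, k, P) * col[k] for k in range(m)) % P for col in cols]
            for i in range(n)]


def inv_mod(mat: list[list[int]]) -> list[list[int]]:
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    m = len(mat)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(mat)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = pow(aug[col][col], P - 2, P)  # inverse via Fermat's little theorem
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def ida_decode(frags: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the checkpoint from any m fragments, keyed by fragment index."""
    if len(frags) < m:
        raise ValueError("IDA needs at least m fragments")
    idx = sorted(frags)[:m]
    inv = inv_mod([[pow(i + 1, k, P) for k in range(m)] for i in idx])
    out = bytearray()
    for j in range(len(frags[idx[0]])):
        y = [frags[i][j] for i in idx]
        out.extend(sum(inv[r][s] * y[s] for s in range(m)) % P for r in range(m))
    return bytes(out[:length])  # drop the zero padding added by ida_encode
```

The sketch makes the tradeoff concrete. With r replicas a checkpoint occupies r times its size and survives r - 1 lost nodes, with no encoding cost. With m data fragments plus one parity fragment it occupies (m + 1)/m times its size and survives a single loss, at the cost of one XOR pass. With n dispersed fragments of which any m suffice it occupies n/m times its size and survives n - m losses, but every byte goes through finite-field arithmetic on both encode and decode. For example, `ida_encode(data, 5, 3)` followed by `ida_decode({0: f[0], 3: f[3], 4: f[4]}, 3, len(data))` recovers `data` after losing two of the five nodes.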

