{"title":"在网格系统上使用非专用存储库存储检查点数据的策略","authors":"R. Camargo, Renato Cerqueira, Fabio Kon","doi":"10.1145/1101499.1101500","DOIUrl":null,"url":null,"abstract":"Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault-tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.","PeriodicalId":313448,"journal":{"name":"Middleware for Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2005-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems\",\"authors\":\"R. Camargo, Renato Cerqueira, Fabio Kon\",\"doi\":\"10.1145/1101499.1101500\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault-tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.\",\"PeriodicalId\":313448,\"journal\":{\"name\":\"Middleware for Grid Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-11-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Middleware for Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1101499.1101500\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Middleware for Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1101499.1101500","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems
Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault-tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.