Theft-induced checkpointing for reconfigurable dataflow applications

2005 IEEE International Conference on Electro Information Technology. Pub Date: 2005-05-22. DOI: 10.1109/EIT.2005.1626998

S. Jafar, A. Krings, T. Gautier, Jean-Louis Roch


Citations: 15

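Abstract

This paper defines a new checkpoint/recovery protocol, called theft-induced checkpointing, for dataflow computations in large heterogeneous environments. The protocol is especially useful for massively parallel multi-threaded computations, as found in cluster or grid computing, and it uses the principle of work-stealing to distribute work. Because the state of an execution is based on a macro dataflow graph, the protocol is highly flexible with respect to rollback. Specifically, it allows local rollback in dynamic heterogeneous systems, even when the number of processors and processes differs from the original execution. To maximize run-time efficiency, the overhead associated with checkpointing is shifted to the rollback operations whenever possible. Experimental results show that the induced overhead is very small.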

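The abstract does not spell out the protocol's mechanics, so the following is only a minimal single-process sketch of the general idea, under stated assumptions: a task is checkpointed at the moment it is stolen, nothing extra is saved during normal execution, and recovery re-queues only the stolen tasks whose results never arrived. The names `Task`, `Worker`, `StableStore`, `steal`, `run`, `recover`, and `demo` are hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """A node of a toy macro dataflow graph: it sums a tuple of integers."""
    task_id: int
    data: tuple


@dataclass
class Worker:
    name: str
    queue: deque = field(default_factory=deque)


class StableStore:
    """Stands in for stable storage (assumption: a real system would persist this)."""

    def __init__(self):
        self.checkpoints = {}

    def save(self, task):
        self.checkpoints[task.task_id] = task

    def commit(self, task_id):
        # Drop the checkpoint once the task's result is known.
        self.checkpoints.pop(task_id, None)

    def pending(self):
        return list(self.checkpoints.values())


def steal(thief, victim, store):
    """Move the oldest task from victim to thief and checkpoint it on the theft."""
    if not victim.queue:
        return None
    task = victim.queue.popleft()
    store.save(task)  # the only checkpointing work done during normal execution
    thief.queue.append(task)
    return task


def run(worker, store, results):
    """Execute every queued task and commit its result."""
    while worker.queue:
        task = worker.queue.pop()
        results[task.task_id] = sum(task.data)
        store.commit(task.task_id)


def recover(survivor, store):
    """Local rollback: re-queue only stolen tasks that never completed."""
    for task in store.pending():
        survivor.queue.append(task)


def demo():
    store, results = StableStore(), {}
    a, b = Worker("a"), Worker("b")
    a.queue.extend(Task(i, tuple(range(i, i + 3))) for i in range(4))
    steal(b, a, store)      # b steals task 0; a checkpoint is recorded
    b.queue.clear()         # simulate b failing before it runs the task
    run(a, store, results)  # a completes its own tasks 1, 2 and 3
    recover(a, store)       # only the stolen task is rolled back
    run(a, store, results)  # the survivor re-executes it
    return results
```

In this toy model the surviving worker re-executes the lost task, which loosely mirrors the abstract's claim that rollback can proceed with a different number of processes and that cost is moved from checkpointing to recovery.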
