On the Design of Reliable Heterogeneous Systems via Checkpoint Placement and Core Assignment

E. Sha, Hailiang Dong, Weiwen Jiang, Qingfeng Zhuge, Xianzhang Chen, Lei Yang
Proceedings of the 2018 on Great Lakes Symposium on VLSI, published 2018-05-30
DOI: 10.1145/3194554.3194642 (https://doi.org/10.1145/3194554.3194642)
Citations: 3

Abstract

This paper studies two basic problems in the design of high-performance, high-reliability heterogeneous systems: (1) which type of core should execute each task, and (2) where checkpoints should be placed during task execution. Implementing checkpointing on novel heterogeneous systems built on persistent memory (e.g., 3D XPoint memory) raises several new problems. First, the assignment of tasks to cores can greatly influence the execution time of the whole application, so under the same time constraint the reliability of the resulting system can be significantly affected. Second, creating checkpoints incurs heavy writes to persistent memory and shortens device lifetime. In this paper, we optimally construct reliable systems by assigning tasks to the most suitable cores and placing the minimum number of checkpoints in the application, such that the resulting system satisfies the time constraint in the presence of faults. We devise an efficient dynamic programming algorithm to obtain the optimal assignment and checkpoint placement. Experimental results demonstrate that, compared with existing approaches, our technique reduces the number of checkpoints by 44% on average.
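The abstract does not spell out the fault model or the dynamic program, so the following is only a hedged illustration under a simplified setting that is NOT taken from the paper. It assumes that tasks form a linear chain executed one after another, that each task has a known execution time on each core type, that checkpoints may be placed only between tasks, that each checkpoint adds a fixed write overhead, and that up to `faults` transient faults may strike, each forcing a rollback to the most recent checkpoint. Under that model the worst case is every fault hitting the longest segment just before it finishes, giving a worst-case time of `sum(times) + m * ckpt_overhead + faults * L_max` for `m` checkpoints (rollback and fault-detection overheads are ignored). Because per-task times are independent in this toy model, `assign_fastest` simply picks the fastest core type per task; the interaction between assignment and checkpointing that the abstract describes is not captured. `place_checkpoints` then applies a textbook linear-partition dynamic program (a swapped-in technique, not necessarily the authors' algorithm) to find the smallest achievable `L_max` for each checkpoint count and returns the fewest checkpoints that meet the deadline. All function names, parameters, and inputs are hypothetical.

```python
from itertools import accumulate
from typing import Optional


def assign_fastest(exec_times: list[list[int]]) -> tuple[list[int], list[int]]:
    """Map each task to the core type with the smallest execution time.

    Assumption: tasks run sequentially and core types do not contend,
    so the fastest type per task is always the best choice here.
    """
    cores = [row.index(min(row)) for row in exec_times]
    times = [row[c] for row, c in zip(exec_times, cores)]
    return cores, times


def place_checkpoints(times: list[int], deadline: int, ckpt_overhead: int,
                      faults: int) -> Optional[tuple[list[int], int]]:
    """Return (checkpoints, worst_case_time) using the fewest checkpoints, or None.

    Hypothetical worst-case model:
        W = sum(times) + m * ckpt_overhead + faults * L_max
    where L_max is the longest segment between consecutive checkpoints.
    Each checkpoint is reported as the number of tasks completed before it.
    """
    n = len(times)
    prefix = [0, *accumulate(times)]
    inf = float("inf")
    best = [0] + [inf] * n   # 0 segments: only the empty prefix is feasible
    parents = []             # parents[s - 1][i] = start of the last of s segments
    for s in range(1, n + 1):
        # new_best[i] = smallest possible L_max when tasks 1..i form s segments
        new_best = [inf] * (n + 1)
        parent = [-1] * (n + 1)
        for i in range(s, n + 1):
            for j in range(s - 1, i):
                cand = max(best[j], prefix[i] - prefix[j])
                if cand < new_best[i]:
                    new_best[i], parent[i] = cand, j
        best = new_best
        parents.append(parent)
        m = s - 1  # s segments need s - 1 interior checkpoints
        worst = prefix[n] + m * ckpt_overhead + faults * best[n]
        if worst <= deadline:
            # Walk the parent pointers back to recover the segment boundaries.
            cuts, i = [], n
            for par in reversed(parents):
                i = par[i]
                cuts.append(i)
            return sorted(cuts[:-1]), worst  # drop the leading 0
    return None


if __name__ == "__main__":
    # Hypothetical input: 5 tasks x 2 core types (execution times per type).
    exec_times = [[4, 2], [6, 3], [2, 5], [8, 4], [3, 6]]
    cores, times = assign_fastest(exec_times)  # times = [2, 3, 2, 4, 3]
    print(place_checkpoints(times, deadline=28, ckpt_overhead=1, faults=2))
    # prints ([2, 4], 28)
```

With these made-up inputs and two faults, a deadline of 28 time units requires two checkpoints (after the second and fourth tasks), while a deadline of 29 needs only one. The sketch minimizes the checkpoint count rather than the worst-case time, reflecting the abstract's point that every checkpoint is a persistent-memory write that consumes device endurance.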