基于时间记忆的高效gpgpu时序错误恢复

Abbas Rahimi, L. Benini, Rajesh K. Gupta
{"title":"基于时间记忆的高效gpgpu时序错误恢复","authors":"Abbas Rahimi, L. Benini, Rajesh K. Gupta","doi":"10.7873/DATE.2014.113","DOIUrl":null,"url":null,"abstract":"Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelisms in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose here a temporal memoization technique for use in floating-point units (FPUs) in GPGPUs that uses value locality inside data-parallel programs. The technique recalls (memorizes) the context of error-free execution of an instruction on a FPU. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain contexts of recent error-free executions. The LUT reuses these memorized contexts to exactly, or approximately, correct errant FP instructions based on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 8%-28% for a wide range of timing error rates (0%-4%) and outperforms recent advances in resilient architectures. This technique also enhances robustness in the voltage overscaling regime and achieves relative average energy saving of 66 % with 11% voltage overscaling.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"75 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Temporal memoization for energy-efficient timing error recovery in GPGPUs\",\"authors\":\"Abbas Rahimi, L. Benini, Rajesh K. Gupta\",\"doi\":\"10.7873/DATE.2014.113\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelisms in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose here a temporal memoization technique for use in floating-point units (FPUs) in GPGPUs that uses value locality inside data-parallel programs. The technique recalls (memorizes) the context of error-free execution of an instruction on a FPU. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain contexts of recent error-free executions. The LUT reuses these memorized contexts to exactly, or approximately, correct errant FP instructions based on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 8%-28% for a wide range of timing error rates (0%-4%) and outperforms recent advances in resilient architectures. This technique also enhances robustness in the voltage overscaling regime and achieves relative average energy saving of 66 % with 11% voltage overscaling.\",\"PeriodicalId\":6550,\"journal\":{\"name\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"volume\":\"75 1\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.7873/DATE.2014.113\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7873/DATE.2014.113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

制造和环境的可变性导致计算系统中的定时错误,这些错误通常通过电路级的错误检测和校正机制来纠正。基于记忆的优化方法可以在适当的计算结构(如通用图形处理单元(gpgpu))中利用空间或时间的并行性,从而提高恢复的成本和速度。本文提出了一种用于gpgpu中的浮点单元(fpu)的时间记忆技术,该技术在数据并行程序中使用值局部性。该技术可以在FPU上回忆(记忆)无错误执行指令的上下文。为了支持可扩展和独立的恢复,单周期查找表(LUT)与每个FPU紧密耦合,以维护最近无错误执行的上下文。LUT重用这些记忆的上下文,根据应用程序的需要精确地或近似地纠正错误的FP指令。在实际应用中,时间记忆技术在大范围的时间错误率(0%-4%)下实现了8%-28%的平均节能,并且优于弹性架构中的最新进展。该技术还增强了电压过标度的鲁棒性,在电压过标度为11%的情况下实现了66%的相对平均节能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Temporal memoization for energy-efficient timing error recovery in GPGPUs
Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelisms in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose here a temporal memoization technique for use in floating-point units (FPUs) in GPGPUs that uses value locality inside data-parallel programs. The technique recalls (memorizes) the context of error-free execution of an instruction on a FPU. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain contexts of recent error-free executions. The LUT reuses these memorized contexts to exactly, or approximately, correct errant FP instructions based on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 8%-28% for a wide range of timing error rates (0%-4%) and outperforms recent advances in resilient architectures. This technique also enhances robustness in the voltage overscaling regime and achieves relative average energy saving of 66 % with 11% voltage overscaling.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信