Exploring Soft-Error Robust and Energy-Efficient Register File in GPGPUs using Resistive Memory
Authors: Jingweijia Tan, Zhi Li, Mingsong Chen, Xin Fu
Journal: ACM Trans. Design Autom. Electr. Syst., pages 34:1-34:25 (Journal Article)
Published: 2016-01-28
DOI: 10.1145/2827697 (https://doi.org/10.1145/2827697)
Citations: 3
Abstract
The increasing adoption of graphics processing units (GPUs) for high-performance computing raises a reliability challenge that traditional GPUs have generally ignored. GPUs usually support thousands of parallel threads and therefore require a sizable register file. Such a large register file is both highly susceptible to soft errors and power-hungry. Although ECC has been adopted for the register file in modern GPUs, it incurs considerable power overhead, which further increases the power stress. An energy-efficient soft-error protection mechanism is therefore desirable. Besides its extremely low leakage power consumption, resistive memory (e.g., spin-transfer torque RAM) is also immune to radiation-induced soft errors because of its magnetic-field-based storage. In this article, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in General-Purpose computing on GPUs (GPGPUs). Since resistive memory has a longer write latency than SRAM, we exploit the unique characteristics of GPGPU applications to obtain a win-win outcome: near-full soft-error protection for the register file together with substantially reduced energy consumption, at negligible performance cost. Our experimental results show that LESS reduces the register file's soft-error vulnerability by 86% and achieves 61% energy savings with negligible (e.g., 1%) performance degradation.