{"title":"gpu辐射灵敏度与算法容错效率的实验评估","authors":"P. Rech, L. Carro","doi":"10.1109/IOLTS.2013.6604091","DOIUrl":null,"url":null,"abstract":"Experimental results demonstrate that Graphic Processing Units are very prone to be corrupted by neutrons. We have performed several experimental campaigns at ISIS, UK and at LANSCE, Los Alamos, NM, USA accessing the sensitivity of the GPU internal resources as well as the error rate of common parallel algorithms. Experiments highlight output error patterns and radiation responses that can be fruitfully used to design optimized Algorithm-Based Fault Tolerance strategies and provide pragmatic programming guidelines to increase the code reliability with low computational overhead.","PeriodicalId":423175,"journal":{"name":"2013 IEEE 19th International On-Line Testing Symposium (IOLTS)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Experimental evaluation of GPUs radiation sensitivity and algorithm-based fault tolerance efficiency\",\"authors\":\"P. Rech, L. Carro\",\"doi\":\"10.1109/IOLTS.2013.6604091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Experimental results demonstrate that Graphic Processing Units are very prone to be corrupted by neutrons. We have performed several experimental campaigns at ISIS, UK and at LANSCE, Los Alamos, NM, USA accessing the sensitivity of the GPU internal resources as well as the error rate of common parallel algorithms. Experiments highlight output error patterns and radiation responses that can be fruitfully used to design optimized Algorithm-Based Fault Tolerance strategies and provide pragmatic programming guidelines to increase the code reliability with low computational overhead.\",\"PeriodicalId\":423175,\"journal\":{\"name\":\"2013 IEEE 19th International On-Line Testing Symposium (IOLTS)\",\"volume\":\"115 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE 19th International On-Line Testing Symposium (IOLTS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IOLTS.2013.6604091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 19th International On-Line Testing Symposium (IOLTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IOLTS.2013.6604091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
摘要
实验结果表明,图形处理单元很容易被中子破坏。我们在ISIS,英国和LANSCE, Los Alamos, NM,美国进行了几次实验活动,访问GPU内部资源的敏感性以及常见并行算法的错误率。实验突出了输出错误模式和辐射响应,可以有效地用于设计优化的基于算法的容错策略,并提供实用的编程指南,以低计算开销提高代码可靠性。
Experimental evaluation of GPUs radiation sensitivity and algorithm-based fault tolerance efficiency
Experimental results demonstrate that Graphic Processing Units are very prone to be corrupted by neutrons. We have performed several experimental campaigns at ISIS, UK and at LANSCE, Los Alamos, NM, USA accessing the sensitivity of the GPU internal resources as well as the error rate of common parallel algorithms. Experiments highlight output error patterns and radiation responses that can be fruitfully used to design optimized Algorithm-Based Fault Tolerance strategies and provide pragmatic programming guidelines to increase the code reliability with low computational overhead.