{"title":"错误弹性变压器:一种新的软错误漏洞导向的错误检测与抑制方法","authors":"Kwondo Ma, C. Amarnath, A. Chatterjee","doi":"10.1109/ETS56758.2023.10174239","DOIUrl":null,"url":null,"abstract":"Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the underlying large volumes of Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for design of error resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layers in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of determining output neurons with out-of-range values and suppressing the same to zero. For a Transformer with nominal BLEU score of 52.7, such vulnerability guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 with as much as 0.001 probability of activation error, incurring negligible memory and computation overheads.","PeriodicalId":211522,"journal":{"name":"2023 IEEE European Test Symposium (ETS)","volume":"23 Vol. 23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Error Resilient Transformers: A Novel Soft Error Vulnerability Guided Approach to Error Checking and Suppression\",\"authors\":\"Kwondo Ma, C. Amarnath, A. Chatterjee\",\"doi\":\"10.1109/ETS56758.2023.10174239\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the underlying large volumes of Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for design of error resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layers in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of determining output neurons with out-of-range values and suppressing the same to zero. For a Transformer with nominal BLEU score of 52.7, such vulnerability guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 with as much as 0.001 probability of activation error, incurring negligible memory and computation overheads.\",\"PeriodicalId\":211522,\"journal\":{\"name\":\"2023 IEEE European Test Symposium (ETS)\",\"volume\":\"23 Vol. 23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE European Test Symposium (ETS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ETS56758.2023.10174239\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE European Test Symposium (ETS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ETS56758.2023.10174239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Error Resilient Transformers: A Novel Soft Error Vulnerability Guided Approach to Error Checking and Suppression
Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the underlying large volumes of Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for design of error resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layers in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of determining output neurons with out-of-range values and suppressing the same to zero. For a Transformer with nominal BLEU score of 52.7, such vulnerability guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 with as much as 0.001 probability of activation error, incurring negligible memory and computation overheads.