Reducing DUE-FIT of caches by exploiting acoustic wave detectors for error recovery
Gaurang Upasani, X. Vera, Antonio González
2013 IEEE 19th International On-Line Testing Symposium (IOLTS), 2013-07-08
DOI: 10.1109/IOLTS.2013.6604056 (https://doi.org/10.1109/IOLTS.2013.6604056)
Citations: 12
Abstract
Soft errors induced by cosmic radiation have emerged as a key challenge in computer system design. The exponential increase in transistor count will drive the per-chip fault rate sharply upward. New techniques for detecting errors in logic and memories are essential to meet the desired failures-in-time (FIT) budget in future chip multiprocessors (CMPs). Of the two major contributors to the soft error rate, silent data corruption (SDC) and detected unrecoverable error (DUE), DUE is the larger. Moreover, processors can experience a super-linear increase in DUE when the size of the write-back cache is doubled. This paper targets the DUE problem in write-back data caches. We analyze the cost of protecting caches against single-bit and multi-bit upsets. Our results show that the proposed mechanism can reduce the DUE to zero with minimal area, power, and performance overheads.