STOCK: Stochastic Checkers for Low-overhead Approximate Error Detection
Neel Gala, Swagath Venkataramani, A. Raghunathan, V. Kamakoti
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016-08-08
DOI: 10.1145/2934583.2934634 (https://doi.org/10.1145/2934583.2934634)
Abstract
Designing reliable systems, while eschewing the high overheads of conventional fault tolerance techniques, is a critical challenge in the deeply scaled CMOS and post-CMOS era. To address this challenge, we leverage the intrinsic resilience of application domains such as multimedia, recognition, mining, search, and analytics, where acceptable outputs are produced despite occasional approximate computations. We propose stochastic checkers, wherein a stochastic-logic-based realization of the circuit is used as an error checker, and the original circuit's output is declared to be correct if it lies within a certain range of the checker's output. The key benefit of stochastic checkers is that the intrinsic compactness of stochastic logic leads to greatly reduced overheads. However, due to the approximate nature of stochastic circuits, errors that cause the output to be within a certain range of the correct value may not be detected (missed coverage). In addition, some correct outputs may be incorrectly flagged as erroneous (false positives). To limit the number of missed errors and false positives, we propose a technique that uses input-permuted partial replicas of the stochastic logic to improve accuracy without greatly increasing the overheads. We also address the challenge of error detection latency (due to the bit-serial nature of stochastic logic) through progressive checking policies that produce an early decision based on a prefix of the checker's output bitstream. We evaluate stochastic checkers on hardware implementations of a suite of error-resilient applications, and demonstrate that they can lead to greatly reduced overheads (29.5% area and 21.5% power, on average) compared to traditional fault tolerance techniques, while achieving very high coverage (average of 99.5%) and very low false positives (average of 0.1%).
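The checking idea described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design. It assumes a unipolar stochastic encoding (a value in [0, 1] is the probability of a 1 bit), uses a two-input multiplier as the checked circuit, and invents a simple progressive policy. All function names, the tolerance values, and the prefix length are hypothetical choices made for illustration.

```python
import random


def to_bitstream(value, length, rng):
    """Encode a value in [0, 1] as a unipolar stochastic bitstream:
    each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def stochastic_product(a, b, length, rng):
    """Multiply two values in [0, 1] by ANDing independent bitstreams.
    The fraction of 1s in the result estimates a * b."""
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    return [x & y for x, y in zip(stream_a, stream_b)]


def estimate(stream):
    """Decode a bitstream back to a value: the fraction of 1 bits."""
    return sum(stream) / len(stream)


def full_check(main_output, checker_stream, tolerance):
    """Accept the main circuit's output if it lies within `tolerance`
    of the checker's estimate over the whole bitstream."""
    return abs(main_output - estimate(checker_stream)) <= tolerance


def progressive_check(main_output, checker_stream, tolerance,
                      prefix_len, prefix_tolerance):
    """Hypothetical progressive policy (an assumption, not the paper's):
    flag an error early if the prefix estimate is already beyond the
    looser `prefix_tolerance`; otherwise run the full-length check.
    Returns (accepted, bits_consumed)."""
    prefix = checker_stream[:prefix_len]
    if abs(main_output - estimate(prefix)) > prefix_tolerance:
        return False, prefix_len
    accepted = full_check(main_output, checker_stream, tolerance)
    return accepted, len(checker_stream)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
stream = stochastic_product(0.5, 0.5, 4096, rng)

correct_output = 0.25  # what a fault-free multiplier would produce
faulty_output = 0.75   # an illustrative large-magnitude error

print(full_check(correct_output, stream, tolerance=0.05))
print(progressive_check(faulty_output, stream, 0.05, 256, 0.2))
```

In this sketch, a small error such as an output of 0.27 would most likely pass the range check, which corresponds to the missed coverage the abstract describes. Conversely, a tolerance that is too tight for the bitstream length would reject some correct outputs, which corresponds to false positives.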