Efficient on-line fault-tolerance for the preconditioned conjugate gradient method
A. Schöll, Claus Braun, M. Kochte, H. Wunderlich
2015 IEEE 21st International On-Line Testing Symposium (IOLTS), published 2015-07-01
DOI: 10.1109/IOLTS.2015.7229839 (https://doi.org/10.1109/IOLTS.2015.7229839)
Citations: 8
Abstract
Linear system solvers are key components of many scientific applications, and they can benefit significantly from modern heterogeneous computer architectures. However, the nano-scaled CMOS devices underlying these architectures face an increasing number of reliability threats, which makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is a very popular solver since it typically finds solutions faster than direct methods and is less vulnerable to transient effects. However, as recent research shows, its vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions, or particle radiation can increase execution times considerably or silently corrupt solutions. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method requires only two inner products to reliably detect errors. When an error is detected, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces both the error detection overhead and the need for expensive re-computations.
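To make the idea of inner-product-based error detection concrete, the following is a minimal sketch of a Jacobi-preconditioned CG loop that evaluates two extra inner products per iteration. The abstract does not say which inner products the authors use, so the two checks here (orthogonality of the new residual to the previous preconditioned residual, and A-conjugacy of consecutive search directions) are standard PCG identities chosen purely for illustration. The function name `pcg_checked`, the threshold `check_tol`, and the `fault_at` injection hook are all hypothetical, and the sketch only flags an error; it does not implement roll-back or on-line correction.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def pcg_checked(A, b, tol=1e-10, check_tol=1e-8, max_iter=100, fault_at=None):
    """Jacobi-preconditioned CG with two illustrative invariant checks.

    Returns (x, flagged): flagged is the iteration index at which a check
    failed, or None. fault_at is a hypothetical test hook that corrupts
    one residual entry at that iteration. Assumes A is symmetric positive
    definite.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi: M = diag(A)
    x = [0.0] * n
    r = list(b)                                    # r = b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    if rz == 0.0:                                  # b = 0: x = 0 is the solution
        return x, None
    stop = tol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        Ap = matvec(A, p)
        pAp = dot(p, Ap)
        alpha = rz / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if fault_at == k:
            r_new[0] += 1.0                        # simulated transient error
        # Check 1: the new residual should be orthogonal to the previous
        # preconditioned residual (r_new . z = 0 in exact arithmetic).
        if abs(dot(r_new, z)) > check_tol * rz:
            return x, k
        if math.sqrt(dot(r_new, r_new)) <= stop:
            return x, None
        z_new = [d * ri for d, ri in zip(inv_diag, r_new)]
        rz_new = dot(r_new, z_new)
        beta = rz_new / rz
        p_new = [zi + beta * pi for zi, pi in zip(z_new, p)]
        # Check 2: consecutive search directions should be A-conjugate
        # (p_new . A p = 0 in exact arithmetic).
        if abs(dot(p_new, Ap)) > check_tol * pAp:
            return x, k
        r, z, p, rz = r_new, z_new, p_new, rz_new
    return x, None
```

On a small symmetric positive definite system, `pcg_checked(A, b)` should return `None` as the second value, while passing `fault_at=1` should trip the first check in that same iteration. Both thresholds are scaled by quantities the solver already computes (`rz` and `pAp`), so the checks add only the two inner products themselves.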