Hao Qiu, Semiu A. Olowogemo, B. Lin, W. H. Robinson, D. Limbrick
Title: Understanding time-varying vulnerability across GPU Program Lifetime
Published in: 2022 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)
Publication date: 2022-10-19
DOI: https://doi.org/10.1109/DFT56152.2022.9962365
Abstract
Time-varying behavior in GPU program vulnerability could be exploited to reduce the overhead of fault-tolerant designs. However, the inherent parallelism of GPU programs and the performance overhead of massive fault injection (FI) have hindered FI-based assessments of this behavior. NVBitFI, a GPU FI tool that offers high performance and good compatibility, makes FI-based evaluation of time-varying vulnerability feasible within a reasonable time. We extended NVBitFI to control FI tests along the temporal dimension. We present a scalable workflow that characterizes the time-varying vulnerability of GPU programs at two granularities, and we propose a convenient approach for profiling vulnerability against actual GPU time. Results from 60K fault injections demonstrate the feasibility of the proposed methodologies. A case study explores improved instruction-level grouping. In addition, more than 340K faults were injected into the vectorAdd kernel to show that the time-varying behavior observed with smaller inputs can potentially be generalized to realistic workloads with large inputs.
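To make the idea of a time-varying vulnerability profile concrete, the following is a minimal sketch of how FI outcomes might be grouped into time windows across a program's lifetime. It is not the paper's workflow and does not use NVBitFI's actual output format. The record layout (a normalized injection position paired with an outcome label), the outcome names `masked`, `sdc`, and `due`, and the function `vulnerability_by_window` are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels, assumed for this sketch only:
# masked = no visible effect, sdc = silent data corruption,
# due = detected unrecoverable error (crash or hang).
OUTCOMES = ("masked", "sdc", "due")


def vulnerability_by_window(records, num_windows):
    """Group fault-injection records into equal-width time windows.

    records: iterable of (position, outcome) pairs, where position is the
        injection point normalized to [0, 1) over the program lifetime
        (an assumed format, not NVBitFI's log format).
    Returns a list with one entry per window: a dict mapping each outcome
    to its fraction of the injections in that window, or None if the
    window received no injections.
    """
    counts = defaultdict(Counter)
    for position, outcome in records:
        if not 0.0 <= position < 1.0:
            raise ValueError(f"position out of range: {position}")
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        counts[int(position * num_windows)][outcome] += 1

    profile = []
    for window in range(num_windows):
        total = sum(counts[window].values())
        if total == 0:
            profile.append(None)  # no data for this window
        else:
            profile.append({o: counts[window][o] / total for o in OUTCOMES})
    return profile


# Made-up records, purely to exercise the function.
records = [
    (0.05, "masked"), (0.10, "sdc"),
    (0.30, "masked"), (0.40, "masked"),
    (0.60, "sdc"), (0.70, "sdc"),
    (0.80, "due"), (0.90, "masked"),
]
profile = vulnerability_by_window(records, num_windows=4)
for index, rates in enumerate(profile):
    print(index, rates)
```

In a sketch like this, a window whose `sdc` fraction is consistently higher than the others would be a candidate for selective protection, which is the kind of overhead reduction the abstract alludes to. The number of windows plays the role of a granularity setting.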