Title: Transient Fault Detection in Tensor Cores for Modern GPUs
Authors: M. Hafezan, E. Atoofian
Journal: ACM Transactions on Embedded Computing Systems (Journal Article)
DOI: 10.1145/3687483
Published: 2024-08-10
Transient Fault Detection in Tensor Cores for Modern GPUs
Deep neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, this success comes at the cost of excessive computation. NVIDIA's Volta graphics processing unit (GPU) introduced a specialized hardware unit called the tensor core (TC), aimed at meeting the growing computational demands of DNNs. Most previous studies on TCs have focused on performance improvement by exploiting the TC's high degree of parallelism. However, as DNNs are deployed in security-sensitive applications such as autonomous driving, the reliability of TCs is as important as their performance.
In this work, we exploit the unique architectural characteristics of TCs and propose a simple, implementation-efficient hardware technique called fault detection in tensor cores (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning, as well as the sparse activations produced by the common ReLU operator, to verify tensor operations. The high level of sparsity in tensors allows FDTC to run the original and verifying products simultaneously, incurring zero performance penalty. For applications with low sparsity rates, FDTC relies on temporal redundancy to re-execute effectual products, scheduling the verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and a small energy overhead in TCs.
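The verification idea in the abstract can be sketched as a small simulation. This is an illustrative sketch, not the authors' hardware design: the function name, lane count, and fault-injection hook are hypothetical, and real tensor cores operate on matrix tiles rather than flat dot products. The sketch shows the core trade-off: effectual (nonzero) products are recomputed on multiplier lanes that would otherwise sit idle on zero operands, and only the overflow beyond those idle slots costs extra cycles.

```python
import math

def multiply(a, b, fault=0.0):
    # A transient fault is modeled as an additive error on one product.
    return a * b + fault

def fdtc_dot(weights, acts, lanes=4, fault_lane=None):
    """Illustrative FDTC-style verified dot product.

    Each effectual product (both operands nonzero) is computed twice:
    once as the original and once as a verifying copy that reuses a
    lane left idle by a zero operand. Verifying products that do not
    fit into idle slots are re-executed later (temporal redundancy).
    Returns (result, extra_cycles, fault_detected).
    """
    pairs = list(zip(weights, acts))
    effectual = [(w, a) for w, a in pairs if w != 0 and a != 0]
    idle = len(pairs) - len(effectual)  # zero-operand slots free for checking

    total, fault_detected = 0.0, False
    for k, (w, a) in enumerate(effectual):
        original = multiply(w, a)
        # Optionally inject an upset into one verifying copy.
        verify = multiply(w, a, fault=1.0 if k == fault_lane else 0.0)
        if original != verify:
            fault_detected = True  # mismatch signals a transient fault
        total += original

    # Verifying products beyond the idle slots cost extra passes.
    overflow = max(0, len(effectual) - idle)
    extra_cycles = math.ceil(overflow / lanes)
    return total, extra_cycles, fault_detected
```

With a sparse input such as `fdtc_dot([0, 2, 0, 3], [1, 4, 5, 0])`, the single effectual product fits into an idle slot and verification adds no cycles; with fully dense inputs, every verifying product overflows and must be scheduled in extra passes, which is where the temporal-redundancy fallback applies.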
About the journal:
The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.