Chandra Sekhar Mummidi, Victor C. Ferreira, Sudarshan Srinivasan, Sandip Kundu
ACM Transactions on Architecture and Code Optimization (journal article). Published 2023-11-22. DOI: 10.1145/3633332 (https://doi.org/10.1145/3633332)
Highly Efficient Self-Checking Matrix Multiplication on Tiled AMX Accelerators
General Matrix Multiplication (GEMM) is a computationally expensive operation used in many applications, such as machine learning. Hardware accelerators are increasingly popular for speeding up GEMM computation, with Tiled Matrix Multiplication (TMUL) in recent Intel processors being one example. Unfortunately, the TMUL hardware is susceptible to errors, which necessitates online error detection. Algorithm-Based Error Detection (ABED) is a powerful technique for detecting errors in matrix multiplications. In this paper, we consider an implementation of ABED that integrates seamlessly with the TMUL hardware to minimize performance overhead. Unfortunately, rounding errors introduced by floating-point operations do not allow a straightforward implementation of ABED in TMUL. Previously, an error bound was considered for addressing rounding errors in ABED. If the error detection threshold is set too low, it will trigger false alarms, while a loose bound will allow errors to escape detection. In this paper, we propose an adaptive error threshold that takes the TMUL input values into account to address the problem of false triggers and error escapes, and we provide a taxonomy of the various error classes. This threshold is obtained from theoretical error analysis but is not easy to implement in hardware. Consequently, we relax the threshold so that it can be easily computed in hardware. While ABED ensures error-free computation, it does not guarantee full coverage of all hardware faults. To address this problem, we propose an algorithmic pattern-generation technique to ensure full coverage of all hardware faults. To evaluate the benefits of our proposed solution, we conducted fault injection experiments and show that our approach does not produce any false alarms or detection escapes for observable errors. We conducted additional fault injection experiments on a Deep Neural Network (DNN) model and find that if a fault is not detected, it does not cause any misclassification.
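The checksum idea behind ABED, and the reason a fixed threshold is awkward in floating point, can be illustrated with a minimal sketch. It is not the paper's method: the function names `matmul` and `abed_check` are hypothetical, the arithmetic is ordinary double precision rather than a TMUL data type, and the threshold is a generic first-order rounding bound scaled by the input magnitudes, used here only as a stand-in for an input-adaptive threshold.

```python
import sys


def matmul(a, b):
    """Plain triple-loop product of an m x k matrix and a k x n matrix."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def abed_check(a, b, c):
    """Checksum test for c = a @ b. Returns (ok, residual, threshold).

    Identity used: sum of all entries of C equals (column sums of A)
    dotted with (row sums of B). Rounding makes the two sides differ
    slightly, so they are compared against a tolerance.
    """
    m, k, n = len(a), len(b), len(b[0])
    col_a = [sum(a[i][p] for i in range(m)) for p in range(k)]
    row_b = [sum(b[p][j] for j in range(n)) for p in range(k)]
    expected = sum(x * y for x, y in zip(col_a, row_b))
    observed = sum(sum(row) for row in c)

    # ASSUMPTION: illustrative tolerance, not the paper's threshold.
    # It grows with the absolute values of the inputs, so large operands
    # get a looser check and small operands a tighter one.
    abs_col_a = [sum(abs(a[i][p]) for i in range(m)) for p in range(k)]
    abs_row_b = [sum(abs(b[p][j]) for j in range(n)) for p in range(k)]
    magnitude = sum(x * y for x, y in zip(abs_col_a, abs_row_b))
    unit_roundoff = sys.float_info.epsilon / 2
    threshold = 2 * (m + k + n) * unit_roundoff * magnitude

    residual = abs(observed - expected)
    return residual <= threshold, residual, threshold


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[0.5, -1.0], [2.0, 0.25]]
    c = matmul(a, b)
    print("clean result passes:", abed_check(a, b, c)[0])
    c[0][0] += 1.0  # simulate a hardware fault in one output element
    print("corrupted result passes:", abed_check(a, b, c)[0])
```

Because the tolerance scales with the operands, one check works for inputs of very different magnitude, whereas a constant tolerance would be either too tight for large values (false alarms) or too loose for small ones (escapes).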
About the journal:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.