{"title":"基于收缩阵列的AI加速器的gpu加速时序仿真","authors":"S. Holst, Lim Bumun, X. Wen","doi":"10.1109/ATS52891.2021.00034","DOIUrl":null,"url":null,"abstract":"Systolic arrays are currently used in autonomous systems such as self-driving cars to accelerate the enormous amount of matrix operations necessary for DNN inference. The reliability of such accelerators are of utmost importance since any loss in DNN accuracy due to erroneous calculations can have dire consequences. We propose a novel method to measure accuracy losses caused by arbitrary timing faults in systolic arrays. Our GPU-based simulation system enables for the first time a complete and accurate timing simulation of all inference-related matrix operations on large systolic arrays. A single consumer-grade GPU can simulate a LeNet-5 at a throughput of about 13s per inference. Furthermore, our simulation approach readily scales to larger DNNs and multiple GPUs.","PeriodicalId":432330,"journal":{"name":"2021 IEEE 30th Asian Test Symposium (ATS)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"GPU-Accelerated Timing Simulation of Systolic-Array-Based AI Accelerators\",\"authors\":\"S. Holst, Lim Bumun, X. Wen\",\"doi\":\"10.1109/ATS52891.2021.00034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Systolic arrays are currently used in autonomous systems such as self-driving cars to accelerate the enormous amount of matrix operations necessary for DNN inference. The reliability of such accelerators are of utmost importance since any loss in DNN accuracy due to erroneous calculations can have dire consequences. We propose a novel method to measure accuracy losses caused by arbitrary timing faults in systolic arrays. Our GPU-based simulation system enables for the first time a complete and accurate timing simulation of all inference-related matrix operations on large systolic arrays. A single consumer-grade GPU can simulate a LeNet-5 at a throughput of about 13s per inference. Furthermore, our simulation approach readily scales to larger DNNs and multiple GPUs.\",\"PeriodicalId\":432330,\"journal\":{\"name\":\"2021 IEEE 30th Asian Test Symposium (ATS)\",\"volume\":\"143 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 30th Asian Test Symposium (ATS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ATS52891.2021.00034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 30th Asian Test Symposium (ATS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ATS52891.2021.00034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GPU-Accelerated Timing Simulation of Systolic-Array-Based AI Accelerators
Systolic arrays are currently used in autonomous systems such as self-driving cars to accelerate the enormous number of matrix operations required for DNN inference. The reliability of such accelerators is of utmost importance, since any loss in DNN accuracy due to erroneous calculations can have dire consequences. We propose a novel method to measure accuracy losses caused by arbitrary timing faults in systolic arrays. Our GPU-based simulation system enables, for the first time, a complete and accurate timing simulation of all inference-related matrix operations on large systolic arrays. A single consumer-grade GPU can simulate LeNet-5 at about 13 s per inference. Furthermore, our simulation approach readily scales to larger DNNs and multiple GPUs.
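To make the core idea concrete, below is a minimal, hedged sketch of what fault injection into a systolic array can look like at a purely functional level. It is not the authors' GPU-based timing simulator: the paper simulates timing behavior on GPUs, whereas this toy model only multiplies two matrices with an output-stationary array and corrupts one hypothetical processing element (PE) with a stuck bit in its accumulator, then measures the error introduced into the result. All names (systolic_matmul, faulty_pe, stuck_bit) are illustrative assumptions, not from the paper.

```python
# Illustrative sketch only (assumed toy model, not the authors' method):
# an output-stationary systolic array computing A (M x K) times B (K x N),
# with an optional "timing fault" modeled as a stuck bit in one PE's accumulator.
import numpy as np

def systolic_matmul(A, B, faulty_pe=None, stuck_bit=None):
    """Functional model: one MAC per PE per step, one step per reduction index k.

    faulty_pe : (row, col) of the PE whose accumulator is corrupted, or None.
    stuck_bit : bit position forced to 1 in that PE's integer accumulator.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N), dtype=np.int64)          # one accumulator per PE
    for k in range(K):                              # one MAC step across the whole array
        acc += np.outer(A[:, k], B[k, :]).astype(np.int64)
        if faulty_pe is not None:                   # inject the fault after each MAC
            r, c = faulty_pe
            acc[r, c] |= (1 << stuck_bit)
    return acc

rng = np.random.default_rng(0)
A = rng.integers(-8, 8, size=(4, 6))
B = rng.integers(-8, 8, size=(6, 4))

golden = systolic_matmul(A, B)                      # fault-free reference
faulty = systolic_matmul(A, B, faulty_pe=(1, 2), stuck_bit=5)
print("max absolute error:", np.max(np.abs(golden - faulty)))
```

Comparing the faulty output against the fault-free reference is the same high-level measurement the abstract describes (accuracy loss caused by a fault), but the paper's contribution lies in doing this with accurate timing behavior for all inference-related matrix operations at scale on GPUs, which this sketch does not attempt.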