Xueying Wang, Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang
{"title":"快速卷积满足低精度:在现代cpu上探索有效的量化Winograd卷积","authors":"Xueying Wang, Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang","doi":"10.1145/3632956","DOIUrl":null,"url":null,"abstract":"<p>Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, low-precision computation has not been commonly applied to fast convolutions, such as the Winograd algorithm, due to numerical issues. In this paper, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach can achieve an average of 1.84 × and 1.91 × operator speedups over state-of-the-art implementations in the vendor library while preserving accuracy loss at a reasonable level.</p>","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"9 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs\",\"authors\":\"Xueying Wang, Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang\",\"doi\":\"10.1145/3632956\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, low-precision computation has not been commonly applied to fast convolutions, such as the Winograd algorithm, due to numerical issues. In this paper, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach can achieve an average of 1.84 × and 1.91 × operator speedups over state-of-the-art implementations in the vendor library while preserving accuracy loss at a reasonable level.</p>\",\"PeriodicalId\":50920,\"journal\":{\"name\":\"ACM Transactions on Architecture and Code Optimization\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Architecture and Code Optimization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3632956\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3632956","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs
Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, low-precision computation has not been commonly applied to fast convolutions, such as the Winograd algorithm, due to numerical issues. In this paper, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach can achieve an average of 1.84 × and 1.91 × operator speedups over state-of-the-art implementations in the vendor library while preserving accuracy loss at a reasonable level.
期刊介绍:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.