基于2次幂量化的cpu快速CNN推理缓存处理

2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC) Pub Date : 2023-06-25 DOI:10.1109/ITC-CSCC58803.2023.10212854

Joseph Woo, Seungtae Lee, Seongwoo Kim, Gwang-Jun Byeon, Seokin Hong

{"title":"基于2次幂量化的cpu快速CNN推理缓存处理","authors":"Joseph Woo, Seungtae Lee, Seongwoo Kim, Gwang-Jun Byeon, Seokin Hong","doi":"10.1109/ITC-CSCC58803.2023.10212854","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNN) demand high computational capabilities, motivating researchers to leverage Processing-In-Memory (PIM) technology to achieve significant performance improvements. However, implementing complex arithmetic operations such as multiplication within memory is a significant challenge in the PIM architecture. To address this challenge, this paper proposes a PIM-enabled cache (PEC) architecture that utilizes shifters for performing multiplication operations at a low cost. We also introduce a filter-wise hardware-friendly Power-of-Two (POT) quantization scheme that quantizes weights into power-of-two values for specific filters to accelerate convolution operations with the PEC. Our experimental results demonstrate that the proposed PEC, together with the POT quantization, achieves 2.28x performance improvement on average with an accuracy degradation of 0.784%.","PeriodicalId":220939,"journal":{"name":"2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"In-Cache Processing with Power-of-Two Quantization for Fast CNN Inference on CPUs\",\"authors\":\"Joseph Woo, Seungtae Lee, Seongwoo Kim, Gwang-Jun Byeon, Seokin Hong\",\"doi\":\"10.1109/ITC-CSCC58803.2023.10212854\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNN) demand high computational capabilities, motivating researchers to leverage Processing-In-Memory (PIM) technology to achieve significant performance improvements. However, implementing complex arithmetic operations such as multiplication within memory is a significant challenge in the PIM architecture. To address this challenge, this paper proposes a PIM-enabled cache (PEC) architecture that utilizes shifters for performing multiplication operations at a low cost. We also introduce a filter-wise hardware-friendly Power-of-Two (POT) quantization scheme that quantizes weights into power-of-two values for specific filters to accelerate convolution operations with the PEC. Our experimental results demonstrate that the proposed PEC, together with the POT quantization, achieves 2.28x performance improvement on average with an accuracy degradation of 0.784%.\",\"PeriodicalId\":220939,\"journal\":{\"name\":\"2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITC-CSCC58803.2023.10212854\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITC-CSCC58803.2023.10212854","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

卷积神经网络(CNN)需要高计算能力，这促使研究人员利用内存中处理(PIM)技术来实现显著的性能改进。然而，在内存中实现复杂的算术运算(如乘法)是PIM体系结构中的一个重大挑战。为了解决这一挑战，本文提出了一种支持pim的缓存(PEC)架构，该架构利用移位器以低成本执行乘法操作。我们还介绍了一种滤波器硬件友好的2次幂(POT)量化方案，该方案将权重量化为特定滤波器的2次幂值，以加速与PEC的卷积运算。实验结果表明，该方法与POT量化相结合，平均提高了2.28倍的性能，精度下降了0.784%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

In-Cache Processing with Power-of-Two Quantization for Fast CNN Inference on CPUs

Convolutional Neural Networks (CNN) demand high computational capabilities, motivating researchers to leverage Processing-In-Memory (PIM) technology to achieve significant performance improvements. However, implementing complex arithmetic operations such as multiplication within memory is a significant challenge in the PIM architecture. To address this challenge, this paper proposes a PIM-enabled cache (PEC) architecture that utilizes shifters for performing multiplication operations at a low cost. We also introduce a filter-wise hardware-friendly Power-of-Two (POT) quantization scheme that quantizes weights into power-of-two values for specific filters to accelerate convolution operations with the PEC. Our experimental results demonstrate that the proposed PEC, together with the POT quantization, achieves 2.28x performance improvement on average with an accuracy degradation of 0.784%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)

自引率

0.00%

发文量