BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator

Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu
{"title":"BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator","authors":"Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu","doi":"10.1109/ASP-DAC58780.2024.10473797","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog Computing-In-Memory (CIM) has shown promise in enhancing the energy efficiency of CNNs, but the use of analog-to-digital converters (ADCs) remains a challenge. ADCs convert analog partial sums from CIM crossbar arrays to digital values, with high-precision ADCs accounting for over 60% of the system’s energy. Researchers have explored quantizing CNNs to use low-precision ADCs to tackle this issue, trading off accuracy for efficiency. However, these methods necessitate data-dependent adjustments to minimize accuracy loss. Instead, we observe that the first most significant toggled bit indicates the optimal quantization range for each input value. Accordingly, we propose a range-aware rounding (RAR) for runtime bit-width adjustment, eliminating the need for pre-deployment efforts. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic. Experimental results show that our methods maintain accuracy while achieving up to 1.81 × and 2.08 × energy efficiency improvements on CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"39 5-6","pages":"545-550"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC58780.2024.10473797","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog Computing-In-Memory (CIM) has shown promise in enhancing the energy efficiency of CNNs, but the use of analog-to-digital converters (ADCs) remains a challenge. ADCs convert analog partial sums from CIM crossbar arrays to digital values, with high-precision ADCs accounting for over 60% of the system’s energy. Researchers have explored quantizing CNNs to use low-precision ADCs to tackle this issue, trading off accuracy for efficiency. However, these methods necessitate data-dependent adjustments to minimize accuracy loss. Instead, we observe that the first most significant toggled bit indicates the optimal quantization range for each input value. Accordingly, we propose a range-aware rounding (RAR) for runtime bit-width adjustment, eliminating the need for pre-deployment efforts. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic. Experimental results show that our methods maintain accuracy while achieving up to 1.81 × and 2.08 × energy efficiency improvements on CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.
BFP-CIM:采用动态块浮点运算的无数据量化技术,实现基于内存的高能效计算加速器
卷积神经网络(CNN)以其在各种应用中的卓越性能而著称,然而,其在推理过程中的能耗可能非常大。模拟计算内存(CIM)有望提高 CNN 的能效,但模数转换器(ADC)的使用仍是一个挑战。模数转换器将 CIM 横条阵列中的模拟部分和转换为数字值,高精度模数转换器占系统能耗的 60% 以上。研究人员已经探索过量化 CNN,使用低精度 ADC 来解决这一问题,以精度换效率。然而,这些方法需要根据数据进行调整,以尽量减少精度损失。相反,我们观察到,第一个最显著的切换位指示了每个输入值的最佳量化范围。因此,我们提出了一种用于运行时调整位宽的范围感知舍入(RAR)方法,从而消除了预先部署的需要。RAR 可以使用动态块浮点运算轻松集成到 CIM 加速器中。实验结果表明,与最先进的技术相比,我们的方法在保持准确性的同时,在 CIFAR-10 和 ImageNet 数据集上分别实现了高达 1.81 倍和 2.08 倍的能效提升。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信