BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2024-01-22. DOI: 10.1109/ASP-DAC58780.2024.10473797

Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu

Pages: 545-550


Abstract

Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog Computing-In-Memory (CIM) has shown promise in enhancing the energy efficiency of CNNs, but the use of analog-to-digital converters (ADCs) remains a challenge. ADCs convert analog partial sums from CIM crossbar arrays to digital values, with high-precision ADCs accounting for over 60% of the system's energy. To tackle this issue, researchers have explored quantizing CNNs so that low-precision ADCs can be used, trading off accuracy for efficiency. However, these methods require data-dependent adjustments to minimize accuracy loss. Instead, we observe that the most significant toggled bit indicates the optimal quantization range for each input value. Accordingly, we propose range-aware rounding (RAR) for runtime bit-width adjustment, eliminating the need for pre-deployment effort. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic. Experimental results show that our methods maintain accuracy while achieving up to 1.81× and 2.08× energy efficiency improvements on the CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.
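The abstract does not spell out how RAR or the block-floating-point format is defined, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. The function names `range_aware_round` and `bfp_quantize`, the parameters `keep_bits` and `mantissa_bits`, and the round-half-up rule are all hypothetical choices for illustration, not the authors' implementation. The first function uses the position of the most significant set bit of a value to decide which low-order bits to round away at runtime; the second stores a block of values as integer mantissas with one shared exponent.

```python
import math


def range_aware_round(x: int, keep_bits: int) -> int:
    """Keep `keep_bits` bits starting at the most significant set bit of |x|.

    Assumption: the quantization range is chosen per value from its leading
    set bit, and the dropped low-order bits are rounded half up.
    """
    if x == 0:
        return 0
    sign = -1 if x < 0 else 1
    mag = abs(x)
    msb = mag.bit_length() - 1            # index of the leading set bit
    shift = max(msb + 1 - keep_bits, 0)   # number of low-order bits to drop
    if shift == 0:
        return x                          # value already fits in keep_bits
    rounded = (mag + (1 << (shift - 1))) >> shift
    return sign * (rounded << shift)


def bfp_quantize(block, mantissa_bits: int):
    """Quantize a block to signed integer mantissas with one shared exponent.

    Assumption: the shared exponent is derived from the largest magnitude
    in the block, as in a generic block-floating-point format.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0:
        return [0] * len(block), 0
    exp = math.floor(math.log2(max_abs)) + 1 - (mantissa_bits - 1)
    lo = -(1 << (mantissa_bits - 1))
    hi = (1 << (mantissa_bits - 1)) - 1
    mantissas = [min(max(round(v / 2.0 ** exp), lo), hi) for v in block]
    return mantissas, exp


def bfp_dequantize(mantissas, exp: int):
    """Reconstruct approximate real values from mantissas and the shared exponent."""
    return [m * 2.0 ** exp for m in mantissas]
```

In this sketch, small values pass through `range_aware_round` unchanged while large values lose only their low-order bits, which is one way to read the claim that the bit width can be adjusted at runtime without calibration data.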
