Yang Zhou;Yang Wang;Wenxin Yin;Yazheng Jiang;Jingchuan Wei;Yubin Qin;Yang Hu;Shaojun Wei;Shouyi Yin
BeaCIM: A Digital Compute-in-Memory DNN Processor With Bi-Directional Exponent Alignment for FP8 Training
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 4, pp. 608-612, 2025 (published 2025-02-11)
DOI: 10.1109/TCSII.2025.3541101
https://ieeexplore.ieee.org/document/10879560/
Citations: 0
Abstract
Previous digital Compute-In-Memory (DCIM) designs are limited in training Deep Neural Networks (DNNs) with FP8, a format that plays an increasingly important role in model training. Most previous DCIM designs simply align all exponents to the maximum exponent when computing on floating-point data, an approach that struggles to balance accuracy and energy efficiency at FP8. In this brief, we propose BeaCIM, a DCIM processor with bi-directional exponent alignment. Our contributions are as follows. First, we propose a new exponent alignment mechanism that dynamically adjusts the shared exponent $(E^{*})$ towards the center of the exponents' numerical distribution. Second, we use the Shift-on-Product (SOP) method to address the data bitwidth limitation, and present an $E^{*}$ calculator based on Ordinary Least Squares (OLS). Finally, we implemented BeaCIM at the register-transfer level (RTL) and evaluated it on various datasets and models. Our experiments show that: (1) compared with standard hybrid FP8 training, our bi-directional exponent alignment FP8 training incurs an average top-1 accuracy drop of less than 0.75% across different models and datasets; and (2) BeaCIM achieves an energy efficiency of 31.5 TFLOPS/W at FP8, which is 2.6-$19.3\times$ better than state-of-the-art works.
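The abstract does not give the arithmetic behind bi-directional alignment, so the Python sketch below is only one plausible reading of it, not BeaCIM's circuit. It assumes an E4M3-like significand (`SIG_BITS = 4`, counting the hidden bit) and a hypothetical fixed window of `WIN_BITS = 12` bits per aligned product that saturates on overflow. It reads Shift-on-Product as multiplying the narrow significands first and then shifting each product to $E^{*}$, and it approximates the OLS-based calculator by the least-squares center of the product exponents (their mean), rounded to an integer. All names (`decompose`, `align`, `max_rule`, `ols_rule`, `dot`) are illustrative.

```python
# Hypothetical sketch of max-exponent vs. centered (least-squares) alignment with
# shift-on-product; parameters and names are assumptions, not BeaCIM's RTL.
import math
from statistics import mean

SIG_BITS = 4   # assumed E4M3-like significand: 1 hidden bit + 3 stored bits
WIN_BITS = 12  # hypothetical width of each aligned product entering the adder tree


def decompose(x):
    """Return (m, e) with x ~= m * 2**e and |m| < 2**SIG_BITS."""
    if x == 0:
        return 0, 0
    f, e = math.frexp(x)             # x = f * 2**e with 0.5 <= |f| < 1
    m = round(f * (1 << SIG_BITS))   # keep SIG_BITS significand bits
    if abs(m) == 1 << SIG_BITS:      # rounding carried into an extra bit
        m //= 2
        e += 1
    return m, e - SIG_BITS


def align(m, e, e_star):
    """Shift one significand product from exponent e to the shared exponent e_star."""
    d = e - e_star
    v = m << d if d >= 0 else m >> -d    # left shift, or right shift dropping low bits
    lim = (1 << (WIN_BITS - 1)) - 1
    return max(-lim, min(lim, v))        # saturate anything that overflows the window


def max_rule(exps):
    return max(exps)                     # conventional: align to the largest exponent


def ols_rule(exps):
    return round(mean(exps))             # argmin_c sum((e - c)**2) is the mean of e


def dot(a, w, rule):
    """Dot product that multiplies significands first, then shifts each product."""
    prods = []
    for x, y in zip(a, w):
        (ma, ea), (mw, ew) = decompose(x), decompose(y)
        prods.append((ma * mw, ea + ew))  # narrow multiply; exponents simply add
    exps = [e for m, e in prods if m != 0]
    if not exps:
        return 0.0, 0
    e_star = rule(exps)
    acc = sum(align(m, e, e_star) for m, e in prods)
    return acc * 2.0 ** e_star, e_star


a = [0.8125] + [0.05078125] * 4          # one large activation, four small ones
w = [0.6875] * 5
exact = sum(x * y for x, y in zip(a, w))  # 0.6982421875
print(dot(a, w, max_rule))               # (0.68359375, -8)
print(dot(a, w, ols_rule))               # (0.697265625, -11)
```

Here one product sits four binary orders above the other four. Aligning to the maximum exponent truncates the small products to a few bits, while centering $E^{*}$ keeps more of their low bits and left-shifts the large product by 3 bits instead. The cost is range: in this sketch a product shifted left by 4 or more bits can exceed the 12-bit window and saturate, so the benefit depends on how tightly the exponents cluster around $E^{*}$.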
About the journal:
TCAS II publishes brief papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and signal processing. Its scope spans the whole spectrum from basic scientific theory to industrial applications. Fields of interest include:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.