MBS: A High-Precision Approximation Method for Softmax and Efficient Hardware Implementation
Authors: Yuanchen Wu; Zhiheng Xie; Hongbing Pan; Yuxuan Wang
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 7, pp. 3366-3375 (impact factor 5.2; JCR Q1, Engineering, Electrical & Electronic)
Published: 2025-04-16
DOI: 10.1109/TCSI.2025.3559069
URL: https://ieeexplore.ieee.org/document/10966265/
The softmax function is used frequently in the multi-head attention layers of Transformer networks. Compared with DNNs and other networks, Transformers have higher computational complexity and therefore demand higher accuracy and hardware performance from softmax calculations. We propose, for the first time, mixed-base softmax (MBS) as an approximation of the softmax function. The method combines exponential functions with bases 2 and 4, which is advantageous for hardware implementation. MBS closely matches the softmax function and demonstrates advanced performance during inference in Transformer networks. Through algorithm transformation and hardware optimization, we have designed a low-complexity, highly parallel hardware architecture that occupies only a few additional hardware resources compared with base-2 softmax but achieves higher accuracy. Experimental results show that, in TSMC 90 nm CMOS technology at a frequency of 0.5 GHz, our design achieves an efficiency of 236.18 Gps/(mm$^{2} \cdot$ mW) with an area of 4234 $\mu$m$^{2}$. Furthermore, MBS exhibits higher computational accuracy and inference precision than base-2 softmax.
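The abstract does not state how MBS mixes its two bases, so the following is only a minimal sketch of the single-base building blocks it mentions, not the authors' method. It compares the standard (base-e) softmax with base-2 and base-4 variants on a hypothetical score vector. The helper names `softmax_base` and `max_abs_error` and the sample scores are assumptions made for illustration. Since 2 < e < 4, the base-2 variant is flatter than the true softmax and the base-4 variant is sharper, which suggests why a combination of the two could land closer to the exact result.

```python
import math


def softmax_base(xs, base=math.e):
    """Softmax with an arbitrary exponential base: base**x_i / sum_j base**x_j."""
    m = max(xs)  # subtracting the max avoids overflow and leaves the result unchanged
    powers = [base ** (x - m) for x in xs]
    total = sum(powers)
    return [p / total for p in powers]


def max_abs_error(approx, exact):
    """Largest element-wise absolute difference between two distributions."""
    return max(abs(a - e) for a, e in zip(approx, exact))


# Hypothetical attention scores, chosen only for illustration.
scores = [1.0, 2.0, 0.5, -1.0]

exact = softmax_base(scores)        # standard softmax (base e)
base2 = softmax_base(scores, 2.0)   # flatter than the exact distribution
base4 = softmax_base(scores, 4.0)   # sharper than the exact distribution

print("base-2 max abs error:", max_abs_error(base2, exact))
print("base-4 max abs error:", max_abs_error(base4, exact))
```

Both bases suit hardware because a power of 2 reduces to a shift plus a small fractional correction, and 4^x = 2^(2x), so a base-4 exponential can reuse a base-2 unit with its input doubled.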
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems