HIMAM: Hardware Implementation of Multiply-and-Max/Min Layers for Energy-Efficient DNN Inference

Authors: Fanny Spagnolo; Pasquale Corsonello; Stefania Perri
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 8, pp. 1083-1087
DOI: 10.1109/TCSII.2025.3581784
Publication date: 2025-06-20
URL: https://ieeexplore.ieee.org/document/11045655/
Citations: 0
Abstract
This Brief presents HIMAM, the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply-then-compare-and-add pipeline involved in MAM layers. Based on the observation that this paradigm actually requires only a few product operations to be computed accurately, we propose replacing the most computationally intensive components with approximate ones. The FPGA-based implementation, carried out on a Zynq UltraScale+ device and operating in 32-bit floating-point mode, exhibits 10.91 GFLOPS/W. When implemented in a 28-nm FDSOI technology process, the architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than state-of-the-art MAC-based hardware architectures.
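To make the contrast between the two paradigms concrete, the sketch below compares a conventional MAC neuron with a MAM-style one. It assumes, per the MAM proposal the abstract refers to, that the full accumulation is replaced by combining only the maximum and minimum of the elementwise products — which is why only a few products need to be computed accurately. The function names are illustrative, not taken from the paper.

```python
def mac_neuron(x, w):
    # Conventional Multiply-and-Accumulate: every product
    # contributes to the output via a full summation.
    return sum(xi * wi for xi, wi in zip(x, w))

def mam_neuron(x, w):
    # Multiply-and-Max/Min (sketch): the accumulation is replaced
    # by a compare-and-add over the products, so only the largest
    # and smallest products determine the output.
    products = [xi * wi for xi, wi in zip(x, w)]
    return max(products) + min(products)
```

Because only the extreme products survive the compare stage, approximate multipliers can be used for the remaining products without affecting the result, which is the opportunity the Brief's approximate components exploit.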
Journal overview:
TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems
Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.