HIMAM: Hardware Implementation of Multiply-and-Max/Min Layers for Energy-Efficient DNN Inference

IF 4.9 · CAS Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Fanny Spagnolo;Pasquale Corsonello;Stefania Perri
{"title":"高效DNN推理的乘和最大/最小层的硬件实现","authors":"Fanny Spagnolo;Pasquale Corsonello;Stefania Perri","doi":"10.1109/TCSII.2025.3581784","DOIUrl":null,"url":null,"abstract":"This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just few product operations to be accurately computed, we propose to replace the most computational intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.","PeriodicalId":13101,"journal":{"name":"IEEE Transactions on Circuits and Systems II: Express Briefs","volume":"72 8","pages":"1083-1087"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045655","citationCount":"0","resultStr":"{\"title\":\"HIMAM: Hardware Implementation of Multiply-and-Max/Min Layers for Energy-Efficient DNN Inference\",\"authors\":\"Fanny Spagnolo;Pasquale Corsonello;Stefania Perri\",\"doi\":\"10.1109/TCSII.2025.3581784\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just few product operations to be accurately computed, we propose to replace the most computational intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. 
When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.\",\"PeriodicalId\":13101,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems II: Express Briefs\",\"volume\":\"72 8\",\"pages\":\"1083-1087\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045655\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems II: Express Briefs\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11045655/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems II: Express Briefs","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11045655/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just a few product operations to be accurately computed, we propose to replace the most computationally intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.
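For readers unfamiliar with the MAM paradigm, the following Python sketch contrasts a conventional MAC neuron with a Multiply-and-Max/Min neuron. It is only an illustration under the assumption (drawn from the MAM literature rather than stated explicitly on this page) that a MAM neuron adds the largest and the smallest of the element-wise products instead of accumulating all of them; the function names and test values are hypothetical.

```python
# Minimal sketch contrasting MAC and MAM neuron computations.
# Assumption (not taken from this page): a MAM neuron keeps only the
# largest and the smallest of the products w_i * x_i and adds them,
# whereas a MAC neuron accumulates all of them.

from typing import Sequence


def mac_neuron(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Conventional Multiply-and-Accumulate: sum of all products."""
    return sum(w * x for w, x in zip(weights, inputs))


def mam_neuron(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Multiply-and-Max/Min: only the two extreme products contribute
    to the output; all other products are discarded after comparison."""
    products = [w * x for w, x in zip(weights, inputs)]
    return max(products) + min(products)


if __name__ == "__main__":
    w = [0.5, -1.2, 0.3, 2.0]
    x = [1.0, 0.4, -0.7, 0.9]
    # products = [0.5, -0.48, -0.21, 1.8]
    print("MAC output:", mac_neuron(w, x))  # sum of products ≈ 1.61
    print("MAM output:", mam_neuron(w, x))  # 1.8 + (-0.48) ≈ 1.32
```

Because only the two extreme products ever reach the output, the remaining multiplications can tolerate inexact arithmetic, which is consistent with the abstract's observation that just a few product operations need to be computed accurately.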
Source journal
IEEE Transactions on Circuits and Systems II: Express Briefs
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 7.90
Self-citation rate: 20.50%
Articles per year: 883
Review time: 3.0 months
Journal description: TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest includes: Analog, Digital and Mixed Signal Circuits and Systems; Nonlinear Circuits and Systems; Integrated Sensors; MEMS and Systems on Chip; Nanoscale Circuits and Systems; Optoelectronic Circuits and Systems; Power Electronics and Systems; Software for Analog-and-Logic Circuits and Systems; Control aspects of Circuits and Systems.