HIMAM: Hardware Implementation of Multiply-and-Max/Min Layers for Energy-Efficient DNN Inference

IF 4.9 · CAS Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Fanny Spagnolo;Pasquale Corsonello;Stefania Perri
{"title":"高效DNN推理的乘和最大/最小层的硬件实现","authors":"Fanny Spagnolo;Pasquale Corsonello;Stefania Perri","doi":"10.1109/TCSII.2025.3581784","DOIUrl":null,"url":null,"abstract":"This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just few product operations to be accurately computed, we propose to replace the most computational intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.","PeriodicalId":13101,"journal":{"name":"IEEE Transactions on Circuits and Systems II: Express Briefs","volume":"72 8","pages":"1083-1087"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045655","citationCount":"0","resultStr":"{\"title\":\"HIMAM: Hardware Implementation of Multiply-and-Max/Min Layers for Energy-Efficient DNN Inference\",\"authors\":\"Fanny Spagnolo;Pasquale Corsonello;Stefania Perri\",\"doi\":\"10.1109/TCSII.2025.3581784\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just few product operations to be accurately computed, we propose to replace the most computational intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. 
When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.\",\"PeriodicalId\":13101,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems II: Express Briefs\",\"volume\":\"72 8\",\"pages\":\"1083-1087\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045655\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems II: Express Briefs\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11045655/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems II: Express Briefs","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11045655/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

This Brief presents HIMAM: the first hardware implementation of the Multiply-and-Max/Min (MAM) layers, recently proposed as an effective alternative to the traditional Multiply-and-Accumulate (MAC) paradigm used in Deep Neural Networks (DNNs). The proposed design relies on a specialized hardware architecture that uses floating-point arithmetic and was devised to implement the unconventional multiply then compare-and-add pipeline involved in MAM layers. Based on the observation that such a paradigm actually requires just a few product operations to be accurately computed, we propose to replace the most computationally intensive components with approximate ones. The FPGA-based implementation carried out on a Zynq Ultrascale+ device and operating in 32-bit floating-point mode exhibits 10.91 GFLOPS/W. When implemented on a 28-nm FDSOI technology process, such an architecture dissipates only 5.3 mW running at 250 MHz, which is at least 41.7% lower than the MAC-based state-of-the-art hardware architectures.
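For readers unfamiliar with the MAM paradigm, the following Python sketch contrasts a conventional MAC neuron with a Multiply-and-Max/Min neuron. It is only an illustration under the assumption (drawn from the MAM literature rather than stated explicitly on this page) that a MAM neuron adds the largest and the smallest of the element-wise products instead of accumulating all of them; the function names and test values are hypothetical.

```python
# Minimal sketch contrasting MAC and MAM neuron computations.
# Assumption (not taken from this page): a MAM neuron keeps only the
# largest and the smallest of the products w_i * x_i and adds them,
# whereas a MAC neuron accumulates all of them.

from typing import Sequence


def mac_neuron(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Conventional Multiply-and-Accumulate: sum of all products."""
    return sum(w * x for w, x in zip(weights, inputs))


def mam_neuron(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Multiply-and-Max/Min: only the two extreme products contribute
    to the output; all other products are discarded after comparison."""
    products = [w * x for w, x in zip(weights, inputs)]
    return max(products) + min(products)


if __name__ == "__main__":
    w = [0.5, -1.2, 0.3, 2.0]
    x = [1.0, 0.4, -0.7, 0.9]
    # products = [0.5, -0.48, -0.21, 1.8]
    print("MAC output:", mac_neuron(w, x))  # sum of products ≈ 1.61
    print("MAM output:", mam_neuron(w, x))  # 1.8 + (-0.48) ≈ 1.32
```

Because only the two extreme products ever reach the output, the remaining multiplications can tolerate inexact arithmetic, which is consistent with the abstract's observation that just a few product operations need to be computed accurately.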
Source journal
IEEE Transactions on Circuits and Systems II: Express Briefs
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 7.90
Self-citation rate: 20.50%
Articles per year: 883
Review time: 3.0 months
Journal description: TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest includes: Analog, Digital and Mixed Signal Circuits and Systems; Nonlinear Circuits and Systems; Integrated Sensors; MEMS and Systems on Chip; Nanoscale Circuits and Systems; Optoelectronic Circuits and Systems; Power Electronics and Systems; Software for Analog-and-Logic Circuits and Systems; Control aspects of Circuits and Systems.