An efficient architecture of truncated booth multiplier for AI application

IF 2.5 · Zone 3, Engineering Technology · Q3, Computer Science, Hardware & Architecture
Shareefa Fairoose P., Ashutosh Mishra
Journal: Integration-The Vlsi Journal, volume 106, Article 102544
Publication date: 2025-09-16
Publication type: Journal Article
DOI: 10.1016/j.vlsi.2025.102544
URL: https://www.sciencedirect.com/science/article/pii/S0167926025002019
Citations: 0

Abstract

An efficient architecture of truncated booth multiplier for AI application
This paper presents truncated approximate Booth multipliers (TR-ABMs) that are both energy- and area-efficient, leveraging novel Leading One/Zero Position Detectors (LOZPDs) and optimized approximate Booth multiplier (ABM) architectures. The proposed LOZPDs enable substantial reductions in Area-Delay Product (ADP) and Power-Delay Product (PDP) relative to existing techniques. Two architectures are introduced: TR-ABM1 integrates LOZPD-based operand truncation with a conventional Booth multiplier, while TR-ABM2 employs an approximate Booth multiplier variant for further efficiency gains. The level of approximation is tunable through key design parameters, including multiplier width (w) and the number of partial product columns utilizing Approximate Partial Product Generators (APGs) and Approximate Compressors (ACs). Comprehensive error analysis is conducted via Monte Carlo simulations with 10 million random inputs, and the designs are synthesized using Cadence® Genus in 90 nm CMOS technology. The TR-ABMs are evaluated in neural network (NN) inference and 64-point Fast Fourier Transform (FFT64) applications. For MNIST handwritten digit classification, the TR-ABMs achieve accuracy on par with exact fixed-point Booth multipliers. In FFT64, the proposed designs deliver significant area and power savings over state-of-the-art approximate multipliers. Specifically, the TR-ABMs achieve 66.58%–75.59% reductions in ADP and 47.91%–60.94% reductions in PDP, while maintaining reliable computational accuracy. Overall, the TR-ABMs offer a superior accuracy-performance trade-off compared to prior approximate multipliers, making them highly suitable for energy-efficient artificial intelligence and signal processing applications.
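The abstract gives no circuit-level detail, so the following is only a rough software sketch of the general idea of leading-one-based operand truncation followed by a Monte Carlo error estimate. It is not the authors' LOZPD or TR-ABM design. It assumes unsigned operands (the paper's detectors also handle leading zeros, presumably for signed values), uses an exact integer product in place of a Booth multiplier, and all function names (`leading_one_position`, `truncate_operand`, `approx_multiply`, `mean_relative_error`) are hypothetical.

```python
import random


def leading_one_position(x: int) -> int:
    """Return the 0-based index of the most significant set bit of x (x > 0)."""
    return x.bit_length() - 1


def truncate_operand(x: int, w: int) -> tuple[int, int]:
    """Keep the w bits starting at the leading one.

    Returns (mantissa, shift) such that mantissa << shift approximates x
    from below. Assumption: unsigned operands only.
    """
    if x == 0:
        return 0, 0
    shift = max(leading_one_position(x) - (w - 1), 0)
    return x >> shift, shift


def approx_multiply(a: int, b: int, w: int) -> int:
    """Multiply two truncated w-bit mantissas, then shift the product back."""
    ma, sa = truncate_operand(a, w)
    mb, sb = truncate_operand(b, w)
    # The narrow product stands in for the w-bit multiplier core.
    return (ma * mb) << (sa + sb)


def mean_relative_error(n_bits: int, w: int, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean relative error over random nonzero inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = rng.randrange(1, 1 << n_bits)
        b = rng.randrange(1, 1 << n_bits)
        exact = a * b
        total += (exact - approx_multiply(a, b, w)) / exact
    return total / samples


if __name__ == "__main__":
    # Sample count kept small for a quick run; the paper reports 10 million inputs.
    for width in (4, 6, 8):
        print(width, mean_relative_error(n_bits=16, w=width, samples=100_000))
```

In this simplified scheme each operand loses less than a fraction 2^-(w-1) of its value, so the relative error of the product stays below 2^-(w-2). Widening `w` trades a larger multiplier core for lower error, which mirrors the tunable approximation level the abstract describes.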
Source journal
Integration-The Vlsi Journal (Engineering Technology: Engineering, Electrical & Electronic)
CiteScore: 3.80
Self-citation rate: 5.30%
Articles published: 107
Review time: 6 months
Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.