An efficient architecture of truncated booth multiplier for AI application

IF 2.5, CAS Zone 3 (Engineering & Technology), JCR Q3 (Computer Science, Hardware & Architecture)

Integration-The Vlsi Journal, Pub Date: 2025-09-16, DOI: 10.1016/j.vlsi.2025.102544

Shareefa Fairoose P., Ashutosh Mishra

Integration-The Vlsi Journal, volume 106, Article 102544 (Journal Article). https://www.sciencedirect.com/science/article/pii/S0167926025002019

Citations: 0

Abstract

This paper presents truncated approximate Booth multipliers (TR-ABMs) that are both energy- and area-efficient, leveraging novel Leading One/Zero Position Detectors (LOZPDs) and optimized approximate Booth multiplier (ABM) architectures. The proposed LOZPDs enable substantial reductions in Area-Delay Product (ADP) and Power-Delay Product (PDP) relative to existing techniques. Two architectures are introduced: TR-ABM1 integrates LOZPD-based operand truncation with a conventional Booth multiplier, while TR-ABM2 employs an approximate Booth multiplier variant for further efficiency gains. The level of approximation is tunable through key design parameters, including multiplier width (w) and the number of partial product columns utilizing Approximate Partial Product Generators (APGs) and Approximate Compressors (ACs). Comprehensive error analysis is conducted via Monte Carlo simulations with 10 million random inputs, and the designs are synthesized using Cadence® Genus in 90 nm CMOS technology. The TR-ABMs are evaluated in neural network (NN) inference and 64-point Fast Fourier Transform (FFT64) applications. For MNIST handwritten digit classification, the TR-ABMs achieve accuracy on par with exact fixed-point Booth multipliers. In FFT64, the proposed designs deliver significant area and power savings over state-of-the-art approximate multipliers. Specifically, the TR-ABMs achieve 66.58%–75.59% reductions in ADP and 47.91%–60.94% reductions in PDP, while maintaining reliable computational accuracy. Overall, the TR-ABMs offer a superior accuracy-performance trade-off compared to prior approximate multipliers, making them highly suitable for energy-efficient artificial intelligence and signal processing applications.
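The abstract does not spell out how the LOZPD-based truncation works, so the following is only a behavioral sketch of the general idea, under stated assumptions: each signed operand is cut down to the w bits that start at its leading one (for non-negative values) or leading zero (for negative two's-complement values), the truncated operands are multiplied exactly, and the mean relative error is estimated by Monte Carlo sampling. The function names, the 16-bit operand range, and the sample count are hypothetical choices for illustration. The sketch does not model the paper's circuits, its Booth encoding, or its APG/AC stages.

```python
import random


def leading_position(x: int) -> int:
    """Bit index of the leading one (x >= 0) or leading zero (x < 0).

    Returns -1 for 0 and for -1, which have no such bit.
    """
    return (x if x >= 0 else ~x).bit_length() - 1


def truncate_operand(x: int, w: int) -> int:
    """Keep the w bits starting at the leading position; zero all lower bits."""
    shift = max(0, leading_position(x) - w + 1)
    # Python's >> is an arithmetic shift, so this also works for negative x.
    return (x >> shift) << shift


def approx_multiply(a: int, b: int, w: int) -> int:
    """Exact product of the two truncated operands (assumed behavioral model)."""
    return truncate_operand(a, w) * truncate_operand(b, w)


def mean_relative_error(w: int, bits: int = 16, samples: int = 100_000,
                        seed: int = 0) -> float:
    """Monte Carlo estimate of the mean relative error for width w."""
    rng = random.Random(seed)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    total, counted = 0.0, 0
    for _ in range(samples):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        exact = a * b
        if exact == 0:
            continue  # relative error is undefined for a zero product
        total += abs(approx_multiply(a, b, w) - exact) / abs(exact)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    for width in (4, 6, 8):
        print(width, mean_relative_error(width))
```

In this model a smaller w discards more low-order bits and so trades accuracy for a narrower multiplier, which is the tunable trade-off the abstract describes. The abstract reports 10 million random inputs; the default sample count here is smaller only to keep the run short.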

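Source journal

Integration-The Vlsi Journal (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.80
Self-citation rate: 5.30%
Articles published: 107
Review time: 6 months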

Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.