A Cost and Speed Co-Optimized Parallel Stochastic Multiplier for Binary Inputs Supporting Variable Bit-Widths
Authors: Qiang He; Yudi Zhao; Zhihuai Zhang; Gang Du; Xiaofei Nie; Ye Lu; Shisheng Xiong; Kai Zhao
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 8, pp. 1068-1072
Published: 2025-04-18
DOI: 10.1109/TCSII.2025.3562199
URL: https://ieeexplore.ieee.org/document/10970099/
Stochastic circuits offer the benefits of small area and low power consumption. However, as the bit width of the operands increases, the area and latency of stochastic circuits must grow exponentially to meet the precision requirements, which degrades performance. This brief introduces a low-cost, high-speed parallel approximate stochastic computing multiplier (PASCM) that takes binary streams as both inputs and outputs. The PASCM is suitable for multiplication at multiple bit widths. To further improve the accuracy of the PASCM, an error compensation mechanism is proposed. The PASCM was validated on an FPGA. The experimental results indicate that the proposed design has significant area and latency advantages over existing multipliers. Taking the 8-bit case as an example, the PASCM shows 48.33%, 18.61%, 45.74%, and 57.95% reductions in Look-Up Table (LUT) usage, latency, power delay product (PDP), and area delay product (ADP), respectively, compared to an 8-bit precise binary multiplier implemented using an IP core. To further validate the design, the PASCM was built into multiply-accumulate (MAC) units and applied to several image processing algorithms on the FPGA. The proposed multiplier achieved excellent peak signal-to-noise ratio (PSNR) and mean structural similarity index (MSSIM) results, with some algorithms matching the binary computation results exactly, and its hardware performance also surpasses that of state-of-the-art designs.
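To make the exponential-latency point concrete, here is a minimal sketch of conventional serial unipolar stochastic multiplication, where two bitstreams are ANDed and the ones are counted. This is the textbook baseline, not the PASCM architecture, whose internals the abstract does not describe. The function names `to_stream` and `stochastic_multiply` and the seeding scheme are assumptions made for illustration.

```python
import random


def to_stream(value, n_bits, rng):
    """Encode an unsigned n-bit integer as a unipolar bitstream of length 2**n_bits.

    Each bit is 1 with probability value / 2**n_bits (comparator-style generator).
    """
    length = 1 << n_bits
    return [1 if rng.randrange(length) < value else 0 for _ in range(length)]


def stochastic_multiply(a, b, n_bits, seed=0):
    """Estimate a * b by ANDing two independent bitstreams and counting ones.

    Hypothetical helper for illustration; not the PASCM design.
    """
    length = 1 << n_bits
    stream_a = to_stream(a, n_bits, random.Random(seed))
    stream_b = to_stream(b, n_bits, random.Random(seed + 1))
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    # ones / length approximates (a / length) * (b / length),
    # so a * b is approximately ones * length.
    return ones * length


if __name__ == "__main__":
    n_bits = 8
    a, b = 128, 128
    print("exact product:", a * b)
    print("stochastic estimate:", stochastic_multiply(a, b, n_bits))
    print("stream length (cycles if processed serially):", 1 << n_bits)
```

Each added operand bit doubles the stream length, so a serial design of this kind needs 2^n cycles for n-bit operands, and the result is only an estimate whose error depends on the random streams. That is the cost the abstract refers to when it says area and latency grow exponentially with bit width.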
About the journal:
TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.