{"title":"Dynamic precision configurable multiply and accumulate architecture for hardware accelerators","authors":"Saraswathy B. , Anita Angeline A.","doi":"10.1016/j.vlsi.2025.102419","DOIUrl":"10.1016/j.vlsi.2025.102419","url":null,"abstract":"<div><div>The advantages of mixed precision over fixed precision deep neural network (DNN) accelerators in terms of efficiency with negligible accuracy loss have gained research interest in precision scalable DNN accelerators. Since most computations in DNN accelerators are multiply and accumulate (MAC), designing precision scalable MAC optimized in terms of area and power is crucial. The bottom-up approach of designing precision scalable MAC, achieving higher precision by shifting and adding the results of 2b <span><math><mo>×</mo></math></span> 2b multipliers, has become a wide-spread research interest due to its maximum utilization of available resources. It demands the efficient design of 2b <span><math><mo>×</mo></math></span> 2b multipliers and fusion units. This paper proposes four reduced computation bit brick units, customized to support a specific combination of signed/unsigned 2b <span><math><mo>×</mo></math></span> 2b multiplication, optimized for area and power. Then, an addition-optimized hybrid fusion unit with reduced computation bit bricks is proposed. To prove the superiority of the proposed design, the existing and proposed designs are modelled in Verilog, synthesized using 45 nm CMOS technology and compared. The results demonstrate a 32% reduction in area, 36% reduction in power and a 41% reduction in power delay product of the proposed design compared to the state-of-the-art designs.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102419"},"PeriodicalIF":2.2,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143882265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate subtractors designed for image processing applications","authors":"P. Divya Parameswari, A.V. Ananthalakshmi","doi":"10.1016/j.vlsi.2025.102425","DOIUrl":"10.1016/j.vlsi.2025.102425","url":null,"abstract":"<div><div>Approximate circuits play a vital role in enhancing efficiency and optimizing resource use in modern computing systems. Their benefits are particularly notable in fields that tolerate minor inaccuracies, such as image processing, signal processing, and data mining, where a slight reduction in precision can lead to substantial savings in power and space requirements. This study explores an innovative design for an approximate full subtractor based on the principle of pruning, meticulously implemented using universal two-input NOR gates, valued for their cost efficiency, low power consumption, and compact design. <strong>Existing approximate subtractors have been designed using non-universal basic gates such as XOR, XNOR, NOT, and AND gates. In contrast, the proposed approach utilizes only the universal NOR gate, leading to improved circuit efficiency in terms of area, delay, and power consumption.</strong> Additionally, this work evaluates performance metrics of approximate circuits, demonstrating their effectiveness in various image processing applications involving full subtractors.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102425"},"PeriodicalIF":2.2,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low-voltage and low-power PLL for Sub-GHz IoT applications","authors":"Jen-Chieh Liu, Rui-Cheng Ai","doi":"10.1016/j.vlsi.2025.102424","DOIUrl":"10.1016/j.vlsi.2025.102424","url":null,"abstract":"<div><div>This paper presents a phase-locked loop (PLL) with a standby mode, which is suitable for sub-GHz IoT systems. The multi-band self-calibration circuit (MSCC) enables a small voltage-controlled oscillator (VCO) gain to improve jitter performance. In standby mode, power MOS transistors are utilized to achieve low leakage current and limit the power current. The PLL test chip was implemented in a 90 nm CMOS process. The chip area and core area were 641 × 978 and 204 × 531 μm<sup>2</sup>, respectively. The power consumption at 915 MHz was 246.12 μW with an operating voltage of 0.6 V. In standby mode, the power consumption of the PLL was 12.42 nW.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102424"},"PeriodicalIF":2.2,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orbital circuit elements table: A novel approach to elementary circuit elements","authors":"Omer Faruk Tozlu , Yunus Babacan , Firat Kacar","doi":"10.1016/j.vlsi.2025.102423","DOIUrl":"10.1016/j.vlsi.2025.102423","url":null,"abstract":"<div><div>Fundamental passive circuit elements are indispensable to any circuit design. Two models have been proposed in the literature to define conventional circuit elements. This paper introduces a novel model, the orbital circuit elements table, that defines these circuit elements using trigonometric functions. In particular, the relationships between the elements and their behavior under a sinusoidal input voltage are presented. The model categorizes circuit elements into linear and non-linear groups, with the first orbit encompassing four fundamental elements and the subsequent orbits representing higher-order, non-linear elements. A technique for moving from one orbit to another is also presented. With the presented model, the fundamental characteristics of unknown circuit elements can be found easily. Finally, the model provides a fundamental framework for designing emulator circuits.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102423"},"PeriodicalIF":2.2,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143834539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image encryption/decryption accelerator based on Fast Cosine Number Transform","authors":"Zhiqiang Xu, Zhenmin Li, Feng Han, Xiaolei Wang, Gaoming Du","doi":"10.1016/j.vlsi.2025.102416","DOIUrl":"10.1016/j.vlsi.2025.102416","url":null,"abstract":"<div><div>The Cosine Number Transform (CNT) and its Fast Cosine Number Transform (FCNT) are widely used in image encryption due to their modular arithmetic, which enhances computational accuracy. However, existing hardware architectures for FCNT suffer from long computation cycles and high resource consumption, making it challenging to meet the demands for fast image encryption. This paper proposes an eight-point FCNT hardware architecture with multiplier-less multiplication (MM), employing pipelining and time-division multiplexing. Based on this architecture, an image encryption hardware accelerator was implemented. Experimental results show that compared to existing methods, the proposed FCNT architecture reduces computational delay by 18.2% and decreases LUT usage by 18.5% and FF usage by 21.7%. Furthermore, compared to the current state-of-the-art, the image encryption hardware accelerator based on our FCNT architecture achieves the fastest processing speed, requiring only 0.505 ms to encrypt a single 256 × 256 grayscale image, with a throughput of 1038 Mbps.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102416"},"PeriodicalIF":2.2,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143834538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of new single-bit multilayer ALU in QCA technology","authors":"Reza Faraji, Abdalhossein Rezai","doi":"10.1016/j.vlsi.2025.102422","DOIUrl":"10.1016/j.vlsi.2025.102422","url":null,"abstract":"<div><div>Nowadays, CMOS technology faces limitations such as short-channel effects. Quantum-dot cellular automata (QCA) is a promising nanotechnology candidate to replace CMOS technology. This technology can be used for efficient digital circuit design. The ALU is a significant digital circuit that can be developed in this technology. This paper proposes a new single-bit QCA Multilayer ALU (MALU), which has 4 inputs and 3 outputs. This MALU can execute 5 operations: subtraction, addition, OR, AND, and XOR. The functionality of the proposed single-bit MALU has been verified using the QCADesigner tool. The results confirm that the proposed QCA MALU has 0.19 μm<sup>2</sup> area, 235 cells, 8.17 meV energy, 3 clock cycles delay, and 0.109 nW average power dissipation. Moreover, the comparison demonstrates that the proposed single-bit QCA MALU circuit offers advantages in terms of energy, area, latency, and cost compared with previous QCA ALU circuits.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102422"},"PeriodicalIF":2.2,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient numerical simulation method based on practical 1T1R devices measurement for compute in memory chip design","authors":"Haodong Hu , Jie Peng , Guiqing Liu , Shihao Yu , Zhongjin Zhao , Yufei Zhang , Chenxi Zhang , Zhiwei Li , Haijun Liu , Hui Xu , Yinan Wang","doi":"10.1016/j.vlsi.2025.102420","DOIUrl":"10.1016/j.vlsi.2025.102420","url":null,"abstract":"<div><div>As compute-in-memory (CIM) architectures emerge to overcome the Von Neumann bottleneck, the efficient simulation of their core one-transistor-one-resistor (1T1R) crossbar array becomes critical.</div><div>Due to the compact nonlinear voltage division, existing methods are too slow to meet the simulation speed requirements of CIM chip design. Therefore, a novel numerical algorithm called the dichotomy voltage division method (DVDM) is proposed. DVDM leverages interval bisection to bypass derivative calculations, achieving over 30 % speedup over the netlist-based Hspice method and a 10<sup>3</sup>-fold acceleration over the Matlab symbolic calculation method for DC scanning simulation. Crucially, DVDM's efficiency does not compromise fidelity to established compact models: it maintains accuracy equivalent to these established methods. Furthermore, DVDM successfully simulates multiply-accumulate operations, a cornerstone of neural network inference, demonstrating its potential to bridge device-level modeling and system-level CIM chip design. By balancing computational efficiency with model fidelity, DVDM provides a novel tool for rapid exploration of next-generation CIM systems.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102420"},"PeriodicalIF":2.2,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware efficient approximate sigmoid activation function for classifying features around zero","authors":"Shreya Venkatesh, R. Sindhu, V. Arunachalam","doi":"10.1016/j.vlsi.2025.102421","DOIUrl":"10.1016/j.vlsi.2025.102421","url":null,"abstract":"<div><div>The binary classification of features around zero in an RNN-LSTM network requires an accurate sigmoid activation. An approximate sigmoid activation function is preferred to reduce computational complexity and hardware resources. Therefore, the IMDB dataset is considered for Python-based data analysis: the features are passed through the LSTM layer, the dense layer, and finally the sigmoid activation function for binary classification. From the analysis, an approximate 3-term, 8-segment Taylor series sigmoid (<span><math><mrow><mrow><msub><mi>σ</mi><mrow><mi>T</mi><mo>_</mo><mn>3</mn><mo>_</mo><mn>8</mn></mrow></msub><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow><mo>)</mo></mrow></math></span> is proposed with an 11-bit customized floating-point (CFP) format that provides sufficient accuracy. The <span><math><mrow><msub><mi>σ</mi><mrow><mi>T</mi><mo>_</mo><mn>3</mn><mo>_</mo><mn>8</mn></mrow></msub><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span> is implemented with an efficient range select controller, a data scheduler and an area-efficient arithmetic processing unit (APU). The APU is implemented with a CFP multiplier (CFP-Mul) and an Exponent-aware CFP adder (EACFP-Add). Therefore, the FPGA implementation uses fewer hardware resources (LUT, FF and DSP), and the TSMC 65 nm ASIC implementation occupies 1658 <strong><em>μ</em></strong>m<sup>2</sup> and consumes 0.3305 mW at 500 MHz. The proposed function <span><math><mrow><msub><mi>σ</mi><mrow><mi>T</mi><mo>_</mo><mn>3</mn><mo>_</mo><mn>8</mn></mrow></msub><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span> is used in the LSTM cell and the classification layer. With the IMDB and SMS spam detection datasets, it achieves classification metrics close to those of the exact <em>σ</em>(<em>x</em>).</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102421"},"PeriodicalIF":2.2,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143823230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of 1T2R ReRAM array for in memory element-wise multiplication with distributed and majority logics","authors":"Ancy Joy , Jinsa Kuruvilla","doi":"10.1016/j.vlsi.2025.102418","DOIUrl":"10.1016/j.vlsi.2025.102418","url":null,"abstract":"<div><div>Multiplication is regarded as an essential component for inner product computation in Digital Signal Processing (DSP) applications. The frequent data transfers between the CPU and memory cause delays in traditional computer systems. In-memory computation (IMC) lowers latency by processing data at the location of storage. Because of its unique properties, Resistive Random-Access Memory (ReRAM) is a state-of-the-art non-volatile memory technology with many potential applications in IMC. However, the conventional One Transistor One Resistor (1T1R) ReRAM bit-cell has a high bit-error rate and slower sensing during Non-Volatile Memory (NVM) storage. This study considers an in-memory element-wise multiplication using a One Transistor Two Resistor (1T2R) ReRAM array. The adder-shifter module is the fundamental building block of element-wise multiplication. The majority logic is implemented in the proposed design as a memory READ operation. Additionally, distributed logics utilize look-up tables and adder-shifters to calculate inner products rather than multipliers and adders. The design also addresses the high latency and intricate design associated with the traditional multiply and accumulate (MAC) method of implementing element-wise multiplication. According to the simulated results, the 1T2R cell uses less power and has a shorter write delay than the traditional 2T2R cell. Moreover, the sensing margin of the 1T2R bit cell is 1.2 times larger than that of the 1T1R cell. In addition, compared to 1T1R and 2T2R cell-based designs, the proposed 1T2R design achieves 7.21 % and 48.24 % energy savings in element-wise multiplication.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102418"},"PeriodicalIF":2.2,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D-multiscroll chaotic attractors design, circuit implementation and application to medical image encryption","authors":"Jie Zhang, Jiangang Zuo","doi":"10.1016/j.vlsi.2025.102417","DOIUrl":"10.1016/j.vlsi.2025.102417","url":null,"abstract":"<div><div>Compared to traditional chaotic attractors, multiscroll chaotic attractors (MSCAs) have broad application potential in fields such as dynamical system research and information processing. This paper introduces 1D-MSCAs, 2D-MSCAs, and 3D-MSCAs based on the Sprott-A system by incorporating piecewise linear functions. Through a dynamical analysis using equilibrium points, Lyapunov exponents, and bifurcation diagrams, it is found that the MSCAs have no equilibrium points and possess hidden attractors. As the parameters vary, the system exhibits a rich variety of dynamical behaviors, including reverse multiplicative cycle bifurcation, transient chaos, and bursting chaos. Additionally, the system demonstrates initial offset behavior in response to changes in initial conditions. The MSCAs are validated through an analog circuit implementation. Furthermore, a novel cryptosystem is designed by integrating 3D-MSCAs with RNA operations, and its security performance is evaluated in terms of key sensitivity, histogram analysis, correlation, and information entropy. The analysis results indicate that the proposed cryptosystem offers high security performance, providing a promising solution for medical image encryption.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"103 ","pages":"Article 102417"},"PeriodicalIF":2.2,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143786121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}