{"title":"A 28 nm 16-kb Sign-Extension-Less Digital-Compute-in-Memory Macro With Extension-Friendly Compute Units and Accuracy-Adjustable Adder-Tree","authors":"Xin Si;Fangyuan Dong;Shengnan He;Yuhui Shi;Anran Yin;Hui Gao;Xiang Li","doi":"10.1109/TVLSI.2024.3418888","DOIUrl":"10.1109/TVLSI.2024.3418888","url":null,"abstract":"Conventional digital-domain SRAM compute-in-memory (CIM) faces challenges in handling multiply-and-accumulate (MAC) operations with signed values, whether through a serial data feeding mode or through extra sign-bit processing. The proposed CIM macro has the following features: 1) a sign-extension-less array multiplication circuit structure that eliminates the need to convert partial sums into 2’s complement, which removes the constraints related to handling specific sign bits; 2) a circuit that avoids sign-bit extension during shift-and-accumulate, resulting in reduced area cost; and 3) an adder structure that provides adjustable accuracy, thereby enhancing network adaptability as compared to traditional approximation techniques. A fabricated 28 nm 16-kb sign-extension-less DCIM achieved the highest MAC speed of 5.6 ns (signed 8 b IN&W, 23 b Out) and the best energy efficiency of 40.15 TOPS/W over a wide range of network adaptability.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A μ-GA Oriented ANN-Driven: Parameter Extraction of 5G CMOS Power Amplifier","authors":"Tahesin Samira Delwar;Abrar Siddique;Unal Aras;Yangwon Lee;Jee Youl Ryu","doi":"10.1109/TVLSI.2024.3414584","DOIUrl":"10.1109/TVLSI.2024.3414584","url":null,"abstract":"This article introduces a novel method for extracting crucial parameters from a fifth-generation (5G) CMOS power amplifier (PA) operating at 24 GHz. The proposed method, micro-genetic algorithm artificial neural network (μ-GAANN), presents an innovative synergy between μ-GA and ANN, enabling the accurate determination of crucial PA (circuit components) parameters. The μ-GAANN model has a fixed and robust stimulation function ($F_{\\text{SF}}$ and $R_{\\text{SF}}$). ANNs are trained to approximate the parameter extraction process based on input-output data generated from the μ-GA. The proposed μ-GA incorporates the arithmetic crossover and nonuniform mutation; thus, several parameters of the ANN network are tweaked. Moreover, ANN parameters are enhanced by using μ-GA to achieve an optimal PA design in a shorter period of time. To verify the proposed μ-GAANN, we have also compared the training time with particle swarm optimization (PSO) employed in ANN, i.e., PSOANN. Besides, a derivative superposition (DS) linearization technique is used in the PA circuit, along with input load splits (I-LSs) to solve the low input impedance problem of conventional DS. To design a PA, the proposed μ-GAANN outperforms the traditional feedforward artificial neural networks (TFFANN). Using μ-GAANN, the PA’s simulated S21 is 25 dB, while the measured S21 is 21.2 dB. With traditional TFFANN, we observe a simulated gain of 24.1 dB for the PA. We achieved a simulated gain of 23.2 dB of the PA without using ANNs. The measured results of the $P_{\\text{sat}}$ and PAE of the PA with μ-GAANN are 9.8 dBm and 32.1%, respectively. Also, a measured PA achieves a high third-order-input-intercept point (IIP3) of 14.1 dBm. The core chip area of the PA is 0.35 mm².","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Efficient Asynchronous Circuits Design Flow Using Backward Delay Propagation Constraint","authors":"Lingfeng Zhou;Shanlin Xiao;Huiyao Wang;Jinghai Wang;Zeyang Xu;Bohan Wang;Zhiyi Yu","doi":"10.1109/TVLSI.2024.3418769","DOIUrl":"10.1109/TVLSI.2024.3418769","url":null,"abstract":"In recent years, asynchronous circuits have gained attention in neural network chips and the Internet of Things (IoT) due to their potential advantages of low power and high performance. However, the design efficiency of asynchronous circuits remains low and faces challenges in large-scale applications because of the lack of electronic design automation (EDA) support. This article presents a new bundled-data (BD) asynchronous circuit design flow using traditional EDA tools, including a new backward delay propagation constraint (BDPC) method. In this method, control paths and data paths are analyzed together in a tightly coupled approach to improve the accuracy of static timing analysis (STA). Compared with other design flows, the proposed design flow and constraint method show significant advantages in STA accuracy, design efficiency, and design applicability, and resolve the field-programmable gate array (FPGA) congestion issues of a previous work. An asynchronous RISC-V processor was implemented to verify the method, with selective handshake technology to further reduce power. Compared with the synchronous processor, the asynchronous processor achieves a 17.4% power reduction on the TSMC 65-nm process and a 48.3% dynamic power saving on the FPGA while maintaining the same frequency and resource utilization.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Neural Fields Accelerator Design for a Millimeter-Scale Tracking System","authors":"Yuyang Li;Vijay Shankaran Vivekanand;Rajkumar Kubendran;Inhee Lee","doi":"10.1109/TVLSI.2024.3416725","DOIUrl":"10.1109/TVLSI.2024.3416725","url":null,"abstract":"This brief introduces a compact-size hardware accelerator for dynamic neural fields (DNF) used in object tracking. To address the substantial computational workload and memory occupancy associated with conventional DNFs, three key approaches are implemented: kernel size reduction and abstraction, the replacement of sigmoidal functions with comparison operations, and the approximation of rectangular-shaped objects. The design is realized in a 28-nm CMOS process, resulting in a layout with an area of 0.53 mm². Simulation results demonstrate that the accelerator processes 256 × 256 dynamic vision sensor (DVS) frames at 211 frames per second (fps), with a power consumption of 1.68 mW under such conditions.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Zero-Current Detector for Single-Inductor Multiple-Output DC-DC Converter With Full-Wave Current Sensor","authors":"Yichen Zhang;Chaowei Tang;Yanqi Zheng;Xian Tang;Ka Nang Leung","doi":"10.1109/TVLSI.2024.3415475","DOIUrl":"10.1109/TVLSI.2024.3415475","url":null,"abstract":"This brief presents an adaptive zero-current detector (ZCD) for the single-inductor multiple-output (SIMO) DC-DC converter with a full-wave current sensor. The adaptive ZCD, which can be applied to the order power distribution control (OPDC) SIMO DC-DC converter, accurately turns off the low-side power switch when the SIMO DC-DC converter operates in the discontinuous conduction mode. In addition, a new full-wave current sensor that contains only one sensing transistor is presented; it precisely senses the inductor current with a small delay. The SIMO DC-DC converter is designed and fabricated in a standard 65 nm CMOS process with output power ranging from 3.7 to 925 mW. The measured reverse current is reduced by up to 78.2%, and the measured light-load power efficiency is improved by up to 10%.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protecting Parallel Data Encryption in Multi-Tenant FPGAs by Exploring Simple but Effective Clocking Methodologies","authors":"Yankun Zhu;Pingqiang Zhou","doi":"10.1109/TVLSI.2024.3418961","DOIUrl":"10.1109/TVLSI.2024.3418961","url":null,"abstract":"Capitalizing on their versatility and high-performance attributes within heterogeneous designs, an increasing number of field-programmable gate arrays (FPGAs) are integrated into cloud data centers by cloud service providers (CSPs). While CSPs intend to reduce the cost by sharing one board among multiple users (called multi-tenant FPGA), hardware security problems such as side-channel attacks restrict it from spreading commercially. Existing research works have underscored the feasibility of remote side-channel attacks targeting a single advanced encryption standard (AES) module on multi-tenant FPGAs, but they have not looked into the scenario of parallel data encryption on multiple AES modules for a single tenant, which is possible due to the small resource consumption of one AES module. In this work, we scrutinize correlation power analysis (CPA)-based side-channel attacks on parallel data encryption modules and develop two simple yet effective protective methods based on clocking methodologies: clocking phase shift and small frequency shift. The former technique applies an identical clock frequency but distinct clock phases to the parallel encryption modules, while the latter applies slightly different clock frequencies to the parallel encryption modules. Experimental results show that both methods can effectively increase the minimum number of power traces required for a successful CPA, thus instituting a natural protective barrier for parallel data encryption.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thermally Constrained Codesign of Heterogeneous 3-D Integration of Compute-in-Memory, Digital ML Accelerator, and RISC-V Cores for Mixed ML and Non-ML Workloads","authors":"Yuan-Chun Luo;Anni Lu;Janak Sharda;Moritz Scherer;Jorge Tomas Gomez;Syed Shakib Sarwar;Ziyun Li;Reid Frederick Pinkham;Barbara De Salvo;Shimeng Yu","doi":"10.1109/TVLSI.2024.3415481","DOIUrl":"10.1109/TVLSI.2024.3415481","url":null,"abstract":"Heterogeneous 3-D (H3D) integration not only reduces the chip form factor and fabrication cost but also allows the merging of diverse compute paradigms that suit different applications. This is especially attractive when modern algorithms, such as the augmented reality/virtual reality (AR/VR) workloads, consist of mixed machine learning (ML) and non-ML workloads. To date, codesign that considers the thermal, latency, and power constraints of H3D hardware is largely unexplored. In this work, a thermally aware framework for H3D hardware design is developed to evaluate the thermal, latency, and power trade-offs for a heterogeneous system with compute-in-memory (CIM), digital ML cores, and RISC-V cores. The framework solves for runtime tunable operating points described as the optimal speedup factor, the number of activated RISC-V cores, the cooling coefficient, and the activity rate based on user-defined criteria, achieving up to 135 TOPS and 215 TOPS/W under 74 °C for the AR/VR workloads.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area Efficient 0.009-mm2 28.1-ppm/°C 11.3-MHz ALL-MOS Relaxation Oscillator","authors":"Joshua Adiel Wijaya;Poki Chen;Lucky Kumar Pradhan;Ahmad Shahid Bhatti;Seiji Kajihara","doi":"10.1109/TVLSI.2024.3416992","DOIUrl":"10.1109/TVLSI.2024.3416992","url":null,"abstract":"This article presents an ultrasmall area on-chip relaxation oscillator with low-temperature sensitivity. In this design, a virtual resistor mainly composed of a complementary to absolute temperature (CTAT) voltage reference circuit is implemented to replace the real resistor for efficient temperature compensation, which counterbalances the inherent proportional to absolute temperature (PTAT) property of the original relaxation circuit of the oscillator. The conventional capacitor is also replaced with a MOS capacitor to complete the ALL-MOS oscillator circuit with two prime advantages, one of which is larger capacitance to area density, and the other is better matching with critical MOSFETs. Implemented in a 0.18-μm TSMC standard CMOS process, the proposed relaxation oscillator has achieved a temperature coefficient of 28.17 ppm/°C over the temperature range from −25 °C to +125 °C at 11.39-MHz oscillation frequency. This circuit consumes 243.1 μW under 1.3-V power supply. Along with the abovementioned performance, the oscillator achieves an ultrasmall core chip area of 0.009 mm², which is almost one order of magnitude smaller than most prior designs in the same process.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2024.3410460","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3410460","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10576046","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141474803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2024.3410462","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3410462","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10576058","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141474820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}