Ahmed M. Mohey;Jelin Leslin;Gaurav Singh;Marko Kosunen;Jussi Ryynänen;Martin Andraud
{"title":"A 22-nm All-Digital Time-Domain Neural Network Accelerator for Precision In-Sensor Processing","authors":"Ahmed M. Mohey;Jelin Leslin;Gaurav Singh;Marko Kosunen;Jussi Ryynänen;Martin Andraud","doi":"10.1109/TVLSI.2024.3496090","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3496090","url":null,"abstract":"Deep neural network (DNN) accelerators are increasingly integrated into sensing applications, such as wearables and sensor networks, to provide advanced in-sensor processing capabilities. Given wearables’ strict size and power requirements, minimizing the area and energy consumption of DNN accelerators is a critical concern. In that regard, computing DNN models in the time domain is a promising architecture, taking advantage of both technology scaling friendliness and efficiency. Yet, time-domain accelerators are typically not fully digital, limiting the full benefits of time-domain computation. In this work, we propose an all-digital time-domain accelerator with a small size and low energy consumption to target precision in-sensor processing like human activity recognition (HAR). The proposed accelerator features a simple and efficient architecture without dependencies on analog nonidealities such as leakage and charge errors. An eight-neuron layer (core computation layer) is implemented in 22-nm FD-SOI technology. The layer occupies 70 × 70 μm while supporting multibit inputs (8-bit) and weights (8-bit) with signed accumulation up to 18 bits. 
The power dissipation of the computation layer is 576 μW at 0.72-V supply and 500-MHz clock frequency achieving an average area efficiency of 24.74 GOPS/mm² (up to 544.22 GOPS/mm²), an average energy efficiency of 0.21 TOPS/W (up to 4.63 TOPS/W), and a normalized energy efficiency of 13.46 1b-TOPS/W (up to 296.30 1b-TOPS/W).","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2220-2231"},"PeriodicalIF":2.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuguo Xiang;Dayan Zhou;Minjia Song;Danfeng Zhai;Jingchao Lan;Junyan Ren;Fan Ye
{"title":"A Comprehensive Digital Calibration for Pipelined ADCs Using Cascaded Nonlinearity Correction","authors":"Yuguo Xiang;Dayan Zhou;Minjia Song;Danfeng Zhai;Jingchao Lan;Junyan Ren;Fan Ye","doi":"10.1109/TVLSI.2024.3496669","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3496669","url":null,"abstract":"This brief presents a digital calibration for pipelined analog-to-digital converters (ADCs) utilizing the cascaded nonlinearity correction (CNC) method. By cascading three correction layers for compensating nonlinearities in different parts of pipelined ADC, it comprehensively calibrates distortion in both ADC front end and back end with a low hardware cost. In addition, this work employs a discriminative fine-tuning least-mean-square (DFT-LMS) algorithm with varying step sizes for different layers, thereby improving both the convergence speed and the accuracy. An 800-MS/s, 12-bit ring amplifier-based pipelined ADC is presented to verify the proposed calibration technique. With calibration, the SFDR has a 26.7-dB improvement at low frequency and 23.6-dB improvement at Nyquist frequency, resulting in over 6-dB improvement compared with prior-art calibration techniques. The calibration algorithm has been verified on a TSMC 28-nm CMOS process. 
The experimental results show that the proposed ADC calibrator has an area of 6592 μm² and consumes 5.31 mW at 800-MHz clock rate.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1192-1196"},"PeriodicalIF":2.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Gagliardi;Danilo Scintu;Massimo Piotto;Paolo Bruschi;Michele Dei
{"title":"Static-Linearity Enhancement Techniques for Digital-to-Analog Converters Exploiting Optimal Arrangements of Unit Elements","authors":"Francesco Gagliardi;Danilo Scintu;Massimo Piotto;Paolo Bruschi;Michele Dei","doi":"10.1109/TVLSI.2024.3495558","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3495558","url":null,"abstract":"Driven by the ongoing challenge of designing high-accuracy digital-to-analog converters (DACs) at the cost of a relatively small area occupation, optimal combination algorithms (OCAs) recently gained attention within the myriad of possible calibration techniques for DACs. OCAs show appealing properties with respect to traditional approaches such as dynamic element matching (DEM). At start-up or upon request, mismatches affecting DAC elements are measured on-chip, allowing rearrangement in the selection logic of the DAC unit elements. The newly found arrangement is, hence, used during normal operation, achieving superior linearity. As of today, several alternative OCAs have been proposed; however, designers willing to implement OCA-calibrated DACs are faced with unclear tradeoffs and insufficient design guidelines. In this work, we provide a detailed comparison of existing OCAs based on statistical behavioral simulations. Starting from this, we investigate the relationships between OCAs’ performances and circuit-level design aspects. Specifically, OCAs’ effectiveness in improving the static linearity is linked to the number of DAC bits and the accuracy of the auxiliary comparator required by every OCA. 
Unforeseen trends emerge, and new design considerations are suggested, fostering novel awareness on the subject of high-accuracy DAC designs enabled by OCA-based calibration techniques.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2243-2256"},"PeriodicalIF":2.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10756519","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMBHA: A System-Level Multicore BGV Hardware Accelerator Based on FPGA","authors":"Jia-Li Duan;Chi Zhang;Li-Hui Wang;Lei Shen","doi":"10.1109/TVLSI.2024.3480997","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3480997","url":null,"abstract":"Fully homomorphic encryption (FHE) enables calculations on encrypted data and is a crucial foundation for achieving privacy computing. However, the high computation overhead restricts its widespread application. Even after algorithm and software optimization, its processing speed remains low. This article proposes the first practical system-level multicore Brakerski-Gentry-Vaikuntanathan (BGV) hardware acceleration scheme based on field-programmable gate array (FPGA). By analyzing the bottleneck of system acceleration, a hierarchical storage structure is introduced to reduce data movement. A novel 4-2 mixed-radix number theoretic transform (NTT) algorithm is proposed, allowing flexible switching between radix-4 and radix-2, with the ability to reuse twiddle factors. In addition, a reconfigurable processing element (PE) is proposed that supports all homomorphic operations of BGV. The design of this article is evaluated on Xilinx Virtex7 series FPGA, achieving a throughput of NTT/inverse NTT (INTT) up to 14× higher than previous designs. 
Compared with simple encrypted arithmetic library (SEAL), the full system performances of homomorphic encryption (ENC), decryption (DEC), and homomorphic multiplication achieve improvements of 13.9×, 7.07×, and 16.6×, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"546-557"},"PeriodicalIF":2.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yechen Tian;Yutong Zhang;Junjie Gu;Hao Xu;Weitian Liu;Rui Yin;Zongming Duan;Hao Gao;Na Yan
{"title":"Design and Analysis of a 26–32-GHz 6-bit Passive Vector Modulation Phase Shifter for CMOS Bidirectional Transceiver","authors":"Yechen Tian;Yutong Zhang;Junjie Gu;Hao Xu;Weitian Liu;Rui Yin;Zongming Duan;Hao Gao;Na Yan","doi":"10.1109/TVLSI.2024.3490618","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3490618","url":null,"abstract":"This article presents a 26–32-GHz 6-bit bidirectional passive vector modulation phase shifter (PVM-PS) in 40-nm CMOS for phased array systems. The passive phase shifter comprises a center-tap transformer-based quadrature generator/combiner, two 6-bit X-type attenuators, and a differential Wilkinson power combiner/divider. The symmetric design enables bidirectional signal propagation and offers flexible system configuration. Passive switches are sized to optimize the tradeoff among gain variation, insertion loss, and linearity. The phase shifter implemented in 40 nm covers a range of 360° with 5.625° resolution and the rms phase error is between 0.4° and 1.3°. It exhibits <1-dB magnitude imbalance and <1.2° phase imbalance between forward and reverse propagation modes. Its OP1dB is above −1 dBm across the operation frequency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"673-684"},"PeriodicalIF":2.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Ding;Fuming Liu;Kuan Deng;Zihan Zheng;Jingnan Zheng;Yongzhen Chen;Jiangfeng Wu
{"title":"A 16-bit 1-MS/s SAR ADC With Capacitor Mismatch Self-Calibration","authors":"Jie Ding;Fuming Liu;Kuan Deng;Zihan Zheng;Jingnan Zheng;Yongzhen Chen;Jiangfeng Wu","doi":"10.1109/TVLSI.2024.3489231","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3489231","url":null,"abstract":"This article introduces a successive approximation register (SAR) analog-to-digital converter (ADC) that utilizes a foreground capacitor mismatch self-calibration method. The proposed floating operation puts the uncalibrated high-bit capacitor into the floating state, preventing the sub-ADC from saturating caused by comparator static offset during the calibration process. To address the random mismatch of the LSB capacitors and improve the calibration accuracy, this article employs round-robin grouping of eight sets of LSB capacitors. In addition, a precharged bootstrapped switch is proposed to achieve high sampling linearity with low power consumption and area overhead. An anti-interference custom-designed 0.5-fF capacitor structure is suggested for binary-weighted capacitor mismatch of capacitive DAC (CDAC). Furthermore, the circuit implementation of the comparator utilized by ADC is also discussed. The prototype was fabricated in a 180-nm CMOS process with a 1.8-V supply and achieved spurious-free dynamic ranges of 108.9 and 92.38 dB at an input frequency of 1 kHz while operating at sampling rates of 100 kS/s and 1 MS/s, respectively. 
The prototype consumes 6.745 mW and occupies 0.91 mm².","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"10-20"},"PeriodicalIF":2.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 360° Tunable Phase Shifter With Low Phase Error Based on Bandpass Networks in 0.25-μm GaN Technology","authors":"Hanjun Zhao;Xu Yan;Hui Chu;Xiaohua Zhu;Yongxin Guo","doi":"10.1109/TVLSI.2024.3489355","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3489355","url":null,"abstract":"This brief presents a 360° tunable phase shifter (PS) with low phase error in a 0.25-μm GaN-on-SiC HEMT process. To achieve these features, the design incorporates two key innovations: a novel switched-bandpass phase-shifting cell (PSC) topology and a Q-learning-based optimization algorithm, both applied for the first time in monolithic microwave integrated circuit (MMIC) PS designs. The adverse effects of the charge trapping effect in GaN HEMT switches are mitigated by using a nonlinear equivalent circuit model. A PS prototype consisting of a fifth-order bandpass PSC and two third-order bandpass PSCs with a core area of 1.25 × 2.5 mm² is designed, fabricated, and measured. 
Experimental results demonstrate a low rms phase error of less than 7.0°, along with high power linearity characterized by an IP1dB of 37 dBm and an IIP3 of 48 dBm, over a frequency range from 4.1 to 5.3 GHz.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1172-1176"},"PeriodicalIF":2.8,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Alignment and Addition in Multiterm Floating-Point Adders","authors":"Kosmas Alexandridis;Giorgos Dimitrakopoulos","doi":"10.1109/TVLSI.2024.3488966","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3488966","url":null,"abstract":"Multiterm floating-point (FP) addition appears in vector dot-product computations, matrix multiplications, and other forms of FP data aggregation. A critical step in multiterm floating-point addition is the alignment of fractions of the FP terms before adding them. Alignment is executed serially by identifying first the maximum of all exponents and then shifting the fraction of each term according to the difference of its exponent from the maximum one. Contrary to common practice, this work proposes a new online algorithm that splits the identification of the maximum exponent, the alignment shift for each fraction, and their addition to multiple fused incremental steps that can be computed in parallel. Each fused step is implemented by a new associative operator that allows the incremental alignment and addition for arbitrary number of operands. Experimental results show that employing the proposed align-and-add operators for the implementation of multiterm floating-point adders can improve delay or save significant area and power. 
The achieved area and power savings range between 3% and 23% and between 4% and 26%, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1182-1186"},"PeriodicalIF":2.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Effective Analytical Models of Resistive Opens Defects in FinFET Technology","authors":"Gustavo Aguirre;Freddy Forero;Victor Champac;Michel Renovell;Florence Azais;Mariane Comte;Jean-Marc Galliere","doi":"10.1109/TVLSI.2024.3479068","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3479068","url":null,"abstract":"FinFET technology has become an attractive candidate for high-performance and power-efficient applications. However, its susceptibility to defects increases due to the complexity of the process fabrications and smaller feature sizes. This article proposes compact and low-cost analytical models to evaluate the delay increase in FinFET-based circuits due to resistive open defects. The models rely on electrical simulations to precharacterize the circuit library. Analytical expressions are developed for the three types of resistive opens that may occur in FinFET-based logic cells using multifin and multifinger structures. These types of resistive opens include: a resistive open at the drain or source of the transistors (RODS), a resistive open affecting the gate of a single transistor, and a resistive open affecting the gates of both nMOS and pMOS transistors. Compact analytical models are also developed to evaluate the delay increase due to the resistive open defects under process variations. Independent and correlated process variations are taken into account. The analytical models have been validated against SPICE electrical simulations. The proposed analytical models can be used to evaluate the detectability of resistive open defects, significantly reducing the cost of dealing with different defect sizes. Potential applications of the developed analytical models are delineated. 
This work allows us to have higher quality and reliable electronic products.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"841-852"},"PeriodicalIF":2.8,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eduardo Antonio Ceśar da Costa;Morgana Macedo Azevedo da Rosa
{"title":"RCU-2^m: A VLSI Radix-2^m Cubic Unit","authors":"Eduardo Antonio Ceśar da Costa;Morgana Macedo Azevedo da Rosa","doi":"10.1109/TVLSI.2024.3486237","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3486237","url":null,"abstract":"Cubic operations are among the most used arithmetic operations in many applications that demand higher order simultaneous operand computation, such as cryptography and bicubic polynomial interpolation. This article proposes a novel VLSI radix-2^m cubic unit (RCU-2^m) capable of processing cubic operations at m bits simultaneously, with m values of 2 (RCU-4), 3 (RCU-8), and 4 (RCU-16). RCU-16 emerges as the most area-efficient configuration, surpassing RCU-8 and notably outperforming RCU-4. In the 8-bit scenario, RCU-16 achieves remarkable area savings, surpassing the literature’s proposed cubic unit by 11.58×. Across all configurations, RCU-2^m consistently outperforms the automatically selected cube unit, with energy savings ranging from 1.04× to 2×. In application specific integrated circuit (ASIC) and field-programmable gate array (FPGA)-based analyses, RCU-16 consistently exhibits superior performance in both area and energy savings compared with RCU-4, RCU-8, and solutions from the literature. 
These findings emphasize the importance of adopting radix-2^m configurations, particularly RCU-16, for optimal energy-constrained VLSI applications.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"733-745"},"PeriodicalIF":2.8,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}