{"title":"An Implementation of Reconfigurable Match Table for FPGA-Based Programmable Switches","authors":"Xiaoyong Song;Zhichuan Guo","doi":"10.1109/TVLSI.2024.3436047","DOIUrl":"10.1109/TVLSI.2024.3436047","url":null,"abstract":"The match table is the key component that performs packet processing and forwarding in programmable switches for software-defined networks (SDNs). However, the match tables in current field-programmable gate array (FPGA)-based switches are inflexible or undisclosed. When the network function changes, the match table on the FPGA must be redesigned or have its size parameters reset, and it works again only after recompilation and reimplementation; this time-consuming and labor-intensive operation seriously reduces the flexibility and configurability of the switch. To address this issue, this article presents a reconfigurable match table (RMT) design for FPGA-based programmable switches. A three-layer table structure is introduced to realize the reconfiguration and hardware-plane mapping of user-defined tables, and the logical tables in the packet processing pipeline are connected to the physical tables in the memory pool by a resource-efficient segment crossbar. To the best of our knowledge, this article is the first to publicly present the entire FPGA-based RMT design scheme and implementation details. The proposed design implements reconfigurable ternary content addressable memory (TCAM)-based and static random access memory (SRAM)-based match tables on a Xilinx FPGA and verifies them with a packet filter system. 
In the proposed RMT system, a user can reconfigure the number, depth, and width of user-defined match tables (UMTs) in the pipeline via the control plane without modifying hardware, which greatly enhances the flexibility of the data plane of FPGA-based switches.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2121-2134"},"PeriodicalIF":2.8,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Precision and High-Dynamic-Range Current-Mode WTA Circuit for Low-Supply-Voltage Applications","authors":"Mehdi Saberi;Hossein Yaghoobzadeh Shadmehri;Mohammad Tavakkoli Ghouchani;Alexandre Schmid","doi":"10.1109/TVLSI.2024.3436575","DOIUrl":"10.1109/TVLSI.2024.3436575","url":null,"abstract":"This brief proposes a low-voltage, high-precision, and high-dynamic-range current-mode analog winner-take-all (WTA) circuit. The proposed structure employs a new high-gain stage as a feedback network between the input node of each cell and the common node of the circuit to reduce the sensitivity of the output current to the loser signals, especially when they are close to the winner. In addition, another network is employed that senses the amount of the output/winner current and adjusts the bias current of the gain stages. This ensures that the drain-source voltage of the input transistor in the winner cell matches the behavior of the output transistor’s drain-source voltage, enhancing the accuracy as well as the input dynamic range (DR) of the structure. Moreover, since the circuit works properly with a minimum supply voltage of only $V_{\\text{GS}} + V_{\\text{eff}}$, it is a promising candidate for applications in emerging technologies with low supply voltage requirements. Based on the proposed structure, a three-input WTA circuit is designed and fabricated in a 0.18-µm CMOS technology. According to the measurement results, the proposed circuit exhibits a maximum error of 1.5% for the input signal range of 60 µA when the input frequency is 100 kHz. 
The silicon area occupied by the circuit is 33 µm × 65 µm.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"1955-1958"},"PeriodicalIF":2.8,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thresholding Decision-Directed Descent (T3D): A Tuning Solution for DDR5 DRAM DFEs","authors":"Mitchell Cooke;Nicola Nicolici","doi":"10.1109/TVLSI.2024.3435419","DOIUrl":"10.1109/TVLSI.2024.3435419","url":null,"abstract":"Emerging memory technologies, such as DDR5, offer increased data rates and storage capacities, at the expense of signal integrity challenges. To address these challenges, the DDR5 standard incorporates a four-tap decision feedback equalizer (DFE). As elaborated in this article, known methods for DFE tuning are limited due to interface complexity and distinct equalization requirements for DDR5. We propose a decision-directed DFE tuning method called thresholding decision-directed descent (T3D). By leveraging DDR5 architectural features, our novel method tracks the eye envelope as it opens, which facilitates rapid convergence compared to the state of the art. To validate the performance of T3D, silicon measurements are presented alongside a virtual testbench methodology. By demonstrating the high correlation between silicon and simulation results, the virtual testbench can be beneficial for the design, validation, and prototyping of future DFE tuning methods.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2060-2073"},"PeriodicalIF":2.8,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Jitter Frequency Doubling Circuit Supporting Higher-Speed BISG and Aging Sensing in a Chiplet-Based Design Environment","authors":"Ko-Hong Lin, Ont-Derh Lin, Shi-Yu Huang, Duo Sheng","doi":"10.1109/tvlsi.2024.3435059","DOIUrl":"https://doi.org/10.1109/tvlsi.2024.3435059","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"57 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Speed Dynamic Element Matching Decoder With Integrated Background Calibration Control","authors":"Tobias Schirmer;Simon Buhr;Felix Burkhardt;Florian Protze;Frank Ellinger","doi":"10.1109/TVLSI.2024.3432640","DOIUrl":"10.1109/TVLSI.2024.3432640","url":null,"abstract":"A dynamic element matching (DEM) decoder with integrated mismatch calibration control for high-speed current-steering digital-to-analog converters (CS-DACs) and CS-DAC-based direct digital frequency synthesizers (DDFSs) is studied and presented. The DEM algorithm achieves very good averaging of mismatch-induced errors in the succeeding CS-DAC. It features a minimum element transition rate, therefore optimizing the power dissipation and ensuring minimal glitch energy at the output. Due to the chosen network-based architecture, with only a few modifications of the hardware, the decoder allows the integration of a comprehensive current source mismatch calibration that can be fully operated in the background and even in parallel to the regular DEM operation. A proof-of-concept hardware implementation of the presented decoder was fabricated in a 22-nm FD-SOI CMOS process and characterized in a high-speed DDFS system with a sampling rate of 5 GHz. Measurements reveal a significant improvement in the spurious free dynamic range (SFDR) and signal-to-noise-and-distortion ratio (SNDR) when the calibration and DEM are enabled. 
Compared to the state-of-the-art (SoA), the presented DDFS achieves one of the best figures of merit.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2074-2084"},"PeriodicalIF":2.8,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Two-Stage Pipelined Compute-in-Memory Macro for Accelerating Transformer Feed-Forward Networks","authors":"Heng Zhang;Wenhe Yin;Sunan He;Yuan Du;Li Du","doi":"10.1109/TVLSI.2024.3432403","DOIUrl":"10.1109/TVLSI.2024.3432403","url":null,"abstract":"Transformer architectures have achieved state-of-the-art performance in various applications. However, deploying transformer models on resource-constrained platforms is still challenging due to their dynamic workloads, intensive computations, and substantial memory access. In this article, we propose a two-stage pipelined compute-in-memory (CIM) macro for effectively deploying and accelerating the feed-forward network (FFN) layers of transformer models. Two independent CIM arrays are designed to execute the two distinct linear projections in FFN layers, which are interconnected by co-designed analog rectified linear unit (ReLU) circuits to realize the nonlinear activation function. The analog multiply-and-accumulate (MAC) results from the first CIM array are streamed directly to the analog ReLU circuits, and subsequently to the next CIM array for performing another linear projection. This architecture eliminates the need for analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) for internal results’ staging, thereby enhancing overall macro efficiency and reducing computing latency. A proof-of-concept macro is fabricated in a TSMC 65-nm process and achieves 4.096 TOPS peak throughput, 4.39 TOPS/mm² area efficiency, and 49.83 TOPS/W energy efficiency. 
To map transformer models onto the proposed macro, we quantize the FFN layers of the BERT-Mini model with per-token granularity for activations and per-tensor granularity for weights using quantization-aware training (QAT), which exhibits excellent accuracy across multiple benchmarks.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"1889-1899"},"PeriodicalIF":2.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Cost Quadruple-Node-Upsets Resilient Latch Design","authors":"Luchang He;Chenchen Xie;Qingyu Wu;Siqiu Xu;Houpeng Chen;Xing Ding;Xi Li;Zhitang Song","doi":"10.1109/TVLSI.2024.3430224","DOIUrl":"10.1109/TVLSI.2024.3430224","url":null,"abstract":"In this article, a low-cost quadruple-node-upsets resilient latch (LCQRL) design is proposed. To meet the high-reliability demands of safety-critical applications, the latch integrates nine soft-error-interceptive modules (SIMs) to form robust feedback loops, ensuring complete resilience to quadruple-node upsets (QNUs). Each SIM comprises ten CMOS transistors and a clocked inverter. Notably, C-element (CE) and dual interlocked storage cell (DICE) modules are not employed in this circuit, resulting in a small area and low power consumption. The simulation results verify the complete QNU self-recoverability and cost-effectiveness of this design. Compared with the existing radiation-hardened QNU resilient latches, the LCQRL latch demonstrates significant improvements in area, power consumption, and area-power-delay product (APDP) by 47.8%, 63%, and 75.5%, respectively. Furthermore, it exhibits low sensitivity to process, voltage, and temperature (PVT) variations.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"1930-1939"},"PeriodicalIF":2.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Electrical-Thermal Co-Simulation Model of Chiplet Heterogeneous Integration Systems","authors":"Xiaoning Ma;Qinzhi Xu;Chenghan Wang;He Cao;Jianyun Liu;Daoqing Zhang;Zhiqiang Li","doi":"10.1109/TVLSI.2024.3430498","DOIUrl":"10.1109/TVLSI.2024.3430498","url":null,"abstract":"Chiplet heterogeneous integration (CHI) is one of the important technology choices to continue Moore’s law. However, due to the characteristics of high power and low supply voltage in CHI systems, heavy currents need to flow through the power delivery network (PDN), and the Joule heating effect will result in the overall temperature increase of the CHI system. Meanwhile, the high temperature will cause the current as well as the performance of the system to degrade and a series of reliability problems will occur. In this article, an effective electrical-thermal coupling model is proposed to predict the steady-state temperature distribution of a 2.5-D CHI system considering the Joule heating effect and the temperature effect on the IR drop. The equivalent electrical conductivity model is also built up to describe the design features of the redistribution layer (RDL), bump, and through silicon via (TSV) structures based on the electrical-thermal duality. Furthermore, the governing equations for voltage distribution and temperature distribution are solved simultaneously by utilizing the finite volume method (FVM) with nonuniform mesh to realize the electrical-thermal co-simulation of the multiscale CHI system. The model application is further performed to investigate the influence of the model parameters on the voltage drop and temperature distribution of the CHI system. 
The verified systems and simulated results of the present investigation demonstrate the viability and accuracy of voltage and temperature field co-simulation. They also indicate that the newly proposed electrical-thermal model is helpful for thermal and voltage-drop analysis of packaging structures subject to the Joule heating effect and can be adopted to assist in the physical design optimization of 2.5-D CHI or 3-D heterogeneous stacked chips.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"1769-1781"},"PeriodicalIF":2.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 370-nW Bio-AFE With 2.9-µVrms Input Noise in an Octa-Channel System-in-Package for Multimode Bio-Signal Acquisition","authors":"Patrick Fath, Harald Pretl","doi":"10.1109/tvlsi.2024.3430059","DOIUrl":"https://doi.org/10.1109/tvlsi.2024.3430059","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"51 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}