{"title":"A Half-Schmitt Trigger-Based 9T1R Nonvolatile Robust SRAM Cell for Instant On-Off Application","authors":"Mohammad Mudakir Fazili;Sayeed Ahmad;Belal Iqbal","doi":"10.1109/TCSI.2024.3507932","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3507932","url":null,"abstract":"This article proposes a novel design of a 9T1R nonvolatile static random access memory (nvSRAM) cell. The proposed nvSRAM cell significantly improves critical performance metrics such as static noise margin, read delay, break-even time (BET), and store/restore delay/energy. Schmitt trigger action is enabled during read to mitigate the read-disturb problem and improve the read static noise margin. A resistive random access memory (RRAM) device is used intelligently to store/restore data in a single cycle. The technique reduces the store and restore delays by 50% and 25%, respectively, compared to its nearest competitor nvSRAM. Monte Carlo simulations demonstrate very reliable read/write operation with lower variability ($\\sigma/\\mu$) and improved effective mean ($\\mu - 3\\sigma$) compared to existing nvSRAMs. Worst-case corner analysis under extreme temperatures confirms the robustness of the design in the face of PVT variations. The proposed design incurs only 40% area overhead compared to a 6T cell, which is much smaller than that of several other existing nvSRAMs. A new figure of merit that comprehensively captures the cell stability, write-ability, read/write/store/restore delay, and power dissipation of an nvSRAM cell is also proposed. Based on this metric, the proposed cell outperforms all of the nvSRAM cells considered in this work. 
Array simulations for a 1Mb nvSRAM at both typical and slow process corners demonstrate minimal store/restore delay overheads and favorable access times, supporting its suitability for applications requiring instant on-off functionality.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 1","pages":"169-179"},"PeriodicalIF":5.2,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Hybrid FitzHugh-Rinzel (FHR) Neuron Model Behavior: Cost-Effective FPGA Implementation for High-Frequency and High-Precision Matching by Electromagnetic Flux Effects","authors":"Sohrab Majidifar;Mohsen Hayati;Saeed Haghiri","doi":"10.1109/TCSI.2024.3503421","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3503421","url":null,"abstract":"Effective implementation of spiking neuron models in hardware is crucial for real systems. Utilizing the main capabilities of FPGAs, this paper introduces a highly precise method for evaluating nonlinear functions. The approach matches trigonometric-based functions to the nonlinear terms of a FitzHugh-Rinzel (FHR) neuron model with electromagnetic flux coupling, with a focus on cost-effective, high-speed digital implementation using the CORDIC algorithm and a multiplierless design. The close correspondence between the approximate functions and the nonlinear functions of the original model results in minimal errors in the outputs of the proposed model compared to the original model, which reduces the lead and lag of signals between the original and proposed models. For the digital FPGA implementation of the FHR neuron model, we employed the Virtex-5 board to validate and synthesize the suggested method. In this scenario, the proposed FHR model demonstrates superior performance in terms of speed and cost compared to the original model. 
The proposed model runs about 6 times faster than the original model (414.86 MHz compared to 69.232 MHz), and about 6.66 times as many neurons can be fitted with the proposed approach (20 compared to 3).","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 2","pages":"844-853"},"PeriodicalIF":5.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Monolithic 3D Carbon-Based Computing-in-SRAM With Variation-Aware Bit-Wise Data-Mapping for High-Performance and Integration Density","authors":"Dengfeng Wang;Weifeng He;Qin Wang;Hailong Jiao;Yanan Sun","doi":"10.1109/TCSI.2024.3506775","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3506775","url":null,"abstract":"Bit-serial computing-in-memory with SRAM cells (SRAM-CIM) enables a full set of integer and floating-point arithmetic operations and various data-intensive computations. Carbon nanotube field-effect transistors (CN-MOSFETs) with high scalability, energy efficiency, and low process thermal budget are attractive for realizing high-density monolithic three-dimensional (M3D) SRAM-CIM. However, CN-MOSFETs possess unique process variations with asymmetric spatial correlations, which can significantly influence the performance and reliability of carbon-based SRAM-CIM. In this paper, new M3D-4N4P SRAM-CIM cells with CN-MOSFETs are proposed with optimized profiles for achieving ultra-high integration density while preserving the robustness of data access and computation. Furthermore, a variation-aware bit-wise data-mapping method is proposed for enhancing the performance of carbon-based SRAM-CIM by leveraging the spatial correlations of CN-MOSFETs. By minimizing the area skew of vertically stacked layers, the areas of the proposed M3D-4N4P SRAM-CIM cells are reduced by up to 50.32% compared to the previous 6N2P SRAM-CIM cells assuming carbon nanotube transistor technology. 
The proposed M3D-4N4P SRAM-CIM array also achieves up to $2.17\\times$ higher throughput on arithmetic operations, and 18.34% lower computing latency with 25.36% lower energy consumption on MAC-based benchmarks, compared to the previous 2D-6N2P SRAM-CIM array.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 3","pages":"1229-1242"},"PeriodicalIF":5.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143496572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 36-Gb/s 1.6-pJ/b PAM-3 Transmitter Leveraging Digital Logic Cells and 4-Tap FFE in 22-nm CMOS","authors":"Philex Ming-Yan Fan;Ming-Xun Wang;Wei-Ting Lin;Yao-Chia Liu","doi":"10.1109/TCSI.2024.3509802","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3509802","url":null,"abstract":"The first 36-Gb/s transmitter with differential outputs leveraging three-level pulse amplitude modulation (PAM-3) and digital logic cells is investigated in this study. The use of digital logic cells reduces design complexity, enabling the transmitter to achieve an energy efficiency of 1.6 pJ/bit under a 1-V supply, and 0.88 pJ/bit when solely considering the data path. The data rates and energy efficiencies are measured using an external power supply, without an on-chip voltage regulator. The proposed transmitter adopts a 3-bit to 2 unit-intervals (UIs) encoding scheme, considering power consumption, design complexity, area, and bit efficiency. The circuit macro is fabricated in 22-nm standard CMOS technology and occupies an area of 0.025 mm² for the transmitter only, and 0.055 mm² for both the transmitter and T-coils. The 4-tap feedforward equalizer (FFE) enlarges the eye-opening area by 96.5% at a data rate of 33 Gb/s and by 300% at 34.5 Gb/s. 
The eye measurements are conducted using a pair of 0.914-meter cables.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 1","pages":"365-373"},"PeriodicalIF":5.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FREYA: A 0.023-mm²/Channel, 20.8-μW/Channel, Event-Driven 8-Channel SoC for Spiking End-to-End Sensing of Time-Sparse Biosignals","authors":"Jonah Van Assche;Charlotte Frenkel;Ali Safa;Georges Gielen","doi":"10.1109/TCSI.2024.3504264","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3504264","url":null,"abstract":"Biomedical systems-on-chip (SoCs) for real-time monitoring of vital signs need to read out multiple recording channels in parallel and process them locally with low latency, at a low per-channel area and power consumption. To achieve this, event-driven SoCs that exploit the time-sparse nature of biosignals such as the electrocardiogram (ECG) have been proposed; they only process the signal when it shows activity. Such SoCs convert time-sparse biosignals into spike trains, on which spiking neural networks (SNNs) can perform event-driven signal classification. State-of-the-art event-driven SoCs, however, still suffer from poor area and power efficiency and use inflexible, hard-coded spike-encoding schemes. To address these challenges, this paper presents FREYA, an 8-channel event-driven SoC for end-to-end sensing of time-sparse biosignals. The proposed SoC makes the following key contributions: 1) an 8-channel time-division-multiplexed level-crossing sampling (LCS) analog-to-spike converter (ASC) that encodes analog input signals into input spikes for an on-chip SNN; 2) an ASC spike-encoding algorithm that is fully programmable in resolution (4 to 8 bits) and conversion algorithm (offset and decay parameters); 3) an on-chip integrated, flexible SNN processor based on a programmable crossbar architecture that allows for efficient event-driven processing and can be reconfigured towards multiple sensing applications; 4) a custom offline end-to-end training framework for the fast retraining of the spike-encoding algorithm and SNN architecture towards new applications or patient-dependent signal variations. 
A prototype IC has been fabricated in a 40-nm CMOS technology. It has a per-channel active area of 0.023 mm² (0.184 mm² in total), a $7\\times$ improvement over the state of the art. For the use case of ECG-based QRS-labeling, a detection accuracy of 98.67% is achieved, while the system consumes $20.8~\\mu$W per channel and achieves a latency of only 80 ms, thus paving the way for multi-channel, high-fidelity, event-driven SoCs in biomedical applications.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 3","pages":"1093-1104"},"PeriodicalIF":5.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143496618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 2.793 μW Near-Threshold Neuronal Population Dynamics Trajectory Filter for Reliable Simultaneous Localization and Mapping","authors":"Zhengzhe Wei;Boyi Dong;Yuqi Su;Yi Wang;Chuanshi Yang;Yuncheng Lu;Chao Wang;Tony Tae-Hyoung Kim;Yuanjin Zheng","doi":"10.1109/TCSI.2024.3493246","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3493246","url":null,"abstract":"This work presents an algorithm-hardware co-design implementing a digital neuronal population dynamics simulator intended for the trajectory error correction task within a simultaneous localization and mapping workflow. A custom discretized procedural algorithm approximating a neuronal population dynamics-based inference operation is developed for mapping onto an ultra-lightweight digital macro featuring massively parallel in-situ processing techniques. Fabricated using a 40-nm technology, the test chip features a $22\\times 22$ neuron array with 0.1358 mm² core area and provides a 12-bit computing precision. A time-multiplexed processing element design prevents the use of excessive silicon area. 
Through extensive data reuse in a massively parallel processing-in-memory architecture attached to a custom I/O interface, a single inference operation is completed within 3277 clock cycles, providing 200 inferences per second when operating at a low frequency of 0.667 MHz with a 0.5-V core supply and consuming sub-10-$\\mu$W power.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 3","pages":"1269-1281"},"PeriodicalIF":5.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143496430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure State Estimation-Based Stabilization of IoT-Enabled Wireless Power Transfer Systems","authors":"Loganathan Ponnarasi;P. B. Pankajavalli;Yongdo Lim;Rathinasamy Sakthivel","doi":"10.1109/TCSI.2024.3503719","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3503719","url":null,"abstract":"This paper concerns the distributed state estimation-based security control problem for Internet-of-Things (IoT)-enabled wireless power transfer (WPT) systems subject to interchange attacks. The WPT system is first modeled using a state-space framework, and then the IoT elements, including Web-enabled smart sensors, are considered to obtain measurements for state estimation and control purposes. Additionally, a Bernoulli-distributed random variable with known probability is introduced to characterize randomly occurring interchange attacks in output measurements. Using Lyapunov theory and stochastic analysis techniques, a set of sufficient conditions is established in terms of linear matrix inequalities for ensuring convergence of the state estimation with reliable and consistent power transfer operation within the described WPT system applications. Furthermore, a particle swarm optimization algorithm is used to determine the optimal gain of the state estimation scheme for minimizing mean-squared estimation errors. 
Finally, a numerical simulation is given to illustrate the efficiency and usefulness of the presented control approach.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 1","pages":"443-452"},"PeriodicalIF":5.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monolithic GaN-Based Multiple-Phase Bidirectional Energy Transfer With Seamless Control Applied on High-Voltage and Low-Voltage Batteries","authors":"Tz-Wun Wang;Sheng-Hsi Hung;Si-Yi Li;Chi-Yu Chen;Po-Jui Chiu;Tzu-Ying Wu;Ke-Horng Chen;Kuo-Lin Zheng;Ying-Hsi Lin;Shian-Ru Lin;Tsung-Yen Tsai","doi":"10.1109/TCSI.2024.3494853","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3494853","url":null,"abstract":"In this paper, the proposed multi-phase (MP) bidirectional dual Gallium-Nitride (GaN) controlled rectifier (GCR) uses dual GCRs with a pre-charge technique to reduce third-quadrant operation by minimizing the dead time to 0.12 ns and 0.13 ns, and lowering the negative VDS to −0.6 V and −0.8 V in buck and boost operation, respectively. This is the first work on monolithic bidirectional energy transfer with a two-switch-only topology. With the help of MP-accelerated current control and the GCR dynamic ramp generator, the voltage variation on the high-voltage (HV) side and low-voltage (LV) side can be reduced to less than 50 mV and to 40 mV, respectively, during buck and boost operation transitions. Moreover, the recovery time is effectively reduced, and current balance between the four phases can be achieved within 7 cycles (350 ns). The peak efficiency is as high as 95.5% and 94.2% in buck and boost operation, respectively.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 1","pages":"420-432"},"PeriodicalIF":5.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic, Datasets","authors":"Weimin Fu;Shijie Li;Yifang Zhao;Kaichen Yang;Xuan Zhang;Yier Jin;Xiaolong Guo","doi":"10.1109/TCSI.2024.3487486","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3487486","url":null,"abstract":"Large Language Models (LLMs) have precipitated emerging trends towards intelligent automation. However, integrating LLMs into the hardware debug domain encounters challenges: the datasets for LLMs for hardware are often plagued by a dual dilemma – scarcity and subpar quality. Traditional hardware debug approaches that rely on experienced labor to generate detailed prompts are not cheaply scalable. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, this approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principles. It facilitates the linear scaling of dataset volumes without depending on skilled labor. 
We have implemented this method on a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs to enable their capability in hardware debugging tasks, thereby validating the efficacy of our approach.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 2","pages":"623-636"},"PeriodicalIF":5.2,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143183833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Open Access Publishing","authors":"","doi":"10.1109/TCSI.2024.3498757","DOIUrl":"https://doi.org/10.1109/TCSI.2024.3498757","url":null,"abstract":"","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"71 12","pages":"6583-6583"},"PeriodicalIF":5.2,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10768865","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142713802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}