{"title":"BAG3++: An Extensible Generator Framework for Automated Layout-Aware AMS Design","authors":"Felicia Guo;Bob Zhou;Ayan Biswas;Paul Kwon;Zhaokai Liu;Ken Ho;Vladimir Stojanović;Borivoje Nikolić","doi":"10.1109/OJCAS.2024.3502641","DOIUrl":"https://doi.org/10.1109/OJCAS.2024.3502641","url":null,"abstract":"We present BAG<inline-formula> <tex-math>$3{++}$ </tex-math></inline-formula>, an extensible analog/mixed-signal (AMS) design framework for layout-aware design. BAG<inline-formula> <tex-math>$3{++}$ </tex-math></inline-formula> realizes a unified design environment that merges schematic, layout, and verification views into a single development interface. We further introduce new automated design features that enable rapid automation and optimization across a range of performance specifications, processes, and applications. We demonstrate the practical use of these features through (a) a bit-reconfigurable successive-approximation-register (SAR) analog-to-digital converter (ADC) implemented in the open-source Skywater 130nm process and (b) an ultra-high speed output driver optimized in two modern processes. 
BAG<inline-formula> <tex-math>$3{++}$ </tex-math></inline-formula> interfaces with both commercial and open-source design frameworks, and the extensibility of BAG<inline-formula> <tex-math>$3{++}$ </tex-math></inline-formula> is further illustrated through the integration of an open-source simulator.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"181-191"},"PeriodicalIF":2.4,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11052889","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revolutionize 3D-Chip Design With Open3DFlow, an Open-Source AI-Enhanced Solution","authors":"Yifei Zhu;Zhenxuan Luan;Dawei Feng;Weiwei Chen;Lei Ren;Zhangxi Tan","doi":"10.1109/OJCAS.2024.3518754","DOIUrl":"https://doi.org/10.1109/OJCAS.2024.3518754","url":null,"abstract":"The escalating demand for high-performance and energy-efficient electronics has propelled 3D integrated circuits (3D ICs) as a promising solution. However, major obstacles have been the lack of specialized electronic design automation (EDA) software and standardized design flows for 3D chiplets. To bridge the gap, we introduce Open3DFlow, an open-source design platform for 3D ICs. It is a seven-step workflow that incorporates essential ASIC back-end processes while supporting multi-physics analysis, such as through-silicon via (TSV) modeling, thermal analysis, and signal integrity (SI) evaluations. To illustrate all functionalities of <italic>Open3DFlow</italic>, we use it to implement a 3D RISC-V CPU design with a vertically stacked L2 cache on a separate die. We harden both CPU logic and 3D-cache die in a GlobalFoundries <inline-formula> <tex-math>$0.18\\mu $ </tex-math></inline-formula>m (GF180) process with open-source PDK support. We enable face-to-face (F2F) coupling of the top and bottom die by constructing a bonding layer based on the original technology file. <italic>Open3DFlow</italic>’s open-source nature allows seamless integration of custom AI optimization algorithms. As a showcase, we leverage large language models (LLMs) to help with bonding pad placement. In addition, we apply LLMs to back-end Tcl script generation to improve design productivity. 
We expect <italic>Open3DFlow</italic> to open up a brand-new paradigm for future 3D IC innovations.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"169-180"},"PeriodicalIF":2.4,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11052893","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-to-End Neural Video Compression: A Review","authors":"Jiovana S. Gomes;Mateus Grellert;Fábio L. L. Ramos;Sergio Bampi","doi":"10.1109/OJCAS.2025.3559774","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3559774","url":null,"abstract":"The pervasive presence of video content has spurred the development of advanced technologies to manage, process, and deliver high-quality content efficiently. Video compression is crucial in providing high-quality video services under limited network and storage capacities, traditionally achieved through hybrid codecs. However, as these frameworks reach a performance bottleneck with compression gains becoming harder to achieve with conventional methods, Deep Neural Networks (DNNs) offer a promising alternative. By leveraging DNNs’ nonlinear representation capacity, these networks can enhance compression efficiency and visual quality. Neural Video Coding (NVC) has recently received significant attention, with Neural Image Coding models surpassing traditional codecs in compression ratios. Therefore, this survey explores the state-of-the-art in NVC, examining recent works, frameworks, and the potential of this innovative approach to revolutionize video compression. We identify that NVC models have come a long way since the first proposals and currently are on par in compression efficiency with the latest hybrid codec, VVC. 
Still, many improvements are required to enable the practical usage of NVC, such as hardware-friendly development to enable faster inference and execution on mobile and energy-constrained devices.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"120-134"},"PeriodicalIF":2.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10962175","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143848781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a High Efficiency Bi-Directional Four-Switch Buck-Boost Converter With HV Gate Driver for Multi-Cell Battery Power Bank Applications","authors":"Sung-June Byun;Byeong-Gi Jang;Jong-Wan Jo;Dae-Young Choi;Young-Gun Pu;Sang-Sun Yoo;Seok-Kee Kim;Yeon-Jae Jung;Kang-Yoon Lee","doi":"10.1109/OJCAS.2025.3557835","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3557835","url":null,"abstract":"This paper presents a bidirectional Four-Switch Buck-Boost (FSBB) converter with a high-voltage (HV) gate driver for use in power bank applications. The proposed FSBB is also integrated into this converter for increased efficiency. Thus, the proposed buck-boost converter can reduce conduction loss over a wide input voltage range by reducing the on-resistance of external MOSFETs using a gate-source voltage (VGS) of 5 V or 10 V. The chip is fabricated in a 130 nm 1P5M bipolar-CMOS-DMOS HV process with laterally diffused MOSFET (LDMOS) options and has a die size of 2.7 mm × 2.7 mm. The proposed architecture achieves a maximum output power of 40 W. The measurement results show that the maximum efficiencies at gate-source voltages (VGS) of 5 V and 10 V are 96.67% and 98.15%, respectively.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"110-119"},"PeriodicalIF":2.4,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10949157","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Circuit Simulation of Any Time-Domain Source on Fractional-Order Impedances by Use of the Haar Wavelet Transform, Case Study of the Skin Effect","authors":"Georgios G. Roumeliotis;Jan Desmet;Jos Knockaert","doi":"10.1109/OJCAS.2025.3573989","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3573989","url":null,"abstract":"An application of the ability of the Haar wavelet operational matrix to perform the numerical inverse Laplace transform as combined with the intrinsically convenient Haar wavelet transform of any time-domain signal is presented in this paper. A case study of the transient- and steady-state behavior of the input impedance of a short-circuited transmission line showcases a method to perform the numerical inverse Laplace transform of fractional-order approximative expressions of the skin effect. Furthermore, an improved skin effect approximation is presented.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"155-168"},"PeriodicalIF":2.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11016785","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144331773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Homomorphic Evaluation Cluster Architecture for Fully Homomorphic Encryption","authors":"Hanyoung Lee;Ardianto Satriawan;Hanho Lee","doi":"10.1109/OJCAS.2025.3568058","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3568058","url":null,"abstract":"Fully Homomorphic Encryption (FHE) allows computational processing of encrypted data on cloud servers, providing high security and enabling safe data utilization. As homomorphic multiplication progresses with encrypted data, noise accumulates, requiring a process called bootstrapping to restore the noise level of the new ciphertext <inline-formula> <tex-math>$ct^{\\prime }$ </tex-math></inline-formula>. Bootstrapping involves linear transformation processes, such as Coefficient to Slots and Slots to Coefficient, in which most operations are rotations. Rotation shifts elements in slots to new positions based on rotation index k. However, the computational cost and memory bandwidth required for a rotation add significant overhead and limit the ability to perform FHE operations. Therefore, an efficient implementation of rotation is crucial for high-performance FHE applications. To address this problem, we optimized the datapath of rotation in the CKKS scheme to be hardware-friendly and proposed a homomorphic evaluation cluster hardware accelerator tailored for FHE workloads. Our architecture is aware of the computational and memory constraints of field programmable gate arrays (FPGAs) and performs number theoretic transform (NTT), its inverse (INTT), key multiplication, base conversion, and automorphism in a single cluster. We implemented our design in the AMD Alveo U280 FPGA platform. With a polynomial length of <inline-formula> <tex-math>$2^{16}$ </tex-math></inline-formula> and operating at 250 MHz as a rotation accelerator, the design implementation on the FPGA shows a speed-up of about <inline-formula> <tex-math>$700\\times $ </tex-math></inline-formula> compared to the CPU implementation in OpenFHE. 
Compared to the GPU implementation, it shows a <inline-formula> <tex-math>$1.77\\times $ </tex-math></inline-formula> speed-up, and compared to previous FPGA implementations, it shows a <inline-formula> <tex-math>$1.13\\times $ </tex-math></inline-formula> improvement.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"135-146"},"PeriodicalIF":2.4,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10993408","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144205925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Integrated Fully Differential Current Amplifier With Frequency Compensation for Inductive Sensor Excitation","authors":"Maximilian Scherzer;Mario Auer","doi":"10.1109/OJCAS.2025.3546464","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3546464","url":null,"abstract":"In this article, an integrated fully differential current amplifier is presented. It was designed for inductive sensor excitation, in this case for a fluxgate sensor; however, the concept is applicable wherever a low-noise and precise current is required. A brief review of some of the basic elements of the circuit is given, followed by the development of a model that takes into account output impedance limitations due to mismatch and stability criteria, an essential consideration in the design of a stable current amplifier for inductive loads. Based on the proposed model, the design and implementation of the current amplifier is outlined, identifying potential difficulties for on-chip integration. The final design was then fabricated using a standard 180 nm CMOS technology. Measurement results show that the circuit draws only 2.8 mA from a 3.3 V supply voltage and occupies a total area of 0.64 <inline-formula> <tex-math>$\\text {mm}^{2}$ </tex-math></inline-formula>. Special efforts were made to accurately evaluate the output impedance, whereby a value of 436 k<inline-formula> <tex-math>$\\Omega $ </tex-math></inline-formula> was recorded. 
In addition, the current amplifier achieves an output-referred noise current of 2.5 <inline-formula> <tex-math>$\\text {nA}/\\sqrt {\\text {Hz}}$ </tex-math></inline-formula>, resulting in a measured signal-to-noise ratio of more than 105.2 dB for a bandwidth of 512 Hz at an output current of 9 <inline-formula> <tex-math>$\\text {mA}_{\\text {p-p}}$ </tex-math></inline-formula>.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"147-154"},"PeriodicalIF":2.4,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10906603","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144205882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLU: An Adaptive, Small-Footprint, Low-Power Neural Learning Unit for Edge and IoT Applications","authors":"Amirhossein Rostami;Seyed Mohammad Ali Zeinolabedin;Liyuan Guo;Florian Kelber;Heiner Bauer;Andreas Dixius;Stefan Scholze;Marc Berthel;Dennis Walter;Johannes Uhlig;Bernhard Vogginger;Christian Mayr","doi":"10.1109/OJCAS.2025.3546067","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3546067","url":null,"abstract":"Over the last few years, online training of deep neural networks (DNNs) on edge and mobile devices has attracted increasing interest in practical use cases due to their adaptability to new environments, personalization, and privacy preservation. Despite these advantages, online learning on resource-restricted devices is challenging. This work demonstrates a 16-bit floating-point, flexible, power- and memory-efficient neural learning unit (NLU) that can be integrated into processors to accelerate the learning process. To achieve this, we implemented three key strategies: a dynamic control unit, a tile allocation engine, and a neural compute pipeline, which together enhance data reuse and improve the flexibility of the NLU. The NLU was integrated into a system-on-chip (SoC) featuring a 32-bit RISC-V core and memory subsystems, fabricated using GlobalFoundries 22nm FDSOI technology. The design occupies just <inline-formula> <tex-math>$0.015mm^{2}$ </tex-math></inline-formula> of silicon area and consumes only 0.379 mW of power. The results show that the NLU can accelerate the training process by up to <inline-formula> <tex-math>$24.38\\times $ </tex-math></inline-formula> and reduce energy consumption by up to <inline-formula> <tex-math>$37.37\\times $ </tex-math></inline-formula> compared to a RISC-V implementation with a floating-point unit (FPU). 
Additionally, compared to the state-of-the-art RISC-V with vector coprocessor, the NLU achieves <inline-formula> <tex-math>$4.2\\times $ </tex-math></inline-formula> higher energy efficiency (measured in GFLOPS/W). These results demonstrate the feasibility of our design for edge and IoT devices, positioning it favorably among state-of-the-art on-chip learning solutions. Furthermore, we performed mixed-precision on-chip training from scratch for keyword spotting tasks using the Google Speech Commands (GSC) dataset. Training on just 40% of the dataset, the NLU achieved a training accuracy of 89.34% with stochastic rounding.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"85-99"},"PeriodicalIF":2.4,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10904478","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143637834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear Analysis of Differential LC Oscillators and Injection Locked Frequency Dividers","authors":"Konstantinos Metaxas;Vassilis Alimisis;Costas Oustoglou;Yannis Kominis;Paul P. Sotiriadis","doi":"10.1109/OJCAS.2025.3545904","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3545904","url":null,"abstract":"A comprehensive nonlinear analysis of autonomous and periodically forced fully-differential, negative-resistor LC oscillators is presented. Through nonlinear transformations in the state space, it is shown that oscillators within this class exhibit qualitatively similar dynamical behavior in terms of their limit cycles and bifurcation curves, at least within an open region containing the origin. The case of autonomous, complementary BJT oscillators is used to validate the qualitative analysis and demonstrate a general approach of how to numerically extend the bifurcation curves away from the equilibrium point and determine the oscillatory conditions. When external periodic force is present, we focus on the special case of periodically multiplicatively-forced fully-differential, negative-resistor, LC oscillators and use Harmonic Balance techniques to derive analytical expressions estimating the locking range in the weak injection regime. 
The results are used to calculate the locking range of a harmonically forced complementary BJT oscillator yielding explicit expressions closely aligned with experimental measurements, thus verifying the validity of the analysis.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"100-109"},"PeriodicalIF":2.4,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10904493","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143637833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Consumption Modeling of 2-D and 3-D Decoder Circuits","authors":"Yufei Xiao;Kai Cai;Xiaohu Ge;Yong Xiao","doi":"10.1109/OJCAS.2025.3538707","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3538707","url":null,"abstract":"Energy consumption evaluation for data processing tasks, such as encoding and decoding, is a critical consideration in designing very large scale integration (VLSI) circuits. Incorporating both information theory and circuit perspectives, a new general energy consumption model is proposed to capture the energy consumption of channel decoder circuits. For the binary erasure channel, lower bounds of energy consumption are derived for two-dimensional (2D) and three-dimensional (3D) decoder circuits under specified error probabilities, along with scaling rules for energy consumption in each case. Based on the proposed model, the lower bounds of energy consumption for staged serial and parallel implementations are derived, and a specific threshold value is identified to determine the parallel or serial decoding in decoder circuits. Staged serial implementations in 3D decoder circuits achieve a higher energy efficiency than fully parallel implementations when the processed data exceed 48 bits. Simulation results further demonstrate that the energy efficiency of 3D decoders improves with increasing data volume. When the number of input bits is 648, 1296 and 1944, the energy consumption of 3D decoders is reduced by 11.58%, 13.07%, and 13.86% compared to 2D decoders, respectively. The energy consumption of 3D decoders surpasses that of 2D decoders when the decoding error probability falls below a specific threshold of 0.035492. 
These results provide a foundational framework and benchmarks for analyzing and optimizing the energy consumption of 2D and 3D channel decoder circuits, enabling more efficient VLSI circuit designs.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"74-84"},"PeriodicalIF":2.4,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10870295","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}