{"title":"Comments on “Analytical Design Equations for Class-E Power Amplifier”","authors":"Pallab Kr Gogoi;Ştefan Ştefănescu;Ayan Sharma","doi":"10.1109/TCSI.2025.3532202","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3532202","url":null,"abstract":"In their seminal work, Acar et al. (2007) proposed analytical design equations for Class E power amplifiers, which have significantly influenced subsequent research in this field. However, their analysis contains calculation errors in the evaluation of certain expressions, leading to inaccuracies in the derived design equations. This error results in significantly incorrect values for the design parameters, which, in turn, affect the accuracy of the overall design set. This work addresses these errors, providing a corrected set of design equations for Class E PAs, further supported by supplementary Python code, enabling researchers to readily explore and verify the corrected Class E design framework.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 9","pages":"5297-5298"},"PeriodicalIF":5.2,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Dual-Output Five-Level Buck PFC Rectifier With Reduced Switch Count","authors":"Ankush Koli;N. Sandeep;Harpal Tiwari","doi":"10.1109/TCSI.2025.3547750","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3547750","url":null,"abstract":"Research on power converters featuring multipurpose capabilities holds a lot of potential in power electronics. This paper presents an AC/DC converter with a reduced component count and dual DC output terminals. The semiconductor devices in the proposed topology undergo reduced switching (ON/OFF) transitions in a switching cycle, considerably lowering the switching losses. Furthermore, the multicarrier pulse width modulation control architecture allows for the simultaneous feeding of single and double equal/unequal loads without sacrificing converter stability. Additionally, because of its continuous current mode functioning, fewer capacitive and inductive filters are needed at the input and output sides. Experimental verification has been done on the proposed transformerless multilevel rectifier topology in steady-state and dynamic operating conditions. Finally, a detailed comparative assessment is included to demonstrate the superior performance of the proposed topology.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 8","pages":"4380-4388"},"PeriodicalIF":5.2,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design, Implementation, and Analysis of an Integrated Switched Capacitor Analog Neuron for Edge Computing AI Accelerators","authors":"M. Ronchi;S. Di Giacomo;M. Amadori;G. Borghi;M. Carminati;C. Fiorini","doi":"10.1109/TCSI.2025.3546521","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3546521","url":null,"abstract":"Parallel computing is the key to accelerate artificial neural networks, both in digital and analog implementations. Our research focuses on analog artificial neural networks (NN), where parallel computations are executed with voltages, charges and currents, using as the computing elements the same devices that act as memories for the raw processed data. These analog in-memory computing structures can be exploited for edge computing applications, thanks to their ability to directly interface with analog signals with low latency, reducing data throughput and front-end complexity. This work presents the specific implementation of a single neuron used in a larger feedforward, fully connected analog neural network ASIC (ANNA), showing its performance and criticalities. The ASIC is designed as a re-programmable analog accelerator for the reconstruction of the position of interaction of gamma rays in Anger cameras, for medical imaging applications as PET and SPECT. This first prototype has been fabricated on a 0.35 µm CMOS process with an area of 24 mm², and it is able to process 200,000 events per second, with an experimentally measured energy efficiency of 50 GOPS/W. 
The network has been trained on a Matlab model, that was adjusted to embed many nonidealities to match the physical chip, as demonstrated in this work.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 8","pages":"3947-3960"},"PeriodicalIF":5.2,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10918870","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of DTC-Based and Harmonic-Mixer-Based Fractional-N PLLs: Comparative Analysis of Jitter and Power Trade-Offs","authors":"Yuyang Zhu;Masaru Osada;Haoming Zhang;Tetsuya Iizuka","doi":"10.1109/TCSI.2025.3546983","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3546983","url":null,"abstract":"As phase-locked loop (PLL) architectures become increasingly complex, optimizing the jitter and power performance through calculation alone is becoming more challenging for fractional-N PLLs. To find the most suitable PLL architecture that meets the jitter-power requirements of various applications, a simple and widely-applicable method is in demand to find the optimal jitter-power relation of different PLL architectures. In this paper, we propose the use of a multi-objective evolutionary algorithm (MOEA) to optimize the jitter and power of PLLs, specifically focusing on two popular fractional-N PLL architectures: digital-to-time converter (DTC)-based and harmonic-mixer (HM)-based PLLs. By applying the MOEA, we can achieve optimal jitter and power relationships for both architectures, and the observed trends in jitter and power are explained and supported with calculations.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 8","pages":"3872-3885"},"PeriodicalIF":5.2,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10918863","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHMT: An SRAM and HBM Hybrid Computing-in-Memory Architecture With Optimized KV Cache for Multimodal Transformer","authors":"Xiangqu Fu;Jinshan Yue;Muhammad Faizan;Zhi Li;Qiang Huo;Feng Zhang","doi":"10.1109/TCSI.2025.3561245","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3561245","url":null,"abstract":"Multimodal Transformer (MMT) algorithms have become the state-of-the-art for multimodal tasks such as image captioning. The Encoder-Decoder (E-D) structure, consisting of Encoder, Decoder-causal, and Decoder-cross components, provides a flexible and effective framework for multimodal tasks. However, previous accelerators mainly focus on the dataflow and hardware optimization of the Encoder, which fails to accelerate the entire E-D structure efficiently. There remain three challenges: 1) the lack of pipeline and multicore optimization at the module, layer, and E-D level; 2) the Decoder-causal and Decoder-cross computations have lower arithmetic intensity compared to the Encoder, requiring a better solution for the varying arithmetic intensities; and 3) the autoregressive algorithm in Decoder-causal leads to redundant KV Cache accesses and considerable idle power. In this paper, <italic>SHMT</italic>, an SRAM and HBM hybrid computing-in-memory (CIM) architecture, is designed to efficiently support multimodal Transformers with three key contributions: 1) a multi-level pipelined multicore scheme, including pipeline optimization across E-D layer-head-module levels and a multicore network-on-chip (NoC) architecture, to reduce inference latency and off-chip accesses; 2) a heterogeneous SRAM-HBM architecture, utilizing high-density HBM-CIM for low-arithmetic-intensity (LAI) parts and high-performance SRAM-CIM for high-arithmetic-intensity (HAI) parts; and 3) by integrating KV Cache with zero-padding in SRAM-CIM, SHMT eliminates redundant read-write operations in KV Cache, reducing idle power consumption. 
Experiment results show that SHMT achieves <inline-formula> <tex-math>$212\\times $ </tex-math></inline-formula> speedup, reduces energy consumption by <inline-formula> <tex-math>$208\\times \\sim 2000\\times $ </tex-math></inline-formula> per token, and achieves <inline-formula> <tex-math>$13.3\\times $ </tex-math></inline-formula> higher energy efficiency compared to NVIDIA A100 GPU.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 6","pages":"2712-2725"},"PeriodicalIF":5.2,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memristor-Based Selective Convolutional Circuit for High-Density Salt-and-Pepper Noise Removal","authors":"Binghui Ding;Ling Chen;Chuandong Li;Tingwen Huang;Sushmita Mitra","doi":"10.1109/TCSI.2025.3566364","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3566364","url":null,"abstract":"In this article, the memristor-based selective convolutional (MSC) circuit for salt-and-pepper (SAP) noise removal was proposed. In experiments, the MSC model was built and benchmarked against a ternary selective convolutional (TSC) model. Results show that the MSC model effectively restores images corrupted by SAP noise, achieving similar performance to the TSC model in both quantitative measures and visual quality at noise densities of up to 50%. In addition, this study proposes an enhanced MSC (MSCE) model based on MSC, which reduces power consumption by 57.6% compared with the MSC model while improving performance. The MSCE model maintains reliability when memristors experience conductance drift rates of less than 30% and yields greater than 89%.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 7","pages":"3115-3125"},"PeriodicalIF":5.2,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144536322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Techniques for a Multi-Phase Injection-Based Eight-Phase 17-GHz Clock Generator for Multi-Phase Wireline Receivers","authors":"Bob Zhou;Borivoje Nikolić","doi":"10.1109/TCSI.2025.3563517","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3563517","url":null,"abstract":"Clock generation for high-speed wireline receivers must provide multiple clock phases with high-resolution rotation. To address this, an 8-phase 17 GHz clock generation circuit with built-in 6b rotation is presented. Multi-phase injection is used to perform reference-side phase rotation to efficiently generate and rotate eight clock phases. The injection method is analyzed with a model to study the introduced nonlinearity, and the effect of the injection strength is discussed. Designed by using BAG<inline-formula> <tex-math>$3++$ </tex-math></inline-formula> for layout-aware design optimization, the proposed circuit achieves 98 fs RMS jitter and a measured DNLpp and INLpp of 1.26 and 4.05 LSB respectively, while consuming 33 mW.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 9","pages":"4442-4454"},"PeriodicalIF":5.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction-Based High-Performance Hardware Controller of CRYSTALS-Kyber With Balanced Resource Utilization","authors":"Yijun Cui;Jiansheng Chen;Ziying Ni;Zhuoyao Zhang;Chenghua Wang;Weiqiang Liu","doi":"10.1109/TCSI.2025.3547799","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3547799","url":null,"abstract":"Post-quantum cryptography (PQC) aims to ensure information security in the era following the emergence of quantum computers. Lattice-based cryptography (LBC) algorithms have shown significant promise in the standardization process of post-quantum cryptography. This paper proposes an instruction-based high-performance hardware controller of CRYSTALS-Kyber. By designing a highly flexible instruction-based architecture, the control unit evenly distributes instructions and enables independent control of internal modules, significantly enhancing the scalability and adaptability of the hardware. Additionally, the integration of a reconfigurable polynomial operation array (RPOA) unit and optimization of data storage formats further improve computational efficiency and resource utilization. Implementation results on Artix-7 FPGA show that the architecture operates at a frequency exceeding 300 MHz, achieving a performance improvement of 41.3% to 170% compared to the latest designs, while significantly reducing resource overhead. The resource costs for the three security levels are 8112 LUTs, 6077 FFs, and 2523 SLICEs, respectively, with overall computation times of <inline-formula> <tex-math>$34.7~\\mu s$ </tex-math></inline-formula>, <inline-formula> <tex-math>$53.4~\\mu s$ </tex-math></inline-formula>, and <inline-formula> <tex-math>$78.5~\\mu s$ </tex-math></inline-formula>. 
The proposed design demonstrates outstanding performance, resource efficiency, and energy consumption, providing an efficient and cost-effective hardware solution for the practical deployment of post-quantum cryptography.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 5","pages":"2394-2407"},"PeriodicalIF":5.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Analysis of Passive-Integrated Absorptive Flat-Group-Delay RF Bandpass Filters in GaAs Technology for Digital Communications","authors":"Nasrin Iranpour;Li Yang;Roberto Gómez-García;Xi Zhu","doi":"10.1109/TCSI.2025.3546509","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3546509","url":null,"abstract":"A family of on-chip passive-integrated absorptive flat-group-delay RF bandpass filters (BPFs) in gallium arsenide (GaAs) technology is presented. These wideband BPFs feature broadband quasi-reflectionless behavior along with quasi-constant group-delay responses beyond their associated 3-dB-bandwidth (BW) ranges. Firstly, by means of a <inline-formula> <tex-math>$\\pi $ </tex-math></inline-formula>-shape network composed of a reflective first-order BPF and two shunt identical lossy bandstop filters (BSFs), a two-port-absorptive RF BPF is engineered. To further increase the stopband attenuation levels, the extension of this filter concept to higher-rejection BPFs using <italic>n</italic> cascaded reflective single-pole BPF units and (<inline-formula> <tex-math>${n} +1$ </tex-math></inline-formula>) replicas of a shunt lossy BSF is then approached. Subsequently, in order to equip such BPFs with higher-selectivity filtering responses, the development of input- and two-port-reflectionless BPFs with multiple transmission zeros (TZs) is addressed. Moreover, a multi-TZ flat-group-delay BPF with input-absorptive behavior is devised. It exploits a reflective BPF channel, which is shaped by a high-selectivity BPF unit with two close-to-passband TZs and a shunt series-<italic>LC</italic> resonator that produces an additional TZ, along with a shunt absorptive BSF in a complementary-diplexer-based topology. Finally, by cascading two duplicated high-selectivity BPFs and the associated absorptive BSFs with a modified shunt series-<italic>LC</italic> resonator in a back-to-back connection, a type of two-port-reflectionless BPF with multiple TZs is further engineered. 
Following this approach, a modified shunt lossy BSF instead of the previous shunt series-<italic>LC</italic> resonator is employed in the overall reflectionless BPF to obtain a sharper-rejection passband and flatter group delay versus the corresponding beyond-3-dB BW. The RF operational foundations of these absorptive BPFs are detailed with analyses of their relevant lumped-element-based equivalent circuits. Furthermore, proof-of-concept prototypes for the five suggested RF BPFs are simulated, built, and measured to experimentally validate their design concepts for application in power-efficient high-data-rate digital-communication systems.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 6","pages":"2639-2652"},"PeriodicalIF":5.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models","authors":"Mingqiang Huang;Ao Shen;Kai Li;Haoxiang Peng;Boyu Li;Yupeng Su;Hao Yu","doi":"10.1109/TCSI.2025.3546256","DOIUrl":"https://doi.org/10.1109/TCSI.2025.3546256","url":null,"abstract":"The rapid advancements in artificial intelligence (AI), particularly the Large Language Models (LLMs), have profoundly affected our daily work and communication forms. However, it is still a challenge to deploy LLMs on resource-constrained edge devices (such as robots), due to the intensive computation requirements, heavy memory access, diverse operator types and difficulties in compilation. In this work, we proposed EdgeLLM to address the above issues. Firstly, focusing on the computation, we designed mix-precision processing element array together with group systolic architecture, that can efficiently support both FP<inline-formula> <tex-math>$16\\ast $ </tex-math></inline-formula>FP16 for the MHA block (Multi-Head Attention) and FP<inline-formula> <tex-math>$16\\ast $ </tex-math></inline-formula>INT4 for the FFN layer (Feed-Forward Network). Meanwhile specific optimization on log-scale structured weight sparsity, has been used to further increase the efficiency. Secondly, to address the compilation and deployment issue, we analyzed the whole operators within LLM models and developed a universal data parallelism scheme, by which all of the input and output features maintain the same data shape, enabling to process different operators without any data rearrangement. Then we proposed an end-to-end compiler to map the whole LLM model on CPU-FPGA heterogeneous system (AMD Xilinx VCU128 FPGA). The accelerator achieves <inline-formula> <tex-math>$1.91\\times $ </tex-math></inline-formula> higher throughput and <inline-formula> <tex-math>$7.55\\times $ </tex-math></inline-formula> higher energy efficiency than the commercial GPU (NVIDIA A100-SXM4-80G). 
When compared with state-of-the-art FPGA accelerator of FlightLLM, it shows 10-24% better performance in terms of HBM bandwidth utilization, energy efficiency and LLM throughput.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 7","pages":"3352-3365"},"PeriodicalIF":5.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144550651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}