Yanqiu Li;Zhiqiang Wang;Kuan Wang;Jun Yuan;Guoqing Xin;Xiaojie Shi
{"title":"A Modeling Method of Reverse Biased Electric Field for JBS Diodes Based on Schwarz-Christoffel Transformation","authors":"Yanqiu Li;Zhiqiang Wang;Kuan Wang;Jun Yuan;Guoqing Xin;Xiaojie Shi","doi":"10.1109/TCAD.2025.3531252","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3531252","url":null,"abstract":"This article introduces a modeling method for the reverse biased electric field of junction barrier Schottky (JBS) diodes, utilizing the Schwarz-Christoffel transformation. Building upon prior research on JBS diodes electric field modeling, this approach is rooted in a purely theoretical derivation, avoiding the dependence on conclusions from simulation software—a limitation of earlier modeling methods. In this study, complex boundary conditions are transformed mathematically into simpler ones to make it possible to solve for the electric field. Then, the analytic solution of the electric field distribution is obtained by using the superposition theorem. To validate this modeling method, the electric field distribution from this model is compared with results from simulation software, and a way of applying the analytic solution of the electric field distribution in commercial simulation software is given.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2832-2835"},"PeriodicalIF":2.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Learning-Enhanced Embedded Memory Design With Automated Circuit Variant Generation","authors":"Dongho Kim;Junseo Lee;Seokhun Kim;Jihwan Park;Sangheon Lee;Hanwool Jeong","doi":"10.1109/TCAD.2025.3531337","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3531337","url":null,"abstract":"This article proposes a Bayesian learning-driven automated embedded memory design methodology that aims to minimize leakage current, minimize power, and maximize performance while meeting predefined constraints. To achieve this objective effectively, we present an automatic tool that leverages a reference initial circuit design to generate a diverse set of schematic and layout options for logic-equivalent circuit variants and various transistor threshold voltage ($V_{th}$) modifications, while ensuring compliance with design rules. Subsequently, leveraging the range of circuit options generated, Bayesian optimization is employed not only to identify optimal circuit parameters but also to select the most appropriate circuit topology and individual transistor $V_{th}$ to attain the desired design objectives. 
TSMC 28 nm process simulation results demonstrate that the proposed methodology reduces power by 26.28%–46.44%, $T_{\\mathrm{access}}$ by 25.60%–42.29%, and leakage current by 22.73%–50.11% compared to the compiler-generated design, with a runtime of 10–40 h.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3099-3111"},"PeriodicalIF":2.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Modeling Attack on Multiplexer PUFs via Kronecker Matrix Multiplication","authors":"Hongfei Wang;Caixue Wan;Hai Jin","doi":"10.1109/TCAD.2025.3531336","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3531336","url":null,"abstract":"The physical unclonable function (PUF) is valued for its lightweight nature and unique functionality, making it a common choice for securing hardware products requiring authentication and key generation mechanisms. In response to the susceptibility of individual PUFs to modeling attacks, advanced PUF variants have been developed to improve security measures. One notable type in this regard is the multiplexer-based composition of arbiter PUFs, known as MPUF, which aims to meet high reliability and security standards simultaneously. Current research on attacking MPUF encounters challenges, such as substantial demands for training CRPs and low success rates. In this work, we propose a novel numerical modeling attack strategy for MPUFs. Using Kronecker products from a mathematical perspective, this method precisely describes the MPUF model without using complex network architectures, boosting attack accuracy and overall efficiency. Experimental comparison with state-of-the-art works demonstrates that our method achieves better performance in terms of attack accuracy, robustness, and efficiency. Our method is able to successfully attack a (512, 8)-MPUF in 32.71 min with 97.14% accuracy, outperforming all existing attack methods on MPUFs. Moreover, we validate our method through experiments with hardware implementations on FPGAs. 
The advantages of our method also include the adaptability to be employed to attack other MPUF variations like cMPUF and rMPUF, and the capability to be integrated with an existing attack method for launching efficient attack on MPUFs leveraging reliability information.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2883-2896"},"PeriodicalIF":2.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844882","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures","authors":"Pratyush Dhingra;Janardhan Rao Doppa;Partha Pratim Pande","doi":"10.1109/TCAD.2025.3531255","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3531255","url":null,"abstract":"Transformer architectures have become the standard neural network model for various machine learning (ML) applications, including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pretrained transformers (e.g., foundation models) is a common task to enhance the model’s predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to complexities introduced by fine-tuning. In this article, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes nonvolatile memory and systolic array for accelerating transformer computational kernels using an integrated 3D platform. Moreover, we design a suitable NoC to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. 
Experimental results demonstrate that Atleus outperforms existing state-of-the-art by up to $56\\times$ and $64.5\\times$ in terms of performance and energy efficiency, respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2842-2855"},"PeriodicalIF":2.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TeMACLE: A Technology Mapping-Aware Area-Efficient Standard Cell Library Extension Framework","authors":"Rongliang Fu;Chao Wang;Bei Yu;Tsung-Yi Ho","doi":"10.1109/TCAD.2025.3529802","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3529802","url":null,"abstract":"Standard cell libraries play a crucial role in modern very large-scale integration design by providing predesigned, precharacterized, and preverified building blocks to simplify the design process. However, the increasing complexity of circuits demands more specialized and optimized cells, thereby necessitating the extension of standard cell libraries. This article proposes TeMACLE, a technology mapping-aware area-efficient framework to extend the standard cell library. Aiming at the area optimization of digital circuits, TeMACLE extends the given original standard cell library through two feasible approaches: 1) the area compaction of standard cells and 2) the area-efficient facilitation for technology mapping. TeMACLE employs K-feasible cones to extract subcircuits and designs a subcircuit encoding method to divide them. Then, an SAT-based subcircuit matching algorithm is proposed to identify all equivalent subcircuits further. Finally, new standard cells are determined by a technology mapping-aware area-efficient strategy. The experimental results on the EPFL benchmark using the FreePDK45 process design kit show the effectiveness and efficiency of TeMACLE. 
Notably, TeMACLE is available at https://github.com/Flians/TeMACLE.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3034-3045"},"PeriodicalIF":2.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Resilient Online Reinforcement Learning Using Adaptive Statistical Checks","authors":"Chandramouli Amarnath;Mohamed Mejri;Jackson Isenberg;Abhijit Chatterjee","doi":"10.1109/TCAD.2025.3529820","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3529820","url":null,"abstract":"Online deep reinforcement learning (deep RL)-based systems are being increasingly deployed in a variety of safety-critical applications. Due to the dynamic nature of the environments they work in, onboard reinforcement learning (RL) hardware is vulnerable to soft errors from radiation, thermal effects and electrical noise that corrupts the results of computations. Existing approaches to on-line error resilience in machine learning systems have relied on the availability of large training datasets to configure resilience parameters. This is not always feasible for online RL systems. Similarly, other approaches involving specialized hardware or modifications to training algorithms are difficult to implement for onboard RL applications. In contrast, we present a novel error resilience approach for online RL that leverages running statistics of neuron output values collected across the (real-time) RL training process to configure error detection thresholds (called checks) for the deep RL forward pass. Similarly, we formulate checks on the deep RL backward pass using running statistical thresholds on reduced-dimension checksums of online learning weight updates to rapidly detect and correct errors in online deep RL training. In this methodology, statistical concentration bounds leveraging running statistics are used to diagnose neuron outputs or weights as erroneous. The use of running statistics allows the checks to adapt to changes caused by continual on-line RL training. Erroneous neurons are set to zero (suppressed) in the forward pass. Erroneous weight updates are frozen, allowing nonerroneous weight updates to proceed and allowing online learning without rerunning training episodes. 
Our approach is compared against the state of the art and validated on several RL algorithms as well as a hardware validation platform.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3112-3125"},"PeriodicalIF":2.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LayoutCopilot: An LLM-Powered Multiagent Collaborative Framework for Interactive Analog Layout Design","authors":"Bingyang Liu;Haoyi Zhang;Xiaohan Gao;Zichen Kong;Xiyuan Tang;Yibo Lin;Runsheng Wang;Ru Huang","doi":"10.1109/TCAD.2025.3529805","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3529805","url":null,"abstract":"Analog layout design heavily involves interactive processes between humans and design tools. Electronic design automation (EDA) tools for this task are usually designed to use scripting commands or visualized buttons for manipulation, especially for interactive automation functionalities, which have a steep learning curve and cumbersome user experience, creating a notable barrier to designers’ adoption. Aiming to address such a usability issue, this article introduces LayoutCopilot, a pioneering multiagent collaborative framework powered by large language models (LLMs) for interactive analog layout design. LayoutCopilot simplifies human-tool interaction by converting natural language instructions into executable script commands, and it interprets high-level design intents into actionable suggestions, significantly streamlining the design process. Experimental results demonstrate the flexibility, efficiency, and accessibility of LayoutCopilot in handling real-world analog designs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3126-3139"},"PeriodicalIF":2.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PBS: Program Behavior-Aware Scheduling for High-Level Synthesis","authors":"Aoxiang Qin;Rongjie Yang;Minghua Shen;Nong Xiao","doi":"10.1109/TCAD.2025.3529817","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3529817","url":null,"abstract":"Program behavior comprises operation dependency and resource requirement, both of which impact the performance of scheduling in high-level synthesis (HLS). Most existing scheduling methods focus on one aspect, resulting in poor performance. In this article, we propose PBS, a program behavior-aware scheduling method for HLS. We leverage a hybrid state encoding scheme to facilitate the comprehensive learning of program behaviors. Moreover, we propose bi-directional GNN and multiresolution aggregation schemes for learning complex operation dependency behavior. These schemes are integrated in an RL framework to iteratively improve scheduling solutions toward low latency and resource usage. Experiments show that PBS provides average latency reductions of 32.7%, 26.3%, and 25.9% compared with the SDC, GNN-based, and RL-based methods, respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3006-3019"},"PeriodicalIF":2.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChainPIM: A ReRAM-Based Processing-in-Memory Accelerator for HGNNs via Chain Structure","authors":"Wenjing Xiao;Jianyu Wang;Dan Chen;Chenglong Shi;Xin Ling;Min Chen;Thomas Wu","doi":"10.1109/TCAD.2025.3528906","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3528906","url":null,"abstract":"Heterogeneous graph neural networks (HGNNs) have recently demonstrated significant advantages of capturing powerful structural and semantic information in heterogeneous graphs. Different from homogeneous graph neural networks directly aggregating information based on neighbors, HGNNs aggregate information based on complex metapaths. ReRAM-based processing-in-memory (PIM) architecture can reduce data movement and compute matrix-vector multiplication (MVM) in analog. It can be well used to accelerate HGNNs. However, the complex metapath-based aggregation of HGNNs makes it challenging to efficiently utilize the parallelism of ReRAM and vertices data reuse. To this end, we propose ChainPIM, the first ReRAM-based processing-in-memory accelerator for HGNNs featuring high-computing parallelism and vertices data reuse. Specifically, we introduce R-chain, which is based on a chain structure to build related metapath instances together. We can efficiently reuse vertices through R-chain and process different R-chains in parallel. Then, we further design an efficient storage format for storing R-chains, which reduces a lot of repeated vertices storage. Finally, a specialized ReRAM-based architecture is developed to pipeline different types of aggregations in HGNNs, fully exploiting the huge potential of multilevel parallelism in HGNNs. 
Our experiments show that ChainPIM achieves an average memory space reduction of 47.86% and performance improvement by $128.29\\times$ compared to NVIDIA Tesla V100 GPU.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2516-2529"},"PeriodicalIF":2.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency","authors":"Matteo Perotti;Samuel Riedel;Matheus Cavalcante;Luca Benini","doi":"10.1109/TCAD.2025.3528349","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3528349","url":null,"abstract":"The ever-increasing computational and storage requirements of modern applications and the slowdown of technology scaling pose major challenges to designing and implementing efficient computer architectures. To mitigate the bottlenecks of typical processor-based architectures on both the instruction and data sides of the memory, we present Spatz, a compact 64-bit floating-point-capable vector processor based on RISC-V’s vector extension Zve64d. Using Spatz as the main Processing Element (PE), we design an open-source dual-core vector processor architecture based on a modular and scalable cluster sharing a Scratchpad Memory (SCM). Unlike typical vector processors, whose Vector Register Files (VRFs) are hundreds of KiB large, we prove that Spatz can achieve peak energy efficiency with a latch-based VRF of only 2 KiB. An implementation of the Spatz-based cluster in GlobalFoundries’ 12LPP process with eight double-precision Floating Point Units (FPUs) achieves an FPU utilization just 3.4% lower than the ideal upper bound on a double-precision, floating-point matrix multiplication. The cluster reaches 7.7 FMA/cycle, corresponding to 15.7 DP-GFLOPS and 95.7 DP-GFLOPS/W at 1 GHz and nominal operating conditions (TT, 0.80 V, and 25 °C), with more than 55% of the power spent on the FPUs. Furthermore, the optimally balanced Spatz-based cluster reaches a 95.0% FPU utilization (7.6 FMA/cycle), 15.2 DP-GFLOPS, and 99.3 DP-GFLOPS/W (61% of the power spent in the FPU) on a 2D workload with $7\\times 7$ kernel, resulting in an outstanding area/energy efficiency of 171 DP-GFLOPS/W/mm². 
At equi-area, the computing cluster built upon compact vector processors reaches a 30% higher energy efficiency than a cluster with the same FPU count built upon scalar cores specialized for stream-based floating-point computation.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2488-2502"},"PeriodicalIF":2.7,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}