{"title":"Lay-Net: Grafting Netlist Knowledge on Layout-Based Congestion Prediction","authors":"Lancheng Zou;Su Zheng;Peng Xu;Siting Liu;Bei Yu;Martin D. F. Wong","doi":"10.1109/TCAD.2025.3527379","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3527379","url":null,"abstract":"Congestion modeling is crucial for enhancing the routability of VLSI placement solutions. The underutilization of netlist information constrains the efficacy of existing layout-based congestion modeling techniques. We devise a novel approach that grafts netlist-based message passing (MP) into a layout-based model, thereby achieving a better knowledge fusion between layout and netlist to improve congestion prediction performance. The innovative heterogeneous MP paradigm more effectively incorporates routing demand into the model by considering connections between cells, overlaps of nets, and interactions between cells and nets. Leveraging multiscale features, the proposed model effectively captures connection information across various ranges, addressing the issue of inadequate global information present in existing models. Using contrastive learning and mini-Gnet techniques allows the model to learn and represent features more effectively, boosting its capabilities and achieving superior performance. Extensive experiments demonstrate a notable performance enhancement of the proposed model compared to existing methods. 
Our code is available at: <uri>https://github.com/lanchengzou/congPred</uri>.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2627-2640"},"PeriodicalIF":2.7,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Precise and Explainable Hardware Trojan Localization at LUT Level","authors":"Hao Su;Wei Hu;Xuelin Zhang;Dan Zhu;Lingjuan Wu","doi":"10.1109/TCAD.2025.3527377","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3527377","url":null,"abstract":"Trojans represent a severe threat to hardware security and trust. This work investigates the Trojan detection problem from a unique viewpoint and proposes a novel hardware Trojan localization method targeting FPGA netlists. The proposed method automatically extracts the rich structural and behavioral features at look-up-table (LUT) level to train an explainable graph neural network (GNN) model for classifying design nodes in FPGA netlists and identifying the Trojan-infected ones. Experimental results using 183 hardware Trojan benchmarks show that our method successfully pinpoints Trojan-infected nodes with true positive rate, accuracy and area under the ROC curve (AUC) of 95.14%, 95.71%, and 95.46%, respectively. To the best of our knowledge, this is the first LUT level Trojan localization solution using explainable GNNs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2817-2821"},"PeriodicalIF":2.7,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reverse-Engineering Optimization Techniques of High-Level Synthesis: Practical Insights Into Accelerating Applications With AMD-Xilinx Vitis","authors":"Jorge Koronis;Oscar Garnica;J. Ignacio Hidalgo;Juan Lanchares Dávila","doi":"10.1109/TCAD.2025.3526053","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3526053","url":null,"abstract":"Modern AI applications contain computationally expensive sections. Accelerator cards and tools like AMD Vitis HLS leverage high-level synthesis (HLS) and hardware (HW) optimizations to create custom HW designs to accelerate them. Nevertheless, the learning curve is steep, even for those with previous knowledge of HW design, due to the complexity of the optimization techniques and limited information on their interactions and HW effects. This article quantitatively analyses the interactions of optimization techniques after reverse engineering Vitis’ optimization directives, both in isolation and in pairs. Over 150 experiments were conducted to investigate three distinct goals: 1) assessing pragma behavior and the rules governing pragma application and optimizations; 2) modeling Vitis HLS latency estimates; and 3) evaluating the impact of optimizations on design space exploration (DSE), specifically area and latency. These experiments involve different combinations and placements of optimizations in the loop and function hierarchy of the test bench. 
Our findings offer guidance on using Vitis pragmas and identify promising configurations for optimizing latency and area.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2558-2570"},"PeriodicalIF":2.7,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10830788","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nested Speculative Execution Attacks via Runahead","authors":"Chaoqun Shen;Gang Qu;Jiliang Zhang","doi":"10.1109/TCAD.2025.3526544","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3526544","url":null,"abstract":"Runahead execution is an effective microarchitectural level performance boosting technique. It removes the blocking load instruction with long latency and speculatively executes the subsequent instructions with little pipeline modifications. However, the nature of prefetching data and instructions creates potential security risks similar to Spectre and Meltdown. In this work, we present the first comprehensive analysis of the security implications of runahead execution and report a novel attack, named SPECRUN. SPECRUN exploits the unresolved branch predictions within nested speculative execution during runahead execution. It can manipulate the speculative execution window and hence eliminates the major limitation of Spectre-type attacks: the number of executable transient instructions is limited by the small reorder buffer size. Therefore, SPECRUN can improve the exploitability of transient attacks significantly. To demonstrate this, we implement a proof-of-concept attack that can successfully extract secrets from a victim process. We analyze existing defense techniques and propose new ones against SPECRUN. 
The effectiveness and overhead of these mitigation mechanisms are carefully discussed to shed light on the security vulnerabilities and defense before the adoption of runahead execution on current and future processors.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2475-2487"},"PeriodicalIF":2.7,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerate SEU Simulation-Based Fault Injection With Spatio-Temporal Graph Convolutional Networks","authors":"Li Lu;Junchao Chen;Aneesh Balakrishnan;Markus Ulbricht;Milos Krstic","doi":"10.1109/TCAD.2025.3526748","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3526748","url":null,"abstract":"Evaluating the sensitivity of circuits to single event upset (SEU) faults has become increasingly important and challenging due to the growing complexity of circuits. Simulation-based fault injection is time-intensive, particularly for highly complex circuits. This article proposes a novel approach using Spatio-temporal graph convolutional networks (STGCNs) to predict SEU fault propagation results in circuits. By representing circuits’ structure as graphs and integrating temporal features from the simulation workload, STGCNs can learn from these spatio-temporal graphs to identify SEU fault propagation patterns. To validate this method, we test it on six evaluation circuits, achieving a prediction accuracy of 93%–99%. Given this performance, to accelerate SEU simulation-based fault injection, we divide SEU faults into three subsets and use an STGCN fine-tuned on the training and validation dataset to predict SEU fault propagation in the test dataset, eliminating the need for simulation and reducing the required time. To identify an efficient dataset separation method, we compare three sampling methods: 1) spatial sampling (sampling flip-flops for injected faults); 2) temporal sampling (sampling time points for fault injection); and 3) hybrid sampling (incorporating both spatial and temporal sampling). The hybrid sampling approach is the most promising, optimizing the tradeoff between efficiency and accuracy. 
This approach reduces simulation time by 50% while maintaining accuracy above 95% on the six evaluation circuits.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2599-2612"},"PeriodicalIF":2.7,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Quantum Feature Selection With Sparse Optimization Circuit","authors":"Jiaye Li;Jiagang Song;Jinjing Shi;Hang Xu;Hao Yu;Gang Chen;Shichao Zhang","doi":"10.1109/TCAD.2025.3526060","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3526060","url":null,"abstract":"High-dimensional data has long been a notoriously challenging issue. Existing quantum dimension reduction technology primarily focuses on quantum principal component analysis. However, there are only a few studies on quantum feature selection (QFS) algorithms, and these algorithms are often not robust. Additionally, there are limited quantum circuits specifically designed for feature selection, and they still cannot address the objective function based on sparse learning. To address these issues, this article proposes a robust QFS algorithm by designing a novel sparse optimization circuit. Specifically, we first apply sparse regularization and least squares loss to construct the proposed objective function. Then, six types of quantum registers and their initial states are prepared. Furthermore, quantum techniques such as quantum phase estimation and controlled rotation are employed to construct a sparse optimization circuit, which is used to obtain the final quantum state of the feature selection variable. 
Finally, a series of experiments are conducted to verify the accuracy of the feature selection and the robustness of the proposed algorithm.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2613-2626"},"PeriodicalIF":2.7,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FTCD: Fault-Tolerant Co-Design of Flow and Control Layers for Fully Programmable Valve Array Biochips","authors":"Yuhan Zhu;Genggeng Liu;Wenzhong Guo;Xing Huang","doi":"10.1109/TCAD.2025.3525615","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3525615","url":null,"abstract":"As a new generation of flow-based microfluidics, fully programmable valve array (FPVA) biochips have gained widespread adoption as a biochemical experimental platform, thanks to their enhanced programmability and flexibility. Environmental and human factors, however, often introduce physical faults during the manufacturing process, such as channel blockage and leakage, which, undoubtedly, can affect the results of bioassays and even cause execution failure. In this article, we focus on the fault-tolerant co-design of flow and control layers in FPVA biochips for the first time. For the flow layer, three dynamic fault-tolerant techniques, i.e., a cell function conversion method, a bidirectional redundancy scheme, and a fault mapping method, are presented and integrated into the device placement and flow routing stages. As a consequence, we further realize an efficient and effective fault-tolerance-oriented physical design method, thus ensuring the robustness of chip architecture and correctness of assay outcomes. For the control layer, we design another three fault-tolerant techniques, including a series duplication scheme of leakage valves, allocation and merging rules of backup valves, and a logic conflict-aware adjustment strategy of redundant architecture. Based on these techniques, we construct a fault-tolerant control system to realize dynamic recovery of control signals. 
Experimental results on multiple test cases demonstrate that the proposed method can produce optimized fault-tolerant FPVA architectures with low-fabrication cost, high-execution efficiency, and high-fault-tolerance success rate.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2669-2682"},"PeriodicalIF":2.7,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area-Oriented Resubstitution For Networks of Look-Up Tables","authors":"Andrea Costamagna;Alessandro Tempia Calvino;Alan Mishchenko;Giovanni De Micheli","doi":"10.1109/TCAD.2025.3525617","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3525617","url":null,"abstract":"This article addresses the challenge of reducing the number of nodes in look-up table (LUT) networks with two significant applications. First, field-programmable gate arrays (FPGAs) can be modeled as networks of LUTs, and minimizing the node count is imperative to meet resource constraints. Second, in area-oriented design space exploration for standard-cell designs, collapsing a circuit into a LUT network, restructuring it, and later remapping to the original representation helps escape local minima. Thus, the development of algorithms for optimizing and restructuring LUT networks holds considerable promise for area-oriented optimization. Substitution (also called resubstitution) is a powerful logic minimization method that can identify nonlocal logic dependencies and exploit them for logic minimization. State-of-the-art substitution algorithms for LUT networks rely heavily on SAT solving, limiting the number of optimization attempts and the size of the substitution subnetworks to one node <xref>[1]</xref>. Conversely, our method relies on circuit simulation to increase the number of substitution candidates and enables substitutions with more than one node. 
The experimental results show that the proposed method identifies optimization opportunities overlooked by other methods, improving 11 out of 23 best-known results in the EPFL synthesis competition and yielding a 3.46% area reduction compared to the state-of-the-art.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2571-2584"},"PeriodicalIF":2.7,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10820546","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retention Accelerated Testing for 3-D QLC NAND Flash Memory: Characterization, Analysis, and Modeling","authors":"Shaoqi Yang;Meng Zhang;Xuepeng Zhan;Peng Guo;Xiaohuan Zhao;Guangkuo Yang;Xinyi Guo;Jixuan Wu;Fei Wu;Jiezhi Chen","doi":"10.1109/TCAD.2025.3526055","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3526055","url":null,"abstract":"3-D NAND flash memory has become quite popular and is now widely used in data centers and mobile devices due to its outstanding storage density and cost-effectiveness. Larger storage capacity is made possible by 3-D quad-level cell (QLC) NAND flash memory with the charge-trap (CT) structure, which stores four bits in each cell. However, data reliability is sacrificed in exchange for greater capacity. The lifespan of data retention is crucial for nonvolatile storage. Thus, an important role is played by the Arrhenius model, which is widely used for lifespan prediction and high-temperature acceleration testing. Interestingly, we discover that the conventional Arrhenius model is inaccurate after analyzing the data retention properties of 3-D QLC NAND flash memory. An empirical model is proposed for changing the apparent activation energy (Ea) based on the influence of different parameters, in order to accurately predict data lifespan and perform accelerated experiments. This developed model provides a temperature- and cycle-related parameter table for Ea, which is useful for high-temperature acceleration testing examinations. Simultaneously, we observe a linear connection between the 40 °C data retention time mapping and the other temperatures. 
We evaluate the effects of the modified Ea model and the classic Arrhenius model with the epitaxial data and conclude that the former can reduce the error by approximately 70% to a maximum.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2779-2788"},"PeriodicalIF":2.7,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Rendering Acceleration With Deferred Neural Decoding and Voxel-Centric Data Flow","authors":"Yuzhou Chen;Zhenyu Li;Dongxu Lyu;Yansong Xu;Guanghui He","doi":"10.1109/TCAD.2024.3524918","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3524918","url":null,"abstract":"Neural radiance field has become a fundamental rendering technique across diverse applications such as augmented/virtual reality and autonomous driving. It achieves exceptional rendering quality and reduces model construction cost mainly by introducing a novel neural representation, instant neural graphics primitives (Instant-NGPs). Despite its superiority, Instant-NGP poses severe problems of intensive computation, memory inefficiency and pipeline inefficiency, owing to numerous neural network queries, irregular memory access and intricate sampling procedure. To address these issues, this article proposes NeRA, an algorithm-architecture co-optimization framework that facilitates the efficient neural rendering of Instant-NGP. For intensive computation, we reconstruct the rendering flow and propose a deferred neural decoding algorithm to aggregate the network queries, which reduces the computational workload by 85.6% and only incurs <0.5%> $\\Delta$ interpolation algorithm is proposed to condense the scattered memory access and improves the equivalent bandwidth of on-chip memory by $2.38\\times$. Furthermore, a voxel-centric data flow is proposed to fully reuse the cached data and save 88.7% of the external memory access. For pipeline inefficiency, a highly-pipelined hardware architecture with decoupled spatial skipping and interleaved sampling is constructed to eliminate the bubbles and invalid samples in the pipeline, which boosts the overall throughput by $2.41\\times$. 
Extensively evaluated on representative benchmarks, NeRA attains $1.2\\sim 2.9\\times$ in rendering throughput, $1.7\\sim 36.5\\times$ in energy-efficiency and $3.6\\sim 8.3\\times$ in area-efficiency, compared to the state-of-the-art related architectures.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2725-2737"},"PeriodicalIF":2.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}