{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information","authors":"","doi":"10.1109/TCAD.2024.3513476","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3513476","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 1","pages":"C3-C3"},"PeriodicalIF":2.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10814919","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142880309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2024 Index IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 43","authors":"","doi":"10.1109/TCAD.2024.3518672","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3518672","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 12","pages":"4865-4939"},"PeriodicalIF":2.7,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10804686","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAMEL: Physically Inspired Crosstalk-Aware Mapping and Gate Scheduling for Frequency-Tunable Quantum Chips","authors":"Bin-Han Lu;Peng Wang;Zhao-Yun Chen;Huan-Yu Liu;Tai-Ping Sun;Peng Duan;Yu-Chun Wu;Guo-Ping Guo","doi":"10.1109/TCAD.2024.3507580","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3507580","url":null,"abstract":"Crosstalk poses a significant challenge in quantum computing, particularly when quantum gates are executed in parallel, as qubit frequency resonance can lead to residual coupling and reduced gate fidelity. Current solutions struggle to mitigate both crosstalk and decoherence during parallel two-qubit gate operations on frequency-tunable quantum chips. To address this, we propose a crosstalk-aware mapping and gate scheduling (CAMEL) approach, designed to mitigate crosstalk and suppress decoherence by leveraging the tunable coupler’s physical properties and incorporating a pulse compensation technique. CAMEL operates within a two-step compilation framework: first, a qubit mapping strategy that considers both crosstalk and decoherence; and second, a gate timing scheduling method that prioritizes the execution of the largest possible set of crosstalk-free parallel gates, reducing overall circuit execution time. Evaluation results demonstrate CAMEL’s superior ability to mitigate crosstalk compared to crosstalk-agnostic methods, while successfully suppressing decoherence where other approaches fail. 
Additionally, CAMEL performs better than dynamic-frequency-aware techniques, particularly in low-complexity hardware environments.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1968-1980"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143861003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NN2FPGA: Optimizing CNN Inference on FPGAs With Binary Integer Programming","authors":"Roberto Bosio;Filippo Minnella;Teodoro Urso;Mario R. Casu;Luciano Lavagno;Mihai T. Lazarescu;Paolo Pasini","doi":"10.1109/TCAD.2024.3507570","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3507570","url":null,"abstract":"Skip connections have emerged as a key component of modern convolutional neural networks (CNNs) for computer vision tasks, allowing for the creation of more accurate and deeper models by addressing the vanishing gradient problem. However, the existing implementations of field-programmable gate array (FPGA)-based accelerators for ResNets and MobileNetV2 often experience decreased performance and increased computational latency due to the implementation of skip blocks. This article presents a novel framework for developing deep learning models on FPGAs that focuses on skip connections, with a unique approach to reduce buffering overhead. This results in a more efficient utilization of resources in the implementation of the skip layer. The nn2fpga compiler follows a thorough set of high-level synthesis (HLS) design principles and optimization strategies, exploiting in novel ways standard techniques to effectively map skip connection-based networks into static dataflow accelerators. To maximize throughput and efficiently use the available resources, our compiler employs a fast and effective design space exploration method based on a binary integer programming model which accurately assigns FPGA resources to the network layers, to maximize global throughput under resource constraints and then minimize resources for the achieved maximum throughput. 
Experimental results on the CIFAR-10 and ImageNet datasets demonstrate substantial gains in throughput ($3\\times$–$7\\times$ over past HLS-based work) for ResNet8, ResNet20, and MobileNetV2 models deployed on various Xilinx FPGA boards. Notably, MobileNetV2 deployed on the ZCU102 achieves a throughput of 2115 frames per second, a 10% speedup over a state-of-the-art, highly optimized manual register-transfer level implementation, showing that HLS can improve over manual design thanks to faster exploration of the design space.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1807-1818"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10769518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multipath Bound for DAG Tasks","authors":"Qingqiang He;Nan Guan;Shuai Zhao;Mingsong Lv","doi":"10.1109/TCAD.2024.3507563","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3507563","url":null,"abstract":"This article studies the response time bound of a directed acyclic graph (DAG) task. Recently, the idea of using multiple paths to bound the response time of a DAG task, instead of using a single longest path in previous results, was proposed and led to the so-called multipath bound. Multipath bounds can greatly reduce the response time bound and significantly improve the schedulability of DAG tasks. This article derives a new multipath bound and proposes an optimal algorithm to compute this bound. We further present a systematic analysis on the dominance and the sustainability of three existing multipath bounds and the proposed multipath bound. Our bound theoretically dominates and empirically outperforms all existing multipath bounds. What is more, the proposed bound is the only multipath bound that is proved to be self-sustainable.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1676-1689"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143870933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSA-CiM: Enhancing Multifunctional Computing-in-Memory With Configurable Sense Amplifiers","authors":"Yuxiao Jiang;Kai Ni;Thomas Kämpfe;Cheng Zhuo;Zheyu Yan;Xunzhao Yin","doi":"10.1109/TCAD.2024.3506864","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3506864","url":null,"abstract":"Computing-in-memory (CiM) effectively alleviates the memory wall problem faced by traditional von Neumann architectures when handling data-intensive applications. Most CiM arrays employ dedicated sense amplifiers (SAs) to perform specific functions, and prior configurable CiM arrays achieve multifunctionality by stacking multiple SAs with corresponding functions. However, the independent nature of these SAs, particularly the analog-to-digital converter (ADC), results in excessive energy and area consumption. In this article, we propose a configurable multifunctional ferroelectric field effect transistor (FeFET)-based CiM array design, including configurable peripheral circuit with corresponding multifunctionalities and reusable SA components, to reduce energy consumption and latency. The array cells perform logical AND and XNOR operations, and the proposed SA can be configured to operate in either ADC or winner-take-all (WTA) modes, thereby enabling the array to implement both multiplication-accumulation (MAC) and associative search operations. Instead of operating independently, the WTA component within the SA participates as a flash stage in successive approximation register (SAR) conversions in ADC mode, thus enhancing the WTA utilization, energy efficiency and compactness. By integrating the multifunctional CiM array and the configurable SA, our design supports MAC, Hamming-distance computation (HDC), and nearest neighbor search (NNS) operations within the same structure. 
Compared to existing works, our design achieves energy efficiency improvements of $7.2\\times$ for MAC and $2.9\\times$ for HDC, and an energy-delay product (EDP) improvement of $6.4\\times$ for NNS.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1869-1873"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PISOV: Physics-Informed Separation of Variables Solvers for Full-Chip Thermal Analysis","authors":"Liang Chen;Wenxing Zhu;Min Tang;Sheldon X.-D. Tan;Jun-Fa Mao;Jianhua Zhang","doi":"10.1109/TCAD.2024.3506867","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3506867","url":null,"abstract":"Thermal issues are becoming increasingly critical due to rising power densities in high-performance chip design. The need for fast and precise full-chip thermal analysis is evident. Although machine learning (ML)-based methods have been widely used in thermal simulation, their training time remains a challenge. In this article, we propose a novel physics-informed separation of variables solver (PISOV) to significantly reduce training time for fast full-chip thermal analysis. Inspired by the recently proposed ThermPINN, we employ a least-squares regression method to calculate the unknown coefficients of the cosine series. The proposed PISOV method combines physics-informed neural network (PINN) and separation of variables (SOV) methods. Because PISOV relies on a matrix-solving method, it is much faster than ThermPINN. On top of PISOV, we parameterize effective convection coefficients and power values for surrogate model-based uncertainty quantification (UQ) analysis by using neural networks, a task that cannot be accomplished by the SOV method. The parameterized PISOV needs only one computation to obtain all parameterized results of the hyperdimensional partial differential equations. Additionally, we study the impact of sampling methods (such as grid, uniform, Sobol, Latin hypercube sampling (LHS), Halton, and Hammersley) and hybrid sampling methods on the accuracy of PISOV and parameterized PISOV. Numerical results show that PISOV achieves speedups of $245\\times$ and $10^{4}\\times$ over ThermPINN and PINN, respectively. 
Among different sampling methods, the Hammersley sampling method yields the best accuracy.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1874-1886"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143870920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AccALS 2.0: Accelerating Approximate Logic Synthesis by Simultaneous Selection of Multiple Local Approximate Changes","authors":"Xuan Wang;Xiaomi Zhou;Shanshan Han;Ruicheng Dai;Xiaolong Shen;Menghui Xu;Leibin Ni;Wei Wu;Weikang Qian","doi":"10.1109/TCAD.2024.3506860","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3506860","url":null,"abstract":"Approximate computing emerges as an energy-efficient computing paradigm designed for applications that can tolerate errors. Many iterative methods for approximate logic synthesis (ALS) have been developed to automatically synthesize approximate circuits. Nonetheless, most of them overlook the potential of applying multiple local approximate changes (LACs) simultaneously in one iteration, which can significantly reduce the overall computation time. In this article, we propose AccALS 2.0, a novel framework for further accelerating iterative ALS flows, which is based on simultaneous selection of multiple LACs in a single round. However, selecting multiple LACs poses two challenges. The first is that the mutual influence of multiple LACs can affect the estimation of the circuit error. The second is that there may exist conflicts among multiple LACs. To address these issues, we first propose an efficient measure for the mutual influence between two LACs. With its help, we transform the problems of resolving LAC conflicts and selecting multiple LACs into a unified maximum independent set problem. 
The experimental results show that AccALS 2.0 outperforms state-of-the-art ALS methods in runtime, while achieving similar or better circuit quality.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1620-1633"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sign-Off Timing Considerations via Concurrent Routing Topology Optimization","authors":"Siting Liu;Ziyi Wang;Fangzhou Liu;Yibo Lin;Bei Yu;Martin D. F. Wong","doi":"10.1109/TCAD.2024.3506216","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3506216","url":null,"abstract":"Timing closure is considered across the circuit design flow. Generally, the early stage timing optimization can only focus on improving early timing metrics, e.g., rough timing estimation using linear RC model or prerouting path length, since obtaining sign-off performance needs a time-consuming routing flow. However, there is no consistency guarantee between early stage metrics and sign-off timing performance. Therefore, we utilize the power of deep learning techniques to bridge the gap between the early stage analysis and the sign-off analysis. A well-designed deep learning framework guides the adjustment of Steiner points to enable explicit early stage timing optimization. To complement deep Steiner point adjustment, we propose routing topology reconstruction to accelerate convergence and maintain a reasonable routing topology. Further, we also introduce Steiner point simplification as a post-processing technique to avoid unnecessary routing constraints. This article demonstrates the ability of the learning-assisted framework to perform robust and efficient timing optimization in the early stage with comprehensive and convincing experimental results on real-world designs. With Steiner point adjustment alone, TSteinerPt helps the state-of-the-art open-source router obtain 11.2% and 7.1% improvements in the sign-off worst negative slack and total negative slack, respectively. 
Under the additional joint optimization with routing topology reconstruction and simplification, TSteinerRec can further save 25.9% of the optimization time with better sign-off performance.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1942-1953"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SRAM Periphery Testing Using the Cell-Aware Test Methodology","authors":"Xhesila Xhafa;Eric Faehn;Patrick Girard;Arnaud Virazel","doi":"10.1109/TCAD.2024.3506854","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3506854","url":null,"abstract":"Testing memory circuits is crucial for ensuring the quality and reliability of system-on-chip (SoC) designs, especially as shrinking technology nodes increase susceptibility to nanometer-scale defects. This article introduces an enhanced methodology for memory testing, leveraging the cell-aware (CA) test concept. Building on prior work for SRAM array testing (Xhafa et al., 2023), we extend the CA methodology to include periphery testing by generating, for the first time, CA models for each memory input–output (I/O) element, covering key components, such as address decoders, write drivers, and sense amplifiers. We present results from testing these periphery components using the CA methodology. Additionally, we compare existing SRAM testing techniques with our CA methodology for the decoder and I/O circuitry. To ensure a fair comparison, we selected minimal March tests designed to detect functional faults in peripheral circuits, aligning with the fault models targeted by our approach. 
A quantitative analysis of fault coverage demonstrates the effectiveness of our methodology compared to March algorithms, particularly in terms of test complexity.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"2000-2013"},"PeriodicalIF":2.7,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}