{"title":"Prerouting Timing Prediction Across Different Technology Nodes","authors":"Xinyun Zhang;Binwu Zhu;Fangzhou Liu;Jiaxi Jiang;Ziyi Wang;Peng Xu;Hong Xu;Bei Yu","doi":"10.1109/TCAD.2024.3523426","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3523426","url":null,"abstract":"In the domain of very-large-scale integration (VLSI) design, the accuracy of prerouting timing prediction is of paramount importance for ensuring the performance and reliability of integrated circuits. Traditional methods based on machine learning necessitate the availability of extensive and high-quality datasets. However, this requirement poses significant challenges for advanced technology nodes due to the laborious and time-intensive nature of data preparation. To address this critical issue, we introduce a novel transfer learning framework that leverages data from preceding technology nodes to facilitate learning and prediction on the target node. Our methodology commences with the disentanglement and alignment of timing path features across different nodes, ensuring the preservation and effective translation of intrinsic timing path properties. Subsequently, we employ a Bayesian-based model to predict the arrival times of individual timing paths. This model is particularly adept at managing the high-variability inherent in arrival times and exhibits strong generalization capabilities to novel design scenarios. Moreover, we propose a new algorithm to reweight the preceding node data during training by estimating their transferability through the cell type distribution. We validate the efficacy of our proposed framework through comprehensive experimental evaluations, demonstrating successful transfer learning from 130 or 45 to 7-nm technology nodes. 
The results underscore the potential of our approach to significantly mitigate the dependency on extensive data preparation while maintaining high accuracy in timing prediction for cutting-edge VLSI designs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2697-2710"},"PeriodicalIF":2.7,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APPLE-DSE: Asynchronous Parallel Pareto Set Learning for Microarchitecture Design Space Exploration","authors":"Xuyang Zhao;Tianning Gao;Zheng Wu;Zhaori Bi;Changhao Yan;Fan Yang;Sheng-Guo Wang;Dian Zhou;Xuan Zeng","doi":"10.1109/TCAD.2024.3522880","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522880","url":null,"abstract":"The synthesizable and parameterizable RISC-V microarchitecture, combined with multiobjective optimization-based design space exploration (DSE), facilitates agile adaptation to various microprocessor designs for customized applications. However, to enhance design quality, DSE must consider both architecture parameters and EDA tool parameters, resulting in exponentially increased optimization complexity with the dimensionality of parameters. Exhaustively exploring the whole design space is impossible. Additionally, due to the time-consuming nature of microprocessor simulation, minimizing the number of simulations is imperative. Addressing these challenges, we propose asynchronous parallel Pareto set learning for microarchitecture DSE (APPLE-DSE). APPLE-DSE utilizes the Pareto set learning (PSL) technique to obtain an approximate Pareto front with a “light-weight” evaluation. PSL captures the structural characteristics of the Pareto set (PS) guided by the surrogate models, enabling it to explore any tradeoff area in the approximate PS. Employing the probabilistic reparameterization (PR) technique, APPLE-DSE adapts PSL to handle discrete variables. Furthermore, APPLE-DSE incorporates a simulation time-aware asynchronous parallel scheduling strategy to further enhance optimization efficiency. 
Experimental results show that APPLE-DSE achieves a maximum improvement of 16.81% in hypervolume within the same time budget and a <inline-formula> <tex-math>$127.73\\times $ </tex-math></inline-formula> speedup in algorithm run time per iteration compared to state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2765-2778"},"PeriodicalIF":2.7,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time 3-D Thermal Simulation of Advanced Packages via Generative Adversarial Networks","authors":"Seunghyun Hwang;Michael Joseph Smith;Vinicius C. Do Nascimento;Qiang Qiu;Cheng-Kok Koh;Ganesh Subbarayan;Dan Jiao","doi":"10.1109/TCAD.2024.3522878","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522878","url":null,"abstract":"Thermal optimization plays a crucial role in the design of advanced systems in package. Due to the large number of thermal simulations needed for full design space exploration, reductions in simulation run-time are critical. Here, we propose a data-driven approach to physics simulation by using neural networks (NNs) to cast the temperature solution process into an image-to-image translation problem. We first model the power generation map, conductivity map, and boundary conditions (BCs) into separate channels of an image. We then generate temperature solutions by training a generative adversarial network, composed of a U-Net shaped generator and a discriminator. The resultant NN model can handle diverse thermal simulation scenarios with accuracy. More importantly, our model can handle BCs, power maps, and physical package designs which are unseen during the training. Experiments show that speed wise, it enables near real-time design, providing a <inline-formula> <tex-math>$2581\\times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$9171\\times $ </tex-math></inline-formula> speedup over a custom sparse matrix optimized finite element method and ABAQUS, respectively. 
Comparisons with state-of-the-art methods have demonstrated the accuracy, efficiency, and versatility of the proposed work.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2439-2450"},"PeriodicalIF":2.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPLIM: Bridging the Gap Between Unstructured SpGEMM and Structured In-Situ Computing","authors":"Huize Li;Dan Chen;Tulika Mitra","doi":"10.1109/TCAD.2024.3522882","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522882","url":null,"abstract":"Sparse matrix-matrix multiplication (SpGEMM) is a critical kernel widely employed in machine learning and graph algorithms. However, high sparsity of real-world matrices makes SpGEMM memory-intensive. In-situ computing offers the potential to accelerate memory-intensive applications through high bandwidth and parallelism. Nevertheless, the irregular distribution of nonzeros renders software SpGEMM computation unstructured. In contrast, in-situ hardware platforms follow a fixed computation pattern, making them structured. The mismatch between unstructured software and structured hardware leads to suboptimal performance of current solutions. In this article, we propose SPLIM, a novel in-situ computing SpGEMM accelerator. SPLIM involves two innovations. First, we present a novel computation paradigm that converts SpGEMM into structured in-situ multiplication and unstructured accumulation. Second, we develop a unique coordinates alignment method utilizing in-situ search operations, effectively transforming unstructured accumulation into highly parallel search operations. 
Our experimental results demonstrate that SPLIM achieves <inline-formula> <tex-math>$276\\times $ </tex-math></inline-formula> performance improvement and <inline-formula> <tex-math>$687\\times $ </tex-math></inline-formula> energy saving compared to NVIDIA RTX A6000 GPU.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2412-2423"},"PeriodicalIF":2.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Architecture-Level CPU Modeling Framework for Power and Other Design Qualities","authors":"Qijun Zhang;Mengming Li;Andrea Mondelli;Zhiyao Xie","doi":"10.1109/TCAD.2024.3522877","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522877","url":null,"abstract":"Power efficiency is a critical design objective in modern microprocessor design. To evaluate the impact of architectural-level design decisions, an accurate yet efficient architecture-level power model is desired. However, widely adopted analytical power models like McPAT and Wattch have been criticized for their unreliable accuracy, while machine learning (ML) methods like McPAT-Calib rely on sufficient known designs for training and perform poorly when available designs are limited, which is the case in realistic scenarios. In this work, we propose PANDA, an innovative architecture-level solution that combines the advantages of analytical and ML power models. It achieves unprecedented high accuracy on unknown new designs even when there are very limited designs for training. Besides being an excellent average power model, we also extend PANDA to support the time-based power trace prediction, which can enable the analysis of peak power, power fluctuations, and voltage fluctuation. This is highly challenging at the architecture level. Other qualities, such as area, performance, and energy accurately, can also be supported. In addition to single design quality, PANDA can model the tradeoffs among different design qualities, such as the tradeoff between power and timing, by predicting the Pareto-optimal curve. Finally, PANDA can further support power prediction for unknown new technology nodes. Our experiment shows that, for average power prediction, our method can achieve high accuracy with a correlation coefficient R of 0.99 and mean absolute percentage error (MAPE) of 7.91% even when only one configuration is known, outperforming McPAT-Calib which has R of -0.24 and MAPE of 35.96%. 
For time-based power trace prediction, our method can achieve a low MAPE of 4.34%, outperforming the state-of-the-art method Powertrain which has a MAPE of 53.8%.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2751-2764"},"PeriodicalIF":2.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tradeoff Performance and Energy Efficiency by Optimizing the Data Flow for PIM Architectures","authors":"Yunping Zhao;Sheng Ma;Yuhua Tang;Hengzhu Liu;Dongsheng Li","doi":"10.1109/TCAD.2024.3522879","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522879","url":null,"abstract":"The processing-in-memory (PIM) architecture becomes a promising candidate for deep learning accelerators by integrating computation and memory. Most PIM-based studies improve the performance and energy efficiency by using the weight stationary (WS) data flow due to its high parallelism. However, the WS data flow has some fundamental limitations. First, the WS data flow has huge activation movements between on-chip memory and off-chip memory due to the limited memory space of the resistive random-access memory (ReRAM) array. Second, the WS data flow needs to read the input activation repeatedly according to the convolution window. These data movements decrease the energy efficiency and performance of the PIM architecture. To address these issues, the input stationary (IS) data flow stores activations instead of weights to reduce data movements. But the IS data flow faces some challenges. First, the data dependency between adjacent layers limits the performance. Second, there are huge across-array computations due to the special mapping method. Third, the previous IS data flow cannot realize the high parallelism. Fourth, the IS data flow depends on the 3-D ReRAM structure. To address these issues, we propose a novel data flow for PIM architectures. We optimize the IS data flow to decrease the activation movement and propose a parallel computing method to realize high parallelism and reduce the across-array computations. We identify and analyze the fundamental limitations and impact of different interlayer data flows, including the WS-WS, IS-IS, WS-IS, and IS-WS. 
We also propose a method to build a hybrid data flow by combining these interlayer data flows to tradeoff performance and energy consumption. Our experimental results and analysis demonstrate the potential of our design. The performance and energy efficiency of our design reach 0.13–1.77 TFLOPS and 61–85 TOPS/J, respectively. Compared to the state-of-the-art design, the NEBULA, our design can improve performance by <inline-formula> <tex-math>$1.4\\times $ </tex-math></inline-formula>, <inline-formula> <tex-math>$2.3\\times $ </tex-math></inline-formula>, and <inline-formula> <tex-math>$3.5\\times $ </tex-math></inline-formula> for deploying the MobileNet-V1, ResNet-18, and VGG-16, and also can improve energy efficiency by <inline-formula> <tex-math>$3.3\\times $ </tex-math></inline-formula>, <inline-formula> <tex-math>$2\\times $ </tex-math></inline-formula>, and <inline-formula> <tex-math>$2\\times $ </tex-math></inline-formula>, respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 7","pages":"2530-2543"},"PeriodicalIF":2.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10816195","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information","authors":"","doi":"10.1109/TCAD.2024.3513474","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3513474","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 1","pages":"C2-C2"},"PeriodicalIF":2.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10814109","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142880311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information","authors":"","doi":"10.1109/TCAD.2024.3513476","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3513476","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 1","pages":"C3-C3"},"PeriodicalIF":2.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10814919","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142880309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iSAFE: Enabling Evenness of Data Freshness in Multipriority Networked Intermittent Systems","authors":"Wen Sheng Lim;Yu-Cheng Chen;Yu-Hsuan Chu;Chia-Heng Tu;Yuan-Hao Chang","doi":"10.1109/TCAD.2024.3522211","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3522211","url":null,"abstract":"Environmental monitoring applications use energy harvesting to cover wide-range deployment, where devices are powered by ambient energy and operate intermittently when energy is sufficient. In such an intermittent networked system (NIS), a sink node is used to forward the environmental data collected by sensors to a central controller to reflect the physical environment status. Nevertheless, existing data forwarding algorithms for NISs cannot fulfill modern application requirements, where multiple types of data with different timeliness requirements (i.e., multipriorities) are desired to report real-time environmental data for monitoring critical situations. Without considering the multipriorities, we show in this article that it introduces a new problem: unevenness of data freshness. We then propose the sink node-based evenness-aware update forwarding (iSAFE) algorithm to provide evenness among different priorities of data sources in NISs. iSAFE consists of three important components: 1) a theoretical analysis to derive the optimal data forwarding interval between two adjacent status updates from the sensor; 2) an evenness-aware forwarding algorithm to adaptively adjust the forwarding interval based on the runtime status; and 3) a fresh-aware energy preservation algorithm to maintain the freshness of collected data. 
The experimental results show that iSAFE can achieve up to 682% evenness (94.47% close to the ideal) and 53.3% data freshness compared to the state of the art while being energy-efficient and scalable, suitable for modern applications.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2093-2104"},"PeriodicalIF":2.7,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivalent Thermal Conductance Network (ETCN) Model for Domain Decomposition and Efficient Thermal Simulation of GaN HEMTs","authors":"Shunxiang Lan;Min Tang","doi":"10.1109/TCAD.2024.3521324","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3521324","url":null,"abstract":"Accurate and efficient simulation is essential for the thermal management of gallium nitride (GaN) high-electron-mobility transistors (HEMTs). However, conventional numerical approaches are usually time-consuming when dealing with transient thermal simulations with temperature-dependent parameters. To conquer this problem, we present a novel method based on the equivalent thermal conductance network (ETCN) model for efficient thermal simulation of GaN HEMTs. First, according to the temperature-dependent characteristics of the materials of the device, the entire structure is divided into region of variation (ROV) and region of fixity (ROF). Then, we decompose the transient response of ROF into a zero-input (ZI) and a zero-state (ZS) response based on the intrinsic property of linear time-invariant systems. After that, a novel ETCN model is developed for efficient transient simulation of GaN HEMTs. The principle of the ETCN model is to transform the impacts of the ROF on the ROV in the form of the equivalent thermal boundary conditions. By this means, we only need to focus on the ROV in the nonlinear iteration, enabling a significant reduction of degrees of freedom in solving the nonlinear equation and thus significantly improving the computational efficiency. In addition, the ETCN model is also available to handle the steady-state thermal problems. Several numerical examples are provided to validate the accuracy and efficiency of the proposed method. 
Compared with the conventional finite volume method, a speed-up of 105x is achieved by the ETCN model in simulating a typical multifinger GaN HEMT with microchannel cooling.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2343-2352"},"PeriodicalIF":2.7,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}