Anurup Saha, C. Amarnath, Kwondo Ma, Abhijit Chatterjee
{"title":"Signature Driven Post-Manufacture Testing and Tuning of RRAM Spiking Neural Networks for Yield Recovery","authors":"Anurup Saha, C. Amarnath, Kwondo Ma, Abhijit Chatterjee","doi":"10.1109/ASP-DAC58780.2024.10473874","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473874","url":null,"abstract":"Resistive Random Access Memory (RRAM) based spiking neural networks (SNN) are becoming increasingly attractive for pervasive energy-efficient classification tasks. However, such networks suffer from degradation of performance (as determined by classification accuracy) due to the effects of process variations on fabricated RRAM devices, resulting in loss of manufacturing yield. To address such yield loss, a two-step approach is developed. First, an alternative test framework is used to predict the performance of fabricated RRAM based SNNs using the SNN response to a small subset of images from the test image dataset, called the SNN response signature (to minimize test cost). This diagnoses those SNNs that need to be performance-tuned for yield recovery. Next, SNN tuning is performed by modulating the spiking thresholds of the SNN neurons on a layer-by-layer basis using a trained regressor that maps the SNN response signature to the optimal spiking threshold values during tuning. The optimal spiking threshold values are determined by an off-line optimization algorithm. Experiments show that the proposed framework can reduce the number of out-of-spec SNN devices by up to 54% and improve yield by as much as 8.6%.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"30 3","pages":"740-745"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guoqing He, Wenjie Ding, Yuyang Ye, Xu Cheng, Qianqian Song, Peng Cao
{"title":"An Optimization-aware Pre-Routing Timing Prediction Framework Based on Heterogeneous Graph Learning","authors":"Guoqing He, Wenjie Ding, Yuyang Ye, Xu Cheng, Qianqian Song, Peng Cao","doi":"10.1109/ASP-DAC58780.2024.10473937","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473937","url":null,"abstract":"Accurate and efficient pre-routing timing estimation is particularly crucial in timing-driven placement, as design iterations caused by timing divergence are time-consuming. However, existing machine learning prediction models overlook the impact of timing optimization techniques during the routing stage, such as adjusting gate sizes or swapping threshold voltage types to fix routing-induced timing violations. In this work, an optimization-aware pre-routing timing prediction framework based on heterogeneous graph learning is proposed to calibrate the timing changes introduced by wire parasitics and optimization techniques. The path embedding generated by the proposed framework fuses learned local information from a graph neural network and global information from a transformer network to perform accurate endpoint arrival time prediction. Experimental results demonstrate that the proposed framework achieves an average accuracy improvement of 0.10 in R² score on test designs and an average runtime speedup of three orders of magnitude compared with the design flow.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"268 8","pages":"177-182"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Sublogic-Cone-Based Switching Activity Estimation using Correlation Factor","authors":"Kexin Zhu, Runjie Zhang, Qing He","doi":"10.1109/ASP-DAC58780.2024.10473841","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473841","url":null,"abstract":"Switching activity is one of the key factors that determine digital circuits’ power consumption. While gate-level simulations are too slow to support the average power analysis of modern design blocks (e.g., millions or even billions of gates) over a longer period of time (e.g., millions of cycles), probabilistic methods provide a solution by using RTL simulation results and propagating the switching activity through the combinational logic. This work presents a sublogic-cone-based, probabilistic method for switching activity propagation in combinational logic circuits. We divide the switching activity estimation problem into two parts: incremental propagation (across the entire circuit) and accurate calculation (within the sublogic cones). To construct the sublogic cones, we first introduce a new metric called correlation factor to quantify the impact induced by the correlations between signal nets; then we develop an efficient algorithm that uses the calculated correlation factor to guide the construction of sublogic cones. The experimental results show that our method produces 73.2% more accurate switching activity estimation results compared with the state-of-the-art method, and achieves a 19X speedup at the same time.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"246 2","pages":"638-643"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
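The switching-activity abstract above builds on classic probabilistic propagation, which this example illustrates. It is a minimal sketch and not the authors' sublogic-cone or correlation-factor algorithm. It assumes a hypothetical netlist format (a topologically ordered list of `(output, gate, inputs)` tuples) and a hypothetical `propagate` function. It also makes the textbook assumption that all signals are spatially and temporally independent, so switching activity is estimated as 2p(1 - p). The reconvergent-fanout example shows the kind of error that signal correlation introduces.

```python
def propagate(netlist, input_probs):
    """Propagate signal probabilities, ASSUMING independent signals.

    netlist: list of (output_net, gate_type, input_nets), topologically ordered.
    input_probs: dict mapping primary-input nets to P(net == 1).
    Returns dict net -> (p1, switching_activity).
    """
    p = dict(input_probs)
    for out, gate, ins in netlist:
        if gate == "AND":
            val = 1.0
            for net in ins:
                val *= p[net]            # all inputs must be 1
        elif gate == "OR":
            val = 1.0
            for net in ins:
                val *= 1.0 - p[net]      # probability that all inputs are 0
            val = 1.0 - val
        elif gate == "NOT":
            val = 1.0 - p[ins[0]]
        else:
            raise ValueError(f"unsupported gate: {gate}")
        p[out] = val
    # Temporal independence: P(0->1) + P(1->0) = 2 * p * (1 - p)
    return {net: (q, 2.0 * q * (1.0 - q)) for net, q in p.items()}


# Reconvergent fanout: y = a AND (NOT a) is constantly 0 in reality,
# but the independence assumption predicts P(y == 1) = 0.25.
netlist = [("n", "NOT", ["a"]), ("y", "AND", ["a", "n"])]
result = propagate(netlist, {"a": 0.5})
print(result["y"])  # (0.25, 0.375)
```

Errors of this kind, where correlated nets are treated as independent, are what motivate handling correlated regions of the circuit more accurately.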
{"title":"KalmanHD: Robust On-Device Time Series Forecasting with Hyperdimensional Computing","authors":"Ivannia Gomez Moreno, Xiaofan Yu, Tajana Rosing","doi":"10.1109/ASP-DAC58780.2024.10473878","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473878","url":null,"abstract":"Time series forecasting is shifting towards Edge AI, where models are trained and executed on edge devices instead of in the cloud. However, training forecasting models at the edge faces two challenges concurrently: (1) dealing with streaming data containing abundant noise, which can lead to degradation in model predictions, and (2) coping with limited on-device resources. Traditional approaches focus on simple statistical methods like ARIMA or neural networks, which are either not robust to sensor noise or not efficient for edge deployment, or both. In this paper, we propose a novel, robust, and lightweight method named KalmanHD for on-device time series forecasting using Hyperdimensional Computing (HDC). KalmanHD integrates Kalman Filter (KF) with HDC, resulting in a new regression method that combines the robustness of KF towards sensor noise and the efficiency of HDC. KalmanHD first encodes the past values into a high-dimensional vector representation, then applies the Expectation-Maximization (EM) approach as in KF to iteratively update the model based on the incoming samples. KalmanHD inherently considers the variability of each sample and thereby enhances robustness. We further accelerate KalmanHD by substituting the expensive matrix multiplication with efficient binary operations between the covariance and the encoded values. Our results show that KalmanHD achieves MAE comparable to the state-of-the-art noise-optimized NN-based methods while running 3.6-8.6× faster on typical edge platforms. The source code is available at https://github.com/DarthIV02/Ka1manHD","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"225 2","pages":"710-715"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WER: Maximizing Parallelism of Irregular Graph Applications Through GPU Warp EqualizeR","authors":"En-Ming Huang, Bo Wun Cheng, Meng-Hsien Lin, Chun-Yi Lee, Tsung-Tai Yeh","doi":"10.1109/ASP-DAC58780.2024.10473955","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473955","url":null,"abstract":"Irregular graphs are becoming increasingly prevalent across a broad spectrum of data analysis applications. Despite their versatility, the inherent complexity and irregularity of these graphs often result in the underutilization of Single Instruction, Multiple Data (SIMD) resources when processed on Graphics Processing Units (GPUs). This underutilization originates from two primary issues: the occurrence of inactive threads and intra-warp load imbalances. These issues can produce idle threads, lead to inefficient usage of SIMD resources, consequently hamper throughput, and increase program execution time. To address these challenges, we introduce Warp EqualizeR (WER), a framework designed to optimize the utilization of SIMD resources on a GPU for processing irregular graphs. WER employs both a software API and a specifically tailored hardware microarchitecture. Such a synergistic approach enables workload redistribution in irregular graphs, which allows WER to enhance SIMD lane utilization and further harness the SIMD resources within a GPU. Our experimental results over seven different graph applications indicate that WER yields a geometric mean speedup of 2.52× and 1.47× over the baseline GPU and existing state-of-the-art methodologies, respectively.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"3 6","pages":"201-206"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu
{"title":"BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator","authors":"Cheng-Yang Chang, Chi-Tse Huang, Yu-Chuan Chuang, Kuang-Chao Chou, A. Wu","doi":"10.1109/ASP-DAC58780.2024.10473797","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473797","url":null,"abstract":"Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog Computing-In-Memory (CIM) has shown promise in enhancing the energy efficiency of CNNs, but the use of analog-to-digital converters (ADCs) remains a challenge. ADCs convert analog partial sums from CIM crossbar arrays to digital values, with high-precision ADCs accounting for over 60% of the system’s energy. Researchers have explored quantizing CNNs to use low-precision ADCs to tackle this issue, trading off accuracy for efficiency. However, these methods necessitate data-dependent adjustments to minimize accuracy loss. Instead, we observe that the first most significant toggled bit indicates the optimal quantization range for each input value. Accordingly, we propose range-aware rounding (RAR) for runtime bit-width adjustment, eliminating the need for pre-deployment efforts. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic. Experimental results show that our methods maintain accuracy while achieving up to 1.81× and 2.08× energy efficiency improvements on CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"39 5-6","pages":"545-550"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
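The block-floating-point idea in the BFP-CIM abstract above can be illustrated with a generic quantizer. This is a sketch under stated assumptions, not the paper's RAR hardware. It assumes non-negative integer partial sums and uses hypothetical `bfp_quantize` and `bfp_dequantize` helpers. The shared exponent (a right shift) comes from the leading one, the most significant set bit, of the largest value in the block. Each value then keeps `mantissa_bits` bits with round-to-nearest.

```python
def bfp_quantize(block, mantissa_bits):
    """Quantize a block of non-negative ints to a shared exponent (sketch).

    The shared shift is chosen from the leading one (most significant set bit)
    of the largest value, so that value keeps `mantissa_bits` significant bits.
    Returns (mantissas, shift); each value is approximately mantissa << shift.
    """
    if not block or any(v < 0 for v in block):
        raise ValueError("expected a non-empty block of non-negative ints")
    lead = max(block).bit_length()          # position of the block's leading one
    shift = max(lead - mantissa_bits, 0)
    max_mant = (1 << mantissa_bits) - 1
    half = (1 << shift) >> 1                # rounding offset (0 when shift == 0)
    # Round to nearest, then saturate; a real design might bump the exponent instead.
    mantissas = [min((v + half) >> shift, max_mant) for v in block]
    return mantissas, shift


def bfp_dequantize(mantissas, shift):
    """Reconstruct approximate integer values from mantissas and the shared shift."""
    return [m << shift for m in mantissas]


mants, shift = bfp_quantize([182, 40, 7], mantissa_bits=3)
print(mants, shift)                   # [6, 1, 0] 5
print(bfp_dequantize(mants, shift))   # [192, 32, 0]
```

With three mantissa bits, small values that share a block with a large one lose most of their precision (7 becomes 0 here). That is the trade-off a data-dependent, per-block range tries to manage.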
{"title":"JustQ: Automated Deployment of Fair and Accurate Quantum Neural Networks","authors":"Ruhan Wang, Fahiz Baba-Yara, Fan Chen","doi":"10.1109/ASP-DAC58780.2024.10473829","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473829","url":null,"abstract":"Despite the success of Quantum Neural Networks (QNNs) in decision-making systems, their fairness remains unexplored, as the focus primarily lies on accuracy. This work conducts a design space exploration, unveiling QNN unfairness, and highlighting the significant influence of QNN deployment and quantum noise on accuracy and fairness. To effectively navigate the vast QNN deployment design space, we propose JustQ, a framework for deploying fair and accurate QNNs on NISQ computers. It includes a complete NISQ error model, reinforcement learning-based deployment, and a flexible optimization objective incorporating both fairness and accuracy. Experimental results show JustQ outperforms previous methods, achieving superior accuracy and fairness. This work pioneers fair QNN design on NISQ computers, paving the way for future investigations.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"168 1","pages":"121-126"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supriyo Maji, A. Budak, Souradip Poddar, David Z. Pan
{"title":"Toward End-to-End Analog Design Automation with ML and Data-Driven Approaches (Invited Paper)","authors":"Supriyo Maji, A. Budak, Souradip Poddar, David Z. Pan","doi":"10.1109/ASP-DAC58780.2024.10473840","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473840","url":null,"abstract":"Designing analog circuits poses significant challenges due to their knowledge-intensive nature and the diverse range of requirements. There has been limited success in achieving a fully automated framework for designing analog circuits. However, the advent of advanced machine learning algorithms is invigorating design automation efforts by enabling tools to replicate the techniques employed by experienced designers. In this paper, we aim to provide an overview of the recent progress in ML-driven analog circuit sizing and layout automation tool developments. In advanced technology nodes, layout effects must be considered during circuit sizing to avoid costly reruns of the flow. We will discuss the latest research in layout-aware sizing. In the end-to-end analog design automation flow, topology selection plays an important role, as the final performance depends on the choice of topology. We will discuss recent developments in ML-driven topology selection before delving into our vision of an end-to-end data-driven framework that leverages ML techniques to facilitate the selection of optimal topology from a library of topologies.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"13 9","pages":"657-664"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raphael Cardoso, Clément Zrounba, M.F. Abdalla, Paul Jiménez, Mauricio Gomes de Queiroz, B. Charbonnier, Fabio Pavanello, Ian O'Connor, S. L. Beux
{"title":"Signed Convolution in Photonics with Phase-Change Materials using Mixed-Polarity Bitstreams","authors":"Raphael Cardoso, Clément Zrounba, M.F. Abdalla, Paul Jiménez, Mauricio Gomes de Queiroz, B. Charbonnier, Fabio Pavanello, Ian O'Connor, S. L. Beux","doi":"10.1109/ASP-DAC58780.2024.10473952","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473952","url":null,"abstract":"As AI continues to grow in importance, in order to reduce its carbon footprint and utilization of computer resources, numerous alternatives are under investigation to improve its hardware building blocks. In particular, in convolutional neural networks (CNNs), the convolution function represents the most important operation and one of the best targets for optimization. A new approach to convolution has recently emerged using optics, phase-change materials (PCMs) and stochastic computing, but is thus far limited to unsigned operands. In this paper, we propose an extension in which the convolutional kernels are signed, using mixed-polarity bitstreams. We present a proof of validity for our method, while also showing that, in simulation and under similar operating conditions, our approach is less affected by noise than the common approach in the literature.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"253 11","pages":"854-859"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
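The photonic convolution abstract above notes that stochastic-computing convolution has been limited to unsigned operands. For context on why signed operands need special handling, this sketch contrasts two textbook encodings. The unipolar encoding covers values in [0, 1] and multiplies with an AND gate. The standard bipolar encoding covers values in [-1, 1] and multiplies with an XNOR gate. Neither is the paper's mixed-polarity method or its photonic PCM implementation. The function names are hypothetical, and Python's `random` module stands in for hardware stream generators.

```python
import random


def to_bitstream(value, length, bipolar, rng):
    """Encode a value as a random bitstream (textbook SC encodings).

    unipolar: value in [0, 1],  P(bit == 1) = value
    bipolar:  value in [-1, 1], P(bit == 1) = (value + 1) / 2
    """
    p = (value + 1.0) / 2.0 if bipolar else value
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits, bipolar):
    """Recover the encoded value from the fraction of ones in the stream."""
    p = sum(bits) / len(bits)
    return 2.0 * p - 1.0 if bipolar else p


def sc_multiply(a, b, length=100_000, bipolar=False, seed=0):
    """Multiply two values using independent stochastic bitstreams."""
    rng = random.Random(seed)
    x = to_bitstream(a, length, bipolar, rng)
    y = to_bitstream(b, length, bipolar, rng)
    if bipolar:
        z = [1 - (xi ^ yi) for xi, yi in zip(x, y)]   # XNOR gate
    else:
        z = [xi & yi for xi, yi in zip(x, y)]         # AND gate
    return decode(z, bipolar)


print(sc_multiply(0.5, 0.4))                  # close to 0.2
print(sc_multiply(-0.5, 0.6, bipolar=True))   # close to -0.3
```

The bipolar XNOR result is exact in expectation: with p_a = (a + 1)/2 and p_b = (b + 1)/2, the quantity 2(p_a p_b + (1 - p_a)(1 - p_b)) - 1 simplifies to ab. The price is higher variance per bit than the unipolar encoding has for the same stream length.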
Zhiteng Chao, Xindi Zhang, Junying Huang, Jing Ye, Shaowei Cai, Huawei Li, Xiaowei Li
{"title":"A Fast Test Compaction Method for Commercial DFT Flow Using Dedicated Pure-MaxSAT Solver","authors":"Zhiteng Chao, Xindi Zhang, Junying Huang, Jing Ye, Shaowei Cai, Huawei Li, Xiaowei Li","doi":"10.1109/ASP-DAC58780.2024.10473833","DOIUrl":"https://doi.org/10.1109/ASP-DAC58780.2024.10473833","url":null,"abstract":"Minimizing the testing cost is crucial in the context of the design for test (DFT) flow. In our observation, the test patterns generated by commercial ATPG tools in test compression mode still contain redundancy. To tackle this obstacle, we propose a post-flow static test compaction method that utilizes a partial fault dictionary instead of a full fault dictionary, and leverages a dedicated Pure-MaxSAT solver to re-compact the test patterns generated by commercial ATPG tools. We also observe that commercial ATPG tools offer a more comprehensive selection of candidate patterns for compaction in the “n-detect” mode, leading to superior compaction efficacy. In experiments on ISCAS89, ITC99, and open-source RISC-V CPU benchmarks, our method achieves an average reduction of 21.58% and a maximum of 29.93% in test cycles evaluated by commercial tools while maintaining fault coverage. Furthermore, our approach demonstrates improved performance compared with existing methods.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"280 6","pages":"503-508"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
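The test-compaction abstract above describes static compaction over a fault dictionary, which is a set-cover problem. The paper solves it with a dedicated Pure-MaxSAT solver. As a much simpler stand-in, this sketch applies the greedy set-cover heuristic to a hypothetical partial fault dictionary. The dictionary maps pattern IDs to the set of fault IDs each pattern detects, and patterns are kept until every listed fault is covered. The `greedy_compact` name and the example data are illustrative only.

```python
def greedy_compact(detects):
    """Greedy set-cover static compaction (a heuristic, not MaxSAT).

    detects: dict mapping pattern ID -> set of fault IDs it detects.
    Returns pattern IDs to keep, in selection order, covering every listed fault.
    """
    uncovered = set().union(*detects.values())
    kept = []
    while uncovered:
        # sorted() makes tie-breaking deterministic: the first ID wins on ties
        best = max(sorted(detects), key=lambda pid: len(detects[pid] & uncovered))
        kept.append(best)
        uncovered -= detects[best]
    return kept


dictionary = {"p1": {1, 2, 3}, "p2": {3, 4}, "p3": {4, 5}, "p4": {1, 5}}
print(greedy_compact(dictionary))   # ['p1', 'p3']
```

Greedy selection carries no optimality guarantee. That is one reason an exact or near-exact MaxSAT formulation can remove redundancy a heuristic leaves behind.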