2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)最新文献_第10页

Qilin: Enabling Performance Analysis and Optimization of Shared-Virtual Memory Systems with FPGA Accelerators 齐林:基于FPGA加速器的共享虚拟内存系统性能分析与优化

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549431

Edward Richter, Deming Chen

{"title":"Qilin: Enabling Performance Analysis and Optimization of Shared-Virtual Memory Systems with FPGA Accelerators","authors":"Edward Richter, Deming Chen","doi":"10.1145/3508352.3549431","DOIUrl":"https://doi.org/10.1145/3508352.3549431","url":null,"abstract":"While the tight integration of components in heterogeneous systems has increased the popularity of the Shared-Virtual Memory (SVM) system programming model, the overhead of SVM can significantly impact end-to-end application performance. However, studying SVM implementations is difficult, as there is no open and flexible system to explore trade-offs between different SVM implementations and the SVM design space is not clearly defined. To this end, we present Qilin, the first open-source system which enables thorough study of SVM in heterogeneous computing environments for discrete accelerators. Qilin is a transparent and flexible system built on top of an open-source FPGA shell, which allows researchers to alter components of the underlying SVM implementation to understand how SVM design decisions impact performance. Using Qilin, we perform an extensive quantitative analysis on the over-heads of three SVM architectures, and generate several insights which highlight the cost and benefits of each architecture. From these insights, we propose a flowchart of how to choose the best SVM implementation given the application characteristics and the SVM capabilities of the system. Qilin also provides application developers a flexible SVM shell for high-performance virtualized applications. Optimizations enabled by Qilin can reduce the latency of translations by 6.86x compared to an open-source FPGA shell.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126668122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

False Data Injection Attacks on Sensor Systems 传感器系统的虚假数据注入攻击

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561098

D. Serpanos

引用次数: 0

2022 CAD Contest Problem A: Learning Arithmetic Operations from Gate-Level Circuit 2022年CAD竞赛题目A:从门级电路学习算术运算

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561107

Chung-Han Chou, Chih-Jen Hsu, Chi-An Wu, Kuan-Hua Tu

引用次数: 2

Squeezing Accumulators in Binary Neural Networks for Extremely Resource-Constrained Applications 资源极度受限的二元神经网络压缩累加器

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549418

Azat Azamat, Jaewoo Park, Jongeun Lee

{"title":"Squeezing Accumulators in Binary Neural Networks for Extremely Resource-Constrained Applications","authors":"Azat Azamat, Jaewoo Park, Jongeun Lee","doi":"10.1145/3508352.3549418","DOIUrl":"https://doi.org/10.1145/3508352.3549418","url":null,"abstract":"The cost and power consumption of BNN (Binarized Neural Network) hardware is dominated by additions. In particular, accumulators account for a large fraction of hardware overhead, which could be effectively reduced by using reduced-width accumulators. However, it is not straightforward to find the optimal accumulator width due to the complex interplay between width, scale, and the effect of training. In this paper we present algorithmic and hardware-level methods to find the optimal accumulator size for BNN hardware with minimal impact on the quality of result. First, we present partial sum scaling, a top-down approach to minimize the BNN accumulator size based on advanced quantization techniques. We also present an efficient, zero-overhead hardware design for partial sum scaling. Second, we evaluate a bottom-up approach that is to use saturating accumulator, which is more robust against overflows. Our experimental results using CIFAR-10 dataset demonstrate that our partial sum scaling along with our optimized accumulator architecture can reduce the area and power consumption of datapath by 15.50% and 27.03%, respectively, with little impact on inference performance (less than 2%), compared to using 16-bit accumulator.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"62 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114034486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Speculative Load Forwarding Attack on Modern Processors 现代处理器的推测负载转发攻击

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549417

Hasini Witharana, P. Mishra

引用次数: 3

How Good Is Your Verilog RTL Code? A Quick Answer from Machine Learning 你的Verilog RTL代码有多好?来自机器学习的快速回答

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549375

Prianka Sengupta, Aakash Tyagi, Yiran Chen, Jiangkun Hu

{"title":"How Good Is Your Verilog RTL Code? A Quick Answer from Machine Learning","authors":"Prianka Sengupta, Aakash Tyagi, Yiran Chen, Jiangkun Hu","doi":"10.1145/3508352.3549375","DOIUrl":"https://doi.org/10.1145/3508352.3549375","url":null,"abstract":"Hardware Description Language (HDL) is a common entry point for designing digital circuits. Differences in HDL coding styles and design choices may lead to considerably different design quality and performance-power tradeoff. In general, the impact of HDL coding is not clear until logic synthesis or even layout is completed. However, running synthesis merely as a feedback for HDL code is computationally not economical especially in early design phases when the code needs to be frequently modified. Furthermore, in late stages of design convergence burdened with high-impact engineering change orders (ECO’s), design iterations become prohibitively expensive. To this end, we propose a machine learning approach to Verilog-based Register-Transfer Level (RTL) design assessment without going through the synthesis process. It would allow designers to quickly evaluate the performance-power tradeoff among different options of RTL designs. Experimental results show that our proposed technique achieves an average of 95% prediction accuracy in terms of post-placement analysis, and is 6 orders of magnitude faster than evaluation by running logic synthesis and placement.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"549 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116559244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Hidden-ROM: A Compute-in-ROM Architecture to Deploy Large-Scale Neural Networks on Chip with Flexible and Scalable Post-Fabrication Task Transfer Capability 隐藏rom:一种具有灵活和可扩展的制造后任务传输能力的芯片上大规模神经网络的计算-in- rom架构

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549335

Yiming Chen, Guodong Yin, Ming-En Lee, Wenjun Tang, Zekun Yang, Yongpan Liu, Huazhong Yang, Xueqing Li

{"title":"Hidden-ROM: A Compute-in-ROM Architecture to Deploy Large-Scale Neural Networks on Chip with Flexible and Scalable Post-Fabrication Task Transfer Capability","authors":"Yiming Chen, Guodong Yin, Ming-En Lee, Wenjun Tang, Zekun Yang, Yongpan Liu, Huazhong Yang, Xueqing Li","doi":"10.1145/3508352.3549335","DOIUrl":"https://doi.org/10.1145/3508352.3549335","url":null,"abstract":"Motivated by reducing the data transfer activities in dataintensive neural network computing, SRAM-based compute-inmemory (CiM) has made significant progress. Unfortunately, SRAM has low density and limited on-chip capacity. This makes the deployment of large models inefficient due to the frequent DRAM access to update the weight in SRAM. Recently, a ROM-based CiM design, YOLoC, reveals the unique opportunity of deploying a large-scale neural network in CMOS by exploring the intriguing high density of ROM. However, even though assisting SRAM has been adopted in YOLoC for task transfer within the same domain, it is still a big challenge to overcome the read-only limitation in ROM and enable more flexibility. Therefore, it is of paramount significance to develop new ROM-based CiM architectures and provide broader task space and model expansion capability for more complex tasks.This paper presents Hidden-ROM for high flexibility of ROM-based CiM. Hidden-ROM provides several novel ideas beyond YOLoC. First, it adopts a one-SRAM-many-ROM method that \"hides\" ROM cells to support various datasets of different domains, including CIFAR10/100, FER2013, and ImageNet. Second, HiddenROM provides the model expansion capability after chip fabrication to update the model for more complex tasks when needed. Experiments show that Hidden-ROM designed for ResNet-18 pretrained on CIFAR100 (item classification) can achieve <0.5% accuracy loss in FER2013 (facial expression recognition), while YOLoC degrades by >40%. After expanding to ResNet-50/101, Hidden-ROM even achieves 68.6%/72.3% accuracy in ImageNet, close to 74.9%/76.4% by software. Such expansion costs only 7.6%/12.7% energy efficiency overhead while providing 12%/16% accuracy improvement after expansion.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127707885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Robust Global Routing Engine with High-accuracy Cell Movement under Advanced Constraints 先进约束下具有高精度单元移动的鲁棒全局路由引擎

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549421

Ziran Zhu, Fuheng Shen, Yangjie Mei, Zhipeng Huang, Jianli Chen, Jun-Zhi Yang

{"title":"A Robust Global Routing Engine with High-accuracy Cell Movement under Advanced Constraints","authors":"Ziran Zhu, Fuheng Shen, Yangjie Mei, Zhipeng Huang, Jianli Chen, Jun-Zhi Yang","doi":"10.1145/3508352.3549421","DOIUrl":"https://doi.org/10.1145/3508352.3549421","url":null,"abstract":"Placement and routing are typically defined as two separate problems to reduce the design complexity. However, such a divide-and-conquer approach inevitably incurs the degradation of solution quality due to the correlation/objectives of placement and routing are not entirely consistent. Besides, with various constraints (e.g., timing, R/C characteristic, voltage area, etc.) imposed by advanced circuit designs, bridging the gap between placement and routing while satisfying the advanced constraints has become more challenging. In this paper, we develop a robust global routing engine with high-accuracy cell movement under advanced constraints to narrow the gap and improve the routing solution. We first present a routing refinement technique to obtain the convergent routing result based on fixed placement, which provides more accurate information for subsequent cell movement. To achieve fast and high-accuracy position prediction for cell movement, we construct a lookup table (LUT) considering complex constraints/objectives (e.g., routing direction and layer-based power consumption), and generate a timing-driven gain map for each cell based on the LUT. Finally, based on the prediction, we propose an alternating cell movement and cluster movement scheme followed by partial rip-up and reroute to optimize the routing solution. Experimental results on the ICCAD 2020 contest benchmarks show that our algorithm achieves the best total scores among all published works. Compared with the champion of the ICCAD 2021 contest, experimental results on the ICCAD 2021 contest benchmarks show that our algorithm achieves better solution quality in shorter runtime.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132437142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Pin Accessibility and Routing Congestion Aware DRC Hotspot Prediction using Graph Neural Network and U-Net 基于图神经网络和U-Net的引脚可达性和路由拥塞感知DRC热点预测

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549346

Kyeonghyeon Baek, Hyunbum Park, Suwan Kim, Kyumyung Choi, Taewhan Kim

{"title":"Pin Accessibility and Routing Congestion Aware DRC Hotspot Prediction using Graph Neural Network and U-Net","authors":"Kyeonghyeon Baek, Hyunbum Park, Suwan Kim, Kyumyung Choi, Taewhan Kim","doi":"10.1145/3508352.3549346","DOIUrl":"https://doi.org/10.1145/3508352.3549346","url":null,"abstract":"An accurate DRC (design rule check) hotspot prediction at the placement stage is essential in order to reduce a substantial amount of design time required for the iterations of placement and routing. It is known that for implementing chips with advanced technology nodes, (1) pin accessibility and (2) routing congestion are two major causes of DRVs (design rule violations). Though many ML (machine learning) techniques have been proposed to address this prediction problem, it was not easy to assemble the aggregate data on items 1 and 2 in a unified fashion for training ML models, resulting in a considerable accuracy loss in DRC hotspot prediction. This work overcomes this limitation by proposing a novel ML based DRC hotspot prediction technique, which is able to accurately capture the combined impact of items 1 and 2 on DRC hotspots. Precisely, we devise a graph, called pin proximity graph, that effectively models the spatial information on cell I/O pins and the information on pin-to-pin disturbance relation. Then, we propose a new ML model, called PGNN, which tightly combines GNN (graph neural network) and U-net in a way that GNN is used to embed pin accessibility information abstracted from our pin proximity graph while U-net is used to extract routing congestion information from grid-based features. Through experiments with a set of benchmark designs using Nangate 15nm library, our PGNN outperforms the existing ML models on all benchmark designs, achieving on average 7.8~12.5% improvements on F1-score while taking 5.5× fast inference time in comparison with that of the state-of-the-art techniques.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130532575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

On Minimizing the Read Latency of Flash Memory to Preserve Inter-tree Locality in Random Forest 最小化闪存读延迟以保持随机森林树间局部性的研究

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549365

Yu-Cheng Lin, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, W. Shih

{"title":"On Minimizing the Read Latency of Flash Memory to Preserve Inter-tree Locality in Random Forest","authors":"Yu-Cheng Lin, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, W. Shih","doi":"10.1145/3508352.3549365","DOIUrl":"https://doi.org/10.1145/3508352.3549365","url":null,"abstract":"Many prior research works have been widely discussed how to bring machine learning algorithms to embedded systems. Because of resource constraints, embedded platforms for machine learning applications play the role of a predictor. That is, an inference model will be constructed on a personal computer or a server platform, and then integrated into embedded systems for just-in-time inference. With the consideration of the limited main memory space in embedded systems, an important problem for embedded machine learning systems is how to efficiently move inference model between the main memory and a secondary storage (e.g., flash memory). For tackling this problem, we need to consider how to preserve the locality inside the inference model during model construction. Therefore, we have proposed a solution, namely locality-aware random forest (LaRF), to preserve the inter-locality of all decision trees within a random forest model during the model construction process. Owing to the locality preservation, LaRF can improve the read latency by 81.5% at least, compared to the original random forest library.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125364534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0