2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Papers, Page 4

Online and Offline Machine Learning for Industrial Design Flow Tuning: (Invited - ICCAD Special Session Paper)

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643577

M. Ziegler, Jihye Kwon, Hung-Yi Liu, L. Carloni

Abstract: Modern logic and physical synthesis tools provide numerous options and parameters that can drastically affect design quality; however, the large number of options leads to a complex design space that is difficult for human designers to navigate. Fortunately, machine learning approaches and cloud computing environments are well suited for tackling complex parameter tuning problems like those seen in VLSI design flows. This paper proposes a holistic approach where online and offline machine learning approaches work together for tuning industrial design flows. We describe a system called SynTunSys (STS) that has been used to optimize multiple industrial high-performance processors. STS consists of an online system that optimizes designs and generates data for a recommender system that performs offline training and recommendation. Experimental results show that the collaboration between the STS online and offline machine learning systems, together with insight from human designers, provides best-of-breed results. Finally, we discuss potential new directions for research on design flow tuning.

Citations: 2

AutoMap: Automated Mapping of Security Properties Between Different Levels of Abstraction in Design Flow

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643467

Bulbul Ahmed, Fahim Rahman, Nick Hooten, Farimah Farahmandi, M. Tehranipoor

Abstract: The security of system-on-chip (SoC) designs is threatened by many vulnerabilities introduced by untrusted third-party IPs and by designers' and CAD tools' lack of awareness of security requirements. Ensuring the security of an SoC has become highly challenging due to the diverse threat models, high design complexity, and lack of effective security-aware verification solutions. Moreover, new security vulnerabilities are introduced during the design transformation from higher to lower abstraction levels. As a result, security verification becomes a major bottleneck that should be performed at every level of design abstraction. Reducing the verification effort by mapping the security properties at different design stages could be an efficient solution to lower the total verification time if the new vulnerabilities introduced at different abstraction levels are addressed properly. To address this challenge, we introduce AutoMap that, in addition to the mapping, extends and expands the security properties to identify new vulnerabilities introduced when the design moves from higher- to lower-level abstraction. Starting at the higher abstraction level with a defined set of security properties for the target threat models, AutoMap automatically maps the properties to the lower levels of abstraction to reduce the verification effort. Furthermore, it extends and expands the properties to cover new vulnerabilities introduced by design transformations and updates to the lower abstraction level. We demonstrate AutoMap's efficacy by applying it to AES, RSA, and SHA256 at the C++, RTL, and gate levels. We show that AutoMap effectively facilitates the detection of security vulnerabilities from different sources during the design transformation.

Citations: 3

An Optimal Algorithm for Splitter and Buffer Insertion in Adiabatic Quantum-Flux-Parametron Circuits

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643456

Chao-Yuan Huang, Yi-Chen Chang, Ming-Jer Tsai, Tsung-Yi Ho

Abstract: The Adiabatic Quantum-Flux-Parametron (AQFP), which benefits from low power consumption and rapid switching, is one of the rising superconducting logics. Due to the rapid switching, the delay of the inputs of an AQFP gate is strictly specified so that additional buffers are needed to synchronize the delay. Meanwhile, to maintain the symmetric layout of gates and reduce the undesired parasitic magnetic coupling, the AQFP cell library adopts the minimalist design method in which splitters are employed for the gates with multiple fan-outs. Thus, an AQFP circuit may demand numerous splitters and buffers, resulting in a considerable amount of power consumption and delay. This motivates an effective splitter and buffer insertion algorithm for AQFP circuits. In this paper, we propose a dynamic programming-based algorithm that provides an optimal splitter and buffer insertion for each wire of the input circuit. Experimental results show that our method is fast and reduces the number of additional Josephson Junctions (JJs) by 10% in complicated circuits compared with the state-of-the-art method.

Citations: 8

A Circuit-Based SAT Solver for Logic Synthesis

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643505

He-Teng Zhang, Jie-Hong R. Jiang, A. Mishchenko

Citations: 7

Generating Architecture-Level Abstractions from RTL Designs for Processors and Accelerators Part I: Determining Architectural State Variables

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643584

Yu Zeng, Bo-Yuan Huang, Hongce Zhang, Aarti Gupta, S. Malik

Abstract: Today's Systems-on-Chips (SoCs) comprise general/special purpose programmable processors and specialized hardware modules referred to as accelerators. These accelerators serve as co-processors and are invoked through software or firmware. Thus, verifying SoCs requires co-verification of hardware with software/firmware. Co-verification using cycle-accurate hardware models is often not scalable, and requires hardware abstractions. Among various abstractions, architecture-level abstractions are very effective as they retain only the software visible state. An Instruction-Set Architecture (ISA) serves this role for processors and such ISA-like abstractions are also desirable for accelerators. Manually creating such abstractions for accelerators is tedious and error-prone, and there is a growing need for automation in deriving them from existing Register-Transfer Level (RTL) implementations. An important part of this automation is determining which state variables to retain in the abstract model. For processors and accelerators, this set of variables is naturally the Architectural State Variables (ASVs) - variables that are persistent across instructions. This paper presents the first work to automatically determine ASVs of processors and accelerators from their RTL implementations. We propose three novel algorithms based on different characteristics of ASVs. Each algorithm provides a sound abstraction, i.e., an over-approximate set of ASVs. The quality of the abstraction is measured by the size of the set of ASVs computed. Experiments on several processors and accelerators demonstrate that these algorithms perform best in different cases, and by combining them a high-quality set of ASVs can be found in reasonable time.

Citations: 4

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643573

Yitu Wang, Zhenhua Zhu, Fan Chen, Mingyuan Ma, Guohao Dai, Yu Wang, Hai Helen Li, Yiran Chen

Abstract: Personalized recommendation systems are widely used in many Internet services. The sparse embedding lookup in recommendation models dominates the computational cost of inference due to its intensive irregular memory accesses. Applying resistive random access memory (ReRAM) based process-in-memory (PIM) architecture to accelerate recommendation processing can avoid data movements caused by off-chip memory accesses. However, naïve adoption of ReRAM-based DNN accelerators leads to low computation parallelism and severe under-utilization of computing resources, which is caused by the fine-grained inner-product in feature interaction. In this paper, we propose Rerec, an architecture-algorithm co-designed accelerator, which specializes in fine-grained ReRAM-based inner-product engines with an access-aware mapping algorithm for recommendation inference. At the architecture level, we reduce the size and increase the number of crossbars. The crossbars are fully connected by Analog-to-Digital Converters (ADCs) in one inner-product engine, which can adapt to the fine-grained and irregular computational patterns and improve the processing parallelism. We further explore trade-offs of (i) crossbar size vs. hardware utilization, and (ii) ADC implementation vs. area/energy efficiency to optimize the design. At the algorithm level, we propose a novel access-aware mapping (AAM) algorithm to optimize resource allocations. Our AAM algorithm tackles the problems of (i) the workload imbalance and (ii) the long recommendation inference latency induced by the great variance of access frequency of embedding vectors. Experimental results show that Rerec achieves 7.69x speedup compared with a ReRAM-based baseline design. Compared to CPU and the state-of-the-art recommendation accelerator, Rerec demonstrates 29.26x and 3.48x performance improvement, respectively.
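The workload imbalance mentioned in this abstract comes from a few embedding vectors being accessed far more often than the rest. As a rough illustration only, the sketch below spreads vectors over inner-product engines with a greedy longest-processing-time heuristic. This is a generic load-balancing technique, not the AAM algorithm from the paper, and `balance_by_access`, the frequency table, and the engine count are all hypothetical.

```python
import heapq
from typing import Dict, List


def balance_by_access(freq: Dict[int, int], num_engines: int) -> List[List[int]]:
    """Greedily assign embedding vectors to engines, hottest vector first.

    freq maps a (hypothetical) embedding-vector id to its access count.
    Each vector goes to the engine with the smallest accumulated load.
    """
    # Min-heap of (accumulated access count, engine index).
    heap = [(0, engine) for engine in range(num_engines)]
    heapq.heapify(heap)
    groups: List[List[int]] = [[] for _ in range(num_engines)]

    # Sort by descending frequency; break ties by id for determinism.
    for vec_id, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        load, engine = heapq.heappop(heap)
        groups[engine].append(vec_id)
        heapq.heappush(heap, (load + count, engine))
    return groups


if __name__ == "__main__":
    # Illustrative access counts with a heavy skew toward vector 0.
    access_counts = {0: 100, 1: 60, 2: 50, 3: 10, 4: 5, 5: 5}
    for engine, members in enumerate(balance_by_access(access_counts, 2)):
        total = sum(access_counts[v] for v in members)
        print(f"engine {engine}: vectors {members}, total accesses {total}")
```

With the illustrative counts above, both engines end up with the same total number of accesses, which is the kind of balance an access-aware mapping aims for.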

Citations: 6

AMF-Placer: High-Performance Analytical Mixed-size Placer for FPGA

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643574

Tingyuan Liang, Gengjie Chen, Jieru Zhao, Sharad Sinha, Wei Zhang

Abstract: To enable the performance optimization of application mapping on modern field-programmable gate arrays (FPGAs), certain critical path portions of the designs might be prearranged into many multi-cell macros during synthesis. These movable macros with constraints of shape and resources lead to challenging mixed-size placement for FPGA designs, which cannot be addressed by previous analytical placers. In this work, we propose AMF-Placer, an open-source Analytical Mixed-size FPGA placer supporting mixed-size placement on FPGA, with an interface to Xilinx Vivado. To speed up the convergence and improve the quality of the placement, AMF-Placer is equipped with a series of new techniques for wirelength optimization, cell spreading, packing, and legalization. Based on a set of the latest large open-source benchmarks from various domains for Xilinx Ultrascale FPGAs, experimental results indicate that AMF-Placer can improve HPWL by 20.4%-89.3% and reduce runtime by 8.0%-84.2%, compared to the baseline. Furthermore, utilizing the parallelism of the proposed algorithms, with 8 threads, the placement procedure can be accelerated by 2.41x on average.
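The HPWL metric quoted in this abstract is the half-perimeter wirelength, a standard placement estimate: for each net, take the bounding box of its pins and add its width and height, then sum over all nets. A minimal sketch follows; the function name `hpwl` and the net/pin layout are illustrative assumptions and are not taken from AMF-Placer.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of bounding-box half perimeters over all nets."""
    total = 0.0
    for pins in nets.values():
        if len(pins) < 2:
            # A net with fewer than two pins contributes no wirelength.
            continue
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


if __name__ == "__main__":
    # Hypothetical pin coordinates for two nets.
    example_nets = {
        "n1": [(0.0, 0.0), (3.0, 4.0)],
        "n2": [(1.0, 1.0), (2.0, 5.0), (4.0, 2.0)],
    }
    print(hpwl(example_nets))
```

A lower total for the same netlist indicates a tighter placement, which is why the abstract reports improvement as a percentage change in HPWL.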

Citations: 7

Heterogeneous Manycore Architectures Enabled by Processing-in-Memory for Deep Learning: From CNNs to GNNs: (ICCAD Special Session Paper)

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643559

Biresh Kumar Joardar, Aqeeb Iqbal Arka, J. Doppa, P. Pande, Hai Helen Li, K. Chakrabarty

Citations: 2

A Framework for Area-efficient Multi-task BERT Execution on ReRAM-based Accelerators

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643471

Myeonggu Kang, Hyein Shin, Jaekang Shin, L. Kim

Abstract: With its superior algorithmic performance, BERT has become the de-facto standard model for various NLP tasks. Accordingly, multiple BERT models have been adopted on a single system, which is also called multi-task BERT. Although the ReRAM-based accelerator shows sufficient potential to execute a single BERT model by adopting in-memory computation, processing multi-task BERT on the ReRAM-based accelerator greatly increases the overall area due to multiple fine-tuned models. In this paper, we propose a framework for area-efficient multi-task BERT execution on the ReRAM-based accelerator. First, we decompose the fine-tuned model of each task by utilizing the base-model. After that, we propose a two-stage weight compressor, which shrinks the decomposed models by analyzing the properties of the ReRAM-based accelerator. We also present a profiler to generate hyper-parameters for the proposed compressor. By sharing the base-model and compressing the decomposed models, the proposed framework successfully reduces the total area of the ReRAM-based accelerator without an additional training procedure. It requires 0.26x the area of the baseline while maintaining the algorithmic performance.
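One way to picture "decomposing a fine-tuned model by utilizing the base-model" is to store each task as a shared base weight matrix plus a small per-task difference. The sketch below assumes an additive difference with simple magnitude thresholding. Both choices are assumptions made for illustration; the abstract does not specify the decomposition or the two-stage compressor, and the names `decompose` and `reconstruct` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
SparseDelta = Dict[Tuple[int, int], float]


def decompose(base: Matrix, tuned: Matrix, threshold: float) -> SparseDelta:
    """Keep only the entries where the fine-tuned weight differs from the
    shared base weight by at least `threshold` (assumed additive delta)."""
    delta: SparseDelta = {}
    for i, (base_row, tuned_row) in enumerate(zip(base, tuned)):
        for j, (b, t) in enumerate(zip(base_row, tuned_row)):
            diff = t - b
            if abs(diff) >= threshold:
                delta[(i, j)] = diff
    return delta


def reconstruct(base: Matrix, delta: SparseDelta) -> Matrix:
    """Approximate the fine-tuned weights from the base plus sparse delta."""
    out = [row[:] for row in base]
    for (i, j), diff in delta.items():
        out[i][j] += diff
    return out


if __name__ == "__main__":
    # Hypothetical 2x2 weights; only one entry changes noticeably.
    base_w = [[0.5, -1.0], [2.0, 0.25]]
    task_w = [[0.5, -0.5], [2.0, 0.2578125]]
    task_delta = decompose(base_w, task_w, threshold=0.125)
    print(task_delta)
    print(reconstruct(base_w, task_delta))
```

Under these assumptions, each extra task costs only the stored differences rather than a full copy of the model, which is the intuition behind sharing the base model across tasks.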

Citations: 1

HeteroCPPR: Accelerating Common Path Pessimism Removal with Heterogeneous CPU-GPU Parallelism

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643457

Zizheng Guo, Tsung-Wei Huang, Yibo Lin

Abstract: Common path pessimism removal (CPPR) is a key step to eliminating unwanted pessimism during static timing analysis (STA). Unwanted pessimism will force designers and optimization algorithms to waste a significant yet unnecessary amount of effort on fixing paths that meet the intended timing constraints. However, CPPR is extremely time-consuming and can incur 10–100× runtime overheads to complete. Existing solutions for speeding up CPPR are architecturally constrained by CPU-only parallelism, and their runtimes do not scale beyond 8–16 cores. In this paper, we introduce HeteroCPPR, a new algorithm to accelerate CPPR by harnessing the power of heterogeneous CPU-GPU parallelism. We devise an efficient CPU-GPU task decomposition strategy and highly optimized GPU kernels to handle CPPR that scales to large numbers of paths. Also, HeteroCPPR can scale to multiple GPUs. As an example, HeteroCPPR is up to 16× faster than a state-of-the-art CPU-parallel CPPR algorithm for completing the analysis of 10K post-CPPR critical paths in a million-gate design on a machine with 40 CPUs and 4 GPUs.
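The pessimism that CPPR removes arises because early and late analyses assign different delays to clock-tree segments that the launch and capture clock paths actually share. The usual correction is a credit equal to the late-minus-early delay summed over the common segments, added back to the slack. The sketch below shows that calculation for a single path pair. It is a textbook-style illustration, not the HeteroCPPR algorithm or its GPU kernels, and the data layout and the name `cppr_credit` are assumptions.

```python
from typing import List, Tuple

# (pin name, early delay, late delay) for one clock-tree segment.
Segment = Tuple[str, float, float]


def cppr_credit(launch: List[Segment], capture: List[Segment]) -> float:
    """Sum (late - early) over the common prefix of two clock paths.

    Both paths are listed from the clock root toward the flip-flop.
    """
    credit = 0.0
    for (pin_l, early, late), (pin_c, _, _) in zip(launch, capture):
        if pin_l != pin_c:
            # The paths diverge here; later segments are not shared.
            break
        credit += late - early
    return credit


if __name__ == "__main__":
    # Hypothetical clock paths sharing the root and one buffer.
    launch_path = [("clk", 0.0, 0.0), ("buf1", 1.0, 1.25), ("buf2", 0.5, 0.75)]
    capture_path = [("clk", 0.0, 0.0), ("buf1", 1.0, 1.25), ("buf3", 0.5, 1.0)]
    credit = cppr_credit(launch_path, capture_path)
    pre_cppr_slack = -0.125  # illustrative pessimistic slack
    print(credit, pre_cppr_slack + credit)
```

In this illustration the path fails timing before the credit is applied and passes afterwards, which is the kind of false violation the abstract describes. Doing this for many candidate paths is what makes CPPR expensive.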

Citations: 10