Title: An Agile Precision-Tunable CNN Accelerator based on ReRAM
Authors: Yintao He, Ying Wang, Yongchen Wang, Huawei Li, Xiaowei Li
DOI: 10.1109/iccad45719.2019.8942163
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: Precision tuning is a popular approximate-computing approach that trades excess computation exactness for power and efficiency gains. In particular, it has proven useful for reducing the computation and memory overhead of deep neural networks in embedded and IoT settings. However, the hardware switching overhead of precision tuning severely limits its applicability and its effectiveness at saving energy by reacting quickly to changes in environment, user constraints, or input quality. This work is the first to investigate the feasibility of agile, cost-free precision tuning for neural network accelerators that benefit from approximate computing. The proposed processing-in-memory (PIM) CNN accelerators fully exploit the normally-off characteristics of memristor crossbars to achieve instant network precision tuning without a model-reloading penalty. With the proposed neural parameter mapping policy and a novel mixed-model training method, the ReRAM-based accelerator incurs negligible precision-switching latency and power consumption compared with traditional variable-precision accelerators. The mixed-model training unifies neural models of different precisions in a single ReRAM array without compromising accuracy, and the accelerator saves 58.3%-62.47% of area compared with conventional designs that must program multiple independent models into ReRAM arrays for precision tuning.
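The abstract does not spell out the parameter mapping policy. As a generic illustration of the underlying idea (a single stored high-precision model whose most significant bit-slices double as the low-precision model), here is a minimal Python sketch; the bit-slice scheme is a hypothetical stand-in, not the paper's actual policy:

```python
import numpy as np

def quantize(w, bits):
    """Uniformly quantize weights in [-1, 1) to signed `bits`-bit integers."""
    scale = 2 ** (bits - 1)
    return np.clip(np.round(w * scale), -scale, scale - 1).astype(int)

def bit_slices(q, bits):
    """Split offset-coded weights into per-bit 0/1 slices, MSB first, as they
    might be mapped onto separate crossbar cells (hypothetical mapping)."""
    offset = q + 2 ** (bits - 1)              # offset-binary: all values >= 0
    return [(offset >> (bits - 1 - i)) & 1 for i in range(bits)]

def reconstruct(slices, keep, bits):
    """Rebuild a weight from only the `keep` most significant slices."""
    acc = sum(s << (bits - 1 - i) for i, s in enumerate(slices[:keep]))
    return acc - 2 ** (bits - 1)              # back to signed range

w = np.array([0.5, -0.25, 0.8125])
q = quantize(w, 8)
slices = bit_slices(q, 8)
full = reconstruct(slices, 8, 8)   # all slices: exact quantized weights
low = reconstruct(slices, 4, 8)    # top 4 slices only: coarser approximation
```

Reading fewer slices trades accuracy for energy without reprogramming the array, which is the flavor of agility the abstract claims.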
Title: Embedding Binary Perceptrons in FPGA to improve Area, Power and Performance
Authors: Ankit Wagle, E. Azari, S. Vrudhula
DOI: 10.1109/iccad45719.2019.8942071
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: For the flexibility of implementing any given Boolean function, FPGAs use reconfigurable building blocks called LUTs. The price of this reconfigurability is the large number of registers and multiplexers required to construct the FPGA. While researchers have worked on complex LUT structures to reduce area and power for several years, most of these implementations come at the cost of a performance penalty. This paper demonstrates simultaneous improvements in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs), also known as binary perceptrons. A TLC can implement a complex threshold function that, if built from conventional gates, would require several levels of logic. TLCs require only 7 SRAM cells and are significantly faster than conventional LUTs. The proposed FPGA architecture was implemented with 28nm FDSOI standard cells and evaluated on ISCAS-85, ISCAS-89, and several large industrial designs. Experiments show average reductions of 18.1% in configuration registers, 18.1% in multiplexer count, 12.3% in Basic Logic Element (BLE) area, and 16.3% in BLE power, along with a 5.9% improvement in operating frequency and slight reductions in track count, routing area, and routing power. The improvements are also demonstrated on a physically designed version of the architecture.
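A threshold function of the kind a TLC implements is just a weighted-sum comparison. A functional sketch in Python (behavioral only, not the 7-SRAM-cell circuit):

```python
def threshold_gate(inputs, weights, T):
    """Binary perceptron: output 1 iff the weighted sum of 0/1 inputs reaches T."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= T)

# 3-input majority, a classic threshold function: with unit weights and
# threshold 2 it fires when at least two inputs are 1. Built from conventional
# gates, the same function needs two levels of AND/OR logic.
def maj(a, b, c):
    return threshold_gate([a, b, c], [1, 1, 1], 2)
```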
Title: elfPlace: Electrostatics-based Placement for Large-Scale Heterogeneous FPGAs
Authors: Wuxi Li, Yibo Lin, D. Pan
DOI: 10.1109/iccad45719.2019.8942075
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: elfPlace is a flat nonlinear placement algorithm for large-scale heterogeneous field-programmable gate arrays (FPGAs). We adopt the analogy between placement and electrostatic systems first proposed by ePlace and extend it to handle the heterogeneous blocks in FPGA designs. To achieve satisfactory solution quality with fast, robust numerical convergence, we propose an augmented Lagrangian formulation together with a preconditioning technique and a normalized subgradient-based multiplier updating scheme. Beyond pure wirelength minimization, we also propose a unified instance-area adjustment scheme that simultaneously optimizes routability, pin density, and downstream clustering compatibility. Experiments on the ISPD 2016 benchmark suite show that elfPlace outperforms four state-of-the-art FPGA placers, UTPlaceF, RippleFPGA, GPlace3.0, and UTPlaceF-DL, by 13.6%, 11.3%, 8.9%, and 7.1% in routed wirelength, respectively, with competitive runtime.
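The augmented Lagrangian machinery can be illustrated on a toy one-dimensional problem. This generic sketch (not elfPlace's density formulation, whose constraints, preconditioner, and normalized subgradient scheme are far richer) shows the inner minimization and outer multiplier-update loop:

```python
def solve_aug_lagrangian(f_grad, c, c_grad, x0, lam=0.0, mu=1.0,
                         inner_steps=200, outer_steps=20, lr=0.05):
    """Minimize f(x) subject to c(x) = 0 via the augmented Lagrangian
    L(x) = f(x) + lam * c(x) + (mu / 2) * c(x)**2.
    The multiplier `lam` is updated from the constraint residual, loosely
    analogous to a density-penalty multiplier update (illustrative only)."""
    x = float(x0)
    for _ in range(outer_steps):
        for _ in range(inner_steps):          # inner gradient descent on L
            g = f_grad(x) + (lam + mu * c(x)) * c_grad(x)
            x -= lr * g
        lam += mu * c(x)                      # multiplier update
    return x, lam

# Toy problem: minimize x^2 subject to x - 1 = 0  ->  optimum at x = 1.
x_opt, _ = solve_aug_lagrangian(lambda x: 2 * x, lambda x: x - 1,
                                lambda x: 1.0, x0=0.0)
```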
Title: GeniusRoute: A New Analog Routing Paradigm Using Generative Neural Network Guidance
Authors: Keren Zhu, Mingjie Liu, Yibo Lin, Biying Xu, Shaolan Li, Xiyuan Tang, Nan Sun, D. Pan
DOI: 10.1109/iccad45719.2019.8942164
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: Due to sensitive layout-dependent effects and varied performance metrics, analog routing automation for performance-driven layout synthesis is difficult to generalize. Existing research has proposed a number of heuristic layout constraints targeting specific performance metrics, but previous frameworks fail to automatically combine routing with human intelligence. This paper proposes a novel, fully automated analog routing paradigm that leverages machine learning to provide routing guidance, mimicking sophisticated manual layout approaches. Experiments show that the proposed methodology obtains significant improvements over existing techniques and achieves performance competitive with manual layouts, while generalizing to circuits of different functionality.
Title: Making the Fault-Tolerance of Emerging Neural Network Accelerators Scalable
Authors: Tao Liu, Wujie Wen
DOI: 10.1109/iccad45719.2019.8942073
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: Deep neural network (DNN) accelerators built upon emerging technologies such as the memristor are gaining increasing research attention because of the impressive computing efficiency brought by processing-in-memory. One critical challenge faced by these promising accelerators, however, is their poor reliability: each weight, stored as the memristance or resistance value of a device, suffers large uncertainty from unique device physical limitations (e.g., stochastic programming and resistance drift), which translates into prominent test-accuracy degradation. Non-trivial retraining, weight remapping, and redundant cell fixing are popular approaches to this issue, but they have limited scalability, amounting to tedious patch-adding or bug-fixing after identifying each accelerator-dependent defect map. Scalable solutions, on the other hand, are highly desirable in the envisioned scenario of a neural network trained once in the cloud and deployed to many edge devices, each equipped with an emerging accelerator. In this paper, we discuss the challenges and requirements of fault tolerance in these new accelerators. We then show how to address the problem through a scalable algorithm-hardware co-design method, focused on unleashing the algorithmic error resilience of DNN classifiers so as to eliminate expensive defect-map-specific calibration or training from scratch.
Title: LSOracle: a Logic Synthesis Framework Driven by Artificial Intelligence: Invited Paper
Authors: Walter Lau Neto, Max Austin, Scott Temple, L. Amarù, Xifan Tang, P. Gaillardon
DOI: 10.1109/iccad45719.2019.8942145
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: The increasing complexity of modern integrated circuits (ICs) leads to systems composed of many different intellectual property (IP) blocks, known as systems-on-chip (SoCs). Such complexity demands strong expertise from engineers, who rely on expensive commercial EDA tools. To overcome this limitation, an automated open-source logic synthesis flow is required. In this context, this work proposes LSOracle, a novel automated mixed logic synthesis framework. LSOracle is the first to exploit state-of-the-art And-Inverter Graph (AIG) and Majority-Inverter Graph (MIG) logic optimizers, relying on a Deep Neural Network (DNN) to automatically decide which optimizer should handle different portions of the circuit. To do so, LSOracle applies k-way partitioning to split a DAG into multiple partitions and uses the DNN to choose the best-fit optimizer for each. Post-tech-mapping ASIC results targeting the 7nm ASAP standard cell library, for a set of mixed-logic circuits, show an average improvement in area-delay product of 6.87% (up to 10.26%) and 2.70% (up to 6.27%) over AIG and MIG, respectively. In addition, we show that for the considered circuits, LSOracle achieves an area close to that of AIGs (which delivered smaller circuits) with performance similar to that of MIGs (which delivered faster circuits).
Title: ACG-Engine: An Inference Accelerator for Content Generative Neural Networks
Authors: Haobo Xu, Ying Wang, Yujie Wang, Jiajun Li, Bosheng Liu, Yinhe Han
DOI: 10.1109/iccad45719.2019.8942169
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: The technological breakthrough in Generative Adversarial Networks (GANs) has propelled content-generative applications such as AI-based painting, style transfer, and music composition. In contrast to earlier deep learning models for prediction and categorization, however, generative networks generally rely on the instance normalization (IN) layer for better feature distribution; it performs significantly better than batch normalization (BN) in image style transfer, image-to-image translation, and similar tasks. Unlike batch or group normalization, which can be fused into convolutional layers and ignored during inference, an instance normalization layer induces intensive computation and memory access. Yet prior deep learning accelerators for traditional neural networks and GANs mostly focus on accelerating convolution and deconvolution layers and lack support for IN operations, which can become a performance bottleneck on edge devices with insufficient computational power. To address this problem, we propose an inference accelerator for content generation (ACG-Engine) that supports the fundamental operations of generative networks: convolution layers, deconvolution layers, and specifically the instance normalization layer. We perform a hardware-aware mathematical transformation of the IN operation for lower computational complexity and memory friendliness, so that it maps efficiently onto the classic 2D processing-element array. Owing to these optimization techniques, ACG-Engine achieves a 4.56X speedup and up to 29X better power efficiency than a prior baseline scheme for generative network acceleration. In addition, ACG-Engine achieves performance comparable to classic CNN-specific accelerators with negligible power and area overhead.
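The reason IN cannot simply be folded away, as the abstract notes for BN, is that its statistics depend on the current input. A minimal numpy sketch contrasting the two (illustrative only, not ACG-Engine's transformed operator):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) feature map with its own statistics,
    which must be computed at inference time for every input."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_inference(x, running_mean, running_var, eps=1e-5):
    """BN at inference uses fixed running statistics, so the normalization
    can be folded into the preceding convolution's weights offline."""
    mu = running_mean[None, :, None, None]
    v = running_var[None, :, None, None]
    return (x - mu) / np.sqrt(v + eps)

x = np.random.default_rng(0).normal(size=(2, 3, 8, 8))  # N, C, H, W
y = instance_norm(x)
```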
Title: A Spectral Convolutional Net for Co-Optimization of Integrated Voltage Regulators and Embedded Inductors
Authors: H. Torun, Huan Yu, N. Dasari, Venkata Chaitanya Krishna Chekuri, Arvind Singh, Jinwoo Kim, S. Lim, S. Mukhopadhyay, M. Swaminathan
DOI: 10.1109/iccad45719.2019.8942109
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: Integrated voltage regulators (IVRs) with embedded inductors are an emerging technology that provides point-of-load voltage regulation for high-performance systems. Conventional two-step approaches to IVR design can yield suboptimal results because the optimal inductor depends on the characteristics of the buck converter (BC); inductor-level trade-offs such as AC and DC resistance, inductance, and area cannot be determined independently of the BC. This co-dependency of the BC and the inductor creates a highly non-linear response surface, which necessitates co-optimization involving multiple time-consuming electromagnetics (EM) simulations. In this paper, we propose a machine-learning-based optimization methodology that eliminates EM simulations from the optimization loop to significantly reduce optimization complexity. A novel technique named the Spectral Transposed Convolutional Neural Network (S-TCNN) is presented to derive an accurate predictive model of the inductor frequency response from a small amount of training data. The derived S-TCNN is then used, along with a time-domain model of the BC, to perform multi-objective optimization that approximates the Pareto front for five objectives: inductor area, BC settling time, voltage conversion efficiency, droop, and ripple. The resulting methodology provides multiple Pareto-optimal inductors in an efficient, fully automated fashion, thereby allowing rapid determination of the optimal trade-offs among possibly conflicting design objectives. We demonstrate the proposed framework on the co-optimization of a solenoidal inductor with a magnetic core and a BC integrated on a silicon interposer.
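Approximating a Pareto front reduces to keeping the non-dominated points. A generic sketch assuming all objectives are minimized (a maximized objective such as conversion efficiency would be negated first); this illustrates the concept, not the paper's optimizer:

```python
def dominates(q, p):
    """q dominates p if q is no worse in every objective and differs from p."""
    return all(qi <= pi for qi, pi in zip(q, p)) and q != p

def pareto_front(points):
    """Keep the non-dominated points among tuples of objective values."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy 2-objective example, e.g. (inductor area, settling time):
front = pareto_front([(1, 4), (2, 2), (4, 1), (3, 3)])  # (3, 3) is dominated
```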
Title: IcySAT: Improved SAT-based Attacks on Cyclic Locked Circuits
Authors: Kaveh Shamsi, D. Pan, Yier Jin
DOI: 10.1109/iccad45719.2019.8942049
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: "Cyclic" circuit locking/camouflaging is a recently proposed direction in logic obfuscation for thwarting foundry and end-user reverse engineering. As opposed to traditional schemes, these techniques create cycles in the obfuscated circuit in a way that confuses the attacker without disrupting the combinational nature of the circuit. While such schemes can thwart the baseline SAT-based attack, the CycSAT attack was recently proposed to break them through a preprocessing step that builds a Boolean condition to avoid cyclic solutions/keys during the attack. Follow-up work, however, has suggested that extracting these conditions requires enumerating all cycles in the circuit, or that, instead of relying on these conditions preemptively, cyclic solutions must be banned individually on the fly. In this paper we present new algorithms for performing SAT-based attacks on cyclic circuits. We first propose an algorithm that produces non-cyclic conditions in polynomial time with respect to the size of the circuit, avoiding the potentially exponential runtime of explicit key banning or cycle enumeration. We then take a deeper look at the problem, discuss some fundamental limitations of extracting precise non-cyclic conditions, and propose a more complex but complete procedure for cyclic deobfuscation. We evaluate our attacks on densely cyclic obfuscated benchmark circuits.
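The crux of a non-cyclic condition is that any key the attacker accepts must induce an acyclic (combinational) circuit. A toy model in Python, with key-controlled fanin selection standing in for obfuscation muxes (a hypothetical circuit representation, not IcySAT's SAT encoding):

```python
def is_combinational(gates, key):
    """Return True iff the circuit induced by `key` is acyclic.
    `gates` maps a gate name to a function key -> list of active fanin gates,
    modeling key-controlled multiplexers (hypothetical circuit model)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {g: WHITE for g in gates}

    def dfs(g):
        color[g] = GRAY
        for fanin in gates[g](key):
            if color[fanin] == GRAY:          # back edge: a cycle through fanin
                return False
            if color[fanin] == WHITE and not dfs(fanin):
                return False
        color[g] = BLACK
        return True

    return all(dfs(g) for g in gates if color[g] == WHITE)

# The key bit selects gate b's fanin: key=1 feeds b back from a (cyclic),
# key=0 feeds b from a primary input (acyclic, hence a valid combinational key).
gates = {"a": lambda k: ["b"], "b": lambda k: ["a"] if k else []}
```

The paper's contribution is expressing this acyclicity requirement as a polynomial-size Boolean constraint for the SAT solver, rather than checking or banning keys one at a time as this sketch does.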
Title: eSRCNN: A Framework for Optimizing Super-Resolution Tasks on Diverse Embedded CNN Accelerators
Authors: Youngbeom Jung, Yeongjae Choi, Jaehyeong Sim, L. Kim
DOI: 10.1109/iccad45719.2019.8942086
Venue: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019
Abstract: CNN-based super-resolution (SR), one of the most representative low-level vision tasks, is a promising way to improve users' QoS on IoT devices that suffer from limited network bandwidth and storage capacity, by effectively enhancing image/video resolution. Although prior embedded CNN accelerators show tremendous performance and energy efficiency, they are not well suited to SR tasks because of off-chip memory accesses. In this work, we present eSRCNN, a framework that enables energy-efficient SR on diverse embedded CNN accelerators by decreasing off-chip memory accesses. The framework consists of three steps: network reformation using cross-layer weight scaling, precision minimization with priority-based quantization, and activation-map compression exploiting data locality. As a result, the energy consumption of off-chip memory accesses is reduced by up to 71.89% with less than 3.52% area overhead.