{"title":"Ising-FPGA","authors":"Ankit Mondal, Ankur Srivastava","doi":"10.1145/3411511","DOIUrl":"https://doi.org/10.1145/3411511","url":null,"abstract":"The Ising model has been explored as a framework for modeling NP-hard problems, with several diverse systems proposed to solve it. The Magnetic Tunnel Junction– (MTJ) based Magnetic RAM is capable of replacing CMOS in memory chips. In this article, we propose the use of MTJs for representing the units of an Ising model and leveraging its intrinsic physics for finding the ground state of the system through annealing. We design the structure of a basic MTJ-based Ising cell capable of performing the functions essential to an Ising solver. The hardware overhead of the Ising model is analyzed, and a technique to use the basic Ising cell for scaling to large problems is described. We then go on to propose Ising-FPGA, a parallel and reconfigurable architecture that can be used to map a large class of NP-hard problems, and show how a standard Place and Route tool can be utilized to program the Ising-FPGA. The effects of this hardware platform on our proposed design are characterized and methods to overcome these effects are prescribed. We discuss how three representative NP-hard problems can be mapped to the Ising model. Further, we suggest ways to simplify these problems to reduce the use of hardware and analyze the impact of these simplifications on the quality of solutions. Simulation results show the effectiveness of MTJs as Ising units by producing solutions close/comparable to the optimum and demonstrate that our design methodology holds the capability to account for the effects of the hardware.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"91 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81465049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FaultDroid","authors":"Indrani Roy, C. Rebeiro, Aritra Hazra, S. Bhunia","doi":"10.1145/3410336","DOIUrl":"https://doi.org/10.1145/3410336","url":null,"abstract":"Fault attacks belong to a potent class of implementation-based attacks that can compromise a crypto-device within a few milliseconds. Out of the large numbers of faults that can occur in the device, only a very few are exploitable in terms of leaking the secret key. Ignorance of this fact has resulted in countermeasures that have either significant overhead or inadequate protection. This article presents a framework, referred to as FaultDroid, for automated vulnerability analysis of fault attacks. It explores the entire fault attack space, identifies the single/multiple fault scenarios that can be exploited by a differential fault attack, rank-orders them in terms of criticality, and provides design guidance to mitigate the vulnerabilities at low cost. The framework enables a designer to automatically evaluate the fault attack vulnerabilities of a block cipher implementation and then incorporate efficient countermeasures. FaultDroid uses a formal model of fault attacks on a high-level specification of a block cipher and hence is equally applicable to both software and hardware implementation of the cipher. As case studies, we employ FaultDroid to comprehensively evaluate the fault scenarios in several common ciphers—AES, CLEFIA, CAMELLIA, SMS4, SIMON, PRESENT, and GIFT—and assess their vulnerability.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"194 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79803252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
YongTing Hu, Marcel Mettler, Daniel Mueller-Gritschneder, Thomas Wild, A. Herkersdorf, Ulf Schlichtmann
{"title":"Machine Learning Approaches for Efficient Design Space Exploration of Application-Specific NoCs","authors":"YongTing Hu, Marcel Mettler, Daniel Mueller-Gritschneder, Thomas Wild, A. Herkersdorf, Ulf Schlichtmann","doi":"10.1145/3403584","DOIUrl":"https://doi.org/10.1145/3403584","url":null,"abstract":"In many Multi-Processor Systems-on-Chip (MPSoCs), traffic between cores is unbalanced. This motivates the use of an application-specific Network-on-Chip (NoC) that is customized and can provide a high performance at low cost in terms of power and area. However, finding an optimized application-specific NoC architecture is a challenging task due to the huge design space. This article proposes to apply machine learning approaches for this task. Using graph rewriting, the NoC Design Space Exploration (DSE) is modelled as a Markov Decision Process (MDP). Monte Carlo Tree Search (MCTS), a technique from reinforcement learning, is used as search heuristic. Our experimental results show that—with the same cost function and exploration budget—MCTS finds superior NoC architectures compared to Simulated Annealing (SA) and a Genetic Algorithm (GA). However, the NoC DSE process suffers from the high computation time due to expensive cycle-accurate SystemC simulations for latency estimation. This article therefore additionally proposes to replace latency simulation by fast latency estimation using a Recurrent Neural Network (RNN). The designed RNN is sufficiently general for latency estimation on arbitrary NoC architectures. Our experiments show that compared to SystemC simulation, the RNN-based latency estimation offers a similar speed-up as the widely used Queuing Theory (QT). Yet, in terms of estimation accuracy and fidelity, the RNN is superior to QT, especially for high-traffic scenarios. When replacing SystemC simulations with the RNN estimation, the obtained solution quality decreases only slightly, whereas it suffers significantly when QT is used.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"25 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2020-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80733647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable Network-on-Chip Security Architecture","authors":"Subodha Charles, P. Mishra","doi":"10.1145/3406661","DOIUrl":"https://doi.org/10.1145/3406661","url":null,"abstract":"Growth of the Internet-of-things has led to complex system-on-chips (SoCs) being used in the edge devices in IoT applications. The increased complexity is demanding designers to consider several critical factors, such as dynamic requirement changes, long application life, mass production, and tight time-to-market deadlines. These requirements lead to more complex security concerns. SoC manufacturers outsource some of the intellectual property cores integrated on the SoC to untrusted third-party vendors. The untrusted intellectual properties can contain malicious implants, which can launch attacks using the resources provided by the on-chip interconnection network, commonly known as the network-on-chip (NoC). Existing efforts on securing NoC have considered lightweight encryption, authentication, and other attack detection mechanisms such as denial-of-service and buffer overflows. Unfortunately, these approaches focus on designing statically optimized security solutions. As a result, they are not suitable for many IoT systems with long application life and dynamic requirement changes. There is a critical need to design reconfigurable security architectures that can be dynamically tuned based on changing requirements. In this article, we propose a tier-based reconfigurable security architecture that can adapt to different use-case scenarios. We explore how to design an efficient reconfigurable architecture that can support three popular NoC security mechanisms (encryption, authentication, and denial-of-service attack detection and localization) and implement suitable dynamic reconfiguration techniques. We evaluate our proposed framework by running standard benchmarks enabling different tiers of security and provide a comprehensive analysis of how different levels of security can affect application performance, energy efficiency, and area overhead.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"38 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2020-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73178818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wire Load Oriented Analog Routing with Matching Constraints","authors":"Hao-Yu Chi, C. Liu, Hung-Ming Chen","doi":"10.1145/3403932","DOIUrl":"https://doi.org/10.1145/3403932","url":null,"abstract":"As design complexity is increased exponentially, electronic design automation (EDA) tools are essential to reduce design efforts. However, the analog layout design has still been done manually for decades because it is a sensitive and error-prone task. Tool-generated layouts are still not well-accepted by analog designers due to the performance loss under non-ideal effects. Most previous works focus on adding more layout constraints on the analog placement. Routing the nets is thus considered as a trivial step that can be done by typical digital routing methodology, which is to use vias to connect every horizontal and vertical lines. Those extra vias will significantly increase the wire loads and degrade the circuit performance. Therefore, in this article, a wire load oriented analog routing methodology is proposed to reduce the number of layer changing of each routing net. Wire load is considered in the optimization goal as well as the wire length to keep the circuit performance after layout, while the analog layout constraints like symmetry and length matching are still satisfied during routing. As shown in the experimental results, this approach significantly reduces the wire load and performance loss after layout with little overhead on wire length.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"15 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2020-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89428557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nektar Xama, M. Andraud, Jhon Gomez, B. Esen, Wim Dobbelaere, Ronny Vanhooren, Anthony Coyette, G. Gielen
{"title":"Machine Learning-based Defect Coverage Boosting of Analog Circuits under Measurement Variations","authors":"Nektar Xama, M. Andraud, Jhon Gomez, B. Esen, Wim Dobbelaere, Ronny Vanhooren, Anthony Coyette, G. Gielen","doi":"10.1145/3408063","DOIUrl":"https://doi.org/10.1145/3408063","url":null,"abstract":"Safety-critical and mission-critical systems, such as airplanes or (semi-)autonomous cars, are relying on an ever-increasing number of embedded integrated circuits. Consequently, there is a need for complete defect coverage during the testing of these circuits to guarantee their functionality in the field. In this context, reducing the escape rate of defects during production testing is crucial, and significant progress has been made to this end. However, production testing using automatic test equipment is subject to various measurement parasitic variations, which may have a negative impact on the testing procedure and therefore limit the final defect coverage. To tackle this issue, this article proposes an improved test flow targeting increased analog defect coverage, both at the system and block levels, by analyzing and improving the coverage of typical functional and structural tests under these measurement variations. To illustrate the flow, the technique of inserting a pseudo-random signal at available circuit nodes and applying machine learning techniques to its response is presented. A DC-DC converter, derived from an industrial product, is used as a case study to validate the flow. In short, results show that system-level tests for the converter suffer strongly from the measurement variations and are limited to just under 80% coverage, even when applying the proposed test flow. Block-level testing, however, can achieve only 70% fault coverage without improvements but is able to consistently achieve 98% of fault coverage at a cost of at most 2% yield loss with the proposed machine learning–based boosting technique.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"12 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2020-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85390109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kang Liu, Haoyu Yang, Yuzhe Ma, Benjamin Tan, Bei Yu, Evangeline F. Y. Young, R. Karri, S. Garg
{"title":"Adversarial Perturbation Attacks on ML-based CAD","authors":"Kang Liu, Haoyu Yang, Yuzhe Ma, Benjamin Tan, Bei Yu, Evangeline F. Y. Young, R. Karri, S. Garg","doi":"10.1145/3408288","DOIUrl":"https://doi.org/10.1145/3408288","url":null,"abstract":"There is substantial interest in the use of machine learning (ML)-based techniques throughout the electronic computer-aided design (CAD) flow, particularly those based on deep learning. However, while deep learning methods have surpassed state-of-the-art performance in several applications, they have exhibited intrinsic susceptibility to adversarial perturbations—small but deliberate alterations to the input of a neural network, precipitating incorrect predictions. In this article, we seek to investigate whether adversarial perturbations pose risks to ML-based CAD tools, and if so, how these risks can be mitigated. To this end, we use a motivating case study of lithographic hotspot detection, for which convolutional neural networks (CNN) have shown great promise. In this context, we show the first adversarial perturbation attacks on state-of-the-art CNN-based hotspot detectors; specifically, we show that small (on average 0.5% modified area), functionality preserving, and design-constraint-satisfying changes to a layout can nonetheless trick a CNN-based hotspot detector into predicting the modified layout as hotspot free (with up to 99.7% success in finding perturbations that flip a detector’s output prediction, based on a given set of attack constraints). We propose an adversarial retraining strategy to improve the robustness of CNN-based hotspot detection and show that this strategy significantly improves robustness (by a factor of ~3) against adversarial attacks without compromising classification accuracy.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"41 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2020-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91188615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hannah Szentimrey, Abeer Y. Al-Hyari, Jérémy Foxcroft, T. Martin, D. Noel, G. Grewal, S. Areibi
{"title":"Machine Learning for Congestion Management and Routability Prediction within FPGA Placement","authors":"Hannah Szentimrey, Abeer Y. Al-Hyari, Jérémy Foxcroft, T. Martin, D. Noel, G. Grewal, S. Areibi","doi":"10.1145/3373269","DOIUrl":"https://doi.org/10.1145/3373269","url":null,"abstract":"Placement for Field Programmable Gate Arrays (FPGAs) is one of the most important but time-consuming steps for achieving design closure. This article proposes the integration of three unique machine learning models into the state-of-the-art analytic placement tool GPlace3.0 with the aim of significantly reducing placement runtimes. The first model, MLCong, is based on linear regression and replaces the computationally expensive global router currently used in GPlace3.0 to estimate switch-level congestion. The second model, DLManage, is a convolutional encoder-decoder that uses heat maps based on the switch-level congestion estimates produced by MLCong to dynamically determine the amount of inflation to apply to each switch to resolve congestion. The third model, DLRoute, is a convolutional neural network that uses the previous heat maps to predict whether or not a placement solution is routable. Once a placement solution is determined to be routable, further optimization may be avoided, leading to improved runtimes. Experimental results obtained using 372 benchmarks provided by Xilinx Inc. show that when all three models are integrated into GPlace3.0, placement runtimes decrease by an average of 48%.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"1 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2020-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80728646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qicheng Huang, Chenlei Fang, Soumya Mittal, R. D. Blanton
{"title":"Towards Smarter Diagnosis","authors":"Qicheng Huang, Chenlei Fang, Soumya Mittal, R. D. Blanton","doi":"10.1145/3398267","DOIUrl":"https://doi.org/10.1145/3398267","url":null,"abstract":"Given the inherent perturbations during the fabrication process of integrated circuits that lead to yield loss, diagnosis of failing chips is a mitigating method employed during both yield ramping and high-volume manufacturing for yield learning. However, various uncertainties in the fabrication process bring a number of challenges, resulting in diagnosis with undesirable outcomes or low efficiency, including, for example, diagnosis failure, bad resolution, and extremely long runtime. It would therefore be very beneficial to have a comprehensive preview of diagnostic outcomes beforehand, which allows fail logs to be prioritized in a more reasonable way for smarter allocation of diagnosis resources. In this work, we propose a learning-based previewer, which is able to predict five aspects of diagnostic outcomes for a failing IC, including diagnosis success, defect count, failure type, resolution, and runtime magnitude. The previewer consists of three classification models and one regression model, where Random Forest classification and regression are used. Experiments on a 28 nm test chip and a high-volume 90 nm part demonstrate that the predictors can provide accurate prediction results, and in a virtual application scenario the overall previewer can bring up to 9× speed-up for the test chip and 6× for the high-volume part.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"92 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2020-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77072196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity","authors":"Jingweijia Tan, Kaige Yan, S. Song, Xin Fu","doi":"10.1145/3408060","DOIUrl":"https://doi.org/10.1145/3408060","url":null,"abstract":"This article presents a novel energy-efficient cache design for massively parallel, throughput-oriented architectures like GPUs. Unlike L1 data cache on modern GPUs, L2 cache shared by all of the streaming multiprocessors is not the primary performance bottleneck, but it does consume a large amount of chip energy. We observe that L2 cache is significantly underutilized by spending 95.6% of the time storing useless data. If such “dead time” on L2 is identified and reduced, L2’s energy efficiency can be drastically improved. Fortunately, we discover that the SIMT programming model of GPUs provides a unique feature among threads: instruction-level data locality similarity, which can be used to accurately predict the data re-reference counts at L2 cache block level. We propose a simple design that leverages this Locality Similarity to build an energy-efficient GPU L2 Cache, named LoSCache. Specifically, LoSCache uses the data locality information from a small group of cooperative thread arrays to dynamically predict the L2-level data re-reference counts of the remaining cooperative thread arrays. After that, specific L2 cache lines can be powered off if they are predicted to be “dead” after certain accesses. Experimental results on a wide range of applications demonstrate that our proposed design can significantly reduce the L2 cache energy by an average of 64% with only 0.5% performance loss. In addition, LoSCache is cost effective, independent of the scheduling policies, and compatible with the state-of-the-art L1 cache designs for additional energy savings.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"27 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88938698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}