{"title":"UNTANGLE: Unlocking Routing and Logic Obfuscation Using Graph Neural Networks-based Link Prediction","authors":"Lilas Alrahis, Satwik Patnaik, Muhammad Abdullah Hanif, M. Shafique, O. Sinanoglu","doi":"10.1109/ICCAD51958.2021.9643476","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643476","url":null,"abstract":"Logic locking aims to prevent intellectual property (IP) piracy and unauthorized overproduction of integrated circuits (ICs). However, initial logic locking techniques were vulnerable to the Boolean satisfiability (SAT)-based attacks. In response, researchers proposed various SAT-resistant locking techniques such as point function-based locking and symmetric interconnection (SAT-hard) obfuscation. We focus on the latter since point function-based locking suffers from various structural vulnerabilities. The SAT-hard logic locking technique, InterLock [1], achieves a unified logic and routing obfuscation that thwarts state-of-the-art attacks on logic locking. In this work, we propose a novel link prediction-based attack, UNTANGLE, that successfully breaks InterLock in an oracle-less setting without having access to an activated IC (oracle). Since InterLock hides selected timing paths in key-controlled routing blocks, UNTANGLE reveals the gates and interconnections hidden in the routing blocks upon formulating this task as a link prediction problem. The intuition behind our approach is that ICs contain a large amount of repetition and reuse cores. Hence, UNTANGLE can infer the hidden timing paths by learning the composition of gates in the observed locked netlist or a circuit library leveraging graph neural networks. We show that circuits withstanding SAT-based and other attacks can be unlocked in seconds with 100% precision using UNTANGLE in an oracle-less setting. UNTANGLE is a generic attack platform (which we also open source [2]) that applies to multiplexer (MUX)-based obfuscation, as demonstrated through our experiments on ISCAS-85 and ITC-99 benchmarks locked using InterLock and random MUX-based locking.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130741555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peripheral Circuitry Assisted Mapping Framework for Resistive Logic-In-Memory Computing","authors":"Shuhang Zhang, Hai Helen Li, Ulf Schlichtmann","doi":"10.1109/ICCAD51958.2021.9643588","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643588","url":null,"abstract":"In-memory computing has been applied in different fields due to its superior speed and energy efficiency. Among a variety of memory technologies that have been explored, resistive memory has widely been adopted for various purposes, including Processing-In-Memory (PIM) for neural networks and Logic-In-Memory (LIM) for general logic operations. PIM has intensively been studied in recent years, while the progress in developing LIM computing falls behind. LIM computing is usually implemented based on MAGIC operations, which require inputs to be aligned regularly along rows or columns in a memory crossbar. As the intermediate data generated during the logic execution are normally scattered across the memory crossbar, alignment operations are inserted to align the data, which often costs numerous cycles and dominates the overall latency. In current MAGIC-based designs, alignment operations induce a significant overhead in either area or latency. Therefore, the Area-Latency-Product (ALP), known as a key metric for circuit performance, still has significant optimization potential in LIM computing. In this work, we leverage peripheral circuitry to conduct alignment operations and propose a novel mapping framework to optimize the latency and area costs. Intermediate data are read out, processed in peripheral circuits, then in parallel written back into target cells of the memory crossbar. The approach eliminates the use of redundant memory cells, leading to area reduction. Moreover, it enables simultaneous alignments of multiple intermediate data, which can decrease the overall latency significantly. Based on simulation results, our proposed mapping framework can achieve around 93% ALP reductions on average compared with prior designs with merely 2.13% total area overhead.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128225264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Robustness of Redundant Execution with Register File Randomization","authors":"I. Tuzov, Pablo Andreu, Laura Medina, Tomás Picornell, A. Robles, P. López, J. Flich, Carles Hernández","doi":"10.1109/ICCAD51958.2021.9643466","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643466","url":null,"abstract":"Staggered Redundant execution (SRE) is a fault-tolerance mechanism that has been widely deployed in the context of safety-critical applications. SRE not only protects the system in the presence of faults but also helps relaxing safety requirements of individual elements. However, in this paper, we show that SRE does not effectively protect the system against a wide range of faults and thus, new mechanisms to increase the diversity of homogeneous cores are needed. In this paper, we propose Register File Randomization (RFR), a low-cost diversity mechanism that significantly increases the robustness of homogeneous multicores in front of common-cause faults (CCFs) and register file wearout. Our results show that RFR completely removes the failure rate for register file CCFs for certain workloads and reduces by a factor of 5X the impact of stress related register file aging for the workloads analysed. Our implementation requires less than 50 RTL lines of code and the area (FPGA logic) overhead of RFR is less than 0.2% of a 64-bit RISC-V core FPGA implementation.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130987953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Learning-Based Arithmetic Block Identification","authors":"Zhuolun He, Ziyi Wang, Chen Bai, Haoyu Yang, Bei Yu","doi":"10.1109/ICCAD51958.2021.9643581","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643581","url":null,"abstract":"Arithmetic block identification in gate-level netlist is an essential procedure for malicious logic detection, functional verification, or macro-block optimization. We argue that existing methods suffer either scalability or performance issues. To address the problem, we propose a graph learning-based solution that promises to extract desired logic components from a complete design netlist. We further design a novel asynchronous bidirectional graph neural network (ABGNN) dedicated to representation learning on directed acyclic graphs. Experimental results on open-source RISC-V CPU designs demonstrate that our proposed solution significantly outperforms several state-of-the-art arithmetic block identification flows.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122180854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iSTELLAR: intermittent Signature aTtenuation Embedded CRYPTO with Low-Level metAl Routing","authors":"Jeremy Blackstone, D. Das, Alric Althoff, Shreyas Sen, R. Kastner","doi":"10.1109/ICCAD51958.2021.9643540","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643540","url":null,"abstract":"An adversary can exploit side-channel information such as power consumption, electromagnetic (EM) emanations, acoustic vibrations or the timing of encryption operations to derive the secret key from an electronic device. Signature aTtenuation Embedded CRYPTO with Low-Level metAl Routing (STELLAR) is a technique to mitigate power and EM-based attacks, however, it incurs 50% power overhead. This work presents iSTELLAR, which reduces the power overhead by operating STELLAR intermittently utilizing an intelligent scheduling algorithm. The proposed scheduling algorithm for iSTELLAR determines the optimal locations during the crypto operation to turn STELLAR ON, and thereby reduces the power overhead by > 30% compared to the normal STELLAR operation, while eliminating the information leakage.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121658277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Space Exploration of Approximation-Based Quadruple Modular Redundancy Circuits","authors":"Marcello Traiola, Jorge Echavarria, A. Bosio, Jürgen Teich, Ian O’Connor","doi":"10.1109/ICCAD51958.2021.9643561","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643561","url":null,"abstract":"In the last decade, Approximate Computing (AxC) has been studied as a possible alternative computing paradigm. It has been used to reduce the overhead cost of conventional fault tolerant schemes, such as the Triple Modular Redundancy (TMR). One of the most recent propositions is the concept of Quadruple Approximate Modular Redundancy (QAMR). QAMR reduces the overhead cost w.r.t. conventional TMR structures, while guaranteeing the same fault-tolerance capability. In this paper, we propose a new approximation technique to realize the QAMR and we perform a Design Space Exploration (DSE) to find QAMR Pareto-optimal implementations. Moreover, we provide the design of a new majority voter for the proposed architecture. Experimental results show that it is possible to find QAMR variants achieving area and/or delay gains compared to the TMR counterpart, for 85.4% and 97% of the examined circuits for FPGA and ASIC technologies respectively.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127925397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FedSwap: A Federated Learning based 5G Decentralized Dynamic Spectrum Access System","authors":"Zhihui Gao, Ang Li, Yunfan Gao, Bing Li, Yu Wang, Yiran Chen","doi":"10.1109/ICCAD51958.2021.9643496","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643496","url":null,"abstract":"The era of 5G extends the available spectrum from the microwave band to the millimeter-wave band. The thriving Internet of Things (IoT) also enriches the user equipment (UEs) we used in our daily life, such as smart glasses, smart watches, and drones. With such a larger spectrum and massive UEs, existing dynamic spectrum access (DSA) suffers both low spectrum utilization efficiency and unfair spectrum allocation. Thus, a more sophisticated dynamic spectrum access (DSA) system is required in the 5G context. In this paper, we propose a federated learning based system, FedSwap, the first decentralized DSA system that improves both efficiency and fairness simultaneously. In FedSwap, we deploy an improved multi-agent reinforcement learning (iMARL) algorithm on each UE, enabling UEs to share the spectrum coordinately with fewer collisions. Furthermore, we also propose a novel swapping mechanism for aggregating UEs' models periodically so that UEs can fairly share the spectrum resources. Meanwhile, the sensory data of UEs are not transmitted and hence privacy is protected. We evaluate FedSwap's performance in 5G simulations with various settings. Compared to the state-of-the-art decentralized DSA methods, FedSwap can significantly improve the efficiency and fairness of spectrum utilization.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129097121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ScaleDNN: Data Movement Aware DNN Training on Multi-GPU","authors":"Weizheng Xu, Ashutosh Pattnaik, Geng Yuan, Yanzhi Wang, Youtao Zhang, Xulong Tang","doi":"10.1109/ICCAD51958.2021.9643503","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643503","url":null,"abstract":"Training Deep Neural Networks (DNNs) models is a time-consuming process that requires immense amount of data and computation. To this end, GPUs are widely adopted to accelerate the training process. However, the delivered training performance rarely scales with the increase in the number of GPUs. The major reason behind this is the large amount of data movement that prevents the system from providing the GPUs with the required data in a timely fashion. In this paper, we propose ScaleDNN, a framework that systematically and comprehensively investigates and optimizes data-parallel training on two types of multi-GPU systems (PCIe-based and NVLink-based). Specifically, ScaleDNN performs: i) CPU-centric input batch splitting, ii) mini-batch data pre-loading, and iii) model parameter compression to effectively a) reduce the data movement between the CPU and multiple GPUs, and b) hide the data movement overheads by overlapping the data transfer with the GPU computation. Our experimental results show that ScaleDNN achieves up to 39.38%, with an average of 17.96% execution time saving over modern data parallelism on PCIe-based multi-GPU system. The corresponding execution time reduction on NVLink-based multi-GPU system is up to 19.20% with an average of 10.26%.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123359451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Validation of SoCs Security Architecture Against Timing Flows Using SystemC-based VPs","authors":"Mehran Goli, R. Drechsler","doi":"10.1109/ICCAD51958.2021.9643579","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643579","url":null,"abstract":"Modern System-on-Chips (SoCs) have been increasingly deployed in critical aspects of our lives. As a consequence, they have access to a large number of secret assets that must be protected against unauthorized access. In order to provide sound security guarantees, an SoC typically has a security architecture as authentication mechanisms to control the access of different Intellectual Properties (IPs) to secret assets. Since the SoC's security architecture cannot be changed after production, it is of utmost importance to detect any security flaws in the design phase. Moreover, to prevent costly fixes in later stages, security validation should start as early as possible. In this paper, we propose a novel approach to validate the security architecture of a given SoC against timing flows using SystemC-based Virtual Prototype (VP) and static information flow tracking technique at the system level. Experimental results on two real-world VP-based SoCs demonstrate the scalability and applicability of the proposed approach in identifying timing flows.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114193754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSR: A Skeleton-based Synthesis Flow for Hybrid Processing-in-RRAM Modes","authors":"Feng Wang, Guangyu Sun, Guojie Luo","doi":"10.1109/ICCAD51958.2021.9643493","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643493","url":null,"abstract":"Recently, the emerging resistive random access memory (RRAM) shows its potential to construct a processing-in-memory (PIM) architecture. It supports a variety of computation modes, including the digital mode and the analog mode. Both modes can perform parallel computation inside an RRAM crossbar. However, the lack of automatic synthesis flow limits their application scenarios. Although previous works implement several large-scale applications, e.g., image processing algorithms and neural networks, using these two modes, most of their implementations are designed manually or semi-manually. In our view, the lack of a specific application representation is a limiting factor for developing a synthesis flow. Therefore, in this work, we propose the skeleton as an application representation. Users can model applications and their potential parallelism in RRAM with nested skeletons and primitive operations. Then, we propose SSR, a skeleton-based flow that can automatically synthesize large-scale applications to RRAM crossbars. For an application represented in skeletons, SSR first partitions it into the digital part and the potential analog part. After that, SSR optimizes primitive operations and allocates bounding boxes to skeletons for both parts under the guide of pre-synthesis results. Finally, SSR maps bounding boxes of skeletons onto crossbars to enable pipelined computation. Experimental evaluations on several popular applications show that SSR improves throughput, latency, and area multiple times over previous works.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114577734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}