{"title":"Reverse Engineering Convolutional Neural Networks Through Side-channel Information Leaks","authors":"Weizhe Hua, Zhiru Zhang, G. Suh","doi":"10.1145/3195970.3196105","DOIUrl":"https://doi.org/10.1145/3195970.3196105","url":null,"abstract":"A convolutional neural network (CNN) model represents a crucial piece of intellectual property in many applications. Revealing its structure or weights would leak confidential information. In this paper we present novel reverse-engineering attacks on CNNs running on a hardware accelerator, where an adversary can feed inputs to the accelerator and observe the resulting off-chip memory accesses. Our study shows that even with data encryption, the adversary can infer the underlying network structure by exploiting the memory and timing side-channels. We further identify the information leakage on the values of weights when a CNN accelerator performs dynamic zero pruning for off-chip memory accesses. Overall, this work reveals the importance of hiding off-chip memory access pattern to truly protect confidential CNN models.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"50 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84471470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-Chip Deep Neural Network Storage with Multi-Level eNVM","authors":"M. Donato, Brandon Reagen, Lillian Pentecost, Udit Gupta, D. Brooks, Gu-Yeon Wei","doi":"10.1145/3195970.3196083","DOIUrl":"https://doi.org/10.1145/3195970.3196083","url":null,"abstract":"One of the biggest performance bottlenecks of today’s neural network (NN) accelerators is off-chip memory accesses [11]. In this paper, we propose a method to use multi-level, embedded nonvolatile memory (eNVM) to eliminate all off-chip weight accesses. The use of multi-level memory cells increases the probability of faults. Therefore, we co-design the weights and memories such that their properties complement each other and the faults result in no noticeable NN accuracy loss. In the extreme case, the weights in fully connected layers can be stored using a single transistor. With weight pruning and clustering, we show our technique reduces the memory area by over an order of magnitude compared to an SRAM baseline. In the case of VGG16 (130M weights), we are able to store all the weights in 4.9 mm², well within the area allocated to SRAM in modern NN accelerators [6].","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"28 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82334061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"INVITED: Bandwidth-Efficient Deep Learning","authors":"Song Han, W. Dally","doi":"10.1109/DAC.2018.8465812","DOIUrl":"https://doi.org/10.1109/DAC.2018.8465812","url":null,"abstract":"Deep learning algorithms are achieving increasingly higher prediction accuracy on many machine learning tasks. However, applying brute-force programming to data demands a huge amount of machine power to perform training and inference, and a huge amount of manpower to design the neural network models, which is inefficient. In this paper, we provide techniques to solve these bottlenecks: saving memory bandwidth for inference by model compression, saving networking bandwidth for training by gradient compression, and saving engineer bandwidth for model design by using AI to automate the design of models.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"44 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82347898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAT Based Exact Synthesis using DAG Topology Families","authors":"Winston Haaswijk, A. Mishchenko, Mathias Soeken, G. Micheli","doi":"10.1145/3195970.3196111","DOIUrl":"https://doi.org/10.1145/3195970.3196111","url":null,"abstract":"SAT based exact synthesis is a powerful technique, with applications in logic optimization, technology mapping, and synthesis for emerging technologies. However, its runtime behavior can be unpredictable and slow. In this paper, we propose to add a new type of constraint based on families of DAG topologies. Such families restrict the search space considerably and let us partition the synthesis problem in a natural way. Our approach shows significant reductions in runtime as compared to state-of-the-art implementations, by up to 63.43%. Moreover, our implementation has significantly fewer timeouts compared to baseline and reference implementations, and reduces this number by up to 61%. In fact, our topology based implementation dominates the others with respect to the number of solved instances: given a runtime bound, it solves at least as many instances as any other implementation.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"13 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78645962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Runtime Adjustment of IoT System-on-Chips for Minimum Energy Operation","authors":"M. Golanbari, M. Tahoori","doi":"10.1145/3195970.3196108","DOIUrl":"https://doi.org/10.1145/3195970.3196108","url":null,"abstract":"Energy-constrained Systems-on-Chips (SoC) are becoming major components of many emerging applications, especially in the Internet of Things (IoT) domain. Although the best energy efficiency is achieved when the SoC operates in the near-threshold region, the best operating point for maximum energy efficiency could vary depending on operating temperature, workload, and the power-gating state (power modes) of various SoC components at runtime. This paper presents a lightweight machine-learning based scheme to predict and tune the SoC to the most energy efficient supply voltage at the firmware level during runtime, considering the impacts of temperature variation and power-gating of SoC components while meeting the performance and reliability requirements. Simulation results indicate that the proposed method can determine the most energy efficient supply voltage of a circuit with high-accuracy (RMSE = 7mV), while considering the runtime performance and reliability constraints.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"105 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88988826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference","authors":"Shixuan Zheng, Yonggang Liu, S. Yin, Leibo Liu, Shaojun Wei","doi":"10.1145/3195970.3195988","DOIUrl":"https://doi.org/10.1145/3195970.3195988","url":null,"abstract":"While deep convolutional neural networks (CNNs) have emerged as the driving force of a wide range of domains, their computationally and memory intensive natures hinder the further deployment in mobile and embedded applications. Recently, CNNs with low-precision parameters have attracted much research attention. Among them, multiplier-free binary- and ternary-weight CNNs are reported to be of comparable recognition accuracy with full-precision networks, and have been employed to improve the hardware efficiency. However, even with the weights constrained to binary and ternary values, large-scale CNNs still require billions of operations in a single forward propagation pass. In this paper, we introduce a novel approach to maximally eliminate redundancy in binary- and ternary-weight CNN inference, improving both the performance and energy efficiency. The initial kernels are transformed into much fewer and sparser ones, and the output feature maps are rebuilt from the immediate results. Overall, the number of total operations in convolution is reduced. To find an efficient transformation solution for each already trained network, we propose a searching algorithm, which iteratively matches and eliminates the overlap in a set of kernels. We design a specific hardware architecture to optimize the implementation of kernel transformation. Specialized dataflow and scheduling method are proposed. Tested on SVHN, AlexNet, and VGG-16, our architecture removes 43.4%–79.9% operations, and speeds up the inference by 1.48–3.01 times.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"2 1-2","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91508867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact Algorithms for Delay-Bounded Steiner Arborescences","authors":"S. Held, B. Rockel","doi":"10.1145/3195970.3196048","DOIUrl":"https://doi.org/10.1145/3195970.3196048","url":null,"abstract":"Rectilinear Steiner arborescences under linear delay constraints play an important role for buffering. We present exact algorithms for either minimizing the total length subject to delay constraints, or minimizing the total length plus the (weighted) absolute total negative slack. Our main theoretical contribution is the first minimum cost flow formulation for embedding Steiner arborescences at minimum length subject to delay constraints, resulting in the first strongly polynomial time algorithm for this subproblem. We use the minimum cost flow formulation to quickly compute lower bounds in a branch-&-bound algorithm for optimum Steiner arborescences. We demonstrate the benefit of our new algorithm experimentally.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"28 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90260560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electro-Magnetic Analysis of GPU-based AES Implementation","authors":"Yiwen Gao, Hailong Zhang, Wei Cheng, Yongbin Zhou, Yuchen Cao","doi":"10.1145/3195970.3196042","DOIUrl":"https://doi.org/10.1145/3195970.3196042","url":null,"abstract":"In this work, for the first time, we investigate Electro-Magnetic (EM) attacks on GPU-based AES implementation. In detail, we first sample EM traces using a delicate trigger; then, we build a heuristic leakage model and a novel leakage model to exploit the simultaneous EM leakages in parallel scenarios. After that, we evaluate the effectiveness of EM attacks on GPU-based AES implementation. Our evaluation results show that GPU-based AES implementation is vulnerable to EM attacks. This work also suggests that GPU-based AES implementation needs to be protected against EM attacks in real scenarios.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"41 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86043186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"INVITED: Runtime Monitoring for Safety of Intelligent Vehicles","authors":"Kosuke Watanabe, Eunsuk Kang, Chung-Wei Lin, Shin'ichi Shiraishi","doi":"10.1145/3195970.3199856","DOIUrl":"https://doi.org/10.1145/3195970.3199856","url":null,"abstract":"Advanced driver-assistance systems (ADAS), autonomous driving, and connectivity have enabled a range of new features, but also made automotive design more complex than ever. Formal verification can be applied to establish functional correctness, but its scalability is limited due to the sheer complexity of a modern automotive system. To manage high complexity and limited development resources, one alternative is to apply runtime monitoring techniques to detect when the system transitions into an unsafe state (i.e., one where it violates a critical safety requirement). In this paper, we report on our experience integrating runtime monitoring into a development workflow and present practical design considerations on languages and tools from an industrial perspective. Using signal temporal logic (STL) [12] and the Breach [6] monitoring tool, we perform a case study showing how monitoring can be used to detect undesirable interactions between two ADAS features called Cooperative Pile-up Mitigation System (CPMS) and False-Start Prevention System (FPS). This is an initial step to utilize runtime monitoring to achieve high assurance in the design of intelligent vehicles.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"59 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91095555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Raise Your Game for Split Manufacturing: Restoring the True Functionality Through BEOL","authors":"Satwik Patnaik, M. Ashraf, J. Knechtel, O. Sinanoglu","doi":"10.1145/3195970.3196100","DOIUrl":"https://doi.org/10.1145/3195970.3196100","url":null,"abstract":"Split manufacturing (SM) seeks to protect against piracy of intellectual property (IP) in chip designs. Here we propose a scheme to manipulate both placement and routing in an intertwined manner, thereby increasing the resilience of SM layouts. Key stages of our scheme are to (partially) randomize a design, place and route the erroneous netlist, and restore the original design by re-routing the BEOL. Based on state-of-the-art proximity attacks, we demonstrate that our scheme notably excels over the prior art (i.e., 0% correct connection rates). Our scheme induces controllable PPA overheads and lowers commercial cost (the latter by splitting at higher layers).","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77760345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}