Tian Wang, Xiaohui Cui, Dunshan Yu, Omid Aramoon, Timothy Dunlap, G. Qu, Xiaole Cui
{"title":"A Novel Polymorphic Gate Based Circuit Fingerprinting Technique","authors":"Tian Wang, Xiaohui Cui, Dunshan Yu, Omid Aramoon, Timothy Dunlap, G. Qu, Xiaole Cui","doi":"10.1145/3194554.3194572","DOIUrl":"https://doi.org/10.1145/3194554.3194572","url":null,"abstract":"Polymorphic gates are reconfigurable devices that deliver multiple functionalities under different temperatures, supply voltages, or external inputs. Because it can work in different modes, a polymorphic gate is a promising candidate for embedding secret information such as fingerprints. In this paper we report five polymorphic gates whose functionality varies in response to a specific control input, and we propose a circuit fingerprinting scheme based on these gates. The scheme selectively replaces standard logic cells with polymorphic gates whose functionality differs from that of the standard cells only on Satisfiability Don't Care conditions. Additional dummy fingerprint bits are also introduced to enhance the fingerprint's robustness against attacks such as fingerprint removal and modification. Experimental results on ISCAS and MCNC benchmark circuits demonstrate that our scheme introduces low overhead. More specifically, the average overheads in area, speed and power are 4.04%, 6.97% and 4.15%, respectively, when we embed a 64-bit fingerprint that consists of 32 real fingerprint bits and 32 dummy bits. This is only half of the overhead of the other known approach when it creates 32-bit fingerprints.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116964887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Jafari, M. Hosseini, Adwaya Kulkarni, C. Patel, T. Mohsenin
{"title":"BiNMAC","authors":"A. Jafari, M. Hosseini, Adwaya Kulkarni, C. Patel, T. Mohsenin","doi":"10.1145/3194554.3194634","DOIUrl":"https://doi.org/10.1145/3194554.3194634","url":null,"abstract":"This paper presents a low-power, domain-specific manycore accelerator referred to as \"BiNMAC\" (Binarized neural Network Manycore ACcelerator), which effectively maps and executes Binary Deep Neural Networks (BNNs). With only 2.40% area and 1.88% power overhead, novel instructions such as Population-Count and Patch-Select are added to the ISA of BiNMAC; each replaces a frequently used function that would otherwise take 52 and 4 clock cycles, respectively, with a single clock cycle. A 64-cluster architecture of BiNMAC is fully placed and routed in TSMC 65 nm CMOS technology, where a single cluster occupies an area of 0.53 mm² and consumes 223 mW at a 1 GHz clock frequency. The 64-cluster architecture occupies 36.5 mm² and, if fully utilized, consumes 16.4 W. We also propose a multilayer perceptron (MLP) neural network for multimodal time-series data classification. Binarized versions of the 3-layer MLP and ResNet-20 are implemented on BiNMAC. The implementation results show that BiNMAC consumes 0.02 mJ and 3.8 mJ of energy, which is 13 times and 30 times lower than implementations of the standard non-binarized MLP and ResNet-20 on an equivalent predecessor platform. To compare the performance of BiNMAC with other off-the-shelf platforms, the two networks are also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). BiNMAC achieves 22 times and 78 times higher throughput and 23 times and 41 times lower energy consumption than the TX2 SoC for the binarized MLP and ResNet-20, respectively.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116279174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Myungsuk Kim, Youngsun Song, Myoungsoo Jung, Jihong Kim
{"title":"SARO: A State-Aware Reliability Optimization Technique for High Density NAND Flash Memory","authors":"Myungsuk Kim, Youngsun Song, Myoungsoo Jung, Jihong Kim","doi":"10.1145/3194554.3194591","DOIUrl":"https://doi.org/10.1145/3194554.3194591","url":null,"abstract":"Recent advances in flash technologies, such as scaling and multi-leveling schemes, have succeeded in making flash denser and providing more storage space per die. Unfortunately, these technology advances significantly degrade flash reliability due to smaller cell geometry and finer-grained cell state control. In this paper, we propose a state-aware reliability optimization technique (SARO), a new flash optimization that improves flash reliability under diverse scaling and multi-leveling schemes. To this end, we first reveal that reliability-related flash errors are highly skewed among flash cell states, which was not captured by prior studies. SARO then exploits the different per-state error behavior of flash cell states by selecting the most error-prone flash states (for each error type) and by forming narrow threshold voltage distributions (for the selected states only). Furthermore, SARO is applied only when the program time gets shorter because of flash cell aging, thereby keeping the program latency unchanged. Our experimental results with real MLC and TLC flash devices show that SARO can significantly reduce the number of flash bit errors, which can in turn reduce the read latency by 40% on average.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126305160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging RF Power for Intelligent Tag Networks","authors":"E. Salman, M. Stanaćević, Samir R Das, P. Djurić","doi":"10.1145/3194554.3194621","DOIUrl":"https://doi.org/10.1145/3194554.3194621","url":null,"abstract":"A novel framework and related methodologies are described to leverage RF power for building intelligent, battery-free devices with communication and computation capabilities. These passive devices are envisioned to make a significant impact on the popular vision of smart dust due to their extremely low-power operation. The communication framework relies on tag-to-tag backscattering with very limited energy resources. The computing framework relies on a novel AC computing methodology that enables local data processing with an order of magnitude less power consumption. These enabling technologies, as described in this paper, revitalize the concept of smart dust, with significant impact on application domains such as smart spaces, implantable devices, and environmental/structural monitoring.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126371927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bozhi Liu, Kemeng Chen, Minjun Seo, Janet Roveda, Roman L. Lysecky
{"title":"Evaluation of the Complexity of Automated Trace Alignment using Novel Power Obfuscation Methods","authors":"Bozhi Liu, Kemeng Chen, Minjun Seo, Janet Roveda, Roman L. Lysecky","doi":"10.1145/3194554.3194640","DOIUrl":"https://doi.org/10.1145/3194554.3194640","url":null,"abstract":"This paper presents a methodology for evaluating power obfuscation approaches that seek to obfuscate the location of sensitive operations in the power trace, thereby increasing the complexity of automated trace alignment. The paper presents a new adversary model and proposes a new metric, mean trials to success (MTTS), to evaluate power obfuscation methods in the context of automated trace alignment. We evaluate two common obfuscation methods, namely instruction shuffling and random instruction insertion, and we present a new obfuscation method using power shaping to intentionally mislead the attacker.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127422649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ramin Rezaeizadeh Rookerd, Somayeh Sadeghi Kohan, Z. Navabi
{"title":"Performance and Energy Enhancement through an Online Single/Multi Level Mode Switching Cache Architecture","authors":"Ramin Rezaeizadeh Rookerd, Somayeh Sadeghi Kohan, Z. Navabi","doi":"10.1145/3194554.3194599","DOIUrl":"https://doi.org/10.1145/3194554.3194599","url":null,"abstract":"STT-RAM cells can be considered as an alternative or a hybrid addition to today's SRAM-based cache memories, mostly because of their scalability and low leakage power. Moreover, their data storage mechanism (storing the value as resistance) makes them well suited to multivalue cache architectures. This feature enhances system performance without any area overhead. On the other hand, the two-step read/write procedure required by multilevel (ML) cells results in non-uniform access time as well as energy and power overhead on the system. In this paper, we propose a new architecture that dynamically swaps data between the soft (fast read access) and hard (slow read access) bits of an ML cell: the swapping method places the hot part of each cache block into the soft bits and the less accessed part into the hard bits. Moreover, by reconfiguring the cache block size, the proposed architecture can switch between ML and single-level (SL) modes at runtime. The SL/ML switching method benefits from both the low latency and energy of SL mode and the high storage capacity of ML mode. Although experimental results show that our proposed method slightly increases the miss rate compared with conventional ML caches, it improves performance and energy by 4.9% and 6.5%, respectively. The storage overhead of our method is about 1%, which is negligible.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129193473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Session 3: Circuits and Systems for Autonomous IoT Devices","authors":"E. Salman, M. Stanaćević","doi":"10.1145/3252916","DOIUrl":"https://doi.org/10.1145/3252916","url":null,"abstract":"","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133590718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Session 6: Stochastic and Approximate Computing for Emerging Learning and Communication Systems","authors":"Jie Han, Yue Zhang","doi":"10.1145/3252919","DOIUrl":"https://doi.org/10.1145/3252919","url":null,"abstract":"","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130418442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Distributed Parallel Random Walk Algorithm for Large-Scale Capacitance Extraction and Simulation","authors":"Mingye Song, Zhezhao Xu, Wei Xue, Wenjian Yu","doi":"10.1145/3194554.3194568","DOIUrl":"https://doi.org/10.1145/3194554.3194568","url":null,"abstract":"Due to its advantages in scalability and reliability, the floating random walk (FRW) algorithm has been widely adopted for calculating the capacitances among three-dimensional (3-D) conductors, as evidenced by the industrial practice of interconnect capacitance extraction during the design of high-performance very large-scale integrated (VLSI) circuits. In this work, the FRW algorithm is enhanced through distributed parallel computing. With an efficient and adaptive task allocation scheme, the communication among different computer nodes is greatly reduced. A distributed algorithm for accelerating space management is also proposed. These techniques have been implemented with the Message Passing Interface (MPI) and applied to high-precision capacitance simulation for touchscreen design and to interconnect capacitance extraction of VLSI circuits. Experiments on a computer cluster show that the proposed techniques achieve up to 114X speedup while using 120 cores, and build the space management structure for a VLSI case with two million conductor blocks in just 22 seconds (37X parallel speedup on 60 cores).","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130771942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Konstantinos Maragos, G. Lentaris, I. Stratakos, D. Soudris
{"title":"A Framework Exploiting Process Variability to Improve Energy Efficiency in FPGA Applications","authors":"Konstantinos Maragos, G. Lentaris, I. Stratakos, D. Soudris","doi":"10.1145/3194554.3194569","DOIUrl":"https://doi.org/10.1145/3194554.3194569","url":null,"abstract":"As technology nodes scale down and process variability increases, vendors impose ever more conservative guard-bands to prevent potential malfunction of their microchips. However, this approach leaves considerable unexploited performance in individual chips, which can be harvested by developing novel customization tools. In this work, we focus on exploiting process variability in modern FPGA chips to provide more energy-efficient solutions. We propose a framework that i) generates variability maps characterizing the energy efficiency of commercial chips and ii) combines voltage and frequency scaling to limit the power dissipation of any given design for a given set of performance constraints. Experimental results on Zynq XC7Z020 28 nm FPGAs show that the developed framework achieves up to 28.3% power reduction while maintaining the performance and functional integrity of realistic benchmarks. Moreover, by selecting the most efficient chip, we achieve up to 5.1% additional power savings.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"268 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132912169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}