R. Wille, Marcel Walter, F. Sill, Daniel Große, R. Drechsler
{"title":"Ignore Clocking Constraints: An Alternative Physical Design Methodology for Field-Coupled Nanotechnologies","authors":"R. Wille, Marcel Walter, F. Sill, Daniel Große, R. Drechsler","doi":"10.1109/ISVLSI.2019.00121","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00121","url":null,"abstract":"Field-Coupled Nanocomputing (FCN) allows for conducting computations with a power consumption that is magnitudes below current CMOS technologies. Recent physical implementations confirmed these prospects and put pressure on the Electronic Design Automation (EDA) community to develop physical design methods comparable to those available for conventional circuits. While the major design task boils down to a place and route problem, certain characteristics of FCN circuits introduce further challenges in terms of dedicated clock arrangements which lead to rather cumbersome clocking constraints. Thus far, those constraints have been addressed in a rather unsatisfactory fashion only. In this work, we propose a physical design methodology which tackles this problem by simply ignoring the clocking constraints and using adjusted conventional place and route algorithms. In order to deal with the resulting ramifications, a dedicated synchronization element is introduced. Results extracted from a physics simulator confirm the feasibility of the approach. A proof of concept implementation illustrates that ignoring clocking constraints indeed allows for a promising alternative direction for FCN design that overcomes the obstacles preventing the development of efficient solutions thus far.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"129 1","pages":"651-656"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85616464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jonas Gava, Vitor V. Bandeira, R. Reis, Luciano Ost
{"title":"Evaluation of Compilers Effects on OpenMP Soft Error Resiliency","authors":"Jonas Gava, Vitor V. Bandeira, R. Reis, Luciano Ost","doi":"10.1109/ISVLSI.2019.00055","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00055","url":null,"abstract":"Software engineers are using different compilers and parallel programming models (e.g., Pthreads, OpenMP) to take the best performance offered by multicore systems. Both programming models and compilers have specific characteristics, which directly impact on applications code footprint, performance, power-efficiency and reliability. The occurrence of soft errors in multicore systems is a growing reliability issue in several domains (e.g., automotive, medical, avionics). In this scenario, this paper investigates the impact of widely adopted compilers on the soft error reliability of applications implemented with OpenMP library. Fault injection campaigns consider 3 open-source compilers (GNU/GCC 5.5.0, 7.3.1, Clang 6.0.1), 16 OpenMP benchmarks executing on single, dual and quad-core versions of the Arm Cortex-A72 processor. Results show that, on average, Clang is 15.85% more reliable than both versions of GCC, which demonstrate to be more sensitive to the optimisation flags when compared to Clang.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"26 1","pages":"259-264"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75025130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Marchisio, Muhammad Abdullah Hanif, Faiq Khalid, George Plastiras, C. Kyrkou, T. Theocharides, M. Shafique
{"title":"Deep Learning for Edge Computing: Current Trends, Cross-Layer Optimizations, and Open Research Challenges","authors":"Alberto Marchisio, Muhammad Abdullah Hanif, Faiq Khalid, George Plastiras, C. Kyrkou, T. Theocharides, M. Shafique","doi":"10.1109/ISVLSI.2019.00105","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00105","url":null,"abstract":"In the Machine Learning era, Deep Neural Networks (DNNs) have taken the spotlight, due to their unmatchable performance in several applications, such as image processing, computer vision, and natural language processing. However, as DNNs grow in their complexity, their associated energy consumption becomes a challenging problem. Such challenge heightens for edge computing, where the computing devices are resource-constrained while operating on limited energy budget. Therefore, specialized optimizations for deep learning have to be performed at both software and hardware levels. In this paper, we comprehensively survey the current trends of such optimizations and discuss key open research mid-term and long-term challenges.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"553-559"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83749397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaahin Angizi, Zhezhi He, D. Reis, X. Hu, Wilman Tsai, Shy-Jay Lin, Deliang Fan
{"title":"Accelerating Deep Neural Networks in Processing-in-Memory Platforms: Analog or Digital Approach?","authors":"Shaahin Angizi, Zhezhi He, D. Reis, X. Hu, Wilman Tsai, Shy-Jay Lin, Deliang Fan","doi":"10.1109/ISVLSI.2019.00044","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00044","url":null,"abstract":"Nowadays, research topics on AI accelerator designs have attracted great interest, where accelerating Deep Neural Network (DNN) using Processing-in-Memory (PIM) platforms is an actively-explored direction with great potential. PIM platforms, which simultaneously aims to address power- and memory-wall bottlenecks, have shown orders of performance enhancement in comparison to the conventional computing platforms with Von-Neumann architecture. As one direction of accelerating DNN in PIM, resistive memory array (aka. crossbar) has drawn great research interest owing to its analog current-mode weighted summation operation which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNN, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is through bulk bit-wise logic operations directly performed on the content in digital memories. Thanks to the high fault-tolerant characteristic of DNN, the latest algorithmic progression successfully quantized DNN parameters to low bit-width representations, while maintaining competitive accuracy levels. Such DNN quantization techniques essentially convert MAC operation to much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze aforementioned PIM based analog and digital approaches for DNN acceleration.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"79 1","pages":"197-202"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83770302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Compact Network Training on Edge-Devices","authors":"Feng Xiong, Fengbin Tu, S. Yin, Shaojun Wei","doi":"10.1109/ISVLSI.2019.00020","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00020","url":null,"abstract":"Currently, there is a trend to deploy training on edge devices, which is crucial to future AI applications in various scenarios with transfer and online learning demands. Specifically, there may be a severe degradation of accuracy when directly deploying the trained models on edge devices, because the local environment forms an edge local dataset that is often different from the generic dataset. However, training on edge devices with limited computing and memory capability is a challenge problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. Firstly, training-aware symmetric quantization is introduced to quantize all of the data types in the training process. Then, channel-wise quantization method is adopted for comapact network quantization, which has significantly high tolerance to quantization errors and can make the training process more stable. For further efficient training, we build a hardware evaluation platform to evaluate different settings of the network, so as to achieve a better trade-off among accuracy, energy and latency. Finally, we evaluate two widely used compact networks on a domain adaptation dataset for image classification, and the results demonstrate that the proposed methods can allow us achieve an improvement of 8.4 × -17.2× in energy reduction and 11.9 × -16.3× in latency reduction compared with 32-bit implementations, while maintaining the classification accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"306 1","pages":"61-67"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77127026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of Loop Positions on Reliability and Attack Resistance of Feed-Forward PUFs","authors":"S. V. S. Avvaru, K. Parhi","doi":"10.1109/ISVLSI.2019.00073","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00073","url":null,"abstract":"In this paper, we study multiplexer (MUX) based feed-forward (FF) physical unclonable functions (FF PUFs) with 64 stages. This paper provides the first systematic empirical analysis of the effect of FF PUF design choices on their performance by evaluating various FF PUF structures in terms of their reliability and attack resistance. To this end, the change in reliability is studied by varying the location of FF loops and varying the number of loops within the circuit. It is observed adding more loops and arbiters makes PUFs more susceptible to noise; FF PUFs with 5 intermediate arbiters can have reliability values that are as low as 81%. It is further demonstrated that a soft-response thresholding strategy can significantly increase the reliability during authentication to more than 96%. We also show that attack resistance can change as a consequence of relative positioning of the FF loops. In case of double-loop FF PUFs (one intermediate arbiter with two utputs), it is shown that appropriately choosing the input and output locations of the FF loops, the number of challenge-response pairs required to attack can be increased by 7 times and can be further increased by 15 times if two intermediate arbiters are used.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"76 1","pages":"366-371"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78688023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Jubaer Hossain Pantho, Pankaj Bhowmik, C. Bobda
{"title":"Neuromorphic Image Sensor Design with Region-Aware Processing","authors":"Md Jubaer Hossain Pantho, Pankaj Bhowmik, C. Bobda","doi":"10.1109/ISVLSI.2019.00089","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00089","url":null,"abstract":"This paper presents a pixel parallel architecture of a neuromorphic image sensor, designed as a 3D bottom-up architecture composing of several computational planes where each plane performs different image processing algorithms. The model emulates the hierarchical process in biological vision by providing feedforward and feedback information flow between different planes. The on-chip attention module dynamically detects regions with relevant information and produces a feedback path to sample those regions with a higher clock frequency, whereas regions with low spatial and temporal information receive less attention. The results suggest that by sampling non-relevant regions with a lower frequency, the sensor can reduce redundancy and enable high-performance computing at low power. Furthermore, by deploying high-level reasoning only on the selected regions instead of the entire image the model can decrease computational expenses.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"17 1","pages":"459-464"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87385295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fatemeh Serajeh-hassani, Mohammad Sadrosadati, S. Pointner, R. Wille, H. Sarbazi-Azad
{"title":"Focus on What is Needed: Area and Power Efficient FPGAs Using Turn-Restricted Switch Boxes","authors":"Fatemeh Serajeh-hassani, Mohammad Sadrosadati, S. Pointner, R. Wille, H. Sarbazi-Azad","doi":"10.1109/ISVLSI.2019.00115","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00115","url":null,"abstract":"Field-Programmable Gate Arrays (FPGAs) employ a significant amount of SRAM cells in order to provide a flexible routing architecture. While this flexibility allows for a rather easy realization of arbitrary functionality, the respectively required cells significantly increase the area and power consumption of the FPGA. At the same time, it can be observed that full routing flexibility is frequently not needed in order to efficiently realize the desired functionality. In this work, we are proposing an FPGA realization which focuses on what is needed and realizes only a subset of the possible routing options using what we call Turn-Restricted Switch-Boxes. While this may yield a slight decrease in the run-time performance of the realized functionality, it allows for substantial improvements with respect to area and power consumption. In fact, experimental evaluations confirm that area and power can be reduced by more than 40% and 60%, respectively, in the best cases. The performance overhead is negligible (up to 3%), on average.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"89 1","pages":"615-620"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80226950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Locking: An Automated Approach to Processor Design Obfuscation","authors":"Michael Zuzak, Ankur Srivastava","doi":"10.1109/ISVLSI.2019.00103","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00103","url":null,"abstract":"Conventional logic obfuscation techniques largely focus on locking the functionality of combinational modules. However, for processor design obfuscation, module-level errors are tangential to the fundamental adversarial goal: to produce a processor capable of running useful applications. As noted in previous work such as SFLL, module-level locking poses the following problem: high corruption in a locked module results in a high application-level error rate, but fundamentally leads to SAT attack susceptibility. Therefore, for combinational, module-level locking, increases in application-level error rates are accompanied by a corresponding increase in SAT susceptibility and vice versa. To address this, we introduce an automated and attack-resistant obfuscation technique, called memory locking, which targets on-chip SRAM. We demonstrate the application-level effectiveness of memory locking through system-level simulations of obfuscated processors.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"74 1","pages":"541-546"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80933773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vishalini R. Laguduva, S. A. Islam, Sathyanarayanan N. Aakur, S. Katkoori, Robert Karam
{"title":"Machine Learning Based IoT Edge Node Security Attack and Countermeasures","authors":"Vishalini R. Laguduva, S. A. Islam, Sathyanarayanan N. Aakur, S. Katkoori, Robert Karam","doi":"10.1109/ISVLSI.2019.00124","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00124","url":null,"abstract":"Advances in technology have enabled tremendous progress in the development of a highly connected ecosystem of ubiquitous computing devices collectively called the Internet of Things (IoT). Ensuring the security of IoT devices is a high priority due to the sensitive nature of the collected data. Physically Unclonable Functions (PUFs) have emerged as critical hardware primitive for ensuring the security of IoT nodes. Malicious modeling of PUF architectures has proven to be difficult due to the inherently stochastic nature of PUF architectures. Extant approaches to malicious PUF modeling assume that a priori knowledge and physical access to the PUF architecture is available for malicious attack on the IoT node. However, many IoT networks make the underlying assumption that the PUF architecture is sufficiently tamper-proof, both physically and mathematically. In this work, we show that knowledge of the underlying PUF structure is not necessary to clone a PUF. We present a novel non-invasive, architecture independent, machine learning attack for strong PUF designs with a cloning accuracy of 93.5% and improvements of up to 48.31% over an alternative, two-stage brute force attack model. We also propose a machine-learning based countermeasure, discriminator, which can distinguish cloned PUF devices and authentic PUFs with an average accuracy of 96.01%. The proposed discriminator can be used for rapidly authenticating millions of IoT nodes remotely from the cloud server.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"670-675"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84070138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}