{"title":"Error Analysis of the Variational Quantum Eigensolver Algorithm","authors":"Sebastian Brandhofer, S. Devitt, I. Polian","doi":"10.1109/NANOARCH53687.2021.9642249","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642249","url":null,"abstract":"Variational quantum algorithms have been among the most intensively studied applications for near-term quantum computers. The noisy intermediate-scale quantum (NISQ) regime, in which small enough algorithms can be run successfully on the noisy quantum computers expected during the next five years, is driving both a large amount of research work and a significant amount of private-sector funding. It is therefore important to understand whether variational algorithms can successfully converge to the correct answer in the presence of noise. We perform a comprehensive study of the variational quantum eigensolver (VQE) and its individual quantum subroutines. Building on asymptotic bounds, we show through explicit simulation that the VQE algorithm effectively collapses as soon as single errors occur during a quantum processing call. We discuss the significant implications of this result for the ability to run any variational-type algorithm without resource-expensive error correction protocols.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122638778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Absolute Subtraction and Division Circuits Using Uncorrelated Random Bitstreams in Stochastic Computing","authors":"Yuancheng Zhou, Guangjun Xie, Jie Han, Yongqiang Zhang","doi":"10.1109/NANOARCH53687.2021.9642251","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642251","url":null,"abstract":"Different from conventional deterministic binary computing, stochastic computing (SC) utilizes random binary bitstreams to implement arithmetic functions. It has shown advantages in hardware cost and fault tolerance in applications such as image processing. Division and absolute subtraction, specifically, are important functions in contrast stretching and edge detection. However, it is challenging to directly compute these functions in SC, especially when uncorrelated bitstreams are used. In this paper, a counter-based unipolar scaled absolute subtractor (UCASub) operating on two uncorrelated bitstreams is first proposed. Building on the UCASub, a bipolar scaled absolute subtractor as well as unipolar and bipolar dividers, all using uncorrelated bitstreams, are further proposed. Experimental results show that these circuits achieve lower mean squared errors with similar hardware overhead when compared with previous designs.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129039034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable Approximate Multiplication Architecture for CNN-Based Speech Recognition Using Wallace Tree Tensor Multiplier Unit","authors":"Junyi Qian, Yu Jiang, Zilong Zhang, Renyuan Zhang, Ziyue Wang, Bo Liu","doi":"10.1109/NANOARCH53687.2021.9642240","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642240","url":null,"abstract":"When neural network technology is applied to battery-powered terminal equipment, the energy efficiency of its hardware computation becomes a key concern. This paper therefore designs and realizes a reconfigurable approximate multiplication architecture for CNN-based speech recognition. First, a reconfigurable computing cell structure for convolutional neural networks is presented. Second, it is extended to the design and implementation of a low-power, precision-controllable convolutional neural network, which includes a Wallace tree tensor multiplier unit and an approximate compressor. As a case study, the proposed approximate designs are applied to a CNN-based keyword speech recognition system. Under the TSMC 22nm ULL UHVT process, compared with the speech keyword recognition system without approximate computation, the power consumption of the processing engine with the approximate multiplication unit is reduced by 51.55%, while the recognition accuracy drops by only 1%.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129942308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Programmable Variation-Tolerant RRAM-based Delay Element Circuit","authors":"Kangqiang Pan, Amr M. S. Tosson, Norman Y. Zhou, Lan Wei","doi":"10.1109/NANOARCH53687.2021.9642239","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642239","url":null,"abstract":"A programmable delay element (DE) circuit based on Resistive Random-Access Memory (RRAM) with a delay range from ~100 ps to ~1 ns is proposed. The impact of RRAM resistance on the delay range and power consumption of the circuit is analyzed. An improved circuit structure utilizing RRAMs in parallel is proposed to reduce the impact of RRAM variability on the DE circuit.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125905492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Resilience and Recovery of Process Induced Stuck-at Faults in MLP Neural Networks using Emerging Technology","authors":"A. Zhang, Amr M. S. Tosson, Lan Wei","doi":"10.1109/NANOARCH53687.2021.9642243","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642243","url":null,"abstract":"With the end of Moore’s law, emerging technologies and materials that offer greater performance than silicon are gaining interest, such as alternative low-dimensional channel materials (LDMs) including Carbon Nanotube FETs (CNFETs). Although LDM transistors offer higher performance due to better electrostatic control and/or higher mobilities than their silicon counterparts, their fabrication processes are immature and suffer greatly from defects and variations, leading to a high chance of stuck-at faults. Unlike general-purpose applications, which cannot tolerate high fault rates, applications with approximate algorithmic components, such as neuromorphic networks and machine learning, are inherently error resilient. Meanwhile, such applications are computation-heavy and can benefit from the reduced power and improved performance that emerging technologies offer. This work analyses the effect of stuck-at faults in the SRAM cells of the NeuroSim Multi-Layer Perceptron (MLP) under various fault patterns, and presents fault recovery techniques that improve the re-trained accuracy against high stuck-at fault rates, to assess the applicability of emerging technologies to machine learning applications. With the proper selection of a recovery technique, the system can tolerate a high level of stuck-at faults, which means emerging technologies can be useful even at the early stage of technology development with an immature process.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131376669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDCOG: A Lightweight Hyperdimensional Computing Framework with Feature Extraction","authors":"Shijin Duan, Xiaolin Xu","doi":"10.1109/NANOARCH53687.2021.9642247","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642247","url":null,"abstract":"As an emerging classification and recognition paradigm on nanoscale edge devices, Hyperdimensional Computing (HDC) usually has its performance (e.g., inference accuracy) obstructed by limited computing resources. This paper proposes HDCOG, a lightweight framework leveraging feature extraction as a preliminary stage of a binary HDC model. HDCOG extracts distinguishable feature information from the input data during the encoding phase, which significantly eliminates information redundancy in object classification and recognition. We validate the HDCOG framework on several popular datasets against other state-of-the-art binary HDC models and machine learning methods. Experimental evaluation and analysis demonstrate that, compared with other state-of-the-art binary HDC models, HDCOG significantly reduces the memory overhead to below 10% and consumes comparable or even less inference time. Moreover, HDCOG achieves about a 9% accuracy improvement on complex datasets over other binary HDC models, which is comparable to lightweight neural networks.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122879681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FeFET-based Process-in-Memory Architecture for Low-Power DNN Training","authors":"Farzaneh Zokaee, Bing Li, Fan Chen","doi":"10.1109/NANOARCH53687.2021.9642234","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642234","url":null,"abstract":"Although deep neural networks (DNNs) have become the cornerstone of Artificial Intelligence, training DNNs still requires dozens of CPU hours. Prior works created various customized hardware accelerators for DNNs; however, most of these accelerators are designed to accelerate DNN inference and lack basic support for the complex compute phases and sophisticated data dependencies involved in DNN training. The major challenges for supporting DNN training come from various layers of the system stack: (1) the current de-facto training method, error backpropagation (BP), requires all the weights and intermediate data to be stored in memory and then sequentially consumed in backward paths. Therefore, weight updates are non-local and rely on upstream layers, which makes training parallelization extremely challenging and also incurs significant memory and computing overheads; (2) the power consumption of such CMOS accelerators can reach 200~250 Watts. Though designs based on emerging memory technologies have demonstrated great potential in low-power DNN acceleration, their power efficiency is bottlenecked by CMOS analog-to-digital converters (ADCs). In this work, we review the current advances in accelerator designs for DNNs and point out their limitations. We then set out to address these challenges by combining innovations in training algorithms, circuits, and accelerator architecture. Our research follows the Process-in-Memory (PIM) strategy. Specifically, we leverage the recently proposed Direct Feedback Alignment (DFA) training algorithm to overcome the long-range data dependency required by BP. We then propose to execute DNN training in parallel in a specially designed pipeline. We implement the proposed architecture using Ferroelectric Field-Effect Transistors (FeFETs) due to their high performance and low-power operation. To further improve power efficiency, we propose a random number generator (RNG) and an ultra-low-power FeFET-based ADC. Preliminary results suggest the feasibility and promise of our approaches for low-power and highly parallel DNN training in a broad range of applications.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127055211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cryogenic In-MRAM Computing","authors":"Yaoru Hou, Wei-qi Ge, Yanan Guo, L. Naviner, You Wang, Bo Liu, Jun Yang, Hao Cai","doi":"10.1109/NANOARCH53687.2021.9642238","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642238","url":null,"abstract":"In the von Neumann architecture, where computation and storage are separated, the memory wall becomes critical due to large access latency and a tremendous amount of data movement. In this work, we pursue cryogenic-temperature memory design and focus on spin-transfer-torque magnetoresistive random access memory (STT-MRAM) at 77 Kelvin (achieved with low-cost liquid nitrogen). A cryogenic compact model and the related cryogenic bitcell are investigated based on 77K experimental data of magnetic tunnel junctions (MTJs) and CMOS transistors. Aggressive energy reduction is obtained through an in-MRAM computing architecture. A 1Kb sub-array is simulated based on the above cryogenic models. Results show that cryogenic in-MRAM computing provides performance improvements of 32% on average, and concurrently reduces memory energy consumption by 19% on average. Compared with room-temperature (RT) simulation results, a 70% reduction in sensing latency is realized at 0.7-V supply voltage, at the cost of 30% longer writing latency and 20% higher energy consumption. A 32.5% sensing failure probability is alleviated in the 77K cryogenic environment. The proposed 77K cryogenic design methodology can be further applied to energy-constrained applications.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128055589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Neural Network Security From a Hardware Perspective","authors":"Tong Zhou, Yuheng Zhang, Shijin Duan, Yukui Luo, Xiaolin Xu","doi":"10.1109/NANOARCH53687.2021.9642246","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642246","url":null,"abstract":"Deep neural networks (DNNs) have been deployed on various computing platforms for acceleration, making the hardware security of DNNs an emerging concern. Several attacking methods related to DNN hardware accelerators have been introduced, which either affect the DNN inference accuracy or leak the privacy of DNN architectures and parameters. To provide a generic understanding of this emerging research area, in this survey we systematically review the recent research progress on DNN security from a hardware perspective. Specifically, we discuss the existing hardware-oriented attacks targeting different DNN acceleration platforms, and point out the potential vulnerabilities.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129316417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Adiabatic Quantum-Flux-Parametron (AQFP) Circuits using an Exact Database","authors":"Dewmini Sudara Marakkalage, Heinz Riener, G. Micheli","doi":"10.1109/NANOARCH53687.2021.9642241","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642241","url":null,"abstract":"Adiabatic Quantum-Flux-Parametron (AQFP) is a family of superconducting electronic (SCE) circuits exhibiting high energy efficiency. In AQFP technology, logic gates require splitters to drive multiple fanouts, and both the logic gates and the splitters are clocked, requiring path balancing with buffers to ensure that all fanins of a gate arrive simultaneously. In this work, we propose a new synthesis approach comprising two stages: in the first stage, a database of optimum small AQFP circuit structures is generated. This is a one-time, network-independent operation. In the second stage, the input network is first mapped to a LUT network, and then the LUTs are replaced with the locally optimum (area or delay) AQFP structures from the generated database in topological order. Our proposed method simultaneously optimizes the resources used by 1) gates that compute logic functions and 2) buffers/splitters. Hence, it captures additional optimization opportunities that are not explored in state-of-the-art methods, where buffer-splitter optimizations are done after logic optimization. Our method, when using a delay-oriented (area-oriented) strategy, achieves over a 40% (35%) decrease in critical-path delay (the number of levels) and a 19% (21%) decrease in area (the number of Josephson junctions) compared to existing work.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124534619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}