{"title":"Error Analysis of the Variational Quantum Eigensolver Algorithm","authors":"Sebastian Brandhofer, S. Devitt, I. Polian","doi":"10.1109/NANOARCH53687.2021.9642249","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642249","url":null,"abstract":"Variational quantum algorithms have been among the most intensively studied applications for near-term quantum computers. The noisy intermediate-scale quantum (NISQ) regime, in which small enough algorithms can be run successfully on the noisy quantum computers expected during the next five years, is driving both a large amount of research work and a significant amount of private-sector funding. It is therefore important to understand whether variational algorithms can successfully converge to the correct answer in the presence of noise. We perform a comprehensive study of the variational quantum eigensolver (VQE) and its individual quantum subroutines. Building on asymptotic bounds, we show through explicit simulation that the VQE algorithm effectively collapses as soon as single errors occur during a quantum processing call. We discuss the significant implications of this result for the ability to run any variational-type algorithm without resource-expensive error correction protocols.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122638778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Absolute Subtraction and Division Circuits Using Uncorrelated Random Bitstreams in Stochastic Computing","authors":"Yuancheng Zhou, Guangjun Xie, Jie Han, Yongqiang Zhang","doi":"10.1109/NANOARCH53687.2021.9642251","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642251","url":null,"abstract":"Different from conventional deterministic binary computing, stochastic computing (SC) utilizes random binary bitstreams to implement arithmetic functions. It has shown advantages in hardware cost and fault tolerance in applications such as image processing. Division and absolute subtraction, specifically, are important functions in contrast stretching and edge detection. However, it is challenging to directly compute these functions in SC, especially when uncorrelated bitstreams are used. In this paper, a counter-based unipolar scaled absolute subtractor (UCASub) operating on two uncorrelated bitstreams is first proposed. Building on the UCASub, a bipolar scaled absolute subtractor as well as unipolar and bipolar dividers, all using uncorrelated bitstreams, are further proposed. Experimental results show that these circuits achieve lower mean squared errors with similar hardware overhead when compared with previous designs.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129039034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable Approximate Multiplication Architecture for CNN-Based Speech Recognition Using Wallace Tree Tensor Multiplier Unit","authors":"Junyi Qian, Yu Jiang, Zilong Zhang, Renyuan Zhang, Ziyue Wang, Bo Liu","doi":"10.1109/NANOARCH53687.2021.9642240","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642240","url":null,"abstract":"When neural network technology is applied to battery-powered terminal equipment, the energy efficiency of its hardware computation becomes a key concern. This paper therefore designs and realizes a reconfigurable approximate multiplication architecture for CNN-based speech recognition. First, a reconfigurable computing cell structure for convolutional neural networks is presented. Second, it is extended to the design and implementation of a low-power, precision-controllable convolutional neural network, which includes a Wallace tree tensor multiplier unit and an approximate compressor. As a case study, the proposed approximate designs are applied to a CNN-based keyword speech recognition system. Under the TSMC 22nm ULL UHVT process, compared with the speech keyword recognition system without approximate computation, the power consumption of the processing engine with the approximate multiplication unit is reduced by 51.55%, while the recognition accuracy drops by only 1%.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129942308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Programmable Variation-Tolerant RRAM-based Delay Element Circuit","authors":"Kangqiang Pan, Amr M. S. Tosson, Norman Y. Zhou, Lan Wei","doi":"10.1109/NANOARCH53687.2021.9642239","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642239","url":null,"abstract":"A programmable delay element (DE) circuit based on Resistive Random-Access Memory (RRAM) with a delay range from ~100 ps to ~1 ns is proposed. The impact of RRAM resistance on the delay range and power consumption of the circuit is analyzed. An improved circuit structure utilizing RRAMs in parallel is proposed to reduce the impact of RRAM variability on the DE circuit.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125905492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Resilience and Recovery of Process Induced Stuck-at Faults in MLP Neural Networks using Emerging Technology","authors":"A. Zhang, Amr M. S. Tosson, Lan Wei","doi":"10.1109/NANOARCH53687.2021.9642243","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642243","url":null,"abstract":"With the end of Moore’s law, emerging technologies and materials that offer greater performance than silicon are gaining interest, such as alternative low-dimensional channel materials (LDMs) including Carbon Nanotube FETs (CNFETs). Although LDM transistors offer higher performance due to better electrostatic control and/or higher mobilities than their silicon counterparts, their fabrication processes are immature and suffer greatly from defects and variations, leading to a high chance of stuck-at faults. Unlike general-purpose applications, which cannot tolerate high fault rates, applications with approximate algorithmic components, such as neuromorphic networks and machine learning, are inherently error resilient. Meanwhile, such applications are computation-heavy and can benefit from the reduced power and improved performance that emerging technologies offer. This work analyses the effect of stuck-at faults in the SRAM cells of the NeuroSim Multi-Layer Perceptron (MLP) under various fault patterns, and presents fault recovery techniques that improve the re-trained accuracy against high stuck-at fault rates, to assess the applicability of emerging technologies to machine learning applications. With the proper selection of a recovery technique, the system can tolerate a high level of stuck-at faults, which means emerging technologies can be useful even at the early stage of technology development with an immature process.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131376669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDCOG: A Lightweight Hyperdimensional Computing Framework with Feature Extraction","authors":"Shijin Duan, Xiaolin Xu","doi":"10.1109/NANOARCH53687.2021.9642247","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642247","url":null,"abstract":"As an emerging classification and recognition paradigm on nanoscale edge devices, Hyperdimensional Computing (HDC) usually has its performance (e.g., inference accuracy) obstructed by limited computing resources. This paper proposes HDCOG, a lightweight framework leveraging feature extraction as a preliminary stage of a binary HDC model. HDCOG extracts distinguishable feature information from the input data during the encoding phase, which significantly eliminates information redundancy in object classification and recognition. We validate the HDCOG framework on several popular datasets against other state-of-the-art binary HDC models and machine learning methods. Experimental evaluation and analysis demonstrate that, compared with other state-of-the-art binary HDC models, HDCOG significantly reduces the memory overhead to below 10% and consumes comparable or even less inference time. Moreover, HDCOG achieves about a 9% accuracy improvement on complex datasets over other binary HDC models, which is comparable to lightweight neural networks.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122879681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FeFET-based Process-in-Memory Architecture for Low-Power DNN Training","authors":"Farzaneh Zokaee, Bing Li, Fan Chen","doi":"10.1109/NANOARCH53687.2021.9642234","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642234","url":null,"abstract":"Although deep neural networks (DNNs) have become the cornerstone of Artificial Intelligence, training DNNs still requires dozens of CPU hours. Prior works created various customized hardware accelerators for DNNs; however, most of these accelerators are designed to accelerate DNN inference and lack basic support for the complex compute phases and sophisticated data dependencies involved in DNN training. The major challenges for supporting DNN training come from various layers of the system stack: (1) the current de-facto training method, error backpropagation (BP), requires all the weights and intermediate data to be stored in memory and then sequentially consumed in backward paths. Therefore, weight updates are non-local and rely on upstream layers, which makes training parallelization extremely challenging and also incurs significant memory and computing overheads; (2) the power consumption of such CMOS accelerators can reach 200~250 Watts. Though designs based on emerging memory technologies have demonstrated great potential in low-power DNN acceleration, their power efficiency is bottlenecked by CMOS analog-to-digital converters (ADCs). In this work, we review the current advances in accelerator designs for DNNs and point out their limitations. We then set out to address these challenges by combining innovations in training algorithms, circuits, and accelerator architecture. Our research follows the Process-in-Memory (PIM) strategy. Specifically, we leverage the recently proposed Direct Feedback Alignment (DFA) training algorithm to overcome the long-range data dependency required by BP. We then propose to execute DNN training in parallel in a specially designed pipeline. We implement the proposed architecture using Ferroelectric Field-Effect Transistors (FeFETs) due to their high performance and low-power operation. To further improve power efficiency, we propose a random number generator (RNG) and an ultra-low-power FeFET-based ADC. Preliminary results suggest the feasibility and promise of our approaches for low-power and highly parallel DNN training in a broad range of applications.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127055211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cryogenic In-MRAM Computing","authors":"Yaoru Hou, Wei-qi Ge, Yanan Guo, L. Naviner, You Wang, Bo Liu, Jun Yang, Hao Cai","doi":"10.1109/NANOARCH53687.2021.9642238","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642238","url":null,"abstract":"In the von Neumann architecture, where computation and storage are separated, the memory wall becomes critical due to large access latency and a tremendous amount of data movement. In this work, we pursue cryogenic-temperature memory design and focus on spin-transfer-torque magnetoresistive random access memory (STT-MRAM) at 77 Kelvin (achieved with low-cost liquid nitrogen). A cryogenic compact model and the related cryogenic bitcell are investigated based on 77K experimental data of magnetic tunnel junctions (MTJs) and CMOS transistors. Aggressive energy reduction is obtained through an in-MRAM computing architecture. A 1Kb sub-array is simulated based on the above cryogenic models. Results show that cryogenic in-MRAM computing provides performance improvements of 32% on average, and concurrently reduces memory energy consumption by 19% on average. Compared with room-temperature (RT) simulation results, a 70% reduction in sensing latency is realized at 0.7-V supply voltage, at the cost of 30% longer writing latency and 20% higher energy consumption. A 32.5% sensing failure probability is alleviated in the 77K cryogenic environment. The proposed 77K cryogenic design methodology can be further applied to energy-constrained applications.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128055589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Neural Network Security From a Hardware Perspective","authors":"Tong Zhou, Yuheng Zhang, Shijin Duan, Yukui Luo, Xiaolin Xu","doi":"10.1109/NANOARCH53687.2021.9642246","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642246","url":null,"abstract":"Deep neural networks (DNNs) have been deployed on various computing platforms for acceleration, making the hardware security of DNNs an emerging concern. Several attacking methods related to DNN hardware accelerators have been introduced, which either affect the DNN inference accuracy or leak the privacy of DNN architectures and parameters. To provide a generic understanding of this emerging research area, in this survey we systematically review the recent research progress on DNN security from a hardware perspective. Specifically, we discuss the existing hardware-oriented attacks targeting different DNN acceleration platforms, and point out the potential vulnerabilities.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129316417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Adiabatic Quantum-Flux-Parametron (AQFP) Circuits using an Exact Database","authors":"Dewmini Sudara Marakkalage, Heinz Riener, G. Micheli","doi":"10.1109/NANOARCH53687.2021.9642241","DOIUrl":"https://doi.org/10.1109/NANOARCH53687.2021.9642241","url":null,"abstract":"Adiabatic Quantum-Flux-Parametron (AQFP) is a family of superconducting electronic (SCE) circuits exhibiting high energy efficiency. In AQFP technology, logic gates require splitters to drive multiple fanouts, and both the logic gates and the splitters are clocked, requiring path balancing with buffers to ensure that all fanins of a gate arrive simultaneously. In this work, we propose a new synthesis approach comprising two stages: in the first stage, a database of optimum small AQFP circuit structures is generated. This is a one-time, network-independent operation. In the second stage, the input network is first mapped to a LUT network, and then the LUTs are replaced with the locally optimum (area or delay) AQFP structures from the generated database in topological order. Our proposed method simultaneously optimizes the resources used by 1) gates that compute logic functions and 2) buffers/splitters. Hence, it captures additional optimization opportunities that are not explored in state-of-the-art methods, where buffer-splitter optimizations are done after logic optimization. Our method, when using a delay-oriented (area-oriented) strategy, achieves over a 40% (35%) decrease in critical-path delay (the number of levels) and a 19% (21%) decrease in area (the number of Josephson junctions) compared to existing work.","PeriodicalId":424982,"journal":{"name":"2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124534619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}