Latest Publications: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Dynamically-biased Fixed-point LSTM for Time Series Processing in AIoT Edge Device
Jinhai Hu, W. Goh, Yuan Gao
{"title":"Dynamically-biased Fixed-point LSTM for Time Series Processing in AIoT Edge Device","authors":"Jinhai Hu, W. Goh, Yuan Gao","doi":"10.1109/AICAS51828.2021.9458508","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458508","url":null,"abstract":"In this paper, a Dynamically-Biased Long Short-Term Memory (DB-LSTM) neural network architecture is proposed for artificial intelligence internet of things (AIoT) applications. Different from the conventional LSTM which uses static bias, DB-LSTM adjusts the cell bias dynamically based on the previous status. Hence, a DB-LSTM cell contains information of both the previous output and the current cell state. With more information, the DB-LSTM is able to achieve faster training convergence and better accuracy. Furthermore, weight quantization is performed to reduce the weights to either 1-bit or 2-bit, so that the algorithm can be implemented in portable edge device. With the same 100 epochs training setup, more than 70% loss reduction are achieved for floating 32-bit, 1-bit and 2-bit weights, respectively. The loss degradation due to weight quantization is also negligible. The performance of the proposed model is also validated with the classical air passenger forecasting problem. 0.075 loss and 94.96% accuracy are achieved with 2-bit weight when compared to the ground truth, which is comparable to full-length 32-bit weight.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123283065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
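The abstract does not spell out the bias update rule; the NumPy sketch below shows one plausible reading, where the gate biases are recomputed at every step from the previous hidden and cell states. All weight names, shapes, and the concatenation scheme are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def db_lstm_step(x, h_prev, c_prev, W, U, b_static, W_b):
    # Dynamic bias: a static part plus a learned projection of the
    # previous hidden and cell states (this mechanism is an assumption).
    b_dyn = b_static + W_b @ np.concatenate([h_prev, c_prev])
    z = W @ x + U @ h_prev + b_dyn          # pre-activations of all 4 gates
    i, f, o, g = np.split(z, 4)             # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

n_in, n_h = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_h, n_in)) * 0.1
U = rng.normal(size=(4 * n_h, n_h)) * 0.1
b_static = np.zeros(4 * n_h)
W_b = rng.normal(size=(4 * n_h, 2 * n_h)) * 0.01
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                          # run a short toy sequence
    h, c = db_lstm_step(rng.normal(size=n_in), h, c, W, U, b_static, W_b)
print(h[:4])
```

Quantizing W, U, and W_b with np.sign (1-bit) or a two-level uniform quantizer (2-bit) would reproduce the fixed-point setting the abstract describes.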
MRAM-based BER resilient Quantized edge-AI Networks for Harsh Industrial Conditions
V. Parmar, M. Suri, K. Yamane, T. Lee, Nyuk Leong Chung, V. B. Naik
{"title":"MRAM-based BER resilient Quantized edge-AI Networks for Harsh Industrial Conditions","authors":"V. Parmar, M. Suri, K. Yamane, T. Lee, Nyuk Leong Chung, V. B. Naik","doi":"10.1109/AICAS51828.2021.9458528","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458528","url":null,"abstract":"We investigate Edge-AI Inference (EAI) architectures based on 22nm FD-SOI embedded-MRAM (eMRAM) using quantized neural networks (QNN) for inference applications in harsh industrial conditions having strong magnetic field and wide operating temperature (-40∼125 °C). We achieved best case test accuracy of 98.99% with Quantized-Convolutional Neural Network (QCNN) and 89.94% with Quantized-Multi-layer Perceptron (QMLP) surpassing prior reported results in literature on MNIST dataset. By exploiting BER resilience of QNN, we show that eMRAM based EAI offers a superior magnetic immunity of ≈ 700 Oe at 125 °C (≈ 98% accuracy) without the use of ECC and significant energy saving of ≈ 14% for QCNN and ≈ 11% for QMLP. A detailed analysis on the tradeoff between retention time, write energy and inference accuracy is presented.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122402515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
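The paper measures resilience on eMRAM hardware; as a purely software-side illustration, a bit-error-rate sweep on binarized weights can be mimicked by random sign flips. The fault model and rates below are assumptions, not the paper's measured conditions.

```python
import numpy as np

def inject_ber(weights_bin, ber, rng):
    """Flip each stored weight bit with probability `ber`.
    `weights_bin` holds +1/-1 values, one bit per weight as in a 1-bit QNN."""
    flips = rng.random(weights_bin.shape) < ber
    return np.where(flips, -weights_bin, weights_bin)

rng = np.random.default_rng(42)
w = np.sign(rng.normal(size=(256, 128)))     # binarized layer weights
for ber in (1e-4, 1e-3, 1e-2):
    w_faulty = inject_ber(w, ber, rng)
    mismatch = np.mean(w_faulty != w)
    print(f"BER={ber:.0e}: {mismatch:.4%} of weights flipped")
```

Re-running inference with `w_faulty` at each BER gives the accuracy-vs-BER curve from which resilience claims like the one above are read off.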
Design Tools for Resistive Crossbar based Machine Learning Accelerators
I. Chakraborty, Sourjya Roy, S. Sridharan, M. Ali, Aayush Ankit, Shubham Jain, A. Raghunathan
{"title":"Design Tools for Resistive Crossbar based Machine Learning Accelerators","authors":"I. Chakraborty, Sourjya Roy, S. Sridharan, M. Ali, Aayush Ankit, Shubham Jain, A. Raghunathan","doi":"10.1109/AICAS51828.2021.9458433","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458433","url":null,"abstract":"Resistive crossbar based accelerators for Machine Learning (ML) have attracted great interest as they offer the prospect of high density on-chip storage as well as efficient in-memory matrix-vector multiplication (MVM) operations. Despite their promises, they present several design challenges, such as high write costs, overhead of analog-to-digital and digital-to-analog converters and other peripheral circuits, and accuracy degradation due to the the analog nature of in-memory computing coupled with device and circuit level non-idealities. The unique characteristics of crossbar-based accelerators pose unique challenges for design automation. We outline a design flow for crossbar-based accelerators, and elaborate on some key tools involved in such a flow. Specifically, we discuss architectural estimation of metrics such as power, performance and area, and functional simulation to evaluate algorithmic accuracy considering the impact of non-idealities.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129130867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
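As a rough picture of what such a functional-simulation tool evaluates, the sketch below maps weights onto a differential conductance pair and perturbs them with device variation before the MVM. The conductance range, variation model, and scaling are illustrative assumptions; the actual tools model many more non-idealities (ADC/DAC quantization, IR drop, and so on).

```python
import numpy as np

def crossbar_mvm(W, x, g_min=1e-6, g_max=1e-4, sigma=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    w_max = np.abs(W).max()
    # Differential mapping: positive weights on one device, negative on the other.
    g_pos = g_min + (g_max - g_min) * np.clip(W, 0.0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-W, 0.0, None) / w_max
    # Lognormal device-to-device variation on each programmed conductance.
    g_pos = g_pos * rng.lognormal(0.0, sigma, g_pos.shape)
    g_neg = g_neg * rng.lognormal(0.0, sigma, g_neg.shape)
    i_out = (g_pos - g_neg) @ x              # column currents (Ohm's law + KCL)
    return i_out * w_max / (g_max - g_min)   # rescale back to weight units

rng = np.random.default_rng(1)
W, x = rng.normal(size=(64, 64)), rng.normal(size=64)
err = np.linalg.norm(crossbar_mvm(W, x, rng=rng) - W @ x) / np.linalg.norm(W @ x)
print(f"relative MVM error from device variation: {err:.2%}")
```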
Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks
Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang
{"title":"Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks","authors":"Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang","doi":"10.1109/AICAS51828.2021.9458479","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458479","url":null,"abstract":"Versatile Video Coding (VVC) initialized in October 2017, will provide the same subjective quality at roughly 50% the bitrate of its predecessor HEVC. VVC introduced a complex structure of quad-tree plus multi-type tree block partitioning (QT + MTT, or QTMT) in each $128 times 128$ block. However, it brings more encoding complexity. In this work, to tackle this problem effectively, a two-phase scheme for trimming QTMT CU partition using multi-branch CNN is presented. The goal is to predict the (QTMT) depth of QTMT partitioning on the basis of each block of size $32 times 32$. In the first phase, a backbone CNN followed by three parallel branches extracts latent features to predict which QT depth and whether using ternary-tree (TT) or not. In the second phase, based on the above prediction information, a huge number of possible (distinct) combinations of QTMT CU partition can be trimmed to reduce computational complexity. However, the practice of multiple branches leads to a significant increase in the amount of neural parameters in the CNN and consequently, the total computations of both training and inferencing will be raised significantly. Therefore, effective deep learning modules in MobilenetV2 are applied to downgrade the amount of parameters to an adequate level eventually. The experimental results show that the proposed method achieves 42.341% average saving of encoding time for all VVC test sequences and with 0.71 Bjntegaard-Delta bit-rate (BD-BR) increasing compared with VTM 6.1 in All-intra (AI) configuration.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121472540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
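A minimal sketch of the second-phase trimming logic as described: candidate QTMT splits that contradict the predicted QT depth or the predicted TT usage are discarded before the rate-distortion search. The candidate encoding and split names here are hypothetical.

```python
def trim_candidates(candidates, pred_qt_depth, use_tt):
    """Keep a candidate QTMT split only if its QT depth matches the CNN's
    prediction and, when the CNN predicts no ternary-tree use, it contains
    no TT split. Each candidate is (qt_depth, tuple_of_split_types)."""
    kept = []
    for qt_depth, split_types in candidates:
        if qt_depth != pred_qt_depth:
            continue
        if not use_tt and any(s.startswith("TT") for s in split_types):
            continue
        kept.append((qt_depth, split_types))
    return kept

# Hypothetical candidate list for one 32x32 block.
cands = [(1, ("QT",)), (2, ("QT", "BT_H")),
         (2, ("QT", "TT_V")), (3, ("QT", "QT", "BT_V"))]
print(trim_candidates(cands, pred_qt_depth=2, use_tt=False))
# -> [(2, ('QT', 'BT_H'))]
```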
Efficient Digital Implementation of n-mode Tensor-Matrix Multiplication
C. Gianoglio, E. Ragusa, R. Zunino, P. Gastaldo
{"title":"Efficient Digital Implementation of n-mode Tensor-Matrix Multiplication","authors":"C. Gianoglio, E. Ragusa, R. Zunino, P. Gastaldo","doi":"10.1109/AICAS51828.2021.9458404","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458404","url":null,"abstract":"With the growth of pervasive electronics, the availability of compact digital circuitry for the support of data processing is becoming a key requirement. This paper tackles the design of a digital architecture supporting the $n -$mode tensormatrix product in fixed point representation. The design aims to minimize the resources occupancy, targeting low cost and low power devices. Tests on a Kintex-7 FPGA confirm that the architecture leads to an efficient digital implementation, which can afford real-time performances on benchmark applications with power consumption lower than 100mW.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127730019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
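For reference, the n-mode tensor-matrix product itself is standard: unfold the tensor along mode n, multiply by the matrix, and fold back. A NumPy version follows; the FPGA architecture implements this in fixed point, whereas the float sketch below only fixes the math.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n tensor-matrix product: for T of shape (I_0, ..., I_{N-1})
    and M of shape (J, I_n), the result replaces dimension I_n with J."""
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)   # mode-n unfolding
    Y = M @ Tn
    new_shape = (M.shape[0],) + tuple(np.delete(T.shape, n))
    return np.moveaxis(Y.reshape(new_shape), 0, n)      # fold back

T = np.arange(24, dtype=float).reshape(2, 3, 4)
M = np.ones((5, 3))                                     # act along mode 1
Y = mode_n_product(T, M, 1)
assert Y.shape == (2, 5, 4)
# Cross-check against an explicit contraction over mode 1:
assert np.allclose(Y, np.einsum('ijk,lj->ilk', T, M))
```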
Quantization Strategy for Pareto-optimally Low-cost and Accurate CNN
K. Nakata, D. Miyashita, A. Maki, F. Tachibana, S. Sasaki, J. Deguchi, Ryuichi Fujimoto
{"title":"Quantization Strategy for Pareto-optimally Low-cost and Accurate CNN","authors":"K. Nakata, D. Miyashita, A. Maki, F. Tachibana, S. Sasaki, J. Deguchi, Ryuichi Fujimoto","doi":"10.1109/AICAS51828.2021.9458452","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458452","url":null,"abstract":"Quantization is an effective technique to reduce memory and computational costs for inference of convolutional neural networks (CNNs). However, it has not been clarified which model can achieve higher recognition accuracy with lower memory and computational costs: a fat model (large number of parameters) quantized to an extremely low bit width (e.g., 1 or 2 bits) or a slim model (small number of parameters) quantized to moderately low bit width (e.g., 4 or 5 bits). To answer this question, we define a metric that combines the number of parameters and computations with bit widths of quantized weight parameters. Using this metric, we demonstrate that Pareto-optimal performance, where the best accuracy is obtained at a given memory or computational cost, is achieved when a slim model is moderately quantized rather than when a fat model is extremely quantized. Moreover, employing a strategy based on this finding, we empirically show that the Pareto frontier is improved by 4.3× under a post-training quantization scenario on the ImageNet dataset.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"63 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130863999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
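A hedged sketch of the kind of metric and frontier extraction the abstract describes: here the cost is taken as parameters × weight bit width (memory footprint), and all model numbers are invented for illustration. The paper's exact metric also folds in computation cost.

```python
def pareto_frontier(models):
    """Return the models not dominated in (cost, accuracy).
    Each model is (name, params, macs, weight_bits, accuracy); cost here
    is memory footprint in bits = params * weight_bits."""
    pts = [(name, params * bits, acc) for name, params, macs, bits, acc in models]
    frontier = []
    for name, cost, acc in sorted(pts, key=lambda p: p[1]):
        if not frontier or acc > frontier[-1][2]:   # keep only accuracy gains
            frontier.append((name, cost, acc))
    return frontier

# Hypothetical fat-vs-slim comparison (accuracies are made up).
models = [
    ("fat-1bit",  25e6, 4e9, 1, 0.62),
    ("fat-2bit",  25e6, 4e9, 2, 0.68),
    ("slim-4bit",  5e6, 6e8, 4, 0.70),
    ("slim-8bit",  5e6, 6e8, 8, 0.71),
]
for name, cost, acc in pareto_frontier(models):
    print(f"{name}: {cost / 8e6:.1f} MB, acc={acc:.0%}")
```

With these toy numbers only the slim variants survive on the frontier, mirroring the paper's conclusion that moderate quantization of a slim model beats extreme quantization of a fat one.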
Energy Efficient Computing with Heterogeneous DNN Accelerators
Md. Shazzad Hossain, I. Savidis
{"title":"Energy Efficient Computing with Heterogeneous DNN Accelerators","authors":"Md. Shazzad Hossain, I. Savidis","doi":"10.1109/AICAS51828.2021.9458474","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458474","url":null,"abstract":"The exploration of custom deep neural network (DNN) based accelerators for highly energy constrained edge devices with on-device intelligence is gaining traction in the research community. Despite the superior throughout and performance of custom accelerators as compared to CPUs or GPUs, the energy efficiency and versatility of state-of-the-art DNN accelerators is constrained due to the limited scope of monolithic architectures, where the entire accelerator executes only one model at any given time. In this paper, a multi-voltage domain heterogeneous DNN accelerator architecture is proposed that simultaneously executes multiple models with different power-performance operating points. The proposed architecture and circuits are evaluated with SPICE simulation in a 65 nm CMOS technology. The simulation results indicate that the proposed heterogeneous architecture improves the energy efficiency to 2.04 TOPS/W, while the conventional monolithic and single voltage domain architecture exhibits an energy efficiency of 0.0458 TOPS/W. In addition, the total power consumption of the accelerator SoC is reduced to 1.34 W as compared to the 3.72 W consumed by the baseline architecture when all multiply-and-accumulate (MACs) units operate at a voltage of 0.45 V.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133072344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
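The reported figures come from SPICE simulation; the first-order arithmetic below only illustrates why per-domain voltage and frequency scaling helps, using dynamic power P = C·V²·f with wholly hypothetical constants.

```python
def dynamic_power(c_eff, v_dd, freq):
    """First-order dynamic power of a MAC array: P = C_eff * V^2 * f."""
    return c_eff * v_dd**2 * freq

C_EFF = 2e-9                 # effective switched capacitance (hypothetical)

# Monolithic baseline: four identical domains all at nominal voltage/frequency.
p_mono = 4 * dynamic_power(C_EFF, v_dd=1.0, freq=500e6)

# Heterogeneous: each domain's (V, f) matched to its model's needs.
domains = [(1.0, 500e6), (0.7, 300e6), (0.55, 150e6), (0.45, 100e6)]
p_hetero = sum(dynamic_power(C_EFF, v, f) for v, f in domains)

print(f"monolithic: {p_mono:.2f} W, heterogeneous: {p_hetero:.2f} W")
```

The quadratic dependence on V is what makes running slack-tolerant models in low-voltage domains so profitable.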
Graph-Based Spatio-Temporal Backpropagation for Training Spiking Neural Networks
Yulong Yan, Haoming Chu, Xin Chen, Yi Jin, Y. Huan, Lirong Zheng, Zhuo Zou
{"title":"Graph-Based Spatio-Temporal Backpropagation for Training Spiking Neural Networks","authors":"Yulong Yan, Haoming Chu, Xin Chen, Yi Jin, Y. Huan, Lirong Zheng, Zhuo Zou","doi":"10.1109/AICAS51828.2021.9458461","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458461","url":null,"abstract":"Dedicated hardware for spiking neural networks (SNN) reduces energy consumption with spike-driven computing. This paper proposes a graph-based spatio-temporal backpropagation (G-STBP) to train SNN, aiming to enhance spike sparsity for energy efficiency, while ensuring the accuracy. A differentiable leaky integrate-and-fire (LIF) model is suggested to establish the backpropagation path. The sparse regularization is proposed to reduce the spike firing rate with a guaranteed accuracy. GSTBP enables training in any network topologies thanks to graph representation. A recurrent network is demonstrated with spike-sparse rank order coding. The experimental result on rank order coded MNIST shows that the recurrent SNN trained by G-STBP achieves the accuracy of 97.3% using 392 spikes per inference.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"55 35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124761625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
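The core trick that makes an LIF neuron trainable by backpropagation is a surrogate gradient for the non-differentiable spike function. Below is a minimal sketch of a single LIF neuron and a rectangular surrogate; the paper's differentiable LIF model and graph machinery are more elaborate, and the parameters here are illustrative.

```python
import numpy as np

def lif_forward(x_seq, w, tau=2.0, v_th=1.0):
    """Leaky integrate-and-fire over time: the membrane leaks by 1/tau,
    integrates weighted input, and spikes with a hard reset at v_th."""
    v, spikes = 0.0, []
    for x in x_seq:
        v = v * (1.0 - 1.0 / tau) + w * x
        s = float(v >= v_th)
        spikes.append(s)
        v = v * (1.0 - s)                   # hard reset after a spike
    return np.array(spikes)

def surrogate_grad(v, v_th=1.0, alpha=2.0):
    """Rectangular surrogate for d(spike)/d(v): the Heaviside step has no
    useful derivative, so BPTT uses a boxcar window around threshold."""
    return (np.abs(v - v_th) < 1.0 / alpha) * (alpha / 2.0)

x = np.array([0.2, 0.9, 0.1, 1.2, 0.0, 0.8])
spikes = lif_forward(x, w=1.5)
print(spikes)                               # -> [0. 1. 0. 1. 0. 1.]
print(surrogate_grad(np.array([0.2, 0.9, 1.05])))  # nonzero only near v_th
# The sparse regularization from the abstract would add a firing-rate
# penalty such as lam * spikes.mean() to the task loss.
```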
NeuroSim Validation with 40nm RRAM Compute-in-Memory Macro
A. Lu, Xiaochen Peng, Wantong Li, Hongwu Jiang, Shimeng Yu
{"title":"NeuroSim Validation with 40nm RRAM Compute-in-Memory Macro","authors":"A. Lu, Xiaochen Peng, Wantong Li, Hongwu Jiang, Shimeng Yu","doi":"10.1109/AICAS51828.2021.9458501","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458501","url":null,"abstract":"Compute-in-memory (CIM) is an attractive solution to process the extensive workloads of multiply-and-accumulate (MAC) operations in deep neural network (DNN) hardware accelerators. A simulator with options of various mainstream and emerging memory technologies, architectures and networks can be a great convenience for fast early-stage design space exploration of CIM accelerators. DNN+NeuroSim is an integrated benchmark framework supporting flexible and hierarchical CIM array design options from device-level, to circuit-level and up to algorithm-level. In this paper, we validate and calibrate the prediction of NeuroSim against a 40nm RRAM-based CIM macro post-layout simulations. First, the parameters of memory device and CMOS transistor are extracted from the TSMC’s PDK and employed on the NeuroSim settings; the peripheral modules and operating process are also configured to be the same as the actual chip. Next, the area, critical path and energy consumption values from the SPICE simulations at the module-level are compared with those from NeuroSim. Some adjustment factors are introduced to account for transistor sizing and wiring area in layout, gate switching activity and post-layout performance drop, etc. We show that the prediction from NeuroSim is precise with chip-level error under 1% after the calibration.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124867656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
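The calibration flow amounts to deriving module-level ratios between SPICE and simulator estimates and reapplying them to full-chip predictions. The sketch below shows that bookkeeping with made-up numbers; the real adjustment factors come from the paper's measurements, and NeuroSim's actual interfaces are not modeled here.

```python
# Hypothetical module-level numbers; real values would come from NeuroSim
# predictions and post-layout SPICE simulation of the 40nm macro.
sim_pred   = {"adc_energy_pJ": 12.0, "array_area_um2": 900.0,  "delay_ns": 4.0}
spice_meas = {"adc_energy_pJ": 13.1, "array_area_um2": 1010.0, "delay_ns": 4.6}

# Adjustment factors absorb sizing, wiring, and post-layout slowdown.
factors = {k: spice_meas[k] / sim_pred[k] for k in sim_pred}

def calibrate(pred):
    """Scale each raw simulator estimate by its module-level factor."""
    return {k: v * factors[k] for k, v in pred.items()}

chip_pred = {"adc_energy_pJ": 4800.0, "array_area_um2": 3.6e5, "delay_ns": 250.0}
print(calibrate(chip_pred))
```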
LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training
Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, S. Yin
{"title":"LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training","authors":"Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS51828.2021.9458421","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458421","url":null,"abstract":"Recently, edge-device training has arisen an urgent necessity since it can enhance the model adaptability without causing high transmission cost and privacy issues. Due to the need for a wide data range and high data precision to improve accuracy, DNN training requires much wider floating-point (FP) data for convolution and complicated arithmetics for batch normalization. They lead to massive computation and memory access energy, which yields challenges for power-constrained edge-devices. This paper proposes a novel PE, called LPE, with three innovations to solve this issue. First, LPE stores the operands in the posit format, satisfying both precision and data range with lower bit-width. It reduces training latency and energy for memory access. Second, LPE transfers complicated arithmetics during training into the logarithm domain, including multiplication in convolution layer and division, square, square root in batch normalization layers. It reduces computation energy and improves throughput. Third, LPE contains a two-stage floating-point accumulation unit. It extends the computation range while using the low bit-width accumulator, enhancing precision and reducing power consumption. Evaluated with 28 nm CMOS process, our PE achieves 1.81× power and 1.35× area reduction compared with IEEE 754 float-point 16 (FP16) fused MAC while maintaining the same dynamic range. When performing training with the proposed PE unit, it can achieve 1.97× energy reduction and offer 1.68× speed up.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132494134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
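Log-domain multiplication, the first ingredient the abstract names, replaces a multiplier with an adder on approximate logarithms. Below is a Mitchell-style sketch for positive values; posit decoding and the two-stage accumulator are omitted, and in hardware the exponent k comes from a leading-one detector rather than math.log2.

```python
import math

def log2_approx(x):
    # Mitchell's approximation: for x = 2^k * (1 + f), log2(x) ~ k + f.
    k = math.floor(math.log2(x))
    return k + x / 2.0**k - 1.0

def log_domain_mul(a, b):
    # a * b = 2^(log2 a + log2 b): the multiply becomes an addition,
    # followed by an approximate antilog with the same linear trick.
    s = log2_approx(a) + log2_approx(b)
    k = math.floor(s)
    return 2.0**k * (1.0 + (s - k))

a, b = 3.7, 5.2
approx, exact = log_domain_mul(a, b), a * b
print(f"approx={approx:.3f}  exact={exact:.3f}  err={(approx - exact) / exact:+.2%}")
```

Mitchell's approximation trades a few percent of relative error for removing the multiplier array entirely, which is where the reported power savings originate.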