Latest Publications: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Dynamically-biased Fixed-point LSTM for Time Series Processing in AIoT Edge Device
Jinhai Hu, W. Goh, Yuan Gao
{"title":"Dynamically-biased Fixed-point LSTM for Time Series Processing in AIoT Edge Device","authors":"Jinhai Hu, W. Goh, Yuan Gao","doi":"10.1109/AICAS51828.2021.9458508","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458508","url":null,"abstract":"In this paper, a Dynamically-Biased Long Short-Term Memory (DB-LSTM) neural network architecture is proposed for artificial intelligence internet of things (AIoT) applications. Different from the conventional LSTM which uses static bias, DB-LSTM adjusts the cell bias dynamically based on the previous status. Hence, a DB-LSTM cell contains information of both the previous output and the current cell state. With more information, the DB-LSTM is able to achieve faster training convergence and better accuracy. Furthermore, weight quantization is performed to reduce the weights to either 1-bit or 2-bit, so that the algorithm can be implemented in portable edge device. With the same 100 epochs training setup, more than 70% loss reduction are achieved for floating 32-bit, 1-bit and 2-bit weights, respectively. The loss degradation due to weight quantization is also negligible. The performance of the proposed model is also validated with the classical air passenger forecasting problem. 0.075 loss and 94.96% accuracy are achieved with 2-bit weight when compared to the ground truth, which is comparable to full-length 32-bit weight.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123283065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
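The abstract does not spell out the bias update rule; the NumPy sketch below shows one plausible reading, where the gate biases are recomputed at every step from the previous hidden and cell states. All weight names, shapes, and the concatenation scheme are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def db_lstm_step(x, h_prev, c_prev, W, U, b_static, W_b):
    # Dynamic bias: a static part plus a learned projection of the
    # previous hidden and cell states (this mechanism is an assumption).
    b_dyn = b_static + W_b @ np.concatenate([h_prev, c_prev])
    z = W @ x + U @ h_prev + b_dyn          # pre-activations of all 4 gates
    i, f, o, g = np.split(z, 4)             # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

n_in, n_h = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_h, n_in)) * 0.1
U = rng.normal(size=(4 * n_h, n_h)) * 0.1
b_static = np.zeros(4 * n_h)
W_b = rng.normal(size=(4 * n_h, 2 * n_h)) * 0.01
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                          # run a short toy sequence
    h, c = db_lstm_step(rng.normal(size=n_in), h, c, W, U, b_static, W_b)
print(h[:4])
```

Quantizing W, U, and W_b with np.sign (1-bit) or a two-level uniform quantizer (2-bit) would reproduce the fixed-point setting the abstract describes.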
MRAM-based BER resilient Quantized edge-AI Networks for Harsh Industrial Conditions
V. Parmar, M. Suri, K. Yamane, T. Lee, Nyuk Leong Chung, V. B. Naik
{"title":"MRAM-based BER resilient Quantized edge-AI Networks for Harsh Industrial Conditions","authors":"V. Parmar, M. Suri, K. Yamane, T. Lee, Nyuk Leong Chung, V. B. Naik","doi":"10.1109/AICAS51828.2021.9458528","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458528","url":null,"abstract":"We investigate Edge-AI Inference (EAI) architectures based on 22nm FD-SOI embedded-MRAM (eMRAM) using quantized neural networks (QNN) for inference applications in harsh industrial conditions having strong magnetic field and wide operating temperature (-40∼125 °C). We achieved best case test accuracy of 98.99% with Quantized-Convolutional Neural Network (QCNN) and 89.94% with Quantized-Multi-layer Perceptron (QMLP) surpassing prior reported results in literature on MNIST dataset. By exploiting BER resilience of QNN, we show that eMRAM based EAI offers a superior magnetic immunity of ≈ 700 Oe at 125 °C (≈ 98% accuracy) without the use of ECC and significant energy saving of ≈ 14% for QCNN and ≈ 11% for QMLP. A detailed analysis on the tradeoff between retention time, write energy and inference accuracy is presented.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122402515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
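The paper measures resilience on eMRAM hardware; as a purely software-side illustration, a bit-error-rate sweep on binarized weights can be mimicked by random sign flips. The fault model and rates below are assumptions, not the paper's measured conditions.

```python
import numpy as np

def inject_ber(weights_bin, ber, rng):
    """Flip each stored weight bit with probability `ber`.
    `weights_bin` holds +1/-1 values, one bit per weight as in a 1-bit QNN."""
    flips = rng.random(weights_bin.shape) < ber
    return np.where(flips, -weights_bin, weights_bin)

rng = np.random.default_rng(42)
w = np.sign(rng.normal(size=(256, 128)))     # binarized layer weights
for ber in (1e-4, 1e-3, 1e-2):
    w_faulty = inject_ber(w, ber, rng)
    mismatch = np.mean(w_faulty != w)
    print(f"BER={ber:.0e}: {mismatch:.4%} of weights flipped")
```

Re-running inference with `w_faulty` at each BER gives the accuracy-vs-BER curve from which resilience claims like the one above are read off.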
Design Tools for Resistive Crossbar based Machine Learning Accelerators
I. Chakraborty, Sourjya Roy, S. Sridharan, M. Ali, Aayush Ankit, Shubham Jain, A. Raghunathan
{"title":"Design Tools for Resistive Crossbar based Machine Learning Accelerators","authors":"I. Chakraborty, Sourjya Roy, S. Sridharan, M. Ali, Aayush Ankit, Shubham Jain, A. Raghunathan","doi":"10.1109/AICAS51828.2021.9458433","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458433","url":null,"abstract":"Resistive crossbar based accelerators for Machine Learning (ML) have attracted great interest as they offer the prospect of high density on-chip storage as well as efficient in-memory matrix-vector multiplication (MVM) operations. Despite their promises, they present several design challenges, such as high write costs, overhead of analog-to-digital and digital-to-analog converters and other peripheral circuits, and accuracy degradation due to the the analog nature of in-memory computing coupled with device and circuit level non-idealities. The unique characteristics of crossbar-based accelerators pose unique challenges for design automation. We outline a design flow for crossbar-based accelerators, and elaborate on some key tools involved in such a flow. Specifically, we discuss architectural estimation of metrics such as power, performance and area, and functional simulation to evaluate algorithmic accuracy considering the impact of non-idealities.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129130867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
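As a rough picture of what such a functional-simulation tool evaluates, the sketch below maps weights onto a differential conductance pair and perturbs them with device variation before the MVM. The conductance range, variation model, and scaling are illustrative assumptions; the actual tools model many more non-idealities (ADC/DAC quantization, IR drop, and so on).

```python
import numpy as np

def crossbar_mvm(W, x, g_min=1e-6, g_max=1e-4, sigma=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    w_max = np.abs(W).max()
    # Differential mapping: positive weights on one device, negative on the other.
    g_pos = g_min + (g_max - g_min) * np.clip(W, 0.0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-W, 0.0, None) / w_max
    # Lognormal device-to-device variation on each programmed conductance.
    g_pos = g_pos * rng.lognormal(0.0, sigma, g_pos.shape)
    g_neg = g_neg * rng.lognormal(0.0, sigma, g_neg.shape)
    i_out = (g_pos - g_neg) @ x              # column currents (Ohm's law + KCL)
    return i_out * w_max / (g_max - g_min)   # rescale back to weight units

rng = np.random.default_rng(1)
W, x = rng.normal(size=(64, 64)), rng.normal(size=64)
err = np.linalg.norm(crossbar_mvm(W, x, rng=rng) - W @ x) / np.linalg.norm(W @ x)
print(f"relative MVM error from device variation: {err:.2%}")
```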
Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks
Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang
{"title":"Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks","authors":"Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang","doi":"10.1109/AICAS51828.2021.9458479","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458479","url":null,"abstract":"Versatile Video Coding (VVC) initialized in October 2017, will provide the same subjective quality at roughly 50% the bitrate of its predecessor HEVC. VVC introduced a complex structure of quad-tree plus multi-type tree block partitioning (QT + MTT, or QTMT) in each $128 times 128$ block. However, it brings more encoding complexity. In this work, to tackle this problem effectively, a two-phase scheme for trimming QTMT CU partition using multi-branch CNN is presented. The goal is to predict the (QTMT) depth of QTMT partitioning on the basis of each block of size $32 times 32$. In the first phase, a backbone CNN followed by three parallel branches extracts latent features to predict which QT depth and whether using ternary-tree (TT) or not. In the second phase, based on the above prediction information, a huge number of possible (distinct) combinations of QTMT CU partition can be trimmed to reduce computational complexity. However, the practice of multiple branches leads to a significant increase in the amount of neural parameters in the CNN and consequently, the total computations of both training and inferencing will be raised significantly. Therefore, effective deep learning modules in MobilenetV2 are applied to downgrade the amount of parameters to an adequate level eventually. The experimental results show that the proposed method achieves 42.341% average saving of encoding time for all VVC test sequences and with 0.71 Bjntegaard-Delta bit-rate (BD-BR) increasing compared with VTM 6.1 in All-intra (AI) configuration.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121472540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
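A minimal sketch of the second-phase trimming logic as described: candidate QTMT splits that contradict the predicted QT depth or the predicted TT usage are discarded before the rate-distortion search. The candidate encoding and split names here are hypothetical.

```python
def trim_candidates(candidates, pred_qt_depth, use_tt):
    """Keep a candidate QTMT split only if its QT depth matches the CNN's
    prediction and, when the CNN predicts no ternary-tree use, it contains
    no TT split. Each candidate is (qt_depth, tuple_of_split_types)."""
    kept = []
    for qt_depth, split_types in candidates:
        if qt_depth != pred_qt_depth:
            continue
        if not use_tt and any(s.startswith("TT") for s in split_types):
            continue
        kept.append((qt_depth, split_types))
    return kept

# Hypothetical candidate list for one 32x32 block.
cands = [(1, ("QT",)), (2, ("QT", "BT_H")),
         (2, ("QT", "TT_V")), (3, ("QT", "QT", "BT_V"))]
print(trim_candidates(cands, pred_qt_depth=2, use_tt=False))
# -> [(2, ('QT', 'BT_H'))]
```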
Efficient Digital Implementation of n-mode Tensor-Matrix Multiplication
C. Gianoglio, E. Ragusa, R. Zunino, P. Gastaldo
{"title":"Efficient Digital Implementation of n-mode Tensor-Matrix Multiplication","authors":"C. Gianoglio, E. Ragusa, R. Zunino, P. Gastaldo","doi":"10.1109/AICAS51828.2021.9458404","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458404","url":null,"abstract":"With the growth of pervasive electronics, the availability of compact digital circuitry for the support of data processing is becoming a key requirement. This paper tackles the design of a digital architecture supporting the $n -$mode tensormatrix product in fixed point representation. The design aims to minimize the resources occupancy, targeting low cost and low power devices. Tests on a Kintex-7 FPGA confirm that the architecture leads to an efficient digital implementation, which can afford real-time performances on benchmark applications with power consumption lower than 100mW.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127730019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
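For reference, the n-mode tensor-matrix product itself is standard: unfold the tensor along mode n, multiply by the matrix, and fold back. A NumPy version follows; the FPGA architecture implements this in fixed point, whereas the float sketch below only fixes the math.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n tensor-matrix product: for T of shape (I_0, ..., I_{N-1})
    and M of shape (J, I_n), the result replaces dimension I_n with J."""
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)   # mode-n unfolding
    Y = M @ Tn
    new_shape = (M.shape[0],) + tuple(np.delete(T.shape, n))
    return np.moveaxis(Y.reshape(new_shape), 0, n)      # fold back

T = np.arange(24, dtype=float).reshape(2, 3, 4)
M = np.ones((5, 3))                                     # act along mode 1
Y = mode_n_product(T, M, 1)
assert Y.shape == (2, 5, 4)
# Cross-check against an explicit contraction over mode 1:
assert np.allclose(Y, np.einsum('ijk,lj->ilk', T, M))
```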
Quantization Strategy for Pareto-optimally Low-cost and Accurate CNN
K. Nakata, D. Miyashita, A. Maki, F. Tachibana, S. Sasaki, J. Deguchi, Ryuichi Fujimoto
{"title":"Quantization Strategy for Pareto-optimally Low-cost and Accurate CNN","authors":"K. Nakata, D. Miyashita, A. Maki, F. Tachibana, S. Sasaki, J. Deguchi, Ryuichi Fujimoto","doi":"10.1109/AICAS51828.2021.9458452","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458452","url":null,"abstract":"Quantization is an effective technique to reduce memory and computational costs for inference of convolutional neural networks (CNNs). However, it has not been clarified which model can achieve higher recognition accuracy with lower memory and computational costs: a fat model (large number of parameters) quantized to an extremely low bit width (e.g., 1 or 2 bits) or a slim model (small number of parameters) quantized to moderately low bit width (e.g., 4 or 5 bits). To answer this question, we define a metric that combines the number of parameters and computations with bit widths of quantized weight parameters. Using this metric, we demonstrate that Pareto-optimal performance, where the best accuracy is obtained at a given memory or computational cost, is achieved when a slim model is moderately quantized rather than when a fat model is extremely quantized. Moreover, employing a strategy based on this finding, we empirically show that the Pareto frontier is improved by 4.3× under a post-training quantization scenario on the ImageNet dataset.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"63 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130863999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
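A hedged sketch of the kind of metric and frontier extraction the abstract describes: here the cost is taken as parameters × weight bit width (memory footprint), and all model numbers are invented for illustration. The paper's exact metric also folds in computation cost.

```python
def pareto_frontier(models):
    """Return the models not dominated in (cost, accuracy).
    Each model is (name, params, macs, weight_bits, accuracy); cost here
    is memory footprint in bits = params * weight_bits."""
    pts = [(name, params * bits, acc) for name, params, macs, bits, acc in models]
    frontier = []
    for name, cost, acc in sorted(pts, key=lambda p: p[1]):
        if not frontier or acc > frontier[-1][2]:   # keep only accuracy gains
            frontier.append((name, cost, acc))
    return frontier

# Hypothetical fat-vs-slim comparison (accuracies are made up).
models = [
    ("fat-1bit",  25e6, 4e9, 1, 0.62),
    ("fat-2bit",  25e6, 4e9, 2, 0.68),
    ("slim-4bit",  5e6, 6e8, 4, 0.70),
    ("slim-8bit",  5e6, 6e8, 8, 0.71),
]
for name, cost, acc in pareto_frontier(models):
    print(f"{name}: {cost / 8e6:.1f} MB, acc={acc:.0%}")
```

With these toy numbers only the slim variants survive on the frontier, mirroring the paper's conclusion that moderate quantization of a slim model beats extreme quantization of a fat one.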
Energy Efficient Computing with Heterogeneous DNN Accelerators
Md. Shazzad Hossain, I. Savidis
{"title":"Energy Efficient Computing with Heterogeneous DNN Accelerators","authors":"Md. Shazzad Hossain, I. Savidis","doi":"10.1109/AICAS51828.2021.9458474","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458474","url":null,"abstract":"The exploration of custom deep neural network (DNN) based accelerators for highly energy constrained edge devices with on-device intelligence is gaining traction in the research community. Despite the superior throughout and performance of custom accelerators as compared to CPUs or GPUs, the energy efficiency and versatility of state-of-the-art DNN accelerators is constrained due to the limited scope of monolithic architectures, where the entire accelerator executes only one model at any given time. In this paper, a multi-voltage domain heterogeneous DNN accelerator architecture is proposed that simultaneously executes multiple models with different power-performance operating points. The proposed architecture and circuits are evaluated with SPICE simulation in a 65 nm CMOS technology. The simulation results indicate that the proposed heterogeneous architecture improves the energy efficiency to 2.04 TOPS/W, while the conventional monolithic and single voltage domain architecture exhibits an energy efficiency of 0.0458 TOPS/W. In addition, the total power consumption of the accelerator SoC is reduced to 1.34 W as compared to the 3.72 W consumed by the baseline architecture when all multiply-and-accumulate (MACs) units operate at a voltage of 0.45 V.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133072344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
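The reported figures come from SPICE simulation; the first-order arithmetic below only illustrates why per-domain voltage and frequency scaling helps, using dynamic power P = C·V²·f with wholly hypothetical constants.

```python
def dynamic_power(c_eff, v_dd, freq):
    """First-order dynamic power of a MAC array: P = C_eff * V^2 * f."""
    return c_eff * v_dd**2 * freq

C_EFF = 2e-9                 # effective switched capacitance (hypothetical)

# Monolithic baseline: four identical domains all at nominal voltage/frequency.
p_mono = 4 * dynamic_power(C_EFF, v_dd=1.0, freq=500e6)

# Heterogeneous: each domain's (V, f) matched to its model's needs.
domains = [(1.0, 500e6), (0.7, 300e6), (0.55, 150e6), (0.45, 100e6)]
p_hetero = sum(dynamic_power(C_EFF, v, f) for v, f in domains)

print(f"monolithic: {p_mono:.2f} W, heterogeneous: {p_hetero:.2f} W")
```

The quadratic dependence on V is what makes running slack-tolerant models in low-voltage domains so profitable.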
Graph-Based Spatio-Temporal Backpropagation for Training Spiking Neural Networks
Yulong Yan, Haoming Chu, Xin Chen, Yi Jin, Y. Huan, Lirong Zheng, Zhuo Zou
{"title":"Graph-Based Spatio-Temporal Backpropagation for Training Spiking Neural Networks","authors":"Yulong Yan, Haoming Chu, Xin Chen, Yi Jin, Y. Huan, Lirong Zheng, Zhuo Zou","doi":"10.1109/AICAS51828.2021.9458461","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458461","url":null,"abstract":"Dedicated hardware for spiking neural networks (SNN) reduces energy consumption with spike-driven computing. This paper proposes a graph-based spatio-temporal backpropagation (G-STBP) to train SNN, aiming to enhance spike sparsity for energy efficiency, while ensuring the accuracy. A differentiable leaky integrate-and-fire (LIF) model is suggested to establish the backpropagation path. The sparse regularization is proposed to reduce the spike firing rate with a guaranteed accuracy. GSTBP enables training in any network topologies thanks to graph representation. A recurrent network is demonstrated with spike-sparse rank order coding. The experimental result on rank order coded MNIST shows that the recurrent SNN trained by G-STBP achieves the accuracy of 97.3% using 392 spikes per inference.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"55 35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124761625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
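The core trick that makes an LIF neuron trainable by backpropagation is a surrogate gradient for the non-differentiable spike function. Below is a minimal sketch of a single LIF neuron and a rectangular surrogate; the paper's differentiable LIF model and graph machinery are more elaborate, and the parameters here are illustrative.

```python
import numpy as np

def lif_forward(x_seq, w, tau=2.0, v_th=1.0):
    """Leaky integrate-and-fire over time: the membrane leaks by 1/tau,
    integrates weighted input, and spikes with a hard reset at v_th."""
    v, spikes = 0.0, []
    for x in x_seq:
        v = v * (1.0 - 1.0 / tau) + w * x
        s = float(v >= v_th)
        spikes.append(s)
        v = v * (1.0 - s)                   # hard reset after a spike
    return np.array(spikes)

def surrogate_grad(v, v_th=1.0, alpha=2.0):
    """Rectangular surrogate for d(spike)/d(v): the Heaviside step has no
    useful derivative, so BPTT uses a boxcar window around threshold."""
    return (np.abs(v - v_th) < 1.0 / alpha) * (alpha / 2.0)

x = np.array([0.2, 0.9, 0.1, 1.2, 0.0, 0.8])
spikes = lif_forward(x, w=1.5)
print(spikes)                               # -> [0. 1. 0. 1. 0. 1.]
print(surrogate_grad(np.array([0.2, 0.9, 1.05])))  # nonzero only near v_th
# The sparse regularization from the abstract would add a firing-rate
# penalty such as lam * spikes.mean() to the task loss.
```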
NeuroSim Validation with 40nm RRAM Compute-in-Memory Macro
A. Lu, Xiaochen Peng, Wantong Li, Hongwu Jiang, Shimeng Yu
{"title":"NeuroSim Validation with 40nm RRAM Compute-in-Memory Macro","authors":"A. Lu, Xiaochen Peng, Wantong Li, Hongwu Jiang, Shimeng Yu","doi":"10.1109/AICAS51828.2021.9458501","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458501","url":null,"abstract":"Compute-in-memory (CIM) is an attractive solution to process the extensive workloads of multiply-and-accumulate (MAC) operations in deep neural network (DNN) hardware accelerators. A simulator with options of various mainstream and emerging memory technologies, architectures and networks can be a great convenience for fast early-stage design space exploration of CIM accelerators. DNN+NeuroSim is an integrated benchmark framework supporting flexible and hierarchical CIM array design options from device-level, to circuit-level and up to algorithm-level. In this paper, we validate and calibrate the prediction of NeuroSim against a 40nm RRAM-based CIM macro post-layout simulations. First, the parameters of memory device and CMOS transistor are extracted from the TSMC’s PDK and employed on the NeuroSim settings; the peripheral modules and operating process are also configured to be the same as the actual chip. Next, the area, critical path and energy consumption values from the SPICE simulations at the module-level are compared with those from NeuroSim. Some adjustment factors are introduced to account for transistor sizing and wiring area in layout, gate switching activity and post-layout performance drop, etc. We show that the prediction from NeuroSim is precise with chip-level error under 1% after the calibration.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124867656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
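The calibration flow amounts to deriving module-level ratios between SPICE and simulator estimates and reapplying them to full-chip predictions. The sketch below shows that bookkeeping with made-up numbers; the real adjustment factors come from the paper's measurements, and NeuroSim's actual interfaces are not modeled here.

```python
# Hypothetical module-level numbers; real values would come from NeuroSim
# predictions and post-layout SPICE simulation of the 40nm macro.
sim_pred   = {"adc_energy_pJ": 12.0, "array_area_um2": 900.0,  "delay_ns": 4.0}
spice_meas = {"adc_energy_pJ": 13.1, "array_area_um2": 1010.0, "delay_ns": 4.6}

# Adjustment factors absorb sizing, wiring, and post-layout slowdown.
factors = {k: spice_meas[k] / sim_pred[k] for k in sim_pred}

def calibrate(pred):
    """Scale each raw simulator estimate by its module-level factor."""
    return {k: v * factors[k] for k, v in pred.items()}

chip_pred = {"adc_energy_pJ": 4800.0, "array_area_um2": 3.6e5, "delay_ns": 250.0}
print(calibrate(chip_pred))
```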
LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training
Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, S. Yin
{"title":"LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training","authors":"Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS51828.2021.9458421","DOIUrl":"https://doi.org/10.1109/AICAS51828.2021.9458421","url":null,"abstract":"Recently, edge-device training has arisen an urgent necessity since it can enhance the model adaptability without causing high transmission cost and privacy issues. Due to the need for a wide data range and high data precision to improve accuracy, DNN training requires much wider floating-point (FP) data for convolution and complicated arithmetics for batch normalization. They lead to massive computation and memory access energy, which yields challenges for power-constrained edge-devices. This paper proposes a novel PE, called LPE, with three innovations to solve this issue. First, LPE stores the operands in the posit format, satisfying both precision and data range with lower bit-width. It reduces training latency and energy for memory access. Second, LPE transfers complicated arithmetics during training into the logarithm domain, including multiplication in convolution layer and division, square, square root in batch normalization layers. It reduces computation energy and improves throughput. Third, LPE contains a two-stage floating-point accumulation unit. It extends the computation range while using the low bit-width accumulator, enhancing precision and reducing power consumption. Evaluated with 28 nm CMOS process, our PE achieves 1.81× power and 1.35× area reduction compared with IEEE 754 float-point 16 (FP16) fused MAC while maintaining the same dynamic range. When performing training with the proposed PE unit, it can achieve 1.97× energy reduction and offer 1.68× speed up.","PeriodicalId":173204,"journal":{"name":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132494134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
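Log-domain multiplication, the first ingredient the abstract names, replaces a multiplier with an adder on approximate logarithms. Below is a Mitchell-style sketch for positive values; posit decoding and the two-stage accumulator are omitted, and in hardware the exponent k comes from a leading-one detector rather than math.log2.

```python
import math

def log2_approx(x):
    # Mitchell's approximation: for x = 2^k * (1 + f), log2(x) ~ k + f.
    k = math.floor(math.log2(x))
    return k + x / 2.0**k - 1.0

def log_domain_mul(a, b):
    # a * b = 2^(log2 a + log2 b): the multiply becomes an addition,
    # followed by an approximate antilog with the same linear trick.
    s = log2_approx(a) + log2_approx(b)
    k = math.floor(s)
    return 2.0**k * (1.0 + (s - k))

a, b = 3.7, 5.2
approx, exact = log_domain_mul(a, b), a * b
print(f"approx={approx:.3f}  exact={exact:.3f}  err={(approx - exact) / exact:+.2%}")
```

Mitchell's approximation trades a few percent of relative error for removing the multiplier array entirely, which is where the reported power savings originate.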