"Enhancing Fault Resilience of QNNs by Selective Neuron Splitting"
Authors: Mohammad Hasan Ahmadilivani, Mahdi Taheri, J. Raik, M. Daneshtalab, M. Jenihhin
DOI: https://doi.org/10.1109/AICAS57966.2023.10168633
Published in: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2023-06-11
Abstract: The superior performance of Deep Neural Networks (DNNs) has led to their application in many aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators; however, they are more prone to reliability issues. In this paper, a recent analytical resilience assessment method is adapted to QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). A novel method for splitting the critical neurons is then proposed, enabling the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part. The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed fault-correction method incurs half the overhead of selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.
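The Neuron Vulnerability Factor named in the abstract can be illustrated with a toy fault-injection loop. This is a hypothetical sketch, not the paper's analytical method: here a neuron's NVF is approximated as the fraction of random single-bit faults in its 8-bit quantized activation that flip a small layer's classification decision.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_nvf(x_q, w, n_trials=200):
    """Approximate per-neuron vulnerability as the fraction of injected
    single-bit faults in that neuron's 8-bit activation that change the
    layer's argmax decision (illustrative only)."""
    baseline = np.argmax(w @ x_q.astype(np.int32))
    nvf = np.zeros(len(x_q))
    for i in range(len(x_q)):
        flips = 0
        for _ in range(n_trials):
            faulty = x_q.copy()
            faulty[i] ^= np.uint8(1 << rng.integers(8))  # flip one random bit
            if np.argmax(w @ faulty.astype(np.int32)) != baseline:
                flips += 1
        nvf[i] = flips / n_trials
    return nvf

x_q = np.array([12, 200, 7, 90], dtype=np.uint8)  # toy quantized activations
w = rng.integers(-4, 5, size=(3, 4))              # toy next-layer weights
scores = toy_nvf(x_q, w)
critical = np.argsort(scores)[::-1]  # most vulnerable neurons first
```

Neurons ranked highest by such a score would be the candidates for splitting or selective protection.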
"Temporal Similarity-Based Computation Reduction for Video Transformers in Edge Camera Nodes"
Authors: Udari De Alwis, Zhongheng Xie, Massimo Alioto
DOI: https://doi.org/10.1109/AICAS57966.2023.10168610
Abstract: Recognizing human actions in video sequences has become an essential task in video surveillance applications. Transformer models have rapidly gained wide interest in such applications thanks to their performance, but their advantages come at a high computational and memory cost, especially when they must be deployed on edge devices. In this work, temporal similarity tunnel insertion is utilized to reduce the overall computation burden of video transformer networks in action recognition tasks. Furthermore, an edge-friendly video transformer model based on temporal similarity is proposed, which substantially reduces the computation cost. Its smaller variant, EMViT, achieves a 38% computation reduction on the UCF101 dataset while keeping accuracy degradation insignificant (<0.02%), and the larger variant, CMViT, reduces computation by 14% (13%) with an accuracy degradation of 2% (3%) on the scaled Kinetics400 and Jester datasets, respectively.
"SNNOpt: An Application-Specific Design Framework for Spiking Neural Networks"
Authors: Jingyu He, Ziyang Shen, Fengshi Tian, Jinbo Chen, Jie Yang, M. Sawan, Hsiang-Ting Chen, P. Bogdan, C. Tsui
DOI: https://doi.org/10.1109/AICAS57966.2023.10168605
Abstract: We propose SNNOpt, a systematic application-specific hardware design methodology for Spiking Neural Networks (SNNs), which consists of three novel phases: 1) Ollivier-Ricci-curvature (ORC)-based architecture-aware network partitioning, 2) a reinforcement-learning mapping strategy, and 3) a Bayesian optimization algorithm for NoC design-space exploration. Experimental results show that SNNOpt achieves 47.45% lower runtime and 58.64% energy savings over state-of-the-art approaches.
"Novel Knowledge Distillation to Improve Training Accuracy of Spin-based SNN"
Authors: Hanrui Li, Aijaz H. Lone, Fengshi Tian, Jie Yang, M. Sawan, Nazek El‐Atab
DOI: https://doi.org/10.1109/AICAS57966.2023.10168575
Abstract: Spintronic magnetic tunnel junction (MTJ) devices have been shown to work as both synapses and spiking threshold neurons, making them well suited to hardware implementations of spiking neural networks (SNNs). They offer inherently high energy efficiency at ultra-low operating voltages thanks to their small nanometric size and low depinning current densities. However, hardware-based SNN training typically suffers a significant performance loss compared with the original neural network, due to device-to-device variations and the information deficiency incurred when weights are mapped to device synaptic conductances. Knowledge distillation is a model compression and acceleration method that transfers knowledge from a large machine learning model to a smaller model with minimal loss in performance. In this paper, we propose a novel training scheme based on spike knowledge distillation that improves the training performance of a spin-based SNN (SSNN) model by transferring knowledge from a large CNN model. We propose novel distillation methodologies and demonstrate the effectiveness of the proposed method with detailed experiments on four datasets. The experimental results indicate that the proposed training scheme consistently improves the performance of the SSNN model by a large margin.
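The spike knowledge distillation described above can be sketched as a soft-target loss between teacher CNN logits and student SNN firing rates. This is a minimal sketch of classic distillation under assumed names (`spike_kd_loss`, `temperature`, `alpha` are illustrative); the paper's actual methodologies are more elaborate.

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Numerically stable softmax with optional temperature softening
    z = np.exp((x - np.max(x)) / temperature)
    return z / np.sum(z)

def spike_kd_loss(teacher_logits, spike_rates, labels_onehot,
                  temperature=4.0, alpha=0.7):
    """Blend a softened KL term (teacher -> student) with ordinary
    cross-entropy on hard labels, as in classic knowledge distillation.
    The student's spike counts / firing rates play the role of logits."""
    p = softmax(teacher_logits, temperature)  # softened teacher targets
    q = softmax(spike_rates, temperature)     # softened student outputs
    kl = np.sum(p * (np.log(p) - np.log(q))) * temperature ** 2
    ce = -np.sum(labels_onehot * np.log(softmax(spike_rates)))
    return alpha * kl + (1 - alpha) * ce

teacher = np.array([4.0, 1.0, -2.0])   # toy CNN teacher logits
student = np.array([2.5, 0.5, -1.0])   # toy SSNN spike-rate outputs
labels = np.array([1.0, 0.0, 0.0])
loss = spike_kd_loss(teacher, student, labels)
```

The `temperature ** 2` factor is the usual scaling that keeps the gradient magnitudes of the soft and hard terms comparable.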
"A Column-Parallel Time-Interleaved SAR/SS ADC for Computing in Memory with 2-8bit Reconfigurable Resolution"
Authors: Yuandong Li, Li Du, Yuan Du
DOI: https://doi.org/10.1109/AICAS57966.2023.10168604
Abstract: Computing in Memory (CiM), a computing system with a non-von Neumann architecture, has been reported as one of the most promising neural network accelerators for the future. Compared with digital computation, CiM calculates and stores in the analog domain using RAM arrays, avoiding the high latency and energy consumption caused by data transfer. However, the computational results require data converters for quantization, which often limits the development of high-performance CiMs. In this work, we propose a 2-8 bit reconfigurable time-interleaved hybrid ADC architecture for high-speed CiMs, comprising successive-approximation and single-slope stages. Reconfigurability introduces a trade-off between resolution and conversion speed for ADCs in different computing scenarios. A prototype implemented in 55 nm CMOS technology occupies 330 μm × 13 μm and consumes 1.429 mW in 8-bit conversion mode. With a Nyquist-frequency input sampled at 350 MS/s, the SNDR and SFDR are 40.93 dB and 51.08 dB, respectively. The resulting Walden figure of merit is 44.8 fJ/conversion-step.
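The reported figures are self-consistent: converting the 40.93 dB SNDR to effective bits and plugging into the standard Walden formula reproduces the quoted fJ/conversion-step value (these are textbook ADC formulas, not code from the paper).

```python
def enob(sndr_db):
    # Effective number of bits: ENOB = (SNDR - 1.76 dB) / 6.02 dB/bit
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w, sndr_db, fs_hz):
    # Walden FoM = P / (2**ENOB * f_s), in joules per conversion step
    return power_w / (2 ** enob(sndr_db) * fs_hz)

fom = walden_fom(power_w=1.429e-3, sndr_db=40.93, fs_hz=350e6)
print(round(fom * 1e15, 1), "fJ/conv-step")  # 44.9, matching the reported 44.8
```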
"Low-Power Convolutional Neural Network Accelerator on FPGA"
Authors: Kasem Khalil, Ashok Kumar V, M. Bayoumi
DOI: https://doi.org/10.1109/AICAS57966.2023.10168646
Abstract: Convolutional Neural Network (CNN) accelerators are highly beneficial for mobile and resource-constrained devices. One research challenge is to design a power-efficient accelerator. This paper proposes a CNN accelerator with low power consumption and acceptable performance. The proposed method pipelines the kernels used in the convolution process around a shared multiplication-and-accumulation block: the available kernels operate consecutively, each performing a different operation in sequence. It also utilizes a series of operations between the kernels and the memory weights to speed up convolution. The accelerator is implemented in VHDL on an Altera Arria 10 GX FPGA. The results show that the proposed method achieves an energy efficiency of 26.37 GOPS/W, consuming less power than existing methods, with acceptable resource usage and performance. The proposed method is well suited to small, constrained devices.
"RC-GNN: Fast and Accurate Signoff Wire Delay Estimation with Customized Graph Neural Networks"
Authors: Linyu Zhu, Yue Gu, Xinfei Guo
DOI: https://doi.org/10.1109/AICAS57966.2023.10168562
Abstract: As interconnect delay becomes more dominant than gate delay in a timing path, accurate yet fast estimation of wire delay at the signoff stage is required. Prior machine-learning-based wire delay estimation approaches either relied on tedious feature-extraction processes or failed to capture net topology information, incurring long turnaround times. In this paper, we propose to leverage the power of graph neural networks (GNNs) to estimate interconnect delays during signoff. Unlike other GNN-assisted timing-analysis methods, which are usually applied to a netlist, we harness global message-passing graph representation learning directly on the RC graph to perform ultra-fast net delay estimation without requiring extra features. Furthermore, pre-processed graph features can be added to boost estimation accuracy at a slight runtime penalty. Our customized GNN models have been evaluated on an industrial design and compared against a state-of-the-art ML-based wire delay estimator: the proposed model outperforms it by 4× in runtime while achieving similar accuracy.
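For context, the classic closed-form baseline for wire delay that learned estimators are typically compared against is the Elmore delay of the RC network. For a simple RC chain it is the sum, over segments, of each segment resistance times its total downstream capacitance (standard formula; the example values below are arbitrary).

```python
def elmore_delay(segments):
    """Elmore delay at the far end of an RC chain.

    segments: list of (R_i, C_i) pairs from driver to load, where C_i is
    the capacitance lumped at segment i's far node.
    Delay = sum_i R_i * (sum of C_j for all j >= i).
    """
    delay, downstream_c = 0.0, sum(c for _, c in segments)
    for r, c in segments:
        delay += r * downstream_c  # this R charges all capacitance past it
        downstream_c -= c
    return delay

# Three identical 10-ohm / 5-fF segments:
d = elmore_delay([(10, 5e-15)] * 3)
print(d)  # 10*15e-15 + 10*10e-15 + 10*5e-15 = 3.0e-13 s
```

Elmore delay is topology-aware but pessimistic; the appeal of a GNN on the RC graph is recovering signoff-grade accuracy while keeping this kind of near-instant evaluation.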
"A Lightweight Convolutional Neural Network for Atrial Fibrillation Detection Using Dual-Channel Binary Features from Single-Lead Short ECG"
Authors: Jiahao Liu, Xinyu Liu, Liang Zhou, L. Chang, Jun Zhou
DOI: https://doi.org/10.1109/AICAS57966.2023.10168645
Abstract: Atrial fibrillation (AF) is a prevalent cardiovascular disease in the elderly, significantly increasing the risk of stroke, heart failure, and other conditions. While artificial neural networks (ANNs) have recently demonstrated high accuracy in ECG-based AF detection, their high computational complexity makes real-time, long-term monitoring on low-power wearable devices challenging, which is critical for detecting paroxysmal AF. Therefore, in this work, a lightweight convolutional neural network for AF detection is proposed that uses a dual-channel binary feature extraction technique on single-lead short ECG to achieve both high classification accuracy and low computational complexity. Evaluated on the 2017 PhysioNet/CinC Challenge dataset, the proposed method achieves 93.6% sensitivity and a 0.81 F1 score for AF detection. Moreover, the design requires only 1.83M parameters, up to a 27× reduction compared with prior works, and only 57M MACs for computation. As a result, it is suitable for deployment in low-power wearable devices for long-term AF monitoring.
"A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application"
Authors: D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand
DOI: https://doi.org/10.1109/AICAS57966.2023.10168599
Abstract: A robust, fully differential multiply-and-accumulate (MAC) scheme for analog compute-in-memory (CIM) architectures is proposed in this article. The proposed method achieves a high signal margin for a 4-bit CIM architecture thanks to fully differential voltage changes on the read bit-lines (RBL/RBLBs). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14×, 5.82×, and 10.24× higher than the state-of-the-art. The proposed scheme is robust against process, voltage, and temperature (PVT) variations and achieves a variability metric (σ/μ) of 3.64%, which is 2.36× and 2.66× lower than reported works. The architecture achieves an energy efficiency of 2.53 TOPS/W at a 1 V supply voltage in 65 nm CMOS technology, 6.2× more efficient than the digital baseline hardware [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST dataset with a LeNet-5 CNN model. The figure of merit (FoM) of the proposed design is 355, which is 3.28×, 3.58×, and 17.75× higher than the state-of-the-art.
"TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism"
Authors: Zhou Wang, Jingchuang Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, S. Yin
DOI: https://doi.org/10.1109/AICAS57966.2023.10168614
Abstract: DNN inference on edge devices has long been important, with large demands on computation and energy. This paper proposes a TPE (Transformation Process Element) with three characteristics. First, TPE uses a method of Data Segmentation Skip and Pre-Reorganization (DSSPR). Second, TPE has a Typical Value Matching and Calibration Computer (TVMCC) system, which converts direct calculation into matching-and-calibration calculation. Third, TPE includes a Data Format Pre-Configuration and Self-Adjustment (DFPCSA) scheme. Compared with UNPU, a representative pure-inference processor, TPE achieves 1.25× lower energy consumption.