{"title":"4b/4b/8b Precision Charge-Domain 8T-SRAM Based CiM for CNN Processing","authors":"Qibang Zang, W. Goh, Y. Chong, A. Do","doi":"10.1109/AICAS57966.2023.10168593","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168593","url":null,"abstract":"Compute-in-memory (CiM) is a promising solution to the bottleneck of frequent data movement between the memory and processor in the von Neumann architecture. In conventional multi-bit CiM architectures, computing an N-bit input by N-bit weight MAC operation needs 2^N−1 cycles for N-bit input modulation and normally 3-4 cycles with complex switch operations for N-bit weight realization, which significantly degrades the final throughput and power efficiency. In this work, a C-2C DAC built into the 8T SRAM CiM array is designed for 4-bit weight and 4-bit input MAC operation, which can be completed in just one cycle. In the final power efficiency evaluation, our 4b/4b/8b CiM architecture attained up to 640 TOPS/W (normalized to 1b/1b/1b precision), a 6-10 times improvement over conventional multi-bit CiM architectures. The proposed architecture with 4b/4b/8b precision can provide 91.47% and 68.10% accuracy on CIFAR-10 and CIFAR-100 dataset classification, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133156224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Convolved Self-Attention Model for IMU-based Gait Detection and Human Activity Recognition","authors":"Shuailin Tao, W. Goh, Yuan Gao","doi":"10.1109/AICAS57966.2023.10168654","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168654","url":null,"abstract":"This paper presents a convolved self-attention neural network model for gait detection and human activity recognition (HAR) tasks using wearable inertial measurement unit (IMU) sensors. By embedding a convolved window inside the self-attention module, the self-attention layer uses knowledge of prior time steps to improve accuracy. Moreover, a streamlined fully connected (FC) layer without hidden layers is proposed for the feature mixer. This arrangement significantly reduces the overall number of network parameters, since hidden layers account for the majority of the parameters in a transformer encoder. Compared with other state-of-the-art neural networks, the proposed method achieved better accuracy of 95.83% and 96.01% on the HAR datasets UCI-HAR and MHEALTH, respectively, with the smallest network size.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131725053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRIO: a Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks","authors":"Thanh-Dat Nguyen, Minh-Son Le, Thi-Nhan Pham, I. Chang","doi":"10.1109/AICAS57966.2023.10168596","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168596","url":null,"abstract":"We introduce TRIO, a 10T SRAM cell for in-memory computing circuits in ternary neural networks (TNNs). TRIO's thin-cell type layout occupies only 0.492 μm² in a 28 nm FD-SOI technology, which is smaller than some state-of-the-art ternary SRAM cells. Comparing TRIO to other works, we found that it consumes less analog multiplication power, indicating its potential for improving the area and power efficiency of TNN IMC circuits. Our optimized TNN IMC circuit using TRIO achieved high area and power efficiencies of 369.39 TOPS/mm² and 333.8 TOPS/W in simulations.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132038767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Compiler Optimization on Multi-Chiplet Architecture","authors":"Huiqing Xu, Kuang Mao, Quihong Pan, Zhaorong Tang, Mengdi Wang, Ying Wang","doi":"10.1109/AICAS57966.2023.10168656","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168656","url":null,"abstract":"Multi-chiplet architecture can provide a high-performance solution for new tasks such as deep learning models. In order to fully utilize chiplets and accelerate the execution of deep learning models, we present a deep learning compilation optimization framework for chiplets, and propose a scheduling method based on data dependence. Experiments show that our method improves the compilation efficiency, and the performance of the scheduling scheme is at least 1-2 times higher than the traditional algorithms.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132168979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep-learning-based X-ray CT Slice Analysis for Layout Verification in Printed Circuit Boards","authors":"Deruo Cheng, Yiqiong Shi, Yee-Yang Tee, Jingsi Song, Xue Wang, B. Wen, B. Gwee","doi":"10.1109/AICAS57966.2023.10168608","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168608","url":null,"abstract":"3D X-ray Computational Tomography (CT) systems have been employed to inspect Printed Circuit Boards (PCBs) for security analysis, given heightened concerns about the trustworthiness of the globalized supply chain. In this paper, we propose a deep-learning-based layout verification (DELVer) framework to automatically extract PCB layout information from X-ray CT slices and verify it against the design files. Leveraging geometrical projective transformation, our proposed DELVer framework aligns the acquired CT slice of each PCB layer with its corresponding design file, to train state-of-the-art deep learning models for layout extraction and verification. It thus alleviates the laborious manual data labeling for deep learning models. With a cross-device evaluation on 4 multi-layer satellite PCBs of board size around 90 cm², our proposed DELVer framework demonstrates how deep learning models can generalize to unseen target PCBs for layout verification, establishing an efficient solution for PCB assurance and industrial failure analysis.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116541201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Algorithms for Accelerating Spiking Neural Networks on MAC Array of SpiNNaker 2","authors":"Jiaxin Huang, Florian Kelber, B. Vogginger, Binyi Wu, Felix Kreutz, Pascal Gerhards, Daniel Scholz, Klaus Knobloch, C. Mayr","doi":"10.1109/AICAS57966.2023.10168559","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168559","url":null,"abstract":"CPU-based systems are widely used for simulating brain-inspired spiking neural networks (SNNs) because of their flexibility. However, processing the high input spiking rates caused by immature coding mechanisms costs many CPU cycles, and the additional information required by serial execution calls for a time-consuming pre- and post-neuron matching algorithm. To address these issues, we propose an algorithm set leveraging the multiply-accumulate (MAC) array to accelerate SNN inference. By rearranging and compressing operands losslessly, we retain the advantage of the MAC array in fast parallel computing while alleviating the ineffective memory occupation and the waste of computing resources that result from the inherent sparsity of SNNs and the memory alignment forced by the fixed MAC hardware structure. Benchmarked with an SNN radar gesture recognition model, the algorithms jointly reduce execution time by 82.71% compared to serial computation on the ARM M4F of the SpiNNaker 2 chip, and reduce the memory footprint by 49.89% compared with the unoptimized MAC calculation. This article explicitly expands the application field of the General Sparse Matrix-Matrix Multiplication (SpGEMM) problem to SNNs, developing novel SpGEMM optimization algorithms that fit the SNN features and the MAC array.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121198381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Live Demonstration: Real-time Analyses of Biosignals based on a Dedicated CMOS Configurable Deep Learning Engine","authors":"Junzhe Wang, Shiqi Zhao, Chaoming Fang, Jie Yang, M. Sawan","doi":"10.1109/AICAS57966.2023.10168631","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168631","url":null,"abstract":"Biosignals generated by human bodies contain valuable information about a person’s physical or psychological states. In recent years, machine-learning algorithms have significantly increased the accuracy and usefulness of biosignal analysis in areas such as disease diagnoses and treatments. To make these analyses more portable and accessible, we have designed and fabricated a dedicated processor named CODE, which supports machine-learning processing of various types of biosignals, including electroencephalography (EEG), electromyography (EMG), and electrocardiography (ECG), with high power efficiency and low latency. In this demonstration, we will show how the CODE chip processes biosignal data in real-time and show measurements of its power consumption and efficiency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124361062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modified Logarithmic Multiplication Approximation for Machine Learning","authors":"I. Kouretas, Vassilis Paliouras, T. Stouraitis","doi":"10.1109/AICAS57966.2023.10168664","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168664","url":null,"abstract":"In this paper, a novel approximation that allows exploitation of the full potential of logarithmic multiplication is proposed. More specifically, the proposed approximation is quantified in terms of mean square error (MSE) and compared to a competitive recent publication. Subsequently, an LSTM network is used as an illustrative test case and the proposed approximation is validated in terms of the accuracy of the network. It has been shown that for short data wordlengths, the proposed approximation can achieve small loss values, for the particular LSTM network. Finally, the circuit implementation of the logarithmic multiplier is synthesized in a 28 nm standard-cell library. Results show reduced hardware complexity for similar loss values on the specific LSTM network.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123888790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Efficient Software-hardware Co-design of Quantized Recurrent Convolutional Neural Network for Continuous Cardiac Monitoring","authors":"Jinhai Hu, Cong Sheng Leow, W. Goh, Yuan Gao","doi":"10.1109/AICAS57966.2023.10168601","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168601","url":null,"abstract":"This paper presents an electrocardiogram (ECG) signal classification model based on a Recurrent Convolutional Neural Network (RCNN). With recurrent connections and data buffers, a single convolutional layer is reused to implement the function of multiple layers. Using a 5-layer CNN as an example, this approach reduces the number of parameters by more than 50% while achieving the same feature extraction size. Furthermore, a quantized RCNN (QRCNN) is proposed in which the input signal, interlayer output, and kernel weights are quantized to unsigned INT8, INT4, and signed INT4, respectively. For hardware implementation, pipelining and data reuse within the 1-D convolution kernel can potentially reduce latency. The QRCNN model achieved 98.08% validation accuracy on the MIT-BIH dataset with only 1% degradation due to quantization. The estimated dynamic power consumption of the QRCNN is less than 60% of that of a conventional quantized CNN when implemented on a Xilinx Artix-7 FPGA, showing its potential for resource-constrained edge devices.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122393487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary is All You Need: Ultra-Efficient Arrhythmia Detection with a Binary-Only Compressive System","authors":"Fengshi Tian, Xiaomeng Wang, Jinbo Chen, Jie Yang, M. Sawan, C. Tsui, Kwang-Ting Cheng","doi":"10.1109/AICAS57966.2023.10168576","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168576","url":null,"abstract":"Detecting cardiac arrhythmia is critical for preventing heart attacks, and wearable electrocardiograph (ECG) systems have been developed to address this issue. However, the energy consumption of existing wearable systems is still significant at both the circuit and system levels, posing a challenge for their design. In this paper, we propose a novel ultra-efficient binary-only compressive ECG system for edge cardiac arrhythmia detection, featuring an event-driven level-crossing analog-to-spike converter (ATS) for sensing and a computing-in-memory (CIM) based binarized neural network (BNN) processor for compressive processing. Through system-level co-design, our proposed system achieves high arrhythmia detection accuracy and ultra-low energy consumption. Our simulations using the MIT-BIH dataset show that the proposed system achieves a 90.1% reduction in sampled data points compared to Nyquist sampling. Moreover, our dedicated BNN on a CIM engine delivers 97.03% arrhythmia detection accuracy with energy consumption as low as 0.17 μJ/inference.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126436519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}