2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS): Latest Papers, Page 5

A Novel Transpose 2T-DRAM based Computing-in-Memory Architecture for On-chip DNN Training and Inference

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168641

Yuansheng Zhao, Zixuan Shen, Jiarui Xu, K. Chai, Yanqing Wu, Chao Wang

Abstract: Recently, DRAM-based computing-in-memory (CIM) has emerged as a promising CIM solution owing to its high bit-cell density, large memory capacity, and CMOS compatibility. This paper proposes a 2T-DRAM-based CIM architecture that efficiently performs both inference and training for deep neural networks (DNNs). The architecture employs 2T-DRAM-based transpose circuitry to implement a transposable weight memory array and uses digital logic in the array periphery to carry out digital DNN computation in memory. A novel mapping method maps the convolutional and fully connected computations of forward and back propagation onto the transpose 2T-DRAM CIM array to achieve digital weight multiplexing and parallel computing. Simulation results show that the proposed architecture delivers an estimated 11.26 GOPS with a 16K DRAM array when accelerating a 4CONV+3FC network at 100 MHz, and reaches 82.15% accuracy on the CIFAR-10 dataset; both figures are much higher than those of state-of-the-art DRAM-based CIM accelerators without CIM learning capability. A preliminary evaluation of retention time in DRAM CIM also shows that, with the proposed mapping strategy and a suitably sized CIM array, refresh-free training and inference of lightweight networks can be realized with negligible refresh-induced performance loss or power increase.
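The role of a transposable weight array in on-chip training can be illustrated with a small sketch. Forward propagation multiplies activations by the stored weight matrix W, while back propagation multiplies the error by its transpose, so an array that can be read along both rows and columns avoids keeping a second, transposed copy. The Python below is only a software analogy under that assumption: the function names and the values are made up, and it does not reproduce the paper's circuitry or mapping method.

```python
# Hypothetical illustration: one stored weight array serves both passes.
# Forward reads it row-wise (y = W x); backward reads it column-wise
# (dx = W^T dy), so no separate transposed copy is stored.

def forward(weights, x):
    """y[i] = sum_j W[i][j] * x[j] (row-wise access)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def backward(weights, dy):
    """dx[j] = sum_i W[i][j] * dy[i] (column-wise access of the same array)."""
    n_rows, n_cols = len(weights), len(weights[0])
    return [sum(weights[i][j] * dy[i] for i in range(n_rows))
            for j in range(n_cols)]

W = [[1, 2, 3],
     [4, 5, 6]]      # 2 outputs x 3 inputs, made-up values
x = [1, 0, -1]       # made-up input activations
dy = [1, 1]          # made-up output error

y = forward(W, x)    # uses rows of W
dx = backward(W, dy) # uses columns of W
print(y, dx)
```

Both functions index the same `W`; only the traversal order differs, which is the property a transpose memory array provides in hardware.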

Citations: 0

F-CNN: Faster CNN Exploiting Data Re-Use with Statistical Analysis

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168606

Fatmah Alantali, Y. Halawani, B. Mohammad, M. Al-Qutayri

Abstract: Many current edge computing devices need efficient implementations of artificial intelligence (AI) applications because of strict latency, security, and power requirements. Nonetheless, such devices face various challenges when executing AI applications because of their limited computing and energy resources. In particular, the convolutional neural network (CNN) is a popular machine learning method that derives a high-level function from training on various visual input examples. This paper contributes to enabling offline use of CNNs on resource-constrained devices, where a trade-off between accuracy, running time, and power efficiency is verified. It investigates minimal pre-processing of input data to identify nonessential computations in the convolutional layers. The spatial locality of the input data is considered along with an efficient pre-processing method to mitigate the accuracy loss caused by the computational re-use approach. The technique was tested on LeNet and CIFAR-10 structures, where it incurred accuracy losses of 1.9% and 1.6% while reducing processing time by 38.3% and 20.9% and energy by 38.3% and 20.7%, respectively. The models were deployed and verified on a Raspberry Pi 4 B platform, using MATLAB Coder to measure time and energy.

Citations: 0

Group Vectored Absolute-Value-Subtraction Cell Array for the Efficient Acceleration of AdderNet

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168637

Jiahao Chen, Wanbo Hu, Wenling Ma, Zhilin Zhang, Mingqiang Huang

Abstract: Convolutional neural networks (CNNs) have been widely used to boost the performance of artificial intelligence (AI) tasks. However, CNN models are usually computationally intensive. Recently, AdderNet, a CNN variant based on the absolute-value-subtraction (ABS) operation, was proposed to reduce computational complexity and energy consumption, but dedicated hardware designs for it have rarely been explored. In this work, we propose an energy-efficient AdderNet accelerator to address this issue. At the hardware architecture level, we develop a flexible, group-vectored systolic array to balance circuit area, power, and speed. Thanks to the low delay of the ABS operation, the systolic array can reach a frequency of up to 2 GHz, while its power and area efficiency improve by about 3× compared with its CNN counterpart. At the processing element level, we propose a new ABS cell based on algorithm optimization, which shows about 10% higher performance than the naive design. Finally, the accelerator is deployed on an FPGA platform to accelerate the AdderNet ResNet-18 network as a case study. The peak throughput is 424.2 GOP/s, which is much higher than that of previous works.
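The abstract names the ABS operation without defining it. In the commonly cited AdderNet formulation, each output is the negative sum of absolute differences between an input window and a filter, replacing the multiply-accumulate of an ordinary convolution. The Python sketch below contrasts the two on a 1-D signal; treating that formulation as the one used here is an assumption, and the function names and data are hypothetical.

```python
# Sketch contrasting a multiply-accumulate window with an
# absolute-value-subtraction (ABS) window, on made-up 1-D data.

def conv_mac(window, kernel):
    """Ordinary convolution step: sum of products."""
    return sum(x * w for x, w in zip(window, kernel))

def adder_abs(window, kernel):
    """AdderNet-style step: negative L1 distance, no multiplications."""
    return -sum(abs(x - w) for x, w in zip(window, kernel))

def slide(signal, kernel, op):
    """Apply `op` to every full window of `signal` (stride 1, no padding)."""
    k = len(kernel)
    return [op(signal[i:i + k], kernel) for i in range(len(signal) - k + 1)]

signal = [1, 3, 2, 5]
kernel = [1, 2]

mac_out = slide(signal, kernel, conv_mac)
abs_out = slide(signal, kernel, adder_abs)
print(mac_out, abs_out)
```

The ABS variant needs only subtractors, absolute-value logic, and adders per window, which is consistent with the low operation delay the abstract credits for the array's high clock frequency.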

Citations: 0

GPIL: Gradient with PseudoInverse Learning for High Accuracy Fine-Tuning

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168584

Gilha Lee, N. Kim, Hyun Kim

Abstract: PseudoInverse learning (PIL) was proposed to increase the convergence speed of conventional gradient descent. PIL can train convolutional neural networks (CNNs) quickly and reliably, without gradients, by using a pseudoinverse matrix. However, PIL has several problems when training a network. First, it runs out of memory because all batches are required during one epoch of training. Second, the network cannot be made deeper, because stacking more PIL layers means relying on increasingly unreliable input pseudoinverse matrices. Therefore, PIL has not yet been effectively applied to widely used deep models. Motivated by these limitations of existing PIL, we propose a novel error propagation methodology that allows the fine-tuning process, which is often used in resource-constrained environments, to be performed more accurately. Specifically, by using both PIL and gradient descent, we not only enable mini-batch training, which was impossible in PIL, but also achieve higher accuracy through more accurate error propagation. Moreover, unlike existing PIL, which uses only the pseudoinverse matrix of the CNN input, we additionally use the pseudoinverse matrix of the weights to compensate for the limitations of PIL; thus, the proposed method enables faster and more accurate error propagation in CNN training. As a result, it is efficient for fine-tuning in resource-constrained environments, such as mobile/edge devices, which need to reach comparable accuracy within a small number of training epochs. Experimental results show that the proposed method improves the accuracy after ResNet-101 fine-tuning on the CIFAR-100 dataset by 2.78% compared with the baseline.
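The gradient-free step at the heart of pseudoinverse learning can be sketched for a single linear layer: given inputs X and targets Y, the least-squares weights are W = pinv(X) Y, obtained in one solve rather than by iterating. The pure-Python example below assumes a tiny X with two linearly independent columns, so the pseudoinverse reduces to (X^T X)^-1 X^T. All names and data are hypothetical, and GPIL's combination with gradient descent and with the weight pseudoinverse is not reproduced here.

```python
# One-shot least-squares fit of a linear layer via the pseudoinverse.
# Valid only when X has exactly two linearly independent columns.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_two_columns(x):
    """Moore-Penrose pseudoinverse (X^T X)^-1 X^T for full column rank X."""
    xt = transpose(x)
    return matmul(inverse_2x2(matmul(xt, x)), xt)

X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]            # 3 samples x 2 features, made-up
Y = [[2.0], [3.0], [5.0]]   # targets generated by y = 2*x0 + 3*x1

W = matmul(pinv_two_columns(X), Y)  # recovers weights close to [[2], [3]]
print(W)
```

Note that the solve uses every sample at once, which hints at the memory problem the abstract describes when all batches are needed in one epoch.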

Citations: 0

Configurable Multi-Precision Floating-Point Multiplier Architecture Design for Computation in Deep Learning

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168572

Pei-Hsuan Kuo, Yu-Hsiang Huang, Juinn-Dar Huang

Abstract: The growing number of AI applications demands efficient computing capabilities to support a huge amount of calculation. Among the related arithmetic operations, multiplication is indispensable in most deep learning applications. To support the different precisions demanded by various applications, a multiplier architecture must handle multiple precisions while still achieving high utilization of the multiplication array and high power efficiency. In this paper, a configurable multi-precision floating-point (FP) multiplier architecture with minimized redundant bits is presented. It can execute 16× FP8 operations, 8× brain-floating-point (BF16) operations, 4× half-precision (FP16) operations, or 1× single-precision (FP32) operation every cycle while maintaining a 100% multiplication hardware utilization ratio. Moreover, the results can also be represented in higher-precision formats for subsequent high-precision computations. The proposed design has been implemented in a TSMC 40 nm process at a 1 GHz clock frequency and consumes only 16.78 mW on average. Compared with existing multi-precision FP multiplier architectures, it achieves the highest hardware utilization ratio with only 4.9K logic gates in the multiplication array. It also achieves high energy efficiencies of 1212.1, 509.6, 207.3, and 42.6 GFLOPS/W in FP8, BF16, FP16, and FP32 modes, respectively.
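The quoted figures can be related with simple arithmetic: peak throughput is operations per cycle times clock frequency, and dividing throughput by energy efficiency gives an implied power per mode. The Python sketch below does this using only numbers from the abstract, under the assumption (not stated in the source) that one multiplication counts as one FLOP. The implied per-mode power values are derived here, not reported by the authors, and the abstract does not say how its 16.78 mW average is weighted across modes.

```python
# Back-of-envelope arithmetic from the figures quoted in the abstract.
# Assumption (not in the source): one multiplication = one FLOP.

CLOCK_HZ = 1e9  # 1 GHz clock frequency

OPS_PER_CYCLE = {"FP8": 16, "BF16": 8, "FP16": 4, "FP32": 1}
EFFICIENCY_GFLOPS_PER_W = {"FP8": 1212.1, "BF16": 509.6,
                           "FP16": 207.3, "FP32": 42.6}

def peak_gflops(mode):
    """Peak throughput in GFLOPS: operations per cycle times clock rate."""
    return OPS_PER_CYCLE[mode] * CLOCK_HZ / 1e9

def implied_power_mw(mode):
    """Power implied by throughput / efficiency, in milliwatts (derived)."""
    return peak_gflops(mode) / EFFICIENCY_GFLOPS_PER_W[mode] * 1e3

for mode in OPS_PER_CYCLE:
    print(f"{mode}: {peak_gflops(mode):.0f} GFLOPS peak, "
          f"~{implied_power_mw(mode):.1f} mW implied")
```

Under this assumption the implied power rises from the FP8 mode to the FP32 mode, even though every mode is stated to keep the multiplication array fully utilized.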

Citations: 1

Live Demonstration: An Integrated Computing and Communication Platform for Vehicle-Infrastructure Cooperative Autonomous Driving

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168600

Yuhang Gu, Wei Zhang, Yi-xing Shi, Limin Jiang, Shan-Guo Li, Sha Cao, Zhiyuan Jiang, Ruiqing Mao, Zhewen Lou, Sheng Zhou

Citations: 0

EpilepsyNet: Interpretable Self-Supervised Seizure Detection for Low-Power Wearable Systems

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168560

Baichuan Huang, R. Zanetti, A. Abtahi, D. Atienza, A. Aminifar

Abstract: Epilepsy is one of the most common neurological disorders and is characterized by recurrent, unpredictable seizures. Wearable systems can be used to detect the onset of a seizure and notify family members and emergency units for rescue. Most state-of-the-art studies in the epilepsy domain currently explore modern machine learning techniques, such as deep neural networks, to detect epileptic seizures accurately. However, training deep learning networks requires a large amount of data and computing resources, which is a major challenge for resource-constrained wearable systems. In this paper, we propose EpilepsyNet, the first interpretable self-supervised network tailored to resource-constrained devices that uses no seizure data in its initial offline training. At runtime, however, once a seizure is detected, it can be incorporated into our self-supervised technique to improve seizure detection performance without retraining the learning model, and hence without incurring energy overhead. Our self-supervised approach reaches a detection performance of 79.2%, which is on par with state-of-the-art fully supervised deep neural networks trained on seizure data. At the same time, the proposed approach can be deployed on resource-constrained wearable devices, reaching up to 1.3 days of battery life on a single charge.

Citations: 0

Architecture-Aware Optimization of Layer Fusion for Latency-Optimal CNN Inference

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168659

Minyong Yoon, Jungwook Choi

Citations: 0

Live Demonstration: An Efficient Neural Network Processor with Reduced Data Transmission and On-chip Shortcut Mapping

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168666

Yichuan Bai, Zhuang Shao, Chenshuo Zhang, Aojie Jiang, Yuan Du, Li Du

Citations: 0

Read-disturb Detection Methodology for RRAM-based Computation-in-Memory Architecture

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168638

Mohammad Amin Yaldagard, Sumit Diware, R. Joshi, S. Hamdioui, R. Bishnoi

Citations: 0