2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS): Latest Papers

A Novel Transpose 2T-DRAM based Computing-in-Memory Architecture for On-chip DNN Training and Inference
Yuansheng Zhao, Zixuan Shen, Jiarui Xu, K. Chai, Yanqing Wu, Chao Wang
DOI: 10.1109/AICAS57966.2023.10168641 (https://doi.org/10.1109/AICAS57966.2023.10168641)
Published: 2023-06-11
Abstract: Recently, DRAM-based Computing-in-Memory (CIM) has emerged as a potential CIM solution due to its unique advantages of high bit-cell density, large memory capacity, and CMOS compatibility. This paper proposes a 2T-DRAM based CIM architecture that can efficiently perform both CIM inference and training for deep neural networks (DNNs). The proposed architecture employs 2T-DRAM based transpose circuitry to implement a transpose weight memory array and uses digital logic in the array periphery to perform digital DNN computation in memory. A novel mapping method is proposed to map the convolutional and fully connected computations of the forward-propagation and back-propagation processes onto the transpose 2T-DRAM CIM array, achieving digital weight multiplexing and parallel computing. Simulation results show that the computing power of the proposed architecture is estimated at 11.26 GOPS with a 16K DRAM array accelerating a 4CONV+3FC network at 100 MHz, with 82.15% accuracy on the CIFAR-10 dataset; these figures are much higher than those of state-of-the-art DRAM-based CIM accelerators without CIM learning capability. A preliminary evaluation of retention time in DRAM CIM also shows that, with the proposed mapping strategy and a suitably sized CIM array, a refresh-less training-inference process for lightweight networks can be realized with negligible refresh-induced performance loss or power increase.
Citations: 0
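
On-chip training needs the weight matrix W in two orientations: y = Wx in the forward pass and dx = Wᵀdy in back-propagation. The Python sketch below is a behavioral toy, not the authors' circuit or mapping: the class TransposeArray and its methods are hypothetical. It shows how an array readable along both rows and columns can serve both passes, plus the outer-product weight update, from a single stored copy of W instead of keeping a second, transposed copy.

```python
class TransposeArray:
    """Behavioral toy model (hypothetical): one stored weight matrix W, shape
    (out x in), readable along rows (forward) or columns (backward).
    It does not model 2T-DRAM cells, retention, or the paper's mapping."""

    def __init__(self, weights):
        self.w = [list(row) for row in weights]  # the single stored copy of W

    def read_row(self, i):
        return self.w[i]

    def read_col(self, j):
        return [row[j] for row in self.w]

    def forward(self, x):
        # y[i] = sum_j W[i][j] * x[j]  (one row read per output)
        return [sum(w * xj for w, xj in zip(self.read_row(i), x))
                for i in range(len(self.w))]

    def backward(self, dy):
        # dx[j] = sum_i W[i][j] * dy[i], i.e. W^T dy  (one column read per input)
        return [sum(w * dyi for w, dyi in zip(self.read_col(j), dy))
                for j in range(len(self.w[0]))]

    def update(self, x, dy, lr):
        # SGD step W -= lr * outer(dy, x), written back into the same array
        for i, dyi in enumerate(dy):
            for j, xj in enumerate(x):
                self.w[i][j] -= lr * dyi * xj


arr = TransposeArray([[1, 2, 3], [4, 5, 6]])
print(arr.forward([1, 0, 1]))   # [4, 10]
print(arr.backward([1, 1]))     # [5, 7, 9]
```

The same `self.w` serves all three steps; only the read direction changes, which is the property a transpose array provides in hardware.
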
F-CNN: Faster CNN Exploiting Data Re-Use with Statistical Analysis
Fatmah Alantali, Y. Halawani, B. Mohammad, M. Al-Qutayri
DOI: 10.1109/AICAS57966.2023.10168606 (https://doi.org/10.1109/AICAS57966.2023.10168606)
Published: 2023-06-11
Abstract: Many current edge computing devices need efficient implementations of Artificial Intelligence (AI) applications because of strict latency, security, and power requirements. However, such devices face various challenges when executing AI applications because of their limited computing and energy resources. In particular, Convolutional Neural Networks (CNNs) are a popular machine learning method that derives a high-level function from training on various visual input examples. This paper contributes to enabling offline use of CNNs on resource-constrained devices, where a trade-off between accuracy, running time, and power efficiency is verified. The paper investigates the use of minimal pre-processing of input data to identify nonessential computations in the convolutional layers. In this work, the spatial locality of input data is considered, along with an efficient pre-processing method, to mitigate the accuracy loss caused by the computational re-use approach. This technique was tested on LeNet and CIFAR-10 structures and caused 1.9% and 1.6% accuracy loss while reducing processing time by 38.3% and 20.9% and energy by 38.3% and 20.7%, respectively. The models were deployed and verified on a Raspberry Pi 4 B platform, using MATLAB Coder to measure time and energy.
Citations: 0
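
As a rough illustration of computational re-use driven by spatial locality, the Python sketch below skips a dot product whenever the current input window is within a threshold of the last window that was actually computed. The function conv1d_with_reuse and its max-absolute-difference similarity test are hypothetical; the paper's statistical pre-processing and its 2D CNN setting are not reproduced. With threshold=0 only exact repeats are re-used, so the result is lossless; larger thresholds skip more work at some cost in accuracy.

```python
def conv1d_with_reuse(signal, kernel, threshold):
    """Valid 1D cross-correlation that re-uses the last computed output whenever
    the current window is within `threshold` (max absolute difference) of the
    window that produced it. Returns (outputs, number_of_dot_products_computed)."""
    k = len(kernel)
    outputs, computed = [], 0
    ref_window, ref_value = None, None
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        similar = ref_window is not None and max(
            abs(a - b) for a, b in zip(window, ref_window)) <= threshold
        if not similar:
            ref_value = sum(w * x for w, x in zip(kernel, window))
            ref_window = window
            computed += 1
        outputs.append(ref_value)  # re-used or freshly computed
    return outputs, computed


signal = [1, 1, 1, 1, 5, 5, 5, 5]
print(conv1d_with_reuse(signal, [1, 1], threshold=0))  # ([2, 2, 2, 6, 10, 10, 10], 3)
```

Here 3 of 7 dot products are computed; raising the threshold to 10 would compute only 1 and return 2 everywhere, showing how an overly loose similarity test turns saved work into accuracy loss.
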
Group Vectored Absolute-Value-Subtraction Cell Array for the Efficient Acceleration of AdderNet
Jiahao Chen, Wanbo Hu, Wenling Ma, Zhilin Zhang, Mingqiang Huang
DOI: 10.1109/AICAS57966.2023.10168637 (https://doi.org/10.1109/AICAS57966.2023.10168637)
Published: 2023-06-11
Abstract: Convolutional neural networks (CNNs) have been widely used to boost the performance of Artificial Intelligence (AI) tasks. However, CNN models are usually computationally intensive. Recently, AdderNet, a CNN based on the absolute-value-subtraction (ABS) operation, was proposed to reduce computational complexity and energy cost, but dedicated hardware for it has rarely been explored. In this work, we propose an energy-efficient AdderNet accelerator to address this issue. At the hardware architecture level, we develop a flexible, group-vectored systolic array to balance circuit area, power, and speed. Thanks to the low delay of the ABS operation, the systolic array can reach frequencies as high as 2 GHz. Meanwhile, its power and area efficiency are about 3× better than those of its CNN counterpart. At the processing element level, we propose a new ABS cell based on algorithm optimization, which shows about 10% higher performance than the naive design. Finally, the accelerator is deployed on an FPGA platform to accelerate the AdderNet ResNet-18 network as a case study. The peak throughput is 424.2 GOP/s, much higher than that of previous works.
Citations: 0
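
AdderNet replaces a convolution's multiply-accumulate with a negated sum of absolute differences between the input window and the filter, Y = −Σ|X − F|; this is the ABS operation the accelerator computes. The Python sketch below shows only that arithmetic for a single-channel, stride-1, unpadded 2D input, next to the ordinary multiply-accumulate for comparison. The helper names (slide, mac, abs_sub) are hypothetical, and nothing here models the group-vectored systolic array or the optimized ABS cell.

```python
def slide(image, kernel, op):
    """Apply `op(window, weights)` at every valid stride-1 position of a 2D input."""
    kh, kw = len(kernel), len(kernel[0])
    flat_k = [w for row in kernel for w in row]
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            window = [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            out[i][j] = op(window, flat_k)
    return out


def mac(window, weights):
    # Standard CNN: multiply-accumulate
    return sum(x * w for x, w in zip(window, weights))


def abs_sub(window, weights):
    # AdderNet: negated L1 distance, using only subtraction, absolute value, and addition
    return -sum(abs(x - w) for x, w in zip(window, weights))


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [4, 5]]
print(slide(image, kernel, mac))      # [[46, 58], [82, 94]]
print(slide(image, kernel, abs_sub))  # [[0, -4], [-12, -16]]
```

The AdderNet response is largest (zero) where the window matches the filter exactly, so it still acts as a similarity measure while needing no multipliers.
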
GPIL: Gradient with PseudoInverse Learning for High Accuracy Fine-Tuning
Gilha Lee, N. Kim, Hyun Kim
DOI: 10.1109/AICAS57966.2023.10168584 (https://doi.org/10.1109/AICAS57966.2023.10168584)
Published: 2023-06-11
Abstract: PseudoInverse learning (PIL) was proposed to increase the convergence speed of conventional gradient descent. Using a pseudoinverse matrix, PIL can train convolutional neural networks (CNNs) quickly and reliably without gradients. However, PIL has several problems when training a network. First, it suffers from an out-of-memory problem because all batches are required during one training epoch. Second, the network cannot be made deeper, because increasingly unreliable input pseudoinverse matrices are used as more PIL layers are stacked. Therefore, PIL has not yet been effectively applied to widely used deep models. Motivated by these limitations of existing PIL, we propose a novel error propagation methodology that allows fine-tuning, which is often performed in resource-constrained environments, to be carried out more accurately. Specifically, by using both PIL and gradient descent, we not only enable mini-batch training, which was impossible in PIL, but also achieve higher accuracy through more accurate error propagation. Moreover, unlike existing PIL, which uses only the pseudoinverse matrix of the CNN input, we additionally use the pseudoinverse matrix of the weights to compensate for the limitations of PIL; thus, the proposed method enables faster and more accurate error propagation in the CNN training process. As a result, it is efficient for fine-tuning in resource-constrained environments, such as mobile/edge devices that must reach competitive accuracy within a small number of training epochs. Experimental results show that the proposed method improves accuracy after ResNet-101 fine-tuning on the CIFAR-100 dataset by 2.78% compared to the baseline.
Citations: 0
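
The core PIL step solves a layer's weights in closed form: for an input matrix X and a target T, W = X⁺T, where X⁺ is the Moore-Penrose pseudoinverse. The NumPy sketch below contrasts that with gradient descent on a single linear layer. It is plain PIL on a toy least-squares problem, not GPIL's combined PIL/gradient scheme or its use of weight pseudoinverses, and the function names pil_fit and gd_fit are hypothetical. Note that pil_fit needs every sample of X at once, which is the memory problem the abstract describes, whereas the gradient in gd_fit could equally be computed on mini-batches.

```python
import numpy as np


def pil_fit(X, T):
    """Closed-form least-squares weights W = pinv(X) @ T, minimizing ||X @ W - T||_F.
    Requires the whole input matrix X (all samples) in memory."""
    return np.linalg.pinv(X) @ T


def gd_fit(X, T, lr=0.5, steps=500):
    """Full-batch gradient descent on the same least-squares objective."""
    W = np.zeros((X.shape[1], T.shape[1]))
    for _ in range(steps):
        grad = X.T @ (X @ W - T) / X.shape[0]  # gradient of 0.5 * ||X @ W - T||^2 / n
        W -= lr * grad
    return W


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
W_true = np.array([[1.0], [-2.0], [0.5]])
T = X @ W_true
print(np.allclose(pil_fit(X, T), W_true))  # True
```

Both routes reach the same weights on this noiseless problem; PIL gets there in one solve, while gradient descent needs many iterations but never has to hold or invert the full X.
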
Configurable Multi-Precision Floating-Point Multiplier Architecture Design for Computation in Deep Learning
Pei-Hsuan Kuo, Yu-Hsiang Huang, Juinn-Dar Huang
DOI: 10.1109/AICAS57966.2023.10168572 (https://doi.org/10.1109/AICAS57966.2023.10168572)
Published: 2023-06-11
Abstract: The growing number of AI applications demands efficient computing capabilities to support huge amounts of computation. Among the related arithmetic operations, multiplication is indispensable in most deep learning applications. To support the different precisions demanded by various applications, a multiplier architecture must meet multi-precision demands while still achieving high utilization of the multiplication array and high power efficiency. In this paper, a configurable multi-precision floating-point (FP) multiplier architecture with minimized redundant bits is presented. Every cycle, it can execute 16× FP8 operations, 8× brain-floating-point (BF16) operations, 4× half-precision (FP16) operations, or 1× single-precision (FP32) operation while maintaining a 100% multiplication hardware utilization ratio. Moreover, the results can also be represented in higher-precision formats for subsequent high-precision computations. The proposed design has been implemented in a TSMC 40 nm process at a 1 GHz clock frequency and consumes only 16.78 mW on average. Compared with existing multi-precision FP multiplier architectures, the proposed design achieves the highest hardware utilization ratio with only 4.9K logic gates in the multiplication array. It also achieves high energy efficiencies of 1212.1, 509.6, 207.3, and 42.6 GFLOPS/W in FP8, BF16, FP16, and FP32 modes, respectively.
Citations: 1
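
Whatever the array organization, each mode of a multi-precision multiplier performs the standard floating-point product: XOR the signs, add the biased exponents and subtract one bias, multiply the significands (including the hidden leading 1), then renormalize. The Python sketch below parameterizes that datapath by field widths for FP32 (8-bit exponent, 23-bit fraction), FP16 (5, 10), and BF16 (8, 7). The source does not say which FP8 layout is used, so E4M3 is an assumption here. For brevity the sketch handles only normal, exactly representable inputs, truncates instead of rounding, and ignores overflow, subnormals, and special values; it does not model the paper's array partitioning or its 100% utilization scheme.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    name: str
    exp_bits: int   # exponent field width
    man_bits: int   # stored fraction width (the hidden leading 1 is not stored)

    @property
    def bias(self) -> int:
        return (1 << (self.exp_bits - 1)) - 1


FP32 = FPFormat("FP32", 8, 23)
FP16 = FPFormat("FP16", 5, 10)
BF16 = FPFormat("BF16", 8, 7)
FP8_E4M3 = FPFormat("FP8 (E4M3, assumed)", 4, 3)


def encode(x: float, fmt: FPFormat) -> tuple:
    """Split a normal, exactly representable nonzero x into (sign, biased exponent, fraction)."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                        # abs(x) = m * 2**e, 0.5 <= m < 1
    frac = round((2 * m - 1) * (1 << fmt.man_bits))  # significand 2m lies in [1, 2)
    return sign, e - 1 + fmt.bias, frac


def decode(fields: tuple, fmt: FPFormat) -> float:
    sign, exp, frac = fields
    significand = 1 + frac / (1 << fmt.man_bits)
    return (-1) ** sign * significand * 2.0 ** (exp - fmt.bias)


def multiply(a: tuple, b: tuple, fmt: FPFormat) -> tuple:
    """Field-level product of two normal numbers (truncating; no overflow/subnormal handling)."""
    sa, ea, fa = a
    sb, eb, fb = b
    hidden = 1 << fmt.man_bits
    sign = sa ^ sb                        # sign: XOR
    exp = ea + eb - fmt.bias              # exponent: add, remove one bias
    prod = (hidden + fa) * (hidden + fb)  # integer significand product, value in [1, 4)
    if prod >= 2 * hidden * hidden:       # product >= 2.0: shift right, bump exponent
        prod >>= 1
        exp += 1
    frac = (prod >> fmt.man_bits) - hidden  # drop low bits (truncate) and the hidden 1
    return sign, exp, frac


for fmt in (FP32, FP16, BF16, FP8_E4M3):
    p = multiply(encode(-1.25, fmt), encode(3.0, fmt), fmt)
    print(fmt.name, decode(p, fmt))       # -3.75 in every format
```

Truncation becomes visible at low precision: in the assumed E4M3 format, 1.125 × 1.125 returns 1.25 rather than 1.265625, because only 3 fraction bits survive.
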
Live Demonstration: An Integrated Computing and Communication Platform for Vehicle-Infrastructure Cooperative Autonomous Driving
Yuhang Gu, Wei Zhang, Yi-xing Shi, Limin Jiang, Shan-Guo Li, Sha Cao, Zhiyuan Jiang, Ruiqing Mao, Zhewen Lou, Sheng Zhou
DOI: 10.1109/AICAS57966.2023.10168600 (https://doi.org/10.1109/AICAS57966.2023.10168600)
Published: 2023-06-11
Abstract: Perception, computing, and communication are usually decoupled in today's vehicle-road coordination applications, which significantly adds to system delay and cost. In contrast, we showcase a platform that integrates perception, communication, and computing to provide timely roadside bird's-eye-view (BEV) maps to vehicles for vision fusion. A neural processing unit and a cellular vehicle-to-everything (C-V2X) wireless baseband are both implemented on an FPGA.
Citations: 0
EpilepsyNet: Interpretable Self-Supervised Seizure Detection for Low-Power Wearable Systems
Baichuan Huang, R. Zanetti, A. Abtahi, D. Atienza, A. Aminifar
DOI: 10.1109/AICAS57966.2023.10168560 (https://doi.org/10.1109/AICAS57966.2023.10168560)
Published: 2023-06-11
Abstract: Epilepsy is one of the most common neurological disorders and is characterized by recurrent and unpredictable seizures. Wearable systems can be used to detect the onset of a seizure and notify family members and emergency units for rescue. Most state-of-the-art studies in the epilepsy domain currently explore modern machine learning techniques, e.g., deep neural networks, to accurately detect epileptic seizures. However, training deep learning networks requires large amounts of data and computing resources, which is a major challenge for resource-constrained wearable systems. In this paper, we propose EpilepsyNet, the first interpretable self-supervised network tailored to resource-constrained devices that uses no seizure data in its initial offline training. At runtime, however, once a seizure is detected, it can be incorporated into our self-supervised technique to improve seizure detection performance without retraining the learning model, hence incurring no energy overhead. Our self-supervised approach reaches a detection performance of 79.2%, on par with state-of-the-art fully supervised deep neural networks trained on seizure data. At the same time, our approach can be deployed on resource-constrained wearable devices, reaching up to 1.3 days of battery life on a single charge.
Citations: 0
Architecture-Aware Optimization of Layer Fusion for Latency-Optimal CNN Inference
Minyong Yoon, Jungwook Choi
DOI: 10.1109/AICAS57966.2023.10168659 (https://doi.org/10.1109/AICAS57966.2023.10168659)
Published: 2023-06-11
Abstract: Layer fusion is an effective technique for accelerating latency-sensitive CNN inference on resource-constrained accelerators that exploit distributed on-chip memory integrated with the accelerator, i.e., processing-in-memory (PIM). However, previous research has primarily focused on optimizing memory access, neglecting the significant impact of hardware architecture on latency. This study presents an analytical latency model for a 2D systolic array accelerator that takes into account hardware factors such as array dimensions, buffer size, and bandwidth. We then investigate how hardware architecture and fusion strategies, including weight and overlap reuse, influence performance; these aspects are insufficiently addressed in existing access-based fusion models. By combining layer fusion with our proposed latency model across different architectures, dataflows, and workloads, we achieve up to a 53.1% reduction in end-to-end network latency compared with an access-based model.
Citations: 0
Live Demonstration: An Efficient Neural Network Processor with Reduced Data Transmission and On-chip Shortcut Mapping
Yichuan Bai, Zhuang Shao, Chenshuo Zhang, Aojie Jiang, Yuan Du, Li Du
DOI: 10.1109/AICAS57966.2023.10168666 (https://doi.org/10.1109/AICAS57966.2023.10168666)
Published: 2023-06-11
Abstract: This demonstration showcases an efficient neural network processor implemented in TSMC 28 nm CMOS technology. The processor performs neural network inference with 16-bit dynamic fixed-point activations and 10-bit dynamic fixed-point weights. A reconfigurable streaming architecture is employed to reduce off-chip data transmission and to enable on-chip shortcut mapping. An integrated neural network toolchain, including a network model converter, a quantitative analysis tool, and a deep learning compiler, has also been developed for fast network deployment.
Citations: 0
Read-disturb Detection Methodology for RRAM-based Computation-in-Memory Architecture
Mohammad Amin Yaldagard, Sumit Diware, R. Joshi, S. Hamdioui, R. Bishnoi
DOI: 10.1109/AICAS57966.2023.10168638 (https://doi.org/10.1109/AICAS57966.2023.10168638)
Published: 2023-06-11
Abstract: Resistive random access memory (RRAM) based computation-in-memory (CIM) architectures can meet the unprecedented energy efficiency requirements of executing AI algorithms directly on edge devices. However, the read-disturb problem associated with these architectures can lead to accumulated computational errors. To maintain the necessary level of computational accuracy, these devices must be reprogrammed after a specific number of read cycles, which is a static approach that requires a large counter. This paper proposes a circuit-level RRAM read-disturb detection technique that monitors real-time conductance drifts of RRAM devices and initiates reprogramming only when it is actually needed. Moreover, an analytic method is presented to determine the minimum conductance detection requirements, and the proposed technique is tuned to these requirements so that it detects read disturb dynamically. SPICE simulation results using TSMC 40 nm technology confirm the correct functionality of the proposed detection technique.
Citations: 0
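
To illustrate why drift monitoring can reprogram less often than a fixed read counter, the Python toy model below compares the two policies for a single cell. Everything in it is an assumption for illustration: integer conductance units, a linear drift per read, and the hypothetical functions counter_policy and detection_policy. A counter limit has to be sized for the fastest-drifting cell, so slower cells get reprogrammed before they need it, while a monitor that fires at the drift tolerance adapts to each cell. Real read disturb is nonlinear and device dependent, and the paper's detection circuit and analytic threshold derivation are not modeled.

```python
def counter_policy(drift_per_read, reads, limit, g0=100):
    """Static policy: reprogram every `limit` reads regardless of actual drift.
    Returns (number_of_reprograms, worst_drift_seen)."""
    g, since, reprograms, worst = g0, 0, 0, 0
    for _ in range(reads):
        g -= drift_per_read              # toy linear read-disturb drift
        since += 1
        worst = max(worst, g0 - g)
        if since == limit:
            g, since = g0, 0
            reprograms += 1
    return reprograms, worst


def detection_policy(drift_per_read, reads, tolerance, g0=100):
    """Dynamic policy: reprogram only when the monitored drift reaches `tolerance`.
    Returns (number_of_reprograms, worst_drift_seen)."""
    g, reprograms, worst = g0, 0, 0
    for _ in range(reads):
        g -= drift_per_read
        worst = max(worst, g0 - g)
        if g0 - g >= tolerance:          # monitor flags a drift the computation cannot tolerate
            g = g0
            reprograms += 1
    return reprograms, worst


TOLERANCE = 10
LIMIT = 5  # counter sized for the worst case: a cell drifting 2 units per read
for drift in (1, 2):
    print(drift, counter_policy(drift, 100, LIMIT), detection_policy(drift, 100, TOLERANCE))
# 1 (20, 5) (10, 10)
# 2 (20, 10) (20, 10)
```

Over 100 reads, the slow cell is reprogrammed 20 times under the counter policy but only 10 times under detection, and in both policies its worst drift stays within the tolerance of 10.
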