Latest Publications from the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Georg Rutishauser, Francesco Conti, L. Benini
{"title":"Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge","authors":"Georg Rutishauser, Francesco Conti, L. Benini","doi":"10.1109/AICAS57966.2023.10168577","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168577","url":null,"abstract":"Mixed-precision quantization, where a deep neural network’s layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the in-tractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6 % reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130372354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
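A minimal sketch of the hardware-aware heuristic stage described above, assuming hypothetical per-layer latency and accuracy-sensitivity tables; the layer names, latency model, and greedy policy are illustrative, not the paper's exact procedure:

```python
# Hypothetical greedy latency optimizer: starting from the configuration found by
# the differentiable search, lower the precision of the layer that yields the
# largest latency saving per unit of (proxy) accuracy loss, until no step helps.
# All tables and thresholds here are illustrative assumptions, not the paper's data.

def measured_latency(config, latency_table):
    """Sum per-layer latencies for a {layer: bits} configuration."""
    return sum(latency_table[layer][bits] for layer, bits in config.items())

def greedy_latency_optimize(config, latency_table, sensitivity, budget):
    """Greedily lower bit-widths while the accumulated sensitivity stays within budget."""
    spent = 0.0
    improved = True
    while improved:
        improved = False
        best = None
        for layer, bits in config.items():
            lower = {8: 4, 4: 2}.get(bits)          # candidate next precision
            if lower is None:
                continue
            saving = latency_table[layer][bits] - latency_table[layer][lower]
            cost = sensitivity[layer][lower]         # proxy accuracy penalty
            if spent + cost <= budget and saving > 0:
                score = saving / (cost + 1e-9)
                if best is None or score > best[0]:
                    best = (score, layer, lower, cost)
        if best is not None:
            _, layer, lower, cost = best
            config[layer] = lower
            spent += cost
            improved = True
    return config

# Toy example with made-up numbers (two layers, latencies in microseconds).
latency_table = {"conv1": {8: 120, 4: 80, 2: 60}, "conv2": {8: 200, 4: 110, 2: 90}}
sensitivity = {"conv1": {4: 0.2, 2: 0.8}, "conv2": {4: 0.1, 2: 0.9}}
cfg = greedy_latency_optimize({"conv1": 8, "conv2": 8}, latency_table, sensitivity, budget=0.5)
print(cfg, measured_latency(cfg, latency_table))
```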
In-memory Activation Compression for GPT Training
Seungyong Lee, Geonu Yun, Hyuk-Jae Lee
{"title":"In-memory Activation Compression for GPT Training","authors":"Seungyong Lee, Geonu Yun, Hyuk-Jae Lee","doi":"10.1109/AICAS57966.2023.10168658","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168658","url":null,"abstract":"Recently, a large number of parameters in Transformer-based language models have caused memory short-ages during training. Although solutions such as mixed precision and model parallelism have been proposed, they have the limitation of inducing communication overhead and requiring modification of the model by a programmer. To address this issue, we propose a scheme that compresses activation data in memory, enabling the reduction of memory usage during training in a user-transparent manner. The compression algorithm gathers activation data into a block and compresses it, using base-delta compression for the exponent and bit-plane zero compression for the sign and mantissa. Then, the important bits are arranged in order, and LSB truncation is applied to fit the target size. The proposed compression algorithm achieves a compression ratio of 2.09 for the sign, 2.04 for the exponent, and 1.21 for the mantissa. A compression ratio of 3.2 is obtained by applying up to the truncation, and we confirm the convergence of GPT-2 training with the compression.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115892716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
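A minimal illustration of the base-delta idea applied to the exponent fields of a block of float32 activations, assuming a simple block layout; the block size and delta width are assumptions, and the paper's full pipeline additionally covers sign/mantissa bit-plane compression and LSB truncation:

```python
# Hypothetical sketch of base-delta compression on the exponent fields of a block
# of float32 activations: store one base exponent plus small per-element deltas.
import numpy as np

def compress_exponents(block, delta_bits=4):
    """Return (base, deltas) if all exponent deltas fit in delta_bits, else None."""
    raw = block.astype(np.float32).view(np.uint32)
    exponents = (raw >> 23) & 0xFF               # 8-bit biased exponent field
    base = int(exponents.min())
    deltas = exponents - base
    if deltas.max() >= (1 << delta_bits):        # deltas too wide: block is incompressible
        return None
    return base, deltas.astype(np.uint8)

def decompress_exponents(base, deltas):
    """Recover the original 8-bit exponent fields."""
    return (deltas.astype(np.uint32) + base) & 0xFF

# Activations in a trained layer tend to share a narrow exponent range,
# which is what makes the base-delta scheme effective.
block = np.random.uniform(0.1, 2.0, size=64).astype(np.float32)
packed = compress_exponents(block)
if packed is not None:
    base, deltas = packed
    restored = decompress_exponents(base, deltas)
    original = (block.view(np.uint32) >> 23) & 0xFF
    assert np.array_equal(restored, original)
```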
Reducing Overhead of Feature Importance Visualization via Static GradCAM Computation
Ashwin Bhat, A. Raychowdhury
{"title":"Reducing Overhead of Feature Importance Visualization via Static GradCAM Computation","authors":"Ashwin Bhat, A. Raychowdhury","doi":"10.1109/AICAS57966.2023.10168594","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168594","url":null,"abstract":"Explainable AI (XAI) methods provide insights into the operation of black-box Deep Neural Network (DNN) models. GradCAM, an XAI algorithm, provides an explanation by highlighting regions in the input feature space that were relevant to the model’s output. It involves a gradient computation step that adds a significant overhead compared to inference and hinders providing explanations to end-users. In this work, we identify the root cause of the problem to be the dynamic run-time automatic differentiation. To overcome this issue, we propose to offload the gradient computation step to compile time via analytic evaluation. We validate the idea by designing an FPGA implementation of GradCAM that schedules the entire computation graph statically. For a TinyML ResNet18 model, we achieve a reduction in the explanation generation overhead from > 2× using software frameworks on CPU/GPU systems to < 0.01× on the FPGA using our designed hardware and static scheduling.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132805715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
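A rough software-level sketch of why the GradCAM gradients can be evaluated analytically for a common architecture: with a global-average-pooling plus linear classification head, the gradient of a class logit with respect to the last feature map is determined by the classifier weights, so the channel weights can be fixed ahead of time instead of being computed by run-time autodiff. The shapes and names below are illustrative assumptions, and the FPGA scheduling itself is not shown:

```python
# Minimal GradCAM sketch for a GAP + linear classification head in plain NumPy.
# For this head, d(logit_c)/d(A_k) averaged over the spatial grid equals the
# classifier weight for channel k divided by H*W; the constant factor does not
# change the normalized map, so it is dropped here.
import numpy as np

def gradcam_static(feature_map, classifier_w, class_idx):
    """feature_map: (C, H, W) activations; classifier_w: (num_classes, C) weights."""
    channel_weights = classifier_w[class_idx]                 # shape (C,)
    cam = np.einsum("c,chw->hw", channel_weights, feature_map)
    cam = np.maximum(cam, 0.0)                                # ReLU as in GradCAM
    if cam.max() > 0:
        cam = cam / cam.max()                                 # normalize to [0, 1]
    return cam

# Toy usage with random activations and weights.
A = np.random.rand(64, 7, 7).astype(np.float32)
Wc = np.random.randn(1000, 64).astype(np.float32)
heatmap = gradcam_static(A, Wc, class_idx=42)                 # (7, 7) saliency map
```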
Grand Challenge on Software and Hardware Co-Optimization for E-Commerce Recommendation System
Jianing Li, Jiabin Liu, Xingyuan Hu, Yuhang Zhang, Guosheng Yu, Shimeng Qian, Wei Mao, Li Du, Yongfu Li, Yuan Du
{"title":"Grand Challenge on Software and Hardware Co-Optimization for E-Commerce Recommendation System","authors":"Jianing Li, Jiabin Liu, Xingyuan Hu, Yuhang Zhang, Guosheng Yu, Shimeng Qian, Wei Mao, Li Du, Yongfu Li, Yuan Du","doi":"10.1109/AICAS57966.2023.10168648","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168648","url":null,"abstract":"E-commerce has become an indispensable part of the whole commodity economy with rapid expansion. A great deal of time is required for customers to search products by manual work. A good automatic recommendation system can not only bring the customers good shopping experience, but also help companies gain profit growth. In the IEEE AICAS 2023 conference, we have organized the grand challenge on software and hardware co-optimization for e-commerce recommendation system. The desensitized data from Alibaba Group which recorded online purchase behaviors of online shopping users in China are provided. We organize two rounds of the challenge with two different parts of data, separately encouraging participating teams to propose novel ideas for the recommendation algorithm design and deployment. In the preliminary round, participating teams are required to design a recommendation system with high accuracy performance. In the final round, the qualified teams from the preliminary round will be offered with an ARM-based multi-core Yitian 710 CPU cloud server, the teams are required to design an acceleration scheme for the hardware resolution. In the final, 6 best teams will be awarded by using standard evaluation criteria.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133358066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CPGAN: Collective Punishment Generative Adversarial Network for Dry Fingerprint Image Enhancement
Yu-Chi Su, Ching-Te Chiu, Chih-Han Cheng, Kuan-Hsien Liu, Tsung-Chan Lee, Jia-Lin Chen, Jie-Yu Luo, Wei-Chang Chung, Yao-Ren Chang, Kuan-Ying Ho
{"title":"CPGAN: Collective Punishment Generative Adversarial Network for Dry Fingerprint Image Enhancement","authors":"Yu-Chi Su, Ching-Te Chiu, Chih-Han Cheng, Kuan-Hsien Liu, Tsung-Chan Lee, Jia-Lin Chen, Jie-Yu Luo, Wei-Chang Chung, Yao-Ren Chang, Kuan-Ying Ho","doi":"10.1109/AICAS57966.2023.10168628","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168628","url":null,"abstract":"Fingerprint has been widely used in our daily life, such as mobile. However, some circumstances may lead to low unlocking rate, like fingerprint at low temperature(dry fingerprint) or washed fingerprint. Our method mainly focuses on the former by making it close to normal temperature fingerprint. The main idea of our method, which called \"CPGAN\", is to improve GAN to boost the quality of the enhanced fingerprint. Our objective is to make the generator generates the high quality of enhanced fingerprint. The method is divided into two parts: \"strengthening the discriminator\" and \"strengthening the generator\". For strengthening the generator, we adopt the mechanism of \"Collective Punishment\" to our work. For strengthening the discriminator, we utilize two generators and feature extractor to boost the discriminator. In our experiments, the results surpass the state-of-the-arts on FVC2002 about 75%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123008671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
WeightLock: A Mixed-Grained Weight Encryption Approach Using Local Decrypting Units for Ciphertext Computing in DNN Accelerators
Jianfeng Wang, Zhonghao Chen, Yiming Chen, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Sumitha George, Huazhong Yang, Xueqing Li
{"title":"WeightLock: A Mixed-Grained Weight Encryption Approach Using Local Decrypting Units for Ciphertext Computing in DNN Accelerators","authors":"Jianfeng Wang, Zhonghao Chen, Yiming Chen, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Sumitha George, Huazhong Yang, Xueqing Li","doi":"10.1109/AICAS57966.2023.10168612","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168612","url":null,"abstract":"With the wide use of NVM-based DNN accelerators for higher computing efficiency, the long data retention time essentially causes a high risk of unauthorized weight stealing by attackers. Weight encryption is an effective method, but existing ciphertext computing accelerators cannot achieve high encryption complexity and flexibility. This paper proposes WeightLock, a mixed-grained hardware-software co-design approach based on local decrypting units (LDUs). This work proposes a key-controlled cell-level hardware design for higher granularity and two weight selection schemes for higher flexibility. The simulation results show that the accuracy of VGG-8 and ResNet-18 in the Cifar-10 classification drops from 80% to only 10% even if 80% of keys are leaked. This shows >20% higher key leakage tolerance and >17x longer retraining latency protection, compared with the prior state-of-the-art hardware and software approaches, respectively. The area cost of the encryption function is negligible, with ~600x, 2.2x, and 2.4x reduction from the state-of-the-art cell-wise, column-wise, and 1T4R structures, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114830238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
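A minimal, hypothetical illustration of key-controlled weight encryption at cell-group granularity. This is a generic XOR-mask sketch meant only to convey how local decryption keyed per weight group makes accuracy collapse under partial key leakage; it is not the paper's circuit-level LDU design, and the group size and toy keystream are assumptions:

```python
# Generic sketch: encrypt quantized int8 weights with a per-group XOR keystream.
# Inference only recovers the true weights when a group's key bits are known;
# missing or wrong key bits leave that group scrambled.
import numpy as np

GROUP = 16  # weights per local decrypting unit (illustrative)

def keystream(key_bits, n):
    """Derive an n-byte keystream from per-group key bits (toy PRNG, not real crypto)."""
    rng = np.random.default_rng(int(key_bits))
    return rng.integers(0, 256, size=n, dtype=np.uint8)

def encrypt(weights_i8, group_keys):
    w = weights_i8.view(np.uint8).copy()
    for g, key in enumerate(group_keys):
        lo, hi = g * GROUP, (g + 1) * GROUP
        w[lo:hi] ^= keystream(key, hi - lo)
    return w.view(np.int8)

def decrypt(cipher_i8, group_keys_available):
    w = cipher_i8.view(np.uint8).copy()
    for g, key in enumerate(group_keys_available):
        if key is None:            # key not leaked: group stays as ciphertext
            continue
        lo, hi = g * GROUP, (g + 1) * GROUP
        w[lo:hi] ^= keystream(key, hi - lo)
    return w.view(np.int8)

weights = np.random.randint(-128, 128, size=64, dtype=np.int8)
keys = [11, 22, 33, 44]
cipher = encrypt(weights, keys)
partial = decrypt(cipher, [11, 22, None, None])   # attacker with half the keys
print(np.array_equal(partial[:32], weights[:32]),
      np.array_equal(partial[32:], weights[32:]))  # -> True False
```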
Landmark-Based Adversarial Network for RGB-D Pose Invariant Face Recognition
Wei-Jyun Chen, Ching-Te Chiu, Ting-Chun Lin
{"title":"Landmark-Based Adversarial Network for RGB-D Pose Invariant Face Recognition","authors":"Wei-Jyun Chen, Ching-Te Chiu, Ting-Chun Lin","doi":"10.1109/AICAS57966.2023.10168669","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168669","url":null,"abstract":"Even though numerous studies have been conducted, face recognition still suffers from poor performance in pose variance. Besides fine appearance details of the face from RGB images, we use depth images that present the 3D contour of the face to improve recognition performance in large poses. At first, we propose a dual-path RGB-D face recognition model which learns features from separate RGB and depth images and fuses the two features into one identity feature. We add associate loss to strengthen the complementary and improve performance. Second, we proposed a landmark-based adversarial network to help the face recognition model extract the pose-invariant identity feature. Our landmark-based adversarial network contains a feature generator, pose discriminator, and landmark module. After we use 2-stage optimization to optimize the pose discriminator and feature generator, we removed the pose factor in the feature extracted by the generator. We conduct experiments on KinectFaceDB, RealSensetest and LiDARtest. On KinectFaceDB, we achieve a recognition accuracy of 99.41%, which is 1.31% higher than other methods. On RealSensetest, we achieve a classification accuracy of 92.57%, which is 30.51% higher than other methods. On LiDARtest, we achieve 98.21%, which is 21.88% higher than other methods.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122530409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
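A minimal sketch of the dual-path idea from the abstract: separate RGB and depth encoders whose embeddings are fused into a single identity feature. The backbone layers, embedding size, and fusion operator are assumptions; the associate loss and the pose-adversarial training stages are not shown here:

```python
# Hypothetical dual-path RGB-D model: two modality-specific encoders, one fused
# identity feature, and an identity classifier on top.
import torch
import torch.nn as nn

class DualPathRGBD(nn.Module):
    def __init__(self, embed_dim=256, num_ids=1000):
        super().__init__()
        def encoder(in_ch):            # tiny stand-in backbone for one modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim))
        self.rgb_enc = encoder(3)
        self.depth_enc = encoder(1)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)   # fuse into one identity feature
        self.classifier = nn.Linear(embed_dim, num_ids)

    def forward(self, rgb, depth):
        f_rgb, f_depth = self.rgb_enc(rgb), self.depth_enc(depth)
        identity = self.fuse(torch.cat([f_rgb, f_depth], dim=1))
        return identity, self.classifier(identity)

model = DualPathRGBD()
identity, logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 1, 112, 112))
```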
Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud
Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen
{"title":"Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud","authors":"Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen","doi":"10.1109/AICAS57966.2023.10168586","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168586","url":null,"abstract":"The YiTian710 SoC is a server processor based on ARM Neoverse N2 architecture and developed by T-HEAD Semiconductor Co., Ltd. to accelerate the compute-intensive tasks in Alicloud, where the ML related workloads play an important role in various applications. The General Matrix Multiplication is the fundamental and the most important computing kernel routine extensively utilized in the ML workloads. Generally, the whole GEMM workload is partitioned into a series of blocks and the sub-tasks are professionally assembled to exploit the parallel hardware. However, it is not the case for the cloud workloads which process multi-tasks concurrently and expect guaranteed QoS for commercial consideration. We introduce the task-aware parallel scheduling method to process the ML workloads and balance the response delay and the throughput of the YiTian710 ECS instance. We furtherly design a multi-thread scheduling algorithm with two-level division for the GEMM sub-tasks to achieve high efficiency. The optimized GEMM kernels are developed to attain the optimal performance. We evaluate the performance in YiTian710 based Alicloud ECS for different applications. The results show that our method can achieve remarkable performance improvement for different applications.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122578736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
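A minimal sketch of the block-partitioned GEMM scheduling idea, assuming a simple row-block split dispatched to a bounded thread pool; the block size, worker count, and this one-level split are illustrative stand-ins, not the paper's two-level division or the tuned YiTian710 kernels:

```python
# Illustrative block-partitioned GEMM: the M dimension is split into row blocks and
# the blocks are dispatched to a bounded thread pool, so that concurrent cloud
# requests can be balanced against per-request latency by capping threads per task.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gemm_block(a_block, b):
    return a_block @ b                      # delegate the inner kernel to BLAS

def parallel_gemm(a, b, block_rows=256, max_workers=4):
    m = a.shape[0]
    blocks = [(i, min(i + block_rows, m)) for i in range(0, m, block_rows)]
    c = np.empty((m, b.shape[1]), dtype=a.dtype)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(gemm_block, a[lo:hi], b): (lo, hi) for lo, hi in blocks}
        for fut, (lo, hi) in futures.items():
            c[lo:hi] = fut.result()
    return c

a = np.random.rand(1024, 512).astype(np.float32)
b = np.random.rand(512, 256).astype(np.float32)
assert np.allclose(parallel_gemm(a, b), a @ b, atol=1e-4)
```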
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions
Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, S. Yin
{"title":"A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions","authors":"Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS57966.2023.10168581","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168581","url":null,"abstract":"Residual (2+1)-dimensional convolution neural network (R(2+1)D CNN) has achieved great success in video recognition due to the spatiotemporal convolution structure. However, R(2+1)D CNN incurs large energy and latency overhead because of intensive computation and frequent memory access. To solve the issues, we propose a digital SRAM-CIM based accelerator with two key features: (1) Systolic CIM array to efficiently match massive computations in regular architecture; (2) Digtal CIM circuit design with output sparsity predicition to avoid redundant computations. The proposed design is implemented in 28nm technology and achieves an energy efficiency of 21.87 TOPS/W at 200 MHz and 0.9 V supply voltage.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115859481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
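A rough software-level sketch of the output-sparsity-prediction idea: a cheap partial product over the high-order bits is used to guess whether the pre-activation will be negative, in which case the ReLU output is zero and the exact accumulation can be skipped. The bit split, margin, and decision rule are assumptions; the paper realizes this inside digital SRAM-CIM circuits:

```python
# Illustrative prediction: estimate the dot product from the top bits of each
# operand; if the estimate is confidently negative, skip the exact computation.
# A positive margin trades skip rate against the risk of wrongly zeroing a small
# positive output.
import numpy as np

def relu_with_prediction(weights_i8, acts_u8, msb_bits=4, margin=0):
    """Return (output, skipped): ReLU(w . a) with an MSB-based early-skip guess."""
    w_hi = (weights_i8.astype(np.int32) >> (8 - msb_bits)) << (8 - msb_bits)
    a_hi = (acts_u8.astype(np.int32) >> (8 - msb_bits)) << (8 - msb_bits)
    estimate = int(np.dot(w_hi, a_hi))
    if estimate < -margin:                 # predicted negative -> ReLU output is 0
        return 0, True
    exact = int(np.dot(weights_i8.astype(np.int32), acts_u8.astype(np.int32)))
    return max(exact, 0), False

w = np.random.randint(-128, 128, size=64, dtype=np.int8)
a = np.random.randint(0, 256, size=64, dtype=np.uint8)
out, skipped = relu_with_prediction(w, a)
```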
An Efficient Design Framework for 2×2 CNN Accelerator Chiplet Cluster with SerDes Interconnects
Yajie Wu, Tianze Li, Zhuang Shao, Li Du, Yuan Du
{"title":"An Efficient Design Framework for 2×2 CNN Accelerator Chiplet Cluster with SerDes Interconnects","authors":"Yajie Wu, Tianze Li, Zhuang Shao, Li Du, Yuan Du","doi":"10.1109/AICAS57966.2023.10168573","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168573","url":null,"abstract":"Multi-Chiplet integrated systems for high-performance computing with dedicated CNN accelerators are highly demanded due to ever-increasing AI-related training and inferencing tasks; however, many design challenges hinder their large-scale applications, such as complicated multi-task scheduling, high-speed die-to-die SerDes (Serializer/Deserializer) link modeling, and detailed communication and computation hardware co-simulation. In this paper, an optimized 2×2 CNN accelerator chiplet framework with a SerDes link model is presented, which addresses the above challenges. A methodology for designing a 2×2 CNN accelerator chiplet framework is also proposed, and several experiments are conducted. The system performances of different designs are compared and analyzed with different design parameters of computation hardware, SerDes links, and improved scheduling algorithms. The results show that with the same interconnection structure and bandwidth, every 1TFLOPS increase in one chiplet’s computing power can bring an average 3.7% execution time reduction.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115378574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
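A back-of-the-envelope model of the compute/communication trade-off such a framework explores, assuming each chiplet's per-layer time is bounded by the slower of its compute time and its SerDes transfer time; all numbers and the perfect-overlap max() model are assumptions for illustration, whereas the paper relies on detailed co-simulation:

```python
# Toy analytical model: layer time = max(FLOPs / throughput, bytes / SerDes bandwidth).
# Adding compute throughput helps less and less once communication dominates,
# which is the kind of diminishing return the framework's experiments quantify.

def layer_time(flops, bytes_moved, tflops, serdes_gbps):
    compute_s = flops / (tflops * 1e12)
    transfer_s = bytes_moved * 8 / (serdes_gbps * 1e9)
    return max(compute_s, transfer_s)

def total_time(layers, tflops, serdes_gbps):
    return sum(layer_time(f, b, tflops, serdes_gbps) for f, b in layers)

# Two made-up layers: (FLOPs, bytes exchanged between chiplets).
layers = [(2e9, 4e6), (1e9, 16e6)]
for tflops in (4, 5, 6):
    print(tflops, "TFLOPS ->", total_time(layers, tflops, serdes_gbps=100), "s")
```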