Latest Publications from the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Georg Rutishauser, Francesco Conti, L. Benini
{"title":"Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge","authors":"Georg Rutishauser, Francesco Conti, L. Benini","doi":"10.1109/AICAS57966.2023.10168577","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168577","url":null,"abstract":"Mixed-precision quantization, where a deep neural network’s layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the in-tractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6 % reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130372354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
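A minimal sketch of the hardware-aware heuristic stage described above, assuming hypothetical per-layer latency and accuracy-sensitivity tables; the layer names, latency model, and greedy policy are illustrative, not the paper's exact procedure:

```python
# Hypothetical greedy latency optimizer: starting from the configuration found by
# the differentiable search, lower the precision of the layer that yields the
# largest latency saving per unit of (proxy) accuracy loss, until no step helps.
# All tables and thresholds here are illustrative assumptions, not the paper's data.

def measured_latency(config, latency_table):
    """Sum per-layer latencies for a {layer: bits} configuration."""
    return sum(latency_table[layer][bits] for layer, bits in config.items())

def greedy_latency_optimize(config, latency_table, sensitivity, budget):
    """Greedily lower bit-widths while the accumulated sensitivity stays within budget."""
    spent = 0.0
    improved = True
    while improved:
        improved = False
        best = None
        for layer, bits in config.items():
            lower = {8: 4, 4: 2}.get(bits)          # candidate next precision
            if lower is None:
                continue
            saving = latency_table[layer][bits] - latency_table[layer][lower]
            cost = sensitivity[layer][lower]         # proxy accuracy penalty
            if spent + cost <= budget and saving > 0:
                score = saving / (cost + 1e-9)
                if best is None or score > best[0]:
                    best = (score, layer, lower, cost)
        if best is not None:
            _, layer, lower, cost = best
            config[layer] = lower
            spent += cost
            improved = True
    return config

# Toy example with made-up numbers (two layers, latencies in microseconds).
latency_table = {"conv1": {8: 120, 4: 80, 2: 60}, "conv2": {8: 200, 4: 110, 2: 90}}
sensitivity = {"conv1": {4: 0.2, 2: 0.8}, "conv2": {4: 0.1, 2: 0.9}}
cfg = greedy_latency_optimize({"conv1": 8, "conv2": 8}, latency_table, sensitivity, budget=0.5)
print(cfg, measured_latency(cfg, latency_table))
```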
In-memory Activation Compression for GPT Training
Seungyong Lee, Geonu Yun, Hyuk-Jae Lee
{"title":"In-memory Activation Compression for GPT Training","authors":"Seungyong Lee, Geonu Yun, Hyuk-Jae Lee","doi":"10.1109/AICAS57966.2023.10168658","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168658","url":null,"abstract":"Recently, a large number of parameters in Transformer-based language models have caused memory short-ages during training. Although solutions such as mixed precision and model parallelism have been proposed, they have the limitation of inducing communication overhead and requiring modification of the model by a programmer. To address this issue, we propose a scheme that compresses activation data in memory, enabling the reduction of memory usage during training in a user-transparent manner. The compression algorithm gathers activation data into a block and compresses it, using base-delta compression for the exponent and bit-plane zero compression for the sign and mantissa. Then, the important bits are arranged in order, and LSB truncation is applied to fit the target size. The proposed compression algorithm achieves a compression ratio of 2.09 for the sign, 2.04 for the exponent, and 1.21 for the mantissa. A compression ratio of 3.2 is obtained by applying up to the truncation, and we confirm the convergence of GPT-2 training with the compression.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115892716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
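A minimal illustration of the base-delta idea applied to the exponent fields of a block of float32 activations, assuming a simple block layout; the block size and delta width are assumptions, and the paper's full pipeline additionally covers sign/mantissa bit-plane compression and LSB truncation:

```python
# Hypothetical sketch of base-delta compression on the exponent fields of a block
# of float32 activations: store one base exponent plus small per-element deltas.
import numpy as np

def compress_exponents(block, delta_bits=4):
    """Return (base, deltas) if all exponent deltas fit in delta_bits, else None."""
    raw = block.astype(np.float32).view(np.uint32)
    exponents = (raw >> 23) & 0xFF               # 8-bit biased exponent field
    base = int(exponents.min())
    deltas = exponents - base
    if deltas.max() >= (1 << delta_bits):        # deltas too wide: block is incompressible
        return None
    return base, deltas.astype(np.uint8)

def decompress_exponents(base, deltas):
    """Recover the original 8-bit exponent fields."""
    return (deltas.astype(np.uint32) + base) & 0xFF

# Activations in a trained layer tend to share a narrow exponent range,
# which is what makes the base-delta scheme effective.
block = np.random.uniform(0.1, 2.0, size=64).astype(np.float32)
packed = compress_exponents(block)
if packed is not None:
    base, deltas = packed
    restored = decompress_exponents(base, deltas)
    original = (block.view(np.uint32) >> 23) & 0xFF
    assert np.array_equal(restored, original)
```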
Reducing Overhead of Feature Importance Visualization via Static GradCAM Computation
Ashwin Bhat, A. Raychowdhury
{"title":"Reducing Overhead of Feature Importance Visualization via Static GradCAM Computation","authors":"Ashwin Bhat, A. Raychowdhury","doi":"10.1109/AICAS57966.2023.10168594","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168594","url":null,"abstract":"Explainable AI (XAI) methods provide insights into the operation of black-box Deep Neural Network (DNN) models. GradCAM, an XAI algorithm, provides an explanation by highlighting regions in the input feature space that were relevant to the model’s output. It involves a gradient computation step that adds a significant overhead compared to inference and hinders providing explanations to end-users. In this work, we identify the root cause of the problem to be the dynamic run-time automatic differentiation. To overcome this issue, we propose to offload the gradient computation step to compile time via analytic evaluation. We validate the idea by designing an FPGA implementation of GradCAM that schedules the entire computation graph statically. For a TinyML ResNet18 model, we achieve a reduction in the explanation generation overhead from > 2× using software frameworks on CPU/GPU systems to < 0.01× on the FPGA using our designed hardware and static scheduling.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132805715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
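A rough software-level sketch of why the GradCAM gradients can be evaluated analytically for a common architecture: with a global-average-pooling plus linear classification head, the gradient of a class logit with respect to the last feature map is determined by the classifier weights, so the channel weights can be fixed ahead of time instead of being computed by run-time autodiff. The shapes and names below are illustrative assumptions, and the FPGA scheduling itself is not shown:

```python
# Minimal GradCAM sketch for a GAP + linear classification head in plain NumPy.
# For this head, d(logit_c)/d(A_k) averaged over the spatial grid equals the
# classifier weight for channel k divided by H*W; the constant factor does not
# change the normalized map, so it is dropped here.
import numpy as np

def gradcam_static(feature_map, classifier_w, class_idx):
    """feature_map: (C, H, W) activations; classifier_w: (num_classes, C) weights."""
    channel_weights = classifier_w[class_idx]                 # shape (C,)
    cam = np.einsum("c,chw->hw", channel_weights, feature_map)
    cam = np.maximum(cam, 0.0)                                # ReLU as in GradCAM
    if cam.max() > 0:
        cam = cam / cam.max()                                 # normalize to [0, 1]
    return cam

# Toy usage with random activations and weights.
A = np.random.rand(64, 7, 7).astype(np.float32)
Wc = np.random.randn(1000, 64).astype(np.float32)
heatmap = gradcam_static(A, Wc, class_idx=42)                 # (7, 7) saliency map
```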
Grand Challenge on Software and Hardware Co-Optimization for E-Commerce Recommendation System
Jianing Li, Jiabin Liu, Xingyuan Hu, Yuhang Zhang, Guosheng Yu, Shimeng Qian, Wei Mao, Li Du, Yongfu Li, Yuan Du
{"title":"Grand Challenge on Software and Hardware Co-Optimization for E-Commerce Recommendation System","authors":"Jianing Li, Jiabin Liu, Xingyuan Hu, Yuhang Zhang, Guosheng Yu, Shimeng Qian, Wei Mao, Li Du, Yongfu Li, Yuan Du","doi":"10.1109/AICAS57966.2023.10168648","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168648","url":null,"abstract":"E-commerce has become an indispensable part of the whole commodity economy with rapid expansion. A great deal of time is required for customers to search products by manual work. A good automatic recommendation system can not only bring the customers good shopping experience, but also help companies gain profit growth. In the IEEE AICAS 2023 conference, we have organized the grand challenge on software and hardware co-optimization for e-commerce recommendation system. The desensitized data from Alibaba Group which recorded online purchase behaviors of online shopping users in China are provided. We organize two rounds of the challenge with two different parts of data, separately encouraging participating teams to propose novel ideas for the recommendation algorithm design and deployment. In the preliminary round, participating teams are required to design a recommendation system with high accuracy performance. In the final round, the qualified teams from the preliminary round will be offered with an ARM-based multi-core Yitian 710 CPU cloud server, the teams are required to design an acceleration scheme for the hardware resolution. In the final, 6 best teams will be awarded by using standard evaluation criteria.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133358066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CPGAN: Collective Punishment Generative Adversarial Network for Dry Fingerprint Image Enhancement
Yu-Chi Su, Ching-Te Chiu, Chih-Han Cheng, Kuan-Hsien Liu, Tsung-Chan Lee, Jia-Lin Chen, Jie-Yu Luo, Wei-Chang Chung, Yao-Ren Chang, Kuan-Ying Ho
{"title":"CPGAN: Collective Punishment Generative Adversarial Network for Dry Fingerprint Image Enhancement","authors":"Yu-Chi Su, Ching-Te Chiu, Chih-Han Cheng, Kuan-Hsien Liu, Tsung-Chan Lee, Jia-Lin Chen, Jie-Yu Luo, Wei-Chang Chung, Yao-Ren Chang, Kuan-Ying Ho","doi":"10.1109/AICAS57966.2023.10168628","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168628","url":null,"abstract":"Fingerprint has been widely used in our daily life, such as mobile. However, some circumstances may lead to low unlocking rate, like fingerprint at low temperature(dry fingerprint) or washed fingerprint. Our method mainly focuses on the former by making it close to normal temperature fingerprint. The main idea of our method, which called \"CPGAN\", is to improve GAN to boost the quality of the enhanced fingerprint. Our objective is to make the generator generates the high quality of enhanced fingerprint. The method is divided into two parts: \"strengthening the discriminator\" and \"strengthening the generator\". For strengthening the generator, we adopt the mechanism of \"Collective Punishment\" to our work. For strengthening the discriminator, we utilize two generators and feature extractor to boost the discriminator. In our experiments, the results surpass the state-of-the-arts on FVC2002 about 75%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123008671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
WeightLock: A Mixed-Grained Weight Encryption Approach Using Local Decrypting Units for Ciphertext Computing in DNN Accelerators
Jianfeng Wang, Zhonghao Chen, Yiming Chen, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Sumitha George, Huazhong Yang, Xueqing Li
{"title":"WeightLock: A Mixed-Grained Weight Encryption Approach Using Local Decrypting Units for Ciphertext Computing in DNN Accelerators","authors":"Jianfeng Wang, Zhonghao Chen, Yiming Chen, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Sumitha George, Huazhong Yang, Xueqing Li","doi":"10.1109/AICAS57966.2023.10168612","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168612","url":null,"abstract":"With the wide use of NVM-based DNN accelerators for higher computing efficiency, the long data retention time essentially causes a high risk of unauthorized weight stealing by attackers. Weight encryption is an effective method, but existing ciphertext computing accelerators cannot achieve high encryption complexity and flexibility. This paper proposes WeightLock, a mixed-grained hardware-software co-design approach based on local decrypting units (LDUs). This work proposes a key-controlled cell-level hardware design for higher granularity and two weight selection schemes for higher flexibility. The simulation results show that the accuracy of VGG-8 and ResNet-18 in the Cifar-10 classification drops from 80% to only 10% even if 80% of keys are leaked. This shows >20% higher key leakage tolerance and >17x longer retraining latency protection, compared with the prior state-of-the-art hardware and software approaches, respectively. The area cost of the encryption function is negligible, with ~600x, 2.2x, and 2.4x reduction from the state-of-the-art cell-wise, column-wise, and 1T4R structures, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114830238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
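A minimal, hypothetical illustration of key-controlled weight encryption at cell-group granularity. This is a generic XOR-mask sketch meant only to convey how local decryption keyed per weight group makes accuracy collapse under partial key leakage; it is not the paper's circuit-level LDU design, and the group size and toy keystream are assumptions:

```python
# Generic sketch: encrypt quantized int8 weights with a per-group XOR keystream.
# Inference only recovers the true weights when a group's key bits are known;
# missing or wrong key bits leave that group scrambled.
import numpy as np

GROUP = 16  # weights per local decrypting unit (illustrative)

def keystream(key_bits, n):
    """Derive an n-byte keystream from per-group key bits (toy PRNG, not real crypto)."""
    rng = np.random.default_rng(int(key_bits))
    return rng.integers(0, 256, size=n, dtype=np.uint8)

def encrypt(weights_i8, group_keys):
    w = weights_i8.view(np.uint8).copy()
    for g, key in enumerate(group_keys):
        lo, hi = g * GROUP, (g + 1) * GROUP
        w[lo:hi] ^= keystream(key, hi - lo)
    return w.view(np.int8)

def decrypt(cipher_i8, group_keys_available):
    w = cipher_i8.view(np.uint8).copy()
    for g, key in enumerate(group_keys_available):
        if key is None:            # key not leaked: group stays as ciphertext
            continue
        lo, hi = g * GROUP, (g + 1) * GROUP
        w[lo:hi] ^= keystream(key, hi - lo)
    return w.view(np.int8)

weights = np.random.randint(-128, 128, size=64, dtype=np.int8)
keys = [11, 22, 33, 44]
cipher = encrypt(weights, keys)
partial = decrypt(cipher, [11, 22, None, None])   # attacker with half the keys
print(np.array_equal(partial[:32], weights[:32]),
      np.array_equal(partial[32:], weights[32:]))  # -> True False
```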
Landmark-Based Adversarial Network for RGB-D Pose Invariant Face Recognition
Wei-Jyun Chen, Ching-Te Chiu, Ting-Chun Lin
{"title":"Landmark-Based Adversarial Network for RGB-D Pose Invariant Face Recognition","authors":"Wei-Jyun Chen, Ching-Te Chiu, Ting-Chun Lin","doi":"10.1109/AICAS57966.2023.10168669","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168669","url":null,"abstract":"Even though numerous studies have been conducted, face recognition still suffers from poor performance in pose variance. Besides fine appearance details of the face from RGB images, we use depth images that present the 3D contour of the face to improve recognition performance in large poses. At first, we propose a dual-path RGB-D face recognition model which learns features from separate RGB and depth images and fuses the two features into one identity feature. We add associate loss to strengthen the complementary and improve performance. Second, we proposed a landmark-based adversarial network to help the face recognition model extract the pose-invariant identity feature. Our landmark-based adversarial network contains a feature generator, pose discriminator, and landmark module. After we use 2-stage optimization to optimize the pose discriminator and feature generator, we removed the pose factor in the feature extracted by the generator. We conduct experiments on KinectFaceDB, RealSensetest and LiDARtest. On KinectFaceDB, we achieve a recognition accuracy of 99.41%, which is 1.31% higher than other methods. On RealSensetest, we achieve a classification accuracy of 92.57%, which is 30.51% higher than other methods. On LiDARtest, we achieve 98.21%, which is 21.88% higher than other methods.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122530409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
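A minimal sketch of the dual-path idea from the abstract: separate RGB and depth encoders whose embeddings are fused into a single identity feature. The backbone layers, embedding size, and fusion operator are assumptions; the associate loss and the pose-adversarial training stages are not shown here:

```python
# Hypothetical dual-path RGB-D model: two modality-specific encoders, one fused
# identity feature, and an identity classifier on top.
import torch
import torch.nn as nn

class DualPathRGBD(nn.Module):
    def __init__(self, embed_dim=256, num_ids=1000):
        super().__init__()
        def encoder(in_ch):            # tiny stand-in backbone for one modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim))
        self.rgb_enc = encoder(3)
        self.depth_enc = encoder(1)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)   # fuse into one identity feature
        self.classifier = nn.Linear(embed_dim, num_ids)

    def forward(self, rgb, depth):
        f_rgb, f_depth = self.rgb_enc(rgb), self.depth_enc(depth)
        identity = self.fuse(torch.cat([f_rgb, f_depth], dim=1))
        return identity, self.classifier(identity)

model = DualPathRGBD()
identity, logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 1, 112, 112))
```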
Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud
Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen
{"title":"Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud","authors":"Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen","doi":"10.1109/AICAS57966.2023.10168586","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168586","url":null,"abstract":"The YiTian710 SoC is a server processor based on ARM Neoverse N2 architecture and developed by T-HEAD Semiconductor Co., Ltd. to accelerate the compute-intensive tasks in Alicloud, where the ML related workloads play an important role in various applications. The General Matrix Multiplication is the fundamental and the most important computing kernel routine extensively utilized in the ML workloads. Generally, the whole GEMM workload is partitioned into a series of blocks and the sub-tasks are professionally assembled to exploit the parallel hardware. However, it is not the case for the cloud workloads which process multi-tasks concurrently and expect guaranteed QoS for commercial consideration. We introduce the task-aware parallel scheduling method to process the ML workloads and balance the response delay and the throughput of the YiTian710 ECS instance. We furtherly design a multi-thread scheduling algorithm with two-level division for the GEMM sub-tasks to achieve high efficiency. The optimized GEMM kernels are developed to attain the optimal performance. We evaluate the performance in YiTian710 based Alicloud ECS for different applications. The results show that our method can achieve remarkable performance improvement for different applications.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122578736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
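A minimal sketch of the block-partitioned GEMM scheduling idea, assuming a simple row-block split dispatched to a bounded thread pool; the block size, worker count, and this one-level split are illustrative stand-ins, not the paper's two-level division or the tuned YiTian710 kernels:

```python
# Illustrative block-partitioned GEMM: the M dimension is split into row blocks and
# the blocks are dispatched to a bounded thread pool, so that concurrent cloud
# requests can be balanced against per-request latency by capping threads per task.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gemm_block(a_block, b):
    return a_block @ b                      # delegate the inner kernel to BLAS

def parallel_gemm(a, b, block_rows=256, max_workers=4):
    m = a.shape[0]
    blocks = [(i, min(i + block_rows, m)) for i in range(0, m, block_rows)]
    c = np.empty((m, b.shape[1]), dtype=a.dtype)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(gemm_block, a[lo:hi], b): (lo, hi) for lo, hi in blocks}
        for fut, (lo, hi) in futures.items():
            c[lo:hi] = fut.result()
    return c

a = np.random.rand(1024, 512).astype(np.float32)
b = np.random.rand(512, 256).astype(np.float32)
assert np.allclose(parallel_gemm(a, b), a @ b, atol=1e-4)
```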
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions
Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, S. Yin
{"title":"A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions","authors":"Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS57966.2023.10168581","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168581","url":null,"abstract":"Residual (2+1)-dimensional convolution neural network (R(2+1)D CNN) has achieved great success in video recognition due to the spatiotemporal convolution structure. However, R(2+1)D CNN incurs large energy and latency overhead because of intensive computation and frequent memory access. To solve the issues, we propose a digital SRAM-CIM based accelerator with two key features: (1) Systolic CIM array to efficiently match massive computations in regular architecture; (2) Digtal CIM circuit design with output sparsity predicition to avoid redundant computations. The proposed design is implemented in 28nm technology and achieves an energy efficiency of 21.87 TOPS/W at 200 MHz and 0.9 V supply voltage.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115859481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
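A rough software-level sketch of the output-sparsity-prediction idea: a cheap partial product over the high-order bits is used to guess whether the pre-activation will be negative, in which case the ReLU output is zero and the exact accumulation can be skipped. The bit split, margin, and decision rule are assumptions; the paper realizes this inside digital SRAM-CIM circuits:

```python
# Illustrative prediction: estimate the dot product from the top bits of each
# operand; if the estimate is confidently negative, skip the exact computation.
# A positive margin trades skip rate against the risk of wrongly zeroing a small
# positive output.
import numpy as np

def relu_with_prediction(weights_i8, acts_u8, msb_bits=4, margin=0):
    """Return (output, skipped): ReLU(w . a) with an MSB-based early-skip guess."""
    w_hi = (weights_i8.astype(np.int32) >> (8 - msb_bits)) << (8 - msb_bits)
    a_hi = (acts_u8.astype(np.int32) >> (8 - msb_bits)) << (8 - msb_bits)
    estimate = int(np.dot(w_hi, a_hi))
    if estimate < -margin:                 # predicted negative -> ReLU output is 0
        return 0, True
    exact = int(np.dot(weights_i8.astype(np.int32), acts_u8.astype(np.int32)))
    return max(exact, 0), False

w = np.random.randint(-128, 128, size=64, dtype=np.int8)
a = np.random.randint(0, 256, size=64, dtype=np.uint8)
out, skipped = relu_with_prediction(w, a)
```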
An Efficient Design Framework for 2×2 CNN Accelerator Chiplet Cluster with SerDes Interconnects
Yajie Wu, Tianze Li, Zhuang Shao, Li Du, Yuan Du
{"title":"An Efficient Design Framework for 2×2 CNN Accelerator Chiplet Cluster with SerDes Interconnects","authors":"Yajie Wu, Tianze Li, Zhuang Shao, Li Du, Yuan Du","doi":"10.1109/AICAS57966.2023.10168573","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168573","url":null,"abstract":"Multi-Chiplet integrated systems for high-performance computing with dedicated CNN accelerators are highly demanded due to ever-increasing AI-related training and inferencing tasks; however, many design challenges hinder their large-scale applications, such as complicated multi-task scheduling, high-speed die-to-die SerDes (Serializer/Deserializer) link modeling, and detailed communication and computation hardware co-simulation. In this paper, an optimized 2×2 CNN accelerator chiplet framework with a SerDes link model is presented, which addresses the above challenges. A methodology for designing a 2×2 CNN accelerator chiplet framework is also proposed, and several experiments are conducted. The system performances of different designs are compared and analyzed with different design parameters of computation hardware, SerDes links, and improved scheduling algorithms. The results show that with the same interconnection structure and bandwidth, every 1TFLOPS increase in one chiplet’s computing power can bring an average 3.7% execution time reduction.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115378574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
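A back-of-the-envelope model of the compute/communication trade-off such a framework explores, assuming each chiplet's per-layer time is bounded by the slower of its compute time and its SerDes transfer time; all numbers and the perfect-overlap max() model are assumptions for illustration, whereas the paper relies on detailed co-simulation:

```python
# Toy analytical model: layer time = max(FLOPs / throughput, bytes / SerDes bandwidth).
# Adding compute throughput helps less and less once communication dominates,
# which is the kind of diminishing return the framework's experiments quantify.

def layer_time(flops, bytes_moved, tflops, serdes_gbps):
    compute_s = flops / (tflops * 1e12)
    transfer_s = bytes_moved * 8 / (serdes_gbps * 1e9)
    return max(compute_s, transfer_s)

def total_time(layers, tflops, serdes_gbps):
    return sum(layer_time(f, b, tflops, serdes_gbps) for f, b in layers)

# Two made-up layers: (FLOPs, bytes exchanged between chiplets).
layers = [(2e9, 4e6), (1e9, 16e6)]
for tflops in (4, 5, 6):
    print(tflops, "TFLOPS ->", total_time(layers, tflops, serdes_gbps=100), "s")
```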