{"title":"TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism","authors":"Zhou Wang, Jingchuang Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS57966.2023.10168614","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168614","url":null,"abstract":"DNN inference on edge devices has long been important and places heavy demands on computing and energy. This paper proposes a TPE (Transformation Process Element) with three characteristics. First, TPE has a method of Data Segmentation Skip and Pre-Reorganization (DSSPR). Second, TPE has a Typical Value Matching and Calibration Computer (TVMCC) system, which converts direct calculation into matching and calibration calculation. Third, TPE includes a Data Format Pre-Configuration and Self-Adjustment (DFPCSA) scheme. Compared with the most typical pure reasoning processor, UNPU, TPE achieves 1.25× better energy consumption.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114851104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge","authors":"Georg Rutishauser, Francesco Conti, L. Benini","doi":"10.1109/AICAS57966.2023.10168577","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168577","url":null,"abstract":"Mixed-precision quantization, where a deep neural network’s layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic, at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130372354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Overhead of Feature Importance Visualization via Static GradCAM Computation","authors":"Ashwin Bhat, A. Raychowdhury","doi":"10.1109/AICAS57966.2023.10168594","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168594","url":null,"abstract":"Explainable AI (XAI) methods provide insights into the operation of black-box Deep Neural Network (DNN) models. GradCAM, an XAI algorithm, provides an explanation by highlighting regions in the input feature space that were relevant to the model’s output. It involves a gradient computation step that adds a significant overhead compared to inference and hinders providing explanations to end-users. In this work, we identify the root cause of the problem to be the dynamic run-time automatic differentiation. To overcome this issue, we propose to offload the gradient computation step to compile time via analytic evaluation. We validate the idea by designing an FPGA implementation of GradCAM that schedules the entire computation graph statically. For a TinyML ResNet18 model, we achieve a reduction in the explanation generation overhead from > 2× using software frameworks on CPU/GPU systems to < 0.01× on the FPGA using our designed hardware and static scheduling.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132805715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grand Challenge on Software and Hardware Co-Optimization for E-Commerce Recommendation System","authors":"Jianing Li, Jiabin Liu, Xingyuan Hu, Yuhang Zhang, Guosheng Yu, Shimeng Qian, Wei Mao, Li Du, Yongfu Li, Yuan Du","doi":"10.1109/AICAS57966.2023.10168648","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168648","url":null,"abstract":"E-commerce has become an indispensable part of the whole commodity economy with rapid expansion. A great deal of time is required for customers to search for products manually. A good automatic recommendation system can not only bring customers a good shopping experience, but also help companies gain profit growth. In the IEEE AICAS 2023 conference, we have organized the grand challenge on software and hardware co-optimization for e-commerce recommendation system. Desensitized data from Alibaba Group, which recorded the online purchase behaviors of online shopping users in China, are provided. We organize two rounds of the challenge with two different parts of the data, separately encouraging participating teams to propose novel ideas for the recommendation algorithm design and deployment. In the preliminary round, participating teams are required to design a recommendation system with high accuracy performance. In the final round, the qualified teams from the preliminary round will be offered an ARM-based multi-core Yitian 710 CPU cloud server; the teams are required to design an acceleration scheme for the hardware resolution. In the final, the 6 best teams will be awarded according to standard evaluation criteria.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133358066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WeightLock: A Mixed-Grained Weight Encryption Approach Using Local Decrypting Units for Ciphertext Computing in DNN Accelerators","authors":"Jianfeng Wang, Zhonghao Chen, Yiming Chen, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Sumitha George, Huazhong Yang, Xueqing Li","doi":"10.1109/AICAS57966.2023.10168612","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168612","url":null,"abstract":"With the wide use of NVM-based DNN accelerators for higher computing efficiency, the long data retention time essentially causes a high risk of unauthorized weight stealing by attackers. Weight encryption is an effective method, but existing ciphertext computing accelerators cannot achieve high encryption complexity and flexibility. This paper proposes WeightLock, a mixed-grained hardware-software co-design approach based on local decrypting units (LDUs). This work proposes a key-controlled cell-level hardware design for higher granularity and two weight selection schemes for higher flexibility. The simulation results show that the accuracy of VGG-8 and ResNet-18 in the CIFAR-10 classification drops from 80% to only 10% even if 80% of keys are leaked. This shows >20% higher key leakage tolerance and >17x longer retraining latency protection, compared with the prior state-of-the-art hardware and software approaches, respectively. The area cost of the encryption function is negligible, with ~600x, 2.2x, and 2.4x reduction from the state-of-the-art cell-wise, column-wise, and 1T4R structures, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114830238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online low-power large-scale real-time decision-making all at once","authors":"Thomas Pontoizeau, Éric Jacopin","doi":"10.1109/AICAS57966.2023.10168570","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168570","url":null,"abstract":"In this paper, we set up a simulation under Unreal Engine 5 that communicates with an Optical Processing Unit (OPU) in order to make real-time decisions on the current state of the actors of the simulation. Our experiment shows that the OPU is able to manage at least 50 000 actors in real-time and is able to make decisions depending on the current state of the actors.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115146688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud","authors":"Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen","doi":"10.1109/AICAS57966.2023.10168586","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168586","url":null,"abstract":"The YiTian710 SoC is a server processor based on the ARM Neoverse N2 architecture and developed by T-HEAD Semiconductor Co., Ltd. to accelerate the compute-intensive tasks in Alicloud, where ML-related workloads play an important role in various applications. General Matrix Multiplication (GEMM) is the fundamental and most important computing kernel routine extensively utilized in ML workloads. Generally, the whole GEMM workload is partitioned into a series of blocks and the sub-tasks are professionally assembled to exploit the parallel hardware. However, this is not the case for cloud workloads, which process multiple tasks concurrently and expect guaranteed QoS for commercial considerations. We introduce a task-aware parallel scheduling method to process the ML workloads and balance the response delay and the throughput of the YiTian710 ECS instance. We further design a multi-thread scheduling algorithm with two-level division for the GEMM sub-tasks to achieve high efficiency. Optimized GEMM kernels are developed to attain the optimal performance. We evaluate the performance on YiTian710-based Alicloud ECS for different applications. The results show that our method can achieve remarkable performance improvement for different applications.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122578736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 12-Lead ECG Delineation Algorithm based on a Quantized CNN-BiLSTM Auto-encoder with 1-12 Mapping","authors":"Xinzi Xu, Qiao Cai, Hongqian Wang, Yanxing Suo, Yang Zhao, T. Wan, Guoxing Wang, Yong Lian","doi":"10.1109/AICAS57966.2023.10168552","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168552","url":null,"abstract":"12-lead electrocardiogram (ECG) delineation is a critical step in diagnosing various heart diseases. Current practices for 12-lead ECG delineation typically involve processing each of the 12 leads separately using a network, which is computationally expensive. To solve this issue, a 1-12 mapping strategy is proposed to directly map one lead's network predictions to the other leads and then fine-tune boundaries. A CNN-BiLSTM autoencoder architecture is employed to model the sequential dependencies of the ECG signal. In addition, data augmentation and mixed losses are utilized to enhance the robustness of the network. Evaluated on QTDB and LUDB, the delineation results for 12-lead ECG achieve a Se of 97%, 99%, and 98%, and a DS of 95.3%, 96.2%, and 94.4% for P-wave, QRS complex, and T-wave respectively. Finally, quantization-aware training is employed to convert the float32 model to an int8 one with only about a 2% drop in accuracy.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130066955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Frequency Separation Residual Network for End-to-end RAW to RGB Mapping","authors":"Mengchuan Dong, Weiti Zhou, Cong Pang, Xiangyu Zhang, Xin Lou","doi":"10.1109/AICAS57966.2023.10168597","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168597","url":null,"abstract":"Due to the limitations of the hardware specifications of smartphones' camera systems, there is still a visible gap in imaging quality between smartphones and digital single-lens reflex (DSLR) cameras. Sophisticated learning-based image processing has become a promising solution to close this gap. In this paper, we propose an Image Frequency Separation Residual Network (IFS Net) to perform end-to-end RAW to RGB image mapping. Different from existing methods that directly train the input image and the ground truth image one-to-one as a whole, our proposed method first divides the input image and the ground truth into high-frequency and low-frequency parts by discrete wavelet transform (DWT). These two parts are then trained separately using different networks for details and global information, and finally synthesized into the output image using inverse DWT. Experimental results show that the proposed IFS Net outperforms other existing algorithms in both PSNR and SSIM. Visual comparison shows that the images produced by IFS Net preserve more details and look close to those captured by DSLR cameras.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130957977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Landmark-Based Adversarial Network for RGB-D Pose Invariant Face Recognition","authors":"Wei-Jyun Chen, Ching-Te Chiu, Ting-Chun Lin","doi":"10.1109/AICAS57966.2023.10168669","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168669","url":null,"abstract":"Even though numerous studies have been conducted, face recognition still suffers from poor performance under pose variance. Besides fine appearance details of the face from RGB images, we use depth images that present the 3D contour of the face to improve recognition performance in large poses. First, we propose a dual-path RGB-D face recognition model which learns features from separate RGB and depth images and fuses the two features into one identity feature. We add an associate loss to strengthen the complementarity and improve performance. Second, we propose a landmark-based adversarial network to help the face recognition model extract the pose-invariant identity feature. Our landmark-based adversarial network contains a feature generator, a pose discriminator, and a landmark module. After we use 2-stage optimization to optimize the pose discriminator and feature generator, the pose factor is removed from the feature extracted by the generator. We conduct experiments on KinectFaceDB, RealSensetest and LiDARtest. On KinectFaceDB, we achieve a recognition accuracy of 99.41%, which is 1.31% higher than other methods. On RealSensetest, we achieve a classification accuracy of 92.57%, which is 30.51% higher than other methods. On LiDARtest, we achieve 98.21%, which is 21.88% higher than other methods.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122530409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}