{"title":"AI Processor based Data Correction for Enhancing Accuracy of Ultrasonic Sensor","authors":"Jin Young Shin, Sang Ho Lee, Kwang Hyun Go, Soo-Gon Kim, Seung Eun Lee","doi":"10.1109/AICAS57966.2023.10168652","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168652","url":null,"abstract":"The usage of various sensors in vehicles has increased with the generalization of advanced driver assistance systems (ADAS). To ensure the safety of drivers and pedestrians, considering the accuracy of measured sensor data is essential. In this paper, we propose a data correction system for enhancing the accuracy of distance data from an ultrasonic sensor utilizing an AI processor. The proposed system detects the motion of an object and adjusts the obtained distance data to align with an ideal gradient of sequential data. Experimental results of the proposed system show an error detection rate of 90.6%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116066337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning","authors":"Shamma Nasrin, Maeesha Binte Hashem, Nastaran Darabi, Benjamin Parpillon, F. Fahim, Wilfred Gomes, A. Trivedi","doi":"10.1109/AICAS57966.2023.10168632","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168632","url":null,"abstract":"This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overheads of a conventional analog-to-digital converter (ADC) for deep learning inference. Thereby, using the proposed scheme, significantly more CiM arrays can be accommodated within limited footprint designs to improve parallelism and minimize external memory accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization. CiM arrays collaborate where a proximal array digitizes the analog-domain product-sums when an array computes the scalar product of input and weights. We discuss various networking configurations among CiM arrays where Flash, SA, and their hybrid digitization steps can be efficiently implemented using the proposed memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design requires ∼25× less area and ∼1.4× less energy by leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash ADC, our design requires ∼51× less area and ∼13× less energy.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129330767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-assisted ISP hyperparameter auto tuning","authors":"Fa Xu, Zihao Liu, YanHeng Lu, Sicheng Li, Susong Xu, Yibo Fan, Yen-Kuang Chen","doi":"10.1109/AICAS57966.2023.10168574","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168574","url":null,"abstract":"Images and videos are vital visual information carriers, and the image signal processor (ISP) is an essential hardware component for capturing and processing these visual signals. ISPs convert raw data into high-quality color images, which requires various function modules to control different aspects of image quality. However, the results of these modules are interdependent and have crosstalk with each other, making it tedious and time-consuming for manual tuning to obtain a set of ideal parameter configurations to achieve stable performance. In this paper, we introduce xkISP, a self-developed open-source ISP project which includes both a C model and hardware implementation of an 8-stage ISP pipeline. Most importantly, we present a novel proxy function-based AI-assisted ISP tuning solution that is demonstrated to accelerate the ISP parameter configuration process and improve performance for both human vision and computer vision tasks.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129072570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HNSG – A SNN Training Method Ultilizing Hidden Network","authors":"Chunhui Wu, Wenbing Fang, Yi Kang","doi":"10.1109/AICAS57966.2023.10168579","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168579","url":null,"abstract":"Spiking Neural Network is more energy efficient compared to traditional ANNs, and many training methods of SNNs have been proposed in past decades. However, traditional backward-propagation based training methods are difficult to deploy on SNN due to its discontinuous gradient. Previous works mainly focused on weight training or weight transferring. The Hidden Network inspired by Lottery Ticket Hypothesis that is proposed for convolutional neural networks opens possibility of network connection training on SNN. In this article, a training algorithm based on Hidden Network is applied to SNN to show its potential on neuromorphic spiking networks. A novel training method called HNSG is proposed that modifies hidden network search using surrogate gradient function based back propagation. The proposed HNSG method is tested on image classification task using MNIST with simple two fully-connected layer SNN model. Simulation shows HNSG reaches 93.73% accuracy on average fire intensity of 0.138 with LIF neuron.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121659307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systolic Array with Activation Stationary Dataflow for Deep Fully-Connected Networks","authors":"Haochuan Wan, Chaolin Rao, Yueyang Zheng, Pingqiang Zhou, Xin Lou","doi":"10.1109/AICAS57966.2023.10168602","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168602","url":null,"abstract":"This paper presents an activation stationary (AS) dataflow suitable for networks with pure fully-connected (FC) layers. It is shown that the proposed AS dataflow can help to reduce the required memory size in hardware design and optimize energy efficiency by reducing data movement. Based on the AS dataflow, an output stationary (OS) systolic array is proposed to compute FC networks. To evaluate the proposed design, we further implement an accelerator for the FC-based implicit representation for MRI (IREM) algorithm. A proof-of-concept demonstration system is developed based on field programmable gate array (FPGA). To evaluate the proposed design, we also map the IREM accelerator to 40nm CMOS technology and compare it with CPU, GPU-based and ASIC implementations.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127906620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching Tiny Neural Networks for Deployment on Embedded FPGA","authors":"Haiyan Qin, Yejun Zeng, Jinyu Bai, Wang Kang","doi":"10.1109/AICAS57966.2023.10168571","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168571","url":null,"abstract":"Embedded FPGAs have become increasingly popular as acceleration platforms for the deployment of edge-side artificial intelligence (AI) applications, due in part to their flexible and configurable heterogeneous architectures. However, the complex deployment process hinders the realization of AI democratization, particularly at the edge. In this paper, we propose a software-hardware co-design framework that enables simultaneous searching for neural network architectures and corresponding accelerator designs on embedded FPGAs. The proposed framework comprises a hardware-friendly neural architecture search space, a reconfigurable streaming-based accelerator architecture, and a model performance estimator. An evolutionary algorithm targeting multi-objective optimization is employed to identify the optimal neural architecture and corresponding accelerator design. We evaluate our framework on various datasets and demonstrate that, in a typical edge AI scenario, the searched network and accelerator can achieve up to a 2.9% accuracy improvement and up to a 21× speedup compared to manually designed networks based on common accelerator designs when deployed on a widely used embedded FPGA (Xilinx XC7Z020).","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121462398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three Challenges in ReRAM-Based Process-In-Memory for Neural Network","authors":"Ziyi Yang, Kehan Liu, Yiru Duan, Mingjia Fan, Qiyue Zhang, Zhou Jin","doi":"10.1109/AICAS57966.2023.10168640","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168640","url":null,"abstract":"Artificial intelligence (AI) has been successfully applied to various fields of natural science. One of the biggest challenges in AI acceleration is the performance and energy bottleneck caused by the limited capacity and bandwidth of massive data movement between memory and processing units. In the past decade, much AI accelerator work based on process-in-memory (PIM) has been studied, especially on emerging non-volatile resistive random access memory (ReRAM). In this paper, we provide a comprehensive perspective on ReRAM-based AI accelerators, including software-hardware co-design, the status of chip fabrications, researches on ReRAM non-idealities, and support for the EDA tool chain. Finally, we summarize and provide three directions for future trends: support for complex patterns of models, addressing the impact of non-idealities such as improving endurance, process perturbations, and leakage current, and addressing the lack of EDA tools.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134346208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Accuracy and Energy-Efficient Acoustic Inference using Hardware-Aware Training and a 0.34nW/Ch Full-Wave Rectifier","authors":"Sheng Zhou, Xi Chen, Kwantae Kim, Shih-Chii Liu","doi":"10.1109/AICAS57966.2023.10168561","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168561","url":null,"abstract":"A full-wave rectifier (FWR) is a necessary component of many analog acoustic feature extractor (FEx) designs targeted at edge audio applications. However, analog circuits that perform close-to-ideal rectification contribute a significant portion of the total power of the FEx. This work presents an energy-efficient FWR design by using a dynamic comparator and scaling the comparator clock frequency with its input signal bandwidth. Simulated in a 65nm CMOS process, the rectifier circuit consumes 0.34nW per channel for a 0.6V supply. Although the FWR does not perform ideal rectification, an acoustic FEx behavioral model in Python is proposed based on our FWR design, and a neural network trained with the output of the proposed behavioral model recovers high classification accuracy in an audio keyword spotting (KWS) task. The behavioral model also included comparator noise and offset extracted from transistor-level simulation. The whole KWS chain using our behavioral model achieves 89.45% accuracy for 12-class KWS on the Google Speech Commands Dataset.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131819815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 1W8R 20T SRAM Codebook for 20% Energy Reduction in Mixed-Precision Deep-Learning Inference Processor System","authors":"Ryotaro Ohara, Masaya Kabuto, Masakazu Taichi, Atsushi Fukunaga, Yuto Yasuda, Riku Hamabe, S. Izumi, H. Kawaguchi","doi":"10.1109/AICAS57966.2023.10168555","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168555","url":null,"abstract":"This study introduces a 1W8R 20T multiport memory for codebook quantization in deep-learning processors. We manufactured the memory in a 40 nm process and achieved memory read-access time at 2.75 ns and 2.7-pj/byte power consumption. In addition, we used NVDLA, which was NVIDIA’s deep-learning processor, as a motif and simulated it based on the power obtained from the actual proposed memory. The obtained power and area reduction results are 20.24% and 26.24%, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"297 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127415307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 115.1 TOPS/W, 12.1 TOPS/mm2 Computation-in-Memory using Ring-Oscillator based ADC for Edge AI","authors":"Abhairaj Singh, R. Bishnoi, A. Kaichouhi, Sumit Diware, R. Joshi, S. Hamdioui","doi":"10.1109/AICAS57966.2023.10168647","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168647","url":null,"abstract":"Analog computation-in-memory (CIM) architecture alleviates massive data movement between the memory and the processor, thus promising great prospects to accelerate certain computational tasks in an energy-efficient manner. However, data converters involved in these architectures typically achieve the required computing accuracy at the expense of high area and energy footprint which can potentially determine CIM candidacy for low-power and compact edge-AI devices. In this work, we present a memory-periphery co-design to perform accurate A/D conversions of analog matrix-vector-multiplication (MVM) outputs. Here, we introduce a scheme where select-lines and bit-lines in the memory are virtually fixed to improve conversion accuracy and aid a ring-oscillator-based A/D conversion, equipped with component sharing and inter-matching of the reference blocks. In addition, we deploy a self-timed technique to further ensure high robustness addressing global design and cycle-to-cycle variations. Based on measurement results of a 4Kb CIM chip prototype equipped with TSMC 40nm, a relative accuracy of up to 99.71% is achieved with an energy efficiency of 115.1 TOPS/W and computational density of 12.1 TOPS/mm2 for the MNIST dataset. Thus, an improvement of up to 11.3X and 7.5X compared to the state-of-the-art, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}