O. Chen, Z. Li, Tomoharu Yamauchi, Yanzhi Wang, N. Yoshikawa
{"title":"Performance Assessment of an Extremely Energy-Efficient Binary Neural Network Using Adiabatic Superconductor Devices","authors":"O. Chen, Z. Li, Tomoharu Yamauchi, Yanzhi Wang, N. Yoshikawa","doi":"10.1109/AICAS57966.2023.10168607","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168607","url":null,"abstract":"Binary Neural Networks (BNNs) are gaining popularity for solving real-world Deep Neural Network (DNN) tasks such as image recognition and natural language processing. BNNs use binary precision for weights and activations, reducing memory usage by a factor of 32 compared with conventional networks that use 32-bit floating-point precision. Among the various types of BNNs, those based on adiabatic quantum-flux-parametron (AQFP) superconducting logic are promising for energy-efficient computing, as AQFP exploits magnetic flux quantization and quantum interference in Josephson-junction-based superconductor loops. This paper presents a performance assessment of a novel AQFP-based BNN architecture, highlighting scalability issues caused by increased inductance in the analog accumulation circuit. We also discuss potential optimization approaches to address these issues and improve scalability.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114567020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harshil Patel, Anup Vanarse, Kristofor D. Carlson, A. Osseiran
{"title":"Bringing Touch to the Edge: A Neuromorphic Processing Approach For Event-Based Tactile Systems","authors":"Harshil Patel, Anup Vanarse, Kristofor D. Carlson, A. Osseiran","doi":"10.1109/AICAS57966.2023.10168592","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168592","url":null,"abstract":"The rise of neuromorphic applications has highlighted the remarkable potential of biologically-inspired systems. Despite significant advancements in audio and visual technologies, research directed towards tactile sensing has not been as extensive. We propose a neuromorphic tactile system for sensing and processing that presents promising results for edge devices and applications. In this study, a neuromorphic tactile sensor, two data encoding techniques, and a two-layer spiking neural network (SNN) deployed on the AKD1000 Akida Neuromorphic System on Chip (NSoC) were used to demonstrate the system's capabilities. Results from experiments on the ST-MNIST dataset showed high accuracy, with the complement-coded variant achieving 93.1%, outperforming previous state-of-the-art models for this dataset. Additionally, an exploratory study showed that early classification was possible, with most samples requiring only 38% of the available events to classify correctly, reducing the amount of data that needs to be processed. The low power consumption and high throughput of both SNN models, with an average dynamic power consumption of 6.37 mW and 7.76 mW and an average throughput of 586 and 589 frames-per-second respectively, make the proposed system suitable for edge devices with limited power and processing resources. Overall, the proposed tactile sensing system presents a promising solution for edge applications that require high accuracy, low power consumption, and high throughput.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116940887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziyang Shen, Chaoming Fang, Fengshi Tian, Jie Yang, M. Sawan
{"title":"PN-TMS: Pruned Node-fusion Tree-based Multicast Scheme for Efficient Neuromorphic Systems","authors":"Ziyang Shen, Chaoming Fang, Fengshi Tian, Jie Yang, M. Sawan","doi":"10.1109/AICAS57966.2023.10168590","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168590","url":null,"abstract":"A growing demand for low-power, real-time computation is motivating the development of dedicated neuromorphic processors. To maximize scalability and power efficiency, multicore architectures have been broadly adopted in existing neuromorphic processors. However, mapping a Spiking Neural Network (SNN) onto a multicore architecture requires many multicast operations. Conventional routing algorithms such as path-based routing and dimension-order routing (DOR) incur severe latency and power overheads. To address these limitations, we propose a novel routing algorithm named Pruned Node-fusion Tree-based Multicast Scheme (PN-TMS). PN-TMS combines multiple algorithms for route planning, optimizing latency and power simultaneously. Experimental results show that PN-TMS outperforms the routing schemes of existing network processors in both energy consumption and latency, achieving an average energy-delay product (EDP) reduction of 38.9%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117090266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context Swap: Multi-PIM System Preventing Remote Memory Access for Large Embedding Model Acceleration","authors":"Hong Kal, Cheolhwan Kim, Minjae Kim, W. Ro","doi":"10.1109/AICAS57966.2023.10168595","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168595","url":null,"abstract":"Processing-in-Memory (PIM) is an attractive solution for accelerating memory-intensive neural network layers. In particular, PIM is efficient for layers that use embeddings, such as the embedding layer and the graph convolution layer, because of their large capacity and low arithmetic intensity. The embedding tables of such layers are stored across multiple memory nodes and processed by local PIM modules with sparse access patterns. To compute on data held in other memory nodes, a naive approach lets the local PIM retrieve the data from the remote memory nodes. This approach can incur significant performance degradation due to the long latency of remote accesses. To avoid remote access, a PIM system can adopt a framework based on the MapReduce programming model, in which PIMs compute only on local data and CPUs compute on the PIMs' intermediate results. However, such a multi-PIM system still suffers from performance degradation because the framework runs on the CPU and its delay is long compared with PIM kernel execution. Therefore, we propose a context swap technique that prevents remote data access even without a high-latency framework. We observe that transferring PIM contexts to a remote PIM node requires much less data traffic than accessing the data remotely. Our PIM system makes PIM nodes swap their context data with each other once they have completed their own computation and have no local data left to compute. Several context swaps occur until all PIMs have processed all of the local data. Context swaps between PIMs in the same CPU socket are performed by the memory controller, and those between PIMs in different CPU sockets by simple software. As a result, the proposed multi-PIM system outperforms the base PIM system that transfers remote data and the PIM system with the kernel-managing framework by 4.1× and 3.3×, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124024851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Demonstration Platform for Large-Scaled Point Cloud Network Based on 28nm 2D/3D Unified Sparse Convolution Accelerator","authors":"Xiaoyu Feng, Wenyu Sun, Shupei Fan, Chen Tang, Yixiong Yang, Jinshan Yue, Q. Liao, Huazhong Yang, Yongpan Liu","doi":"10.1109/AICAS57966.2023.10168558","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168558","url":null,"abstract":"3D point cloud processing plays an important role in many emerging applications such as autonomous driving, visual navigation, and virtual reality. It calls for hardware acceleration of multiple key operations, including 3D Submanifold SCONV, 3D non-Submanifold SCONV, and 2D SCONV. This work presents a 2D/3D unified sparse convolution accelerator for large-scale voxel-based point cloud networks. The chip, fabricated in TSMC 28nm CMOS technology, achieves 3.3-16.9 FPS at clock frequencies of 60-400MHz when running the SECOND network on the KITTI dataset. This work was published at ISSCC 2023 [1]. A demonstration shows real-time 3D processing with a lidar sensor.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124793176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yaotian Liu, Yuhang Zhang, Qing Zhang, Rui Chen, Yongfu Li
{"title":"FEEP: Functional ECO Synthesis with Efficient Patch Minimization","authors":"Yaotian Liu, Yuhang Zhang, Qing Zhang, Rui Chen, Yongfu Li","doi":"10.1109/AICAS57966.2023.10168557","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168557","url":null,"abstract":"Functional engineering change order (ECO) is an essential process in modern complex integrated circuit design. Finding a high-quality circuit patch efficiently has long been a challenge. This paper proposes FEEP, an automatic and efficient synthesis-based functional ECO method. Structural pruning and stratified searching techniques are proposed to reduce the search space without extra logical equivalence checks. Moreover, we propose a machine-learning-based two-stage patch size predictor that helps predict patch quality. Experimental results show that our algorithm can efficiently search for and produce high-quality patches across various test cases.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129668853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xia Han, F. Amiel, Xun Zhang, Kunni Wei, Cong Yan, Wenjun Hu, Zefeng Wang
{"title":"Efficiency Comparison of Machine Learning Algorithms for EEG Interpretation","authors":"Xia Han, F. Amiel, Xun Zhang, Kunni Wei, Cong Yan, Wenjun Hu, Zefeng Wang","doi":"10.1109/AICAS57966.2023.10168626","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168626","url":null,"abstract":"This paper aims to use a small protocol to detect stroke disease in a patient using signals from only three EEG probes. To achieve this objective, we compare the accuracy and computation time of six machine learning (ML) algorithms (Random Forest, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Decision Tree and CatBoost) in an EEG-based pathology classification process. We use a database of EEG signals recorded by three electrodes, established by Beijing University of Chinese Medicine, from healthy subjects and stroke patients while they viewed planes of five different colors. The health status of each subject (healthy or affected by stroke) is known. Records from 70% of the population are used to train each algorithm, and performance is estimated on the remaining 30%. The process is then repeated one hundred times, each time with a different split into training and test sets. We then compare statistics of the results obtained with each method. Our results show that the SVM algorithm is the most efficient in terms of accuracy and can detect stroke disease with a reliability of 70%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121314415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Recovery Through Scattering Media via GAN Reconstruction and SNES Optimization","authors":"Pengfei Qi, Yuanjin Zheng","doi":"10.1109/AICAS57966.2023.10168553","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168553","url":null,"abstract":"Optical image recovery through scattering media is a significant yet challenging problem. Iterative wavefront shaping is a powerful tool for redistributing diffused light and compensating for the diffuser by controlling the incident wavefront. However, when only a feedback signal from the camera is available, this technique fails for lack of target images. In this paper, we propose a new scheme for recovering images through scattering media in the absence of target images. In particular, we employ an improved Generative Adversarial Network (GAN) for computational reconstruction and a separable natural evolution strategy (SNES) for wavefront-shaping optimization. Both simulation and experimental results suggest that the proposed scheme will open up new opportunities in applications such as biomedical imaging, optical encryption, and holographic display.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"324 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122975450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KP2Dtiny: Quantized Neural Keypoint Detection and Description on the Edge","authors":"Thomas Rüegg, Marco Giordano, Michele Magno","doi":"10.1109/AICAS57966.2023.10168598","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168598","url":null,"abstract":"Detection and description of keypoints in images is a fundamental component of a wide range of tasks such as Simultaneous Localization And Mapping (SLAM), image alignment and structure from motion (SfM). Efficient computation of these features is crucial for real-time applications and has been addressed by multiple handcrafted algorithms and, more recently, by deep-neural-network-based detectors. Learned detectors achieve high detection performance but impose high computational requirements, making them slow and impractical for low-power, resource-constrained platforms. This paper presents a quantized neural keypoint detector and descriptor optimized for edge devices, targeting two recent AI platforms: the MAX78000 by Analog Devices and the Coral AI USB accelerator from Google. To accommodate the diverse constraints and requirements of various applications, we propose and evaluate two model architectures (KP2DtinySmall and KP2DtinyFast) and deploy them on these platforms using full 8-bit integer quantization. Furthermore, we extensively evaluate these models in terms of power, latency and accuracy, reporting results for three image sizes (88x88, 320x240 and 640x480) and evaluating both quantized and non-quantized models. Fully quantized, KP2DtinySmall reduces network size by a factor of 54 while improving homography estimation accuracy at the most stringent threshold (Correctness d1) by 32.4% (0.550) on 88x88 images and by 10.7% (0.648) on 320x240 images compared with the KeypointNet architecture by Yang You et al. This result is achieved by designing a new network with low-power platforms in mind, in particular addressing the lower resolution by increasing the density of detectable features. Deployed on the MAX78000 MCU, inference on low-resolution images runs at 59 FPS, consuming 1.1 mJ per image. On the Coral USB accelerator, KP2DtinyFast runs inference on low-resolution images at 527 FPS, consuming 3.1 mJ; on high-resolution images it achieves 70 FPS at 19.9 mJ per inference.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126459845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liyuan Guo, M. Jobst, J. Partzsch, Stefan Scholze, Andreas Dixius, Matthias Lohrmann, S. Zeinolabedin, C. Mayr
{"title":"A Low-Power Hardware Accelerator of MFCC Extraction for Keyword Spotting in 22nm FDSOI","authors":"Liyuan Guo, M. Jobst, J. Partzsch, Stefan Scholze, Andreas Dixius, Matthias Lohrmann, S. Zeinolabedin, C. Mayr","doi":"10.1109/AICAS57966.2023.10168587","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168587","url":null,"abstract":"With the development of artificial intelligence, real-time feature extraction from acoustic signals is required in a wide variety of applications, such as keyword spotting and speech recognition. Feature extraction based on Mel-frequency cepstral coefficients (MFCCs) is one of the most important methods for this purpose. Software implementations of MFCC extraction incur relatively high power consumption and long computation times, which often makes them unsuitable for tiny battery-powered devices. Therefore, an on-chip MFCC extraction accelerator is of interest in cutting-edge scenarios. This paper presents a fixed-point, low-power hardware accelerator for MFCC feature extraction implemented in 22nm FDSOI technology. It consumes an average power of 2.78µW for a 1024-sample frame at a clock frequency of 1MHz. For keyword spotting, the quantized accelerator achieves an average accuracy of around 96% when used with different classification networks.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121778994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}