2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS): Latest Papers

Reinforcement Learning based Efficient Mapping of DNN Models onto Accelerators
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772673
Shine Parekkadan Sunny, Satyajit Das
{"title":"Reinforcement Learning based Efficient Mapping of DNN Models onto Accelerators","authors":"Shine Parekkadan Sunny, Satyajit Das","doi":"10.1109/coolchips54332.2022.9772673","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772673","url":null,"abstract":"The input tensors in each layer of Deep Neural Network (DNN) models are often partitioned/tiled to get accommodated in the limited on-chip memory of accelerators. Studies show that efficient tiling schedules (commonly referred to as mapping) for a given accelerator and DNN model reduce the data movement between the accelerator and different levels of the memory hierarchy improving the performance. However, finding layer-wise optimum mapping for a target architecture with a given energy and latency envelope is an open problem due to the huge search space in the mappings. In this paper, we propose a Reinforcement Learning (RL) based automated mapping approach to find optimum schedules of DNN layers for a given architecture model without violating the specified energy and latency constraints. The learned policies easily adapt to a wide range of DNN models with different hardware configurations, facilitating transfer learning improving the training time. Experiments show that the proposed work improves latency and energy consumption by an average of 21.5% and 15.6% respectively compared to the state-of-the-art genetic algorithm-based GAMMA approach for a wide range of DNN models running on NVIDIA Deep Learning Accelerator (NVDLA). 
The training time of RL-based transfer learning is 15× faster than that of GAMMA.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127722938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Encoder-based Many-Pattern Matching on FPGAs
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772671
H. Vu, Ngoc-Dai Bui
{"title":"Encoder-based Many-Pattern Matching on FPGAs","authors":"H. Vu, Ngoc-Dai Bui","doi":"10.1109/coolchips54332.2022.9772671","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772671","url":null,"abstract":"Many-pattern matching is one of the most essential algorithms in many application domains, such as data mining, network security, and bioinformatics. Such high-throughput application domains require high-performance matching engines, leading to the deployment of the algorithm on hardware. However, such hardware deployment consumes a large number of hardware resources. This challenge becomes more critical when scaling the number of patterns as well as the data throughput. In this paper, we first proposed an encoder-based hardware architecture for many-pattern matching on FPGAs. The matching architecture includes two parts: encoder-based filter and matching block. We also proposed an algorithm to simplify the structure of the encoder-based filter, thus reducing the hardware utilization. The hardware architecture is scalable with the number of patterns and the input data throughput. We evaluated our matching architecture and our algorithm with 2048 32-byte patterns abstracted from Snort rules for malware. 
The evaluation on Xilinx Zedboard shows that at 2.16 Gbps throughput, the proposed architecture achieves higher hardware efficiency at 0.05 LUTs per character, a block RAM consumption 10% of total device, and almost no flip-flop consumption, while the maximum clock frequency and the latency are 270 MHz and 11 ns, respectively.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128818076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A 1036 TOp/s/W, 12.2 mW, 2.72 μJ/Inference All Digital TNN Accelerator in 22 nm FDX Technology for TinyML Applications
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772668
Moritz Scherer, Alfio Di Mauro, Georg Rutishauser, Tim Fischer, L. Benini
{"title":"A 1036 TOp/s/W, 12.2 mW, 2.72 μJ/Inference All Digital TNN Accelerator in 22 nm FDX Technology for TinyML Applications","authors":"Moritz Scherer, Alfio Di Mauro, Georg Rutishauser, Tim Fischer, L. Benini","doi":"10.1109/coolchips54332.2022.9772668","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772668","url":null,"abstract":"Tiny Machine Learning (TinyML) applications impose μJ/Inference constraints, with maximum power consumption of a few tens of mW. It is extremely challenging to meet these requirements at a reasonable accuracy level. In this work, we address this challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based SoC. The design achieves 2.72 μJ/Inference, 12.2 mW, 3200 Inferences/sec at 0.5 V for a non-trivial 9-layer, 96 channels-per-layer network with CIFAR-10 accuracy of 86 %. The peak energy efficiency is 1036 TOp/s/W, outperforming the state-of-the-art in silicon-proven TinyML accelerators by 1.67x.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"55 Pt B 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122598171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Body Bias Control on a CGRA based on Convex Optimization
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772708
Takuya Kojima, Hayate Okuhara, Masaaki Kondo, H. Amano
{"title":"Body Bias Control on a CGRA based on Convex Optimization","authors":"Takuya Kojima, Hayate Okuhara, Masaaki Kondo, H. Amano","doi":"10.1109/coolchips54332.2022.9772708","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772708","url":null,"abstract":"Body biasing is one of the critical techniques to realize more energy-efficient computing with reconfigurable devices, such as Coarse-Grained Reconfigurable Architectures (CGRAs). Its benefit depends on the control granularity, whereas fine-grained control makes it challenging to find the best body bias voltage for each domain due to the complexity of the optimization problem. This work reformulates the optimization problem and introduces continuous relaxation to solve it faster than previous work. Experimental results show that the proposed method can solve the problem within 0.5 sec for all benchmarks under any conditions and demonstrates up to 5.65x speed-up compared to the previous method with negligible loss of accuracy.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"317 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133857211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session III Panel Discussions: The Future of Mission-critical, Mixed-criticality High-performance Embedded Systems
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772707
{"title":"Session III Panel Discussions: The Future of Mission-critical, Mixed-criticality High-performance Embedded Systems","authors":"","doi":"10.1109/coolchips54332.2022.9772707","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772707","url":null,"abstract":"","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"12 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133895304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Low-power and Real-time 3D Object Recognition Processor with Dense RGB-D Data Acquisition in Mobile Platforms
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772667
Dongseok Im, Gwangtae Park, Junha Ryu, Zhiyong Li, Sanghoon Kang, Donghyeon Han, Jinsu Lee, Wonhoon Park, Hankyul Kwon, H. Yoo
{"title":"A Low-power and Real-time 3D Object Recognition Processor with Dense RGB-D Data Acquisition in Mobile Platforms","authors":"Dongseok Im, Gwangtae Park, Junha Ryu, Zhiyong Li, Sanghoon Kang, Donghyeon Han, Jinsu Lee, Wonhoon Park, Hankyul Kwon, H. Yoo","doi":"10.1109/coolchips54332.2022.9772667","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772667","url":null,"abstract":"A low-power, real-time system-on-chip (SoC) for 3D object recognition with RGB-D data acquisition is proposed. By synthesizing dense RGB-D data through monocular depth estimation, the proposed system reduces the sensor power for 3D data acquisition by 27.3×. Moreover, the proposed processor reduces the energy consumption of a point cloud based neural network (PNN) by exploiting bit-slice-level computation and a point feature reuse method with a pipelined architecture. Additionally, the processor supports the point sampling and grouping algorithms of the PNN with a unified point processing core. Finally, the processor consumes 210.0 mW while running end-to-end RGB-D acquisition and 3D object recognition at 34.0 frames per second (fps).","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"57 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117218549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DXT501: An SDR-Based Baseband MP-SoC for Multi-Protocol Industrial Wireless Communication
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772697
Yang Chen, Lin Liu, Xuelin Feng, Jinglin Shi
{"title":"DXT501:An SDR-Based Baseband MP-SoC for Multi-Protocol Industrial Wireless Communication","authors":"Yang Chen, Lin Liu, Xuelin Feng, Jinglin Shi","doi":"10.1109/coolchips54332.2022.9772697","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772697","url":null,"abstract":"This paper designs and implements DXT501, an SDR-based baseband MP-SoC. It contains four high-performance 32-bit ASIPs, a real-time 32-bit RISC processor, a high-performance dual-core 32-bit GP processor ARC HS47Dx2, and some hardware accelerators that support LTE, 4G, MulteFire, and 5G (Release 15). In addition, a mobile device solution supporting multiple protocols is proposed. The practical test shows that the mobile device running on the MulteFire 1.1 protocol in the unlicensed frequency band has a transmission capacity of more than 300 Mbps in uplink and 150 Mbps in downlink, which can meet the requirements of modern industrial wireless communication applications such as mobile inspection robots.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129704435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Memcapacitive Spiking Neural Network with Circuit Nonlinearity-aware Training
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772674
Reon Oshio, Sugahara Takuya, Atsushi Sawada, Mutsumi Kimura, Renyuan Zhang, Y. Nakashima
{"title":"A Memcapacitive Spiking Neural Network with Circuit Nonlinearity-aware Training","authors":"Reon Oshio, Sugahara Takuya, Atsushi Sawada, Mutsumi Kimura, Renyuan Zhang, Y. Nakashima","doi":"10.1109/coolchips54332.2022.9772674","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772674","url":null,"abstract":"Neuromorphic computing is an unconventional computing scheme that executes computable algorithms using Spiking Neural Networks (SNNs) mimicking neural dynamics with high speed and low power consumption by the dedicated hardware. The analog implementation of neuromorphic computing has been studied in the field of edge computing etc. and is considered to be superior to the digital implementation in terms of power consumption. Furthermore, Processing-In-Memory (PIM) based synaptic operations that use non-volatile memory (NVM) devices for both weight memory and multiply-accumulate operations are expected to have extremely low power consumption. However, unintended non-linearities and hysteresis occur when attempting to implement analog spiking neuron circuits as simply as possible. As a result, accuracy is thought to degrade when inference is performed by mapping the weight parameters of SNNs trained offline to the element parameters of the NVM. In this study, we newly designed neuromorphic hardware operating at 100 MHz that employs a memcapacitor as a synaptic element, which is expected to have ultra-low power consumption. We also propose a method for training SNNs that incorporates the nonlinearity of the designed circuit into the neuron model and converts the synaptic weights into circuit element parameters. The proposed training method can reduce the degradation of accuracy even for very simple neuron circuits. The proposed circuit and method classify MNIST with ∼33.88 nJ/Inference, excluding the encoder, with ∼97% accuracy. 
The circuit design and measurement of circuit characteristics were performed in Rohm 180nm process using HSPICE. A spiking neuron model that incorporates circuit non-linearity as an activation function was implemented in PyTorch, a machine learning framework for Python.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131489833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Power Analysis of Directly-connected FPGA Clusters
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772675
Kensuke Iizuka, Haruna Takagi, Aika Kamei, Kazuei Hironaka, H. Amano
{"title":"Power Analysis of Directly-connected FPGA Clusters","authors":"Kensuke Iizuka, Haruna Takagi, Aika Kamei, Kazuei Hironaka, H. Amano","doi":"10.1109/coolchips54332.2022.9772675","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772675","url":null,"abstract":"Although low power consumption is a significant advantage of FPGA clusters, almost no power analyses with real systems have been reported. This study reports the detailed power consumption analyses of two FPGA clusters, namely, M-KUBOS and FiC, with power measurement tools and real applications. In both clusters, the type of logic design shells determines the base power consumption. For building clusters, the power for node communication links is mainly determined by the number of activated links and not influenced by the number of actually used links. Therefore, applying the link aggregation technique does not affect the power consumption. Increasing the clock frequency of the application logic mildly increases the power consumption. The obtained results suggest that the best way to reduce the total power consumption of an FPGA cluster and improve its performance is to use the minimum number of links for the application, apply link aggregation, and aggressively increase the clock frequency.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123713524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hardware Acceleration of Aggregate Signature Generation and Authentication by BLS Signature over BLS12-381 curve
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) Pub Date : 2022-04-20 DOI: 10.1109/coolchips54332.2022.9772706
Kaoru Masada, R. Nakayama, M. Ikeda
{"title":"Hardware Acceleration of Aggregate Signature Generation and Authentication by BLS Signature over BLS12-381 curve","authors":"Kaoru Masada, R. Nakayama, M. Ikeda","doi":"10.1109/coolchips54332.2022.9772706","DOIUrl":"https://doi.org/10.1109/coolchips54332.2022.9772706","url":null,"abstract":"BLS signature is a digital signature scheme computed over elliptic curves, and it has been attracting attention with its interesting function that signatures can be aggregated. We will introduce our progress of designing two ASIC architectures to accelerate the complex computations of generating and verifying signatures respectively. The computations include mapping to elliptic curves and pairing. An important subject of our work is to adopt a relatively new curve called BLS12-381. BLS12-381 is currently one of the curves that gather the most interests and yet very few ASIC implementations are optimized for BLS12-381.","PeriodicalId":266152,"journal":{"name":"2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125829342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2