Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design最新文献

筛选
英文 中文
Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks 节能和内存约束深度神经网络的多重复杂性损失dna
Matteo Risso, A. Burrello, L. Benini, E. Macii, M. Poncino, D. J. Pagliari
{"title":"Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks","authors":"Matteo Risso, A. Burrello, L. Benini, E. Macii, M. Poncino, D. J. Pagliari","doi":"10.1145/3531437.3539720","DOIUrl":"https://doi.org/10.1145/3531437.3539720","url":null,"abstract":"Neural Architecture Search (NAS) is increasingly popular to automatically explore the accuracy versus computational complexity trade-off of Deep Learning (DL) architectures. When targeting tiny edge devices, the main challenge for DL deployment is matching the tight memory constraints, hence most NAS algorithms consider model size as the complexity metric. Other methods reduce the energy or latency of DL models by trading off accuracy and number of inference operations. Energy and memory are rarely considered simultaneously, in particular by low-search-cost Differentiable NAS (DNAS) solutions. We overcome this limitation proposing the first DNAS that directly addresses the most realistic scenario from a designer’s perspective: the co-optimization of accuracy and energy (or latency) under a memory constraint, determined by the target HW. We do so by combining two complexity-dependent loss functions during training, with independent strength. Testing on three edge-relevant tasks from the MLPerf Tiny benchmark suite, we obtain rich Pareto sets of architectures in the energy vs. accuracy space, with memory footprints constraints spanning from 75% to 6.25% of the baseline networks. When deployed on a commercial edge device, the STM NUCLEO-H743ZI2, our networks span a range of 2.18x in energy consumption and 4.04% in accuracy for the same memory constraint, and reduce energy by up to 2.2 × with negligible accuracy drop with respect to the baseline.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123237739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks 基于神经上下文强盗的低功耗体域网络动态传感器选择
B. U. Demirel, Luke Chen, M. A. Faruque
{"title":"Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks","authors":"B. U. Demirel, Luke Chen, M. A. Faruque","doi":"10.1145/3531437.3539713","DOIUrl":"https://doi.org/10.1145/3531437.3539713","url":null,"abstract":"Providing health monitoring devices with machine intelligence is important for enabling automatic mobile healthcare applications. However, this brings additional challenges due to the resource scarcity of these devices. This work introduces a neural contextual bandits based dynamic sensor selection methodology for high-performance and resource-efficient body-area networks to realize next generation mobile health monitoring devices. The methodology utilizes contextual bandits to select the most informative sensor combinations during runtime and ignore redundant data for decreasing transmission and computing power in a body area network (BAN). The proposed method has been validated using one of the most common health monitoring applications: cardiac activity monitoring. Solutions from our proposed method are compared against those from related works in terms of classification performance and energy while considering the communication energy consumption. Our final solutions could reach 78.8% AU-PRC on the PTB-XL ECG dataset for cardiac abnormality detection while decreasing the overall energy consumption and computational energy by 3.7 × and 4.3 ×, respectively.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"19 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114032686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs 确定多加速器soc的最佳相干接口的综合方法
K. Bhardwaj, Marton Havasi, Yuan Yao, D. Brooks, José Miguel Hernández-Lobato, Gu-Yeon Wei
{"title":"A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs","authors":"K. Bhardwaj, Marton Havasi, Yuan Yao, D. Brooks, José Miguel Hernández-Lobato, Gu-Yeon Wei","doi":"10.1145/3370748.3406564","DOIUrl":"https://doi.org/10.1145/3370748.3406564","url":null,"abstract":"Modern systems-on-chip (SoCs) include not only general-purpose CPUs but also specialized hardware accelerators. Typically, there are three coherence model choices to integrate an accelerator with the memory hierarchy: no coherence, coherent with the last-level cache (LLC), and private cache based full coherence. However, there has been very limited research on finding which coherence models are optimal for the accelerators of a complex many-accelerator SoC. This paper focuses on determining a cost-aware coherence interface for an SoC and its target application: find the best coherence models for the accelerators that optimize their power and performance, considering both workload characteristics and system-level contention. A novel comprehensive methodology is proposed that uses Bayesian optimization to efficiently find the cost-aware coherence interfaces for SoCs that are modeled using the gem5-Aladdin architectural simulator. For a complete analysis, gem5-Aladdin is extended to support LLC coherence in addition to already-supported no coherence and full coherence. For a heterogeneous SoC targeting applications with varying amount of accelerator-level parallelism, the proposed framework rapidly finds cost-aware coherence interfaces that show significant performance and power benefits over the other commonly-used coherence interfaces.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128375408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
A 1.2-V, 1.8-GHz low-power PLL using a class-F VCO for driving 900-MHz SRD band SC-circuits 一个使用f类压控振荡器的1.2 v, 1.8 ghz低功耗锁相环,用于驱动900 mhz SRD频段sc电路
Tim Schumacher, Markus Stadelmayer, Thomas Faseth, H. Pretl
{"title":"A 1.2-V, 1.8-GHz low-power PLL using a class-F VCO for driving 900-MHz SRD band SC-circuits","authors":"Tim Schumacher, Markus Stadelmayer, Thomas Faseth, H. Pretl","doi":"10.1145/3370748.3406551","DOIUrl":"https://doi.org/10.1145/3370748.3406551","url":null,"abstract":"This work presents a 1.6 GHz to 2 GHz integer PLL with 2 MHz stepping, which is optimized for driving low-power 180 nm switched-capacitor (SC) circuits at a 1.2 V supply. To reduce the overall power consumption, a class-F VCO is implemented. Due to enriched odd harmonics of the oscillator, a rectangular oscillator signal is generated, which allows omitting output buffering stages. The rectangular signal results in a lowered power consumption and enables to directly drive SC-filters and an RF-divider using the oscillator signal. In addition, the proposed RF-divider includes a differential 4-phase signal generation at 868 MHz and 915 MHz SRD band frequencies that can be used for complex modulation schemes. With a fully integrated loop-filter, a maximum of integration is achieved. A test-chip was manufactured in a 1P6M 180 nm CMOS technology with triple-well option and confirms a PLL with a total active power consumption of 4.1 mW. It achieves a phase noise of -111 dBc/Hz at 1 MHz offset and a -42 dBc spurious response from a 1 MHz reference.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114731474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Approximate inference systems (AxIS): end-to-end approximations for energy-efficient inference at the edge 近似推理系统(AxIS):在边缘进行节能推理的端到端近似
Soumendu Kumar Ghosh, Arnab Raha, V. Raghunathan
{"title":"Approximate inference systems (AxIS): end-to-end approximations for energy-efficient inference at the edge","authors":"Soumendu Kumar Ghosh, Arnab Raha, V. Raghunathan","doi":"10.1145/3370748.3406575","DOIUrl":"https://doi.org/10.1145/3370748.3406575","url":null,"abstract":"The rapid proliferation of the Internet-of-Things (IoT) and the dramatic resurgence of artificial intelligence (AI) based application workloads has led to immense interest in performing inference on energy-constrained edge devices. Approximate computing (a design paradigm that yields large energy savings at the cost of a small degradation in application quality) is a promising technique to enable energy-efficient inference at the edge. This paper introduces the concept of an approximate inference system (AxIS) and proposes a systematic methodology to perform joint approximations across different subsystems in a deep neural network-based inference system, leading to significant energy benefits compared to approximating individual subsystems in isolation. We use a smart camera system that executes various convolutional neural network (CNN) based image recognition applications to illustrate how the sensor, memory, compute, and communication subsystems can all be approximated synergistically. We demonstrate our proposed methodology using two variants of a smart camera system: (a) Camedge, where the CNN executes locally on the edge device, and (b) Camcloud, where the edge device sends the captured image to a remote cloud server that executes the CNN. We have prototyped such an approximate inference system using an Altera Stratix IV GX-based Terasic TR4-230 FPGA development board. Experimental results obtained using six CNNs demonstrate significant energy savings (around 1.7× for Camedge and 3.5× for Camcloud) for minimal (< 1%) loss in application quality. Compared to approximating a single subsystem in isolation, AxIS achieves additional energy benefits of 1.6×--1.7× (Camedge) and 1.4×--3.4× (Camcloud) on average for minimal application-level quality loss.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121760469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
STINT 工作
Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design Pub Date : 2020-08-10 DOI: 10.1007/978-3-642-41714-6_197299
Tao-Yi Lee, Khuong Vo, Wongi Baek, Michelle Khine, N. Dutt
{"title":"STINT","authors":"Tao-Yi Lee, Khuong Vo, Wongi Baek, Michelle Khine, N. Dutt","doi":"10.1007/978-3-642-41714-6_197299","DOIUrl":"https://doi.org/10.1007/978-3-642-41714-6_197299","url":null,"abstract":"","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"364 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134480540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
SparTANN
H. Sim, Jooyeon Choi, Jongeun Lee
{"title":"SparTANN","authors":"H. Sim, Jooyeon Choi, Jongeun Lee","doi":"10.1145/3370748.3406554","DOIUrl":"https://doi.org/10.1145/3370748.3406554","url":null,"abstract":"While sparsity has been exploited in many inference accelerators, not much work is done for training accelerators. Exploiting sparsity in training accelerators involves multiple issues, including where to find sparsity, how to exploit sparsity, and how to create more sparsity. In this paper we present a novel sparse training architecture that can exploit sparsity in gradient tensors in both back propagation and weight update computation. We also propose a single-pass sparsification algorithm, which is a hardware-friendly version of a recently proposed sparse training algorithm, that can create additional sparsity aggressively during training. Our experimental results using large networks such as AlexNet and GoogleNet demonstrate that our sparse training architecture can accelerate convolution layer training time by 4.20~8.88× over baseline dense training without accuracy loss, and further increase the training speed by 7.30~11.87× over the baseline with minimal accuracy loss.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Sound event detection with binary neural networks on tightly power-constrained IoT devices 在功率严格限制的物联网设备上使用二进制神经网络进行声音事件检测
G. Cerutti, Renzo Andri, L. Cavigelli, Elisabetta Farella, M. Magno, L. Benini
{"title":"Sound event detection with binary neural networks on tightly power-constrained IoT devices","authors":"G. Cerutti, Renzo Andri, L. Cavigelli, Elisabetta Farella, M. Magno, L. Benini","doi":"10.1145/3370748.3406588","DOIUrl":"https://doi.org/10.1145/3370748.3406588","url":null,"abstract":"Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on deep neural networks (DNNs) are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process the data on the node, close to the sensor, with a very limited energy supply, and tight constraints on the memory size and processing capabilities precluding to run state-of-the-art DNNs. In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for SED whose footprint (815 kB) exceeds the 512 kB of memory available on our platform, we retrain the network using binary filters and activations to match these memory constraints. (Fully) binary neural networks come with a natural drop in accuracy of 12-18% on the challenging ImageNet object recognition challenge compared to their equivalent full-precision baselines. This BNN reaches a 77.9% accuracy, just 7% lower than the full-precision version, with 58 kB (7.2× less) for the weights and 262 kB (2.4× less) memory in total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s and 1.5 GMAC/s over the full network, including preprocessing with Mel bins, which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W, respectively. Compared to the performance of an ARM Cortex-M4 implementation, our system has a 10.3× faster execution time and a 51.1× higher energy-efficiency.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134254748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
Integrating event-based dynamic vision sensors with sparse hyperdimensional computing: a low-power accelerator with online learning capability 基于事件的动态视觉传感器与稀疏超维计算的集成:具有在线学习能力的低功耗加速器
Michael Hersche, Edoardo Mello Rella, Alfio Di Mauro, L. Benini, Abbas Rahimi
{"title":"Integrating event-based dynamic vision sensors with sparse hyperdimensional computing: a low-power accelerator with online learning capability","authors":"Michael Hersche, Edoardo Mello Rella, Alfio Di Mauro, L. Benini, Abbas Rahimi","doi":"10.1145/3370748.3406560","DOIUrl":"https://doi.org/10.1145/3370748.3406560","url":null,"abstract":"We propose to embed features extracted from event-driven dynamic vision sensors to binary sparse representations in hyperdimensional (HD) space for regression. This embedding compresses events generated across 346×260 differential pixels to a sparse 8160-bit vector by applying random activation functions. The sparse representation not only simplifies inference, but also enables online learning with the same memory footprint. Specifically, it allows efficient updates by retaining binary vector components over the course of online learning that cannot be otherwise achieved with dense representations demanding multibit vector components. We demonstrate online learning capability: using estimates and confidences of an initial model trained with only 25% of data, our method continuously updates the model for the remaining 75% of data, resulting in a close match with accuracy obtained with an oracle model on ground truth labels. When mapped on an 8-core accelerator, our method also achieves lower error, latency, and energy compared to other sparse/dense alternatives. Furthermore, it is 9.84× more energy-efficient and 6.25× faster than an optimized 9-layer perceptron with comparable accuracy.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114778068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Steady state driven power gating for lightening always-on state retention storage 稳定状态驱动的电源门控,以减轻常开状态保持存储
Taehwan Kim, Gyoung-Hwan Hyun, Taewhan Kim
{"title":"Steady state driven power gating for lightening always-on state retention storage","authors":"Taehwan Kim, Gyoung-Hwan Hyun, Taewhan Kim","doi":"10.1145/3370748.3406556","DOIUrl":"https://doi.org/10.1145/3370748.3406556","url":null,"abstract":"It is generally known that a considerable portion of flip-flops in circuits is occupied by the ones with mux-feedback loop (called self-loop), which are the critical (inherently unavoidable) bottleneck in minimizing total (always-on) storage size for the allocation of non-uniform multi-bits for retaining flip-flop states in power gated circuits. This is because it is necessary to replace every self-loop flip-flop with a distinct retention flip-flop with at least one-bit storage for retaining its state since there is no clue as to where the flip-flop state, when waking up, comes from, i.e., from the mux-feedback loop or from the driving flip-flops other than itself. This work breaks this bottleneck by safely treating a large portion of the self-loop flip-flops as if they were the same as the flip-flops with no self-loop. Specifically, we design a novel mechanism of steady state monitoring, operating for a few cycles just before sleeping, on a partial set of self-loop flip-flops, by which the expensive state retention storage never be needed for the monitored flip-flops, contributing to a significant saving on the total size of the always- on state retention storage for power gating.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121703255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信