Lennart Bamberg;Ardalan Najafi;Alberto Garcia-Ortiz
IEEE Open Journal of Circuits and Systems. Published 2024-04-12. DOI: 10.1109/OJCAS.2024.3388210
https://ieeexplore.ieee.org/document/10498075/
Citations: 0
Exploiting Neural-Network Statistics for Low-Power DNN Inference
Specialized compute blocks have been developed for efficient neural-network (NN) execution. However, because of the vast amount of data and parameter movement, the interconnects and on-chip memories form another bottleneck, impairing power and performance. This work addresses this bottleneck with a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. Our approach reduces the power consumption of the logic, interconnect, and memory blocks used for data storage and movement by up to 80% on state-of-the-art benchmarks, and it provides additional power savings of up to 39% for the compute blocks. These power improvements are achieved with no loss of accuracy and negligible hardware cost.
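The abstract does not say which coding scheme the authors use, so the following is only an illustrative sketch of the general idea that the statistics of the transmitted values determine how much a given number representation toggles the wires. It assumes a hypothetical 8-bit link carrying small, zero-centered integers (a common shape for quantized weights, assumed here rather than taken from the paper) and compares the bit toggles of two's complement against sign-magnitude, a swapped-in example of a coding that needs no extra wires. The names `BITS`, `sample_stream`, and `toggles`, and the Gaussian parameters, are assumptions made for this example.

```python
import random

BITS = 8  # assumed word width of the hypothetical link


def twos_complement(v, bits=BITS):
    """Encode a signed integer as an unsigned two's-complement word."""
    return v & ((1 << bits) - 1)


def sign_magnitude(v, bits=BITS):
    """Encode a signed integer as a sign bit followed by its magnitude."""
    sign = (1 << (bits - 1)) if v < 0 else 0
    return sign | abs(v)


def toggles(words):
    """Count bit flips between consecutive words (a proxy for switching activity)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def sample_stream(n=10_000, sigma=4.0, seed=0, bits=BITS):
    """Draw zero-centered integers, clipped so the magnitude fits in bits-1 bits."""
    rng = random.Random(seed)
    lim = (1 << (bits - 1)) - 1
    return [max(-lim, min(lim, round(rng.gauss(0.0, sigma)))) for _ in range(n)]


stream = sample_stream()
tc = toggles([twos_complement(v) for v in stream])
sm = toggles([sign_magnitude(v) for v in stream])
print(f"two's complement toggles: {tc}")
print(f"sign-magnitude toggles:   {sm}")
print(f"ratio sm/tc:              {sm / tc:.2f}")
```

For values clustered around zero, a sign change in two's complement flips most of the upper bits, whereas sign-magnitude flips only the sign bit and a few low-order bits, so the second count comes out lower. How much such a reduction translates into the power savings reported in the abstract depends on the actual hardware and coding, which this sketch does not model.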