{"title":"高能效AI芯片专题介绍","authors":"V. Chandra, Yiran Chen, Sung-kyu Yoo","doi":"10.1145/3538502","DOIUrl":null,"url":null,"abstract":"Energy efficiency is one of the most important metrics in AI system designs on both servers and mobile devices. Especially, mobile and edge devices require 10-100X better energy-efficient computing for immersive AR/VR applications as well as AI-based apps on smartphones, smart cameras, etc. Due to battery and cost reasons, such emerging applications demand extreme energy efficiency and high performance to run dozens of heavy neural network models in real time and under stringent power budgets. In order to realize 100X improvements in energy efficiency, we require innovative ideas in both software and hardware. In this special issue, which originated from the Highly Efficient Neural Processing (HENP) workshop held in ESWEEK 2020, we aimed at covering state-of-the-art industrial and academic efforts to achieve orders of magnitude better energy efficiency in software and hardware designs for AI chips. Recent neural networks adopt special layers for better compute efficiency. “MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units” by Lee et al. proposes an approach to improve area-efficiency of systolic array accelerator for depth-wise separable convolution and squeeze-and-excitation layers which are widely adopted on networks for mobile and embedded systems due to efficiency reasons. Reusing computation results is one of the representative methods in reducing computation cost thereby improving energy efficiency. “Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse” by Cicek et al. proposes a novel reuse-centric hardware accelerator for CNN inference based on the proposed improved detection of neuron vector similarity. Embedded devices are often characterized by continuous sensory inputs and limited computing/programming capabilities. 
“A Low-Power Programmable Machine Learning Hardware Accelerator Design for Intelligent Edge Devices” by Kee et al. proposes a hardware accelerator, called intelligent boosting engine, to accelerate sensor fusion and the SVM-based motion recognition algorithm with limited programmability. Processing sequential inputs under timing and power budgets is a representative design problem in edge applications. In “Energy Efficient LSTM Inference Accelerator for Real-Time Causal Prediction” by Chen et al. the authors take advantage of fine-grained parallelism, pipelined feedforward and recurrent updates in LSTM and present a bit-sparse quantization to reduce the circuit cost by replacing the original multiplication with the bit-shift operation. Adopting reinforcement learning on edge devices for sequential decision-making and control based on image inputs is desirable, but challenging due to the low efficiency of training and the high cost of inference. In “E2HRL: An Energy-Efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning”, Shiri et al. proposes a scalable hardware architecture called E2HRL which boosts training speed by learning hierarchical policies and enables high energy efficiency by a cross-layer design methodology.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"1 1","pages":"1 - 2"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Introduction to the Special Section on Energy-Efficient AI Chips\",\"authors\":\"V. Chandra, Yiran Chen, Sung-kyu Yoo\",\"doi\":\"10.1145/3538502\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Energy efficiency is one of the most important metrics in AI system designs on both servers and mobile devices. 
Especially, mobile and edge devices require 10-100X better energy-efficient computing for immersive AR/VR applications as well as AI-based apps on smartphones, smart cameras, etc. Due to battery and cost reasons, such emerging applications demand extreme energy efficiency and high performance to run dozens of heavy neural network models in real time and under stringent power budgets. In order to realize 100X improvements in energy efficiency, we require innovative ideas in both software and hardware. In this special issue, which originated from the Highly Efficient Neural Processing (HENP) workshop held in ESWEEK 2020, we aimed at covering state-of-the-art industrial and academic efforts to achieve orders of magnitude better energy efficiency in software and hardware designs for AI chips. Recent neural networks adopt special layers for better compute efficiency. “MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units” by Lee et al. proposes an approach to improve area-efficiency of systolic array accelerator for depth-wise separable convolution and squeeze-and-excitation layers which are widely adopted on networks for mobile and embedded systems due to efficiency reasons. Reusing computation results is one of the representative methods in reducing computation cost thereby improving energy efficiency. “Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse” by Cicek et al. proposes a novel reuse-centric hardware accelerator for CNN inference based on the proposed improved detection of neuron vector similarity. Embedded devices are often characterized by continuous sensory inputs and limited computing/programming capabilities. “A Low-Power Programmable Machine Learning Hardware Accelerator Design for Intelligent Edge Devices” by Kee et al. proposes a hardware accelerator, called intelligent boosting engine, to accelerate sensor fusion and the SVM-based motion recognition algorithm with limited programmability. 
Processing sequential inputs under timing and power budgets is a representative design problem in edge applications. In “Energy Efficient LSTM Inference Accelerator for Real-Time Causal Prediction” by Chen et al. the authors take advantage of fine-grained parallelism, pipelined feedforward and recurrent updates in LSTM and present a bit-sparse quantization to reduce the circuit cost by replacing the original multiplication with the bit-shift operation. Adopting reinforcement learning on edge devices for sequential decision-making and control based on image inputs is desirable, but challenging due to the low efficiency of training and the high cost of inference. In “E2HRL: An Energy-Efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning”, Shiri et al. proposes a scalable hardware architecture called E2HRL which boosts training speed by learning hierarchical policies and enables high energy efficiency by a cross-layer design methodology.\",\"PeriodicalId\":6933,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"volume\":\"1 1\",\"pages\":\"1 - 2\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3538502\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems 
(TODAES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3538502","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Introduction to the Special Section on Energy-Efficient AI Chips
Energy efficiency is one of the most important metrics in AI system design, on both servers and mobile devices. In particular, mobile and edge devices require 10–100X more energy-efficient computing for immersive AR/VR applications as well as AI-based apps on smartphones, smart cameras, and similar platforms. Because of battery and cost constraints, these emerging applications demand extreme energy efficiency and high performance to run dozens of heavy neural network models in real time under stringent power budgets. Realizing 100X improvements in energy efficiency requires innovative ideas in both software and hardware. In this special issue, which originated from the Highly Efficient Neural Processing (HENP) workshop held at ESWEEK 2020, we aim to cover state-of-the-art industrial and academic efforts to achieve orders-of-magnitude better energy efficiency in software and hardware designs for AI chips.

Recent neural networks adopt special layers for better compute efficiency. “MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units” by Lee et al. proposes an approach to improve the area efficiency of systolic-array accelerators for depth-wise separable convolution and squeeze-and-excitation layers, which are widely adopted in networks for mobile and embedded systems for efficiency reasons.

Reusing computation results is a representative method for reducing computation cost and thereby improving energy efficiency. “Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse” by Cicek et al. proposes a novel reuse-centric hardware accelerator for CNN inference, based on an improved detection of neuron-vector similarity.

Embedded devices are often characterized by continuous sensory inputs and limited computing/programming capabilities. “A Low-Power Programmable Machine Learning Hardware Accelerator Design for Intelligent Edge Devices” by Kee et al. proposes a hardware accelerator, called the intelligent boosting engine, that accelerates sensor fusion and an SVM-based motion recognition algorithm while offering limited programmability.

Processing sequential inputs under timing and power budgets is a representative design problem in edge applications. In “Energy Efficient LSTM Inference Accelerator for Real-Time Causal Prediction” by Chen et al., the authors exploit fine-grained parallelism and pipelined feedforward and recurrent updates in LSTMs, and present a bit-sparse quantization that reduces circuit cost by replacing multiplications with bit-shift operations.

Adopting reinforcement learning on edge devices for sequential decision-making and control based on image inputs is desirable, but challenging due to the low efficiency of training and the high cost of inference. In “E2HRL: An Energy-Efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning”, Shiri et al. propose a scalable hardware architecture, E2HRL, that boosts training speed by learning hierarchical policies and achieves high energy efficiency through a cross-layer design methodology.
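To give a sense of why depth-wise separable convolution is favored in mobile networks, the following minimal sketch compares multiply-accumulate (MAC) counts against a standard convolution; the layer shape is a hypothetical example for illustration, not one taken from the MVP paper.

```python
# MAC-count comparison: standard vs. depth-wise separable convolution.
# Illustrative only; the layer shape below is a hypothetical example.

def standard_conv_macs(h, w, k, c_in, c_out):
    # Each output pixel needs a k*k*c_in dot product per output channel.
    return h * w * k * k * c_in * c_out

def separable_conv_macs(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in   # one k*k spatial filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

# A typical mobile-network layer: 56x56 feature map, 3x3 kernel, 128->128 channels.
std = standard_conv_macs(56, 56, 3, 128, 128)
sep = separable_conv_macs(56, 56, 3, 128, 128)
print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs  ({std / sep:.1f}x fewer)")
```

For this shape the separable form needs roughly 8x fewer MACs, which is the kind of saving that makes these layers attractive despite their irregular dataflow on systolic arrays.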
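The multiplication-to-shift substitution behind bit-sparse quantization can be sketched as follows; rounding each weight to the nearest signed power of two is an illustrative assumption here, not the exact scheme of Chen et al.

```python
# Sketch of power-of-two ("bit-sparse") weight quantization: once a weight is
# a signed power of two, multiplying by it reduces to a bit shift.
import math

def quantize_pow2(w):
    """Round a weight to the nearest signed power of two; return (sign, exponent)."""
    if w == 0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    return (sign, exp)

def shift_multiply(x, sign, exp):
    """Compute x * (sign * 2**exp) for an integer activation x using shifts only."""
    if sign == 0:
        return 0
    y = x << exp if exp >= 0 else x >> -exp
    return sign * y

sign, exp = quantize_pow2(0.23)      # 0.23 rounds to 2**-2 = 0.25
acc = shift_multiply(64, sign, exp)  # 64 >> 2 = 16, i.e. 64 * 0.25
print(sign, exp, acc)
```

In hardware, the shifter replaces a full multiplier array, which is where the circuit-cost reduction comes from; the price is the quantization error introduced by restricting weights to powers of two.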