Power-Efficient Layer Mapping for CNNs on Integrated CPU and GPU Platforms: A Case Study

Tian Wang, Kun Cao, Junlong Zhou, Gongxuan Zhang, Xiji Wang
{"title":"Power-Efficient Layer Mapping for CNNs on Integrated CPU and GPU Platforms: A Case Study","authors":"Tian Wang, Kun Cao, Junlong Zhou, Gongxuan Zhang, Xiji Wang","doi":"10.1145/3394885.3431423","DOIUrl":null,"url":null,"abstract":"Heterogeneous MPSoCs consisting of integrated CPUs and GPUs are suitable platforms for embedded applications running on hand- held devices such as smart phones. As the handheld devices are mostly powered by battery, the integrated CPU and GPU MPSoC is usually designed with an emphasis on low-power rather than performance. In this paper, we are interested in exploring a power- efficient layer mapping of convolution neural networks (CNNs) deployed on integrated CPU and GPU platforms. Specifically, we investigate the impact of layer mapping of YoloV3-Tiny (i.e., a widely-used CNN in both industry and academia) on system power consumption through numerous experiments on NVIDIA board Jetson TX2. The experimental results indicate that 1) almost all of the convolution layers are not suitable for mapping to CPU, 2) the pooling layer can be mapped to CPU for reducing power consumption, but the mapping may lead to a decrease in inference speed when the layer’s output tensor size is large, 3) the detection layer can be mapped to CPU as long as its floating-point operation scale is not too large, and 4) the channel and upsampling layers are both suitable for mapping to CPU. These observations obtained in this study can be further utilized to guide the design of power-efficient layer mapping strategies for integrated CPU and GPU platforms.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3394885.3431423","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Heterogeneous MPSoCs consisting of integrated CPUs and GPUs are suitable platforms for embedded applications running on handheld devices such as smartphones. As handheld devices are mostly battery-powered, the integrated CPU and GPU MPSoC is usually designed with an emphasis on low power rather than performance. In this paper, we explore power-efficient layer mapping of convolutional neural networks (CNNs) deployed on integrated CPU and GPU platforms. Specifically, we investigate the impact of the layer mapping of YoloV3-Tiny (a CNN widely used in both industry and academia) on system power consumption through numerous experiments on the NVIDIA Jetson TX2 board. The experimental results indicate that 1) almost all convolutional layers are unsuitable for mapping to the CPU, 2) pooling layers can be mapped to the CPU to reduce power consumption, but doing so may slow inference when a layer's output tensor is large, 3) the detection layer can be mapped to the CPU as long as its floating-point operation count is not too large, and 4) the channel and upsampling layers are both suitable for mapping to the CPU. The observations obtained in this study can be used to guide the design of power-efficient layer mapping strategies for integrated CPU and GPU platforms.
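As an illustration, the four observations above could be encoded as a simple per-layer device-mapping heuristic. The sketch below is a minimal Python example under assumed names and thresholds: the Layer descriptor, LARGE_TENSOR_ELEMS, and LARGE_DETECT_FLOPS are hypothetical and not taken from the paper, which draws its conclusions empirically on the Jetson TX2 rather than from fixed cutoffs like these.

# Hypothetical sketch of a layer-to-device mapping heuristic based on the
# four observations in the abstract. Layer types, thresholds, and the Layer
# dataclass are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str            # "conv", "pool", "detect", "channel", "upsample", ...
    output_elems: int    # number of elements in the layer's output tensor
    flops: float         # floating-point operations per inference pass

# Illustrative cutoffs only; the paper does not publish fixed thresholds.
LARGE_TENSOR_ELEMS = 1_000_000
LARGE_DETECT_FLOPS = 1e8

def map_layer(layer: Layer) -> str:
    """Return the target device ("CPU" or "GPU") for a single layer."""
    if layer.kind == "conv":
        # Observation 1: convolutional layers are generally unsuitable for the CPU.
        return "GPU"
    if layer.kind == "pool":
        # Observation 2: pooling saves power on the CPU unless its output
        # tensor is large, in which case inference speed suffers.
        return "CPU" if layer.output_elems <= LARGE_TENSOR_ELEMS else "GPU"
    if layer.kind == "detect":
        # Observation 3: the detection layer fits the CPU while its FLOP
        # count stays moderate.
        return "CPU" if layer.flops <= LARGE_DETECT_FLOPS else "GPU"
    if layer.kind in ("channel", "upsample"):
        # Observation 4: channel and upsampling layers map well to the CPU.
        return "CPU"
    return "GPU"  # default for layer types not covered by the observations

if __name__ == "__main__":
    # Example: a small pooling layer (26x26x256 output) would go to the CPU.
    print(map_layer(Layer(kind="pool", output_elems=26 * 26 * 256, flops=2e6)))

In practice such a function would be applied to each layer of YoloV3-Tiny in turn to produce a static CPU/GPU assignment before deployment.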