Power-Efficient Layer Mapping for CNNs on Integrated CPU and GPU Platforms: A Case Study

Tian Wang, Kun Cao, Junlong Zhou, Gongxuan Zhang, Xiji Wang
{"title":"Power-Efficient Layer Mapping for CNNs on Integrated CPU and GPU Platforms: A Case Study","authors":"Tian Wang, Kun Cao, Junlong Zhou, Gongxuan Zhang, Xiji Wang","doi":"10.1145/3394885.3431423","DOIUrl":null,"url":null,"abstract":"Heterogeneous MPSoCs consisting of integrated CPUs and GPUs are suitable platforms for embedded applications running on hand- held devices such as smart phones. As the handheld devices are mostly powered by battery, the integrated CPU and GPU MPSoC is usually designed with an emphasis on low-power rather than performance. In this paper, we are interested in exploring a power- efficient layer mapping of convolution neural networks (CNNs) deployed on integrated CPU and GPU platforms. Specifically, we investigate the impact of layer mapping of YoloV3-Tiny (i.e., a widely-used CNN in both industry and academia) on system power consumption through numerous experiments on NVIDIA board Jetson TX2. The experimental results indicate that 1) almost all of the convolution layers are not suitable for mapping to CPU, 2) the pooling layer can be mapped to CPU for reducing power consumption, but the mapping may lead to a decrease in inference speed when the layer’s output tensor size is large, 3) the detection layer can be mapped to CPU as long as its floating-point operation scale is not too large, and 4) the channel and upsampling layers are both suitable for mapping to CPU. These observations obtained in this study can be further utilized to guide the design of power-efficient layer mapping strategies for integrated CPU and GPU platforms.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3394885.3431423","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Heterogeneous MPSoCs consisting of integrated CPUs and GPUs are suitable platforms for embedded applications running on handheld devices such as smartphones. As handheld devices are mostly battery-powered, the integrated CPU and GPU MPSoC is usually designed with an emphasis on low power rather than performance. In this paper, we explore power-efficient layer mapping of convolutional neural networks (CNNs) deployed on integrated CPU and GPU platforms. Specifically, we investigate the impact of the layer mapping of YoloV3-Tiny (a CNN widely used in both industry and academia) on system power consumption through numerous experiments on the NVIDIA Jetson TX2 board. The experimental results indicate that 1) almost all convolutional layers are unsuitable for mapping to the CPU, 2) pooling layers can be mapped to the CPU to reduce power consumption, but doing so may slow inference when a layer's output tensor is large, 3) the detection layer can be mapped to the CPU as long as its floating-point operation count is not too large, and 4) the channel and upsampling layers are both suitable for mapping to the CPU. The observations obtained in this study can be used to guide the design of power-efficient layer mapping strategies for integrated CPU and GPU platforms.
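As an illustration, the four observations above could be encoded as a simple per-layer device-mapping heuristic. The sketch below is a minimal Python example under assumed names and thresholds: the Layer descriptor, LARGE_TENSOR_ELEMS, and LARGE_DETECT_FLOPS are hypothetical and not taken from the paper, which draws its conclusions empirically on the Jetson TX2 rather than from fixed cutoffs like these.

# Hypothetical sketch of a layer-to-device mapping heuristic based on the
# four observations in the abstract. Layer types, thresholds, and the Layer
# dataclass are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str            # "conv", "pool", "detect", "channel", "upsample", ...
    output_elems: int    # number of elements in the layer's output tensor
    flops: float         # floating-point operations per inference pass

# Illustrative cutoffs only; the paper does not publish fixed thresholds.
LARGE_TENSOR_ELEMS = 1_000_000
LARGE_DETECT_FLOPS = 1e8

def map_layer(layer: Layer) -> str:
    """Return the target device ("CPU" or "GPU") for a single layer."""
    if layer.kind == "conv":
        # Observation 1: convolutional layers are generally unsuitable for the CPU.
        return "GPU"
    if layer.kind == "pool":
        # Observation 2: pooling saves power on the CPU unless its output
        # tensor is large, in which case inference speed suffers.
        return "CPU" if layer.output_elems <= LARGE_TENSOR_ELEMS else "GPU"
    if layer.kind == "detect":
        # Observation 3: the detection layer fits the CPU while its FLOP
        # count stays moderate.
        return "CPU" if layer.flops <= LARGE_DETECT_FLOPS else "GPU"
    if layer.kind in ("channel", "upsample"):
        # Observation 4: channel and upsampling layers map well to the CPU.
        return "CPU"
    return "GPU"  # default for layer types not covered by the observations

if __name__ == "__main__":
    # Example: a small pooling layer (26x26x256 output) would go to the CPU.
    print(map_layer(Layer(kind="pool", output_elems=26 * 26 * 256, flops=2e6)))

In practice such a function would be applied to each layer of YoloV3-Tiny in turn to produce a static CPU/GPU assignment before deployment.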