Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing

Xianwei Cheng, Hui Zhao, M. Kandemir, S. Mohanty, Beilei Jiang
2020 21st International Symposium on Quality Electronic Design (ISQED)
DOI: 10.1109/ISQED48828.2020.9136967
Citations: 2

Abstract

Edge computing and IoT applications are severely constrained by limited hardware resources, which makes memory-hungry DNN (Deep Neural Network) frameworks unsuitable for edge deployment. Simple algorithms such as direct convolution are therefore finding their way into embedded machine learning. As one of the most widely used platforms for DNN acceleration, GPUs face an on-chip bandwidth bottleneck. This work introduces a GPU DNN execution architecture that relieves this bottleneck by reducing data movement through opportunistic computing. We first investigate data access patterns from the hardware's point of view. We then propose two opportunistic computing techniques that, with the help of assistant warps, predictably perform computation as soon as the data becomes available. By moving computation to data, our techniques significantly reduce data movement and relieve the DNN execution bottleneck. Our evaluation shows that the proposed techniques improve DNN application performance by as much as 55%.
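For background on the "simple algorithms" the abstract mentions: direct convolution computes each output element as a dot product of the kernel with the image patch beneath it, with no intermediate im2col buffer, which is why it suits memory-constrained embedded devices. The minimal sketch below illustrates only this baseline algorithm; it is not the paper's opportunistic-computing or assistant-warp scheme, which the abstract does not describe in enough detail to reconstruct.

```python
# Minimal direct 2-D convolution (single channel, valid padding).
# Background illustration only -- NOT the paper's proposed technique.

def direct_conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            # Each output element is the dot product of the kernel
            # with the overlapping image patch; no auxiliary buffers,
            # so the memory footprint stays small.
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]  # 2x2 diagonal kernel
print(direct_conv2d(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

The hot loop streams the same input elements many times (each pixel is reread for every kernel position it overlaps), which is exactly the kind of repeated data movement that the paper's move-computation-to-data approach targets on GPUs.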