Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing
Xianwei Cheng, Hui Zhao, M. Kandemir, S. Mohanty, Beilei Jiang
2020 21st International Symposium on Quality Electronic Design (ISQED). DOI: 10.1109/ISQED48828.2020.9136967
Edge computing and IoT applications are severely constrained by limited hardware resources, which makes memory-hungry DNN (Deep Neural Network) frameworks unsuitable for edge deployment. Simple algorithms such as direct convolution are therefore finding their way into embedded machine learning. As one of the most widely used platforms for DNN acceleration, GPUs face an on-chip bandwidth bottleneck. This work introduces a GPU DNN execution architecture that relieves the on-chip bandwidth bottleneck by reducing data movement through opportunistic computing. We first investigate data access patterns from the hardware's perspective. We then propose two opportunistic computing techniques that, with the help of assistant warps, perform computation as soon as the required data becomes available. By moving computation to data, our techniques significantly reduce data movement and relieve the DNN execution bottleneck. Our evaluation shows that the proposed techniques improve DNN application performance by as much as 55%.
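For context on the workload the abstract refers to, the sketch below is a plain CUDA direct-convolution kernel of the kind such an architecture would target: each thread computes one output element, repeatedly pulling overlapping input windows through the on-chip memory hierarchy, which is exactly the data-movement pressure the paper sets out to reduce. This is a generic baseline under assumed conditions (channel-major layouts, unit stride, no padding), not the paper's assistant-warp design; all names and parameters are illustrative.

```cuda
// Minimal direct 2D convolution sketch (illustrative, not the paper's design).
// Layouts assumed: input [C_in, H, W], weights [C_out, C_in, K, K],
// output [C_out, H-K+1, W-K+1]; stride 1, no padding.
__global__ void direct_conv2d(const float* __restrict__ in,
                              const float* __restrict__ w,
                              float* __restrict__ out,
                              int C_in, int H, int W,
                              int C_out, int K)
{
    int OH = H - K + 1, OW = W - K + 1;
    int ox = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int oy = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int oc = blockIdx.z;                             // output channel
    if (ox >= OW || oy >= OH || oc >= C_out) return;

    float acc = 0.0f;
    // Each thread re-reads a KxK window per input channel; neighboring
    // threads fetch heavily overlapping data, stressing on-chip bandwidth.
    for (int ic = 0; ic < C_in; ++ic)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx)
                acc += in[(ic * H + oy + ky) * W + (ox + kx)] *
                       w[((oc * C_in + ic) * K + ky) * K + kx];
    out[(oc * OH + oy) * OW + ox] = acc;
}

// Example launch (hypothetical sizes):
//   dim3 block(16, 16);
//   dim3 grid((OW + 15) / 16, (OH + 15) / 16, C_out);
//   direct_conv2d<<<grid, block>>>(d_in, d_w, d_out, C_in, H, W, C_out, K);
```

The overlapping-window reads visible in the inner loops illustrate why the paper's approach of moving computation to where data already resides, rather than repeatedly moving data to the compute, can pay off on bandwidth-bound GPUs.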