在GPU-FPGA混合加速器上实现高能效DNN训练

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing Pub Date : 2021-06-03 DOI:10.1145/3447818.3460371

Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li

{"title":"在GPU-FPGA混合加速器上实现高能效DNN训练","authors":"Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li","doi":"10.1145/3447818.3460371","DOIUrl":null,"url":null,"abstract":"DNN training consumes orders of magnitude more energy than inference and requires innovative use of accelerators to improve energy-efficiency. However, despite having complementary features, GPUs and FPGAs have been mostly used independently for the entire training process, thus neglecting the opportunity in assigning individual but distinct operations to the most suitable hardware. In this paper, we take the initiative to explore new opportunities and viable solutions in enabling energy-efficient DNN training on hybrid accelerators. To overcome fundamental challenges including avoiding training throughput loss, enabling fast design space exploration, and efficient scheduling, we propose a comprehensive framework, Hype-training, that utilizes a combination of offline characterization, performance modeling, and online scheduling of individual operations. Experimental tests using NVIDIA V100 GPUs and Intel Stratix 10 FPGAs show that, Hype-training is able to exploit a mixture of GPUs and FPGAs at a fine granularity to achieve significant energy reduction, by 44.3% on average and up to 59.7%, without any loss in training throughput. Hype-training can also enforce power caps more effectively than state-of-the-art power management mechanisms on GPUs.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":"43 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators\",\"authors\":\"Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li\",\"doi\":\"10.1145/3447818.3460371\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"DNN training consumes orders of magnitude more energy than inference and requires innovative use of accelerators to improve energy-efficiency. However, despite having complementary features, GPUs and FPGAs have been mostly used independently for the entire training process, thus neglecting the opportunity in assigning individual but distinct operations to the most suitable hardware. In this paper, we take the initiative to explore new opportunities and viable solutions in enabling energy-efficient DNN training on hybrid accelerators. To overcome fundamental challenges including avoiding training throughput loss, enabling fast design space exploration, and efficient scheduling, we propose a comprehensive framework, Hype-training, that utilizes a combination of offline characterization, performance modeling, and online scheduling of individual operations. Experimental tests using NVIDIA V100 GPUs and Intel Stratix 10 FPGAs show that, Hype-training is able to exploit a mixture of GPUs and FPGAs at a fine granularity to achieve significant energy reduction, by 44.3% on average and up to 59.7%, without any loss in training throughput. Hype-training can also enforce power caps more effectively than state-of-the-art power management mechanisms on GPUs.\",\"PeriodicalId\":73273,\"journal\":{\"name\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"volume\":\"43 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3447818.3460371\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447818.3460371","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

深度神经网络训练比推理消耗更多的能量，需要创新地使用加速器来提高能源效率。然而，尽管具有互补的功能，gpu和fpga在整个训练过程中大多是独立使用的，从而忽略了将单个但不同的操作分配给最合适的硬件的机会。在本文中，我们主动探索在混合加速器上实现节能DNN训练的新机会和可行的解决方案。为了克服基本挑战，包括避免训练吞吐量损失，实现快速设计空间探索和高效调度，我们提出了一个综合框架，hyper -training，它结合了离线表征，性能建模和单个操作的在线调度。使用NVIDIA V100 gpu和Intel Stratix 10 fpga进行的实验测试表明，hyper -training能够在细粒度上利用gpu和fpga的混合来实现显著的能量降低，平均降低44.3%，最高可达59.7%，而训练吞吐量没有任何损失。与gpu上最先进的电源管理机制相比，宣传训练还可以更有效地强制执行电源上限。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators

DNN training consumes orders of magnitude more energy than inference and requires innovative use of accelerators to improve energy-efficiency. However, despite having complementary features, GPUs and FPGAs have been mostly used independently for the entire training process, thus neglecting the opportunity in assigning individual but distinct operations to the most suitable hardware. In this paper, we take the initiative to explore new opportunities and viable solutions in enabling energy-efficient DNN training on hybrid accelerators. To overcome fundamental challenges including avoiding training throughput loss, enabling fast design space exploration, and efficient scheduling, we propose a comprehensive framework, Hype-training, that utilizes a combination of offline characterization, performance modeling, and online scheduling of individual operations. Experimental tests using NVIDIA V100 GPUs and Intel Stratix 10 FPGAs show that, Hype-training is able to exploit a mixture of GPUs and FPGAs at a fine granularity to achieve significant energy reduction, by 44.3% on average and up to 59.7%, without any loss in training throughput. Hype-training can also enforce power caps more effectively than state-of-the-art power management mechanisms on GPUs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing

自引率

0.00%

发文量