Kanokwan Rungsuptaweekoon, V. Visoottiviseth, Ryousei Takano
2017 2nd International Conference on Information Technology (INCIT), published 2017-11-01. DOI: 10.1109/INCIT.2017.8257866 (https://doi.org/10.1109/INCIT.2017.8257866)
Citations: 34
Abstract
Evaluating the power efficiency of deep learning inference on embedded GPU systems
Deep learning inference on embedded systems requires not only high throughput but also low power consumption. To address this challenge, this paper evaluates the power efficiency of image recognition with YOLO, a real-time object detection algorithm, on the latest NVIDIA embedded GPU systems: Jetson TX1 and TX2. For this evaluation, we deployed the Low-Power Image Recognition Challenge (LPIRC) system and integrated YOLO, a power meter, and the target hardware into it. The experimental results show that Jetson TX2 in Max-N mode has the highest throughput, while Jetson TX2 in Max-Q mode has the highest power efficiency. These findings indicate that the trade-off between throughput and power efficiency can be adjusted on Jetson TX2. Jetson TX2 is therefore better suited to image recognition on embedded systems than Jetson TX1 or a PC server with an NVIDIA Tesla P40.
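The abstract compares configurations by throughput and by power efficiency but does not state how either metric is computed. The sketch below is one plausible reading, assuming throughput is images per second and power efficiency is throughput divided by mean power (images per second per watt). The `Run` class, its field names, and `best_by` are hypothetical helpers for illustration, and the numbers in the usage lines are made-up placeholders, not measurements from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    """One measured inference run (hypothetical structure, not from the paper)."""
    images: int        # number of images processed during the run
    seconds: float     # wall-clock duration of the run
    avg_watts: float   # mean power reported by the power meter

    @property
    def throughput(self) -> float:
        """Images processed per second."""
        return self.images / self.seconds

    @property
    def efficiency(self) -> float:
        """Assumed metric: images per second per watt (images per joule)."""
        return self.throughput / self.avg_watts


def best_by(runs: dict[str, Run], metric: str) -> str:
    """Return the name of the configuration with the largest `metric` value."""
    return max(runs, key=lambda name: getattr(runs[name], metric))


# Placeholder values only, chosen to show that the two metrics can disagree.
example = {
    "mode_a": Run(images=100, seconds=10.0, avg_watts=10.0),
    "mode_b": Run(images=60, seconds=10.0, avg_watts=4.0),
}
fastest = best_by(example, "throughput")
most_efficient = best_by(example, "efficiency")
```

With these placeholder inputs the faster configuration is not the more efficient one, which is the kind of trade-off the abstract describes between the Max-N and Max-Q modes.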