Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks

Raúl Marichal, Guillermo Toyos, Ernesto Dufrechu, P. Ezzatti

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), March 2023. DOI: 10.1109/PDP59025.2023.00036

Abstract: The growing need to perform neural-network inference with low latency is giving rise to a broad spectrum of heterogeneous devices with deep-learning capabilities. Consequently, obtaining the best performance from each device and choosing the most suitable platform for a given problem have become challenging. This paper evaluates multiple inference platforms using architecture-aware optimizations for convolutional neural networks. Specifically, we apply hardware optimizations with the TensorRT and OpenVINO frameworks on top of the platform-aware NetAdapt algorithm. The experimental evaluation shows that on MobileNet and AlexNet, combining NetAdapt with TensorRT or OpenVINO improves latency by up to 10× and 5.3×, respectively. Moreover, a throughput test with different batch sizes showed varying degrees of performance improvement across the devices. The discussion of the experimental results can guide the selection of devices and optimizations for different AI solutions.
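As context for the platform-aware NetAdapt algorithm that the paper's optimizations build on, its core idea is a greedy loop: repeatedly shrink one layer's filter count, choosing the reduction that best moves the network toward a target latency budget. The sketch below is purely illustrative — the uniform cost model and the absence of accuracy-driven fine-tuning are simplifying assumptions, not the paper's (or NetAdapt's) actual implementation, where latency is measured empirically on the target device:

```python
# Illustrative sketch of NetAdapt's greedy budget-driven loop.
# Assumptions (not from the paper): a toy cost model where latency is
# proportional to the total filter count, and no fine-tuning step.

def latency(layers):
    # Toy cost model. A real NetAdapt run measures latency
    # empirically on the target device (e.g., via lookup tables).
    return sum(layers)

def netadapt_sketch(layers, budget, step=1):
    """Greedily shrink per-layer filter counts until the latency
    budget is met (or no layer can be shrunk further)."""
    layers = list(layers)
    while latency(layers) > budget:
        # Generate one candidate per layer: that layer shrunk by `step`.
        candidates = []
        for i in range(len(layers)):
            if layers[i] > step:
                trial = list(layers)
                trial[i] -= step
                candidates.append((latency(trial), trial))
        if not candidates:
            break  # every layer is already at its minimum size
        # Greedy choice: keep the lowest-latency candidate. Real
        # NetAdapt also short-term fine-tunes each candidate and
        # selects by the best accuracy/latency trade-off.
        layers = min(candidates)[1]
    return layers

slimmed = netadapt_sketch([32, 64, 128], budget=200)
```

Under the toy cost model every candidate saves the same latency per step, so the tie-break is degenerate; the point of the sketch is only the structure of the loop — propose per-layer reductions, evaluate each against the budget, keep the best, repeat.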